This article provides a walkthrough of STARInsight's research workflow.
Performing statistically intensive tertiary analysis with STARInsight is straightforward once you grasp a few key terms.
- Sample – a collection of genomic information about a particular donor, patient, or organism. For example, this could refer to the variants contained in a single germline variant call file.
- Analysis Set – a group of samples that you have combined based on a common piece of metadata (continental ancestry, disease state, etc.).
- Filter Set – a group of rules defining a list of variants (e.g., only those variants found in a certain BED file).
- App – a tertiary analysis tool available in STARInsight, like Principal Component Analysis or K-means.
- Job – a single run of an app on an analysis set combined with a filter set. You create a job every time you launch an app.
- Results Report – the report STARInsight displays when a job finishes running. It contains summary information about the job (like how many samples were included), visualizations, and statistical output in tabular form.
Let’s take a look at each step of the workflow.
Your account will be linked to a STARInsight domain (for example, "acme.annailabs.com"). This domain is the URL you enter to go to the STARInsight login page. You share this domain with other users in your company or research organization.
After logging in to STARInsight, head to the Search page.
The sidebar on the left side of the Search page shows metadata about the samples available to you. Each metadata value has a checkbox (we call these values "facets"), and selecting a checkbox selects all of the samples tagged with that value.
Sample metadata is organized by project (1000 Genomes Project, International Cancer Genome Consortium, etc.). Selecting a project in the top pane of the sidebar reveals that project's metadata in the bottom pane.
Selecting facets from two separate metadata fields creates an “AND” query. For example, in the screenshot below you have selected samples which are female AND which are members of one of three populations in the 1000 Genomes Project.
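The logic of such a query can be sketched in a few lines of Python. This is only an illustration of the "AND across fields, any-of within a field" behavior described above, not STARInsight's implementation; the sample records, field names, and the `select_samples` helper are all made up for the example.

```python
# Hypothetical sample records; field names and values are illustrative only.
samples = [
    {"id": "S1", "sex": "female", "population": "GBR"},
    {"id": "S2", "sex": "male", "population": "GBR"},
    {"id": "S3", "sex": "female", "population": "YRI"},
    {"id": "S4", "sex": "female", "population": "JPT"},
]


def select_samples(samples, facets):
    """Return the ids of samples that match every field in `facets`.

    Values listed under one field are alternatives (any one may match);
    separate fields are combined with AND.
    """
    return [
        s["id"]
        for s in samples
        if all(s.get(field) in allowed for field, allowed in facets.items())
    ]


# Female AND a member of one of three populations.
query = {"sex": {"female"}, "population": {"GBR", "YRI", "CHB"}}
selected = select_samples(samples, query)
print(selected)  # ['S1', 'S3']
```

Adding a facet from a new field can only narrow the selection, while adding another facet within the same field can only widen it.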
You can combine samples from multiple projects. As you add samples from new projects, new rows will appear in the query bar at the top of the screen. Once you’ve selected metadata fields, select the “Save” option to create an analysis set.
STARInsight will prompt you to give your analysis set a name and a short description. Once you’ve done this, the next step will be to create a filter set.
So far you have created an analysis set. Analysis sets contain samples based on the metadata facets that you selected on the Search page. By contrast, filter sets define lists of variant positions based on rules which you configure on the Pre-Processing page.
When you launch a job (below), you will be asked to specify both an Analysis Set and a Filter Set. The data included in your job can be thought of as the intersection of these two independent selections: only the variants defined by the filter set, and only for the samples in the analysis set.
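One way to picture how the two objects combine: the analysis set picks samples, the filter set picks variant positions, and the job sees only the data that satisfies both. The sketch below uses invented genotype calls and identifiers, and is a mental model rather than a description of how STARInsight stores or processes data.

```python
# Hypothetical genotype calls keyed by (sample id, variant position).
genotypes = {
    ("S1", "chr1:1000"): 0,
    ("S1", "chr1:2000"): 1,
    ("S2", "chr1:1000"): 2,
    ("S2", "chr1:2000"): 0,
    ("S3", "chr1:1000"): 1,
    ("S3", "chr1:3000"): 2,
}

analysis_set = {"S1", "S3"}               # which samples to include
filter_set = {"chr1:1000", "chr1:3000"}   # which variant positions to include

# Keep a call only if its sample is in the analysis set
# and its position is in the filter set.
job_data = {
    (sample, variant): call
    for (sample, variant), call in genotypes.items()
    if sample in analysis_set and variant in filter_set
}

for key in sorted(job_data):
    print(key, job_data[key])
```

Because the two selections are independent, the same filter set can be reused with any analysis set, and vice versa.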
Begin defining a filter set by navigating to the Pre-Processing page.
A separate article covers the topic of Pre-Processing in detail. To get started, we'll create a filter set with a single rule that selects 55 variant positions with high degrees of variability based on continental ancestry.
Click the "Create Filter Set" button at the top left of the screen, and provide your new filter set with a name and a short description. Next, drag the BED file widget onto the "Include" drop zone and upload a BED file containing a list of these positions (here is the BED file used in this simple example).
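If you have not worked with BED files before, the sketch below shows what a minimal one contains and how a variant position can be tested against it. BED files are tab-delimited, with chromosome, start, and end in the first three columns; coordinates are 0-based and half-open, whereas VCF positions are 1-based. The interval values and the `parse_bed` and `in_bed` helpers are hypothetical, and how STARInsight itself interprets an uploaded file is not covered here.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples.

    BED coordinates are 0-based and half-open:
    `start` is included in the interval, `end` is not.
    """
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and optional header or comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def in_bed(intervals, chrom, pos):
    """Return True if the 1-based position `pos` lies in any interval."""
    return any(c == chrom and start < pos <= end for c, start, end in intervals)


# Two single-base intervals (made-up coordinates).
bed_text = "chr1\t999\t1000\nchr2\t4999\t5000\n"
intervals = parse_bed(bed_text)

print(in_bed(intervals, "chr1", 1000))  # True
print(in_bed(intervals, "chr1", 999))   # False
```

Note that a single-base interval covering 1-based position 1000 is written with a start of 999 and an end of 1000.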
Once you've saved your filter set (the button is in the top right-hand corner of the page), you can launch an analysis.
With both an analysis set and a filter set created, head over to the Apps page.
The page contains a menu of analysis apps available on your domain. Click an app to get started. In this case, you’ll employ Principal Component Analysis.
The resulting modal will prompt you to specify a couple of pieces of information. First, pick the analysis set you just created from the dropdown. As you add more analysis sets, the number of options in this dropdown will grow.
When you open the "Pre-Processing" dropdown, you will notice the name of the filter set you just created. As you add more filter sets, the number of options in this dropdown will also grow.
More information about the options on the PCA modal can be found here. Once you've made your choices, launch the analysis by selecting "Submit."
Head over to the Job Queue Dashboard to view your analysis results report.
The dashboard shows a history of jobs you have submitted, split across three tables.
- Active Jobs – Jobs which are either running or which are queued and waiting to begin running.
- Successful Jobs – Jobs which have finished running successfully, and which have results available.
- Unsuccessful Jobs – Jobs which have either failed or which you killed before they completed.
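The three tables amount to a simple grouping of jobs by their current state. As a rough sketch, assuming hypothetical status names (`queued`, `running`, `succeeded`, `failed`, `killed`) that are not taken from STARInsight itself:

```python
from collections import defaultdict

# Hypothetical status names mapped to the dashboard table they would appear in.
TABLE_FOR_STATUS = {
    "queued": "Active Jobs",
    "running": "Active Jobs",
    "succeeded": "Successful Jobs",
    "failed": "Unsuccessful Jobs",
    "killed": "Unsuccessful Jobs",
}

# Made-up job history for illustration.
jobs = [
    {"id": 1, "app": "PCA", "status": "succeeded"},
    {"id": 2, "app": "K-means", "status": "running"},
    {"id": 3, "app": "PCA", "status": "killed"},
    {"id": 4, "app": "PCA", "status": "queued"},
]


def group_jobs(jobs):
    """Group job ids by the dashboard table their status belongs to."""
    tables = defaultdict(list)
    for job in jobs:
        tables[TABLE_FOR_STATUS[job["status"]]].append(job["id"])
    return dict(tables)


tables = group_jobs(jobs)
print(tables)
```

Only jobs in the Successful Jobs group have a results report to open.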
Click the note icon next to a successful job to view its results report.
The first thing you’ll see on the Results Report page is a list of Summary Information about the job. This section gives you basic information about the scale of your analysis, and reminds you about inputs you used in this job.
Scrolling down the page reveals a results plot for this job.
You can zoom in on tightly grouped data points and pan around the plot by holding SHIFT and dragging with your cursor. Samples you select on the plot can be saved as a new analysis set with the "Save" button above and to the right of the plot. This is useful for re-analyzing outliers in your plot.
Data points on the plot are color-coded based on project-specific metadata fields that you select below the plot. Open one of these dropdown menus and you'll see that the values match those on the sidebar of the Search page. Selecting new values in the dropdown will re-color the plot, potentially revealing interesting relationships among the samples.
The last section of the results report contains the job’s data output in tabular form. The report previews only a subset of the data output on this page. You can use the link at the bottom right-hand corner of the report to get the results in their entirety.
Your results reports are available at any time from the dashboard.