The objective of this document is to provide instructions for administering a GNOS repository once it has been installed. It is not intended to be an installation guide, or to provide system requirements.
Administrators regulate access to files stored in GNOS by using a system of access keys and permissions lists. Access keys are used to identify users, and permissions lists govern which studies they can download files from or upload files to. To use an analogy from the physical world, think of access keys as the keycard that you use to enter an office building. The permissions lists determine which doors you can open once you're inside.
Assigning an access key to a new user requires a special script. That topic is covered here.
GNOS’ standard configuration is to manage permissions using a special command line interface (CLI). That CLI is discussed here.
Alternatively, you can manually add or remove users by editing the “localup.csv” and “localdown.csv” files stored on the GT Exec machine. The localup file lists all users with upload permissions to your studies. The localdown file lists all users with download permissions to your studies.
If you wanted to add a user with email “firstname.lastname@example.org” with download permissions to the “melanoma” study, you would add this line to the localdown.csv file:
Then you must run the eraimporter command.
sudo /opt/gtexec/plugins/sbin/eraimporter/eraimporter -i -v
A GNOS repository can be divided into two or more studies. Each study can have its own metadata validation rules and permission lists. This section will explain how to create and delete studies.
To create a new study, you will need to add the name of the study to several different configuration files.
To keep things simple, imagine that your repository only has one set of validation rules, and that you only have one study called “melanoma”. You want to add a new study called “pulmonary.”
The validator configuration file is stored on the gtexec machine. It contains references to the XML files that are used to validate metadata for new files uploaded to the repository. Using the example above, it would look like this:
[DEFAULT]
plugindir=/opt/gtexec/plugins

[all]
priority=1
match=access_group:^(melanoma).*$,meta_type:xyzmeta
path=/opt/gtexec/plugins/lib/validator/validator
module=validator
To add the new “pulmonary” study, insert that word, preceded by an “or” pipe symbol ( | ), inside the parentheses on the match line of the configuration file, so that the pattern reads ^(melanoma|pulmonary).*$, and then restart the supervisord service.
Performing this action instructs GNOS to use the rules contained in the “validator.xml” file to check the metadata when users upload files to the new “pulmonary” study.
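As a rough illustration of the edit described above, the following Python sketch adds a study name to the parenthesised group on the match line and checks which access groups the resulting pattern accepts. The helper names are hypothetical, and the sketch assumes that GNOS evaluates the access_group rule like an ordinary regular expression anchored at the start of the value; verify the behaviour on your own installation before relying on it.

```python
import re

def add_study(match_line, study):
    """Append `study` to the first parenthesised alternation on a match line."""
    def extend_group(found):
        names = found.group(1).split("|")
        if study not in names:
            names.append(study)
        return "(" + "|".join(names) + ")"
    return re.sub(r"\(([^()]*)\)", extend_group, match_line, count=1)

def access_group_regex(match_line):
    """Pull the access_group rule out of a match line and compile it."""
    rules = match_line.split("=", 1)[1]
    for rule in rules.split(","):
        key, _, pattern = rule.partition(":")
        if key == "access_group":
            return re.compile(pattern)
    raise ValueError("no access_group rule found")

original = "match=access_group:^(melanoma).*$,meta_type:xyzmeta"
updated = add_study(original, "pulmonary")

# Assumption: the rule is applied as a regex anchored at the start of the value.
pattern = access_group_regex(updated)
accepts_pulmonary = pattern.match("pulmonary") is not None
accepts_cardiac = pattern.match("cardiac") is not None
```

Running add_study a second time with the same study leaves the line unchanged, so the helper is safe to re-run.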
You will also need to add the name of your new study to the validator.xml file itself. Think of this XML file as a list of all the acceptable metadata selections that GNOS will check against when a user is adding a new file to the repository.
The file is located in the “config” directory on your gtexec machine. It has several top-level sections, including “Analysis” and “Experiment.” Add the name of your new “pulmonary” study to the “pattern” field of the “refname” attribute in these two sections. Separate the values with a pipe symbol ( | ), and precede each value with the caret symbol ( ^ ). If you wanted to add a “pulmonary” study to a repository that already had a “melanoma” study, the revised entry would look like this:
<attribute xpath = "STUDY_REF"
name = "refname"
pattern = "^melanoma|^pulmonary"
optional = "no"
validate = "yes"/>
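To see what the pattern in the entry above accepts, here is a minimal Python sketch. It assumes the validator treats the pattern as a regular expression searched against the study reference name; the source does not say whether GNOS matches a prefix or the whole value, so the function names and the matching behaviour here are illustrative only.

```python
import re

def build_pattern(studies):
    """Join study names into the '^name|^name' form used by the pattern field."""
    return "|".join("^" + re.escape(name) for name in studies)

def study_allowed(refname, pattern):
    """Return True if the reference name starts with one of the listed studies."""
    return re.search(pattern, refname) is not None

pattern = build_pattern(["melanoma", "pulmonary"])

checks = {
    name: study_allowed(name, pattern)
    for name in ["melanoma", "pulmonary", "cardiac", "not_pulmonary"]
}
```

Because each alternative carries its own caret, a value such as “not_pulmonary” is rejected even though it contains a listed study name.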
Modify your repository’s upload and download permissions CSVs to allow users to access files contained in the new study. Check out the “Managing User Permissions” section above for details.
Your repository may utilize a “sample plugin.” A sample plugin will automatically append certain metadata after a successful upload. You can configure multiple sample plugins so that files loaded to different studies receive different metadata. References to the sample plugin(s) live in the sample.cfg file. Here’s an example of a sample.cfg:
[DEFAULT]
plugindir=/opt/gtexec/plugins

[pancancer]
priority=1
match=access_group:^melanoma.*$
path=%(plugindir)s/lib/sample/xyzcorp
module=xyzcorp
config=/etc/gnos.d/xyzcorp.cfg
To add the “pulmonary” study to this list, add that word to the match line of the configuration file, preceded by the “or” symbol ( | ), and restart the supervisord service.
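The %(plugindir)s placeholder in the sample.cfg example has the same form as the interpolation syntax of Python's standard configparser module. Assuming GNOS resolves it the same way (an assumption, not something the source confirms), this sketch shows how the value from the [DEFAULT] section is substituted into the path setting.

```python
import configparser

SAMPLE_CFG = """\
[DEFAULT]
plugindir=/opt/gtexec/plugins

[pancancer]
priority=1
match=access_group:^melanoma.*$
path=%(plugindir)s/lib/sample/xyzcorp
module=xyzcorp
config=/etc/gnos.d/xyzcorp.cfg
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

section = parser["pancancer"]
resolved_path = section["path"]        # %(plugindir)s is filled in from [DEFAULT]
priority = section.getint("priority")  # numeric settings can be read as integers
match_rule = section["match"]          # returned unchanged; it contains no placeholder
```

Reading the file this way before restarting supervisord is one way to catch a malformed section or a misspelled placeholder early.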
Deleting a Study
Before you delete a study, make sure that you have deleted all of the files the study contains, or moved them to a different study (this process is covered below). Once you delete the study, you will not be able to access any files still contained there.
To delete a study, simply remove the study’s name from each of the files listed above: validator.cfg, validator.xml, sample.cfg, and the permissions CSVs.
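Since the study name has to be removed from several files, a quick scan for leftover references can help. The sketch below is an illustrative helper, not a GNOS tool: the function name is hypothetical, and the temporary files stand in for your real configuration files, whose locations depend on your installation.

```python
import re
import tempfile
from pathlib import Path

def files_mentioning(study, paths):
    """Return the paths whose text still contains `study` as a whole word."""
    word = re.compile(r"\b" + re.escape(study) + r"\b")
    return [str(path) for path in paths
            if word.search(Path(path).read_text(encoding="utf-8"))]

# Demonstration with stand-in files; point this at your own copies instead.
with tempfile.TemporaryDirectory() as tmp:
    validator_cfg = Path(tmp) / "validator.cfg"
    sample_cfg = Path(tmp) / "sample.cfg"
    validator_cfg.write_text("match=access_group:^(melanoma).*$,meta_type:xyzmeta\n",
                             encoding="utf-8")
    sample_cfg.write_text("match=access_group:^melanoma.*$\n", encoding="utf-8")

    leftovers = files_mentioning("pulmonary", [validator_cfg, sample_cfg])
    remaining = files_mentioning("melanoma", [validator_cfg, sample_cfg])
    remaining_names = [Path(p).name for p in remaining]
```

An empty result for the deleted study's name suggests that every listed file has been cleaned up.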
Managing Analyses in GNOS
Syncing Analyses Between Repositories
Syncing the contents of one repository to other repos is possible with the reposync application. The app requires you to specify a starting repository, one or more destination repositories, and a list of the analyses that you would like to sync.
For example, you could use reposync to sync all files that fit the following metadata parameters.
- State = Live
- Assay = WGS
- Assembly = GRCh37
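The selection above amounts to keeping only the analyses whose metadata matches every listed parameter. The following Python sketch shows that filtering logic on made-up records. It is not the reposync interface: the field names, values, and identifiers are hypothetical and serve only to show how the three criteria combine.

```python
# Criteria taken from the list above; the lowercase key names are assumptions.
CRITERIA = {"state": "Live", "assay": "WGS", "assembly": "GRCh37"}

def select_for_sync(analyses, criteria=CRITERIA):
    """Return the IDs of analyses that satisfy every criterion."""
    return [
        record["analysis_id"]
        for record in analyses
        if all(record.get(field) == value for field, value in criteria.items())
    ]

# Hypothetical metadata records; real records will have more fields.
analyses = [
    {"analysis_id": "a1", "state": "Live", "assay": "WGS", "assembly": "GRCh37"},
    {"analysis_id": "a2", "state": "Suppressed", "assay": "WGS", "assembly": "GRCh37"},
    {"analysis_id": "a3", "state": "Live", "assay": "WXS", "assembly": "GRCh37"},
    {"analysis_id": "a4", "state": "Live", "assay": "WGS", "assembly": "GRCh38"},
]

selected = select_for_sync(analyses)
```

Only the first record satisfies all three parameters, so it is the only one selected.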
The reposync application is a Python package which you install on the gtexec machine. You can get detailed instructions on its installation and use from this helpdesk article.
Suppressed files remain in your repository, but users may not download them or mount them with GTFuse. You can suppress files by running a script on the GT Exec machine.
This helpdesk article contains the file suppression script and instructions.
Deleted files are completely removed from the underlying storage media, and cannot be recovered. You can delete files by running a script on the GT Exec machine.
This helpdesk article contains the scripts and instructions for their use.
Editing Analysis Metadata
It is possible for an administrator to edit a file’s metadata by running a script on the GT Exec machine. Generally, BAM files’ metadata is spread across three separate XML files: analysis, experiment, run. Separate scripts are provided for updating each of these files. Running these scripts will initiate the metadata validation process again.
This helpdesk article contains the scripts and instructions for their use.
Note that it is NOT possible to use these scripts to edit the file’s name or file size. It is also not possible to use these scripts to change system-administered fields like last modified date or state.
Changing an Analysis' Study
The process of changing a file’s study involves updating the file’s metadata. However, this process is a little different from using the updateAnalysis script mentioned above.
This helpdesk article contains the script for changing a file’s study.
The normal process for uploading new files to GNOS is to use the gnossubmit and gtupload commands. However, it is possible for admins to bypass this process using the admin_import script.
This option is ideal when first migrating data into GNOS, and is explained in further detail in this helpdesk article.
By default, GNOS will create a BAM index file (.bai) when you upload a new BAM file. The .bai file acts like a table of contents for the sequence data contained in the BAM file, allowing programs like GTFuse to jump directly to certain regions in the file.
If you would like to turn automatic BAM indexing off, you can do so by adding a “#” before “augment.bam” in this line of the gnos_attr.cfg file:
#augment.bam = bamindex:bai
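If you prefer to script this change, the sketch below comments out or restores the augment.bam line in the text of the file. The function names are hypothetical, and the sketch works on a string rather than a file path because the location of gnos_attr.cfg on your machine is not given here.

```python
import re

def disable_bam_indexing(cfg_text):
    """Comment out an active augment.bam line; already-commented lines are untouched."""
    return re.sub(r"(?m)^([ \t]*)(augment\.bam[ \t]*=)", r"\1#\2", cfg_text)

def enable_bam_indexing(cfg_text):
    """Remove the leading '#' from a commented augment.bam line."""
    return re.sub(r"(?m)^([ \t]*)#[ \t]*(augment\.bam[ \t]*=)", r"\1\2", cfg_text)

active = "augment.bam = bamindex:bai\n"
disabled = disable_bam_indexing(active)
restored = enable_bam_indexing(disabled)
```

Both helpers are idempotent, so running either one twice produces the same text as running it once.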