Your domain has access to two directories inside the STARInsight Vault, which can be used to store notebook results files, clinical metadata, genomic data such as RNAseq files, or any other kind of file. These directories are accessible from the notebook.
By default, these directories are configured to have 20 GB of storage space each. Your domain may have more or less depending on your billing plan.
/scratch is intended for the short term storage of data. Your account administrator may have configured your domain’s
/scratch directory to clear at a designated interval. For the long term storage of data, please use
Note that each notebook paragraph starts in your domain's default working directory. This means that changing the working directory to
"/data/clinical_trials" in one paragraph will not carry over automatically to new paragraphs.
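Because the working directory resets in each new paragraph, one option is to build full paths from a fixed base rather than rely on a directory change made in an earlier paragraph. The following is a minimal Python sketch of that idea, not a STARInsight API: vault_path is a hypothetical helper, and a temporary directory stands in for a Vault directory so the example can run anywhere.

```python
import os
import tempfile

# Stand-in for a Vault directory (an assumption for this sketch).
BASE_DIR = tempfile.mkdtemp()


def vault_path(*parts):
    """Hypothetical helper: join path parts onto BASE_DIR to get a full path."""
    return os.path.join(BASE_DIR, *parts)


# Full paths work no matter what the current working directory is.
trial_dir = vault_path("clinical_trials")
os.mkdir(trial_dir)

with open(vault_path("clinical_trials", "notes.txt"), "w") as handle:
    handle.write("example\n")

print(os.listdir(trial_dir))
```

In a real paragraph, BASE_DIR would be set to the Vault directory you are working in, and the helper would be redefined (or the full path typed out) in each paragraph that needs it.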
Useful R Directory Commands
Type %r at the top of each paragraph before entering a directory command.
getwd() provides your current working directory.
setwd() changes your current working directory. Note that the notebook will not print the changed directory, so to confirm the change, follow this command with getwd() as indicated below.
%r
setwd("/data")
getwd()
list.files() lists all files in your current working directory.
%r
setwd("/data")
list.files()
list.dirs() lists all sub-directories in your current working directory.
%r
setwd("/data")
list.dirs()
dir.create() creates a new directory at the path you specify.
Useful Python Directory Commands
Type %pyspark at the top of each paragraph and import os before using these commands. A comprehensive listing of os commands can be found in the Python documentation for the os module; these are a few that we use commonly. Note that many of these commands require you to enter
os.getcwd() provides the current working directory.
%pyspark
import os
print(os.getcwd())
os.chdir() changes the working directory to the path you specify.
%pyspark
import os
os.chdir("/data/trial_x_metadata")
os.listdir() lists the entries under the path that you specify.
%pyspark
import os
print(os.listdir("/data"))
os.mkdir() creates a directory at the path you specify.
%pyspark
import os
os.mkdir("/data/trial_x_metadata")
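os.removedirs() removes the directory at the path you specify, then works upward through the path and removes each parent directory that is left empty. The directory you specify must be empty for this to work.
%pyspark
import os
os.removedirs("/data/trial_x_metadata")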
os.remove() removes the file at the path you specify.
%pyspark
import os
os.remove("/data/trial_x_metadata/dosages.csv")
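The commands above can be combined into a short round trip. The sketch below runs against a temporary directory (an assumption, standing in for /data) so that nothing in the Vault is touched. It also uses os.rmdir and shutil.rmtree, two standard-library calls not covered above: os.rmdir removes a single empty directory, and shutil.rmtree is one way to delete a directory that still contains files.

```python
import os
import shutil
import tempfile

# Temporary stand-in for /data (an assumption for this sketch).
base = tempfile.mkdtemp()

# Create a directory and put one file in it.
meta_dir = os.path.join(base, "trial_x_metadata")
os.mkdir(meta_dir)
dosages = os.path.join(meta_dir, "dosages.csv")
with open(dosages, "w") as handle:
    handle.write("subject,dose_mg\n")

print(os.listdir(meta_dir))

# os.rmdir refuses to remove a directory that is not empty.
try:
    os.rmdir(meta_dir)
    removed_while_full = True
except OSError:
    removed_while_full = False

# Remove the file first, then the now-empty directory.
os.remove(dosages)
os.rmdir(meta_dir)

# shutil.rmtree deletes a directory together with everything inside it.
old_results = os.path.join(base, "old_results")
os.mkdir(old_results)
with open(os.path.join(old_results, "run1.txt"), "w") as handle:
    handle.write("example\n")
shutil.rmtree(old_results)

print(os.listdir(base))
```

Because shutil.rmtree does not ask for confirmation, it is worth printing the path and checking it before running the call on real data.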
Useful Bash Commands
The STARInsight notebook is equipped with a Bash interpreter; Bash commands are probably the most straightforward way to manage files and directories that you store in the Vault. Type
%sh at the top of each paragraph to use Bash commands.
pwd prints your current working directory.
cd changes your working directory to a directory you specify.
%sh
cd "/data"
pwd
mkdir creates a directory at the path that you specify.
%sh
mkdir "/data/trial_x_metadata"
ls lists all of the files and folders in a directory that you specify.
%sh
ls "/data/trial_x_metadata"
find is an alternate approach to ls for finding all files or subdirectories in a folder. The following command lists all sub-directories in the /data folder.
%sh
cd "/data"
find . -type d
rm deletes a file. When used in conjunction with the -r option, it deletes a directory and all of its contents recursively.
%sh
rm -r /data/trial_x_metadata
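For comparison, the find -type d listing above can be approximated from a %pyspark paragraph with os.walk. This is a sketch rather than a drop-in replacement: list_subdirectories is a hypothetical helper name, and a temporary directory stands in for /data. Unlike find, the helper leaves out the starting directory itself.

```python
import os
import tempfile


def list_subdirectories(root):
    """Hypothetical helper: return every directory below root, sorted."""
    found = []
    for current, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            found.append(os.path.join(current, name))
    return sorted(found)


# Temporary stand-in for /data (an assumption for this sketch).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "trial_x_metadata", "raw"))
os.mkdir(os.path.join(root, "trial_y_metadata"))

# Print each sub-directory relative to the starting directory.
for path in list_subdirectories(root):
    print(os.path.relpath(path, root))
```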