Working with the Vault

Overview

Your domain has access to two directories inside the STARInsight Vault. You can use them to store notebook result files, clinical metadata, genomic data such as RNA-seq files, or any other kind of file. Both directories are accessible from the notebook.

/scratch

/data

By default, these directories are configured to have 20 GB of storage space each. Your domain may have more or less depending on your billing plan.
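
To see how much space is left, one option is to ask the operating system from a %pyspark paragraph. The sketch below is an assumption about how you might check this, not a STARInsight feature. usage_gb is a hypothetical helper built on os.statvfs, and the figures it reports describe the underlying filesystem, which may not match the quota set by your billing plan.

```python
import os

def usage_gb(path):
    """Return (total_gb, free_gb) for the filesystem that holds `path`."""
    stats = os.statvfs(path)
    gb = 1024.0 ** 3
    total_gb = stats.f_frsize * stats.f_blocks / gb
    free_gb = stats.f_frsize * stats.f_bavail / gb  # space available to non-root users
    return total_gb, free_gb

# In the notebook you would pass "/data" or "/scratch";
# "." is used here only so the sketch runs anywhere.
total_gb, free_gb = usage_gb(".")
print("total: %.1f GB, free: %.1f GB" % (total_gb, free_gb))
```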

/scratch is intended for the short-term storage of data. Your account administrator may have configured your domain’s /scratch directory to clear at a designated interval. For the long-term storage of data, please use /data.
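
If a file in /scratch turns out to be worth keeping, it can be copied to /data before the next clearing interval. The following is a minimal sketch of that step. keep_result is a hypothetical helper, results.csv is a placeholder file name, and temporary directories stand in for /scratch and /data so the example runs anywhere.

```python
import os
import shutil
import tempfile

def keep_result(src_file, dest_dir):
    """Copy src_file into dest_dir (created if missing) and return the new path."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    dest_file = os.path.join(dest_dir, os.path.basename(src_file))
    shutil.copy2(src_file, dest_file)  # copy2 also preserves timestamps
    return dest_file

# Stand-ins for /scratch and /data (assumption: replace with the real paths in the notebook).
demo_scratch = tempfile.mkdtemp()
demo_data = tempfile.mkdtemp()

src = os.path.join(demo_scratch, "results.csv")
with open(src, "w") as handle:
    handle.write("placeholder\n")

saved = keep_result(src, os.path.join(demo_data, "trial_x_metadata"))
print(saved)
```

The original file is left in place, so nothing is lost if the copy is interrupted.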

Note that each notebook paragraph starts in your domain's default working directory. This means that changing the working directory to "/data/clinical_trials" in one paragraph will not carry over to new paragraphs.

 

Useful R Directory Commands

Type %r at the top of each paragraph before entering a directory command.

getwd()

Returns your current working directory.

setwd()

Changes your current working directory. The notebook does not print the new directory, so follow this command with getwd() to confirm the change, as indicated below.

%r
setwd("/data")
getwd()

list.files()

Lists all files and sub-directories in your current working directory.

%r
setwd("/data")
list.files()

list.dirs()

Lists all sub-directories in your current working directory, including nested ones.

%r
setwd("/data")
list.dirs()

dir.create()

Creates a new directory at the path you specify.

%r
dir.create("/data/trial_x_metadata")

Useful Python Directory Commands

Type %pyspark at the top of each paragraph and import os before using these commands. The Python documentation for the os module has a comprehensive listing of os commands; these are a few that we use commonly. Note that commands that return a value, such as getcwd() and listdir(), display their result only if you enter print before the command itself.

getcwd()

Returns the current working directory.

%pyspark
import os
print(os.getcwd())

chdir()

Changes the working directory to the path you specify.

%pyspark
import os
os.chdir("/data/trial_x_metadata")
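
Because each paragraph starts in the default working directory, a chdir() in one paragraph does not help the next one. One way around this is to build absolute paths instead of changing directory. The sketch below assumes a hypothetical helper, vault_path, and uses the directory and file names from this article purely as illustrations.

```python
import os

DATA_ROOT = "/data"  # the long-term Vault directory described in this article

def vault_path(*parts):
    """Join path components under DATA_ROOT into an absolute path."""
    return os.path.join(DATA_ROOT, *parts)

# The resulting path is valid in any paragraph, whatever the working directory is.
print(vault_path("clinical_trials", "dosages.csv"))
```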

listdir()

Lists the entries under the path that you specify.

%pyspark
import os
print(os.listdir("/data"))
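
listdir() returns file and directory names together, in no particular order. If you need them separated, a small helper can check each entry with os.path.isdir. This is a sketch only. split_entries is a hypothetical name, and a temporary directory stands in for /data so the example runs anywhere.

```python
import os
import tempfile

def split_entries(path):
    """Return (files, dirs): sorted names of files and sub-directories in `path`."""
    files, dirs = [], []
    for name in sorted(os.listdir(path)):
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return files, dirs

# Stand-in for /data with one sub-directory and one empty file.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "trial_x_metadata"))
open(os.path.join(demo_root, "dosages.csv"), "w").close()

files, dirs = split_entries(demo_root)
print(files)
print(dirs)
```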

mkdir()

Creates a directory at the path you specify.

%pyspark
import os
os.mkdir("/data/trial_x_metadata")
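
os.mkdir() raises an error if the directory already exists or if its parent directory is missing. When a paragraph may be run more than once, a guarded os.makedirs() call avoids both problems. The sketch below uses a hypothetical helper, ensure_dir, and a temporary directory in place of /data.

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` and any missing parents, unless it already exists."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path

# Stand-in for /data (assumption: use the real path in the notebook).
demo_root = tempfile.mkdtemp()
target = os.path.join(demo_root, "trial_x_metadata", "raw")

ensure_dir(target)
ensure_dir(target)  # running it again is harmless
print(os.path.isdir(target))
```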

removedirs()

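Removes the empty directory at the path you specify, then removes each parent directory in turn for as long as that parent is also empty. The directory must be empty for this to work; removedirs() does not delete files or sub-directories inside it.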

%pyspark
import os
os.removedirs("/data/trial_x_metadata")
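
The way removedirs() works upward through parent directories is easy to overlook, so here is a small demonstration in a temporary directory. The directory and file names are illustrative only. The leaf directory and its now-empty parent are removed, and the process stops at the first directory that still contains something.

```python
import os
import tempfile

# Stand-in for /data, kept non-empty by a placeholder file.
demo_root = tempfile.mkdtemp()
open(os.path.join(demo_root, "keep.txt"), "w").close()

leaf = os.path.join(demo_root, "trial_x_metadata", "raw")
os.makedirs(leaf)

# Removes "raw", then "trial_x_metadata" (now empty), and stops at demo_root.
os.removedirs(leaf)

parent_gone = not os.path.exists(os.path.join(demo_root, "trial_x_metadata"))
root_kept = os.path.isdir(demo_root)
print("parent removed: %s, root kept: %s" % (parent_gone, root_kept))
```

To delete a directory that still has contents, shutil.rmtree() from the Python standard library is the usual tool.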

remove()

Removes a file at the path specified.

%pyspark
import os
os.remove("/data/trial_x_metadata/dosages.csv")
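
remove() raises an error if the file does not exist, and it cannot delete directories. If a paragraph might be re-run after the file is already gone, a check with os.path.isfile avoids the error. The helper name remove_if_file is hypothetical, and a temporary directory stands in for /data/trial_x_metadata.

```python
import os
import tempfile

def remove_if_file(path):
    """Delete `path` if it is an existing file; return True if something was deleted."""
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False

# Stand-in for /data/trial_x_metadata with one empty file.
demo_root = tempfile.mkdtemp()
target = os.path.join(demo_root, "dosages.csv")
open(target, "w").close()

first = remove_if_file(target)   # file exists, so it is deleted
second = remove_if_file(target)  # file is already gone, so nothing happens
print("first call: %s, second call: %s" % (first, second))
```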

Useful Bash Commands

The STARInsight notebook is equipped with a Bash interpreter. Bash commands are probably the most straightforward way to manage the files and directories that you store in the Vault. Type %sh at the top of each paragraph to use Bash commands.

pwd

Prints your current working directory.

cd

Changes your working directory to a directory you specify.

%sh
cd "/data"
pwd

mkdir

Creates a directory at the path you specify.

%sh
mkdir "/data/trial_x_metadata"

ls

Lists the files and folders in a directory that you specify.

%sh
ls "/data/trial_x_metadata"

find

An alternative to ls for finding all files or sub-directories in a folder. The following command lists all sub-directories in the /data folder, including nested ones.

%sh
cd "/data"
find . -type d

rm

Deletes files. When used with the -r option, it deletes a directory and all of its contents recursively.

%sh
rm -r /data/trial_x_metadata