GATK is a toolkit developed by the Broad Institute, focused primarily on variant discovery and genotyping. It is open source, hosted on GitHub, and available under a BSD 3-clause license. First, let’s download and unzip GATK from GitHub. The creators of GATK recommend running it through conda, a package, environment, and dependency manager; in essence, conda creates an isolated virtual environment from which to run software. The next step is to tell conda to create a virtual environment for GATK, using the YAML file included with GATK (gatkcondaenv.yml) as the instructions. We do this with the command conda env create, passing -f to name the YAML file and -p to specify where the environment should be stored. We also make a symlink so that the downloaded executable is available directly from our bin folder. To run GATK, we first start the virtual environment with the command source activate; we can then run the program by providing the path to the executable. To exit the virtual environment, run the command source deactivate. On conda 4.4 and later, conda activate and conda deactivate are the preferred equivalents.
# download and unzip
cd ~/workspace/bin
wget https://github.com/broadinstitute/gatk/releases/download/4.0.2.1/gatk-4.0.2.1.zip
unzip gatk-4.0.2.1.zip
# make sure ubuntu user can create their own conda environments
sudo chown -R ubuntu:ubuntu /home/ubuntu/.conda
# create conda environment for gatk
cd gatk-4.0.2.1/
conda env create -f gatkcondaenv.yml -p ~/workspace/bin/conda/gatk
# make symlink
ln -s ~/workspace/bin/gatk-4.0.2.1/gatk ~/workspace/bin/gatk
# test installation
source activate ~/workspace/bin/conda/gatk
~/workspace/bin/gatk
# to exit the virtual environment
source deactivate
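The installation steps above can be followed by a quick, rerunnable sanity check. The following is a minimal sketch and not part of the GATK or conda tooling: the function names check_link and check_env are hypothetical, and the check for a conda-meta directory relies on the assumption that a conda environment prefix contains one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the install layout used above.
# They are illustrative only and are not provided by GATK or conda.

# check_link LINK
# Succeeds if LINK is a symlink whose target exists and is executable.
check_link() {
    local link="$1"
    if [ -L "$link" ] && [ -x "$link" ]; then
        echo "ok: $link -> $(readlink "$link")"
    else
        echo "missing or broken symlink: $link" >&2
        return 1
    fi
}

# check_env PREFIX
# Succeeds if PREFIX looks like a conda environment directory.
# Assumption: a conda environment prefix contains a conda-meta/ subdirectory.
check_env() {
    local prefix="$1"
    if [ -d "$prefix/conda-meta" ]; then
        echo "ok: conda environment at $prefix"
    else
        echo "no conda environment at $prefix" >&2
        return 1
    fi
}
```

With these functions loaded into the shell, check_link ~/workspace/bin/gatk and check_env ~/workspace/bin/conda/gatk should each print an ok: line if the steps above succeeded, and return a non-zero status otherwise.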