Sequence read archive and other tools

hendra s ismanto
Jun 6, 2020

Downloading and analyzing data from the Sequence Read Archive (SRA) requires a few command-line tools. Here I describe the protocol I usually follow to install them and prepare the data.

(update of 21 February 2021)

If you use Linux (Ubuntu)
(this also works in WSL)

SRA-toolkit

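Use Conda (parallel-fastq-dump is a wrapper around fastq-dump; the Bioconda package also installs sra-tools as a dependency)

$ conda install -c bioconda parallel-fastq-dump

or Python pip (this installs only the wrapper, so the SRA toolkit itself still has to be installed by one of the other methods)

$ pip install parallel-fastq-dump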

or the Debian package repository

Open terminal
Install the SRA toolkit with the command:

$ sudo apt-get install sra-toolkit

Enter your password
Type Y when asked to confirm
Choose OK if a purple dialog box appears
The SRA toolkit is now installed on your computer.

or download from the website

$ wget url (take the url from https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/)

To check whether the SRA toolkit is installed:

$ prefetch --help
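
The same kind of check can be run for several tools at once. The snippet below is a minimal sketch that only assumes a POSIX shell; `check_tool` is a helper name made up for this example, and the list of tool names is taken from this post and can be edited.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper name, not part of any toolkit.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Tool names used in this post; edit the list to match your setup.
for tool in prefetch fastq-dump fastqc mixcr; do
    check_tool "$tool"
done
```

A tool reported as missing is either not installed or not on the PATH yet.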

HTTP downloads sometimes fail, so FASP (the Aspera transfer protocol) is a better choice for downloading large amounts of data.

Download Aspera Connect
https://downloads.asperasoft.com/connect2/
Choose the Linux installer

Extract the file

$ tar -xvzf ibm-aspera-connect-3.x.x.17xxxx-linux-g2.12-64.tar.gz

Run the installer

$ bash ibm-aspera-connect-3.x.x.17xxxx-linux-g2.12-64.sh

Add ascp to the PATH
Open the .bashrc file with the nano or vim text editor
Add the following line at the end of the file
export PATH=/home/hendra/.aspera/connect/bin:$PATH
Replace hendra with your user name
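
If you prefer not to edit .bashrc by hand, the step above can be sketched as a small shell function. This is only an illustration: `add_to_path_once` is a name I made up, and it assumes that the export line should be added once and never duplicated.

```shell
#!/bin/sh
# Append a PATH export line to an rc file unless the same line is already there.
# add_to_path_once is a hypothetical helper; $1 = directory, $2 = rc file.
add_to_path_once() {
    dir=$1
    rcfile=$2
    line="export PATH=$dir:\$PATH"
    touch "$rcfile"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Example (uncomment to run): the Aspera Connect location used in this post.
# add_to_path_once "$HOME/.aspera/connect/bin" "$HOME/.bashrc"
```

Open a new terminal (or run `. ~/.bashrc`) afterwards so that the change takes effect.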

Downloads from AWS are now available over HTTPS, so using wget is also fast.
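
Because downloads over HTTP(S) can still fail midway, it may help to wrap wget in a retry loop. The function below is a rough sketch, not part of any toolkit: `retry` is a made-up helper, and the attempt count and delay in the example are arbitrary. The URL is left for you to fill in.

```shell
#!/bin/sh
# Run a command up to $1 times, pausing $2 seconds between failed attempts.
# retry is a hypothetical helper name used only for this example.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Example (uncomment and set URL yourself): -c resumes a partial download.
# retry 5 30 wget -c "$URL"
```

The `-c` option makes wget continue a partially downloaded file instead of starting over on each attempt.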

Single-end sequencing result

$ fastq-dump <SRA.sra>

Check the resulting fastq file with FastQC

Paired-end sequencing result

$ fastq-dump --split-files <SRA.sra>
# this produces two files with names like:
# SRA_1.fastq and SRA_2.fastq
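
To make the difference between the two layouts concrete, here is a small sketch that runs fastq-dump and then checks that the expected files exist. The helper names `expected_fastq` and `dump_and_verify` are made up for this example, and it assumes fastq-dump writes to the current directory (its default when no output directory is given).

```shell
#!/bin/sh
# Print the fastq file names expected from fastq-dump for one .sra file.
# $1 = path to the .sra file, $2 = "single" or "paired"
expected_fastq() {
    base=$(basename "$1" .sra)
    case "$2" in
        single) echo "${base}.fastq" ;;
        paired) echo "${base}_1.fastq"; echo "${base}_2.fastq" ;;
        *) echo "layout must be 'single' or 'paired'" >&2; return 1 ;;
    esac
}

# Convert one .sra file and stop if an expected output is missing or empty.
dump_and_verify() {
    sra=$1
    layout=$2
    if [ "$layout" = paired ]; then
        fastq-dump --split-files "$sra" || return 1
    else
        fastq-dump "$sra" || return 1
    fi
    for f in $(expected_fastq "$sra" "$layout"); do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
}

# Example (uncomment to run; the file name is a placeholder):
# dump_and_verify SRR000001.sra paired
```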

FASTQC

(To check fastq data quality)
Still in the terminal:

$ sudo apt-get install fastqc
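
Once FastQC is installed, a whole directory of fastq files can be checked in one go. The loop below is a sketch under my own assumptions: `run_fastqc_all` is a made-up helper, the files are assumed to end in .fastq, and the directory names in the example are placeholders. The `-o` option tells fastqc where to write its reports.

```shell
#!/bin/sh
# Run FastQC on every .fastq file in a directory.
# run_fastqc_all is a hypothetical helper; $1 = input directory, $2 = report directory.
run_fastqc_all() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for fq in "$indir"/*.fastq; do
        [ -e "$fq" ] || continue   # skip when the glob matches nothing
        fastqc -o "$outdir" "$fq"
    done
}

# Example (uncomment to run): check the files in the current directory.
# run_fastqc_all . ./fastqc_reports
```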

EA-UTILS

(To trim data)

$ sudo apt-get install ea-utils

or download the source from GitHub

$ wget https://github.com/ExpressionAnalysis/ea-utils/archive/1.04.807.tar.gz

TCR/BCR sequence extraction

MIXCR

Download MiXCR (as a zip file) from the releases page on GitHub: https://github.com/milaboratory/mixcr/releases

$ wget https://github.com/milaboratory/mixcr/releases/download/vx.x.x/mixcr-x.x.x.zip
$ unzip mixcr-x.x.x.zip

Add mixcr to the PATH
Open .bashrc with the nano or vim text editor
Add the following lines at the end of the file (adjust the version number to the one you downloaded)
PATH=$HOME/mixcr-3.0.10:$PATH
export PATH

To check that each tool works, run:
$ mixcr --help
(note the double hyphen before help)
The same check works for the other tools as well

Alignment and gene expression analysis

HISAT2
