Sequence read archive and other tools
Downloading and analyzing SRA data takes several tools and a fixed sequence of steps. Here I describe the protocol I usually follow.
(update of 21 February 2021)
If you use Linux (Ubuntu)
(WSL is also applicable)
SRA-toolkit
Use Conda
conda install -c bioconda parallel-fastq-dump
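or Python pip (this installs only the parallel-fastq-dump wrapper; the SRA toolkit itself still has to be installed separately)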
pip install parallel-fastq-dump
or Debian package repository
Open terminal
Install SRA toolkit with the command:
$ sudo apt-get install sra-toolkit
Input your password
Write Y when asked
Choose OK if some violet dialog box appears
SRA-toolkit is installed on your computer.
or download from the website
(copy the download URL from https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/)
$ wget <url>
To check whether SRA-toolkit is already installed or not:
$ prefetch --help
HTTP downloads sometimes fail, so FASP (the Aspera transfer protocol) is better for downloading large amounts of data.
Download ASPERA connect
https://downloads.asperasoft.com/connect2/
Choose Linux installation
Extract file
$ tar -xvzf ibm-aspera-connect-3.x.x.17xxxx-linux-g2.12-64.tar.gz
Install file
$ bash ibm-aspera-connect-3.x.x.17xxxx-linux-g2.12-64.sh
add ascp to the path
open .bashrc file using nano or vim text editor
write the following at the end of the text
export PATH=/home/hendra/.aspera/connect/bin:$PATH
replace hendra with your user name
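If you prefer not to edit .bashrc by hand, the step above can be sketched as a small shell function. This is only a sketch: add_to_path is a name made up for illustration, and the Aspera directory in the example comment is the install location assumed above. It appends the export line only when that exact line is not already in the file, so running it twice does no harm.

```shell
# add_to_path DIR [RCFILE]   (hypothetical helper, not part of any tool)
# Appends an 'export PATH="DIR:$PATH"' line to RCFILE (default: ~/.bashrc),
# but only if that exact line is not already present.
add_to_path() {
    atp_dir="$1"
    atp_rc="${2:-$HOME/.bashrc}"
    atp_line="export PATH=\"$atp_dir:\$PATH\""

    # Create the file if it does not exist, so grep has something to read.
    touch "$atp_rc"

    # -F: fixed string, -x: match the whole line, -q: no output.
    if ! grep -Fxq "$atp_line" "$atp_rc"; then
        printf '%s\n' "$atp_line" >> "$atp_rc"
    fi
}

# Example, assuming the Aspera Connect location used above:
# add_to_path "$HOME/.aspera/connect/bin"
```

Open a new terminal (or run source ~/.bashrc) afterwards so that the new PATH takes effect.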
Downloads from AWS are now available over HTTPS, so using wget is also fast.
Single-end sequencing result
$ fastq-dump <SRA.sra>
Check the resulting fastq file with fastqc.
Paired-end sequencing result
$ fastq-dump --split-files <SRA.sra>
# This produces two files with names like SRA_1.fastq and SRA_2.fastq
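After splitting a paired-end run, a quick sanity check is to confirm that both files hold the same number of reads. Below is a minimal sketch, assuming each FASTQ record takes exactly four lines (no wrapped sequence lines). count_reads and check_pairs are hypothetical helper names, not part of SRA-toolkit.

```shell
# count_reads FILE   (hypothetical helper)
# Assumes 4 lines per FASTQ record, so reads = lines / 4.
count_reads() {
    cr_lines=$(wc -l < "$1")
    echo $((cr_lines / 4))
}

# check_pairs PREFIX   (hypothetical helper)
# Compares PREFIX_1.fastq with PREFIX_2.fastq and returns non-zero
# if the two files do not contain the same number of reads.
check_pairs() {
    cp_r1=$(count_reads "${1}_1.fastq")
    cp_r2=$(count_reads "${1}_2.fastq")
    if [ "$cp_r1" -eq "$cp_r2" ]; then
        echo "${1}: OK ($cp_r1 read pairs)"
    else
        echo "${1}: MISMATCH ($cp_r1 vs $cp_r2 reads)"
        return 1
    fi
}

# Example, for files named SRA_1.fastq and SRA_2.fastq:
# check_pairs SRA
```

A mismatch usually means a download or conversion was interrupted, so it is worth fetching the run again before going further.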
FASTQC
(To check fastq data quality)
Still in terminal
$ sudo apt-get install fastqc
EA-UTILS
(To trim data)
$ sudo apt-get install ea-utils
or download the source from
$ wget https://github.com/ExpressionAnalysis/ea-utils/archive/1.04.807.tar.gz
TCR/BCR sequence extraction
MIXCR
Download mixcr (as a zip file) from the releases page on GitHub: https://github.com/milaboratory/mixcr/releases
$ wget https://github.com/milaboratory/mixcr/releases/download/vx.x.x/mixcr-x.x.x.zip
$ unzip mixcr-x.x.x.zip
add mixcr to the path
open .bashrc using nano or vim text editor
write the following at the end of the text (adjust the version number to the one you unpacked)
PATH=$HOME/mixcr-3.0.10:$PATH
export PATH
To check that each program works, run its help command, for example:
$ mixcr --help
(use a double hyphen before help)
This can be applied to other software as well
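To run this kind of check for every tool in one go, something like the following sketch could be used. check_tools is a hypothetical helper name; it only tests whether each command is on the PATH (via the shell built-in command -v) and does not verify that the program itself runs correctly.

```shell
# check_tools NAME...   (hypothetical helper)
# Prints "found" or "missing" for each command name and
# returns non-zero if at least one command is missing.
check_tools() {
    ct_missing=0
    for ct_tool in "$@"; do
        if command -v "$ct_tool" >/dev/null 2>&1; then
            echo "found:   $ct_tool"
        else
            echo "missing: $ct_tool"
            ct_missing=1
        fi
    done
    return "$ct_missing"
}

# Example with the commands used in this post; adjust the list to
# whatever you actually installed:
# check_tools prefetch fastq-dump ascp fastqc mixcr
```

If a tool shows up as missing right after installation, the usual cause is that the PATH change in .bashrc has not been loaded yet in the current terminal.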
Alignment and gene expression analysis
HISAT2