Snakemake pipeline to perform a high-performance batch RNA-seq quantification
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Introduction
The Snakemake pipeline presented here allow to align a batch of RNA-seq paired-end samples accounting for genomic features.
It has been thought to be runned in a High-performance computing environment, but it could be adapted depending on the computer capability.
It allows both GENCODE and RefSeq genome reference annotations.
Sequentially it executes:
-
Cutadapt : To find and removes adapters. It uses the
cutadapt_pe
mode. -
STAR : Spliced and referenced transcripts aligner.
-
Samtools : To interact with the sequencing data.
-
featureCounts : To count reads to genomic features such as genes or exons.
QSplice could also be runned after the pipeline in order to quantify the splice junctions coverage per transcript.
Installation
Run the silent installation of Miniconda in case you don't have this software in your Linux Environment
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
Once you have installed Miniconda/Anaconda, create a Python 3.7 environment. Then, install snakemake in your conda environment:
conda install -c conda-forge mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake
If Slurm has been installed in the computing environment, install with:
sbatch -o log.txt -e err.txt -J smk-install -c 2 --mem=2G -t 200 --wrap "mamba create -c conda-forge -c bioconda -y -n snakemake snakemake"
Usage
To execute the pipeline, first the user must prepare their samples.
git clone git@gitlab.com:fpozoc/appris_rnaseq.git
cd appris_rnaseq
Then, edit config and workflow as needed:
vim config/config.yaml
sample
config.yaml
, in which user must decide:
-
Which is the appropriate reference genome for this analysis. Please, read the Heng Li reference and this Biostar answer before taking a final decision.
-
Which
cutadapt
andSTAR
parameters desire to select -
Which RNA-seq samples user wants to align. In this case we are using E-MTAB-2836 .
annotations:
GRCh38:
g34:
url: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.primary_assembly.annotation.gtf.gz # GENCODE 34 GRCh38 gtf annotation file
enabled: True # Enabled to be runned
rs109:
url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GRCh38_major_release_seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_full_analysis_set.refseq_annotation.gtf.gz
enabled: True
GRCh37:
g19:
url: XXXX # GENCODE 19 GRCh37 gtf annotation file
enabled: False
genomes:
GRCh38:
url: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.28_GRCh38.p13/GRCh38_major_release_seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
GRCh37:
url: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz
params:
cutadapt_pe: "option1"
STAR: "option2"
samples:
E-MTAB-XXXX: # proyect
ERRXXXXX: # sample
"1": # replicate
r1: ERR315325_1.fastq.gz # FastQ read 1
r2: ERR315325_2.fastq.gz # FastQ read 2
Finally, execute workflow, deploy software dependencies via conda:
snakemake -n --use-conda
If Slurm has been installed in the computing environment:
snakemake -n --use-conda --profile slurm
Directory structure
├── config
│ └── config.yaml
├── envs
│ └── environment.yaml
├── log
├── out
├── README.md
├── seq
└── Snakefile
Author information and license
Fernando Pozo ( @fpozoca – fpozoc@cnio.es) and Tomás Di Domenico.
Project initially forked from here .
Release History
- 1.0.0.
Contributing
-
Fork it ( https://gitlab.com/fpozoc/appris_rnaseq.git )
-
Create your feature branch (
git checkout -b feature/fooBar
) -
Commit your changes (
git commit -am 'Add some fooBar'
) -
Push to the branch (
git push origin feature/fooBar
) -
Create a new Pull Request
Code Snippets
36 37 38 | shell:""" curl {params.url} | gunzip > {output} """ |
49 50 51 52 | shell:""" curl {params.url} | gunzip > {output.main} grep -P '\tCDS\t' {output.main} > {output.cds} """ |
63 64 65 66 | shell:""" samtools faidx {input.fa} cut -f1,2 {input.fa}.fai > {output} """ |
79 80 | shell: "cat {input.r} > {output.merged} 2> {log.stderr}" |
104 105 | shell: "cutadapt -j {threads} {config[params][cutadapt_pe]} -o {output.r1} -p {output.r2} {input.r1} {input.r2} > {log.stdout} 2> {log.stderr}" |
150 151 152 153 | shell: """ STAR --sjdbGTFfile {input.annfile} --genomeDir {params.genome_idx} --outFileNamePrefix {params.prefix}/ --readFilesIn {input.r1} {input.r2} --runThreadN {threads} {config[params][STAR]} > {log.stdout} 2> {log.stderr} """ |
168 169 | wrapper: "0.30.0/bio/samtools/sort" |
178 179 | wrapper: "0.30.0/bio/samtools/index" |
195 196 197 | shell:""" featureCounts -O -M --fraction -T {threads} -t CDS -g gene_id -a {input.ann} -o {output} {input.bam} > {log.stdout} 2> {log.stderr} """ |
215 216 217 | shell:""" rsem-prepare-reference --gtf {input.gtf} --num-threads {threads} {input.fasta} {params.genome_dir} > {log.stdout} 2> {log.stderr} """ |
236 237 238 | shell:""" rsem-calculate-expression --num-threads {threads} --bam {input.bam} --no-bam-output --paired-end {params.genome_idx} {params.prefix} > {log.stdout} 2> {log.stderr} """ |
1 2 3 4 5 6 7 8 9 10 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" from snakemake.shell import shell shell("samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]}") |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "koester@jimmy.harvard.edu" __license__ = "MIT" import os from snakemake.shell import shell prefix = os.path.splitext(snakemake.output[0])[0] shell( "samtools sort {snakemake.params} -@ {snakemake.threads} -o {snakemake.output[0]} " "-T {prefix} {snakemake.input[0]}") |
Support
- Future updates
Related Workflows





