miRNA Analysis Pipeline: NGS Data Analysis for miRNA Using Snakemake
miRNA analysis pipeline
This Snakemake-based workflow was designed to enable reproducible analysis of miRNA NGS data.
As the QIAseq miRNA Library Kit (QIAGEN) was used in our lab, the pipeline is configured to handle the UMIs present in the reads, as suggested by the manufacturer. However, a few changes would allow the analysis of data from different kits.
The pipeline can be considered a hybrid solution between common state-of-the-art approaches in the literature, as reported in Potla et al., and the QIAGEN analysis guidelines.
Requirements
To run the miRNA pipeline, Conda must be installed on your computer.
To install Conda, see https://conda.io/miniconda.html
Installation
You can clone the repository directly into your working directory:
git clone https://github.com/solida-core/miRNA.git
Then you need to create the base environment:
cd miRNA
conda create -q -n MYENV_NAME python=3.7
conda env update -q -n MYENV_NAME --file environment.yaml
Once the virtual environment is ready, activate it and check that Snakemake was installed correctly:
conda activate MYENV_NAME
snakemake --version
Run the Pipeline
To run the pipeline, you need to manually edit the config.yaml file, providing the paths to your references:
snakemake --configfile config.yaml --snakefile Snakefile --use-conda -d ANALYSIS_DIR
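For orientation only, a config.yaml might look like the sketch below. The key names here are hypothetical, not the pipeline's actual schema, so always refer to the config.yaml shipped with the repository; the 16/30 read-length limits come from the defaults described under Pipeline Description.

```yaml
# Hypothetical sketch -- check the repository's config.yaml for the real keys.
samples: samples.tsv
references:
    mirbase_hairpin: path/to/bowtie_index/hairpin
    genome: path/to/genome.fa
trimming:
    min_length: 16
    max_length: 30
```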
Generate Snakemake Report
When the analysis is completed, you can generate a Snakemake report with an analysis overview, the QC report and the miRNA counts.
snakemake --configfile config.yaml --snakefile Snakefile --use-conda --report [--report-stylesheet path_to/custom.css] -d ANALYSIS_DIR
This produces the report.html file inside ANALYSIS_DIR.
Pipeline Description
The workflow consists of 6 main steps:
- Get UMI: Qiagen UMIs are integrated in the read sequence, near the 3' adapter. To identify the 12-base UMI sequence we use UMI_tools, which searches for the adapter sequence, discards it and keeps the UMI sequence, including it in the header of the read.
- Trimming: TrimGalore is used for quality trimming and read length selection (by default, the minimum read length is set to 16 and the maximum to 30). These values can be edited in the config.yaml file.
- QC: a QC report is generated with MultiQC, including information from FastQC, TrimGalore and Mirtrace.
- Mapping: reads are aligned against multiple databases. Only reads that do not map to a database undergo alignment against the successive one.
- Deduplication and Count: UMIs are used for the deduplication of mapped reads, and a table with counts for each miRNA is then generated.
- Discovery: WORK IN PROGRESS
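The sequential mapping logic of the Mapping step can be sketched in Python. The `align` predicate and the database names below are toy stand-ins for bowtie and the pipeline's actual reference databases:

```python
def cascade_map(reads, databases, align):
    """Align reads against the databases in order; only reads left
    unmapped by one database are passed on to the next (this mirrors
    feeding bowtie's --un output into the next alignment step)."""
    counts = {db: 0 for db in databases}
    unmapped = list(reads)
    for db in databases:
        leftover = []
        for read in unmapped:
            if align(read, db):
                counts[db] += 1
            else:
                leftover.append(read)
        unmapped = leftover
    return counts, unmapped

# Toy example: a read "maps" to a database if its name starts with the db tag.
reads = ["mirna:TGAGG", "trna:GCCGA", "junk:NNNNN"]
dbs = ["mirna", "trna", "genome"]
counts, unmapped = cascade_map(reads, dbs, lambda r, db: r.startswith(db))
print(counts)    # {'mirna': 1, 'trna': 1, 'genome': 0}
print(unmapped)  # ['junk:NNNNN']
```

Reads surviving the whole cascade unmapped are simply left over, just as the final --un FASTQ collects them in the real workflow.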
Code Snippets
```
shell:
    "umi_tools dedup "
    "-I {input.bam} "
    "-S {output} "
    "--method=unique "
    "--log={log}"
```
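The effect of `--method=unique` can be sketched as follows. This is a deliberate simplification: the real umi_tools operates on BAM records and considers more fields than position and UMI.

```python
def dedup_unique(alignments):
    """Keep one alignment per (chrom, position, UMI) triple, mimicking
    the spirit of `umi_tools dedup --method=unique` (a simplification;
    the real tool works on full BAM records)."""
    seen = set()
    kept = []
    for chrom, pos, umi, read_id in alignments:
        key = (chrom, pos, umi)
        if key not in seen:
            seen.add(key)
            kept.append(read_id)
    return kept

alignments = [
    ("chr1", 100, "AACGTACGTACG", "read1"),
    ("chr1", 100, "AACGTACGTACG", "read2"),  # PCR duplicate of read1
    ("chr1", 100, "TTTTGGGGCCCC", "read3"),  # same position, distinct UMI
]
print(dedup_unique(alignments))  # ['read1', 'read3']
```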
```
shell:
    "samtools idxstats {input.bam} | cut -f 1,3 "
    "> {output.counts}"
```
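samtools idxstats emits four tab-separated columns (reference name, sequence length, mapped reads, unmapped reads), and `cut -f 1,3` keeps just the name and the mapped count. The reference names below are made-up sample data; the same filtering in Python would look like:

```python
def parse_idxstats(text):
    """Replicate `samtools idxstats ... | cut -f 1,3`: map each
    reference name to its mapped-read count."""
    counts = {}
    for line in text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        counts[name] = int(mapped)
    return counts

sample = "hsa-let-7a-1\t80\t1523\t0\nhsa-mir-21\t72\t987\t0\n*\t0\t0\t45"
print(parse_idxstats(sample))
# {'hsa-let-7a-1': 1523, 'hsa-mir-21': 987, '*': 0}
```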
```
shell:
    "htseq-count "
    "{params.params} "
    "-q {input.bam} "
    "{input.gff} "
    "> {output.counts}"
```
```
shell:
    "cp {input} {output}"
```
The following bowtie rule is repeated, with an identical shell body, once for each of the nine databases used in the sequential alignment:

```
shell:
    "bowtie "
    "{params.params} "
    "--threads {threads} "
    "{params.basename} "
    "{input} "
    "--un {output.fastq} "
    "-S {output.sam} "
    ">& {log}"
```
The following FastQC rule appears three times with an identical shell body:

```
shell:
    "fastqc "
    "{input} "
    "--outdir {params.outdir} "
    "--quiet "
    ">& {log}"
```
```
shell:
    "mirtrace "
    "qc "
    "{params.species} "
    "-o {params.outdir} "
    "{input} "
    "{params.params} "
    ">& {log}"
```
```
shell:
    "multiqc "
    "{input} "
    "{params.fastqc} "
    "{params.trimming} "
    "{params.params} "
    "-o {params.outdir} "
    "-n {params.outname} "
    "--sample-names {params.reheader} "
    ">& {log}"
```
The following conversion rule appears twice, once for the {input.first}/{output.first} pair and once for {input.second}/{output.second}:

```
shell:
    "samtools view -b "
    "--threads {threads} "
    "-T {params.genome} "
    "-o {output.first} "
    "-O {params.output_fmt} "
    "{input.first}"
```
The sort rule likewise appears twice, once per input/output pair:

```
shell:
    "samtools sort "
    "--threads {threads} "
    "-T {params.tmp_dir} "
    "-O {params.output_fmt} "
    "--reference {params.genome} "
    "-o {output.first} "
    "{input.first}"
```
```
shell:
    "samtools merge "
    "--threads {threads} "
    "-O {params.output_fmt} "
    "--reference {params.genome} "
    "{output} "
    "{input.first} {input.second}"
```
The indexing rule appears twice with an identical shell body:

```
shell:
    "samtools index "
    "{input}"
```
```
shell:
    "umi_tools extract "
    "--stdin={input} "
    "--stdout={output} "
    "--log={log} "
    "--extract-method=regex "
    "--bc-pattern='.+(?P<discard_1>AACTGTAGGCACCATCAAT)"
    "{{s<=2}}"
    "(?P<umi_1>.{{12}})(?P<discard_2>.*)'"
```
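The pattern above relies on the third-party `regex` module's fuzzy matching ({s<=2} tolerates up to two substitutions in the adapter). A simplified sketch with the standard library's `re` (exact adapter match only, and without moving the UMI into the read header as umi_tools does) shows the intent:

```python
import re

ADAPTER = "AACTGTAGGCACCATCAAT"  # QIAseq 3' adapter from the rule above
# Greedy .+ captures the insert, then the adapter, then 12 UMI bases.
PATTERN = re.compile(r"(?P<insert>.+)" + ADAPTER + r"(?P<umi>.{12})")

def extract_umi(read):
    """Return (insert, umi) if the adapter is found, else None."""
    m = PATTERN.match(read)
    return (m.group("insert"), m.group("umi")) if m else None

read = "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "ACGTACGTACGT" + "GGGAAA"
print(extract_umi(read))  # ('TGAGGTAGTAGGTTGTATAGTT', 'ACGTACGTACGT')
```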
```
shell:
    "cp {input.r1} {output.r1}"
```
```
shell:
    "mkdir -p qc/fastqc; "
    "trim_galore "
    "{params.extra} "
    "--cores {threads} "
    "-o {params.outdir} "
    "{input} "
    ">& {log}"
```
```
shell:
    "mv {input[0]} {output.r1}"
```
Support
- Future updates