miRNA Analysis Pipeline: NGS Data Analysis for miRNA Using Snakemake

miRNA analysis pipeline

This Snakemake-based workflow was designed to enable reproducible analysis of miRNA NGS data.

As the QIAseq miRNA Library Kit (QIAGEN) is used in our lab, the pipeline is configured to handle the UMIs present in the reads, as suggested by the manufacturer. However, a few changes would allow the analysis of data from other kits.

The pipeline can be considered a hybrid solution between common state-of-the-art approaches from the literature, as reported in Potla et al., and the QIAGEN analysis guidelines.

Requirements

To run the miRNA pipeline, Conda must be installed on your computer.
To install Conda, see https://conda.io/miniconda.html
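
For example, a standard Miniconda installation on Linux looks like the following (the installer URL is the generic latest x86_64 build; pick the installer matching your platform):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh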

Installation

You can clone the repository directly into your working directory:

git clone https://github.com/solida-core/miRNA.git

Then create the base environment:

cd miRNA
conda create -q -n MYENV_NAME python=3.7
conda env update -q -n MYENV_NAME --file environment.yaml

Once the virtual environment is ready, activate it and check that Snakemake was installed correctly:

conda activate MYENV_NAME
snakemake --version

Run the Pipeline

To run the pipeline, you first need to edit the config.yaml file manually, providing the paths to your references.
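
The exact keys expected in config.yaml depend on the pipeline's schema; as a purely illustrative sketch, the reference section might look like this (key names and paths below are placeholders, not the pipeline's actual ones):

# hypothetical config.yaml excerpt -- key names are illustrative only
references:
    mirbase_mature: path_to/bowtie_index/mature
    mirbase_hairpin: path_to/bowtie_index/hairpin
    genome: path_to/genome.fa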

snakemake --configfile config.yaml --snakefile Snakefile --use-conda -d ANALYSIS_DIR

Generate Snakemake Report

When the analysis is completed, you can generate a Snakemake report with an analysis overview, the QC report and the miRNA counts.

snakemake --configfile config.yaml --snakefile Snakefile --use-conda --report [--report-stylesheet path_to/custom.css] -d ANALYSIS_DIR

This produces the report.html file inside ANALYSIS_DIR.

Pipeline Description

The workflow consists of 6 main steps:

  1. Get UMI: QIAGEN UMIs are integrated into the read sequence, next to the 3' adapter. To identify the 12-base UMI sequence we use UMI-tools, which searches for the adapter sequence, discards it and keeps the UMI sequence, including it in the read header.

  2. Trimming: Trim Galore is used for quality trimming and read length selection (by default, the minimum read length is set to 16 and the maximum to 30). These values can be edited in the config.yaml file.

  3. QC: a QC report is generated with MultiQC, including information from FastQC, Trim Galore and miRTrace.

  4. Mapping: reads are aligned against multiple databases in sequence. Only reads that do not map to a database undergo alignment against the next one (see the sketch after this list).

  5. Deduplication and Count: UMIs are used to deduplicate the mapped reads, and a table with counts for each miRNA is then generated.

  6. Discovery: WORK IN PROGRESS
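
The sequential mapping in step 4 relies on bowtie's --un option, which writes reads that failed to align to a FASTQ file; that file then becomes the input for the next alignment. Below is a minimal shell sketch of the idea, with hypothetical index and file names (the real pipeline drives this through the Snakemake rules shown in the next section):

# align against the first database; reads that fail to map go to unmapped_1.fastq
bowtie --threads 4 first_index reads.fastq --un unmapped_1.fastq -S first.sam
# only the leftover reads are aligned against the next database
bowtie --threads 4 second_index unmapped_1.fastq --un unmapped_2.fastq -S second.sam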

Code Snippets


Deduplication of mapped reads with UMI-tools:
shell:
    "umi_tools dedup "
    "-I {input.bam} "
    "-S {output} "
    "--method=unique "
    "--log={log} "

Per-reference read counts with samtools idxstats:
shell:
    "samtools idxstats {input.bam} | cut -f 1,3 "
    "> {output.counts} "

miRNA counting with htseq-count:
shell:
    "htseq-count "
    "{params.params} "
    "-q {input.bam} "
    "{input.gff} "
    "> {output.counts}"

Copying the input file:
shell:
    "cp {input} {output}"

Alignment with bowtie; reads that fail to map are written via --un to a FASTQ file used as input for the next database:
shell:
    "bowtie "
    "{params.params} "
    "--threads {threads} "
    "{params.basename} "
    "{input} "
    "--un {output.fastq} "
    "-S {output.sam} "
    ">& {log}"

(The same bowtie invocation is repeated, with a different index basename, for each of the remaining databases in the sequential mapping.)

Quality control with FastQC:
shell:
    "fastqc "
    "{input} "
    "--outdir {params.outdir} "
    "--quiet "
    ">& {log}"

(The same FastQC invocation is repeated at the other stages of the workflow.)

QC with miRTrace (from rules/qc.smk):
shell:
    "mirtrace "
    "qc "
    "{params.species} "
    "-o {params.outdir} "
    "{input} "
    "{params.params} "
    ">& {log} "

Aggregate QC report with MultiQC:
shell:
    "multiqc "
    "{input} "
    "{params.fastqc} "
    "{params.trimming} "
    "{params.params} "
    "-o {params.outdir} "
    "-n {params.outname} "
    "--sample-names {params.reheader} "
    ">& {log}"

SAM to BAM conversion with samtools view:
shell:
    "samtools view -b "
    "--threads {threads} "
    "-T {params.genome} "
    "-o {output.first} "
    "-O {params.output_fmt} "
    "{input.first} "

The same conversion for the second alignment:
shell:
    "samtools view -b "
    "--threads {threads} "
    "-T {params.genome} "
    "-o {output.second} "
    "-O {params.output_fmt} "
    "{input.second} "

Coordinate sorting with samtools sort:
shell:
    "samtools sort "
    "--threads {threads} "
    "-T {params.tmp_dir} "
    "-O {params.output_fmt} "
    "--reference {params.genome} "
    "-o {output.first} "
    "{input.first} "

The same sort for the second alignment:
shell:
    "samtools sort "
    "--threads {threads} "
    "-T {params.tmp_dir} "
    "-O {params.output_fmt} "
    "--reference {params.genome} "
    "-o {output.second} "
    "{input.second} "

Merging the two sorted BAM files:
shell:
    "samtools merge "
    "--threads {threads} "
    "-O {params.output_fmt} "
    "--reference {params.genome} "
    "{output} "
    "{input.first} {input.second} "

Indexing the BAM file:
shell:
    "samtools index "
    "{input}"

UMI extraction: the regex matches the QIAseq adapter with up to two substitutions ({s<=2}), discards it, and captures the following 12 bases as the UMI:
shell:
    "umi_tools extract "
    "--stdin={input} "
    "--stdout={output} "
    "--log={log} "
    "--extract-method=regex "
    "--bc-pattern='.+(?P<discard_1>AACTGTAGGCACCATCAAT)"
    "{{s<=2}}"
    "(?P<umi_1>.{{12}})(?P<discard_2>.*)'"

Copying the read file:
shell:
    "cp {input.r1} {output.r1}"

Quality and adapter trimming with Trim Galore:
shell:
    "mkdir -p qc/fastqc; "
    "trim_galore "
    "{params.extra} "
    "--cores {threads} "
    "-o {params.outdir} "
    "{input} "
    ">& {log}"

Renaming the Trim Galore output:
shell:
    "mv {input[0]} {output.r1} "

URL: https://github.com/solida-core/miRNA
Name: mirna
Version: 2
License: GNU General Public License v3.0