RNA-Seq Data Analysis Workflow using Snakemake and Salmon for Gene Expression Quantification and Differential Expression Analysis

public public 1yr ago 0 bookmarks

A simple Snakemake workflow for processing RNA-Seq using snakemake in conjunction with salmon . Data is assumed to be paired-end , and a two-treatment design is assumed, although multiple paired comparisons are also possible The steps implemented are:

  1. Prepare the reference

    • Download the fa and gtf files for the relevant build

    • Prepare the set of decoy transcripts

    • Index the reference using the set of decoy transcripts

  2. Pre-process raw fq files

    • Run QC and prepare a report on the raw fq files ( FastQC / ngsReports )

    • Identify adapters ( bbtools )

    • Trim samples ( AdapterRemoval )

  3. Pre-process raw fq files trimmed samples

    • Run QC and prepare a report on the raw fq files ( FastQC / ngsReports )
  4. Align and quantify ( salmon quant )

  5. Gene Level Differential Expression Analysis

    • Normalise using cqn

    • Compare groups using quasi-likelihood approaches and glmTreat

    • Perform an alternative analysis using limma-voom

Required Setup

Prior to executing the workflow, please ensure the following steps have been performed

  1. Place unprocessed fastq (or fastq.gz ) files in data/raw/fastq . This path is hard-wired into the workflow and cannot be changed. If samples have been run across multiple lanes, please merge prior to running this workflow as all samples are expected to be in a single pair of files.

  2. Prepare a tab-separated file, usually samples.tsv and place this in the config directory. The column name sample is mandatory, however any other columns required in the analysis can be specified in config.yml . When entering filenames in the sample column, you can leave off any suffixes (e.g. fastq.gz) as these are specified in config.yml . File names are expected to finish in _R[12].fastq.gz and only the section preceding this should be used for the samples column

  3. Edit config.yml as required. Fields cannot be changed

Snakemake implementation

The basic workflow is written snakemake and can be called using the following steps.

Firstly, set-up the required conda environments

snakemake \	--use-conda \	--conda-prefix '/home/steveped/mambaforge/envs/' \	--conda-create-envs-only \	--cores 1

Secondly, create and inspect the rulegraph

snakemake --rulegraph > workflow/rules/rulegraph.dot
dot -Tpdf workflow/rules/rulegraph.dot > workflow/rules/rulegraph.pdf

Finally, the workflow itself can be run using:

snakemake \	-p \	--use-conda \	--conda-prefix '/home/steveped/mambaforge/envs/' \	--notemp \	--keep-going \	--cores 16
  • All data (salmon quants, trimmed fastq etc) will be written to the data directory. Trimmed fastq and salmon quant aux_info files are marked as temporary and can be safely deleted after completion of the analysis

  • RMarkdown files will be added to the anaylsis directory using modules provided in the workflow

  • Compiled html files will be written to the docs folder, along with any topTable files

Note that this creates common environments able to be called by other workflows and is dependent on the user. For me, my global conda environments are stored in /home/steveped/mambaforge/envs/ . For other users, this path will need to be modified.

Code Snippets

11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
shell:
    """
    Rscript --vanilla {input} &>> {log}
    if [[ {params.git} == "True" ]]; then
            TRIES={params.tries}
            while [[ -f .git/index.lock ]]
            do
                if [[ "$TRIES" == 0 ]]; then
                    echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                    exit 1
                fi
                sleep {params.interval}
                ((TRIES--))
            done
            git add {output}
    fi
    """
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
shell:
    """
    ## Make sure existing files are not overwritten,
    ## unless explicitly requested
    if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then
        cp {input.rmd} {output.rmd}
    fi

    R -e "rmarkdown::render_site('{output.rmd}')" &>> {log}
    if [[ {params.git} == "True" ]]; then
            TRIES={params.tries}
            while [[ -f .git/index.lock ]]
            do
                if [[ "$TRIES" == 0 ]]; then
                    echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                    exit 1
                fi
                sleep {params.interval}
                ((TRIES--))
            done
            git add {output}
    fi
    """
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
shell:
    """
    ## Make sure existing files are not overwritten,
    ## unless explicitly requested
    if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then
        cp {input.rmd} {output.rmd}
    fi

    R -e "rmarkdown::render_site('{output.rmd}')" &>> {log}
    if [[ {params.git} == "True" ]]; then
            TRIES={params.tries}
            while [[ -f .git/index.lock ]]
            do
                if [[ "$TRIES" == 0 ]]; then
                    echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                    exit 1
                fi
                sleep {params.interval}
                ((TRIES--))
            done
            git add {output}
    fi
    """
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
shell:
    """
    ## Make sure existing files are not overwritten,
    ## unless explicitly requested
    if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then
        cp {input.rmd} {output.rmd}
    fi

    R -e "rmarkdown::render_site('{output.rmd}')" &>> {log}
    if [[ {params.git} == "True" ]]; then
            TRIES={params.tries}
            while [[ -f .git/index.lock ]]
            do
                if [[ "$TRIES" == 0 ]]; then
                    echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                    exit 1
                fi
                sleep {params.interval}
                ((TRIES--))
            done
            git add {output}
        fi
    """
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
shell:
    """
    ## Make sure existing files are not overwritten,
    ## unless explicitly requested
    if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then
        cp {input.rmd} {output.rmd}
    fi

    R -e "rmarkdown::render_site('{output.rmd}')" &>> {log}
    if [[ {params.git} == "True" ]]; then
        TRIES={params.tries}
        while [[ -f .git/index.lock ]]
        do
            if [[ "$TRIES" == 0 ]]; then
                echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                exit 1
            fi
            sleep {params.interval}
            ((TRIES--))
        done
        git add {output}
    fi
    """
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
shell:
    """
    ## Make sure existing files are not overwritten,
    ## unless explicitly requested
    if [[ ! -f {output.rmd} || {params.overwrite} == 'True' ]]; then
        Rscript --vanilla \
            {input.script} \
            {params.comp} \
            {output.rmd} &>>{log}
    fi

    if [[ {params.git} == "True" ]]; then
            TRIES={params.tries}
            while [[ -f .git/index.lock ]]
            do
                if [[ "$TRIES" == 0 ]]; then
                    echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                    exit 1
                fi
                sleep {params.interval}
                ((TRIES--))
            done
            git add {output}
    fi
    """
SnakeMake From line 17 of rules/dge.smk
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
shell:
    """
    R -e "rmarkdown::render_site('{input.rmd}')" &>> {log}
    if [[ {params.git} == "True" ]]; then
            TRIES={params.tries}
            while [[ -f .git/index.lock ]]
            do
                if [[ "$TRIES" == 0 ]]; then
                    echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                    exit 1
                fi
                sleep {params.interval}
                ((TRIES--))
            done
            git add {output}
    fi
    """
SnakeMake From line 65 of rules/dge.smk
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
shell:
    """
    # Write to a separate temp directory for each run to avoid I/O clashes
    TEMPDIR=$(mktemp -d -t fqcXXXXXXXXXX)
    fastqc \
      {params.extra} \
      -t {threads} \
      --outdir $TEMPDIR \
      {input.fq} &> {log}
    # Move the files
    mv $TEMPDIR/*html $(dirname {output.html})
    mv $TEMPDIR/*zip $(dirname {output.zip})
    # Clean up the temp directory
    rm -rf $TEMPDIR

    ## Update the git repo if one is active
    if [[ {params.git} == "True" ]]; then
        TRIES={params.tries}
        while [[ -f .git/index.lock ]]
        do
            if [[ "$TRIES" == 0 ]]; then
                echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                exit 1
            fi
            sleep {params.interval}
            ((TRIES--))
        done
        git add {output}
    fi
    """
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
    shell:
        """
        bbmerge.sh \
            threads={threads} \
            in1={input.r1} \
            in2={input.r2} \
            outa={output.fa} 2> {log}

        if [[ {params.git} == "True" ]]; then
			TRIES={params.tries}
			while [[ -f .git/index.lock ]]
			do
				if [[ "$TRIES" == 0 ]]; then
					echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
					exit 1
				fi
				sleep {params.interval}
				((TRIES--))
			done
			git add {output}
		fi
        """
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
    shell:
        """
        AdapterRemoval \
            --adapter1 {params.adapter1} \
            --adapter2 {params.adapter2} \
            --file1 {input.r1} \
            --file2 {input.r2} \
            {params.extra} \
            --threads {threads} \
            --maxns {params.maxns} \
            --minquality {params.minqual} \
            --minlength {params.minlength} \
            --output1 {output.r1} \
            --output2 {output.r2} \
            --discarded /dev/null \
            --singleton /dev/null \
            --settings {output.log} &> {log}

        if [[ {params.git} == "True" ]]; then
			TRIES={params.tries}
			while [[ -f .git/index.lock ]]
			do
				if [[ "$TRIES" == 0 ]]; then
					echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
					exit 1
				fi
				sleep {params.interval}
				((TRIES--))
			done
			git add {output.log}
		fi
        """
44
45
46
47
48
shell:
	"""
	## Download and compress to bgzip on the fly
	curl {params.url} 2> {log} | zcat | bgzip -c > {output}
	"""
58
59
60
61
62
63
64
shell:
	"""
	samtools faidx \
	  --gzi-idx {output.gzi} \
	  --fai-idx {output.fai} \
	  {input}
	"""
76
77
78
79
80
81
shell:
	"""
	samtools faidx {input.fa} \
		1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 MT X Y | \
		gzip > {output}
	"""
90
91
92
93
shell:
	"""
	curl {params.url} 2> {log} | zcat | bgzip -c > {output}
	"""
103
104
105
106
107
108
109
shell:
	"""
	samtools faidx \
	  --gzi-idx {output.gzi} \
	  --fai-idx {output.fai} \
	  {input}
	"""
124
125
126
127
128
129
130
131
132
133
134
shell:
	"""
	zcat {input.fa} | \
		egrep "{params.regexp}" | \
		cut -f1 -d\ | \
		sed -r 's/^>//g' > {output.ids}
	samtools faidx \
	  -r {output.ids} \
	  {input.fa} | \
	  bgzip -c > {output.fa}
	"""
143
144
145
146
shell:
	"""
	curl -o {output} {params.url} 2> {log}
	"""
152
153
154
155
156
shell:
	"""
	grep "^>" <(gunzip -c {input}) | cut -d " " -f 1 > {output}
	sed -i.bak -e 's/>//g' {output}
	"""
164
165
166
167
shell:
	"""
	cat {input.trans_fa} {input.ref_fa} > {output}
	"""
177
178
179
180
181
182
183
184
shell:
	"""
	salmon index \
		-t {input.gentrome} \
		-d {input.decoys} \
		-p {threads} \
		-i {output} 2> {log}
	"""
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
shell:
    """
    salmon quant \
        -i {input.index} \
        -l {params.lib} \
        -p {threads} \
        --numBootstraps {params.nBoot} \
        --numGibbsSamples {params.nGibbs} \
        --thinningFactor {params.thin} \
        {params.extra} \
        -o {params.dir} \
        -1 {input.r1} \
        -2 {input.r2} &>> {log}

    if [[ {params.git} == "True" ]]; then
        TRIES={params.tries}
        while [[ -f .git/index.lock ]]
        do
            if [[ "$TRIES" == 0 ]]; then
                echo "ERROR: Timeout while waiting for removal of git index.lock" &>> {log}
                exit 1
            fi
            sleep {params.interval}
            ((TRIES--))
        done
        git add {output.quant} {output.json} {output.meta}
    fi
    """
ShowHide 16 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/steveped/snakemake-salmon
Name: snakemake-salmon
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...