RNA-seq Preprocessing Pipeline with Snakemake for Differential Expression Analysis


Snakemake_RNAseq_Preprocess_pipeline

A basic Snakemake workflow for preprocessing bulk RNA-seq data, from FASTQ files to a counts table that can be used as input for differential expression analysis. The workflow only requires Snakemake and conda to be installed; the remaining dependencies are installed automatically when Snakemake runs. To use it, clone the repository and make the necessary changes to the config file; the Snakefile itself does not need to be modified.
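As a rough illustration of the "clone, edit the config, run" usage described above, the sketch below assembles a typical Snakemake invocation in Python. `--use-conda`, `--cores`, and `--configfile` are standard Snakemake options, but the core count and the file name `config.yaml` are assumptions made for this example; check the repository for the actual config file name and whether it has to be passed explicitly.

```python
import shlex


def build_snakemake_command(cores=4, configfile=None):
    """Return a Snakemake command line as a list of arguments.

    --use-conda lets Snakemake create the per-rule conda environments,
    which is how the tool dependencies get installed automatically.
    """
    cmd = ["snakemake", "--use-conda", "--cores", str(cores)]
    if configfile is not None:
        # Only needed if the Snakefile does not already declare its config file.
        cmd += ["--configfile", configfile]
    return cmd


if __name__ == "__main__":
    # "config.yaml" is a hypothetical file name used for illustration.
    print(shlex.join(build_snakemake_command(cores=4, configfile="config.yaml")))
```

The printed command would be run from the root of the cloned repository, after the config file has been edited.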

[Figure: rule graph of the workflow]

Tools:

fastqc: FastQC

Trim_adapters: fastp

Alignment: STAR

Quantification: Subread/featureCounts

Quality_control: MultiQC

Code Snippets

shell:
    """
    mkdir -p {params.outfolder}
    fastqc {input.fastq1} {input.fastq2} -o {params.outfolder}
    """

shell:
    """
    fastp -i {input.fastq1} -I {input.fastq2} -o {output.trimmed_fastq_R1} -O {output.trimmed_fastq_R2} -h trimmed_fastq/{wildcards.fastq}_fastp.html -j trimmed_fastq/{wildcards.fastq}_fastp.json
    """

shell:
    """
    STAR --runMode alignReads \
    --genomeDir {params.genome_dir} \
    --readFilesIn {input.trimmed_fastq_R1} {input.trimmed_fastq_R2} \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix {params.prefix}
    """
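STAR builds its output file names by appending fixed suffixes to `--outFileNamePrefix`, which is why the count-table clean-up step strips `_Aligned.sortedByCoord.out.bam` and `bam/` from the column names. The sketch below shows that naming scheme. The prefix `bam/sampleA_` is an assumption inferred from those clean-up commands, and `sampleA` is a made-up sample name.

```python
def star_outputs(prefix):
    """List the main files STAR writes for a given --outFileNamePrefix.

    The suffixes are fixed by STAR; the BAM suffix shown here applies when
    --outSAMtype BAM SortedByCoordinate is used, as in the rule above.
    """
    return {
        "bam": f"{prefix}Aligned.sortedByCoord.out.bam",
        "final_log": f"{prefix}Log.final.out",
        "splice_junctions": f"{prefix}SJ.out.tab",
    }


# Hypothetical sample; the "bam/<sample>_" prefix is inferred, not confirmed.
outputs = star_outputs("bam/sampleA_")
print(outputs["bam"])
```

Because STAR does not insert a separator, the prefix has to end with `_` (or `/`) for the sample name to stay readable in the file name.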

shell:
    """
    featureCounts -a {params.GTF} \
    -o {output.counts} \
    {input.bam_files}
    """

shell:
    """
    # removing 2nd column to 6th column
    awk '{{$2=$3=$4=$5=$6=""; print $0}}' {input.counts} > {output.cleaned_counts}
    # removing the first line
    sed -i '1d' {output.cleaned_counts}
    # replacing the alignedbam in column name
    sed -i 's/_Aligned.sortedByCoord.out.bam//g' {output.cleaned_counts}
    # replacing bam/ in the column name
    sed -i 's+bam/++g' {output.cleaned_counts}
    """
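To make the effect of the awk and sed commands above easier to follow, here is a minimal Python sketch of the same clean-up applied to a tiny made-up featureCounts table. It is not part of the workflow. It also differs from the shell version in two ways: it writes tab-separated output (the awk command joins fields with spaces) and it skips any line starting with `#` instead of unconditionally deleting the first line.

```python
def clean_counts(lines):
    """Reduce featureCounts output to gene IDs plus one count column per sample."""
    cleaned = []
    for line in lines:
        if line.startswith("#"):
            # Skip the leading comment line that records the featureCounts command.
            continue
        fields = line.rstrip("\n").split("\t")
        # Drop columns 2-6 (Chr, Start, End, Strand, Length).
        kept = [fields[0]] + fields[6:]
        # Like the sed commands, strip the STAR suffix and the "bam/" directory
        # from every field, which turns the BAM paths in the header into sample names.
        kept = [
            f.replace("_Aligned.sortedByCoord.out.bam", "").replace("bam/", "")
            for f in kept
        ]
        cleaned.append("\t".join(kept))
    return cleaned


# Made-up example input in the featureCounts layout.
example = [
    "# Program:featureCounts; Command:...\n",
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\t"
    "bam/sampleA_Aligned.sortedByCoord.out.bam\t"
    "bam/sampleB_Aligned.sortedByCoord.out.bam\n",
    "gene1\tchr1\t100\t500\t+\t401\t12\t7\n",
]
for row in clean_counts(example):
    print(row)
```

The result is a header row of sample names followed by one row of raw counts per gene, which is the shape most differential expression tools expect as input.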

shell:
    """
    multiqc \
        --dirs {params.fastqc_dir} {params.fastp_dir} {params.bam_dir} {params.subreads_dir}
    """

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/ChandanPavuluri/Snakemake_RNAseq_Preprocess_pipeline
Name: snakemake_rnaseq_preprocess_pipeline
Version: 1
Copyright: Public Domain
License: None