rMATs and IsoformSwitchAnalyeR workflow for multi-group multi-contrasts scenarios

public public 1yr ago Version: v0.1.1 0 bookmarks

ASE (alternate splicing events) are identified and quantified using the ASCENT ( A lternate S pli C ing E ve N t T ools) pipeline. This workflow can be used for multi-group multi-contrasts scenarios.

ASCENT currently runs:

NOTE : More ASE tools will be added to ASCENT in the future as and when needed. Please contact Vishal Koparde to make requests to add specific ASE tool(s).

Usage:

 % bash /data/Ziegelbauer_lab/Pipelines/CCBR_ASE/v0.1.1/run
Pipeline Dir: /gpfs/gsfs12/users/Ziegelbauer_lab/Pipelines/CCBR_ASE/v0.1.1
Git Commit/Tag: a41b33007b6063b1d95744808f527a126fb1f0dc	v0.1.1
/gpfs/gsfs12/users/Ziegelbauer_lab/Pipelines/CCBR_ASE/v0.1.1/run
--> run CCBR Alternate Splicing Events Pipeline
USAGE:
 bash /data/Ziegelbauer_lab/Pipelines/CCBR_ASE/v0.1.1/run -m/--runmode=<RUNMODE> -w/--workdir=<WORKDIR>
Required Arguments:
1. RUNMODE: [Type: String] Valid options:
 *) init : initialize workdir
 *) run : run with slurm
 *) reset : DELETE workdir dir and re-init it
 *) dryrun : dry run snakemake to generate DAG
 *) unlock : unlock workdir if locked by snakemake
 *) runlocal : run without submitting to sbatch
2. WORKDIR: [Type: String]: Absolute or relative path to the output folder with write permissions.

DISCLAIMER :

IsoformSwitchAnalyzeR fails if replicates are absent.

References:

Code Snippets

22
23
24
25
26
27
28
29
30
31
32
33
34
    shell:"""
set -exuf -o pipefail
if [ ! -d {params.parentdir} ];then
    mkdir -p {params.parentdir}
fi
for i in {input}; do
    bn=$(basename $i)
    sn=$(echo $bn|awk -F".RSEM" '{{print $1}}')
    mkdir -p {params.parentdir}/${{sn}}
    ln $i {params.parentdir}/${{sn}}/${{bn}}
done
touch {output.dummy}
"""
SnakeMake From line 22 of rules/isa.smk
50
51
52
53
54
55
56
57
58
59
60
61
    shell:"""
set -exuf -o pipefail
c1=$(echo "{params.contrast}"|awk -F"_vs_" '{{print $1}}')
c2=$(echo "{params.contrast}"|awk -F"_vs_" '{{print $2}}')
Rscript {params.rscript} \
 -p {params.parentdir} \
 -g {params.gtf} \
 -s {params.samplestsv} \
 -c $c1 \
 -d $c2 \
 -o {params.outdir}
"""
SnakeMake From line 50 of rules/isa.smk
74
75
76
77
    shell:"""
set -exuf -o pipefail
Rscript {params.rscript} -d {params.resultsdir}
"""
SnakeMake From line 74 of rules/isa.smk
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
    shell:"""
set -euf -o pipefail
outdir=$(dirname {output.summary})
cp "{input.b1}" $outdir/
cp "{input.b2}" $outdir/
python ${{RMATS_SRC}}/rmats.py \
    --b1 "{input.b1}" \
    --b2 "{input.b2}" \
    --od "$outdir" \
    --gtf "{params.gtf}" \
    --readLength {params.rl} \
    --bi "{params.starindexdir}" \
    --nthread {threads} \
    -t "paired" \
    --novelSS \
    --libType "fr-secondstrand" \
    --tmp /lscratch/${{SLURM_JOB_ID}}/
"""
60
61
62
63
    shell:"""
set -exuf -o pipefail
Rscript {params.rscript} -d {params.resultsdir}
"""
11
12
13
14
15
16
    shell:"""
set -euf -o pipefail
cd {params.rsemindexdir}
rsem-prepare-reference -p {threads} --gtf {input.gtf} {input.reffa} ref
rsem-generate-ngvector ref.transcripts.fa ref.transcripts
"""
27
28
29
30
31
32
33
34
35
    shell:"""
set -euf -o pipefail
cd {params.rsemindexdir}
genesgtf={input.gtf}
bn=$(basename $genesgtf)
gtfToGenePred -genePredExt -geneNameAsName2 $genesgtf ${{bn}}.tmp
awk -v OFS="\\t" '{{print $2,$4,$5,$1,"0",$3,$6,$7,"0",$8,$9,$10}}' ${{bn}}.tmp > {output.bed12}
rm -f ${{bn}}.tmp
"""
13
14
15
16
17
18
19
20
21
22
    shell:"""
set -euf -o pipefail
mkdir -p {params.stardir}
STAR \
    --runThreadN {threads} \
    --runMode genomeGenerate \
    --genomeDir {params.stardir} \
    --genomeFastaFiles {input.reffa} \
    --outTmpDir /lscratch/${{SLURM_JOB_ID}}/tmp_{params.rl}
"""
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
    shell:"""
set -euf -o pipefail
overhang=$(echo {params.rl}|awk '{{print $1-1}}')
cd {params.bamdir}
STAR \
    --chimSegmentMin 2 \
    --outFilterMismatchNmax 3 \
    --alignEndsType EndToEnd \
    --runThreadN {threads} \
    --outSAMstrandField intronMotif \
    --outSAMtype BAM SortedByCoordinate \
    --alignSJDBoverhangMin 6 \
    --alignIntronMax 299999 \
    --genomeDir {params.starindexdir} \
    --sjdbGTFfile {params.gtf} \
    --outFileNamePrefix "{params.sample}." \
    --readFilesIn {input.R1} {input.R2} \
    --readFilesCommand zcat \
    --outTmpDir=/lscratch/${{SLURM_JOB_ID}}/{params.sample} \
    --sjdbOverhang $overhang \
    --quantMode GeneCounts TranscriptomeSAM
"""
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
    shell:"""
set -euf -o pipefail
cd {params.rsemdir}
# inter strandedness
samtools index -@{threads} {input.bam}
infer_experiment.py -r {input.bed12} -i {input.bam} -s 1000000 > {output.strandinfo}
# Get strandedness to calculate Forward Probability
fp=$(tail -n1 {output.strandinfo} | awk '{{if($NF > 0.75) print "0.0"; else if ($NF<0.25) print "1.0"; else print "0.5";}}')
echo "Forward Probability Passed to RSEM: $fp"
rsemindex=$(echo {input.bed12}|sed "s@.bed12@@g")
rsem-calculate-expression --no-bam-output --calc-ci --seed 12345  \
        --bam --paired-end -p {threads}  {input.tbam} $rsemindex {params.sample} --time \
        --temporary-folder /lscratch/$SLURM_JOBID --keep-intermediate-files --forward-prob=${{fp}} --estimate-rspd
mv {params.sample}.genes.results {output.gcounts}
mv {params.sample}.isoforms.results {output.tcounts}
"""    
ShowHide 5 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/CCBR/CCBR_rMATs
Name: ccbr_rmats
Version: v0.1.1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: MIT License
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...