Pipeline for RNA-seq analysis


This is a simple utility to analyse RNA-seq data with a selection of popular tools. It is written in snakemake, which allows the user to scale it to any system, including clusters and cloud computing architectures. Currently, it supports analyses with the following tools:

  • prefetch and fasterq-dump (downloading runs and converting them to fastq)

  • fastqc and multiqc (quality control for reads)

  • sortmerna (rRNA filtering)

  • star (genome indexing and mapping)

  • salmon and kallisto (transcript indexing and mapping)

  • stringtie, Rsubread and tximport (quantification and count matrices)

  • DESeq2 (differential expression analysis)

  • trinity and bowtie2 (de-novo assembly and alignment)

In short, given a public run ID from the SRA archive (SRRXXXXXXX), the pipeline can download the raw data from the NCBI server, convert the files to fastq format, filter the reads, generate indices, map reads to the genome/transcriptome, and quantify the mapped reads. In addition, it can also carry out differential expression analysis.

Alternatively, the user can provide their own fastq or SRA files for the analysis. To modify the default options, refer to the appropriate rules in the respective rules/filename.smk files. Please read the wiki for more details.


Dependencies

  • python 3+ (tested on Python 3.7 and 3.8)

  • snakemake (workflow management)

  • pandas

  • R 4+ (for DESeq2 analysis; tested on R 4.0.2)

  • tximport, DESeq2, AnnotationDbi, Rsubread (for R-related analyses)

All the above-mentioned dependencies have been successfully tested on Ubuntu 18.04.5, Gentoo 5.4.48 and Fedora 32.
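As a sketch, assuming a working R 4+ installation, the R dependencies can be installed from Bioconductor like this:

install.packages("BiocManager")
BiocManager::install(c("tximport", "DESeq2", "AnnotationDbi", "Rsubread"))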

Installation

For installation, it is a good idea to make a new python virtual environment, install snakemake in it, and then download all the tool binaries to your desired location. A minimal environment setup might look like this (the environment name is illustrative):
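python3 -m venv cardio-env
source cardio-env/bin/activate
pip install snakemake pandas

Following this, retrieve the repository using git: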

git clone https://github.com/rohitsuratekar/CardioPipeLine.git

Alternatively, download the repository as an archive and extract it in the desired location.

The only other modification that needs to be made is to edit the config/config.yaml and config/samples.csv files to specify the location of files on your system, and then run the following in the repository folder:

snakemake --cores all

The above command will essentially carry out all the steps for you, from downloading the files to quantification, while you sit back and relax :)
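If you first want to see which steps snakemake would execute without actually running them, you can do a dry run:

snakemake --cores all -n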

Tasks

This pipeline carries out the tasks listed below. However, you can always select which tasks you want :)

Task No  Details                    Tool Used     Input #                                       Output #, *
0        All tasks                  --            --                                            --
1        Download .sra file         prefetch      samples.csv                                   srr.sra
2        Convert to fastq           fasterq-dump  srr.sra                                       srr.fastq
3        rRNA filtering             sortmerna     srr.fastq                                     srr.filtered.fastq
4        STAR indexing              star          genome.fa, anno.gtf                           star-idx
5        STAR mapping               star          star-idx, srr.filtered.fastq / srr.fastq      srr.bam
6        Quantification             stringtie     srr.bam, anno.gtf                             st.tsv
7        Salmon indexing            salmon        anno.gtf                                      salmon-idx
8        Salmon mapping             salmon        salmon-idx, srr.filtered.fastq / srr.fastq    quants.sf
9        Kallisto indexing          kallisto      anno.gtf                                      kallisto-idx
10       Kallisto mapping           kallisto      kallisto-idx, srr.filtered.fastq / srr.fastq  abundance.tsv
11       Count Matrix (StringTie)   prepDE.py     st.tsv                                        stringtie.counts
12       Count Matrix (STAR)        Rsubread      srr.bam                                       star.counts
13       Count Matrix (Salmon)      tximport      quants.sf                                     salmon.counts
14       Count Matrix (Kallisto)    tximport      abundance.tsv                                 kallisto.counts
15       Differential Analysis      DESeq2        *.counts                                      con1_vs_con2.csv
16       Quality Control Report     fastqc        --                                            srr.html
17       Quality Control Report     multiqc       --                                            quality_report.html
18       de-novo Assembly           trinity       srr.fastq                                     trinity.fasta
19       de-novo Alignment          bowtie2       trinity.fasta                                 stat.txt

* There might be many other related outputs. # Names are for representative purposes only; actual names will differ.

Selection of tasks

You have full control over which tasks to run. You can run the desired tasks by editing the tasks parameter in the config.yaml file. If you want to perform a specific task, you need to provide its input files in the appropriate location.

But do not worry: thanks to snakemake, if you did not provide the appropriate input files, it will search for the task which outputs these files and run it for you automatically.

By default, tasks: 0 is set. This will run all available tasks.
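For example, to run only rRNA filtering, STAR mapping and quantification (tasks 3, 5 and 6 from the table above), the entry could look like the sketch below; check the comments in config/config.yaml for the exact format expected:

tasks: [3, 5, 6]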

Setting up the workflow

Editing config.yaml and samples.csv is probably the most important aspect of this pipeline. These files will change for each user according to their system and work.

config.yaml

If you are fine with the default options of the tools used in this workflow, you can just edit this file and run the pipeline with snakemake. Just read the comments in the config/config.yaml file and you will know what to do :) The comments are self-explanatory.

The config file has a parameter called base which provides the base folder for all the analysis. Every output file created with this pipeline can be found in this base folder. The overall output file structure is shown below; a more detailed file structure can be found in the wiki.

base
├── sra
├── fastq
├── filtered
├── bams
├── index
├── mappings
├── deseq2
└── logs
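For example, the corresponding entry in config.yaml could look like this (the path itself is only illustrative):

base: "/home/user/rnaseq_analysis"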

samples.csv

This file contains the names of all the samples which you want to analyse with this pipeline. You should provide SRA Run IDs (usually in SRRXXXXXXX format). You can find these IDs for publicly available datasets on the GEO Datasets website. If you have your own fastq files, check the naming convention, rename those files accordingly, keep them in the appropriate folder, and start the pipeline.

This .csv (comma-delimited) file SHOULD have a header row and SHOULD contain at least the following columns:

  • run : Run ID

  • is_paired : true (if you are handling a paired-end sample), false (if it is a single-end sample)

Example file content:

run,is_paired
SRR0000001,true
SRR0000002,false

If you are performing any DESeq2 analysis, a third column specifying conditions is mandatory:

run,is_paired,condition
SRR0000001,true,wt
SRR0000002,false,mt

Here the third column, condition, will be used in the DESeq2 analysis. You should update the appropriate parameters in the config.yaml file. For example, let us assume the following sample file:

run,is_paired,time
SRR0000001,true,48
SRR0000002,false,24

In the above file, the third column is named time. If 24 is your reference condition, you should then update the following in config.yaml:

deseq2:
  design_column: "time"
  reference: "24"

In addition, you should also update the design parameter in the config.yaml file. For example, in the above situation, it could be design: "~ time".
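Putting it together, the DESeq2-related configuration could look like the sketch below (whether the design key sits inside the deseq2 block is an assumption; follow the comments in config/config.yaml):

deseq2:
  design_column: "time"
  reference: "24"
  design: "~ time"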

Code Snippets

script: "../scripts/bam_to_counts.R"

shell:
     """
     echo "{wildcards.SRR_ID} {wildcards.BASE_FOLDER}/mappings/stringtie/{wildcards.SRR_ID}/{wildcards.SRR_ID}_assembled.gtf" > input.temp

     mv input.temp {wildcards.BASE_FOLDER}/mappings/stringtie/{wildcards.SRR_ID}/{wildcards.SRR_ID}_input.temp

     {input.binary} -i {wildcards.BASE_FOLDER}/mappings/stringtie/{wildcards.SRR_ID}/{wildcards.SRR_ID}_input.temp -g {output.gmat} -t {output.tmat}

     rm -f {wildcards.BASE_FOLDER}/mappings/stringtie/{wildcards.SRR_ID}/{wildcards.SRR_ID}_input.temp
     """
script: "../scripts/txi_to_counts.R"

script: "../scripts/txi_to_counts.R"

run:
    generate_deseq_input(wildcards)

script: "../scripts/deseq2.R"

shell:
     """
     {input.fastqc} -t {threads} -o {wildcards.BASE_FOLDER}/quality/{wildcards.SRR_ID} {input.f1} {input.f2}
     """

shell:
     """
     {input.fastqc} -t {threads} -o {wildcards.BASE_FOLDER}/quality/{wildcards.SRR_ID} {input.f1}
     """

shell:
     """
     {input.fastqc} -t {threads} -o {wildcards.BASE_FOLDER}/quality/{wildcards.SRR_ID} {input.f1} {input.f2}
     """

shell:
     """
     {input.fastqc} -t {threads} -o {wildcards.BASE_FOLDER}/quality/{wildcards.SRR_ID} {input.f1}
     """
    shell:
         "{input.kallisto} index "  # Kallisto index mode
         "-i {output} "  # Filename (not folder) for kallisto
         "{input.transcript}"  # Transcript


def make_kallisto_reads(files):
    if len(files) == 2:
        # Paired end reads
        return " ".join(files)
    else:
        # Single end, use parameters from config file
        frg_length = config["extra"]["kallisto-fragment-length"]  # Fragment
        # length
        frg_sd = config["extra"]["kallisto-standard-deviation"]  # Standard
        # deviation
        return f"--single -l {frg_length} -s {frg_sd} {files}"
shell:
     """
     multiqc {wildcards.BASE} -f -n {wildcards.BASE}/quality_report.html 

     touch {wildcards.BASE}/quality.html
     """
    shell:
         "mkdir -p {wildcards.BASE_FOLDER}/sra && "  # Make folder if not present
         "{params.prefetch} -p {wildcards.SRR_ID} "  # Prefetch main command
         "--output-file {output} "  # Write to specific file

rule get_fastq_paired:
    input: ancient("{BASE_FOLDER}/sra/{SRR_ID}.sra")
    threads: config["threads"]
    output:
        "{BASE_FOLDER}/fastq/{SRR_ID}.sra_1.fastq",
        "{BASE_FOLDER}/fastq/{SRR_ID}.sra_2.fastq"
shell:
     "{params.fasterq} "  # Binary
     "{input} "  # Input SRA File
     "--split-files "  # Split files according to the reads
     "--progress "  # Show progress
     "--skip-technical "  # Skip technical reads
     "--threads {threads} "  # Number of threads
     "--outdir {wildcards.BASE_FOLDER}/fastq"
From line 37 of rules/ncbi.smk
shell:
     "{params.fasterq} "  # Binary
     "{input} "  # Input SRA File
     "--split-files "  # Split files according to the reads
     "--progress "  # Show progress
     "--skip-technical "  # Skip technical reads
     "--threads {threads} "  # Number of threads
     "--outdir {wildcards.BASE_FOLDER}/fastq"
From line 52 of rules/ncbi.smk
    shell:
         "scripts/generateDecoyTranscriptome.sh "
         "-j {threads} "  # Number of threads
         "-a {input.gtf} "  # Genome GTF File
         "-g {input.fasta} "  # Genome in fasta
         "-t {input.transcript} "  # Transcript file in fasta
         "-o {wildcards.BASE}/index/salmon "  # Salmon index output folder
         "-b {input.bedtools} "  # Path to bedtools
         "-m {input.mashmap} "  # Path to mashmap

# Generate index for salmon
rule index_salmon:
    input:
         decoy_index=ancient("{BASE}/index/salmon/gentrome.fa"),
         decoy_keys=ancient("{BASE}/index/salmon/decoys.txt"),
         salmon=config["tools"]["salmon"]
    threads: config["threads"]
    output: "{BASE}/index/salmon/pos.bin"
    shell:
         "{input.salmon} index "  # Salmon index mode
         "-t {input.decoy_index} "  # Decoy transcript
         "-i {wildcards.BASE}/index/salmon "  # Path to index folder
         "--decoys {input.decoy_keys} "  # Decoy keys
         "-k 31 "  # K-mer (31) is recommended by the authors
         "-p {threads}"  # Number of threads


def get_salmon_reads(files):
    if isinstance(files, str):
        # Single-end sample: one FASTQ file
        return f"-r {files}"
    # Paired-end sample: a list of two FASTQ files. This branch completes a
    # snippet truncated here; -1/-2 are salmon's flags for the two mate files.
    return f"-1 {files[0]} -2 {files[1]}"
shell:
     """
     rm -rf {params.index_dir}/kvdb

     {params.sortmerna} {params.ref_str} {params.read_str} -workdir {params.index_dir} -fastx -other -a {threads} -num_alignments 1 -paired_in --out2 

     mkdir -p {BASE}/filtered/ 

     mkdir -p {BASE}/logs

     mv {params.index_dir}/out/aligned.log {BASE}/logs/{wildcards.SRR_ID}_filtering.log

     mv {params.index_dir}/out/other_fwd.fastq {BASE}/filtered/{wildcards.SRR_ID}.sra.filtered_1.fastq
     mv {params.index_dir}/out/other_rev.fastq {BASE}/filtered/{wildcards.SRR_ID}.sra.filtered_2.fastq

     rm  -f {params.index_dir}/out/aligned_fwd.fastq
     rm  -f {params.index_dir}/out/aligned_rev.fastq
     """
shell:
     """
     rm -rf {params.index_dir}/kvdb

     {params.sortmerna} {params.ref_str}  -reads {input.reads} -workdir {params.index_dir} -fastx -other -a {threads} -num_alignments 1 -paired_in 

     mkdir -p {BASE}/filtered/ 

     mkdir -p {BASE}/logs

     mv {params.index_dir}/out/aligned.log {BASE}/logs/{wildcards.SRR_ID}_filtering.log

     mv {params.index_dir}/out/other.fastq {BASE}/filtered/{wildcards.SRR_ID}.sra.filtered.fastq 

     rm  -f {params.index_dir}/out/aligned.fastq
     """
shell:
     """
     {input.star} --runMode genomeGenerate --runThreadN {threads} --genomeDir {params.folder} --genomeFastaFiles {input.genome} --sjdbGTFfile {input.gtf}

     if [ -f Log.out ]; then
         mkdir -p {BASE}/logs
         mv Log.out {BASE}/logs/STAR.index.log
     fi
     """
From line 29 of rules/star.smk
shell:
     """
     {input.star} --runThreadN {threads} --genomeDir {BASE}/index/star --outFileNamePrefix {params.prefix} --outSAMtype BAM SortedByCoordinate --readFilesIn {input.files}

     rm -f {BASE}/bams/{params.prefix}.Log.progress.out

     """
From line 60 of rules/star.smk
    shell:
         """ 
         export PATH="$(dirname {input.samtools}):$PATH"
         export PATH="$(dirname {input.jellyfish}):$PATH"
         export PATH="$(dirname {input.bowtie2}):$PATH"
         export PATH="$(dirname {input.salmon}):$PATH"

         {input.trinity} --seqType fq --left {input.files[0]} --right {input.files[1]} --CPU {threads} --output {wildcards.BASE_FOLDER}/mappings/trinity/trinity_{wildcards.SRR_ID} --max_memory 50G --verbose
         """

"""
Build BowTie indexs

bowtie2-build assembled transcript output_prefix
"""
    shell:
         """
         # bowtie2-build <reference fasta> <index prefix>: the transcript
         # fasta path is reused as the index prefix
         {input.bowtie2} {input.transcripts} {input.transcripts}
         """

"""
Get statistics from paired-end
Bowtie2:
-p : number of threads
-q : input files are in fastq format
--no-unal: suppress SAM records for unaligned reads
-x: Index files prefix (without X.bt2 extension)

"""
shell:
     """
     {input.bowtie} -p {threads} -q --no-unal -x {input.transcripts} -1 {input.files[0]} -2 {input.files[1]} > {output}
     """

From scripts/bam_to_counts.R

suppressPackageStartupMessages(library("data.table"))
library("Rsubread")


fc <- Rsubread::featureCounts(
  files = snakemake@input$bam,
  annot.ext = snakemake@input$gtf,
  isGTFAnnotationFile = TRUE,
  isPairedEnd = as.logical(snakemake@params$is_paired),
  nthreads = snakemake@threads)

colnames(fc$counts) <- snakemake@wildcards$SRR_ID
df <- data.frame(fc$counts)
data.table::setDT(df, keep.rownames = "gene_id")

fname <- paste(snakemake@wildcards$BASE, "deseq2", "counts", snakemake@wildcards$SRR_ID, sep = "/")
fname <- paste0(fname, "/", snakemake@wildcards$SRR_ID, ".star.counts")
write.csv(df, file = fname, row.names = FALSE)
write.csv(fc$stat, file = paste0(fname, ".stat"), row.names = FALSE)

From scripts/deseq2.R

start_time <- Sys.time()

suppressPackageStartupMessages(library("DESeq2"))

# We can relevel our conditions initially so that we do not have to worry
# about them later on
conds <- factor(snakemake@params$conditions)
conds <- relevel(conds, snakemake@params$reference)

colData <- data.frame(sample = snakemake@params$samples,
                      condition = conds)

df <- read.csv(snakemake@input$combined)
rownames(df) <- df$gene_id
df$gene_id <- NULL

dds <- DESeq2::DESeqDataSetFromMatrix(countData = df,
                                      colData = colData,
                                      design = formula(snakemake@params$design))


# Start the DE analysis
dds <- DESeq2::DESeq(dds)

outputs <- NULL

# Get results from each combination of conditions
for (g in unique(snakemake@params$conditions)) {
  if (g != snakemake@params$reference) {
    contrast <- c(snakemake@params$column, g, snakemake@params$reference)
    res <- results(dds, contrast = contrast)
    fn <- paste(snakemake@wildcards$BASE, "deseq2", "analysis",
                snakemake@wildcards$METHOD, snakemake@wildcards$METHOD,
                sep = "/")
    name <- paste(fn, g, "vs", snakemake@params$reference, sep = "_")
    name <- paste0(name, ".csv")
    df <- as.data.frame(res)
    data.table::setDT(df, keep.rownames = "gene_id")
    write.csv(df, file = name, row.names = FALSE)
    outputs <- c(outputs, name)
  }
}

# Write a summary log of this DESeq2 run
summary <- paste(snakemake@wildcards$BASE, "deseq2", "analysis",
                 snakemake@wildcards$METHOD, "analysis.log",
                 sep = "/")


fileConn <- file(summary)
fc <- "Analysis \t: DESeq2"
fc <- c(fc, paste0("Start Time \t: ", start_time))
fc <- c(fc, paste0("Finish Time \t: ", Sys.time()))
fc <- c(fc, paste0("Samples \t: ", paste0(snakemake@params$samples, collapse
  = ",")))
fc <- c(fc, paste0("Conditions \t: ", paste0(snakemake@params$conditions,
                                             collapse = ",")))
fc <- c(fc, paste0("Design Formula \t: ", snakemake@params$design))
fc <- c(fc, paste0("Reference \t: ", snakemake@params$reference))
fc <- c(fc, paste0("Outputs \t: ", paste(outputs, sep = "")))
writeLines(fc, fileConn)
close(fileConn)

From scripts/generateDecoyTranscriptome.sh

abort()
{
    echo >&2 '
***************
*** ABORTED ***
***************
'
    echo "An error occurred. Exiting..." >&2
    exit 1
}

trap 'abort' 0

set -e

###############################################
## It assumes awk, bedtools and mashmap is
## available.
## We have tested this script with
## awk 4.1.3, bedtools v2.28.0 and mashmap v2.0
## on an Ubuntu system.
###############################################

threads=1
awk="awk"
bedtools="bedtools"
mashmap="mashmap"

# Argument Parsing
print_usage_and_exit () {
    echo "Usage: $0 [-j <N> =1 default] [-b <bedtools binary path> =bedtools default] [-m <mashmap binary path> =mashmap default] -a <gtf file> -g <genome fasta> -t <txome fasta> -o <output path>"
    exit 1
}

echo "****************"
echo "*** getDecoy ***"
echo "****************"
while getopts ":a:b:o:j:h:g:t:m:" opt; do
    case $opt in
        b)
            bedtools=`realpath $OPTARG`
            echo "-b <bedtools binary> = $bedtools"
            ;;
        m)
            mashmap=`realpath $OPTARG`
            echo "-m <mashmap binary> = $mashmap"
            ;;
        a)
            gtffile=`realpath $OPTARG`
            echo "-a <Annotation GTF file> = $gtffile"
            ;;
        o)
            outfolder="$OPTARG"
            echo "-o <Output files Path> = $outfolder"
            ;;
        j)
            threads="$OPTARG"
            echo "-j <Concurrency level> = $threads"
            ;;
        g)
            genomefile=`realpath $OPTARG`
            echo "-g <Genome fasta> = $genomefile"
            ;;
        t)
            txpfile=`realpath $OPTARG`
            echo "-t <Transcriptome fasta> = $txpfile"
            ;;
        h)
            print_usage_and_exit
            ;;
        \?)
            echo "Invalid option: -$OPTARG"
            print_usage_and_exit
            ;;
        :)
            echo "Option -$OPTARG requires an argument."
            print_usage_and_exit
            ;;
    esac
done

# Required arguments
if [ -z "$gtffile" -o -z "$outfolder" -o -z "$genomefile" -o -z "$txpfile" -o -z "$mashmap" -o -z "$awk" -o -z "$bedtools" -o -z "$threads" ]
then
    echo "Error: missing required argument(s)"
    print_usage_and_exit
fi

mkdir -p $outfolder
cd $outfolder

# extracting all the exonic features to mask
echo "[1/10] Extracting exonic features from the gtf"
$awk -v OFS='\t' '{if ($3=="exon") {print $1,$4,$5}}' $gtffile > exons.bed

# masking the exonic regions from the genome
echo "[2/10] Masking the genome fasta"
$bedtools maskfasta -fi $genomefile -bed exons.bed -fo reference.masked.genome.fa

# aligning the transcriptome to the masked genome
echo "[3/10] Aligning transcriptome to genome"
$mashmap -r reference.masked.genome.fa -q $txpfile -t $threads --pi 80 -s 500

# extracting the bed files from the reported alignment
echo "[4/10] Extracting intervals from mashmap alignments"
$awk -v OFS='\t' '{print $6,$8,$9}' mashmap.out | sort -k1,1 -k2,2n - > genome_found.sorted.bed

# merging the reported intervals
echo "[5/10] Merging the intervals"
$bedtools merge -i genome_found.sorted.bed > genome_found_merged.bed

# extracting relevant sequence from the genome
echo "[6/10] Extracting sequences from the genome"
$bedtools getfasta -fi reference.masked.genome.fa -bed genome_found_merged.bed -fo genome_found.fa

# concatenating the sequence at per chromsome level to extract decoy sequences
echo "[7/10] Concatenating to get decoy sequences"
$awk '{a=$0; getline;split(a, b, ":");  r[b[1]] = r[b[1]]""$0} END { for (k in r) { print k"\n"r[k] } }' genome_found.fa > decoy.fa

# concatenating decoys to transcriptome
echo "[8/10] Making gentrome"
cat $txpfile decoy.fa > gentrome.fa

# extracting the names of the decoys
echo "[9/10] Extracting decoy sequence ids"
grep ">" decoy.fa | $awk '{print substr($1,2); }' > decoys.txt

# removing extra files
echo "[10/10] Removing temporary files"
rm exons.bed reference.masked.genome.fa mashmap.out genome_found.sorted.bed genome_found_merged.bed genome_found.fa decoy.fa reference.masked.genome.fa.fai

trap : 0
echo >&2 '
**********************************************
*** DONE Processing ...
*** You can use files `$outfolder/gentrome.fa`
*** and `$outfolder/decoys.txt` with
*** `salmon index`
**********************************************
'

From scripts/txi_to_counts.R

library("tximport")
suppressPackageStartupMessages(library("AnnotationDbi"))

tx <- GenomicFeatures::makeTxDbFromGFF(file = snakemake@input$gtf,
                                       format = "gtf",
                                       organism = snakemake@params$animal)

k <- AnnotationDbi::keys(tx, keytype = "GENEID")
tx_df <- AnnotationDbi::select(tx,
                               keys = k,
                               keytype = "GENEID",
                               columns = "TXNAME")
tx2gene <- tx_df[, 2:1]

txi <- tximport::tximport(snakemake@input$file,
                          type = snakemake@params$method,
                          tx2gene = tx2gene,
                          ignoreTxVersion = TRUE)

colnames(txi$counts) <- snakemake@wildcards$SRR_ID
# Tximport will generate fractions. We need to convert them to integers by
# just rounding them. This is what the original class does
# https://github.com/mikelove/DESeq2/blob/master/R/AllClasses.R
txi$counts <- round(txi$counts)
df <- data.frame(txi$counts)
data.table::setDT(df, keep.rownames = "gene_id")

fname <- paste(snakemake@wildcards$BASE, "deseq2", "counts",
               snakemake@wildcards$SRR_ID, sep = "/")
fname <- paste0(fname, "/", snakemake@wildcards$SRR_ID, ".",
                snakemake@params$method, ".counts")
write.csv(df, file = fname, row.names = FALSE)
