Ribosome Profiling Data Analysis with Snakemake: public_riboseq Workflow

public public 1yr ago 0 bookmarks

This is the template for a new Snakemake workflow. Replace this text with a comprehensive description covering the purpose and domain. Insert your code into the respective folders, i.e. scripts , rules , and envs . Define the entry point of the workflow in the Snakefile and the main configuration in the config.yaml file.

Usage

If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

  1. Create a new github repository using this workflow as a template .

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Configure workflow

Configure the workflow according to your needs via editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup.

Step 3: Install Snakemake

Install Snakemake using conda :

conda create -c bioconda -c conda-forge -n snakemake snakemake

For installation details, see the instructions in the Snakemake documentation .

Step 4: Execute workflow

Activate the conda environment:

conda activate snakemake

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

Step 5: Investigate results

After successful execution, you can create a self-contained interactive HTML report with all results via:

snakemake --report report.html

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here .

Code Snippets

  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
library("IHW")
library("DESeq2")
library("ggplot2")
library("RColorBrewer")
library("pheatmap")
library("tximport")
#library("biomart")

print(snakemake@output[[1]])

#Carrega os nomes dos arquivos de expressão de cada biblioteca
tx2gene <- read.csv(snakemake@params[["transcripts_to_genes"]], header = FALSE, sep=",")
head(tx2gene)
print("\n\n")
head(snakemake@params[["samples_file"]])
print("\n\n")
samples_table <- read.csv(snakemake@params[["samples_file"]], header = TRUE, sep=",")
samples_table <- samples_table[!duplicated(samples_table$sample), ]
samples <- unique(samples_table$sample)
files <- file.path("results/kallisto",samples, "abundance.tsv")
names(files) <- samples
files
print("\n\n")
#Importa os resultados de expressão de cada biblioteca
txi.kallisto.tsv <- tximport(files, type = "kallisto",  tx2gene = tx2gene, dropInfReps = TRUE)
print(samples)
head(txi.kallisto.tsv$counts)
#colnames(txi.kallisto.tsv$counts) <- samples
dim(txi.kallisto.tsv$counts)

colDatak <-data.frame(row.names=colnames(txi.kallisto.tsv$counts), condition=as.factor(samples_table$condition), type=as.factor(samples_table$type))
colDatak

ddsk <- DESeqDataSetFromTximport(txi = txi.kallisto.tsv, colData = colDatak, design = ~ type + condition + type:condition)

#Cria objeto do DESeq2
ddsk <- DESeq(ddsk)

print("Criar gráfico do Heatmap")
#Cria gráfico do Heatmap
vsdk <- rlog(ddsk, blind=FALSE)
sampleDistsk <- dist(t(assay(vsdk)))
sampleDistMatrixk <- as.matrix(sampleDistsk)
rownames(sampleDistMatrixk) <- paste(vsdk$type, vsdk$condition, sep="-")
#colnames(sampleDistMatrixk)
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
print("Heatmap")
png("Heatmap.png")
pheatmap(sampleDistMatrixk,
         clustering_distance_rows=sampleDistsk,
         clustering_distance_cols=sampleDistsk,
         col=colors)
dev.off()

#Cria PCA
pcaDatak <- plotPCA(vsdk, intgroup=c("type", "condition"), returnData=TRUE)
percentVark <- round(100 * attr(pcaDatak, "percentVar"))
png("PCA.png")
ggplot(pcaDatak, aes(PC1, PC2, color=condition, shape=type)) +
  geom_point(size=3) +
  xlab(paste0("PC1: ",percentVark[1],"% variance")) +
  ylab(paste0("PC2: ",percentVark[2],"% variance")) +
  coord_fixed()
dev.off()
print("Results")

ddsk$group <- factor(paste0(ddsk$type, "_", ddsk$condition))
design(ddsk) <- ~ group
ddsk <- DESeq(ddsk)
print(ddsk$group)

#Comparações que serão feitas
res_trap_npc_neuron<-results(ddsk, contrast=c("group","trap_Neuron","trap_NPC"),  filterFun=ihw)
res_rna_npc_neuron<-results(ddsk, contrast=c("group","rna_Neuron","rna_NPC"),  filterFun=ihw)
res_trap_npc_rna_npc<-results(ddsk, contrast=c("group","trap_NPC","rna_NPC"),  filterFun=ihw)
res_trap_neuron_rna_neuron<-results(ddsk, contrast=c("group","trap_Neuron","rna_Neuron"),  filterFun=ihw)

design(ddsk) <- ~ type + condition + type:condition
ddsk <- DESeq(ddsk, test = "LRT", reduced = ~type + condition)
resk <- results(ddsk,  filterFun=ihw)

res_neuron <- results(ddsk, name="typetrap.conditionNPC", test="Wald")

print(resultsNames(ddsk))

#Adiciona anotação para os genes
split_gene_id <- function(word){
    strsplit(word, ".", fixed = TRUE)[[1]][1]
    #unlist(strsplit(my.string, ".", fixed = TRUE))[1]
}

gene_id = apply(as.matrix(rownames(resk)), 1, split_gene_id)

rownames(resk) <- gene_id

resk$ensembl_id <- rownames(resk)
#martk <- biomaRt::useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
#genes.tablek <- biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name", "description", "gene_biotype"), mart = martk)
#resk <- merge(x = as.matrix(resk), y = genes.tablek, by.x = "ensembl_id", by.y = "ensembl_gene_id", all.x = T, all.y = F )
counts_normalizedk<-counts(ddsk, normalized=TRUE)
counts_rawk<-counts(ddsk)
Tabelao<-cbind(res_trap_npc_neuron,res_rna_npc_neuron,res_trap_npc_rna_npc,res_trap_neuron_rna_neuron,resk,res_neuron,counts_rawk,counts_normalizedk)

write.table(as.data.frame(Tabelao),file=snakemake@output[[1]], sep = "\t")
35
36
wrapper:
    "v0.75.0/bio/sra-tools/fasterq-dump"
51
52
shell:
    "trim_galore {params.trim} --cores {threads} --output_dir {output} {input} 2> {log}"
66
67
68
69
70
71
72
73
74
75
shell:
    """
    set +e
    FQ1=`echo {input} | awk '{{for(X=1;X<=NF;X++){{OUT=OUT $X"/*_1_val_1.fq.gz "}}}}END{{print OUT}}'`
    FQ2=`echo {input} | awk '{{for(X=1;X<=NF;X++){{OUT=OUT $X"/*_2_val_2.fq.gz "}}}}END{{print OUT}}'`
    echo $FQ1
    echo $FQ2
    cat $FQ1 > {output.fq1} 2> {log}
    cat $FQ2 > {output.fq2} 2>> {log}
    """
94
95
96
97
98
shell:
    """
    out=`echo {output.unaligned_fq1} | sed 's/.1.fq.gz/.%.fq.gz/'`
    bowtie2 {params.bowtie2} -p {threads} --un-conc-gz $out -1 {input.fq1} -2 {input.fq2} -S {output.aligned} 2> {log}
    """
115
116
117
118
shell:
    "kallisto quant -t {threads} {params.extra} " 
    "-g {params.gtf} "
    "-i {params.index} -o {output} {input} > {log} 2>&1"
130
131
script:
    "./scripts/kallisto_deseq2.R"
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = "--outdir {}".format(outdir)

extra = snakemake.params.get("extra", "")

with tempfile.TemporaryDirectory() as tmp:
    shell(
        "fasterq-dump --temp {tmp} --threads {snakemake.threads} "
        "{extra} {outdir} {snakemake.wildcards.accession} {log}"
    )
ShowHide 3 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/osvaldoreisss/public_trapseq
Name: public_trapseq
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: MIT License
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...