Snakemake workflow of shRNA-seq and CRISPR-Cas9 genetic screen analysis using edgeR.
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Snakemake workflow of shRNA-seq and CRISPR-Cas9 genetic screen analysis using edgeR.
Table of Contents
Overview
This workflow is used to analysis shRNA-seq and CRISPR/cas9 genetic screens. The workflow is built in Snakemake , a workflow managament system. It primarily uses edgeR , a Biocondutor R package which allows for differential expression analysis of RNA-seq expression profiles. The workflow performs data processing, quality control, differential expression analysis, gene set testing and gene enrichment, with batch correction implemented.
Installation
Install snakemake using the mamba package manager:
$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
$ mamba activate snakemake
Pull the workflow to your project directory:
$ git pull https://github.com/zifornd/shrnaseq
Usage
Configure the workflow by editing the
config.yaml
file:
$ nano config/config.yaml
Run the workflow:
$ snakemake --use-conda --cores all
For further details on Snakemake, see the Snakemake documentation .
Alternatively, the Dockerfile can be used to run the workflow in a container.
Shiny application
The output of this workflow can be visualised in Shiny from RStudio, see the shRNAseq shiny documentation .
Documentation
See the Documentation file for configuration and output information.
Contributing
See CONTRIBUTING.md for ways to get started.
Please adhere to this project's code of conduct .
Authors
Acknowledgements
This workflow is based on the following research article:
Dai Z, Sheridan JM, Gearing LJ et al. edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens [version 2; peer review: 3 approved]. F1000Research 2014, 3:95 (https://doi.org/10.12688/f1000research.3928.2)
using the Snakemake management system:
Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)
License
This workflow is licensed under the
MIT
license.
Copyright © 2022, Zifo RnD Solutions
Code Snippets
13 14 | script: "../scripts/corrected_counts.R" |
28 29 | script: "../scripts/model_matrix.R" |
43 44 | script: "../scripts/contrasts_matrix.R" |
58 59 | script: "../scripts/estimateDisp.R" |
73 74 | script: "../scripts/glmFit.R" |
90 91 | script: "../scripts/glmLRT.R" |
105 106 | script: "../scripts/top_guideRNAs.R" |
123 124 | script: "../scripts/FDR_guideRNAs.R" |
18 19 | script: "../scripts/camera.R" |
36 37 | script: "../scripts/camerarank.R" |
52 53 | script: "../scripts/gene_level.R" |
70 71 | script: "../scripts/generank.R" |
19 20 | script: "../scripts/goana.R" |
34 35 | script: "../scripts/top_goana.R" |
55 56 | script: "../scripts/kegg.R" |
70 71 | script: "../scripts/top_kegg.R" |
18 19 | script: "../scripts/processAmplicons.R" |
14 15 | script: "../scripts/filter_guideRNAs.R" |
29 30 | script: "../scripts/norm.R" |
45 46 | script: "../scripts/MDS_plot.R" |
60 61 | script: "../scripts/PCA_plot.R" |
75 76 | script: "../scripts/sample_dist_heatmap.R" |
90 91 | script: "../scripts/BVC_plot.R" |
12 13 | shell: "conda create --yes --prefix {params.path} --strict-channel-priority --override-channels --channel conda-forge --channel bioconda --channel defaults bioconductor-{params.name} 1> {log.out} 2> {log.err}" |
17 18 | script: "../scripts/shiny.R" |
38 39 | script: "../scripts/shiny_contrasts.R" |
54 55 | script: "../scripts/merge_shiny.R" |
19 20 | script: "../scripts/expression_heatmap.R" |
36 37 | script: "../scripts/volcano_plot.R" |
51 52 | script: "../scripts/guideRNA_histogram.R" |
69 70 | script: "../scripts/plotSMEAR.R" |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) xglm=readRDS(input$rds) png(output$plot, width=2000, height=2000, res=400) plotBCV(xglm, main="BCV Plot") dev.off() } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) matrix=readRDS(input$rds[1]) des=readRDS(input$rds[2]) xglm=readRDS(input$rds[3]) genesymbols = xglm$genes$Gene genesymbollist = list() unq = unique(genesymbols) unq = unq[!is.na(unq)] for (i in unq) { sel = genesymbols == i & !is.na(genesymbols) if (sum(sel) > 1) genesymbollist[[i]] =which(sel) } camera.res = camera(xglm, index = genesymbollist, des, contrast=matrix[, params$contrast]) camera.res=camera.res[!is.na(camera.res$FDR),] camera.res$gene=rownames(camera.res) write.table(camera.res, output$tsv, quote=F, row.names=F) saveRDS(camera.res,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(ggplot2) camera <- read.table(input$tsv, header=T) colnames(camera) <- c("nGuides", "Direction", "Pvalue", "FDR", "Gene") res <- as.data.frame(camera) res$Status <- factor("NS", levels = c("Up", "NS", "Down")) res$Status[res$Direction == "Up" & res$FDR < params$FDR] <- "Up" res$Status[res$Direction == "Down" & res$FDR < params$FDR] <- "Down" res$Pvalue <- -log10(res$Pvalue) res$Rank <- rank(res$Pvalue) col <- c( "Up" = "#FF0000", "NS" = "#B8DBCC", "Down" = "#6495ED" ) res.s <- res[res$FDR<params$FDR,] plt <- ggplot(res, aes(x = Rank, y = Pvalue, colour = Status, size = nGuides, label = Gene)) + geom_point() + geom_point(data = res.s, shape = 1, color="black") + geom_text(data = res.s, hjust = 0, nudge_x = -20, color="black", size=3, check_overlap = T) + scale_colour_manual(values = col, breaks = names(col)) + theme_classic() + labs(title= "Camera rank", x = "Rank", y = "-log10(Pvalue)", colour = "Status" ) png(output$plot, width=3000, height=3500, res=400) print(plt) dev.off() } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) des=readRDS(input$rds) conditions <- colnames(des) combinations <- expand.grid(conditions, conditions) contrasts <- apply(combinations, 1, paste, collapse = "-") matrix <- makeContrasts(contrasts = contrasts, levels = conditions) saveRDS(matrix,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(limma) library(edgeR) x=readRDS(input$rds) mat=cpm(x$counts, log=TRUE, prior.count = 1) mat=removeBatchEffect(mat, batch=factor(x$samples$batch)) saveRDS(mat, file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) x=readRDS(input$rds[1]) des=readRDS(input$rds[2]) xglm = estimateDisp(x, des) com_disp=sqrt(xglm$common.disp) print(com_disp) saveRDS(xglm,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | exp_hm <- function(x, mat, selY, hmtitle) { colnames(mat) <- paste(x$samples$group, x$samples$Replicate, sep = " - " ) mat <- subset(mat, rownames(mat) %in% selY) colors <- colorRampPalette( brewer.pal(9, "Blues") )(255) pheatmap(mat, col = colors,border_color=NA, main=hmtitle) } analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(RColorBrewer) library(pheatmap) library(edgeR) lrt=readRDS(input$rds[1]) top2 = topTags(lrt, n=Inf) selY <- selY <- rownames(top2)[c(1:20)] x=readRDS(input$rds[2]) mat <- cpm(x$counts, log=TRUE, prior.count = 1) png(output$plot[1], width=3000, height=3500, res=400) exp_hm(x, mat, selY, "Differential expression \n across the groups (logCPM)") dev.off() #batch corrected mat=readRDS(input$rds[3]) png(output$plot[2], width=3000, height=3500, res=400) exp_hm(x, mat, selY, "Batch corrected differential expression \n across the groups (logCPM)") dev.off() } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) thresh = params$threshold lrt=readRDS(input$rds) top2 = topTags(lrt, n=Inf) top2ids = top2$table[top2$table$FDR<thresh,1] write.table(top2ids, output$tsv, row.names=F, quote=F, col.names=F) saveRDS(top2ids, file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | analysis =function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) x=readRDS(input$rds) sel=rowSums(cpm(x$counts)>0.5)>=3 x=x[sel,] saveRDS(x, file = output$rds) png(output$plot,width=4000, height=2000, res=400) par(mfrow=c(2,1)) barplot(colSums(x$counts), las=2, main="Counts per index", cex.names=0.5, cex.axis=0.8) barplot(rowSums(x$counts), las=2, main="Counts per guide RNA", cex.names=0.5, cex.axis=0.8) dev.off() } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(metap) lrt=readRDS(input$rds) # gene level dat=NULL for (i in unique(lrt$genes$Gene)) { sel = lrt$genes$Gene == i & !is.na(lrt$genes$Gene) #number of guideRNAs nguideRNA=length(which(sel)) if (nguideRNA==1) { logFC=lrt$table$logFC[which(sel)] IQRFC=NA if (logFC>0) { dir="Up" } else { dir="Down" } if (logFC>0) { dir_pval="Up" } else { dir_pval="Down" } stouffers=NA } if (nguideRNA>1) { #calculate average logFC logFC=mean(lrt$table$logFC[which(sel)]) IQRFC=IQR(lrt$table$logFC[which(sel)]) if (logFC >0) { dir="Up" } else { dir="Down" } #select up regulated guideRNAs res=lrt$table[which(sel),] up=res[which(res$logFC>0),] down=res[which(res$logFC<0),] options(warn=-1) if ((min(up$Pvalue)>min(down$PValue)) | nrow(up)==0) dir_pval="Down" if ((min(up$Pvalue)<min(down$PValue)) | nrow(down)==0) dir_pval="Up" options(warn=-0) pvals=lrt$table$PValue[which(sel)] pvals=pvals[!pvals==1] if (length(pvals)>1) { stouffers=as.numeric(sumz(lrt$table$PValue[which(sel)])[2]) } else { stouffers=NA } } vector=cbind(i,nguideRNA, logFC, IQRFC, dir, dir_pval, stouffers) dat=rbind(dat, vector) } dat=cbind(dat,p.adjust(dat[,"stouffers"], method="fdr")) colnames(dat)=c("gene", "nguides", "mean.logFC", "iqr.logFC", "direction.mean.logFC", "direction.smallest.pvalue", "stouffers.pvalue", "stouffers.FDR") write.table(dat, output$tsv, quote=F, row.names=F, sep=",") saveRDS(data.frame(dat),file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(ggplot2) genelevel <- read.table(input$tsv, header=T, sep=",") colnames(genelevel) <- c("Gene", "nGuides", "Mean logFC", "IQR logFC", "Direction mean logFC", "Direction smallest Pvalue", "Stouffer's Pvalue", "Stouffer's FDR") res <- as.data.frame(genelevel) FC_index=c(paste0("> ", params$FC), paste0("< ", params$FC, " and > -", params$FC), paste0("< -", params$FC)) res$LogFC <- factor(FC_index[2], levels = FC_index) res$LogFC[res$`Mean logFC`>params$FC] <- FC_index[1] res$LogFC[res$`Mean logFC`<(-(params$FC))] <- FC_index[3] res$Rank <- rank(res$`Mean logFC`) col <- c( "#FF0000","#B8DBCC", "#6495ED") names(col)=c(FC_index) res.up <- res[order(-res$`Mean logFC`),] res.up <- res.up[res.up$`Mean logFC`>params$FC,][c(1:5),] res.down <- res[order(res$`Mean logFC`),] res.down <- res.down[res.down$`Mean logFC`<(-params$FC),][c(1:5),] plt <- ggplot(res, aes(x = Rank, y = `Mean logFC`, colour = LogFC, size = nGuides, label = Gene)) + geom_point() + geom_point(data = res.up, shape = 1, color="black") + geom_point(data = res.down, shape = 1, color="black") + geom_text(data = res.up, hjust = 0, nudge_x = -20, color="black", size=3, check_overlap = T) + geom_text(data = res.down, hjust = 0, nudge_x = -20, color="black", size=3, check_overlap = T) + theme_classic() + scale_colour_manual(values = col, breaks = names(col)) + labs(title = "Gene rank", x = "Rank", y = "Mean log fold-change", colour = "LogFC" ) png(output$plot, width=3000, height=3500, res=400) print(plt) dev.off() } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) des=readRDS(input$rds[1]) xglm=readRDS(input$rds[2]) fit = glmFit(xglm, des) saveRDS(fit,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) matrix=readRDS(input$rds[1]) fit=readRDS(input$rds[2]) lrt = glmLRT(fit, contrast=matrix[, params$contrast]) saveRDS(lrt,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(AnnotationDbi) library(params$organism, lib.loc=dirname(input$pkg), character.only = TRUE) matrix=readRDS(input$rds[1]) lrt=readRDS(input$rds[2]) org <- params$organism obj <- getFromNamespace(org, org) row.names(lrt) <- keys(obj)[1:nrow(lrt)] library(limma) go <- goana(lrt, con=matrix[, params$contrast], FDR=params$threshold, species = strsplit(params$organism, ".", fixed = TRUE)[[1]][2]) write.table(go, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA) saveRDS(go,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) lrt=readRDS(input$rds) png(output$plot, width=3500, height=2000, res=400) hist(lrt$table$PValue, breaks = 40, main = "Histogram of guide RNA P values", xlab="guide RNA p values") dev.off() } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(AnnotationDbi) library(params$organism, lib.loc=dirname(input$pkg), character.only = TRUE) matrix=readRDS(input$rds[1]) lrt=readRDS(input$rds[2]) org <- params$organism obj <- getFromNamespace(org, org) row.names(lrt) <- keys(obj)[1:nrow(lrt)] library(limma) keg = kegga(lrt, species = strsplit(params$organism, ".", fixed = TRUE)[[1]][2], coef=matrix[, params$contrast], FDR=params$threshold) write.table(keg, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA) saveRDS(keg,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | mdsplt <- function(x, mat, mdstitle) { var <- matrixStats::rowVars(mat) num <- min(500, length(var)) ind <- order(var, decreasing = TRUE)[seq_len(num)] dst <- dist(t(mat[ind, ])) mds <- cmdscale(as.matrix(dst)) dat <- data.frame( MD1 = mds[, 1], MD2 = mds[, 2], group = x$samples$group ) ggplot(dat, aes(MD1, MD2, colour = group)) + geom_point(size = 3) + labs(x = "MDS 1", y = "MDS 2", colour = "") + labs(title=mdstitle) + scale_color_brewer(palette = "Set3") + theme_classic() } analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(RColorBrewer) library(ggplot2) x=readRDS(input$rds[1]) mat <- cpm(x$counts, log = TRUE, prior.count = 1) png(output$plot[1], width=2500, height=1800, res=400) par(mar=c(5,5,3,7), xpd=TRUE) print(mdsplt(x, mat, "Multidimensional plot")) dev.off() ##batch corrected mat=readRDS(input$rds[2]) png(output$plot[2], width=2500, height=1800, res=400) par(mar=c(5,5,3,7), xpd=TRUE) print(mdsplt(x, mat, "Batch corrected multidimensional plot")) dev.off() } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script data=list() for (i in 1:length(input$rds)) { inputlist=readRDS(input$rds[i]) data=c(data, inputlist) } saveRDS(data, file = output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) x=readRDS(input$rds) x$samples$group=factor(x$samples$group, levels = c(unique(x$samples$group))) # Set group factor if ("group" %in% names(x$samples)) { group <- factor(x$samples$group) n.group <- nlevels(group) is.group <- n.group > 1 } else { is.group <- FALSE } # Set batch factor if ("batch" %in% names(x$samples)) { batch <- factor(x$samples$batch) n.batch <- nlevels(batch) is.batch <- n.batch > 1 } else { is.batch <- FALSE } # Construct design matrix if (is.group & !is.batch) { des <- model.matrix(~ 0 + group) } if (is.group & is.batch) { des <- model.matrix(~ 0 + group + batch) } # Rename group coefficients which.group <- seq_len(n.group) colnames(des)[which.group] <- levels(group) saveRDS(des,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | analysis =function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) x=readRDS(input$rds) x=calcNormFactors(x) saveRDS(x,file=output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | pcaplt <- function(x, mat, pcatitle) { var <- matrixStats::rowVars(mat) num <- min(500, length(var)) ind <- order(var, decreasing = TRUE)[seq_len(num)] pca <- prcomp(t(mat[ind, ])) pct <- (pca$sdev ^ 2) / sum(pca$sdev ^ 2) dat <- data.frame( PC1 = pca$x[,1], PC2 = pca$x[,2], group = x$samples$group) ggplot(dat, aes_string(x = "PC1", y = "PC2", color = "group")) + geom_point(size = 3) + xlab(paste0("PC1: ", round(pct[1] * 100), "% variance")) + ylab(paste0("PC2: ", round(pct[2] * 100), "% variance")) + coord_fixed() + labs(title=pcatitle, color="") + scale_color_brewer(palette="Set3") + theme_classic() } analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(matrixStats) library(ggplot2) library(RColorBrewer) x=readRDS(input$rds[1]) x$samples$group=factor(x$samples$group, levels = c(unique(x$samples$group))) mat=cpm(x$counts, log=TRUE, prior.count = 1) png(output$plot[1], width=2500, height=2000, res=400) par(mar=c(5,5,6,5)) print(pcaplt(x, mat, "Principal component analysis")) dev.off() #batch corrected mat=readRDS(input$rds[2]) png(output$plot[2], width=2500, height=2000, res=400) par(mar=c(5,5,6,5)) print(pcaplt(x, mat, "Batch corrected principal component analysis")) dev.off() } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(ggplot2) lrt=readRDS(input$rds[1]) top2 <- topTags(lrt, n=Inf) top2ids <- top2$table[(top2$table$logFC>params$FC | top2$table$logFC<(-(params$FC))),1] selY=top2$table[top2$table$FDR<params$FDR,] df <- data.frame(lrt$table) df$Guide <- rownames(df) colors <- c("FDR sig." = "red") plt=ggplot(df, aes(x=logCPM, y=logFC, text=Guide)) + geom_point(color = "#b8dbcc") + geom_point(data = df[(row.names(df) %in% top2ids),], color = "#000000") + geom_point(data = df[(df$Guide %in% selY$ID),], aes(color = "FDR sig.")) + geom_hline(yintercept=(-(params$FC)), linetype="dashed", color="#b8dbcc") + geom_hline(yintercept=0, color="cornflowerblue") + geom_hline(yintercept=(params$FC), linetype="dashed", color="#b8dbcc") + labs(title = "Mean difference plot", color="") + scale_color_manual(values = colors) + theme_classic() png(output$plot, width=2000, height=2000, res=400) print(plt) dev.off() } analysis(snakemake@input, snakemake@output,snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | analysis= function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) x = processAmplicons(c(input$fastq), readfile2=params$readfile2, barcodefile=input$index, hairpinfile=input$guideRNA, verbose=TRUE, hairpinBeforeBarcode=params$hairpinBeforeBarcode ) saveRDS(x, file = output$rds) } analysis(snakemake@input, snakemake@output,snakemake@params,snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | sd_heatmap <- function(x, mat, hmtitle) { sampleDists <- dist(t(mat)) sampleDistMatrix <- as.matrix(sampleDists) rownames(sampleDistMatrix) <- paste(x$samples$group, x$samples$Replicate, sep = " - " ) colnames(sampleDistMatrix) <- NULL colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255) pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists, clustering_distance_cols = sampleDists, col = colors, main = hmtitle) } analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(DESeq2) library(DEFormats) library(ggplot2) library(pheatmap) library(RColorBrewer) x=readRDS(input$rds[1]) mat=cpm(x$counts, log=TRUE, prior.count = 1) png(output$plot[1], width=2000, height=1500, res=400) sd_heatmap(x,mat,"Sample distances") dev.off() #batch corrected mat=readRDS(input$rds[2]) png(output$plot[2], width=2000, height=1500, res=400) sd_heatmap(x,mat,"Batch corrected sample distances") dev.off() } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script lrt=readRDS(input$rds[1]) assign(paste0(params$contrast, "_lrt"), lrt) go=readRDS(input$rds[2]) assign(paste0(params$contrast, "_go"), go) kegg=readRDS(input$rds[3]) assign(paste0(params$contrast, "_kegg"), kegg) camera=read.table(input$tsv[1], header=T) assign(paste0(params$contrast, "_camera"), camera) genelevel=read.table(input$tsv[2], header=T, sep=",") assign(paste0(params$contrast, "_genelevel"), genelevel) data=ls() data=data[! data %in% c("input", "output", "params", "log", "out", "err", "lrt", "go", "kegg", "camera", "genelevel")] data=mget(data) saveRDS(data, file = output$rds) } analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script x=readRDS(input$rds[1]) corrected=readRDS(input$rds[2]) des=readRDS(input$rds[3]) matrix=readRDS(input$rds[4]) xglm=readRDS(input$rds[5]) data=ls() data=data[! data %in% c("input", "output", "params", "log", "out", "err")] data=mget(data) saveRDS(data, file = output$rds) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(limma) go=readRDS(input$rds) topgo <- topGO(go, sort="up") write.table(go, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) lrt=readRDS(input$rds) write.table(topTags(lrt), output$tsv, row.names=F, quote=F) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | analysis=function(input, output, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(limma) keg=readRDS(input$rds) topkegg <- topKEGG(keg, sort="up") write.table(topkegg, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA) } analysis(snakemake@input, snakemake@output, snakemake@log) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | analysis=function(input, output, params, log) { #Log out <- file(log$out, open = "wt") err <- file(log$err, open = "wt") sink(out, type = "output") sink(err, type = "message") #Script library(edgeR) library(ggplot2) lrt=readRDS(input$rds) top2 <- topTags(lrt, n=Inf) selY=top2$table[top2$table$FDR<params$FDR,] df <- data.frame(lrt$table) df$Guide <- rownames(df) colors <- c("FDR sig." = "red") plt=ggplot(df, aes(x=logFC, y=-10*log10(PValue), text=Guide)) + geom_point() + geom_point() + geom_point(data = df[(df$Guide %in% selY$ID),], aes(color = "FDR sig.")) + labs(title = "Volcano plot", x="M", y = "-10*log(P-value)", color="") + scale_color_manual(values = colors) + theme_classic() png(output$plot, width=2000, height=2000, res=400) print(plt) dev.off() } analysis(snakemake@input, snakemake@output,snakemake@params, snakemake@log) |
Support
- Future updates