Snakemake workflow of shRNA-seq and CRISPR-Cas9 genetic screen analysis using edgeR.

public public 1yr ago 0 bookmarks
Loading...

Snakemake workflow of shRNA-seq and CRISPR-Cas9 genetic screen analysis using edgeR.

Table of Contents

Overview

This workflow is used to analysis shRNA-seq and CRISPR/cas9 genetic screens. The workflow is built in Snakemake , a workflow managament system. It primarily uses edgeR , a Biocondutor R package which allows for differential expression analysis of RNA-seq expression profiles. The workflow performs data processing, quality control, differential expression analysis, gene set testing and gene enrichment, with batch correction implemented.

Installation

Install snakemake using the mamba package manager:

$ mamba create -c conda-forge -c bioconda -n snakemake snakemake
$ mamba activate snakemake

Pull the workflow to your project directory:

$ git pull https://github.com/zifornd/shrnaseq

Usage

Configure the workflow by editing the config.yaml file:

$ nano config/config.yaml

Run the workflow:

$ snakemake --use-conda --cores all

For further details on Snakemake, see the Snakemake documentation .

Alternatively, the Dockerfile can be used to run the workflow in a container.

Shiny application

The output of this workflow can be visualised in Shiny from RStudio, see the shRNAseq shiny documentation .

Documentation

See the Documentation file for configuration and output information.

Contributing

See CONTRIBUTING.md for ways to get started.

Please adhere to this project's code of conduct .

Authors

Acknowledgements

This workflow is based on the following research article:

Dai Z, Sheridan JM, Gearing LJ et al. edgeR: a versatile tool for the analysis of shRNA-seq and CRISPR-Cas9 genetic screens [version 2; peer review: 3 approved]. F1000Research 2014, 3:95 (https://doi.org/10.12688/f1000research.3928.2)

using the Snakemake management system:

Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)

License

This workflow is licensed under the MIT license.
Copyright © 2022, Zifo RnD Solutions

Code Snippets

13
14
script:
    "../scripts/corrected_counts.R" 
28
29
script:
    "../scripts/model_matrix.R"
43
44
script:
    "../scripts/contrasts_matrix.R"
58
59
script:
    "../scripts/estimateDisp.R" 
73
74
script:
    "../scripts/glmFit.R" 
90
91
script:
    "../scripts/glmLRT.R" 
105
106
script:
    "../scripts/top_guideRNAs.R"
123
124
script:
    "../scripts/FDR_guideRNAs.R"
18
19
script:
    "../scripts/camera.R"
36
37
script:
    "../scripts/camerarank.R"
52
53
script:
    "../scripts/gene_level.R"
70
71
script:
    "../scripts/generank.R"
19
20
script:
    "../scripts/goana.R"
SnakeMake From line 19 of rules/ge.smk
34
35
script:
    "../scripts/top_goana.R"
SnakeMake From line 34 of rules/ge.smk
55
56
script:
    "../scripts/kegg.R" 
SnakeMake From line 55 of rules/ge.smk
70
71
script:
    "../scripts/top_kegg.R" 
SnakeMake From line 70 of rules/ge.smk
18
19
script:
    "../scripts/processAmplicons.R"
14
15
script:
    "../scripts/filter_guideRNAs.R"
SnakeMake From line 14 of rules/qc.smk
29
30
script:
    "../scripts/norm.R"
SnakeMake From line 29 of rules/qc.smk
45
46
script:
    "../scripts/MDS_plot.R"
SnakeMake From line 45 of rules/qc.smk
60
61
script:
    "../scripts/PCA_plot.R"
SnakeMake From line 60 of rules/qc.smk
75
76
script:
    "../scripts/sample_dist_heatmap.R"
SnakeMake From line 75 of rules/qc.smk
90
91
script:
    "../scripts/BVC_plot.R" 
SnakeMake From line 90 of rules/qc.smk
12
13
shell:
    "conda create --yes --prefix {params.path} --strict-channel-priority --override-channels --channel conda-forge --channel bioconda --channel defaults bioconductor-{params.name} 1> {log.out} 2> {log.err}"
17
18
script:
    "../scripts/shiny.R"
38
39
script:
    "../scripts/shiny_contrasts.R"
54
55
script:
    "../scripts/merge_shiny.R"
19
20
script:
    "../scripts/expression_heatmap.R"
36
37
script:
    "../scripts/volcano_plot.R"
51
52
script:
    "../scripts/guideRNA_histogram.R"
69
70
script:
    "../scripts/plotSMEAR.R"       
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    xglm=readRDS(input$rds)
    png(output$plot, width=2000, height=2000, res=400)
        plotBCV(xglm, main="BCV Plot")
    dev.off()

}

analysis(snakemake@input, snakemake@output, snakemake@log)
R From line 1 of scripts/BVC_plot.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    matrix=readRDS(input$rds[1])
    des=readRDS(input$rds[2])
    xglm=readRDS(input$rds[3])

    genesymbols = xglm$genes$Gene
    genesymbollist = list()
    unq = unique(genesymbols)
    unq = unq[!is.na(unq)]

    for (i in unq) {
        sel = genesymbols == i & !is.na(genesymbols)
        if (sum(sel) > 1) 
            genesymbollist[[i]] =which(sel)
    }
    camera.res = camera(xglm, index = genesymbollist, des, contrast=matrix[, params$contrast])
    camera.res=camera.res[!is.na(camera.res$FDR),]
    camera.res$gene=rownames(camera.res)
    write.table(camera.res, output$tsv, quote=F, row.names=F)
    saveRDS(camera.res,file=output$rds)

}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
R From line 1 of scripts/camera.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    library(ggplot2)
    camera <- read.table(input$tsv, header=T)
    colnames(camera) <- c("nGuides", "Direction", "Pvalue", "FDR", "Gene")
    res <- as.data.frame(camera)

    res$Status <- factor("NS", levels = c("Up", "NS", "Down"))
    res$Status[res$Direction == "Up" & res$FDR < params$FDR] <- "Up"
    res$Status[res$Direction == "Down" & res$FDR < params$FDR] <- "Down"
    res$Pvalue <- -log10(res$Pvalue)
    res$Rank <- rank(res$Pvalue)

    col <- c(
    "Up"   = "#FF0000",
    "NS"   = "#B8DBCC",
    "Down" = "#6495ED"
    )

    res.s <- res[res$FDR<params$FDR,]

    plt <- ggplot(res, aes(x = Rank, y = Pvalue, colour = Status, size = nGuides, label = Gene)) + 
    geom_point() + 
    geom_point(data = res.s, shape = 1, color="black") + 
    geom_text(data = res.s, hjust = 0, nudge_x = -20, color="black", size=3, check_overlap = T) +  
    scale_colour_manual(values = col, breaks = names(col)) + 
    theme_classic() +
    labs(title= "Camera rank",
        x = "Rank",
        y = "-log10(Pvalue)",
        colour = "Status"
    )
    png(output$plot, width=3000, height=3500, res=400)
    print(plt)
    dev.off()

}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    des=readRDS(input$rds)
    conditions <- colnames(des)
    combinations <- expand.grid(conditions, conditions)
    contrasts <- apply(combinations, 1, paste, collapse = "-")
    matrix <- makeContrasts(contrasts = contrasts, levels = conditions)
    saveRDS(matrix,file=output$rds)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script 
    library(limma)
    library(edgeR)
    x=readRDS(input$rds)

    mat=cpm(x$counts, log=TRUE, prior.count = 1)
    mat=removeBatchEffect(mat, batch=factor(x$samples$batch))

    saveRDS(mat, file=output$rds)
}
analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    x=readRDS(input$rds[1])
    des=readRDS(input$rds[2])
    xglm = estimateDisp(x, des)
    com_disp=sqrt(xglm$common.disp)
    print(com_disp)
    saveRDS(xglm,file=output$rds)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
exp_hm <- function(x, mat, selY, hmtitle) {
    colnames(mat) <- paste(x$samples$group, x$samples$Replicate, sep = " - " )  
    mat <- subset(mat, rownames(mat) %in% selY)
    colors <- colorRampPalette( brewer.pal(9, "Blues") )(255)
    pheatmap(mat, col = colors,border_color=NA, main=hmtitle)
}

analysis=function(input, output, params, log) {

    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(RColorBrewer)
    library(pheatmap)
    library(edgeR)

    lrt=readRDS(input$rds[1])
    top2 = topTags(lrt, n=Inf)
    selY <- selY <- rownames(top2)[c(1:20)]

    x=readRDS(input$rds[2])
    mat <- cpm(x$counts, log=TRUE, prior.count = 1)

    png(output$plot[1], width=3000, height=3500, res=400)
    exp_hm(x, mat, selY, "Differential expression \n across the groups (logCPM)")
    dev.off()

    #batch corrected
    mat=readRDS(input$rds[3])

    png(output$plot[2], width=3000, height=3500, res=400)
    exp_hm(x, mat, selY, "Batch corrected differential expression \n across the groups (logCPM)")
    dev.off()
}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    thresh = params$threshold   

    lrt=readRDS(input$rds)
    top2 = topTags(lrt, n=Inf)
    top2ids = top2$table[top2$table$FDR<thresh,1]
    write.table(top2ids, output$tsv, row.names=F, quote=F, col.names=F)
    saveRDS(top2ids, file=output$rds)

}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
analysis =function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    x=readRDS(input$rds)
    sel=rowSums(cpm(x$counts)>0.5)>=3
    x=x[sel,]
    saveRDS(x, file = output$rds)
    png(output$plot,width=4000, height=2000, res=400)
        par(mfrow=c(2,1))
        barplot(colSums(x$counts), las=2, main="Counts per index", cex.names=0.5, cex.axis=0.8)
        barplot(rowSums(x$counts), las=2, main="Counts per guide RNA", cex.names=0.5, cex.axis=0.8)
    dev.off()
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    library(metap)
    lrt=readRDS(input$rds)
    # gene level
    dat=NULL

    for (i in unique(lrt$genes$Gene)) {

        sel = lrt$genes$Gene == i & !is.na(lrt$genes$Gene)

        #number of guideRNAs
        nguideRNA=length(which(sel))

        if (nguideRNA==1) {
            logFC=lrt$table$logFC[which(sel)]
            IQRFC=NA
            if (logFC>0) {
                dir="Up" 
            } else {
                dir="Down"
            }
            if (logFC>0) {
                dir_pval="Up" 
            } else {
                dir_pval="Down"
            }
            stouffers=NA
        }

        if (nguideRNA>1) {
            #calculate average logFC
            logFC=mean(lrt$table$logFC[which(sel)])
            IQRFC=IQR(lrt$table$logFC[which(sel)])
            if (logFC >0) {
                dir="Up"
                } else {
                dir="Down"
                }

            #select up regulated guideRNAs
            res=lrt$table[which(sel),]
            up=res[which(res$logFC>0),]
            down=res[which(res$logFC<0),]
            options(warn=-1)
            if ((min(up$Pvalue)>min(down$PValue)) | nrow(up)==0)
                dir_pval="Down"
            if ((min(up$Pvalue)<min(down$PValue)) | nrow(down)==0)
                dir_pval="Up"
            options(warn=-0)
            pvals=lrt$table$PValue[which(sel)]
            pvals=pvals[!pvals==1]
            if (length(pvals)>1) {
            stouffers=as.numeric(sumz(lrt$table$PValue[which(sel)])[2])
            } else {
            stouffers=NA
            }
        }
        vector=cbind(i,nguideRNA, logFC, IQRFC, dir, dir_pval, stouffers)
        dat=rbind(dat, vector)
    }
    dat=cbind(dat,p.adjust(dat[,"stouffers"], method="fdr"))
    colnames(dat)=c("gene", "nguides", "mean.logFC", "iqr.logFC", "direction.mean.logFC",
     "direction.smallest.pvalue", "stouffers.pvalue", "stouffers.FDR")

    write.table(dat, output$tsv, quote=F, row.names=F, sep=",")
    saveRDS(data.frame(dat),file=output$rds)

}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
 analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    library(ggplot2)
    genelevel <- read.table(input$tsv, header=T, sep=",")

    colnames(genelevel) <- c("Gene", "nGuides", "Mean logFC", "IQR logFC", "Direction mean logFC",	"Direction smallest Pvalue", "Stouffer's Pvalue", "Stouffer's FDR")

    res <- as.data.frame(genelevel)
    FC_index=c(paste0("> ", params$FC), paste0("< ", params$FC, " and > -", params$FC), paste0("< -", params$FC))

    res$LogFC <- factor(FC_index[2], levels = FC_index)
    res$LogFC[res$`Mean logFC`>params$FC] <- FC_index[1]
    res$LogFC[res$`Mean logFC`<(-(params$FC))] <- FC_index[3]
    res$Rank <- rank(res$`Mean logFC`)

    col <- c( "#FF0000","#B8DBCC", "#6495ED")
    names(col)=c(FC_index)
    res.up <-  res[order(-res$`Mean logFC`),]
    res.up <- res.up[res.up$`Mean logFC`>params$FC,][c(1:5),]
    res.down <- res[order(res$`Mean logFC`),]
    res.down <- res.down[res.down$`Mean logFC`<(-params$FC),][c(1:5),]

    plt <- ggplot(res, aes(x = Rank, y = `Mean logFC`, colour = LogFC, size = nGuides, label = Gene)) + 
      geom_point() + 
      geom_point(data = res.up, shape = 1, color="black") + 
      geom_point(data = res.down, shape = 1, color="black") + 
      geom_text(data = res.up, hjust = 0, nudge_x = -20, color="black", size=3, check_overlap = T) +  
      geom_text(data = res.down, hjust = 0, nudge_x = -20, color="black", size=3, check_overlap = T) + 
      theme_classic() +
      scale_colour_manual(values = col, breaks = names(col)) + 
      labs(title = "Gene rank",
        x = "Rank",
        y = "Mean log fold-change",
        colour = "LogFC"
      )  
    png(output$plot, width=3000, height=3500, res=400)
    print(plt)
    dev.off()

}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    des=readRDS(input$rds[1])
    xglm=readRDS(input$rds[2])

    fit = glmFit(xglm, des)
    saveRDS(fit,file=output$rds)

}

analysis(snakemake@input, snakemake@output, snakemake@log)
R From line 1 of scripts/glmFit.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    matrix=readRDS(input$rds[1])
    fit=readRDS(input$rds[2])
    lrt = glmLRT(fit, contrast=matrix[, params$contrast])
    saveRDS(lrt,file=output$rds)

}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
R From line 1 of scripts/glmLRT.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(AnnotationDbi)
    library(params$organism, lib.loc=dirname(input$pkg), character.only = TRUE)

    matrix=readRDS(input$rds[1])
    lrt=readRDS(input$rds[2])

    org <- params$organism
    obj <- getFromNamespace(org, org)

    row.names(lrt) <- keys(obj)[1:nrow(lrt)]
    library(limma)
    go <- goana(lrt, con=matrix[, params$contrast], FDR=params$threshold, 
    species = strsplit(params$organism, ".", fixed = TRUE)[[1]][2])

    write.table(go, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA)
    saveRDS(go,file=output$rds)
}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    lrt=readRDS(input$rds)
    png(output$plot, width=3500, height=2000, res=400)
    hist(lrt$table$PValue, breaks = 40, main = "Histogram of guide RNA P values", xlab="guide RNA p values")
    dev.off()
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(AnnotationDbi)
    library(params$organism, lib.loc=dirname(input$pkg), character.only = TRUE)

    matrix=readRDS(input$rds[1])
    lrt=readRDS(input$rds[2])

    org <- params$organism
    obj <- getFromNamespace(org, org)

    row.names(lrt) <- keys(obj)[1:nrow(lrt)]
    library(limma)
    keg = kegga(lrt, species = strsplit(params$organism, ".", fixed = TRUE)[[1]][2], 
    coef=matrix[, params$contrast],  FDR=params$threshold)

    write.table(keg, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA)
    saveRDS(keg,file=output$rds)
}

analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
mdsplt <- function(x, mat, mdstitle) {

    var <- matrixStats::rowVars(mat)
    num <- min(500, length(var))
    ind <- order(var, decreasing = TRUE)[seq_len(num)]
    dst <- dist(t(mat[ind, ]))
    mds <- cmdscale(as.matrix(dst))
    dat <- data.frame(
    MD1 = mds[, 1], 
    MD2 = mds[, 2], 
    group = x$samples$group
    )

    ggplot(dat, aes(MD1, MD2, colour = group)) + 
    geom_point(size = 3) + 
    labs(x = "MDS 1", y = "MDS 2", colour = "") + 
    labs(title=mdstitle) +  
    scale_color_brewer(palette = "Set3") +
    theme_classic()

}

analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script 
    library(edgeR)
    library(RColorBrewer)
    library(ggplot2)
    x=readRDS(input$rds[1])
    mat <- cpm(x$counts, log = TRUE, prior.count = 1)
    png(output$plot[1], width=2500, height=1800, res=400)
    par(mar=c(5,5,3,7), xpd=TRUE)
    print(mdsplt(x, mat, "Multidimensional plot"))
    dev.off()

    ##batch corrected
    mat=readRDS(input$rds[2])

    png(output$plot[2], width=2500, height=1800, res=400)
    par(mar=c(5,5,3,7), xpd=TRUE)
    print(mdsplt(x, mat, "Batch corrected multidimensional plot"))
    dev.off()
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    data=list()
    for (i in 1:length(input$rds)) {
        inputlist=readRDS(input$rds[i])
        data=c(data, inputlist)

    }
    saveRDS(data, file = output$rds)
}


analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
R From line 1 of scripts/merge_shiny.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
 analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)

    x=readRDS(input$rds)

    x$samples$group=factor(x$samples$group, levels = c(unique(x$samples$group)))

    # Set group factor

    if ("group" %in% names(x$samples)) {
        group <- factor(x$samples$group)
        n.group <- nlevels(group)
        is.group <- n.group > 1
        } else {
        is.group <- FALSE
        }

    # Set batch factor

    if ("batch" %in% names(x$samples)) {
        batch <- factor(x$samples$batch)
        n.batch <- nlevels(batch)
        is.batch <- n.batch > 1
        } else {
        is.batch <- FALSE
        }

    # Construct design matrix

    if (is.group & !is.batch) {
        des <- model.matrix(~ 0 + group)
        }
    if (is.group & is.batch) {
        des <- model.matrix(~ 0 + group + batch)    
    }

    # Rename group coefficients

    which.group <- seq_len(n.group)
    colnames(des)[which.group] <- levels(group)      
    saveRDS(des,file=output$rds)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
analysis =function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    x=readRDS(input$rds)
    x=calcNormFactors(x)
    saveRDS(x,file=output$rds)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
R From line 1 of scripts/norm.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
pcaplt <- function(x, mat, pcatitle) {
    var <- matrixStats::rowVars(mat)
    num <- min(500, length(var))
    ind <- order(var, decreasing = TRUE)[seq_len(num)]
    pca <- prcomp(t(mat[ind, ]))
    pct <- (pca$sdev ^ 2) / sum(pca$sdev ^ 2)
    dat <- data.frame(
        PC1 = pca$x[,1],
        PC2 = pca$x[,2],
        group = x$samples$group) 

    ggplot(dat, aes_string(x = "PC1", y = "PC2", color = "group")) + 
      geom_point(size = 3) + 
      xlab(paste0("PC1: ", round(pct[1] * 100), "% variance")) + 
      ylab(paste0("PC2: ", round(pct[2] * 100), "% variance")) + 
      coord_fixed() +
      labs(title=pcatitle, color="") + 
      scale_color_brewer(palette="Set3") +
      theme_classic() 

}

analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    library(matrixStats)
    library(ggplot2)
    library(RColorBrewer)

    x=readRDS(input$rds[1])
    x$samples$group=factor(x$samples$group, levels = c(unique(x$samples$group)))

    mat=cpm(x$counts, log=TRUE, prior.count = 1)
    png(output$plot[1], width=2500, height=2000, res=400)
    par(mar=c(5,5,6,5))
    print(pcaplt(x, mat, "Principal component analysis"))
    dev.off()

    #batch corrected
    mat=readRDS(input$rds[2])

    png(output$plot[2], width=2500, height=2000, res=400)
    par(mar=c(5,5,6,5))
    print(pcaplt(x, mat, "Batch corrected principal component analysis"))
    dev.off()

}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
  library(edgeR)
  library(ggplot2)
  lrt=readRDS(input$rds[1])
  top2 <- topTags(lrt, n=Inf)
  top2ids <- top2$table[(top2$table$logFC>params$FC | top2$table$logFC<(-(params$FC))),1]
  selY=top2$table[top2$table$FDR<params$FDR,]
  df <- data.frame(lrt$table)
  df$Guide <- rownames(df)
  colors <- c("FDR sig." = "red")

  plt=ggplot(df, aes(x=logCPM, y=logFC, text=Guide)) +
    geom_point(color = "#b8dbcc") +
    geom_point(data = df[(row.names(df) %in% top2ids),], color = "#000000") +
    geom_point(data = df[(df$Guide %in% selY$ID),], aes(color = "FDR sig.")) +
    geom_hline(yintercept=(-(params$FC)), linetype="dashed", color="#b8dbcc") +
    geom_hline(yintercept=0,  color="cornflowerblue") +
    geom_hline(yintercept=(params$FC), linetype="dashed", color="#b8dbcc") +
    labs(title = "Mean difference plot",
    color="") +
    scale_color_manual(values = colors) +
    theme_classic()
  png(output$plot, width=2000, height=2000, res=400)
    print(plt)
  dev.off()
}


analysis(snakemake@input, snakemake@output,snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
analysis= function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    x = processAmplicons(c(input$fastq), 
                        readfile2=params$readfile2,
                        barcodefile=input$index, 
                        hairpinfile=input$guideRNA, 
                        verbose=TRUE,
                        hairpinBeforeBarcode=params$hairpinBeforeBarcode
                        )
    saveRDS(x, file = output$rds)
}
analysis(snakemake@input, snakemake@output,snakemake@params,snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
sd_heatmap <- function(x, mat, hmtitle) {
        sampleDists <- dist(t(mat))
        sampleDistMatrix <- as.matrix(sampleDists)
        rownames(sampleDistMatrix) <- paste(x$samples$group, x$samples$Replicate, sep = " - " )
        colnames(sampleDistMatrix) <- NULL
        colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
        pheatmap(sampleDistMatrix,
                clustering_distance_rows = sampleDists,
                clustering_distance_cols = sampleDists,
                col = colors,
                main = hmtitle)    
}

analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    library(DESeq2)
    library(DEFormats)
    library(ggplot2)
    library(pheatmap)
    library(RColorBrewer)

    x=readRDS(input$rds[1])
    mat=cpm(x$counts, log=TRUE, prior.count = 1)

    png(output$plot[1], width=2000, height=1500, res=400)
    sd_heatmap(x,mat,"Sample distances")
    dev.off()

    #batch corrected
    mat=readRDS(input$rds[2])

    png(output$plot[2], width=2000, height=1500, res=400)
    sd_heatmap(x,mat,"Batch corrected sample distances")
    dev.off()
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    lrt=readRDS(input$rds[1])
    assign(paste0(params$contrast, "_lrt"), lrt)
    go=readRDS(input$rds[2])
    assign(paste0(params$contrast, "_go"), go)
    kegg=readRDS(input$rds[3])
    assign(paste0(params$contrast, "_kegg"), kegg)
    camera=read.table(input$tsv[1], header=T)
    assign(paste0(params$contrast, "_camera"), camera)
    genelevel=read.table(input$tsv[2], header=T, sep=",")
    assign(paste0(params$contrast, "_genelevel"), genelevel)

    data=ls()
    data=data[! data %in% c("input", "output", "params", "log", "out", "err", "lrt", "go", "kegg", "camera", "genelevel")]

    data=mget(data)

    saveRDS(data, file = output$rds)
}


analysis(snakemake@input, snakemake@output, snakemake@params, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    x=readRDS(input$rds[1])
    corrected=readRDS(input$rds[2])
    des=readRDS(input$rds[3])
    matrix=readRDS(input$rds[4])
    xglm=readRDS(input$rds[5])

    data=ls()
    data=data[! data %in% c("input", "output", "params", "log", "out", "err")]

    data=mget(data)

    saveRDS(data, file = output$rds)
}


analysis(snakemake@input, snakemake@output, snakemake@log)
R From line 1 of scripts/shiny.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(limma)
    go=readRDS(input$rds)
    topgo <- topGO(go, sort="up")

    write.table(go, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
R From line 1 of scripts/top_goana.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    lrt=readRDS(input$rds)
    write.table(topTags(lrt), output$tsv, row.names=F, quote=F)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
analysis=function(input, output, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(limma)
    keg=readRDS(input$rds)
    topkegg <- topKEGG(keg, sort="up")

    write.table(topkegg, file = output$tsv, quote = FALSE, sep = '\t', col.names = NA)
}

analysis(snakemake@input, snakemake@output, snakemake@log)
R From line 1 of scripts/top_kegg.R
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
analysis=function(input, output, params, log) {
    #Log 
    out <- file(log$out, open = "wt")

    err <- file(log$err, open = "wt")

    sink(out, type = "output")

    sink(err, type = "message")

    #Script
    library(edgeR)
    library(ggplot2)
    lrt=readRDS(input$rds)
    top2 <- topTags(lrt, n=Inf)
    selY=top2$table[top2$table$FDR<params$FDR,]
    df <- data.frame(lrt$table)
    df$Guide <- rownames(df)
    colors <- c("FDR sig." = "red")

    plt=ggplot(df, aes(x=logFC, y=-10*log10(PValue), text=Guide)) + geom_point() +
    geom_point() + 
    geom_point(data = df[(df$Guide %in% selY$ID),], aes(color = "FDR sig.")) +
    labs(title = "Volcano plot", x="M", y = "-10*log(P-value)", color="") +
    scale_color_manual(values = colors) +
    theme_classic()

    png(output$plot, width=2000, height=2000, res=400)
    print(plt)
    dev.off()

}

analysis(snakemake@input, snakemake@output,snakemake@params, snakemake@log)
ShowHide 55 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/zifornd/Bioinformatics-Internship
Name: bioinformatics-internship
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: MIT License
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...