Workflow Steps and Code Snippets

130 tagged steps and code snippets that match keyword FeatureCounts

High throughput Next Generation Sequencing (NGS) data analysis using Python 3 Snakemake

87
88
89
90
91
92
93
	shell:
		"~/miniconda2/bin/featureCounts -a genome/gencode.v32.annotation.gtf -o {output.counts} {input.bam} -T {threads}

rule PhiXContamination:
	input:
		trimmed1="Trimmed/{id}_forward_paired.fastq.gz",
		trimmed2="Trimmed/{id}_reverse_paired.fastq.gz"

Projet fait par Pierre BERGERET, Camille RABIER, Lydie TRAN et Dalyan VENTURA

124
125
126
127
shell:
 """
  featureCounts -T {threads} -t {wildcards.TYPE} -g gene_id -s {wildcards.STRAND} -a {input.genome} -o {output} {input.bam}
 """

Snakemake workflow for generating gene expression counts from RNA-sequencing data.

15
16
17
18
19
20
21
22
shell:
    "featureCounts "
    "{params.extra} "
    "-a {params.annotation} "
    "-o {output.counts} "
    "-T {threads} "
    "{input.bam} > {log} 2>&1; "
    "mv \"{output.counts}.summary\" {output.summary}"

RNA-seq pipeline comparison and analyses

706
707
708
709
710
711
712
713
714
715
716
shell:
    '''
    featureCounts \
        {input.bams} \
        -T {threads} \
        -a {input.annotation} \
        -L \
        -M \
        -S 1 \
        -o {output.counts}
    '''

A snakemake pipeline improves the gene annotation for cross species analysis of single cell RNA-Seq

  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
rule mapFastq:
  input:
    R1 = config["fastqDir"] + "{fastq}_R1.fastq.gz",
    R2 = config["fastqDir"] + "{fastq}_R2.fastq.gz"
  output:
    countMtx = config["outputDir"] + "starSolo/{fastq}/{fastq}.Solo.out/Gene/raw/matrix.mtx",
    sortBam = config["outputDir"] + "starSolo/{fastq}/{fastq}.Aligned.sortedByCoord.out.bam"
  params:
    outputPrefix = lambda wildcards: config["outputDir"] + "starSolo/" + wildcards.fastq + "/" + wildcards.fastq + ".",
    whitelist = config["whitelist"],soloType = config["STARArgs"].get("soloType","Droplet"),
    barcodeLen = 0, CBLen = config["CBLen"], UMIStart = config["UMIStart"], UMILen = config["UMILen"],
    genome = config["StarsoloGenome"],
    readFilesCommand = config["STARArgs"].get("readFilesCommand","zcat"),
    outSAMtype = config["STARArgs"].get("outSAMtype","BAM SortedByCoordinate"),
    soloStrand = config["STARArgs"].get("soloStrand","Forward"),
    winAnchorMultimapNmax = config["STARArgs"].get("winAnchorMultimapNmax",2000),
    outFilterMultimapNmax = config["STARArgs"].get("outFilterMultimapNmax",2000),
    outSAMprimaryFlag = config["STARArgs"].get("outSAMprimaryFlag","AllBestScore"),
    outSAMmultNmax = config["STARArgs"].get("outSAMmultNmax",1),
    limitBAMsortRAM = config["STARArgs"].get("limitBAMsortRAM",40000000000),
    limitOutSJoneRead = config["STARArgs"].get("limitOutSJoneRead",10000),
    limitOutSJcollapsed = config["STARArgs"].get("limitOutSJcollapsed",3000000),
    limitIObufferSize = config["STARArgs"].get("limitIObufferSize",300000000),
    outSAMattributes = config["STARArgs"].get("outSAMattributes"),
    additionalArguments = config["STARArgs"].get("additionalArguments", ""),
    starOptions = starsoloParameters
  threads: math.ceil(config["STARArgs"].get("threads",16) * scaleDownThreads)
  conda:
    "../envs/generateH5.yaml"
  resources:
    mem_mb=lambda wildcards, attempt: attempt * config["STARArgs"].get("memory",40000)
  shell:
    """
      STAR \
        --genomeDir {params.genome} \
        --runThreadN {threads} \
        --readFilesIn {input.R2} {input.R1}  \
        --readFilesCommand {params.readFilesCommand} \
        --outSAMtype {params.outSAMtype} \
	      --winAnchorMultimapNmax {params.winAnchorMultimapNmax} \
	      --outFilterMultimapNmax {params.outFilterMultimapNmax} \
	      --outSAMprimaryFlag {params.outSAMprimaryFlag} \
	      --outSAMmultNmax {params.outSAMmultNmax} \
	      --limitBAMsortRAM {params.limitBAMsortRAM} \
	      --limitOutSJoneRead {params.limitOutSJoneRead} \
        --limitOutSJcollapsed {params.limitOutSJcollapsed} \
        --limitIObufferSize {params.limitIObufferSize} \
        --outFileNamePrefix {params.outputPrefix} \
        --outSAMattributes {params.outSAMattributes} \
	      {starsoloParameters} \
        {params.additionalArguments}
    """

rule featureCount:
  input:
    bamfile = rules.mapFastq.output.sortBam,
    gtf = config["featureCountGTF"]
  output:
    temp(config["outputDir"] + "featureCount/" + config["countBy"] + "/{fastq}.Aligned.sortedByCoord.out.bam.featureCounts.bam")
  params:
    outDir = config["outputDir"] + "featureCount/" + config["countBy"] + "/", strand = config["featureCountArgs"].get("strand",1)
  threads: math.ceil(config["featureCountArgs"].get("threads",16) * scaleDownThreads)
  conda:
    "../envs/generateH5.yaml"
  resources:
    mem_mb=lambda wildcards, attempt: attempt * config["featureCountArgs"].get("memory",10000)
  shell:
    """
      featureCounts --fraction -M -s {params.strand} -T {threads} -t exon -g gene_name -a {input.gtf} -o {params.outDir}{wildcards.fastq}.counts.txt {input.bamfile} -R BAM --Rpath {params.outDir}
    """


rule featureCountToDT:
  input:
    rules.featureCount.output
  output:
    config["outputDir"] + "featureCount/" + config["countBy"] + "/" + "{fastq}.tsv.gz"
  params:
    countScript = srcdir("../scripts/countBamFromSTARSolo.sh"),
    outDir = config["outputDir"] + "featureCount/",
    numHit = config['numHit']
  threads: math.ceil(config["featureCountArgs"].get("threads",8) * scaleDownThreads)
  conda:
    "../envs/generateH5.yaml"
  resources:
    mem_mb=lambda wildcards, attempt: attempt * config["featureCountArgs"].get("memory",4000)
  shell:
    """
      samtools view {input} | NUMHIT={params.numHit} bash {params.countScript} | pigz -p {threads} -c > {output}
    """

rule DTToH5:
  input:
    rules.featureCountToDT.output
  output:
    countMtx = config["outputDir"] + "featureCount/" +  config["countBy"] + "/{fastq}.h5",
    molInfo = config["outputDir"] + "featureCount/" +  config["countBy"] + "/{fastq}.molInfo.h5"
  params:
    writeToh5 = srcdir("../scripts/writeHDF5.R"), whitelist = config["whitelist"]
  threads: math.ceil(config["DTToH5Args"].get("threads",16) * scaleDownThreads)
  conda:
    "../envs/generateH5.yaml"
  resources:
    mem_mb=lambda wildcards, attempt: attempt * config["DTToH5Args"].get("memory",4000)
  shell:
    """
      Rscript {params.writeToh5} {input} {params.whitelist} {output.countMtx} {threads} {wildcards.fastq}
    """

This workflow performs an RNA-seq analysis from the sequencing output data to the differential expression analyses. (v2.0.0)

101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
shell:
  """
  reads=({params.reads})
  geneid=({params.geneid})
  annotation={input.annotation}
  bam=({input.bam})
  countmatrices=({output.countmatrices})
  len=${{#bam[@]}}
  if [ ${{reads}} == 'paired' ];then
    for (( i=0; i<$len; i++ ))
      do featureCounts -T 12 -p -t exon -g ${{geneid}} -a ${{annotation}} -o ${{countmatrices[$i]}} ${{bam[$i]}}
    done
  elif [ ${{reads}} == 'unpaired' ];then
    for (( i=0; i<$len; i++ ))
      do featureCounts -T 12 -t exon -g ${{geneid}} -a ${{annotation}} -o ${{countmatrices[$i]}} ${{bam[$i]}}
    done
  fi
  """

RNA-seq Preprocessing Pipeline with Snakemake for Differential Expression Analysis

83
84
85
86
87
88
shell:
    """
    featureCounts -a {params.GTF} \
    -o {output.counts} \
     {input.bam_files}
    """

Snakemake workflow: J2Seq

 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
library(Rsubread)
library(dplyr)
library(mgsub)

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

## Count RPFs (normalized in RPKM) on CDS for each gene, using `featureCounts`
## run all bams together
samples <- read.table(snakemake@input[["samples"]], header=T)
bamfiles <- paste0("./merged_bam/", as.vector(samples$sample),"_merged_dedup_sorted.bam")

## run one bam file
# bamfiles <- snakemake@input[["bamfile"]]
RPFcounts <- featureCounts(files=bamfiles, annot.ext=snakemake@input[['gtf']],
    isGTFAnnotationFile=TRUE, GTF.featureType=snakemake@params[["featureType"]], GTF.attrType="gene_name",
    strandSpecific=snakemake@params[["strand"]], countMultiMappingReads=FALSE, juncCounts=TRUE, nthreads=snakemake@threads[[1]])

write.table(RPFcounts$counts, file=snakemake@output[[1]], sep="\t", quote=F, row.names = TRUE, col.names = NA)
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
library(Rsubread)
library(dplyr)
library(mgsub)

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

## Count RPFs (normalized in RPKM) on CDS for each gene, using `featureCounts`
## run all bams together
samples <- read.table(snakemake@input[["samples"]], header=T)
bamfiles <- paste0("./merged_bam/", as.vector(samples$sample),"_merged_dedup_sorted.bam")

## run one bam file
# bamfiles <- snakemake@input[["bamfile"]]
RPFcounts <- featureCounts(files=bamfiles, annot.ext=snakemake@input[['saf']],
    isGTFAnnotationFile=FALSE, fracOverlap=1,
    strandSpecific=snakemake@params[["strand"]], countMultiMappingReads=FALSE, juncCounts=TRUE, 
    nthreads=snakemake@threads[[1]])

write.table(RPFcounts$counts, file=snakemake@output[[1]], sep="\t", quote=F, row.names = TRUE, col.names = NA)

Ribo-seq_pipeline Snakemake workflow

 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
library(Rsubread)
library(dplyr)
library(mgsub)

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

## Count RPFs (normalized in RPKM) on CDS for each gene, using `featureCounts`
## run all bams together
samples <- read.table(snakemake@input[["samples"]], header=T)
bamfiles <- paste0("./STAR_align/", as.vector(samples$sample),".bam")

## run one bam file
# bamfiles <- snakemake@input[["bamfile"]]
RPFcounts <- featureCounts(files=bamfiles, annot.ext=snakemake@input[['gtf']],
    isGTFAnnotationFile=TRUE, GTF.featureType="CDS", GTF.attrType="gene_id")

id_length <- RPFcounts$annotation %>% as.data.frame() %>% dplyr::select(GeneID,Length)
rownames(id_length) <- id_length$GeneID
count_table <- merge(id_length, RPFcounts$counts,by="row.names")[,-1]
mapped_reads <- RPFcounts$stat %>% dplyr::filter(Status=="Assigned")
# convert counts to RPKM
values <- mapply('/', count_table %>% summarise(across(starts_with("GSM"), ~./Length*1000*1000000)), mapped_reads[,-1])
rpkm_table <- cbind(count_table[,c(1,2)], values)
colnames(rpkm_table) <- mgsub(colnames(rpkm_table), c("RibosomeProfiling_", ".bam"), c("",""))

write.table(rpkm_table, file=snakemake@output[[1]], quote=FALSE, sep="\t", col.names=TRUE, row.names=FALSE)
tool / biotools

FeatureCounts

featureCounts is a very efficient read quantifier. It can be used to summarize RNA-seq reads and gDNA-seq reads to a variety of genomic features such as genes, exons, promoters, gene bodies and genomic bins. It is included in the Bioconductor Rsubread package and also in the SourceForge Subread package.