Workflow Steps and Code Snippets
130 tagged steps and code snippets that match the keyword FeatureCounts
High-throughput Next-Generation Sequencing (NGS) data analysis using Python 3 and Snakemake
shell:
    "~/miniconda2/bin/featureCounts -a genome/gencode.v32.annotation.gtf -o {output.counts} {input.bam} -T {threads}"

rule PhiXContamination:
    input:
        trimmed1="Trimmed/{id}_forward_paired.fastq.gz",
        trimmed2="Trimmed/{id}_reverse_paired.fastq.gz"
Project by Pierre BERGERET, Camille RABIER, Lydie TRAN and Dalyan VENTURA
shell:
    """
    featureCounts -T {threads} -t {wildcards.TYPE} -g gene_id -s {wildcards.STRAND} -a {input.genome} -o {output} {input.bam}
    """
Snakemake workflow for generating gene expression counts from RNA-sequencing data.
shell:
    "featureCounts "
    "{params.extra} "
    "-a {params.annotation} "
    "-o {output.counts} "
    "-T {threads} "
    "{input.bam} > {log} 2>&1; "
    "mv \"{output.counts}.summary\" {output.summary}"
RNA-seq pipeline comparison and analyses
shell:
    '''
    featureCounts \
        {input.bams} \
        -T {threads} \
        -a {input.annotation} \
        -L \
        -M \
        -S 1 \
        -o {output.counts}
    '''
A Snakemake pipeline that improves gene annotation for cross-species analysis of single-cell RNA-Seq
rule mapFastq:
    input:
        R1 = config["fastqDir"] + "{fastq}_R1.fastq.gz",
        R2 = config["fastqDir"] + "{fastq}_R2.fastq.gz"
    output:
        countMtx = config["outputDir"] + "starSolo/{fastq}/{fastq}.Solo.out/Gene/raw/matrix.mtx",
        sortBam = config["outputDir"] + "starSolo/{fastq}/{fastq}.Aligned.sortedByCoord.out.bam"
    params:
        outputPrefix = lambda wildcards: config["outputDir"] + "starSolo/" + wildcards.fastq + "/" + wildcards.fastq + ".",
        whitelist = config["whitelist"],
        soloType = config["STARArgs"].get("soloType", "Droplet"),
        barcodeLen = 0,
        CBLen = config["CBLen"],
        UMIStart = config["UMIStart"],
        UMILen = config["UMILen"],
        genome = config["StarsoloGenome"],
        readFilesCommand = config["STARArgs"].get("readFilesCommand", "zcat"),
        outSAMtype = config["STARArgs"].get("outSAMtype", "BAM SortedByCoordinate"),
        soloStrand = config["STARArgs"].get("soloStrand", "Forward"),
        winAnchorMultimapNmax = config["STARArgs"].get("winAnchorMultimapNmax", 2000),
        outFilterMultimapNmax = config["STARArgs"].get("outFilterMultimapNmax", 2000),
        outSAMprimaryFlag = config["STARArgs"].get("outSAMprimaryFlag", "AllBestScore"),
        outSAMmultNmax = config["STARArgs"].get("outSAMmultNmax", 1),
        limitBAMsortRAM = config["STARArgs"].get("limitBAMsortRAM", 40000000000),
        limitOutSJoneRead = config["STARArgs"].get("limitOutSJoneRead", 10000),
        limitOutSJcollapsed = config["STARArgs"].get("limitOutSJcollapsed", 3000000),
        limitIObufferSize = config["STARArgs"].get("limitIObufferSize", 300000000),
        outSAMattributes = config["STARArgs"].get("outSAMattributes"),
        additionalArguments = config["STARArgs"].get("additionalArguments", ""),
        starOptions = starsoloParameters
    threads: math.ceil(config["STARArgs"].get("threads", 16) * scaleDownThreads)
    conda: "../envs/generateH5.yaml"
    resources:
        mem_mb = lambda wildcards, attempt: attempt * config["STARArgs"].get("memory", 40000)
    shell:
        """
        STAR \
            --genomeDir {params.genome} \
            --runThreadN {threads} \
            --readFilesIn {input.R2} {input.R1} \
            --readFilesCommand {params.readFilesCommand} \
            --outSAMtype {params.outSAMtype} \
            --winAnchorMultimapNmax {params.winAnchorMultimapNmax} \
            --outFilterMultimapNmax {params.outFilterMultimapNmax} \
            --outSAMprimaryFlag {params.outSAMprimaryFlag} \
            --outSAMmultNmax {params.outSAMmultNmax} \
            --limitBAMsortRAM {params.limitBAMsortRAM} \
            --limitOutSJoneRead {params.limitOutSJoneRead} \
            --limitOutSJcollapsed {params.limitOutSJcollapsed} \
            --limitIObufferSize {params.limitIObufferSize} \
            --outFileNamePrefix {params.outputPrefix} \
            --outSAMattributes {params.outSAMattributes} \
            {starsoloParameters} \
            {params.additionalArguments}
        """

rule featureCount:
    input:
        bamfile = rules.mapFastq.output.sortBam,
        gtf = config["featureCountGTF"]
    output:
        temp(config["outputDir"] + "featureCount/" + config["countBy"] + "/{fastq}.Aligned.sortedByCoord.out.bam.featureCounts.bam")
    params:
        outDir = config["outputDir"] + "featureCount/" + config["countBy"] + "/",
        strand = config["featureCountArgs"].get("strand", 1)
    threads: math.ceil(config["featureCountArgs"].get("threads", 16) * scaleDownThreads)
    conda: "../envs/generateH5.yaml"
    resources:
        mem_mb = lambda wildcards, attempt: attempt * config["featureCountArgs"].get("memory", 10000)
    shell:
        """
        featureCounts --fraction -M -s {params.strand} -T {threads} -t exon -g gene_name -a {input.gtf} -o {params.outDir}{wildcards.fastq}.counts.txt {input.bamfile} -R BAM --Rpath {params.outDir}
        """

rule featureCountToDT:
    input:
        rules.featureCount.output
    output:
        config["outputDir"] + "featureCount/" + config["countBy"] + "/" + "{fastq}.tsv.gz"
    params:
        countScript = srcdir("../scripts/countBamFromSTARSolo.sh"),
        outDir = config["outputDir"] + "featureCount/",
        numHit = config['numHit']
    threads: math.ceil(config["featureCountArgs"].get("threads", 8) * scaleDownThreads)
    conda: "../envs/generateH5.yaml"
    resources:
        mem_mb = lambda wildcards, attempt: attempt * config["featureCountArgs"].get("memory", 4000)
    shell:
        """
        samtools view {input} | NUMHIT={params.numHit} bash {params.countScript} | pigz -p {threads} -c > {output}
        """

rule DTToH5:
    input:
        rules.featureCountToDT.output
    output:
        countMtx = config["outputDir"] + "featureCount/" + config["countBy"] + "/{fastq}.h5",
        molInfo = config["outputDir"] + "featureCount/" + config["countBy"] + "/{fastq}.molInfo.h5"
    params:
        writeToh5 = srcdir("../scripts/writeHDF5.R"),
        whitelist = config["whitelist"]
    threads: math.ceil(config["DTToH5Args"].get("threads", 16) * scaleDownThreads)
    conda: "../envs/generateH5.yaml"
    resources:
        mem_mb = lambda wildcards, attempt: attempt * config["DTToH5Args"].get("memory", 4000)
    shell:
        """
        Rscript {params.writeToh5} {input} {params.whitelist} {output.countMtx} {threads} {wildcards.fastq}
        """
This workflow performs RNA-seq analysis from raw sequencing output through differential expression analysis. (v2.0.0)
shell:
    """
    reads=({params.reads})
    geneid=({params.geneid})
    annotation={input.annotation}
    bam=({input.bam})
    countmatrices=({output.countmatrices})
    len=${{#bam[@]}}
    if [ ${{reads}} == 'paired' ]; then
        for (( i=0; i<$len; i++ ))
        do
            featureCounts -T 12 -p -t exon -g ${{geneid}} -a ${{annotation}} -o ${{countmatrices[$i]}} ${{bam[$i]}}
        done
    elif [ ${{reads}} == 'unpaired' ]; then
        for (( i=0; i<$len; i++ ))
        do
            featureCounts -T 12 -t exon -g ${{geneid}} -a ${{annotation}} -o ${{countmatrices[$i]}} ${{bam[$i]}}
        done
    fi
    """
RNA-seq Preprocessing Pipeline with Snakemake for Differential Expression Analysis
shell:
    """
    featureCounts -a {params.GTF} \
        -o {output.counts} \
        {input.bam_files}
    """
Snakemake workflow: J2Seq
library(Rsubread)
library(dplyr)
library(mgsub)

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

## Count RPFs (normalized in RPKM) on CDS for each gene, using `featureCounts`
## run all bams together
samples <- read.table(snakemake@input[["samples"]], header=T)
bamfiles <- paste0("./merged_bam/", as.vector(samples$sample), "_merged_dedup_sorted.bam")
## run one bam file
# bamfiles <- snakemake@input[["bamfile"]]

RPFcounts <- featureCounts(files=bamfiles,
                           annot.ext=snakemake@input[['gtf']],
                           isGTFAnnotationFile=TRUE,
                           GTF.featureType=snakemake@params[["featureType"]],
                           GTF.attrType="gene_name",
                           strandSpecific=snakemake@params[["strand"]],
                           countMultiMappingReads=FALSE,
                           juncCounts=TRUE,
                           nthreads=snakemake@threads[[1]])

write.table(RPFcounts$counts, file=snakemake@output[[1]], sep="\t", quote=F, row.names=TRUE, col.names=NA)
library(Rsubread)
library(dplyr)
library(mgsub)

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

## Count RPFs (normalized in RPKM) on CDS for each gene, using `featureCounts`
## run all bams together
samples <- read.table(snakemake@input[["samples"]], header=T)
bamfiles <- paste0("./merged_bam/", as.vector(samples$sample), "_merged_dedup_sorted.bam")
## run one bam file
# bamfiles <- snakemake@input[["bamfile"]]

RPFcounts <- featureCounts(files=bamfiles,
                           annot.ext=snakemake@input[['saf']],
                           isGTFAnnotationFile=FALSE,
                           fracOverlap=1,
                           strandSpecific=snakemake@params[["strand"]],
                           countMultiMappingReads=FALSE,
                           juncCounts=TRUE,
                           nthreads=snakemake@threads[[1]])

write.table(RPFcounts$counts, file=snakemake@output[[1]], sep="\t", quote=F, row.names=TRUE, col.names=NA)
Ribo-seq_pipeline Snakemake workflow
library(Rsubread)
library(dplyr)
library(mgsub)

log <- file(snakemake@log[[1]], open="wt")
sink(log)
sink(log, type="message")

## Count RPFs (normalized in RPKM) on CDS for each gene, using `featureCounts`
## run all bams together
samples <- read.table(snakemake@input[["samples"]], header=T)
bamfiles <- paste0("./STAR_align/", as.vector(samples$sample), ".bam")
## run one bam file
# bamfiles <- snakemake@input[["bamfile"]]

RPFcounts <- featureCounts(files=bamfiles,
                           annot.ext=snakemake@input[['gtf']],
                           isGTFAnnotationFile=TRUE,
                           GTF.featureType="CDS",
                           GTF.attrType="gene_id")

id_length <- RPFcounts$annotation %>% as.data.frame() %>% dplyr::select(GeneID, Length)
rownames(id_length) <- id_length$GeneID
count_table <- merge(id_length, RPFcounts$counts, by="row.names")[,-1]
mapped_reads <- RPFcounts$stat %>% dplyr::filter(Status=="Assigned")

# convert counts to RPKM
values <- mapply('/', count_table %>% summarise(across(starts_with("GSM"), ~./Length*1000*1000000)), mapped_reads[,-1])
rpkm_table <- cbind(count_table[,c(1,2)], values)
colnames(rpkm_table) <- mgsub(colnames(rpkm_table), c("RibosomeProfiling_", ".bam"), c("", ""))

write.table(rpkm_table, file=snakemake@output[[1]], quote=FALSE, sep="\t", col.names=TRUE, row.names=FALSE)
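For reference, the RPKM conversion in the script above divides each gene's count by the gene length in kilobases and by the number of assigned reads in millions, i.e. RPKM = counts x 10^9 / (gene length in bp x assigned reads). A minimal sketch with made-up numbers, not part of the Ribo-seq_pipeline workflow:

# Hypothetical values, for illustration only
counts <- 500            # reads assigned to one gene
gene_length_bp <- 2000   # CDS length in bp
assigned_reads <- 25e6   # total assigned reads in the library
rpkm <- counts / (gene_length_bp / 1000) / (assigned_reads / 1e6)  # 500 / 2 / 25 = 10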
Tool (bio.tools)
FeatureCounts
featureCounts is a highly efficient read quantifier. It summarizes RNA-seq and genomic DNA-seq reads over a variety of genomic features such as genes, exons, promoters, gene bodies, and genomic bins. It is distributed both in the Bioconductor Rsubread package and in the SourceForge Subread package.
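As a minimal illustration, not taken from any workflow listed above and with placeholder file names, a gene-by-sample count matrix can be produced from R with the Rsubread package:

library(Rsubread)

# Placeholder BAM and GTF paths; set isPairedEnd and strand handling to match the library
fc <- featureCounts(files = c("sample1.bam", "sample2.bam"),
                    annot.ext = "annotation.gtf",
                    isGTFAnnotationFile = TRUE,
                    GTF.featureType = "exon",
                    GTF.attrType = "gene_id",
                    isPairedEnd = TRUE,
                    nthreads = 4)

# Write the gene-by-sample count matrix
write.table(fc$counts, "gene_counts.tsv", sep = "\t", quote = FALSE, col.names = NA)

The command-line Subread build used in the snippets above takes the corresponding flags, e.g. featureCounts -p -t exon -g gene_id -a annotation.gtf -o gene_counts.txt sample1.bam sample2.bam.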