Analysis of Dual RNA-seq data - an experimental method for interrogating host-pathogen interactions through simultaneous RNA-seq.


Dual RNA-seq pipeline

nf-core/dualrnaseq is a bioinformatics pipeline built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.

Introduction

nf-core/dualrnaseq is specifically used for the analysis of Dual RNA-seq data, interrogating host-pathogen interactions through simultaneous RNA-seq.

This pipeline was initially tested with eukaryotic hosts including Human and Mouse, and with pathogens including Salmonella enterica, Orientia tsutsugamushi, Streptococcus pneumoniae, Escherichia coli and Mycobacterium leprae. The workflow should work with any eukaryotic and bacterial organisms that have an available reference genome and annotation.

Method

The workflow merges host and pathogen genome annotations taking into account differences in annotation conventions, then processes raw data from FastQ inputs (FastQC, BBDuk), quantifies gene expression (STAR and HTSeq; STAR, Salmon and tximport; or Salmon in quasimapping mode and tximport), and summarises the results (MultiQC), as well as generating a number of custom summary plots and separate results tables for the pathogen and host. See the output documentation for more details.

Workflow

The workflow diagram below gives a simplified visual overview of how dualrnaseq has been designed.

[Figure: nf-core/dualrnaseq workflow diagram]

Documentation

The nf-core/dualrnaseq pipeline comes with documentation about the pipeline, found in the docs/ directory:

  1. Installation

  2. Pipeline configuration

  3. Running the pipeline

  4. Output and how to interpret the results

  5. Troubleshooting

Credits

nf-core/dualrnaseq was coded and written by Bozena Mika-Gospodorz and Regan Hayward.

We thank the following people for their extensive assistance in the development of this pipeline:

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #dualrnaseq channel (you can join with this invite).

Citations

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Code Snippets

"""
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
python --version > v_python.txt
R --version > v_r.txt
cutadapt --version > v_cutadapt.txt
fastqc --version > v_fastqc.txt
multiqc --version > v_multiqc.txt
STAR --version > v_star.txt
htseq-count . . --version > v_htseq.txt
samtools --version > v_samtools.txt
gffread --version > v_gffread.txt
salmon --version > v_salmon.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
'''
python !{workflow.projectDir}/bin/check_replicates.py -s !{sample_name} 2>&1
'''
NextFlow From line 773 of master/main.nf
'''
cp -n !{f_ext} !{base_name_file}.fasta
'''
NextFlow From line 804 of master/main.nf
'''
gunzip -f -S .zip !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''
NextFlow From line 810 of master/main.nf
'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''
NextFlow From line 817 of master/main.nf
'''
echo "Your pathogen genome files appear to have the wrong extension. \n Currently, the pipeline only supports .fasta or .fa, or compressed files with .zip or .gz extensions."
'''
NextFlow From line 822 of master/main.nf
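The four pathogen-genome snippets above each handle one file extension. As a rough illustration only, the same dispatch could be written as a single shell `case` statement. The function name `normalise_fasta` and the toy file names are hypothetical and not part of the pipeline; the sketch assumes GNU gzip and uses plain `cp` where the pipeline uses `cp -n`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the pipeline): copy or decompress a genome
# file so that it ends up as <base>.fasta, choosing the action by extension.
normalise_fasta() {
    local f_ext="$1" base="$2"
    case "$f_ext" in
        *.fasta|*.fa)
            cp "$f_ext" "${base}.fasta" ;;
        *.gz)
            gunzip -f "$f_ext"                 # strips the .gz suffix
            cp "${f_ext%.gz}" "${base}.fasta" ;;
        *.zip)
            gunzip -f -S .zip "$f_ext"         # same command as the snippet above
            cp "${f_ext%.zip}" "${base}.fasta" ;;
        *)
            echo "Unsupported extension: $f_ext" >&2
            return 1 ;;
    esac
}

# Toy demonstration in a scratch directory.
workdir="$(mktemp -d)"
cd "$workdir"
printf '>chr1\nACGT\n' > pathogen.fa
printf '>chr1\nACGT\n' > pathogen2.fna
gzip pathogen2.fna                             # produces pathogen2.fna.gz

normalise_fasta pathogen.fa pathogen
normalise_fasta pathogen2.fna.gz pathogen2
ls
```

Both calls leave a `.fasta` file next to the input, which is the state the later concatenation steps expect.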
'''
cp -n !{f_ext} !{base_name_file}.fasta
'''
NextFlow From line 852 of master/main.nf
'''
gunzip -f -S .zip !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''
NextFlow From line 858 of master/main.nf
'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.fasta
'''
NextFlow From line 865 of master/main.nf
'''
echo "Your host genome files appear to have the wrong extension. \n Currently, the pipeline only supports .fasta or .fa, or compressed files with .zip or .gz extensions."
'''
NextFlow From line 870 of master/main.nf
'''
cp -n !{f_ext} !{base_name_file}.gff3
'''
NextFlow From line 901 of master/main.nf
'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 905 of master/main.nf
'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 913 of master/main.nf
'''
echo "Your pathogen GFF file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''
NextFlow From line 918 of master/main.nf
'''
cp -n !{f_ext} !{base_name_file}.gff3
'''
NextFlow From line 953 of master/main.nf
'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 957 of master/main.nf
'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 965 of master/main.nf
'''
echo "Your host GFF file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''
NextFlow From line 970 of master/main.nf
'''
cp -n !{f_ext} !{base_name_file}.gff3
'''
NextFlow From line 1005 of master/main.nf
'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 1009 of master/main.nf
'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 1017 of master/main.nf
'''
echo "Your host GFF file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''
NextFlow From line 1022 of master/main.nf
'''
cp -n !{f_ext} !{base_name_file}.gff3
'''
NextFlow From line 1051 of master/main.nf
'''
gunzip -f -S .zip !{f_ext}
cp -n !{base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 1055 of master/main.nf
'''
gunzip -f !{f_ext}
cp -n !{old_base_name_file} !{base_name_file}.gff3
'''
NextFlow From line 1063 of master/main.nf
'''
echo "Your host GFF tRNA file appears to be in the wrong format or has the wrong extension. \n Currently, the pipeline only supports .gff or .gff3, or compressed files with .zip or .gz extensions."
'''
NextFlow From line 1068 of master/main.nf
"""
cat $pathogen_fa $host_fa > host_pathogen.fasta
"""
NextFlow From line 1100 of master/main.nf
"""
cat $host_gff_genome $host_gff_tRNA > ${outfile_name}
"""
NextFlow From line 1130 of master/main.nf
"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""
NextFlow From line 1158 of master/main.nf
"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""
NextFlow From line 1184 of master/main.nf
"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} $host_attribute $pathogen_attribute
"""
NextFlow From line 1212 of master/main.nf
"""
cat $pathogen_gff_genome $host_gff > host_pathogen_htseq.gff
"""
NextFlow From line 1237 of master/main.nf
"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f $features -a $pathogen_attribute -org pathogen -q_tool htseq -o ${outfile_name}
"""
"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f $features -a $host_attribute -org host -q_tool htseq -o ${outfile_name}
"""
"""
$workflow.projectDir/bin/extract_reference_names_from_fasta_files.sh reference_host_names.txt $host_fa
"""
NextFlow From line 1327 of master/main.nf
"""
$workflow.projectDir/bin/extract_reference_names_from_fasta_files.sh reference_pathogen_names.txt $pathogen_fa
"""
NextFlow From line 1354 of master/main.nf
"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} parent Parent
"""
NextFlow From line 1386 of master/main.nf
"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} parent $host_attribute
"""
NextFlow From line 1416 of master/main.nf
"""
cat $host_gff_genome $host_gff_tRNA > ${outfile_name}
"""
NextFlow From line 1441 of master/main.nf
"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""
NextFlow From line 1471 of master/main.nf
"""
$workflow.projectDir/bin/replace_attribute_gff.sh $gff ${outfile_name} parent $pathogen_attribute
"""
NextFlow From line 1499 of master/main.nf
"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f $features -a parent -org pathogen -q_tool salmon -o ${outfile_name}
"""
"""
python $workflow.projectDir/bin/extract_annotations_from_gff.py -gff $gff -f quant -a parent -org host -q_tool salmon -o ${outfile_name}
"""
"""
gffread -w $outfile_name -g $host_fa $gff
"""
"""
python $workflow.projectDir/bin/gff_to_fasta_transcriptome.py -fasta $host_fa -gff $gff  -f $features -a $attribute -o $outfile_name
"""
NextFlow From line 1627 of master/main.nf
"""
cat $host_tr_fa $host_tRNA_tr_fa > host_transcriptome.fasta
"""
NextFlow From line 1656 of master/main.nf
"""
python $workflow.projectDir/bin/gff_to_fasta_transcriptome.py -fasta $pathogen_fa -gff $gff -f $features -a $attribute  -o $outfile_name
"""
NextFlow From line 1702 of master/main.nf
"""
cat $pathogen_tr_fa $host_tr_fa > host_pathogen_transcriptome.fasta
"""
NextFlow From line 1730 of master/main.nf
"""
$workflow.projectDir/bin/replace_feature_gff.sh $gff ${outfile_name} $features
"""
NextFlow From line 1760 of master/main.nf
"""
cat $pathogen_gff_genome $host_gff > host_pathogen_star_alignment_mode.gff
"""
NextFlow From line 1786 of master/main.nf
"""
fastqc --quiet --threads $task.cpus --noextract $reads $fastqc_params
"""
"""
cutadapt -j ${task.cpus} -q $q_value -a $adapter_seq_3 -m 1 -o ${name_out} $reads $cutadapt_params
"""
"""
cutadapt -j ${task.cpus} -q $q_value -a ${adapter_seq_3[0]} -A ${adapter_seq_3[1]} -o ${name_1} -p ${name_2} -m 1 ${reads[0]} ${reads[1]} $cutadapt_params
"""
"""
bbduk.sh -Xmx1g in=$reads out=${name_out} ref=$adapters minlen=$minlen qtrim=$qtrim trimq=$trimq ktrim=$ktrim k=$k mink=$mink hdist=$hdist $bbduk_params &> $fileoutput
"""
"""
bbduk.sh -Xmx1g in1=${reads[0]} in2=${reads[1]} out1=${name_1} out2=${name_2} ref=$adapters minlen=$minlen qtrim=$qtrim trimq=$trimq ktrim=$ktrim k=$k mink=$mink hdist=$hdist $bbduk_params tpe tbo &> $fileoutput
"""
"""
fastqc --threads ${task.cpus} --quiet --noextract $reads $fastqc_params
"""
"""
$workflow.projectDir/bin/count_total_reads.sh $fastq >> total_raw_reads_fastq.tsv
"""
NextFlow From line 1999 of master/main.nf
"""
$workflow.projectDir/bin/collect_total_raw_read_pairs.py -i $tsv
"""
NextFlow From line 2025 of master/main.nf
'''
grep ">" !{host_fa} | cut -d " " -f 1 > decoys.txt
sed -i -e 's/>//g' decoys.txt
cat !{host_pathogen_transcriptome_fasta} !{host_fa} > gentrome.fasta
'''
NextFlow From line 2069 of master/main.nf
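The snippet above prepares the two inputs that the following `salmon index --decoys` call consumes: a list of decoy names taken from the host genome headers, and a "gentrome" in which the transcript sequences come first and the genome (decoy) sequences last. A minimal sketch of what these files look like, using made-up toy sequences and hypothetical file names:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Toy inputs (hypothetical names and sequences, for illustration only).
printf '>chr1 toy host chromosome\nACGTACGT\n>chr2 another\nGGCC\n' > host.fa
printf '>tx1\nACGT\n>tx2\nGGCC\n' > host_pathogen_transcriptome.fasta

# 1. Decoy names: FASTA header lines, first space-delimited field, ">" removed.
grep ">" host.fa | cut -d " " -f 1 | sed 's/>//g' > decoys.txt

# 2. Gentrome: transcripts first, decoy (genome) sequences appended at the end.
cat host_pathogen_transcriptome.fasta host.fa > gentrome.fasta

cat decoys.txt                 # one decoy name per line
grep -c ">" gentrome.fasta     # number of sequences in the gentrome
```

Cutting at the first space keeps only the sequence identifier, so header descriptions such as "toy host chromosome" do not end up in the decoy list.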
"""
salmon index -t $gentrome -i transcripts_index --decoys $decoys -k $kmer_length -p ${task.cpus} $keepDuplicates $salmon_sa_params_index
"""
"""
salmon quant -p ${task.cpus} -i $index -l $libtype -r $reads $softclip --incompatPrior $incompatPrior $UnmappedNames --validateMappings $dumpEq $writeMappings -o $sample_name $salmon_sa_params_mapping
"""
"""
salmon quant -p ${task.cpus} -i $index -l $libtype -1 ${reads[0]} -2 ${reads[1]} $softclip --incompatPrior $incompatPrior $UnmappedNames --validateMappings $dumpEq $writeMappings -o $sample_name $salmon_sa_params_mapping
"""
"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host  salmon/*/quant.sf "quant.sf"
"""
NextFlow From line 2171 of master/main.nf
"""
$workflow.projectDir/bin/salmon_extract_ambig_uniq_transcripts_genes.R salmon/*/quant.sf salmon/*/aux_info/ambig_info.tsv $sample_name $annotations
"""
NextFlow From line 2203 of master/main.nf
"""
$workflow.projectDir/bin/salmon_host_comb_ambig_uniq.R salmon/*/aux_info/*_host_quant_ambig_uniq.sf
"""
NextFlow From line 2225 of master/main.nf
"""
$workflow.projectDir/bin/salmon_pathogen_comb_ambig_uniq.R salmon/*/aux_info/*_pathogen_quant_ambig_uniq.sf
"""
NextFlow From line 2247 of master/main.nf
"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a $gene_attribute -org both
"""
"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host $quant_table "quant_salmon.tsv"
pathonen_tab=\$(if [ \$(cat pathogen_quant_salmon.tsv | wc -l) -gt 1  ]; then echo "true"; else echo "false"; fi)
host_tab=\$(if [ \$(cat host_quant_salmon.tsv | wc -l) -gt 1  ]; then echo "true"; else echo "false"; fi)
"""
NextFlow From line 2310 of master/main.nf
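After splitting the combined table, the snippet above sets one flag per organism to "true" when the corresponding table has more than one line, that is, at least one data row below the header. The sketch below isolates that check on toy tables. The helper `has_rows` and the table contents are hypothetical, and the sketch spells the first variable `pathogen_tab`, whereas the pipeline's own variable is spelled `pathonen_tab`.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Toy tables (hypothetical content): the pathogen table holds only a header,
# the host table holds a header plus one data row.
printf 'Name\tTPM\n' > pathogen_quant_salmon.tsv
printf 'Name\tTPM\nENST0001\t12.5\n' > host_quant_salmon.tsv

# Print "true" when a table has more than one line, otherwise "false".
has_rows() {
    if [ "$(wc -l < "$1")" -gt 1 ]; then echo "true"; else echo "false"; fi
}

pathogen_tab="$(has_rows pathogen_quant_salmon.tsv)"
host_tab="$(has_rows host_quant_salmon.tsv)"
echo "pathogen_tab=$pathogen_tab host_tab=$host_tab"
```

Reading the file through `wc -l < file` is equivalent to the `cat file | wc -l` form used in the pipeline and avoids the extra process.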
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org pathogen
"""
NextFlow From line 2338 of master/main.nf
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org host
"""
NextFlow From line 2364 of master/main.nf
"""
$workflow.projectDir/bin/tximport.R salmon $annotations $sample_name
"""
"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a gene_id -org host_gene_level
"""
"""
$workflow.projectDir/bin/combine_annotations_salmon_gene_level.py -q $quantification_table -annotations $annotation_table -a gene_id -org host
"""
NextFlow From line 2439 of master/main.nf
"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org pathogen 
"""
NextFlow From line 2473 of master/main.nf
"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org host
"""
NextFlow From line 2504 of master/main.nf
"""
$workflow.projectDir/bin/extract_processed_reads.sh salmon/*/aux_info/meta_info.json $sample_name salmon
"""
"""
cat $process_reads > processed_reads_salmon.tsv
"""
NextFlow From line 2552 of master/main.nf
"""
python $workflow.projectDir/bin/mapping_stats.py -q_p $quant_table_pathogen -q_h $quant_table_host -total_processed $total_processed_reads -total_raw $total_raw_reads -a $attribute -t salmon -o salmon_host_pathogen_total_reads.tsv
"""
"""
python $workflow.projectDir/bin/plot_mapping_statistics_salmon.py -i $stats
"""
NextFlow From line 2604 of master/main.nf
'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -q_tool salmon -org pathogen 2>&1
'''
'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -rna !{rna_classes_to_replace} -q_tool salmon -org host 2>&1
'''
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""
NextFlow From line 2694 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org pathogen
"""
NextFlow From line 2722 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""
NextFlow From line 2749 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org host
"""
NextFlow From line 2776 of master/main.nf
"""
mkdir index
STAR --runThreadN ${task.cpus} --runMode genomeGenerate --genomeDir index/ --genomeFastaFiles $fasta --sjdbGTFfile $gff --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang $sjdbOverhang $star_salmon_index_params
"""
"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn $reads --outSAMtype BAM Unsorted --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon quant --sjdbGTFtagExonParentTranscript parent --quantMode TranscriptomeSAM --quantTranscriptomeBan $quantTranscriptomeBan --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_salmon_alignment_params
"""
"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn ${reads[0]} ${reads[1]} --outSAMtype BAM Unsorted --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon quant --sjdbGTFtagExonParentTranscript parent --quantMode TranscriptomeSAM --quantTranscriptomeBan $quantTranscriptomeBan --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_salmon_alignment_params
"""
"""
salmon quant -p ${task.cpus} -t $transcriptome -l $libtype -a $bam_file --incompatPrior $incompatPrior -o $sample_name $salmon_alignment_based_params
"""
"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host salmon/*/quant.sf "quant.sf"
"""
NextFlow From line 2930 of master/main.nf
"""
$workflow.projectDir/bin/salmon_extract_ambig_uniq_transcripts_genes.R salmon/*/quant.sf salmon/*/aux_info/ambig_info.tsv $sample_name $annotations
"""
NextFlow From line 2963 of master/main.nf
"""
$workflow.projectDir/bin/salmon_host_comb_ambig_uniq.R salmon/*/aux_info/*_host_quant_ambig_uniq.sf
"""
NextFlow From line 2985 of master/main.nf
"""
$workflow.projectDir/bin/salmon_pathogen_comb_ambig_uniq.R salmon/*/aux_info/*_pathogen_quant_ambig_uniq.sf
"""
NextFlow From line 3007 of master/main.nf
"""
$workflow.projectDir/bin/tximport.R salmon $annotations $sample_name
"""
"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a gene_id -org host_gene_level
"""
"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q salmon -a $gene_attribute -org both
"""
"""
$workflow.projectDir/bin/split_quant_tables_salmon.sh $transcriptome_pathogen $transcriptome_host $quant_table "quant_salmon.tsv"
pathonen_tab=\$(if [ \$(cat pathogen_quant_salmon.tsv | wc -l) -gt 1  ]; then echo "true"; else echo "false"; fi)
host_tab=\$(if [ \$(cat host_quant_salmon.tsv | wc -l) -gt 1  ]; then echo "true"; else echo "false"; fi)
"""
NextFlow From line 3117 of master/main.nf
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org pathogen
"""
NextFlow From line 3145 of master/main.nf
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org host
"""
NextFlow From line 3170 of master/main.nf
"""
$workflow.projectDir/bin/combine_annotations_salmon_gene_level.py -q $quantification_table -annotations $annotation_table -a gene_id -org host
"""
NextFlow From line 3194 of master/main.nf
"""
$workflow.projectDir/bin/extract_processed_reads.sh $Log_final_out $sample_name star
"""
NextFlow From line 3220 of master/main.nf
"""
cat $process_reads > processed_reads_star.tsv
"""
NextFlow From line 3243 of master/main.nf
"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org pathogen
"""
NextFlow From line 3274 of master/main.nf
"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org host 
"""
NextFlow From line 3305 of master/main.nf
"""
$workflow.projectDir/bin/extract_processed_reads.sh salmon_alignment_mode/*/aux_info/meta_info.json $sample_name salmon_alignment
"""
NextFlow From line 3329 of master/main.nf
"""
cat $process_reads > processed_reads_salmon_alignment.tsv
"""
NextFlow From line 3352 of master/main.nf
"""
python $workflow.projectDir/bin/mapping_stats.py -q_p $quant_table_pathogen -q_h $quant_table_host -total_processed $total_processed_reads -total_raw $total_raw_reads -a $attribute --star_processed $total_processed_reads_star -t salmon_alignment -o salmon_alignment_host_pathogen_total_reads.tsv
"""
NextFlow From line 3381 of master/main.nf
"""
python $workflow.projectDir/bin/plot_mapping_statistics_salmon_alignment.py -i $stats
"""
NextFlow From line 3407 of master/main.nf
'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -q_tool salmon -org pathogen 2>&1
'''
'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -rna !{rna_classes_to_replace} -q_tool salmon -org host 2>&1
'''
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""
NextFlow From line 3494 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""
NextFlow From line 3521 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org pathogen
"""
NextFlow From line 3549 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org host
"""
NextFlow From line 3576 of master/main.nf
"""
mkdir index
STAR --runThreadN ${task.cpus} --runMode genomeGenerate --genomeDir index/ --genomeFastaFiles $fasta --sjdbGTFfile $gff --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang $sjdbOverhang $star_index_params
"""
"""
	mkdir $sample_name
	STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn $reads --outSAMtype BAM SortedByCoordinate --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outWigType $outWigType --outWigStrand $outWigStrand --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_alignment_params
"""
"""
mkdir $sample_name
STAR --runThreadN ${task.cpus} --genomeDir . --sjdbGTFfile $gff $readFilesCommand --readFilesIn ${reads[0]} ${reads[1]} --outSAMtype BAM SortedByCoordinate --outSAMunmapped $outSAMunmapped --outSAMattributes $outSAMattributes --outWigType $outWigType --outWigStrand $outWigStrand --outFileNamePrefix $sample_name/$sample_name --sjdbGTFfeatureExon exon --sjdbGTFtagExonParentTranscript Parent --outFilterMultimapNmax $outFilterMultimapNmax --outFilterType $outFilterType --limitBAMsortRAM $limitBAMsortRAM --alignSJoverhangMin $alignSJoverhangMin --alignSJDBoverhangMin $alignSJDBoverhangMin --outFilterMismatchNmax $outFilterMismatchNmax --outFilterMismatchNoverReadLmax $outFilterMismatchNoverReadLmax --alignIntronMin $alignIntronMin --alignIntronMax $alignIntronMax --alignMatesGapMax $alignMatesGapMax --winAnchorMultimapNmax $winAnchorMultimapNmax $star_alignment_params
"""
"""
$workflow.projectDir/bin/remove_crossmapped_reads_BAM.sh $alignment $workflow.projectDir/bin $host_reference $pathogen_reference $cross_mapped_reads $bam_file_without_crossmapped
"""
NextFlow From line 3708 of master/main.nf
"""
$workflow.projectDir/bin/remove_crossmapped_read_pairs_BAM.sh $alignment $workflow.projectDir/bin $host_reference $pathogen_reference $cross_mapped_reads $bam_file_without_crossmapped
"""
NextFlow From line 3712 of master/main.nf
"""
$workflow.projectDir/bin/extract_processed_reads.sh $Log_final_out $sample_name star
"""
NextFlow From line 3737 of master/main.nf
"""
cat $process_reads > processed_reads_star.tsv
"""
NextFlow From line 3762 of master/main.nf
'''
!{workflow.projectDir}/bin/count_uniquely_mapped_reads.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''
NextFlow From line 3790 of master/main.nf
'''
!{workflow.projectDir}/bin/count_uniquely_mapped_read_pairs.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''
NextFlow From line 3794 of master/main.nf
"""
python $workflow.projectDir/bin/combine_tables.py -i $stats -o uniquely_mapped_reads_star.tsv -s uniquely_mapped_reads
"""
NextFlow From line 3818 of master/main.nf
"""
$workflow.projectDir/bin/count_cross_mapped_reads.sh $cross_mapped_reads
"""
NextFlow From line 3842 of master/main.nf
'''
!{workflow.projectDir}/bin/count_multi_mapped_reads.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''
NextFlow From line 3870 of master/main.nf
'''
!{workflow.projectDir}/bin/count_multi_mapped_read_pairs.sh !{alignment} !{host_reference_names} !{pathogen_reference_names} !{sample_name} !{name}
'''
NextFlow From line 3874 of master/main.nf
"""
python $workflow.projectDir/bin/combine_tables.py -i $stats -o multi_mapped_reads_star.tsv -s multi_mapped_reads
"""
NextFlow From line 3899 of master/main.nf
"""
python $workflow.projectDir/bin/mapping_stats.py -total_raw $total_raw_reads -total_processed $total_processed_reads -m_u $uniquely_mapped_reads -m_m $multi_mapped_reads -c_m $cross_mapped_reads -t star -o star_mapping_stats.tsv
"""
NextFlow From line 3928 of master/main.nf
"""
python $workflow.projectDir/bin/plot_mapping_stats_star.py -i $stats
"""
NextFlow From line 3952 of master/main.nf
"""
htseq-count -n $task.cpus -t quant -f bam -r pos $st $gff -i $host_attr -s $stranded --max-reads-in-buffer=$max_reads_in_buffer -a $minaqual $htseq_params > $name_file2
sed -i '1{h;s/.*/'"$sample_name"'/;G}' "$name_file2"
"""
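The `sed` expression in the snippet above is terse: on the first line only, `h` saves the line in the hold space, `s` replaces it with the sample name, and `G` appends the saved line again, so the net effect is a new first line containing the sample name. A small worked example on a toy counts file, assuming GNU sed; here `sample_name` is an ordinary shell variable with a made-up value, whereas in the pipeline it is filled in by Nextflow:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

sample_name="sample_A"                         # hypothetical sample name
printf 'gene1\t10\ngene2\t0\n' > counts.txt    # toy two-column count table

# Insert the sample name as a new first line; all other lines pass through.
# Written to a new file here instead of editing in place with -i.
sed '1{h;s/.*/'"$sample_name"'/;G}' counts.txt > counts_with_header.txt

cat counts_with_header.txt
```

The resulting file has three lines: the sample name, followed by the two original count rows unchanged.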
"""
python $workflow.projectDir/bin/collect_quantification_data.py -i $input_quantification -q htseq -a $host_attribute 
"""
"""
$workflow.projectDir/bin/calculate_TPM_HTSeq.R $input_quantification $host_attribute $gff_pathogen $gff_host
"""
NextFlow From line 4057 of master/main.nf
"""
$workflow.projectDir/bin/split_quant_tables.sh $quant_table $host_annotations $pathogen_annotations quantification_uniquely_mapped_htseq.tsv
pathonen_tab=\$(if [ \$(cat pathogen_quantification_uniquely_mapped_htseq.tsv | wc -l) -gt 1  ]; then echo "true"; else echo "false"; fi)
host_tab=\$(if [ \$(cat host_quantification_uniquely_mapped_htseq.tsv | wc -l) -gt 1  ]; then echo "true"; else echo "false"; fi)
"""
NextFlow From line 4094 of master/main.nf
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org pathogen
"""
NextFlow From line 4122 of master/main.nf
"""
$workflow.projectDir/bin/combine_quant_annotations.py -q $quantification_table -annotations $annotation_table -a $attribute -org host
"""
NextFlow From line 4148 of master/main.nf
"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org pathogen
"""
NextFlow From line 4181 of master/main.nf
"""
python $workflow.projectDir/bin/scatter_plots.py -q $quant_table -a $attribute -org host 
"""
NextFlow From line 4213 of master/main.nf
"""
python $workflow.projectDir/bin/mapping_stats.py -q_p $quant_table_pathogen -q_h $quant_table_host -a $attribute  -star $star_stats -t htseq -o htseq_uniquely_mapped_reads_stats.tsv
"""
"""
python $workflow.projectDir/bin/plot_mapping_stats_htseq.py -i $stats
"""
NextFlow From line 4266 of master/main.nf
'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -q_tool htseq -org pathogen 2>&1
'''
'''
python !{workflow.projectDir}/bin/RNA_class_content.py -q !{quant_table} -a !{attribute} -annotations !{gene_annotations} -rna !{rna_classes_to_replace} -q_tool htseq -org host 2>&1
'''
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""
NextFlow From line 4355 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_each.py -i $stats_table
"""
NextFlow From line 4383 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org pathogen
"""
NextFlow From line 4411 of master/main.nf
"""
python $workflow.projectDir/bin/plot_RNA_class_stats_combined.py -i $stats_table -org host
"""
NextFlow From line 4438 of master/main.nf
"""
multiqc -d --export -f $rtitle $rfilename $custom_config_file . 
"""
"""
markdown_to_html.py $output_docs -o results_description.html
"""
NextFlow From line 4499 of master/main.nf
Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://nf-co.re/dualrnaseq
Name: dualrnaseq
Version: 1.0.0
Copyright: Public Domain
License: None