This Snakemake workflow quantifies CITE-seq data with salmon alevin: the gene expression (RNA), antibody-derived tag (ADT), and hashtag oligonucleotide (HTO) libraries are each quantified against a dedicated salmon index.
Insert your code into the respective folders, i.e. `scripts`, `rules`, and `envs`. Define the entry point of the workflow in the `Snakefile` and the main configuration in the `config.yaml` file.
Authors
- Kevin Rue-Albrecht (@kevinrue)
Usage
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository, using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 1b: Set up input files
Index the reference sequences
```
mkdir /ifs/mirror/alevin
cd /ifs/mirror/alevin
wget -nv "http://refgenomes.databio.org/v2/asset/hg38/salmon_partial_sa_index/archive?tag=default"
mv "archive?tag=default" salmon_partial_sa_index__default.tgz
tar -xvzf salmon_partial_sa_index__default.tgz
grep "^>" salmon_partial_sa_index/gentrome.fa | cut -d " " -f 1,7 --output-delimiter=$'\t' - | sed 's/[>"gene_symbol:"]//g' > txp2gene.tsv
```
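The extraction pipeline above can be tried on a toy FASTA header to see the expected two-column output (transcript ID, gene symbol). The header below is illustrative only, not the real `gentrome.fa` layout; note that the `sed` bracket expression deletes the listed characters anywhere in the line, which is safe here only because the IDs are uppercase:

```shell
# Toy header: field 1 is the transcript ID, field 7 the gene_symbol tag
# (field positions mimic, but do not reproduce, the real Ensembl header).
printf '>TX1 a b c d e gene_symbol:GENE1\nACGT\n' > toy_gentrome.fa

# Same extraction as above: keep fields 1 and 7, then strip '>' and the
# characters of the 'gene_symbol:' prefix via the sed bracket expression.
grep "^>" toy_gentrome.fa \
  | cut -d " " -f 1,7 --output-delimiter=$'\t' - \
  | sed 's/[>"gene_symbol:"]//g' > toy_txp2gene.tsv

cat toy_txp2gene.tsv   # TX1<TAB>GENE1
```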
Index the antibody sequences
```
cd /ifs/mirror/alevin
wget --content-disposition -nv https://ftp.ncbi.nlm.nih.gov/geo/series/GSE128nnn/GSE128639/suppl/GSE128639_MNC_ADT_Barcodes.csv.gz
zcat GSE128639_MNC_ADT_Barcodes.csv.gz | awk -F "," '{print $1"\t"$4}' | tail -n +2 > adt.tsv
salmon index -t adt.tsv -i adt_index --features -k7
```
```
cd /ifs/mirror/alevin
wget --content-disposition -nv https://ftp.ncbi.nlm.nih.gov/geo/series/GSE128nnn/GSE128639/suppl/GSE128639_MNC_HTO_Barcodes.csv.gz
zcat GSE128639_MNC_HTO_Barcodes.csv.gz | awk -F "," '{print $1"\t"$4}' | sed 's/Hashtag /Hashtag_/g' | tail -n +2 > hto.tsv
salmon index -t hto.tsv -i hto_index --features -k7
```
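The CSV-to-TSV conversion above can be checked on a toy input; the CSV below only mimics the layout of the GEO barcode sheet (tag name in column 1, barcode in column 4) and is not real data:

```shell
# Toy CSV: header line plus one hashtag row (illustrative layout only).
printf 'name,a,b,barcode\nHashtag 1,x,y,ACGTACGT\n' > toy_hto.csv

# Same transformation as above: keep columns 1 and 4, replace the space
# in "Hashtag 1" with an underscore, and drop the header line.
awk -F "," '{print $1"\t"$4}' toy_hto.csv \
  | sed 's/Hashtag /Hashtag_/g' \
  | tail -n +2 > toy_hto.tsv

cat toy_hto.tsv   # Hashtag_1<TAB>ACGTACGT
```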
Download the raw RNA & antibody sequencing data
```
mkdir data
cd data
# RNA experiment
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758323/MNC-A_R1.fastq.gz
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758323/MNC-A_R2.fastq.gz
# ADT experiment
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758325/MNC-A-ADT_R1.fastq.gz
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758325/MNC-A-ADT_R2.fastq.gz
# HTO experiment
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758327/MNC-A-HTO_R1.fastq.gz
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758327/MNC-A-HTO_R2.fastq.gz
```
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the `config/` folder: adjust `config.yaml` to configure the workflow execution, and `samples.tsv` to specify your sample setup.
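As a sketch only, the configuration files might look like the following; every field and column name here is an assumption for illustration and should be checked against the example files shipped in `config/`:

```yaml
# config/config.yaml (hypothetical fields -- check the shipped example file)
samples: config/samples.tsv
salmon:
  rna_index: /ifs/mirror/alevin/salmon_partial_sa_index
  tgmap: /ifs/mirror/alevin/txp2gene.tsv
  adt_index: /ifs/mirror/alevin/adt_index
  hto_index: /ifs/mirror/alevin/hto_index
```

```
# config/samples.tsv (hypothetical columns; file names from the download step above)
sample	fastq1	fastq2
MNC-A	data/MNC-A_R1.fastq.gz	data/MNC-A_R2.fastq.gz
```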
Step 3: Install Snakemake
Install Snakemake using conda:

```
conda create -c bioconda -c conda-forge -n snakemake snakemake
```
For installation details, see the instructions in the Snakemake documentation .
Step 4: Execute workflow
Activate the conda environment:

```
conda activate snakemake
```

Test your configuration by performing a dry-run via

```
snakemake --use-conda -n
```

Execute the workflow locally via

```
snakemake --use-conda --cores $N
```

using `$N` cores, or run it in a cluster environment via

```
snakemake --use-conda --cluster qsub --jobs 100
```

or

```
snakemake --use-conda --drmaa --jobs 100
```

If you not only want to fix the software stack but also the underlying OS, use

```
snakemake --use-conda --use-singularity
```

in combination with any of the modes above. See the Snakemake documentation for further details.
Step 5: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:

```
snakemake --report report.html
```

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
Step 6: Commit changes
Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

```
git commit -a
git push
```
Step 7: Obtain updates from upstream
Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.

- Once, register the upstream repository in your local copy:

  ```
  git remote add -f upstream git@github.com:snakemake-workflows/pipeline_alevin_citeseq2.git
  ```

  or

  ```
  git remote add -f upstream https://github.com/snakemake-workflows/pipeline_alevin_citeseq2.git
  ```

  if you have not set up SSH keys.

- Update the upstream version: `git fetch upstream`.
- Create a diff with the current version: `git diff HEAD upstream/master workflow > upstream-changes.diff`.
- Investigate the changes: `vim upstream-changes.diff`.
- Apply the modified diff via: `git apply upstream-changes.diff`.
- Carefully check whether you need to update the config files: `git diff HEAD upstream/master config`. If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.
Step 8: Contribute back
In case you have also changed or added steps, please consider contributing them back to the original repository:

- Fork the original repo to a personal or lab account.
- Clone the fork to your local system, to a different place than where you ran your analysis.
- Copy the modified files from your analysis to the clone of your fork, e.g., `cp -r workflow path/to/fork`. Make sure to not accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.
- Commit and push your changes to your fork.
- Create a pull request against the original repository.
Testing
Test cases are in the subfolder `.test`. They are automatically executed via continuous integration with GitHub Actions.
Code Snippets
```
shell:
    """
    which salmon > logs/salmon_path &&
    cp /ifs/home/kevin/.config/snakemake/drmaa/config.yaml logs/drmaa_config.yaml &&
    salmon alevin -l ISR -i {params.index} \
        -1 {input.fastq1} -2 {input.fastq2} \
        -o {params.output_folder} -p {params.threads} --tgMap {params.tgmap} \
        --chromium --dumpFeatures \
        {params.extra_options} \
        2> {log.stderr}
    """
```

```
shell:
    """
    salmon alevin -l ISR -i {params.index} \
        -1 {input.fastq1} -2 {input.fastq2} \
        --end {params.end} --umiLength {params.umi_length} --barcodeLength {params.barcode_length} \
        -o {params.output_folder} -p {params.threads} --citeseq \
        --featureStart {params.feature_start} --featureLength {params.feature_length} \
        2> {log.stderr}
    """
```

```
shell:
    """
    salmon alevin -l ISR -i {params.index} \
        -1 {input.fastq1} -2 {input.fastq2} \
        -o {params.output_folder} -p {params.threads} --citeseq \
        --featureStart {params.feature_start} \
        --end {params.end} --umiLength {params.umi_length} --barcodeLength {params.barcode_length} \
        --featureLength {params.feature_length} \
        --naiveEqclass \
        2> {log.stderr}
    """
```
Support
- Future updates