This Snakemake workflow quantifies CITE-seq data with salmon alevin: the gene expression (RNA), antibody-derived tag (ADT), and hashtag oligonucleotide (HTO) libraries are each quantified against a dedicated salmon index.
Insert your code into the respective folders, i.e. `scripts`, `rules`, and `envs`. Define the entry point of the workflow in the `Snakefile` and the main configuration in the `config.yaml` file.
Authors
- Kevin Rue-Albrecht (@kevinrue)
Usage
If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository, using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 1b: Set up input files
Index the reference sequences
```
mkdir /ifs/mirror/alevin
cd /ifs/mirror/alevin
wget -nv "http://refgenomes.databio.org/v2/asset/hg38/salmon_partial_sa_index/archive?tag=default"
mv "archive?tag=default" salmon_partial_sa_index__default.tgz
tar -xvzf salmon_partial_sa_index__default.tgz
grep "^>" salmon_partial_sa_index/gentrome.fa | cut -d " " -f 1,7 --output-delimiter=$'\t' - | sed 's/[>"gene_symbol:"]//g' > txp2gene.tsv
```
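The extraction pipeline above can be tried on a toy FASTA header to see the expected two-column output (transcript ID, gene symbol). The header below is illustrative only, not the real `gentrome.fa` layout; note that the `sed` bracket expression deletes the listed characters anywhere in the line, which is safe here only because the IDs are uppercase:

```shell
# Toy header: field 1 is the transcript ID, field 7 the gene_symbol tag
# (field positions mimic, but do not reproduce, the real Ensembl header).
printf '>TX1 a b c d e gene_symbol:GENE1\nACGT\n' > toy_gentrome.fa

# Same extraction as above: keep fields 1 and 7, then strip '>' and the
# characters of the 'gene_symbol:' prefix via the sed bracket expression.
grep "^>" toy_gentrome.fa \
  | cut -d " " -f 1,7 --output-delimiter=$'\t' - \
  | sed 's/[>"gene_symbol:"]//g' > toy_txp2gene.tsv

cat toy_txp2gene.tsv   # TX1<TAB>GENE1
```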
Index the antibody sequences
```
cd /ifs/mirror/alevin
wget --content-disposition -nv https://ftp.ncbi.nlm.nih.gov/geo/series/GSE128nnn/GSE128639/suppl/GSE128639_MNC_ADT_Barcodes.csv.gz
zcat GSE128639_MNC_ADT_Barcodes.csv.gz | awk -F "," '{print $1"\t"$4}' | tail -n +2 > adt.tsv
salmon index -t adt.tsv -i adt_index --features -k7
```
```
cd /ifs/mirror/alevin
wget --content-disposition -nv https://ftp.ncbi.nlm.nih.gov/geo/series/GSE128nnn/GSE128639/suppl/GSE128639_MNC_HTO_Barcodes.csv.gz
zcat GSE128639_MNC_HTO_Barcodes.csv.gz | awk -F "," '{print $1"\t"$4}' | sed 's/Hashtag /Hashtag_/g' | tail -n +2 > hto.tsv
salmon index -t hto.tsv -i hto_index --features -k7
```
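The CSV-to-TSV conversion above can be checked on a toy input; the CSV below only mimics the layout of the GEO barcode sheet (tag name in column 1, barcode in column 4) and is not real data:

```shell
# Toy CSV: header line plus one hashtag row (illustrative layout only).
printf 'name,a,b,barcode\nHashtag 1,x,y,ACGTACGT\n' > toy_hto.csv

# Same transformation as above: keep columns 1 and 4, replace the space
# in "Hashtag 1" with an underscore, and drop the header line.
awk -F "," '{print $1"\t"$4}' toy_hto.csv \
  | sed 's/Hashtag /Hashtag_/g' \
  | tail -n +2 > toy_hto.tsv

cat toy_hto.tsv   # Hashtag_1<TAB>ACGTACGT
```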
Download the raw RNA & antibody sequencing data
```
mkdir data
cd data
# RNA experiment
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758323/MNC-A_R1.fastq.gz
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758323/MNC-A_R2.fastq.gz
# ADT experiment
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758325/MNC-A-ADT_R1.fastq.gz
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758325/MNC-A-ADT_R2.fastq.gz
# HTO experiment
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758327/MNC-A-HTO_R1.fastq.gz
wget --content-disposition -nv https://sra-pub-src-2.s3.amazonaws.com/SRR8758327/MNC-A-HTO_R2.fastq.gz
```
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the `config/` folder: adjust `config.yaml` to configure the workflow execution, and `samples.tsv` to specify your sample setup.
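As a sketch only, the configuration files might look like the following; every field and column name here is an assumption for illustration and should be checked against the example files shipped in `config/`:

```yaml
# config/config.yaml (hypothetical fields -- check the shipped example file)
samples: config/samples.tsv
salmon:
  rna_index: /ifs/mirror/alevin/salmon_partial_sa_index
  tgmap: /ifs/mirror/alevin/txp2gene.tsv
  adt_index: /ifs/mirror/alevin/adt_index
  hto_index: /ifs/mirror/alevin/hto_index
```

```
# config/samples.tsv (hypothetical columns; file names from the download step above)
sample	fastq1	fastq2
MNC-A	data/MNC-A_R1.fastq.gz	data/MNC-A_R2.fastq.gz
```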
Step 3: Install Snakemake
Install Snakemake using conda:

```
conda create -c bioconda -c conda-forge -n snakemake snakemake
```
For installation details, see the instructions in the Snakemake documentation .
Step 4: Execute workflow
Activate the conda environment:

```
conda activate snakemake
```

Test your configuration by performing a dry-run via

```
snakemake --use-conda -n
```

Execute the workflow locally via

```
snakemake --use-conda --cores $N
```

using `$N` cores, or run it in a cluster environment via

```
snakemake --use-conda --cluster qsub --jobs 100
```

or

```
snakemake --use-conda --drmaa --jobs 100
```

If you not only want to fix the software stack but also the underlying OS, use

```
snakemake --use-conda --use-singularity
```

in combination with any of the modes above. See the Snakemake documentation for further details.
Step 5: Investigate results
After successful execution, you can create a self-contained interactive HTML report with all results via:

```
snakemake --report report.html
```

This report can, e.g., be forwarded to your collaborators. An example (using some trivial test data) can be seen here.
Step 6: Commit changes
Whenever you change something, don't forget to commit the changes back to your GitHub copy of the repository:

```
git commit -a
git push
```
Step 7: Obtain updates from upstream
Whenever you want to synchronize your workflow copy with new developments from upstream, do the following.

- Once, register the upstream repository in your local copy:

  ```
  git remote add -f upstream git@github.com:snakemake-workflows/pipeline_alevin_citeseq2.git
  ```

  or

  ```
  git remote add -f upstream https://github.com/snakemake-workflows/pipeline_alevin_citeseq2.git
  ```

  if you have not set up SSH keys.

- Update the upstream version: `git fetch upstream`.
- Create a diff with the current version: `git diff HEAD upstream/master workflow > upstream-changes.diff`.
- Investigate the changes: `vim upstream-changes.diff`.
- Apply the modified diff via: `git apply upstream-changes.diff`.
- Carefully check whether you need to update the config files: `git diff HEAD upstream/master config`. If so, do it manually, and only where necessary, since you would otherwise likely overwrite your settings and samples.
Step 8: Contribute back
In case you have also changed or added steps, please consider contributing them back to the original repository:

- Fork the original repo to a personal or lab account.
- Clone the fork to your local system, to a different place than where you ran your analysis.
- Copy the modified files from your analysis to the clone of your fork, e.g., `cp -r workflow path/to/fork`. Make sure to not accidentally copy config file contents or sample sheets. Instead, manually update the example config files if necessary.
- Commit and push your changes to your fork.
- Create a pull request against the original repository.
Testing
Test cases are in the subfolder `.test`. They are automatically executed via continuous integration with GitHub Actions.
Code Snippets
```
shell:
    """
    which salmon > logs/salmon_path &&
    cp /ifs/home/kevin/.config/snakemake/drmaa/config.yaml logs/drmaa_config.yaml &&
    salmon alevin -l ISR -i {params.index} \
        -1 {input.fastq1} -2 {input.fastq2} \
        -o {params.output_folder} -p {params.threads} --tgMap {params.tgmap} \
        --chromium --dumpFeatures \
        {params.extra_options} \
        2> {log.stderr}
    """
```

```
shell:
    """
    salmon alevin -l ISR -i {params.index} \
        -1 {input.fastq1} -2 {input.fastq2} \
        --end {params.end} --umiLength {params.umi_length} --barcodeLength {params.barcode_length} \
        -o {params.output_folder} -p {params.threads} --citeseq \
        --featureStart {params.feature_start} --featureLength {params.feature_length} \
        2> {log.stderr}
    """
```

```
shell:
    """
    salmon alevin -l ISR -i {params.index} \
        -1 {input.fastq1} -2 {input.fastq2} \
        -o {params.output_folder} -p {params.threads} --citeseq \
        --featureStart {params.feature_start} \
        --end {params.end} --umiLength {params.umi_length} --barcodeLength {params.barcode_length} \
        --featureLength {params.feature_length} \
        --naiveEqclass \
        2> {log.stderr}
    """
```
Support
- Future updates