Alignment and antibody assembly pipelines for Croote et al. (Science, 2018)

public public 1yr ago 0 bookmarks

Alignment and assembly workflows associated with Croote et al. (2018). These workflows were developed to analyze single human B cells isolated from the peripheral blood of food-allergic individuals and assume single-cell RNA-sequencing were data generated using the Smart-seq2 chemistry (10X and drop-seq data will not assemble).

DAG

About

This repository contains two snakemake -based workflows for processing scRNA-seq data. The alignment (mapping) workflow uses STAR and htseq whereas the antibody heavy / light chain assembly workflow uses BASIC followed by the immcantation framework.

Configuration

Environment

  1. These workflows assume you have conda (a popular package management system ) in your PATH . If you do not have conda installed, I would recommending downloading miniconda and responding yes when prompted to prepend the install location to your PATH . To test the installation run: conda list .

  2. To launch the workflows, you need a python 3 conda environment with snakemake and pandas installed. This can be created by running the following:

    conda create -q -n snakemake_env python=3.5 snakemake-minimal=5.2.4 pandas
    

    The environment can then be activated by running: source activate snakemake_env .

  3. To avoid the challenges of installing and configuring IgBLAST with the necessary IMGT germline databases for V, (D), and J gene segment calling, this workflow uses the kleinstein/immcantation docker / singularity image . The image will automatically be pulled as part of the workflow, simply specify whether to use the docker or singularity image in the config.yaml file.

Samplesheet

The samplesheet is a two column, tab-delimited file. For each row, the base column specifies the path to the directory containing the folder samplename . It is assumed that each samplename is unique and that within each base/samplename directory there is one *R1*.fastq.gz file and one *R2*.fastq.gz file.

base samplename
/path/to/seq_run_1 cell_1
/path/to/seq_run_1 cell_2
... ...
/path/to/seq_run_2 cell_500

Config file

Edit the config.yaml file to point to your samplesheet and modify it as necessary for your sequencing data and computing environment.

Running

Once configured, source your conda environment and launch the workflows with the following:

snakemake --use-conda 

For cluster computing (e.g. on a SLURM cluster), the following can serve as a template:

snakemake --use-conda --cluster "sbatch --ntasks=1 \
 --job-name={params.name} --cpus-per-task={threads} --partition={params.partition} \
 --mem={resources.mem_mb} -o logs/{params.name}.%j.log" -j 10 -w 200 -k

If you would like to run only one of the workflows, edit the Snakefile all rule or add the desired output e.g. combined_assemblies.tsv to the end of the snakemake call.

Expected outputs

Assembly workflow

The final output is a tab-delimited file containing parsed assembly data from all cells. For example, here are some of the columns:

SEQUENCE_ID FUNCTIONAL V_CALL J_CALL CDR3_IMGT C_CALL C_LEN SAMPLENAME
cell_id=transcripts;heavy_chain;BASIC T IGHV3-23 01,IGHV3-23D 01 IGHJ4*02 GCGAAAGATGAGTGGAAACCACCTCGCCGCGTTGACTAC IGHA1*01 1059 PA16P1B11
cell_id=transcripts;light_chain;BASIC T IGKV1-9*01 IGKJ1*01 CAACAGCTTAATTTTTATCCGTGGACG IGKC*01 321 PA16P1B11

Alignment workflow

The final output is a counts table where each row is an Ensembl gene and each column is a cell.

Testing

Travis CI is used for automated continuous integration testing of the assembly workflow with scRNA-seq reads downsampled from two human plasmablasts. Unfortunately, the large memory requirements of STAR prevent similar testing of the alignment workflow. While not originally developed for species other than human, these workflows will also work for mouse by changing species in config.yaml .

If you would like to test the assembly workflow, edit .test/config.yaml according to your singularity / docker environment and then run the following from the current directory:

snakemake --use-conda --directory .test combined_assemblies.tsv

Code Snippets

22
23
24
25
26
27
28
29
30
31
shell:
    "fa=resources/star/{params.fasta_name}.fa.gz && "
    "gtf=resources/star/{params.gtf_name}.gtf.gz && "
    "wget {params.fasta_url} -O $fa && "
    "wget {params.gtf_url} -O $gtf && "
    "zcat $fa > {output.fasta} && "
    "zcat $gtf > {output.gtf} && "
    "if [ {params.include_ERCCs} = true ]; then "
    "cat resources/star/ERCC92.fa >> {output.fasta} && "
    "cat resources/star/ERCC92.gtf >> {output.gtf}; fi"
57
58
59
60
61
62
63
64
65
66
67
shell:
    "genomedir=$(dirname {output}) && "
    "mkdir -p $genomedir && "
    "cd $(dirname $genomedir) && " 
    "STAR --runMode genomeGenerate "
    "--genomeDir $genomedir "
    "--genomeFastaFiles {input.fasta} "
    "--sjdbGTFfile {input.gtf} "
    "--sjdbOverhang {params.star_read_len} "
    "--sjdbFileChrStartEnd {input.sjadditional} "
    "--genomeSAsparseD 2 --runThreadN {threads}"
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
shell:  "wdir=$(dirname {output}) && "
        "echo $wdir && "
        "mkdir -p $wdir && "
        "STAR "
        "--genomeDir $(dirname {input[0]}) "
        "--readFilesIn {input[1]} {input[2]} "
        "--readFilesCommand gunzip -c "
        "--outSAMmapqUnique 60 "
        "--outFilterMismatchNmax 999 "
        "--outFilterMismatchNoverReadLmax 0.1 "
        "--twopassMode Basic "
        "--runThreadN {threads} "
        "--outSAMtype BAM Unsorted "
        "--outFileNamePrefix $wdir/"
133
134
135
shell: "mkdir -p $(dirname {output}) && "
       "htseq-count -s no -r name -f bam -m intersection-nonempty "
       "{input.bam} {input.gtf} > {output}"
156
157
shell:
    "python {params.scripts_dir}/combine_counts.py {config[samplesheet]} {output}"
SnakeMake From line 156 of rules/align.smk
22
23
24
25
26
27
28
shell:  "outdir=$(dirname {output}) && "
        "cd $(dirname $outdir) && "
        "BASIC.py -b $(which bowtie2) "
        "-g {config[species]} -PE_1 $(basename {input[0]}) "
        "-PE_2 $(basename {input[1]}) "
        "-p {threads} -n transcripts -i {config[receptor]} "
        "-t {params.scratch} -a -o basic"
52
53
54
55
shell:  'blastn '
        '-db {params.const_db_dir}/imgt_{config[species]}_{params.ig_receptor}_combined.fasta '
        '-query {input} -outfmt "6 qseqid sacc length pident mismatch gaps qstart qend score evalue" '
        '-out {output} -task megablast -word_size 20 -num_threads {threads} -max_target_seqs 3'
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
run:
    if os.stat(str(input.assembly)).st_size == 0:
        print('Input assembly fasta is empty. Touching empty outputs')
        for out in output:
            with open(str(out), 'w') as outfile:
                pass
    else:
        if params.container_type == 'docker':
            # docker needs to bind an absolute path
            abs_dir_path = os.path.dirname(os.path.abspath(str(input.assembly)))
            assembly_name = os.path.basename(str(input.assembly))
            # run docker with current user:group (avoids permissions issues)
            shell(  'docker run -v {abs_dir_path}:/data:z -u `stat -c "%u:%g" $PWD` '
                    'kleinstein/immcantation:2.6.0 changeo-igblast '
                    '-o /data -s /data/{assembly_name} -n igblast '
                    '-p 1 -g {config[species]} -t ig')
        else:
            # singularity
            shell(  "{params.singularity_pre_cmd} "
                    "singularity exec -B $(dirname {output[0]}):/data "
                    "{input.img} changeo-igblast "
                    "-o /data -s {input.assembly} -n igblast "
                    "-p 1 -g {config[species]} -t ig")
119
120
121
shell:
    "python {params.scripts_dir}/merge_changeo_constant.py "
    "{input.changeo} {input.constant} {output}"
142
143
shell:
    "python {params.scripts_dir}/combine_assemblies.py {config[samplesheet]} {output}"
17
18
19
shell:
    "docker pull kleinstein/immcantation:2.6.0 && "
    "echo $(date) > {output}"
35
36
37
38
39
40
shell:
    "{params.singularity_pre_cmd} "
    "cd $(dirname {output}) && "
    "img=$(basename {output}) && "
    "singularity pull --name $img "
    "docker://kleinstein/immcantation:2.6.0"
ShowHide 8 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/dcroote/singlecell-ige
Name: singlecell-ige
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: MIT License
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...