COVID-19 PubSeq Public Sequence Workflow

COVID-19 PubSeq: Public Sequence uploader

This repository provides a sequence uploader for the COVID-19 Virtual Biohackathon's Public Sequence Resource project. There are two versions: one that runs on the command line and another that acts as a web interface. You can use it to upload the genomes of SARS-CoV-2 samples to make them publicly and freely available to other researchers. For more information, see the paper.

To get started, first install the uploader, then use the bh20-seq-uploader command to upload your data.

Installation

There are several ways to install the uploader. The most portable is with a virtualenv.

Installation with `virtualenv`

Prepare your system. You need to make sure you have Python, and the ability to install modules such as pycurl and pyopenssl. On Ubuntu 18.04, you can run:

sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev

Create and enter your virtualenv. Go to some memorable directory and make and enter a virtualenv:

virtualenv --python python3 venv
. venv/bin/activate

Note that you will need to repeat the . venv/bin/activate step from this directory to enter your virtualenv whenever you want to use the installed tool.

Install the tool. Once in your virtualenv, install this project:

Install from PyPI:

pip3 install bh20-seq-uploader

Install from git:

pip3 install git+https://github.com/arvados/bh20-seq-resource.git@master

Test the tool. Try running:

bh20-seq-uploader --help

It should print some instructions about how to use the uploader.

Make sure you are in your virtualenv whenever you run the tool! If you ever can't run the tool, and your prompt doesn't say (venv), try going to the directory where you put the virtualenv and running . venv/bin/activate there. Activation only applies to the current terminal window; you will need to run it again if you open a new terminal.
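If the prompt is not a reliable indicator (for example, because it has been customized), the interpreter itself can report whether a virtualenv is active. The following is a minimal sketch, not part of the uploader; the function name in_virtualenv is made up for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases instead make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"virtualenv active: {sys.prefix}")
    else:
        print("no virtualenv active; run '. venv/bin/activate' first")
```

Save it to a file and run it with python3 from the same terminal in which you intend to run the uploader.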

Installation with `pip3 --user`

If you don't want to have to enter a virtualenv every time you use the uploader, you can use the --user feature of pip3 to install the tool for your user.

Prepare your system. Just as for the virtualenv method, you need to install some dependencies. On Ubuntu 18.04, you can run:

sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev

Install the tool. You can run:

pip3 install --user git+https://github.com/arvados/bh20-seq-resource.git@master

Make sure the tool is on your PATH. The pip3 command will install the uploader in .local/bin inside your home directory. Your shell may not know to look for commands there by default. To fix this for the terminal you currently have open, run:

export PATH=$PATH:$HOME/.local/bin

To make this change permanent, assuming your shell is Bash, run:

echo 'export PATH=$PATH:$HOME/.local/bin' >>~/.bashrc
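To confirm that the PATH change took effect, a short check along the following lines may help. This is only a sketch; the helper name dir_on_path is hypothetical and is not provided by the uploader.

```python
import os
import shutil
from pathlib import Path


def dir_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string."""
    target = os.path.normpath(directory)
    # Skip empty entries and normalize each one so trailing slashes do not matter.
    entries = [os.path.normpath(e) for e in path_value.split(os.pathsep) if e]
    return target in entries


if __name__ == "__main__":
    local_bin = str(Path.home() / ".local" / "bin")
    print("~/.local/bin on PATH:", dir_on_path(local_bin, os.environ.get("PATH", "")))
    # shutil.which returns the full path of the command, or None if it is not found.
    print("bh20-seq-uploader found at:", shutil.which("bh20-seq-uploader"))
```

If the second line prints None after installation, the directory that pip3 installed into is most likely still missing from PATH in the current terminal.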

Test the tool. Try running:

bh20-seq-uploader --help

It should print some instructions about how to use the uploader.

Installation from Source for Development

If you plan to contribute to the project, you may want to install an editable copy from source. With this method, changes to the source code are automatically reflected in the installed copy of the tool.

Prepare your system. On Ubuntu 18.04, you can run:

sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev

Clone and enter the repository. You can run:

git clone https://github.com/arvados/bh20-seq-resource.git
cd bh20-seq-resource

Create and enter a virtualenv. From inside the repository directory, make and enter a virtualenv:

virtualenv --python python3 venv
. venv/bin/activate

Note that you will need to repeat the . venv/bin/activate step from this directory to enter your virtualenv whenever you want to use the installed tool.

Install the checked-out repository in editable mode. Once in your virtualenv, install with this special pip command:

pip3 install -e .

Test the tool. Try running:

bh20-seq-uploader --help

It should print some instructions about how to use the uploader.

Installation with GNU Guix

To run or develop the uploader with GNU Guix, see INSTALL.md.

Usage

Run the uploader with a FASTA or FASTQ file and accompanying metadata file in JSON or YAML:

bh20-seq-uploader example/metadata.yaml example/sequence.fasta

If the sample_id of your upload matches a sample already in PubSeq, it will be considered a new version and supersede the existing entry.
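Before uploading, it can be useful to catch obviously malformed FASTA input locally. The sketch below is a hypothetical pre-check, not part of bh20-seq-uploader, and it does not replace whatever validation the uploader itself performs. It only checks that every record has a '>' header followed by some sequence, and it does not handle FASTQ.

```python
from typing import Iterable, List


def fasta_problems(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in FASTA text (empty list if none)."""
    problems = []
    seen_header = False
    residues_in_record = 0
    records = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new header directly after another header means an empty record.
            if seen_header and residues_in_record == 0:
                problems.append(f"line {number}: previous record has no sequence")
            seen_header = True
            records += 1
            residues_in_record = 0
        elif not seen_header:
            problems.append(f"line {number}: sequence data before any '>' header")
        else:
            residues_in_record += len(line)
    if records == 0:
        problems.append("no '>' header found")
    elif residues_in_record == 0:
        problems.append("last record has no sequence")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python3 check_fasta.py example/sequence.fasta
    with open(sys.argv[1]) as handle:
        for problem in fasta_problems(handle):
            print(problem)
```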

Workflow for Generating a Pangenome

All these uploaded sequences are being fed into a workflow to generate a pangenome for the virus. You can replicate this workflow yourself.

For example, download your SARS-CoV-2 sequences from GenBank into seqs.fa, and then run this series of commands:

minimap2 -cx asm20 -X seqs.fa seqs.fa >seqs.paf
seqwish -s seqs.fa -p seqs.paf -g seqs.gfa
odgi build -g seqs.gfa -s -o seqs.odgi
odgi viz -i seqs.odgi -o seqs.png -x 4000 -y 500 -R -P 5

This project converts such a pipeline into the Common Workflow Language (CWL); the sources can be found here.

For more information on building pangenome models, see this wiki page.
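For local experiments, the four-command pipeline shown above can also be driven from a small script. The following is a sketch under the assumption that minimap2, seqwish and odgi are already installed and on PATH; the function names and the (argv, stdout_file) convention are made up for illustration, and the flags are copied unchanged from the commands above.

```python
import subprocess
from typing import List, Tuple


def pangenome_commands(fasta: str, prefix: str) -> List[Tuple[List[str], str]]:
    """Return (argv, stdout_file) pairs; stdout_file is "" when stdout is not redirected."""
    paf, gfa, og, png = (f"{prefix}.{ext}" for ext in ("paf", "gfa", "odgi", "png"))
    return [
        # All-vs-all alignment; minimap2 writes PAF to stdout, so it is redirected.
        (["minimap2", "-cx", "asm20", "-X", fasta, fasta], paf),
        # Induce the variation graph from the sequences and their alignments.
        (["seqwish", "-s", fasta, "-p", paf, "-g", gfa], ""),
        # Convert the GFA into odgi's format.
        (["odgi", "build", "-g", gfa, "-s", "-o", og], ""),
        # Render the graph as a PNG image.
        (["odgi", "viz", "-i", og, "-o", png, "-x", "4000", "-y", "500", "-R", "-P", "5"], ""),
    ]


def run_pipeline(fasta: str = "seqs.fa", prefix: str = "seqs") -> None:
    """Run each step in order, stopping at the first command that fails."""
    for argv, stdout_file in pangenome_commands(fasta, prefix):
        if stdout_file:
            with open(stdout_file, "wb") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
```

Calling run_pipeline() with its defaults reads seqs.fa and writes seqs.paf, seqs.gfa, seqs.odgi and seqs.png in the current directory. Passing argument lists instead of a shell string avoids quoting problems with unusual file names.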

Web Interface

This project comes with a simple web server that lets you use the sequence uploader from a browser. It will work as long as you install the package with the web extra.

To run it locally:

virtualenv --python python3 venv
. venv/bin/activate
pip install -e ".[web]"
env FLASK_APP=bh20simplewebuploader/main.py flask run

Then visit http://127.0.0.1:5000/.

Production

For production deployment, you can use gunicorn:

pip3 install gunicorn
gunicorn bh20simplewebuploader.main:app

This runs on http://127.0.0.1:8000/ by default, but can be adjusted with various gunicorn options.
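One way to keep such options in one place is a gunicorn configuration file, which is itself a Python module. The values below are illustrative assumptions rather than recommendations from this project; bind, workers and timeout are standard gunicorn settings.

```python
# gunicorn.conf.py -- example settings; every value here is an illustrative assumption.
bind = "127.0.0.1:8000"   # address and port to listen on (same as gunicorn's default)
workers = 2               # number of worker processes
timeout = 120             # seconds before an unresponsive worker is restarted
```

With this file in place, the server can be started with gunicorn -c gunicorn.conf.py bh20simplewebuploader.main:app.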

Code Snippets

baseCommand: abpoa
stdout: $(inputs.readsFA.nameroot).O0.gfa
arguments: [
    $(inputs.readsFA),
    -r 3,
    -O, '0'
]

CWL abPOA From line 20 of pangenome-generate/abpoa.cwl

baseCommand: python3
inputs:
  script:
    type: File
    default:
      class: File
      location: collect-seqs.py
    inputBinding: {position: 1}
  src_project:
    type: string
    inputBinding: {position: 2}
  metadataSchema:
    type: File
    inputBinding: {position: 3}
  exclude:
    type: File?
    inputBinding: {position: 4}

CWL From line 17 of pangenome-generate/collect-seqs.cwl

baseCommand: python
inputs:
  script:
    type: File
    default:
      class: File
      location: dups2metadata.py
    inputBinding: {position: 1}
  metadata:
    type: File
    inputBinding: {position: 2}
  dups:
    type: File?
    inputBinding: {position: 3}

CWL From line 3 of pangenome-generate/dups2metadata.cwl

arguments: [python3, $(inputs.script), $(inputs.metadata), $(inputs.fasta), $(inputs.query)]

CWL From line 30 of pangenome-generate/from_sparql.cwl

entryname: "block"+b,
entry: JSON.stringify(block)

CWL From line 41 of pangenome-generate/merge-metadata.cwl

entryname: "subs"+b,
entry: JSON.stringify(sub)

CWL From line 45 of pangenome-generate/merge-metadata.cwl

baseCommand: python

CWL From line 52 of pangenome-generate/merge-metadata.cwl

arguments: [odgi, build, -g, $(inputs.inputGFA), -o, -,
            {shellQuote: false, valueFrom: "|"},
            odgi, sort, -i, -, -p, s, -o, $(inputs.inputGFA.nameroot).odgi]

CWL From line 24 of pangenome-generate/odgi-build.cwl
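The odgi-build snippet above uses an argument with shellQuote: false to place a literal pipe between two odgi invocations. As a rough illustration of the shell command this is expected to produce, the sketch below builds the equivalent pipeline string in Python. It is an approximation for explanation only, not how a CWL runner works internally; the function name is hypothetical, and it assumes the File input resolves to its path.

```python
import shlex
from pathlib import PurePosixPath


def odgi_build_sort_command(input_gfa: str) -> str:
    """Build the shell pipeline that the CWL `arguments` list describes."""
    # CWL's `nameroot` is the file's basename without its final extension.
    out = PurePosixPath(input_gfa).stem + ".odgi"
    build = ["odgi", "build", "-g", input_gfa, "-o", "-"]
    sort = ["odgi", "sort", "-i", "-", "-p", "s", "-o", out]
    # Every argument is shell-quoted except the pipe itself, mirroring shellQuote: false.
    return " | ".join(" ".join(shlex.quote(a) for a in cmd) for cmd in (build, sort))


if __name__ == "__main__":
    print(odgi_build_sort_command("data/seqs.gfa"))
```

The "-" passed to -o and -i makes odgi build write to standard output and odgi sort read from standard input, so no intermediate file is written between the two steps.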

arguments:
  - "sh"
  - "-c"
  - >-
    odgi build -g '$(inputs.inputGFA.path)' -o - | odgi unchop -i - -o - |
    odgi sort -i - -p s -o $(inputs.inputGFA.nameroot).unchop.sorted.odgi

CWL ODGI From line 24 of pangenome-generate/odgi-build-from-xpoa-gfa.cwl

arguments:
  [odgi_to_rdf.py, $(inputs.odgi), "-",
   {valueFrom: "|", shellQuote: false},
   xz, --stdout]

CWL From line 18 of pangenome-generate/odgi_to_rdf.cwl

entryname: "block"+b,
entry: JSON.stringify(block)

CWL From line 38 of pangenome-generate/relabel-seqs.cwl

entryname: "subs"+b,
entry: JSON.stringify(sub)

CWL From line 42 of pangenome-generate/relabel-seqs.cwl

baseCommand: [python]

CWL From line 53 of pangenome-generate/relabel-seqs.cwl

baseCommand: seqwish
arguments: [-t, $(runtime.cores),
            -k, $(inputs.kmerSize),
            -s, $(inputs.readsFA),
            -p, $(inputs.readsPAF),
            -g, $(inputs.readsPAF.nameroot).gfa]

CWL seqwish From line 24 of pangenome-generate/seqwish.cwl

baseCommand: [python]

CWL From line 29 of pangenome-generate/sort_fasta_by_quality_and_len.cwl

baseCommand: spoa
stdout: $(inputs.readsFA.nameroot).g6.gfa
arguments: [
    $(inputs.readsFA),
    -G,
    -g, '-6'
]

CWL Spoa From line 20 of pangenome-generate/spoa.cwl

Created: 1yr ago

Updated: 1yr ago

Maintainers: public

URL: https://github.com/arvados/bh20-seq-resource

Name: covid-19-pubseq-pangenome-generate

Version: Version 1

License: Boost Software License 1.0

Keywords:

JSON User metadata Analysis Plot seqwish abPOA ODGI Spoa COVID19 Risk Mitigation
