CADD-SV – a framework to score the effect of structural variants

Version: v1.1


Here, we describe CADD-SV, a method to retrieve a wide set of annotations in the range and vicinity of an SV. Our tool computes summary statistics and uses a trained machine learning model to differentiate deleterious from neutral variants. In training, we use human and chimpanzee derived alleles as proxy-neutral and contrast them with matched simulated variants as proxy-pathogenic. This approach has proven powerful in the interpretation of SNVs (CADD, https://cadd.gs.washington.edu). We show that CADD-SV scores correlate with known pathogenic variants in individual genomes and allelic diversity.

Pre-requirements

Conda

The pipeline depends on Snakemake, a workflow management system that wraps all scripts and runs them in a highly automated fashion in various environments (workstations, clusters, grid, or cloud). Further, we use Conda as the software/dependency management tool. Conda can install Snakemake and all necessary software with its dependencies automatically. Conda installation guidelines can be found here:

https://conda.io/projects/conda/en/latest/user-guide/install/index.html

Snakemake

After installing Conda, install Snakemake using Conda and the environment.yaml provided in this repository. To do so, first clone or download and uncompress the repository, then change into the root folder of the local repository.

git clone https://github.com/kircherlab/CADD-SV
cd CADD-SV

We will now create the Conda environment that is needed to invoke the Snakemake workflow. Snakemake is installed as part of this environment (run.caddsv):

conda env create -n run.caddsv --file environment.yaml

The second Conda environment (envs/SV.yml), containing all packages and tools to run CADD-SV, will be installed automatically during the first run. This can take some time.

Annotations

CADD-SV depends on various annotations to provide the model with its necessary input features. CADD-SV automatically retrieves and transforms these annotations (see Snakefile) and combines them in BED format at /desired-sv-set/matrix.bed.

Annotations can be downloaded and expanded individually. However, to run CADD-SV with the pre-trained model and to minimize runtime and memory failures, use the annotation sets stored at https://kircherlab.bihealth.org/download/CADD-SV/

wget https://kircherlab.bihealth.org/download/CADD-SV/v1.0/dependencies.tar.gz
tar -xf dependencies.tar.gz

Config

Almost ready to go. After you have prepared the files above, you may need to adjust the name of your dataset in config.yml.

List of required input files

Models and scripts as cloned from this Git repository
Annotations in the annotations/ folder
CADD-SV scores SVs provided in coordinate-sorted BED format on the GRCh38 genome build. The type of SV needs to be included for each variant in the 4th column. We recommend splitting files containing more than 10,000 SVs into smaller files. An example input file can be found in input/. The file needs to have the suffix id_. If you plan to process variants from another genome build or SVs in VCF format, see below.
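
The input requirements above (at least four tab-separated columns, coordinate-sorted records, no more than about 10,000 SVs per file) can be checked before starting a run. The following is a minimal Python sketch and not part of CADD-SV: the helper names parse_bed_line, is_coordinate_sorted and chunk_lines are hypothetical, and "coordinate sorted" is assumed to mean that each chromosome forms one contiguous block with non-decreasing start positions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

MAX_SVS_PER_FILE = 10_000  # split threshold recommended in the text above

Record = Tuple[str, int, int, str]  # (chrom, start, end, sv_type)


def parse_bed_line(line: str) -> Record:
    """Parse the first four tab-separated columns of one BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    chrom, start, end, sv_type = fields[0], int(fields[1]), int(fields[2]), fields[3]
    if start > end:
        raise ValueError(f"start is larger than end: {line!r}")
    return chrom, start, end, sv_type


def is_coordinate_sorted(records: Iterable[Record]) -> bool:
    """Assumed definition: contiguous chromosome blocks, non-decreasing starts."""
    seen = set()
    prev_chrom, prev_start = None, -1
    for chrom, start, _end, _sv_type in records:
        if chrom != prev_chrom:
            if chrom in seen:  # chromosome reappears after another one
                return False
            seen.add(chrom)
            prev_chrom, prev_start = chrom, start
        elif start < prev_start:
            return False
        else:
            prev_start = start
    return True


def chunk_lines(lines: Iterable[str], size: int = MAX_SVS_PER_FILE) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` lines, preserving order."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

Each chunk can then be written to its own input file. Because the chunks preserve the original order, a sorted input yields sorted chunks.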

Running the pipeline

Ready to go! If you run the pipeline on a cluster, see cluster.json for an estimate of the minimum resource requirements of the individual jobs. Note that these depend on your dataset size, so you may have to adjust them.

To start the pipeline:

conda activate run.caddsv
# dry run to see if everything works
snakemake --use-conda --configfile config.yml -j 4 -n
# run the pipeline
snakemake --use-conda --configfile config.yml -j 4

Output files

The pipeline outputs your SV set with all annotations in BED format in a folder named output; the CADD-SV score and two raw scores are in columns 6-8. Further information about individual annotations is kept in a subfolder named after your input dataset.
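
To illustrate how the scored file might be consumed downstream, here is a hedged Python sketch. It assumes that the output is tab-separated, that the CADD-SV score and the two raw scores sit in the sixth to eighth fields of each record, and that a higher CADD-SV score indicates a more likely deleterious variant (as with CADD). The function names read_scores and top_scoring are hypothetical, and the exact output file name is not given here, so the functions take an iterable of lines.

```python
from typing import Iterable, Iterator, List, Tuple

# Zero-based indices of fields 6-8 (assumed score positions, see lead-in).
SCORE_FIELDS = (5, 6, 7)

ScoredSV = Tuple[str, int, int, Tuple[float, ...]]


def read_scores(lines: Iterable[str]) -> Iterator[ScoredSV]:
    """Yield (chrom, start, end, scores) for every parseable data line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        try:
            scores = tuple(float(fields[i]) for i in SCORE_FIELDS)
            start, end = int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            # Assumption: lines that do not parse are headers and are skipped.
            continue
        yield fields[0], start, end, scores


def top_scoring(lines: Iterable[str], n: int = 10) -> List[ScoredSV]:
    """Return the n records with the highest value in the first score field."""
    return sorted(read_scores(lines), key=lambda rec: rec[3][0], reverse=True)[:n]
```

Skipping unparseable lines keeps the sketch tolerant of a header row. A stricter reader could raise an error instead.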

Further Information

Annotations

CADD-SV integrates many different annotations; links to some of its annotation sources are listed below. A complete list can be found in Suppl. Table 1 of the manuscript/pre-print.

Integrated Scores

CADD (https://krishna.gs.washington.edu/download/CADD/bigWig/)
LINSIGHT (http://compgen.cshl.edu/LINSIGHT/LINSIGHT.bw)

Species conservation and constraint metrics

PhastCons (http://hgdownload.cse.ucsc.edu/goldenpath/hg38/)
Syntenic regions (http://webclu.bio.wzw.tum.de/cgi-bin/syntenymapper/get-species-list.py)
GERP score (http://mendel.stanford.edu/SidowLab/downloads/gerp/)

Population and disease constraint metrics

pLI score (ftp://ftp.broadinstitute.org/pub/ExAC_release/release1/manuscript_data/forweb_cleaned_exac_r03_march16_z_data_pLI.txt.gz)
Conserved coding regions (CCR) (https://www.nature.com/articles/s41588-018-0294-6?WT.feed_name=subjects_population-genetics)
DDD Haploinsufficiency (https://decipher.sanger.ac.uk/files/downloads/HI_Predictions_Version3.bed.gz)

Epigenetic and regulatory activity

ENCODE features such as histone modifications, DNase-seq and RNA-seq (https://www.encodeproject.org/help/batch-download/)
GC content (http://hgdownload.cse.ucsc.edu/gbdb/hg38/bbi/gc5BaseBw/gc5Base.bw)
ChromHMM states of ENCODE cell lines (http://compbio.mit.edu/ChromHMM/)

3D genome organization

CTCF (http://genome.cshlp.org/content/suppl/2012/08/28/22.9.1680.DC1/Table_S2_Location_of_ChIP-seq_binding_positions_in_19_cell_lines.txt)
Encode-HiC data (https://www.encodeproject.org/search/?type=Experiment&assay_term_name=HiC&replicates.library.biosample.donor.organism.scientific_name=Homo%20sapiens&status=released)
Enhancer-promoter-links from FOCS (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5930446/)
Frequently Interacting Regulatory Elements (FIREs) (https://www.sciencedirect.com/science/article/pii/S2211124716314814)
Directionality index for HiC data from various datasets (https://www.genomegitar.org/processed-data.html)
DeepC saliencies score (http://userweb.molbiol.ox.ac.uk/public/rschwess/container_for_ucsc/data/deepC/saliency_scores/saliencies_merged_gm12878_5kb.bw)

Gene and element annotations

Ensembl-gff3 genebuild 96 (ftp://ftp.ensembl.org/pub/release-96/gff3/homo_sapiens/Homo_sapiens.GRCh38.96.chr.gff3.gz)
Fantom5 enhancers (https://zenodo.org/record/556775#.Xkz3G0oo-70)

Converting VCF and other genome builds

If you want to score SVs in VCF format, or your SVs are not in GRCh38 genome build coordinates, we provide an environment to handle this. It uses the SURVIVOR tools (https://github.com/fritzsedlazeck/SURVIVOR).

conda env create -n prepBED --file envs/prepBED.yml

To convert your VCF into BED format run:

conda activate prepBED
SURVIVOR vcftobed input.vcf 0 -1 output.bed
cut -f1,2,6,11 output.bed > beds/set_id.bed
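
The column selection performed by cut -f1,2,6,11 can be mirrored in Python, for example to add a sanity check on the resulting four-column records. This is only a sketch: the function names cut_fields and to_bed4 are hypothetical, and the interpretation of the selected fields as chromosome, start, end and SV type is an assumption based on the input format described above, not on the SURVIVOR documentation.

```python
from typing import List, Sequence

# 1-based column numbers kept by `cut -f1,2,6,11` in the command above.
KEEP_COLUMNS = (1, 2, 6, 11)


def cut_fields(line: str, keep: Sequence[int] = KEEP_COLUMNS) -> List[str]:
    """Return the selected tab-separated fields of one line.

    Unlike `cut`, which silently omits missing fields, this raises
    IndexError when a line has fewer columns than requested.
    """
    fields = line.rstrip("\n").split("\t")
    return [fields[i - 1] for i in keep]


def to_bed4(line: str) -> str:
    """Build a four-column BED line and check that start <= end."""
    chrom, start, end, sv_type = cut_fields(line)
    start_i, end_i = int(start), int(end)
    if start_i > end_i:
        raise ValueError(f"start is larger than end: {line!r}")
    return "\t".join([chrom, str(start_i), str(end_i), sv_type])
```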

To lift hg19 coordinates to GRCh38 apply the following steps:

conda activate prepBED
liftOver beds/setname_hg19_id.bed dependencies/hg19ToHg38.over.chain.gz beds/setname_id.bed beds/setname_unlifted.bed
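
After lifting, it can be useful to see why records ended up in the unlifted file and to restore coordinate order, since lifted positions need not stay sorted. The Python sketch below assumes the usual liftOver layout for the unmapped file, in which each failed record is preceded by a comment line starting with "#" that states the reason. The function names summarize_unlifted and sort_bed_lines are hypothetical, and chromosomes are ordered lexicographically, comparable to sort -k1,1 -k2,2n.

```python
from collections import Counter
from typing import Iterable, List


def summarize_unlifted(lines: Iterable[str]) -> Counter:
    """Count the failure reasons given in the '#' comment lines."""
    reasons: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            reasons[line.lstrip("#").strip()] += 1
    return reasons


def sort_bed_lines(lines: Iterable[str]) -> List[str]:
    """Sort BED lines by chromosome name (as text) and numeric start."""
    def sort_key(line: str):
        fields = line.rstrip("\n").split("\t")
        return fields[0], int(fields[1])

    return sorted((l for l in lines if l.strip()), key=sort_key)
```

For example, summarize_unlifted(open("beds/setname_unlifted.bed")) gives a quick overview of how many variants were lost during the lift and for which stated reason.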

Code Snippets

shell:
    """
    cut -f1,2,3 {input} > {output}
    """

SnakeMake From line 69 of master/Snakefile

shell:
    """
    sed 's/^chr\\|%$//g' {input} > {output}
    """

SnakeMake From line 82 of master/Snakefile

shell:
    """
    bedtools merge -i {input.nochr} > {output.nochr}
    bedtools merge -i {input.wchr} > {output.wchr}

    """

SnakeMake BEDTools From line 97 of master/Snakefile

shell:
    """
    bedtools coverage -b {input.anno} -a {input.bed} > {output}
    """

SnakeMake BEDTools From line 114 of master/Snakefile

shell:
    """
    bedtools coverage -b {input.anno} -a {input.bed} > {output}
    """

SnakeMake BEDTools From line 128 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') | cat annotations/dummy4.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o mean; done < {input.bed}) > {output.t1}
    """

SnakeMake BEDTools From line 143 of master/Snakefile

shell:
    """
    tabix {input.anno} -R {input.merg} | awk '{{ if ($0 ~ "transcript_id") print $0; else print $0" transcript_id \\"\\";"; }}' | gtf2bed - | cut -f1,2,3,8,10  > {output}
    """

SnakeMake tabix GFFutils From line 159 of master/Snakefile

shell:
    """
    paste <( bedtools coverage -b <(grep "exon" {input.anno}) -a {input.bed} | cut -f 1,2,3,5 ) <(bedtools coverage -b <(grep "transcript" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "gene" {input.anno} ) -a {input.bed} | cut -f 5) <( bedtools coverage -b <( grep "start_codon" {input.anno} ) -a {input.bed} | cut -f 5 ) <( bedtools coverage -b <(grep "stop_codon" {input.anno}) -a {input.bed} | cut -f 5) <(bedtools coverage -b <(grep "three_prime_utr" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "five_prime_utr" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "CDS" {input.anno}) -a {input.bed} | cut -f 5 ) > {output}
    """

SnakeMake BEDTools From line 174 of master/Snakefile

shell:
    """
    grep "exon" {input.anno} | bedtools closest -d -t first -b stdin -a {input.bed} |cut -f 1,2,3,9 |paste - <(grep "gene" {input.anno} | bedtools closest -d -t first -b stdin -a {input.bed} | cut -f 9)|paste - <(grep "start_codon" {input.anno} |bedtools closest -d -t first -b stdin -a {input.bed} | cut -f 9 ) >> {output}
    """

SnakeMake BEDTools From line 191 of master/Snakefile

shell:
    """
    bedtools map -b {input.anno} -a {input.bed} -c 4 -o distinct > {output}
    """

SnakeMake BEDTools From line 205 of master/Snakefile

shell:
    """
    Rscript --vanilla scripts/PLIextract.R {input.gn} {input.pli} {output}
    """

SnakeMake From line 220 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}')| cat annotations/dummy8.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4,4,5,5,6,6,7,7,8,8 -o max,sum,max,sum,max,sum,max,sum,max,sum; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 235 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | wc -l); done < {input.bed}) > {output}
    """

SnakeMake From line 249 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}')| cat annotations/dummy5_nochr.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o max; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 267 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | wc -l); done < {input.bed}) > {output}
    """

SnakeMake From line 281 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | awk '{{ total+=$4 }}END{{ printf("%f",total) }}'); done < {input.bed}) > {output}

    """

SnakeMake From line 296 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.ep} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 315 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4,4 -o max,min > {output}
    """

SnakeMake BEDTools From line 330 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 341 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 354 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 369 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 381 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 394 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4 -o max > {output}
    """

SnakeMake BEDTools From line 408 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} \
    -b {input.anno} -c 4,4 -o max,min > {output}
    """

SnakeMake BEDTools From line 423 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 435 of master/Snakefile

shell:
    """
    cut -f 4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65,69,70,74,75,79,80,84,85,89,90 {input} | awk '{{m=$1;for(i=1;i<=NF;i++)if($i<m)m=$i;print m}}' > {output.mingg}
    cut -f 4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65,69,70,74,75,79,80,84,85,89,90 {input} | awk '{{m=$1;for(i=1;i<=NF;i++)if($i>m)m=$i;print m}}' > {output.maxgg}
    """

SnakeMake From line 445 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4 -o mean > {output}
    """

SnakeMake BEDTools From line 459 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') | cat annotations/dummy4_nochr.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o mean; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 473 of master/Snakefile

shell:
    "tabix {input.anno} -R {input.merg} | cat annotations/dummy5_nochr.bed - | bedtools map -a {input.bed} -b - -c 4,4 -o max,sum > {output}"

SnakeMake BEDTools tabix From line 489 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 498 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(cat annotations/dummy_chrhmm.bed; tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 -o max; done < {input.bed}) > {output}
    """

SnakeMake BEDTools tabix From line 511 of master/Snakefile

shell:
    """
    bedtools coverage -a {input.bed} -b {input.anno} -counts  > {output}
    """

SnakeMake BEDTools From line 528 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 5 -o max > {output}
    """

SnakeMake BEDTools From line 542 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | awk 'BEGIN{{ maxVal="." }}{{ if ((maxVal == ".") || ($4 > maxVal)) {{ maxVal=$4 }} }}END{{ print maxVal }}'); done < {input.bed}) > {output}
    """

SnakeMake From line 556 of master/Snakefile

shell:
    """
    paste <(cut -f1-11 {input.cadd}) <(cut -f4 {input.cadd2}) <(cut -f4 {input.ccr}) <(cut -f4-28 {input.chromHMM}) <(cut -f4,5,7 {input.ctcf}) <(cut -f1 {input.di_min}) <(cut -f1 {input.di_max}) <(cut -f4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65 {input.encode}) <(cut -f4-7 {input.ep}) <(cut -f4,5,9,10,14,15,19,20,24,25 {input.fire}) <(cut -f4 {input.gc}) <(cut -f4-11 {input.gm}) <(cut -f4 {input.gerp}) <(cut -f4 {input.gerp2}) <(cut -f4,5,6,7,11,12,13,14,18,19,20,21,25,26,27,28 {input.hic}) <(cut -f4-7 {input.hesc}) <(cut -f4-7 {input.microsyn}) <(cut -f4 {input.mpc}) <(cut -f4 {input.pli}) <(cut -f4,5,6 {input.g_dist}) <(cut -f4 {input.remapTF}) <(cut -f4 {input.f5}) <(cut -f4 {input.hi}) <(cut -f4 {input.deepc}) <(cut -f4,5,7 {input.ultrac}) <(cut -f4 {input.linsight})| cat {input.header} - > {output}
    """

SnakeMake From line 595 of master/Snakefile

shell:
    """
    bedtools flank -i {input.bed} -g {input.genome} -l 100 -r 0 > {output.wchr}
    bedtools flank -i {input.bed} -g {input.genome} -l 100 -r 0 | sed 's/^chr\|%$//g' > {output.nchr}

    """

SnakeMake BEDTools From line 616 of master/Snakefile

shell:
    """
    bedtools merge -i {input.nchr} > {output.nchr}
    bedtools merge -i {input.wchr} > {output.wchr}

    """

SnakeMake BEDTools From line 633 of master/Snakefile

shell:
    """
    bedtools coverage -b {input.anno} -a {input.bed} > {output}
    """

SnakeMake BEDTools From line 650 of master/Snakefile

shell:
    """
    bedtools coverage -b {input.anno} -a {input.bed} > {output}
    """

SnakeMake BEDTools From line 664 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') | cat annotations/dummy4.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o mean; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 679 of master/Snakefile

shell:
    """
    tabix {input.anno} -R {input.merg} | awk '{{ if ($0 ~ "transcript_id") print $0; else print $0" transcript_id \\"\\";"; }}' | gtf2bed - | cut -f1,2,3,8,10  > {output}
    """

SnakeMake tabix GFFutils From line 695 of master/Snakefile

shell:
    """
    paste <( bedtools coverage -b <(grep "exon" {input.anno}) -a {input.bed} | cut -f 1,2,3,5 ) <(bedtools coverage -b <(grep "transcript" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "gene" {input.anno} ) -a {input.bed} | cut -f 5) <( bedtools coverage -b <( grep "start_codon" {input.anno} ) -a {input.bed} | cut -f 5 ) <( bedtools coverage -b <(grep "stop_codon" {input.anno}) -a {input.bed} | cut -f 5) <(bedtools coverage -b <(grep "three_prime_utr" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "five_prime_utr" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "CDS" {input.anno}) -a {input.bed} | cut -f 5 ) > {output}
    """

SnakeMake BEDTools From line 710 of master/Snakefile

shell:
    """
    grep "exon" {input.anno} | bedtools closest -d -t first -b stdin -a {input.bed} |cut -f 1,2,3,9 |paste - <(grep "gene" {input.anno} | bedtools closest -d -t first -b stdin -a {input.bed} | cut -f 9)|paste - <(grep "start_codon" {input.anno} |bedtools closest -d -t first -b stdin -a {input.bed} | cut -f 9 ) >> {output}
    """

SnakeMake BEDTools From line 727 of master/Snakefile

shell:
    """
    bedtools map -b {input.anno} -a {input.bed} -c 4 -o distinct > {output}
    """

SnakeMake BEDTools From line 741 of master/Snakefile

shell:
    """
    Rscript --vanilla scripts/PLIextract.R {input.gn} {input.pli} {output}
    """

SnakeMake From line 756 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}')| cat annotations/dummy8.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4,4,5,5,6,6,7,7,8,8 -o max,sum,max,sum,max,sum,max,sum,max,sum; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 771 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | wc -l); done < {input.bed}) > {output}
    """

SnakeMake From line 785 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}')| cat annotations/dummy5_nchr.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o max; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 804 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | wc -l); done < {input.bed}) > {output}
    """

SnakeMake From line 818 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | awk '{{ total+=$4 }}END{{ printf("%f",total) }}'); done < {input.bed}) > {output}        

    """

SnakeMake From line 833 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.ep} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 852 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4,4 -o max,min > {output}
    """

SnakeMake BEDTools From line 867 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 878 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 891 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 906 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 920 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 933 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4 -o max > {output}
    """

SnakeMake BEDTools From line 947 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} \
    -b {input.anno} -c 4,4 -o max,min > {output}
    """

SnakeMake BEDTools From line 962 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 974 of master/Snakefile

shell:
    """
    cut -f 4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65,69,70,74,75,79,80,84,85,89,90 {input} | awk '{{m=$1;for(i=1;i<=NF;i++)if($i<m)m=$i;print m}}' > {output.mingg}
    cut -f 4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65,69,70,74,75,79,80,84,85,89,90 {input} | awk '{{m=$1;for(i=1;i<=NF;i++)if($i>m)m=$i;print m}}' > {output.maxgg}
    """

SnakeMake From line 984 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4 -o mean > {output}
    """

SnakeMake BEDTools From line 999 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') | cat annotations/dummy4_nchr.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o mean; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 1013 of master/Snakefile

shell:
    "tabix {input.anno} -R {input.merg} | cat annotations/dummy5_nchr.bed - | bedtools map -a {input.bed} -b - -c 4,4 -o max,sum > {output}"

SnakeMake BEDTools tabix From line 1029 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 1038 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(cat annotations/dummy_chrhmm.bed; tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 -o max; done < {input.bed}) > {output}
    """

SnakeMake BEDTools tabix From line 1051 of master/Snakefile

shell:
    """
    bedtools coverage -a {input.bed} -b {input.anno} -counts  > {output}
    """

SnakeMake BEDTools From line 1068 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 5 -o max > {output}
    """

SnakeMake BEDTools From line 1082 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | awk 'BEGIN{{ maxVal="." }}{{ if ((maxVal == ".") || ($4 > maxVal)) {{ maxVal=$4 }} }}END{{ print maxVal }}'); done < {input.bed}) > {output}
    """

SnakeMake From line 1096 of master/Snakefile

shell:
    """
    paste <(cut -f1-11 {input.cadd}) <(cut -f4 {input.cadd2}) <(cut -f4 {input.ccr}) <(cut -f4-28 {input.chromHMM}) <(cut -f4,5,7 {input.ctcf}) <(cut -f1 {input.di_min}) <(cut -f1 {input.di_max}) <(cut -f4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65 {input.encode}) <(cut -f4-7 {input.ep}) <(cut -f4,5,9,10,14,15,19,20,24,25 {input.fire}) <(cut -f4 {input.gc}) <(cut -f4-11 {input.gm}) <(cut -f4 {input.gerp}) <(cut -f4 {input.gerp2}) <(cut -f4,5,6,7,11,12,13,14,18,19,20,21,25,26,27,28 {input.hic}) <(cut -f4-7 {input.hesc}) <(cut -f4-7 {input.microsyn}) <(cut -f4 {input.mpc}) <(cut -f4 {input.pli}) <(cut -f4,5,6 {input.g_dist}) <(cut -f4 {input.remapTF}) <(cut -f4 {input.f5}) <(cut -f4 {input.hi}) <(cut -f4 {input.deepc}) <(cut -f4,5,7 {input.ultrac}) <(cut -f4 {input.linsight})| cat {input.header} - > {output}
    """

SnakeMake From line 1135 of master/Snakefile

shell:
    """
    bedtools flank -i {input.bed} -g {input.genome} -l 0 -r 100 | bedtools sort > {output.wchr}
    bedtools flank -i {input.bed} -g {input.genome} -l 0 -r 100 | bedtools sort | sed 's/^chr\|%$//g' > {output.nchr}

    """

SnakeMake BEDTools From line 1156 of master/Snakefile

shell:
    """
    bedtools merge -i {input.nchr} > {output.nchr}
    bedtools merge -i {input.wchr} > {output.wchr}

    """

SnakeMake BEDTools From line 1173 of master/Snakefile

shell:
    """
    bedtools coverage -b {input.anno} -a {input.bed} > {output}
    """

SnakeMake BEDTools From line 1190 of master/Snakefile

shell:
    """
    bedtools coverage -b {input.anno} -a {input.bed} > {output}
    """

SnakeMake BEDTools From line 1204 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') | cat annotations/dummy4.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o mean; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 1219 of master/Snakefile

shell:
    """
    tabix {input.anno} -R {input.merg} | awk '{{ if ($0 ~ "transcript_id") print $0; else print $0" transcript_id \\"\\";"; }}' | gtf2bed - | cut -f1,2,3,8,10  > {output}
    """

SnakeMake tabix GFFutils From line 1235 of master/Snakefile

shell:
    """
    paste <( bedtools coverage -b <(grep "exon" {input.anno}) -a {input.bed} | cut -f 1,2,3,5 ) <(bedtools coverage -b <(grep "transcript" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "gene" {input.anno} ) -a {input.bed} | cut -f 5) <( bedtools coverage -b <( grep "start_codon" {input.anno} ) -a {input.bed} | cut -f 5 ) <( bedtools coverage -b <(grep "stop_codon" {input.anno}) -a {input.bed} | cut -f 5) <(bedtools coverage -b <(grep "three_prime_utr" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "five_prime_utr" {input.anno}) -a {input.bed} | cut -f 5 ) <(bedtools coverage -b <(grep "CDS" {input.anno}) -a {input.bed} | cut -f 5 ) > {output}
    """

SnakeMake BEDTools From line 1250 of master/Snakefile

shell:
    """
    grep "exon" {input.anno} | bedtools closest -d -t first -b stdin -a {input.bed} |cut -f 1,2,3,9 |paste - <(grep "gene" {input.anno} | bedtools closest -d -t first -b stdin -a {input.bed} | cut -f 9)|paste - <(grep "start_codon" {input.anno} |bedtools closest -d -t first -b stdin -a {input.bed} | cut -f 9 ) >> {output}
    """

SnakeMake BEDTools From line 1267 of master/Snakefile

shell:
    """
    bedtools map -b {input.anno} -a {input.bed} -c 4 -o distinct > {output}
    """

SnakeMake BEDTools From line 1281 of master/Snakefile

shell:
    """
    Rscript --vanilla scripts/PLIextract.R {input.gn} {input.pli} {output}
    """

SnakeMake From line 1296 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}')| cat annotations/dummy8.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4,4,5,5,6,6,7,7,8,8 -o max,sum,max,sum,max,sum,max,sum,max,sum; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 1311 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | wc -l); done < {input.bed}) > {output}
    """

SnakeMake From line 1325 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}')| cat annotations/dummy5_nchr.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o max; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 1344 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | wc -l); done < {input.bed}) > {output}
    """

SnakeMake From line 1358 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | awk '{{ total+=$4 }}END{{ printf("%f",total) }}'); done < {input.bed}) > {output}

    """

SnakeMake From line 1373 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.ep} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 1392 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4,4 -o max,min > {output}
    """

SnakeMake BEDTools From line 1407 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 1418 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 1431 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 1446 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 1462 of master/Snakefile

shell:
    """
    bedtools closest -d -t first -a {input.bed} -b {input.hic} | Rscript --vanilla scripts/annotateHIC.R stdin {output.o1}
    """

SnakeMake BEDTools From line 1475 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4 -o max > {output}
    """

SnakeMake BEDTools From line 1489 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} \
    -b {input.anno} -c 4,4 -o max,min > {output}
    """

SnakeMake BEDTools From line 1504 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 1516 of master/Snakefile

shell:
    """
    cut -f 4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65,69,70,74,75,79,80,84,85,89,90 {input} | awk '{{m=$1;for(i=1;i<=NF;i++)if($i<m)m=$i;print m}}' > {output.mingg}
    cut -f 4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65,69,70,74,75,79,80,84,85,89,90 {input} | awk '{{m=$1;for(i=1;i<=NF;i++)if($i>m)m=$i;print m}}' > {output.maxgg}
    """

SnakeMake From line 1526 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 4 -o mean > {output}
    """

SnakeMake BEDTools From line 1540 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') | cat annotations/dummy4_nchr.bed - ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4 -o mean; done < {input.bed}) > {output}
    """

SnakeMake BEDTools From line 1554 of master/Snakefile

shell:
    "tabix {input.anno} -R {input.merg} | cat annotations/dummy5_nchr.bed - | bedtools map -a {input.bed} -b - -c 4,4 -o max,sum > {output}"

SnakeMake BEDTools tabix From line 1570 of master/Snakefile

shell:
    "paste {input} > {output}"

SnakeMake From line 1579 of master/Snakefile

shell:
    """
    (while read -r line; do bedtools map -b <(cat annotations/dummy_chrhmm.bed; tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3+1}}') ) -a <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') -c 4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28 -o max; done < {input.bed}) > {output}
    """

SnakeMake BEDTools tabix From line 1592 of master/Snakefile

shell:
    """
    bedtools coverage -a {input.bed} -b {input.anno} -counts  > {output}
    """

SnakeMake BEDTools From line 1609 of master/Snakefile

shell:
    """
    bedtools map -a {input.bed} -b {input.anno} -c 5 -o max > {output}
    """

SnakeMake BEDTools From line 1623 of master/Snakefile

shell:
    """
    (while read -r line; do paste <(echo $line | awk 'BEGIN{{ OFS="\t" }}{{ print $1,$2,$3}}') <(tabix {input.anno} $(echo $line | awk '{{ print $1":"$2"-"$3}}') | awk 'BEGIN{{ maxVal="." }}{{ if ((maxVal == ".") || ($4 > maxVal)) {{ maxVal=$4 }} }}END{{ print maxVal }}'); done < {input.bed}) > {output}
    """

SnakeMake From line 1637 of master/Snakefile
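
The loop above builds a `chrom:start-end` region string for each BED line, queries the indexed annotation with `tabix`, and keeps the largest fourth-column value, falling back to `.` when the region returns no records. The sketch below isolates those two steps in Python. The names `tabix_region` and `region_max` are illustrative assumptions, and the scores are passed in as a plain list rather than read from a real tabix query.

```python
from typing import Iterable, Union


def tabix_region(chrom: str, start: int, end: int) -> str:
    """Format a region string in the chrom:start-end form that tabix accepts."""
    return f"{chrom}:{start}-{end}"


def region_max(scores: Iterable[float]) -> Union[float, str]:
    """Largest score in the region, or "." if the region has no records."""
    max_val: Union[float, str] = "."
    for s in scores:
        # The first record always replaces the "." placeholder.
        if max_val == "." or s > max_val:
            max_val = s
    return max_val


if __name__ == "__main__":
    print(tabix_region("chr1", 1000, 2000))
    print(region_max([0.1, 0.7, 0.3]))
    print(region_max([]))
```

Starting from a `.` placeholder instead of zero matters when all scores are negative, since a zero default would otherwise be reported as the maximum.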

shell:
    """
    paste <(cut -f1-11 {input.cadd}) <(cut -f4 {input.cadd2}) <(cut -f4 {input.ccr}) <(cut -f4-28 {input.chromHMM}) <(cut -f4,5,7 {input.ctcf}) <(cut -f1 {input.di_min}) <(cut -f1 {input.di_max}) <(cut -f4,5,9,10,14,15,19,20,24,25,29,30,34,35,39,40,44,45,49,50,54,55,59,60,64,65 {input.encode}) <(cut -f4-7 {input.ep}) <(cut -f4,5,9,10,14,15,19,20,24,25 {input.fire}) <(cut -f4 {input.gc}) <(cut -f4-11 {input.gm}) <(cut -f4 {input.gerp}) <(cut -f4 {input.gerp2}) <(cut -f4,5,6,7,11,12,13,14,18,19,20,21,25,26,27,28 {input.hic}) <(cut -f4-7 {input.hesc}) <(cut -f4-7 {input.microsyn}) <(cut -f4 {input.mpc}) <(cut -f4 {input.pli}) <(cut -f4,5,6 {input.g_dist}) <(cut -f4 {input.remapTF}) <(cut -f4 {input.f5}) <(cut -f4 {input.hi}) <(cut -f4 {input.deepc}) <(cut -f4,5,7 {input.ultrac}) <(cut -f4 {input.linsight})| cat {input.header} - > {output}
    """

SnakeMake From line 1676 of master/Snakefile

shell:
    """
    Rscript --vanilla scripts/scoring.R {params.name} {input.span} {input.flank_up} {input.flank_down} {output}
    """

SnakeMake From line 1699 of master/Snakefile

shell:
    """
    bedtools sort -i {input.score} | cat {input.header} - > {output}
    """

SnakeMake BEDTools From line 1714 of master/Snakefile
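
The final step above sorts the scored records and then prepends a header file. As a hedged illustration, the Python sketch below orders tab-separated BED lines by chromosome name and then by start position, which is the default ordering of `bedtools sort`, and places a header line in front. The function names and the example header are assumptions, not taken from the repository.

```python
from typing import List, Tuple


def sort_bed_lines(lines: List[str]) -> List[str]:
    """Sort BED lines by chromosome (as a string), then by numeric start."""
    def key(line: str) -> Tuple[str, int]:
        fields = line.split("\t")
        return fields[0], int(fields[1])
    return sorted(lines, key=key)


def with_header(header: str, lines: List[str]) -> List[str]:
    """Return the header followed by the sorted records."""
    return [header] + sort_bed_lines(lines)


if __name__ == "__main__":
    records = ["chr2\t5\t10\t0.1", "chr10\t7\t9\t0.3", "chr2\t1\t4\t0.2"]
    for line in with_header("#chr\tstart\tend\tscore", records):
        print(line)
```

Because chromosome names are compared as strings, `chr10` sorts before `chr2`. Sorting first and adding the header afterwards keeps the header out of the sort.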

Created: 1yr ago

Updated: 1yr ago

Maintainers: public

URL: https://cadd-sv.bihealth.org

Name: cadd-sv

Version: v1.1

License: MIT License

Keywords:

tabix BEDTools GFFutils Snakemake
