snake-IBDne

Repo for running custom snakemake for IBDne.

Citations for programs used in the pipeline:

merge-ibd-segments.16May19.ad5.jar and ibdne.04Sep15.e78.jar:

B L Browning and S R Browning (2013). Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194(2):459-71. doi:10.1534/genetics.113.150029

Running snake-IBDne

Project Folder

Please run each dataset in its own separate folder for clarity.

/share/hennlab/projects/snake-IBDNe/
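
For example, a fresh run folder for a new dataset can be set up like this (the "Himba" name below is just a placeholder):

mkdir -p /share/hennlab/projects/snake-IBDNe/Himba_run
cd /share/hennlab/projects/snake-IBDNe/Himba_run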

Required software:

  • snakemake

  • shapeit v2.r904

  • R 3.6.1

  • plink2 1.90p

  • GERMLINE2

  • ibdne.04Sep15.e78.jar (download from the Browning lab IBDNe web page)

  • merge-ibd-segments.16May19.ad5.jar

Setting up conda environment

  1. For Henn lab users: run this command every time before you use conda:
source /share/hennlab/progs/miniconda3/etc/profile.d/conda.sh
  2. To create the environment (first time only; after that, skip straight to step 3):
conda create -n IBDne-env r-base shapeit plink scipy python=2.7 r-biocmanager
  3. To activate the environment (every time you run the pipeline):
conda activate IBDne-env
  4. To install the required R packages (once only, when setting up the environment):
conda install -c bioconda bioconductor-genomicranges
conda install -c r r-data.table
  5. Once inside the environment, manually add GERMLINE2 to your path (before each run of the pipeline):
export PATH="/share/hennlab/progs/GERMLINE2-master:$PATH"
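
Putting the per-session steps together, every run of the pipeline is preceded by this recap of steps 1, 3 and 5 above:

source /share/hennlab/progs/miniconda3/etc/profile.d/conda.sh
conda activate IBDne-env
export PATH="/share/hennlab/progs/GERMLINE2-master:$PATH"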

Required files in working directory: (Please note - you must run each population in its own directory.)

  • data folder with input data:

    • if data is NOT phased yet: the data folder must contain bim, bed, fam files named in the format {dataset}.bim/.bed/.fam

    • if data IS phased already: the data folder must contain SHAPEIT-format .haps and .sample files for individual chromosomes, named in the format {dataset}.chr{chrnum}.phased.haps/.sample

  • regions folder with exclude_regions_hg19.txt

  • scripts folder with the following required scripts:

    • bits_matching.sh

    • calc_genome_length.R

    • calculate_concordance.sh

    • choose_params.sh

    • combine_chrom.sh

    • comb_match_files.sh

    • compare_IBD_kinship.R

    • compare_RoH.R

    • define_regions.sh

    • filter_RoH-IBD_segs.R

    • find_lowDensityRegs.R

    • get_outlier_regions.R

    • get_outlier_SNPs.R

    • get_seg_depthv3.R

    • join_RoHsegs.R

    • shapeit_to_germline.py

    • trim_bimfile.R

    • shapeit_iterate.sh

  • progs folder with the following java programs:

    • merge-ibd-segments.16May19.ad5.jar

    • ibdne.04Sep15.e78.jar (the version invoked by the regular Snakefile; ibdne_2versions.smk additionally requires ibdne.23Apr20.ae9.jar)

  • Snakefile

    • the file entitled "ibdne_2versions.smk" is an alternative snakefile for comparing different versions of ibdne. Please use "Snakefile" for regular runs of the pipeline
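
For reference, a minimal working directory might look like this (the dataset name Himba_merged is a hypothetical example):

Snakefile
data/Himba_merged.bim, .bed, .fam   (or data/Himba_merged.chr{chrnum}.phased.haps/.sample if already phased)
regions/exclude_regions_hg19.txt
scripts/   (all scripts listed above)
progs/merge-ibd-segments.16May19.ad5.jar, ibdne.04Sep15.e78.jar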

How to run:

The snakemake command must be run with four config parameters:

  • dataset: filename of initial QC'ed bim/bed/fam files (without file extension) ex: if the files are named "Himba_merged.bim" then set dataset=Himba_merged

  • phased: FALSE if input data is not phased, TRUE if input data is phased.

  • gmap_chr_dir: directory containing plink-format recombination maps for separate chromosomes.

    • filenames MUST be named in the format chr{chrnum}.gmap.txt

    • remember to put a / at the end of the path, for example: /share/hennlab/reference/snake-IBDne_maps/

  • ref: file prefix of the reference population 1000G .hap.gz, .legend, .sample files, excluding the _chr{chrnum} portion

    • example: if the files are titled 1000GP_Phase3/1000GP_Phase3_chr6.hap.gz, 1000GP_Phase3/1000GP_Phase3_chr6.legend and 1000GP_Phase3/1000GP_Phase3.sample, provide the command line argument as ref=1000GP_Phase3/1000GP_Phase3. Note that the .hap.gz and .legend files must be named with _chr{number}.hap.gz and _chr{number}.legend following the prefix you provide in the ref command line option.

    • Data can be downloaded from this link: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html

    • For Henn lab users: location of these files is /share/hennlab/reference/1000G_Phase3_haps-sample-legend/
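
In general, the invocation takes the following shape (the paths below are placeholders to substitute with your own):

snakemake --config dataset={dataset} phased={TRUE|FALSE} gmap_chr_dir=/path/to/gmaps/ ref=/path/to/1000GP_Phase3_prefix -p -j {n_cores}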

Example:

## Dry run - testing workflow
/share/hennlab/progs/miniconda3/bin/snakemake -n --config dataset=xal phased=TRUE ref=/share/hennlab/reference/1000G_Phase3_haps-sample-legend/1000GP_Phase3/1000GP_Phase3 gmap_chr_dir=/share/hennlab/projects/snake-IBDNe/austin_files/ -p -j 10
## Generate DAG of pipeline
/share/hennlab/progs/miniconda3/bin/snakemake --config dataset=xal phased=TRUE ref=/share/hennlab/reference/1000G_Phase3_haps-sample-legend/1000GP_Phase3/1000GP_Phase3 gmap_chr_dir=/share/hennlab/projects/snake-IBDNe/austin_files/ --rulegraph | dot -Tpng > rulegraph.png
## Run the pipeline!
/share/hennlab/progs/miniconda3/bin/snakemake --config dataset=xal phased=TRUE ref=/share/hennlab/reference/1000G_Phase3_haps-sample-legend/1000GP_Phase3/1000GP_Phase3 gmap_chr_dir=/share/hennlab/projects/Xal_snake-IBDne/austin_files/ -p -j 20

Pipeline Overview

(Figure: DAG / rule graph of the pipeline; it can be regenerated with the --rulegraph command shown above.)

Running ibdne_2versions.smk

This snakefile will output results from two versions of IBDne (ibdne.23Apr20.ae9.jar and ibdne.04Sep15.e78.jar) for comparison. To run this version, add the flag -s ibdne_2versions.smk to the snakemake command, as in the example below.
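
For example, reusing the paths from the run command above:

/share/hennlab/progs/miniconda3/bin/snakemake -s ibdne_2versions.smk --config dataset=xal phased=TRUE ref=/share/hennlab/reference/1000G_Phase3_haps-sample-legend/1000GP_Phase3/1000GP_Phase3 gmap_chr_dir=/share/hennlab/projects/snake-IBDNe/austin_files/ -p -j 20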

Code Snippets

scripts/bits_matching.sh (lines 1-16):

DATASET=$1
BITS_RANGE=`seq 5 5 200` 
ERRHOM_RANGE=`seq 1 2`


echo Pop hap/notHap bits err_hom err_het diff/total match/total > results/other/${DATASET}_GL_IBD_3500kb.txt


for BITS in ${BITS_RANGE[@]}
  do
    for ERRHOM in ${ERRHOM_RANGE[@]}
      do
        grep -i $DATASET results/RoH_param_combs/${BITS}/IBD_${BITS}_${ERRHOM}_hap_segsJoined_QCed.ibd > results/RoH_param_combs/${BITS}/${DATASET}_IBD_${BITS}_${ERRHOM}_hap_segsJoined_QCed.ibd
        echo $DATASET hap $BITS $ERRHOM NA `Rscript scripts/compare_RoH.R results/calc_RoH/${DATASET}_RoH_segsJoined_QCed.hom results/RoH_param_combs/${BITS}/IBD_${BITS}_${ERRHOM}_hap_segsJoined_QCed.ibd 3500` >> results/other/${DATASET}_GL_IBD_3500kb.txt
      done
  done
scripts/calculate_concordance.sh (lines 1-14):

DATASET=$1
BITS_RANGE=`seq 5 5 200`
ERRHOM_RANGE=`seq 1 2`

echo 'BITS ERRHOM ERRHET NRMSE' > results/other/${DATASET}_NormRMSE.txt
GENOME_LENGTH=`Rscript scripts/calc_genome_length.R results/other/${DATASET}_HWE-MAFfiltered.bim`

for BITS in ${BITS_RANGE[@]}
  do
    for ERRHOM in ${ERRHOM_RANGE[@]}
      do
        echo $BITS $ERRHOM NA `Rscript scripts/compare_IBD_kinship.R results/RoH_param_combs/${BITS}/IBD_${BITS}_${ERRHOM}_hap_segsJoined_QCed.ibd results/other/${DATASET}.genome $GENOME_LENGTH` >> results/other/${DATASET}_NormRMSE.txt
      done
  done
scripts/choose_params.sh (lines 7-40):

IBD_3500=$1
NormRMSE=$2
P=$3

paste -d" " <(awk '{print $0}' $IBD_3500) <(awk '{print $4}' $NormRMSE) > temp
awk '{print $1, $2, $3, $4, $5, $6, $7, $8, (1-$6 + $7 + 1-$8)/3}' temp > temp2
sed -r '1s/\S+//9' temp2 > temp
sed -i '1s/$/score/' temp
# Look only at haploid
mv temp ${P}_GL_IBD_3500kb_onlyHap.txt; rm temp2

head -n 1 ${P}_GL_IBD_3500kb_onlyHap.txt > header.txt
sed '1d' ${P}_GL_IBD_3500kb_onlyHap.txt > data.txt

sort -nk9 data.txt > sorted_data.txt ; rm data.txt
cat header.txt sorted_data.txt > ${P}_GL_IBD_3500kb_onlyHap.txt ; rm header.txt sorted_data.txt

cat ${P}_GL_IBD_3500kb_onlyHap.txt

PARAMS=`tail -n 1 ${P}_GL_IBD_3500kb_onlyHap.txt | cut -f 2-9 -d" "`
PARAMS=(${PARAMS//,/ })
echo $PARAMS
TYPE=${PARAMS[0]}   # hap/notHap column
BITS=${PARAMS[1]}
ERRHOM=${PARAMS[2]}
PERC_DIFF=${PARAMS[4]}
PERC_MATCH=${PARAMS[5]}
KIN_RESIDS=${PARAMS[6]}
SCORE=${PARAMS[7]}

echo -e $P GERMLINE2 parameters"\n"Type: ${TYPE}"\n"Bits: ${BITS}"\n"Err_hom: ${ERRHOM}"\n"Percent difference: ${PERC_DIFF}"\n"Percent match: ${PERC_MATCH}"\n"Kin residuals: ${KIN_RESIDS}"\n"Score: ${SCORE} > results/other/${P}_GL_parameters.txt
# awk -v Pop="$P" -v IGNORECASE=1 '$1 ~ Pop && $3 ~ Pop {print}' RoH_param_combs/${BITS}/IBD_${BITS}_${ERRHOM}_hap_segsJoined_QCed.ibd | cut -f1-7,9 > ${P}_only_QCed.ibd


cat results/RoH_param_combs/${BITS}/IBD_${BITS}_${ERRHOM}_hap_segsJoined_QCed.ibd > results/other/${P}_only_QCed.ibd
scripts/comb_match_files.sh (lines 1-15):

BITS_RANGE=`seq 5 5 200` # GERMLINE2 IBD segment seed
ERRHOM_RANGE=`seq 1 2`
CHRS=`seq 1 22`
DATASET=$1

for BITS in ${BITS_RANGE[@]}
  do 
    for ERRHOM in ${ERRHOM_RANGE[@]}
      do 
        for chr in ${CHRS[@]}
          do
            cat results/RoH_param_combs/${BITS}/${DATASET}.chr${chr}.phased_GERMLINE2_RoH_${BITS}_${ERRHOM}_hap.match >> results/RoH_param_combs/${BITS}/IBD_${BITS}_${ERRHOM}_hap.match
          done 
        done 
    done
scripts/define_regions.sh (lines 1-13):

FILE=$1
DATASET=$2
BITS=$(echo $FILE | cut -d '_' -f5)
ERRHOM=$(echo $FILE | cut -d '_' -f6)

N=`wc -l $FILE | cut -f1 -d" "`
if [ $N -gt 0 ]
  then
    Rscript scripts/get_outlier_regions.R results/RoH_param_combs/${BITS}/GERMLINE2_IBDdepth_${BITS}_${ERRHOM}_hap_IBDoutliers 3e6 2.5e5 > results/RoH_param_combs/${BITS}/GERMLINE2_IBDdepth_${BITS}_${ERRHOM}_hap_IBDoutliers_regions
    cat regions/exclude_regions_hg19.txt results/RoH_param_combs/${BITS}/GERMLINE2_IBDdepth_${BITS}_${ERRHOM}_hap_IBDoutliers_regions  results/calc_RoH/${DATASET}_lowDensityRegions.txt > results/RoH_param_combs/${BITS}/exclude_regions_hg19_forIBDNe_${BITS}_${ERRHOM}_hap.txt
  else
    cat regions/exclude_regions_hg19.txt results/calc_RoH/${DATASET}_lowDensityRegions.txt > results/RoH_param_combs/${BITS}/exclude_regions_hg19_forIBDNe_${BITS}_${ERRHOM}_hap.txt
  fi
scripts/filter_RoH-IBD_segs.R (lines 11-65):

args = commandArgs(trailingOnly=TRUE)

# Defaults
perc.overlap <- 0.95

# Error handling and updating default values
if (length(args)<2) {
  stop("At least two arguments must be supplied", call.=FALSE)
} else if (length(args)==3) {
  perc.overlap <- as.numeric(args[3])
} else if (length(args)>3) {
  stop("Too many arguments", call.=FALSE)
}

options(stringsAsFactors=F)
fileNameSplit <- strsplit(args[1], split=".", fixed=T)[[1]]
ext.ind <- which(fileNameSplit %in% c("match", "hom", "ibd"))
stem <- paste(fileNameSplit[-ext.ind], collapse=".")
ext <- fileNameSplit[ext.ind]

if (ext=="hom") {
  seg.file <- read.table(args[1], header=T)
  colnames(seg.file)[7:8] <- c("start", "end")
} else if (ext=="match") {
  seg.file <- read.table(args[1])
  if (ncol(seg.file)==13) {
    colnames(seg.file)[3:5] <- c("CHR", "start", "end")
  } else if (ncol(seg.file)==15) {
    colnames(seg.file)[5:7] <- c("CHR", "start", "end")
  }
} else if (ext=="ibd") {
  seg.file <- read.table(args[1])
  colnames(seg.file)[5:7] <- c("CHR", "start", "end")
}

excl.regs <- read.table(args[2])
  colnames(excl.regs) <- c("CHR", "start", "end", "type")

suppressMessages(library("GenomicRanges"))
Excl.Regs <- makeGRangesFromDataFrame(excl.regs[,c("CHR", "start", "end")], seqnames.field="CHR")
Seg.File <- makeGRangesFromDataFrame(seg.file[ ,c("CHR", "start", "end")], seqnames.field="CHR")

Hits <- findOverlaps(Seg.File, Excl.Regs) # find all hits
Overlaps <- pintersect(Seg.File[queryHits(Hits)], Excl.Regs[subjectHits(Hits)])
percentOverlap <- width(Overlaps)/width(Seg.File[queryHits(Hits)])
HighOverlapHits <- Hits[percentOverlap > perc.overlap] # Which have high overlap? Some will be removed

# Drop high-overlap segments; guard against the no-hit case, where an empty
# negative index would otherwise drop every row
if (length(HighOverlapHits) > 0) {
  new.seg.file <- seg.file[-queryHits(HighOverlapHits),]
} else {
  new.seg.file <- seg.file
}

if (ext=="hom") {
  COLNAMES <- c(colnames(new.seg.file)[1:6], "POS1", "POS2", colnames(new.seg.file)[9:13])
  write.table(new.seg.file, "", row.names=F, col.names=COLNAMES, quote=F, sep="\t")
} else {
  write.table(new.seg.file, "", row.names=F, col.names=F, quote=F, sep="\t")
}
scripts/find_lowDensityRegs.R (lines 12-74):

args = commandArgs(trailingOnly=TRUE)


# Defaults
perc.thresh <- 0.05

# Error handling and updating default values
if (length(args)<3) {
  stop("At least three arguments must be supplied", call.=FALSE)
} else if (length(args)==4) {
  perc.thresh <- as.numeric(args[4])
} else if (length(args)>4) {
  stop("Too many arguments", call.=FALSE)
}

options(stringsAsFactors=F)
Bim <- read.table(args[1])
WinSize <- as.numeric(args[3])

fileNameSplit <- strsplit(args[1], split=".", fixed=T)[[1]]
ext.ind <- match("bim", fileNameSplit)
stem <- paste(fileNameSplit[-ext.ind], collapse=".")

genome.LDRs <- as.data.frame(matrix(ncol=3, nrow=0))
for (chr in unique(Bim$V1)) {
  # Calculate SNP density across the chromosome in windows
  chr.data <- Bim[Bim$V1==chr,]
  start <- chr.data[1, "V4"]
  end <- chr.data[nrow(chr.data), "V4"]
  Hist <- hist(chr.data$V4, breaks=seq(start, end+WinSize, WinSize), plot=F)
  Counts <- Hist$counts
  Breaks <- Hist$breaks
  # Store count information per region
  chrWinInfo <- as.data.frame(matrix(ncol=4, nrow=length(Counts)))
  colnames(chrWinInfo) <- c("chr", "start", "end", "nSNPs")
  chrWinInfo$chr <- chr
  chrWinInfo$start <- Breaks[1:length(Counts)]
  chrWinInfo$end <- Breaks[2:length(Breaks)]
  chrWinInfo$nSNPs <- Counts
  # Find low density regions
  min.SNPs <- max(sort(Counts)[length(Counts)*perc.thresh], 1/5e4*WinSize)
  low.dens.wins <- chrWinInfo[which(chrWinInfo$nSNPs < min.SNPs), 1:3]
  # If low density regions are found join continguous windows into regions
  if (nrow(low.dens.wins) > 0) {
    Low.Density.Regions <- low.dens.wins[1, ]
    if (nrow(low.dens.wins) > 1) {
      for (x in 2:nrow(low.dens.wins)) {
        if (low.dens.wins[x, "start"] == Low.Density.Regions[nrow(Low.Density.Regions), "end"]) {
          Low.Density.Regions[nrow(Low.Density.Regions), "end"] <- low.dens.wins[x, "end"]
        } else {
          Low.Density.Regions[nrow(Low.Density.Regions)+1, ] <- low.dens.wins[x, ]
        }
      }
    }
    # Append only when low-density regions were found on this chromosome;
    # otherwise Low.Density.Regions would be undefined or stale from a previous chromosome
    genome.LDRs <- rbind(genome.LDRs, Low.Density.Regions)
  }
}

genome.LDRs <- cbind(genome.LDRs, "low_density_region")

out_pre=args[2]

write.table(genome.LDRs, paste(out_pre, "_lowDensityRegions.txt", sep=""), quote=F, row.names=F, col.names=F, sep="\t")
scripts/get_outlier_regions.R (lines 11-69):

args = commandArgs(trailingOnly=TRUE)

# Defaults
maxRange <- 3e6
buffer <- 2e5

# Error handling and updating default values
if (length(args)<1) {
  stop("At least one argument must be supplied", call.=FALSE)
} else if (length(args)>3) {
  stop("Too many arguments", call.=FALSE)
}
if (length(args)>=2) maxRange <- as.numeric(args[2])
if (length(args)==3) buffer <- as.numeric(args[3])

FILE <- read.table(args[1], stringsAsFactors=F)

find.IBDoutlier.range <- function(pileup.outliers, maxRange, buffer) {
  outlier.regions <- as.data.frame(matrix(ncol=4, nrow=0))
    colnames(outlier.regions) <- c("CHR", "START", "END", "TYPE")
  # Split into chromosomes and sort the outlier values (going along the chromosome)
  chrs <- unique(pileup.outliers[,1])
  outliers.by.chr <- list()
  for (i in 1:length(chrs)) {
    outliers.by.chr[[i]] <- pileup.outliers[pileup.outliers[,1]==chrs[i],]
    outliers.by.chr[[i]] <- outliers.by.chr[[i]][order(outliers.by.chr[[i]][,2]),2]
  }
  for (i in 1:length(chrs)) {
    # If there is only one outlier SNP, define as the region +/- the buffer around that SNP
    if (length(outliers.by.chr[[i]])==1) {
      outlier.range <- c(outliers.by.chr[[i]][1] - buffer, outliers.by.chr[[i]][1] + buffer)
      outlier.regions[nrow(outlier.regions)+1,] <- c(chrs[i], outlier.range, "depth_outlier")
    }
    # If there is more than one outlier SNP, initiate a new region that begins with the first SNP
    if (length(outliers.by.chr[[i]])>1) {
      outlier.range <- c()
      outlier.range[1] <- outliers.by.chr[[i]][1] - buffer
      # Find endpoints of the region
      endpoint.inds <- which(outliers.by.chr[[i]][1:((length(outliers.by.chr[[i]]))-1)] - outliers.by.chr[[i]][-1] < -maxRange)
      endpoint.inds <- c(endpoint.inds, length(outliers.by.chr[[i]]))
      for (n in 1:length(endpoint.inds)) {
        if (n==length(endpoint.inds)) {
          outlier.range[2] <- outliers.by.chr[[i]][endpoint.inds[n]] + buffer
          outlier.regions[nrow(outlier.regions)+1,] <- c(chrs[i], outlier.range, "depth_outlier")
        } else {
          outlier.range[2] <- outliers.by.chr[[i]][endpoint.inds[n]] + buffer
          outlier.regions[nrow(outlier.regions)+1,] <- c(chrs[i], outlier.range, "depth_outlier")
          outlier.range[1] <- outliers.by.chr[[i]][endpoint.inds[n]+1] - buffer
        }
      }
    }
  }
  return(outlier.regions)
}

results <- find.IBDoutlier.range(FILE, maxRange, buffer)
write.table(results, "", quote=F, col.names=F, row.names=F, sep=" ")
scripts/get_outlier_SNPs.R (lines 9-24):

args = commandArgs(trailingOnly=TRUE)
n.sds = 3

if (length(args)<1) {
  stop("One argument must be supplied", call.=FALSE)
} else if (length(args)>2) {
  stop("Too many arguments", call.=FALSE)
} else if (length(args)==2) {
  n.sds = as.numeric(args[2])
}

depth.file <- read.table(args[1], sep=" ", header=F)
upper <- mean(depth.file[,3]) + (n.sds * sd(depth.file[,3]))
outliers <- depth.file[which(depth.file[,3] >= upper),]

write.table(outliers, "", quote=F, col.names=F, row.names=F)
scripts/get_seg_depthv2.R (lines 10-63):

args = commandArgs(trailingOnly=TRUE)

# Check for correct number of arguments
if (length(args)<2) {
  stop("At least two arguments must be supplied", call.=FALSE)
} else if (length(args)>2) {
  stop("Too many arguments", call.=FALSE)
}

# Define functions
pkgTest <- function(x){
  if (!require(x,character.only = TRUE)){
    install.packages(x,dep=TRUE)
    if(!require(x,character.only = TRUE)) stop("Package not found")
  }
}

# Call packages
pkgTest("data.table")

# Read in data
options(stringsAsFactors=F)

fileName <- unlist(strsplit(args[1], split=".", fixed=TRUE))
ext <- fileName[length(fileName)]
if (ext == "hom") {
  seg.file <- read.table(args[1], header=T)
  colnames(seg.file)[4]<-"chr"
  colnames(seg.file)[7:8]<-c("segment.start", "segment.end")
} else if (ext == "ibd") {
  seg.file <- read.table(args[1], header=F)
  colnames(seg.file)[5:7] <- c("chr", "segment.start", "segment.end")
} else if (ext == "match") {
  seg.file <- read.table(args[1], header=F)
  if (ncol(seg.file)==13) {
    colnames(seg.file)[3:5] <- c("chr", "segment.start", "segment.end")
  } else if (ncol(seg.file)==15) {
    colnames(seg.file)[5:7] <- c("chr", "segment.start", "segment.end")
  }
}

bim <- read.table(args[2])
colnames(bim)[1:4] <- c("chr","rsid","dist","pos")


setDT(seg.file)
setDT(bim)
bim[, pos2 := pos ]
setkey(seg.file, chr, segment.start, segment.end)
setkey(bim, chr, pos, pos2)
newdepthcounts<-foverlaps(bim, seg.file) [, .(count = sum(!is.na(segment.start))), by = .(chr,pos, pos2) ][, pos2 := NULL ]
colnames(newdepthcounts)<-c("CHR","BP","COUNT")

write.table(newdepthcounts, "", quote=F, row.names=F, col.names=F, sep=" ")
scripts/get_seg_depthv3.R (lines 20-80):

args = commandArgs(trailingOnly=TRUE)
# Check for correct number of arguments
if (length(args)<2) {
  stop("At least two arguments must be supplied", call.=FALSE)
} else if (length(args)>2) {
  stop("Too many arguments", call.=FALSE)
}

# Define functions
pkgTest <- function(x){
  if (!require(x,character.only = TRUE, quietly = T)){
    install.packages(x,dep=TRUE)
    if(!require(x,character.only = TRUE, quietly = T)) stop("Package not found")
  }
}

# Call packages
pkgTest("data.table")
pkgTest("GenomicRanges")

# Read in data
options(stringsAsFactors=F)

fileName <- unlist(strsplit(args[1], split=".", fixed=TRUE))
ext <- fileName[length(fileName)]
if (ext == "hom") {
  seg.file <- fread(args[1], header=T)
  colnames(seg.file)[4]<-"chr"
  colnames(seg.file)[7:8]<-c("segment.start", "segment.end")
} else if (ext == "ibd") {
  seg.file <- fread(args[1], header=F)
  colnames(seg.file)[5:7] <- c("chr", "segment.start", "segment.end")
} else if (ext == "match") {
  seg.file <- fread(args[1], header=F)
  if (ncol(seg.file)==13) {
    colnames(seg.file)[3:5] <- c("chr", "segment.start", "segment.end")
  } else if (ncol(seg.file)==15) {
    colnames(seg.file)[5:7] <- c("chr", "segment.start", "segment.end")
  }
}

bim <- fread(args[2])
colnames(bim)[1:4] <- c("chr","rsid","dist","start")
bim[, end := start ]

#set key columns for data.tables
setkey(seg.file, chr, segment.start, segment.end)
setkey(bim, chr, start, end)

#calculate depth counts
#assign data.tables to GRanges objects
GRseg <- makeGRangesFromDataFrame(seg.file) 
GRbim <- makeGRangesFromDataFrame(bim)
#make new dataframe combining bim site info with counts
newdepthcounts <- as.data.frame(bim[,c("chr","start")])
#count number of segments that overlap sites in bim file assign to new column
newdepthcounts$count <- countOverlaps(GRbim, GRseg, type = "any") 
colnames(newdepthcounts)<-c("CHR","BP","COUNT") #assign final column names


write.table(newdepthcounts, "", quote=F, row.names=F, col.names=F, sep=" ") #write to out
scripts/join_RoHsegs.R (lines 12-196):

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!requireNamespace("GenomicRanges", quietly = TRUE))
    BiocManager::install("GenomicRanges")


#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)

# Defaults
perc.thresh <- 0.05
perc.overlap <- 0.8

# Error handling and updating default values
if (length(args)<3) {
  stop("At least three arguments must be supplied", call.=FALSE)
} else if (length(args)==4) {
  perc.thresh <- as.numeric(args[4])
} else if (length(args)==5) {
  perc.thresh <- as.numeric(args[4])
  perc.overlap <- as.numeric(args[5])
} else if (length(args)>5) {
  stop("Too many arguments", call.=FALSE)
}

options(stringsAsFactors=F)
Bim <- read.table(args[1])
WinSize <- as.numeric(args[3])

# Determine type of input segment file
cat("Determining input file type... ")
fileNameSplit <- strsplit(args[2], split=".", fixed=T)[[1]]
ext.ind <- match("hom", fileNameSplit)

if (length(fileNameSplit) > 1 & !is.na(ext.ind)) {
  stem <- paste(fileNameSplit[-ext.ind], collapse=".")
  ext <- "hom"
} else {
  stem <- args[2]
  ext="garlic"
}

if (ext=="hom") {
  Seg <- read.table(args[2], header=T)
  if (ncol(Seg)==13) {
    fileType <- "PLINK"
    colnames(Seg)[9] <- "LENGTH"
  }
  cat(fileType, "file type detected\n")
} else {
  Seg <- read.table(args[2], skip=1)
  Seg <- cbind(Seg, rep("placeholder_ID", nrow(Seg)))
  if (ncol(Seg)==10) {
    fileType <- "GARLIC"
    cat("WARNING: GARLIC file type detected - header line dropped\n")
    colnames(Seg)[c(1,2,3,4,5,10)] <- c("CHR","POS1","POS2","CLASS","LENGTH","IID")
    for (i in 1:22) {
      Seg[which(Seg$CHR==paste("chr", i, sep="")), "CHR"] <- i
    }
    Seg$CHR <- as.numeric(Seg$CHR)
    class.ranges <- list(); length(class.ranges) <- 3; names(class.ranges) <- c("A","B","C")
    for (class in c("A","B","C")) {
      class.ranges[[class]] <- range(as.numeric(Seg[which(Seg$CLASS==class), "LENGTH"]))
    }
    class.ranges[["A"]][2] <- mean(class.ranges[["A"]][2], class.ranges[["B"]][1])
    class.ranges[["B"]][1] <- mean(class.ranges[["A"]][2], class.ranges[["B"]][1]) + 0.5
    class.ranges[["B"]][2] <- mean(class.ranges[["B"]][2], class.ranges[["C"]][1]) 
    class.ranges[["C"]][1] <- mean(class.ranges[["B"]][2], class.ranges[["C"]][1]) + 0.5
  }
}

# Create copy of input file - eventual output file
Seg.new <- Seg

genome.SNP.count <- c()
chr.hists <- list()
  length(chr.hists) <- length(unique(Bim$V1))
  names(chr.hists) <- unique(Bim$V1)

for (chr in unique(Bim$V1)) {
  chr.data <- Bim[Bim$V1==chr,]
  start <- chr.data[1, "V4"]
  end <- chr.data[nrow(chr.data), "V4"]
  chr.hists[[as.character(chr)]] <- hist(chr.data$V4, breaks=seq(start, end+WinSize, WinSize), plot=F)
  genome.SNP.count <- c(genome.SNP.count, chr.hists[[as.character(chr)]]$counts)
}
min.SNPs <- max(sort(genome.SNP.count)[length(genome.SNP.count)*perc.thresh], 1/5e4*WinSize)

suppressMessages(library("GenomicRanges"))
for (chr in sort(unique(Seg$CHR))) {
  cat("Starting chr", chr, "\n")
  # Calculate SNP density across the chromosome in windows
  Hist <- chr.hists[[as.character(chr)]]
  Counts <- Hist$counts
  Breaks <- Hist$breaks
  # Store count information per region
  if (fileType=="PLINK") {
    chrWinInfo <- as.data.frame(matrix(ncol=4, nrow=length(Counts)))
      colnames(chrWinInfo) <- c("chr", "start", "end", "nSNPs")
  } else if (fileType=="GARLIC") {
    chrWinInfo <- as.data.frame(matrix(ncol=3, nrow=length(Counts)))
      colnames(chrWinInfo) <- c("chr", "start", "end")
  }
  chrWinInfo$chr <- chr
  chrWinInfo$start <- Breaks[1:length(Counts)]
  chrWinInfo$end <- Breaks[2:length(Breaks)]
  chrWinInfo$nSNPs <- Counts
  # Find low density regions
  low.dens.wins <- chrWinInfo[which(chrWinInfo$nSNPs < min.SNPs), 1:3]
  # If low density regions are found join continguous windows into regions
  if (nrow(low.dens.wins) > 0) {
    Low.Density.Regions <- low.dens.wins[1, ]
    if (nrow(low.dens.wins) > 1) {
      for (x in 2:nrow(low.dens.wins)) {
        if (low.dens.wins[x, "start"] == Low.Density.Regions[nrow(Low.Density.Regions), "end"]) {
          Low.Density.Regions[nrow(Low.Density.Regions), "end"] <- low.dens.wins[x, "end"]
        } else {
          Low.Density.Regions[nrow(Low.Density.Regions)+1, ] <- low.dens.wins[x, ]
        }
      }
    }
    Low.Density.Ranges <- makeGRangesFromDataFrame(Low.Density.Regions)
    # Join segments in input file
    for (Ind in unique(Seg$IID)) {
      chrSegInd <- Seg[Seg$CHR==chr&Seg$IID==Ind,]
        chrSegInd <- chrSegInd[order(chrSegInd$POS1),]
      if (nrow(chrSegInd)>1) {
        RoH.gaps <- as.data.frame(matrix(ncol=3,nrow=0))
          colnames(RoH.gaps) <- c("chr","start","end")
        for (y in 2:max(2, nrow(chrSegInd))) {
          RoH.gaps[y-1, ] <- c(chr, chrSegInd[y-1, "POS2"], chrSegInd[y, "POS1"])
        }
        RoH.gaps <- RoH.gaps[which(RoH.gaps$end > RoH.gaps$start),]
        if (nrow(RoH.gaps) > 0) {
          RoH.gap.Ranges <- makeGRangesFromDataFrame(RoH.gaps)
          # Do any gaps overlap a low density region?
          Hits <- findOverlaps(RoH.gap.Ranges, Low.Density.Ranges) # find all hits
          Overlaps <- pintersect(Low.Density.Ranges[subjectHits(Hits)], RoH.gap.Ranges[queryHits(Hits)])
          percentOverlap <- width(Overlaps) / width(RoH.gap.Ranges[queryHits(Hits)]) # calculate overlap
          Hits <- Hits[percentOverlap > perc.overlap] # sufficient overlap?
          joinOverGaps <- RoH.gaps[queryHits(Hits),]
          if (nrow(joinOverGaps)>0) {
            for (z in 1:nrow(joinOverGaps)) {
              seg1 <- rownames(chrSegInd[which(chrSegInd$POS2 == joinOverGaps[z, "start"]),])[1]
              seg2 <- rownames(chrSegInd[which(chrSegInd$POS1 == joinOverGaps[z, "end"]),])[1]
              joinedSeg <- chrSegInd[as.character(seg1),]
              joinedSeg$POS2 <- chrSegInd[as.character(seg2),"POS2"]
              joinedSeg$LENGTH <- chrSegInd[as.character(seg1),"LENGTH"] + chrSegInd[as.character(seg2),"LENGTH"]
              if (fileType=="PLINK") {
                joinedSeg$SNP2 <- chrSegInd[as.character(seg2),"SNP2"]
                joinedSeg$NSNP <- chrSegInd[as.character(seg1),"NSNP"] + chrSegInd[as.character(seg2),"NSNP"]
              } else if (fileType=="GARLIC") {
                if (joinedSeg$LENGTH < class.ranges[["A"]][1]) {
                  joinedSeg$CLASS <- "A"
                } else if (joinedSeg$LENGTH >= class.ranges[["A"]][1] & joinedSeg$LENGTH <= class.ranges[["A"]][2]) {
                  joinedSeg$CLASS <- "A"
                } else if (joinedSeg$LENGTH >= class.ranges[["B"]][1] & joinedSeg$LENGTH <= class.ranges[["B"]][2]) {
                  joinedSeg$CLASS <- "B"
                } else if (joinedSeg$LENGTH >= class.ranges[["C"]][1] & joinedSeg$LENGTH <= class.ranges[["C"]][2]) {
                  joinedSeg$CLASS <- "C"
                } else if (joinedSeg$LENGTH > class.ranges[["C"]][2]) {
                  joinedSeg$CLASS <- "C"
                }
              }
              Seg.new <- rbind(Seg.new, joinedSeg)
              # Remove the two split segments from the Seg.joined object
              Seg.new <- Seg.new[-match(as.character(seg1), rownames(Seg.new)),]
              Seg.new <- Seg.new[-match(as.character(seg2), rownames(Seg.new)),]
            }
          }
        }
      }
    }
  }
}

if (fileType=="PLINK") {
  colnames(Seg.new)[9] <- "KB"
  write.table(Seg.new, paste(stem, "_segsJoined.hom", sep=""), quote=F, row.names=F, col.names=T)
} else if (fileType=="GARLIC") {
  Seg.new <- Seg.new[,-ncol(Seg.new)]
  Seg.new$CHR <- paste("chr", Seg.new$CHR, sep="")
  write.table(Seg.new, paste(stem, "_segsJoined.garlic", sep=""), quote=F, row.names=F, col.names=F, sep="\t")
}
scripts/shapeit_iterate.sh (lines 1-36):

INPUT=$1
MAP=$2
LOG=$3
OUTPUT=$4
REF_HAP=$5
REF_LEG=$6
REF_SAMP=$7

# ShapeIT phasing and conversion to plink files
#----------------------------------------------
# Divide plink file by chromosome, trim SNPs that lie outside the boundaries of the genetic map

Rscript scripts/trim_bimfile.R ${INPUT} ${MAP}
plink --bfile ${INPUT} --exclude ${INPUT}_removeSNPs --make-bed --out ${INPUT}_endsTrimmed
#cat ${FILE}.chr${chr}_endsTrimmed.bim >> dat_QCed_endsTrimmed.bim

# Run SHAPEIT
shapeit -B ${INPUT}_endsTrimmed -M ${MAP} --input-ref ${REF_HAP} ${REF_LEG} ${REF_SAMP} --duohmm -W 5 -O ${OUTPUT} --output-log ${LOG}.log -T 4

# Re-run shapeit until it stops emitting a .snp.strand file (i.e. no remaining strand problems)
if [ -e ${LOG}.snp.strand ]; then
  N=`ls -l ${LOG}.snp.strand | wc -l`
  while [ $N -gt 0 ]
  do
      grep Missing ${LOG}.snp.strand | cut -f4 > ${INPUT}_missing_from_ref.txt
      grep Strand ${LOG}.snp.strand | awk '$11==1 {print $4}' > ${INPUT}_flip_in_dataset.txt
      grep Strand ${LOG}.snp.strand | awk '$11==0 {print $4}' > ${INPUT}_cannot_flip_in_dataset.txt
      cat ${INPUT}_missing_from_ref.txt ${INPUT}_cannot_flip_in_dataset.txt > ${INPUT}_remove_from_dataset.txt
      rm -f ${LOG}.snp.strand*
      rm -f ${INPUT}_missing_from_ref.txt ${INPUT}_cannot_flip_in_dataset.txt
      plink --bfile ${INPUT}_endsTrimmed --flip ${INPUT}_flip_in_dataset.txt --exclude ${INPUT}_remove_from_dataset.txt --make-bed --out ${INPUT}_endsTrimmed_misalignRepaired
      shapeit -B ${INPUT}_endsTrimmed_misalignRepaired -M ${MAP} --input-ref ${REF_HAP} ${REF_LEG} ${REF_SAMP} --duohmm -W 5 -O ${OUTPUT} --output-log ${LOG}.log -T 4
      N=`ls -l ${LOG}.snp.strand | wc -l`
  done
fi
scripts/shapeit_to_germline.py (lines 3-63):

from numpy import *
from scipy.interpolate import interp1d
from sys import argv

# use, e.g.:
# python shapeit_to_germline.py shapeitoutputprefix geneticmapfile
# python shapeit_to_germline.py Ethiopians_chr21_SHAPEITphased_duoHMM genetic_map_chr21_combined_b37.txt

prefix = argv[-2]
hapsfile = prefix + '.haps'
samplefile = prefix + '.sample'

hapmapfile = argv[-1]

famdata = [line.strip().split() for line in file(samplefile)][2:]

# read in haplotypes
print 'reading in haps data...'
hapsdata = [line.strip().split() for line in file(hapsfile)]
print 'splitting haps data'
mapinfo = array([line[:3] for line in hapsdata])
variants = array([line[3:5] for line in hapsdata])
alleles = array([line[5:] for line in hapsdata],dtype=int)


#hapsdata = array([line.strip().split()[:3] for line in file(hapsfile)])
bps = array( mapinfo[:,2] , dtype=int)

# read in hapmap file
mapdata = array([line.strip().split() for line in file(hapmapfile)][2:],dtype=float)
happos = mapdata[:,0]
hapcms = mapdata[:,2]

# now create interpolation function
f = interp1d(happos,hapcms,fill_value='extrapolate')

bpcms = f(bps)

# make map file
outmap = file(prefix + '.map' , 'w')
print 'writing mapfile to %s' % (prefix + '.map')
for i, line in enumerate(mapinfo):
	outmap.write('%s\t%s\t%s\t%s\n' % (line[0],line[1],bpcms[i],line[2]))


# make ped file. More annoying for sure!
# brute force for now, hopefully it runs okay (not huge files)
outped = file(prefix + '.ped' , 'w')
print 'writing pedfile to %s' % (prefix + '.ped')
alleles_acgt = array([variants[i][alleles[i]] for i in range(len(variants))])
for i, line in enumerate(famdata):
	if i % 20 == 0:
		print 'on individual %s' % (i)
	# zip together haplotypes 2*i and 2*i+1 so they're on the same line
	plinkalleles = array(ndarray.flatten(array(zip(alleles_acgt[:,2*i],alleles_acgt[:,2*i+1] ) ) ), dtype='S1')
	outped.write('%s %s\n' % ( ' '.join(line[:2] + line[3:]) , ' '.join(plinkalleles)))

outmap.close()
outped.close()

print 'done'
Snakefile, lines 81-84:

shell:
  """
  plink --bfile data/{wildcards.dataset} --chr {wildcards.chrnum} --make-bed --out {params}
  """
Snakefile, lines 99-102:

shell:
  """
  bash scripts/shapeit_iterate.sh {params.in_pre} {params.inputmap} {params.log_pre} {params.out_pre} {REF}_chr{wildcards.chrnum}.hap.gz {REF}_chr{wildcards.chrnum}.legend.gz {REF}.sample
  """
Snakefile, lines 114-117:

shell:
  """
  python scripts/shapeit_to_germline.py {params.prefix} {params.gmap}
  """
Snakefile, lines 129-132:

shell:
  """
  shapeit -convert --input-haps {params.in_pre} --output-vcf {output.vcf} --output-log {output.log}
  """
Snakefile, lines 141-145:

shell:
  """
  sed -n -e '/^#/ p' prepare/chroms_phased_vcf/{wildcards.dataset}.chr1.phased.vcf > {output} #initialize allchr.vcf with headers
  for i in {input}; do sed '/^#/d' $i >> {output} ; done
  """
Snakefile, lines 154-157:

shell:
  """
  plink --vcf {input} --recode --out {params}
  """
Snakefile, lines 168-171:

shell:
  """
  plink --file {params.in_pre} --cm-map {params.gmap} --recode --out {params.out_pre}
  """
Snakefile, lines 182-185:

shell:
  """
  plink --vcf {input} --make-just-bim --out {params.out_pre}
  """
Snakefile, lines 196-199:

shell:
  """
  Rscript scripts/find_lowDensityRegs.R {input} {params} 1e6 0.05
  """
Snakefile, lines 210-213:

shell:
  """
  gawk ' {{ t = $1; $1 = $2; $2 = t; print; }} ' {input} > {output}
  """
Snakefile, lines 223-226:

shell:
  """
  GERMLINE2 -mapfile {input.map} -pedfile {input.ped} -outfile {params.prefix} -err_hom {wildcards.err} -bits {wildcards.bits} -w_extend -haploid
  """
Snakefile, lines 233-236:

shell:
  """
  scripts/comb_match_files.sh {DATASET}
  """
Snakefile, lines 243-246:

shell:
  """
  awk '{{print $1 , "0" , $2 , "0" , $3 , $4 , $5, "20"}}' {input} > {output}
  """
Snakefile, lines 255-258:

shell:
  """
  cat {input.ibd} | java -jar progs/merge-ibd-segments.16May19.ad5.jar {input.vcf} {input.map} 0.6 1 | awk '{{print $1, $2, $3, $4, $5, $6, $7, $9}}' > {output}
  """
Snakefile, lines 267-270:

shell:
  """
  Rscript scripts/get_seg_depthv3.R {input.ibd} {input.bim} > {output}
  """
Snakefile, lines 277-280:

shell:
  """
  Rscript scripts/get_outlier_SNPs.R {input} 3 > {output}
  """
Snakefile, lines 287-290:

shell:
  """
  bash scripts/define_regions.sh {input} {DATASET}
  """
Snakefile, lines 298-301:

shell:
  """
  Rscript scripts/filter_RoH-IBD_segs.R {input.ibd} {input.txt} 0.85 > {output}
  """
Snakefile, lines 317-320:

shell:
  """
  plink --bfile {params.in_pre} --homozyg --homozyg-snp {N_SNPS} --homozyg-window-missing {MISSING} --homozyg-window-het {HET} --homozyg-kb {RoH_LEN} --out {params.out_pre}
  """
Snakefile, lines 330-333:

shell:
  """
  Rscript scripts/join_RoHsegs.R {input.bim} {input.hom} 1e6 0.05 0.8
  """
Snakefile, lines 349-353:

shell:
  """
  Rscript scripts/find_lowDensityRegs.R {input} {params.out_pre} 1e6 0.05
  cat regions/exclude_regions_hg19.txt {output.ldr} > {output.reg}
  """
Snakefile, lines 363-366:

shell:
  """
  Rscript scripts/get_seg_depthv2.R {input.hom} {input.bim} > {output}
  """
Snakefile, lines 378-383:

shell:
  """
  Rscript scripts/get_outlier_SNPs.R {input.roh} > {output.outliers}
  Rscript scripts/get_outlier_regions.R {output.outliers} > {output.regions}
  cat {input.txt} {output.regions} > {output.txt}
  """
Snakefile, lines 393-396:

shell:
  """
  Rscript scripts/filter_RoH-IBD_segs.R {input.hom} {input.txt} 0.85 > {output}
  """
Snakefile, lines 408-411:

shell:
  """
  bash scripts/bits_matching.sh {DATASET}
  """
Snakefile, lines 427-431:

shell:
  """
  plink --bfile {params.pre_one} --maf {MAF} --make-bed --out {params.pre_two}_HWE-MAFfiltered
  plink --bfile {params.pre_two}_HWE-MAFfiltered --genome --out {params.pre_two}
  """
Snakefile, lines 446-449:

shell:
  """
  bash scripts/calculate_concordance.sh {DATASET}
  """
Snakefile, lines 464-467:

shell:
  """
  bash scripts/choose_params.sh {input.ibd} {input.norm} {wildcards.dataset}
  """
Snakefile, lines 476-479:

shell:
  """
  cat {input.ibd} | java -jar progs/ibdne.04Sep15.e78.jar map={input.map} out=results/IBDne_output/{wildcards.dataset}_GERMLINE2_IBDNe_minibd-{cM} minibd={cM} nthreads={IBDne_THREADs} nboots=80 gmax={GMAX}
  """

Repository: https://github.com/hennlab/snake-IBDne