Flashlite-Juicer: Juicer Implementation for FlashLite HPC

public 1yr ago Version: Version 1 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

juicer

PBS version of https://github.com/aidenlab/juicer for Flashlite HPC.

If you find this useful, please acknowledge us, e.g:

The authors acknowledge code provided the Sydney Informatics Hub, a Core Research Facility of the University of Sydney.

Simple Intructions

To replicate the workflow from https://github.com/aidenlab/juicer/wiki/Running-Juicer-on-a-cluster follow these steps:

#Change to a directory you want to install and run your juicer analyses, e.g.
cd /30days/natbutter
#Clone this repo to that folder
git clone https://github.com/natbutter/juicer.git
#Change into the repo directory
cd juicer
#Link the PBS scripts for ease (as is done on the official repo)
ln -s PBS/scripts scripts
#Make and get references and restriction_site files (put your own here as needed)
mkdir references; cd references
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.amb
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.ann
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.bwt
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.pac
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.sa
mkdir ../restriction_sites; cd ../restriction_sites
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/restriction_sites/hg19_MboI.txt
#Get the juicer tools file to run with your version (if you need a different one to the repo)
cd ../scripts
wget https://hicfiles.tc4ga.com/public/juicer/juicer_tools.1.9.9_jcuda.0.8.jar
ln -s juicer_tools.1.9.9_jcuda.0.8.jar juicer_tools.jar
#Make a directory where you will be running your analysis (and populating with output files)
mkdir ../HIC003; cd ../HIC003
mkdir fastq; cd fastq
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R1_001.fastq.gz
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R2_001.fastq.gz

Now edit the scripts/juicer.sh script to point to the correct project, and adjust other configuration options. Note , you may need to adjust some of the hardcoded ram/cpu resource requests throughout the workflow in addition to those fixed in the config header lines.

juiceDir="/30days/natbutter/juicer"
project='NCMAS-xx99'

Finally, run the script which will launch all the jobs and submit them to the queue:

cd /30days/natbutter/juicer/HIC003
../scripts/juicer.sh

Directory structure

This is what my directory structrue looked like after setting everything up and running (to two levels deep):

natbutter@flashlite1:/30days/natbutter/juicer> tree -L 2
.
|-- HIC003
| |-- aligned
| |-- fastq
| |-- logs
| `-- splits
|-- PBS
| `-- scripts
|-- README.md
|-- references
| |-- Homo_sapiens_assembly19.fasta
| |-- Homo_sapiens_assembly19.fasta.amb
| |-- Homo_sapiens_assembly19.fasta.ann
| |-- Homo_sapiens_assembly19.fasta.bwt
| |-- Homo_sapiens_assembly19.fasta.pac
| `-- Homo_sapiens_assembly19.fasta.sa
|-- restriction_sites
| `-- hg19_MboI.txt
`-- scripts -> PBS/scripts

Significant changes and updates to get running on the Flashlite HPC

https://rcc.uq.edu.au/flashlite

Changes:

Added $project variable and spceifed in every qsub with #PBS -A $project
nodes=1:ppn:1 -> select=1:ncpus=1
Changed greps of "qstat" to cut at the correct place for Flashlite PBS
Hardcoded a few mem= and ncpus= that may need to adjusted for specific runs
Using local versions module load bwa/0.7.13 and module load Java/1.8.0_45
No GPUs on Flashlite, so the HiCCUPs step is not tested.

Original README for the PBS verison below

Very new to coding and this is my first "big" project in modifying a script. So welcome to make it better!

This PBS version is modified mainly based on the LSF version, but also took other versions as reference. Main changed to create this version is build the job dependencies to guarantee the sequential steps of each job in the original juicer.sh script. #PBS -W depend=afterok:jobID headerline is used. The launch stats steps and post processing steps are moved into two separate scripts to avoid the variable value problems (expansion) in multiple level of heredocs. These two scripts will be called from the main juicer.sh at the appropriate step. As a summary, the follwing scripts have been modified: juicer.sh split_rmdups.awk mega.sh The following two scripts has been added (extracted from the juicer.sh) launch_stats.sh postprocessing.sh

All steps has been tested till the ARROWHEAD step. *HiCCUP step has not been tested in this version because of some un-resolved error in loading Jcuda native libraries.

One important notice for successful run of this PBS version: The $groupname variable has to be maximally 7 characters long ** This length may change according to your cluster setup. See below for the reason. The jobs' dependency of this PBS version juicer is based on jobID of each step. And the jobID of each job is obtained through the qstat output based on the specific job Name, jobID_stepx= $(qstat |grep |cut -c 1-7). Unfortunately, PBS can not use job Name in building job dependency. The job Name column in qstat output has a maximum length of 16 characters by default. When the job Name exceeds this max length, the beginning part of the job Name string will be replaced by “…” and then followed by the last 13 characters (see examples below). If the job name is not fully displayed, the jobID value may be null due to the failure at grep step. Each job Name contains the $groupname, which gives each run specific timestamp and avoid's disruption from other runs of juicer from the same user. A potential cause of this problem is the ${groupname} value being too long. I have added comments in the juicer.sh script for setting this variable. Below is a example of job names being too long: $qstat Job ID Name User Time Use S Queue

2294690.pbs STDIN mzhibo 00:00:15 R batch
2294778.pbs make4Cbedgraph mzhibo 0 R batch
2294784.pbs ...dgraph1111111 mzhibo 0 Q batch
$

Zhibo

Code Snippets

res1=`cat ${splitdir}/*norm*res*`
check1=`cat ${splitdir}/*norm*res* | awk '{s2+=$2; s3+=$3; s4+=$4; s5+=$5; s6+=$6}END{if (s2 != s3+s4+s5+s6){print 0}else{print 1}}'`
if [ $check1 -eq 0 ] || [ -z "$res1" ]
then
    echo "***! Error! The statistics do not add up. Alignment likely failed to complete on one or more files. Run relaunch_prep.sh"
    echo "Stats don't add up.  Check ${outputdir} for results"
    exit 1
fi

# Check the sizes of merged_sort versus the dups/no dups files to be sure
# no reads were lost
total=1
total2=0
total=`ls -l ${outputdir}/merged_sort.txt | awk '{print $5}'`
total2=`ls -l ${outputdir}/merged_nodups.txt ${outputdir}/dups.txt ${outputdir}/opt_dups.txt | awk '{sum = sum + $5}END{print sum}'`

if [ -z $total ] || [ -z $total2 ] || [ $total -ne $total2 ]
then
    echo "***! Error! The sorted file and dups/no dups files do not add up, or were empty. Merge or dedupping likely failed, restart pipeline with -S merge or -S dedup"
    echo "Dups don't add up.  Check ${outputdir} for results"
    exit 1
fi

wctotal=`cat ${splitdir}/*_linecount.txt | awk '{sum+=$1}END{print sum/4}'`
check2=`cat ${splitdir}/*norm*res* | awk '{s2+=$2;}END{print s2}'`

if [ $wctotal -ne $check2 ]
then
    echo "***! Error! The number of reads in the fastqs (${wctotal}) is not the same as the number of reads reported in the stats (${check2}), likely due to a failure during alignment"
    echo "Reads don't add up.  Check ${outputdir} for results"
    exit 1
fi

if [ -n "$early" ]
then
    echo "(-: Pipeline successfully completed (-:";
    echo "Run cleanup.sh to remove the splits directory";
    echo "Check ${outputdir} for results"
elif [ -f ${outputdir}/inter.hic ] && [ -s ${outputdir}/inter.hic ] && [ -f ${outputdir}/inter_30.hic ] && [ -s ${outputdir}/inter_30.hic ]
then
    echo "(-: Pipeline successfully completed (-:";
    echo "Run cleanup.sh to remove the splits directory";
    echo "Check ${outputdir} for results"
else
    echo "***! Error! Either inter.hic or inter_30.hic were not created"
    echo "Either inter.hic or inter_30.hic were not created.  Check ${outputdir} for results"
    exit 1
fi

Shell From line 31 of scripts/check.sh

total=`ls -l aligned/merged_sort.txt | awk '{print $5}'`
total2=`ls -l aligned/merged_nodups.txt aligned/dups.txt aligned/opt_dups.txt | awk '{sum = sum + $5}END{print sum}'`
if [ $total -eq $total2 ] 
then 
    rm aligned/merged_sort.txt 
    rm -r splits 
    for i in fastq/*.fastq
    do
        gzip $i
    done
    gzip aligned/merged_nodups.txt
    gzip aligned/dups.txt
    gzip aligned/opt_dups.txt
    gzip aligned/abnormal.sam
    gzip aligned/collisions.txt
    gzip aligned/unmapped.sam
else 
    echo "Problem: The sum of merged_nodups and the dups files is not the same size as merged_sort.txt"
    echo "Did NOT clean up";
fi

Shell From line 28 of scripts/cleanup.sh

if [ "$usegzip" -eq 1 ]
then 
    num1=$(paste -d "" <(zcat ${name1}${ext}) <(zcat ${name2}${ext}) | grep -c $ligation)
    num2=$(zcat ${name1}${ext} | wc -l | awk '{print $1}')
else
    num1=$(paste -d "" ${name1}${ext} ${name2}${ext}| grep -c $ligation)
    num2=$(wc -l ${name1}${ext} | awk '{print $1}')
fi
echo -ne "$num1 " > ${name}${ext}_norm.txt.res.txt
echo "$num2" > ${name}${ext}_linecount.txt

Shell From line 28 of scripts/countligations.sh

use POSIX;

$site_file = "/opt/juicer/restriction_sites/hg19_DpnII.txt";
# Check arguments
if (scalar(@ARGV) == 2) {
  ($infile,$outfile) = @ARGV;
}
elsif (scalar(@ARGV) == 3) {
  ($infile,$outfile,$site_file) = @ARGV;
}
else {
  print "Usage: fragment_4dnpairs.pl <infile> <outfile> [site file]\n";
  print " <infile>: file in intermediate format to calculate statistics on\n";
  print " <outfile>: output, results of fragment search\n";  
  print " [site file]: list of restriction sites, one line per chromosome (default DpnII hg19)\n";
  exit;
}
# Global variables for calculating statistics
my %chromosomes;
my %hindIII;

# read in restriction site file and store as multidimensional array
open FILE, $site_file or die $!;

while (<FILE>) {
  my @locs = split;
  my $key = shift(@locs);
  my $ref = \@locs;
  $chromosomes{$key} = $ref;
	if ($key == "14") {
		$chromosomes{$key."m"} = $ref;  
		$chromosomes{$key."p"} = $ref;
	}
}
close(FILE);

# read in infile and calculate statistics
open INFILE, $infile or die $!;
open OUTFILE,">", $outfile or die $!;
$"="\t";
while (<INFILE>) {
  chomp;
  if(/^#columns:\s*/) { 
      if(/frag1/ || /frag2/) { die "frag columns already exist. Aborting..\n"; }
      print OUTFILE "$_ frag1 frag2\n"; next; 
  }
  elsif(/^#/) { print OUTFILE "$_\n"; next; }
  my @original_record = split;
  my @record = @original_record[5,1,2,6,3,4];

  # find upper index of position in sites array via binary search
  my $index1 = &bsearch($record[2],$chromosomes{$record[1]});
  my $index2 = &bsearch($record[5],$chromosomes{$record[4]});
  print OUTFILE "@original_record\t$index1\t$index2\n";
}
close(INFILE);
close(OUTFILE);

# Binary search, array passed by reference
# search array of integers a for given integer x
# return index where found or upper index if not found
sub bsearch {
  my ($x, $a) = @_;            # search for x in array a
  my ($l, $u) = (0, @$a - 1);  # lower, upper end of search interval
  my $i;                       # index of probe
  while ($l <= $u) {
    $i = int(($l + $u)/2);
    if ($a->[$i] < $x) {
      $l = $i+1;
    }
    elsif ($a->[$i] > $x) {
      $u = $i-1;
    } 
    else {
      return $i+1; # found
    }
  }
  return $l;         
}

Perl From line 36 of scripts/fragment_4dnpairs.pl

use POSIX;

$site_file = "/opt/juicer/restriction_sites/hg19_DpnII.txt";
# Check arguments
if (scalar(@ARGV) == 2) {
  ($infile,$outfile) = @ARGV;
}
elsif (scalar(@ARGV) == 3) {
  ($infile,$outfile,$site_file) = @ARGV;
}
else {
  print "Usage: fragment.pl <infile> <outfile> [site file]\n";
  print " <infile>: file in intermediate format to calculate statistics on\n";
  print " <outfile>: output, results of fragment search\n";  
  print " [site file]: list of restriction sites, one line per chromosome (default DpnII hg19)\n";
  exit;
}
# Global variables for calculating statistics
my %chromosomes;
my %hindIII;

# read in restriction site file and store as multidimensional array
open FILE, $site_file or die $!;

while (<FILE>) {
  my @locs = split;
  my $key = shift(@locs);
  my $ref = \@locs;
  $chromosomes{$key} = $ref;
	if ($key == "14") {
		$chromosomes{$key."m"} = $ref;  
		$chromosomes{$key."p"} = $ref;
	}
}
close(FILE);

# read in infile and calculate statistics
open INFILE, $infile or die $!;
open OUTFILE,">", $outfile or die $!;

while (<INFILE>) {
  my @record = split;

  # find upper index of position in sites array via binary search
  my $index1 = &bsearch($record[2],$chromosomes{$record[1]});
  my $index2 = &bsearch($record[5],$chromosomes{$record[4]});
  print OUTFILE $record[0] . " " . $record[1] . " " . $record[2] . " " . $index1 . " " . $record[3] . " " . $record[4] . " " . $record[5] . " " . $index2 . " ";
	for (my $i=6; $i < scalar(@record); $i++){
		print OUTFILE $record[$i] . " ";
	}
	print OUTFILE "\n";
}
close(INFILE);
close(OUTFILE);

# Binary search, array passed by reference
# search array of integers a for given integer x
# return index where found or upper index if not found
sub bsearch {
  my ($x, $a) = @_;            # search for x in array a
  my ($l, $u) = (0, @$a - 1);  # lower, upper end of search interval
  my $i;                       # index of probe
  while ($l <= $u) {
    $i = int(($l + $u)/2);
    if ($a->[$i] < $x) {
      $l = $i+1;
    }
    elsif ($a->[$i] > $x) {
      $u = $i-1;
    } 
    else {
      return $i+1; # found
    }
  }
  return $l;         
}

Perl From line 37 of scripts/fragment.pl

usageHelp="Usage: ${0} [-h] -j <juicer_tools_file_path> -i <hic_file_path>"

printHelpAndExit() {
    echo "$usageHelp"
    exit $1
}

#set defaults
genomeID="hg19"
hic_file_path="$(pwd)/aligned/inter_30.hic"
juicer_tools_path="/opt/juicer/scripts/juicer_tools"

while getopts "h:j:i:" opt; do
    case $opt in
	h) printHelpAndExit 0;;
	j) juicer_tools_path=$OPTARG ;;
	i) hic_file_path=$OPTARG ;;
	[?]) printHelpAndExit 1;;
    esac
done

## Check that juicer tools exists 
if [ ! -e "${juicer_tools_path}" ]; then
  echo "***! Can't find juicer tools in ${juicer_tools_path}";
  exit 100;
fi

## Check that hic file exists    
if [ ! -e "${hic_file_path}" ]; then
  echo "***! Can't find inter.hic in ${hic_file_path}";
  exit 100;
fi

echo -e "${juicer_tools_path} is post-processing Hi-C for ${genomeID}\nData read from ${hic_file_path}.\nMotifs read from ${bed_file_dir}\n"
echo -e "ARROWHEAD:\n"
${juicer_tools_path} arrowhead ${hic_file_path} ${hic_file_path%.*}"_contact_domains.txt"
if [ $? -ne 0 ]; then
    echo "***! Problem while running Arrowhead";
    exit 100
else
    echo -e "\n(-: Arrowhead Postprocessing successfully completed (-:"
fi

Shell From line 30 of scripts/juicer_arrowhead.sh

usageHelp="Usage: ${0} [-h] -j <juicer_tools_file_path> -i <hic_file_path> -m <bed_file_dir> -g <genome ID>"

printHelpAndExit() {
    echo "$usageHelp"
    exit $1
}

#set defaults
genomeID="hg19"
hic_file_path="$(pwd)/aligned/inter_30.hic"
juicer_tools_path="/opt/juicer/scripts/juicer_tools"
bed_file_dir="/opt/juicer/references/motif"

while getopts "h:g:j:i:m:" opt; do
    case $opt in
	h) printHelpAndExit 0;;
	j) juicer_tools_path=$OPTARG ;;
	i) hic_file_path=$OPTARG ;;
	m) bed_file_dir=$OPTARG ;; 
	g) genomeID=$OPTARG ;;
	[?]) printHelpAndExit 1;;
    esac
done

## Check that juicer_tools exists 
if [ ! -e "${juicer_tools_path}" ]; then
  echo "***! Can't find juicer tools in ${juicer_tools_path}";
  exit 100;
fi

## Check that hic file exists    
if [ ! -e "${hic_file_path}" ]; then
  echo "***! Can't find inter.hic in ${hic_file_path}";
  exit 100;
fi

## Check that bed folder exists    
if [ ! -e "${bed_file_dir}" ]; then
  echo "***! Can't find folder ${bed_file_dir}";
  exit 100;
fi

echo -e "\nHiCCUPS:\n"
${juicer_tools_path} hiccups ${hic_file_path} ${hic_file_path%.*}"_loops.txt"
if [ $? -ne 0 ]; then
    echo "***! Problem while running HiCCUPS";
    exit 100
fi

if [ -f ${hic_file_path%.*}"_loops.txt" ]
then
    echo -e "\nAPA:\n"
    ${juicer_tools_path} apa ${hic_file_path} ${hic_file_path%.*}"_loops.txt" "apa_results"
    echo -e "\nMOTIF FINDER:\n"
    ${juicer_tools_path} motifs ${genomeID} ${bed_file_dir} ${hic_file_path%.*}"_loops.txt"
    echo -e "\n(-: Feature annotation successfully completed (-:"
else
    # if loop lists do not exist but Juicer tools didn't return an error, likely 
    # too sparse
    echo -e "\n(-: Postprocessing successfully completed, maps too sparse to annotate (-:"
fi

Shell From line 30 of scripts/juicer_hiccups.sh

juiceDir="/lustre1/mzhibo/hic/apps/juicer"
# default queue, can also be set in options via -q
queue="batch"
# default queue time, can also be set in options via -Q
walltime="walltime=24:00:00"
# default long queue, can also be set in options via -l
long_queue="batch"
# default long queue time, can also be set in options via -L
long_walltime="walltime=120:00:00"
read1str="_R1"
read2str="_R2"
roupname="C$(date "+%s"|cut -c 6-11)"
## Default options, overridden by command line arguments
groupname=$(date +"%s" |cut -c 6-11)
# top level directory, can also be set in options
topDir=$(pwd)
# restriction enzyme, can also be set in options
site="DpnII"
# genome ID, default to human, can also be set in options
genomeID="dm6_clean"
# normally both read ends are aligned with long read aligner; 
# if one end is short, this is set                 
shortreadend=0
# description, default empty
about=""
nofrag=0

# Directories to be created and regex strings for listing files
splitdir=${topDir}"/splits"
donesplitdir=$topDir"/done_splits"
fastqdir=${topDir}"/fastq/*_R*.fastq"
outputdir=${topDir}"/aligned"
tmpdir=${topDir}"/HIC_tmp"
logdir=${topDir}"/logs"
read1=${splitdir}"/*${read1str}*.fastq"
    timestamp=$(date +"%s" | cut -c 4-10)
    qsub <<ALIGNWRAP
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l nodes=1:ppn=1:AMD
    #PBS -l mem=2gb
    #PBS -o ${logdir}/${timestamp}_alnwrap_${groupname}.log
    #PBS -j oe
    #PBS -N AlnWrap_${groupname}
    #PBS -M mzhibo@uga.edu
    #PBS -m a
    for i in ${read1}
    do

        countjobs=\$(( countjobs + 1 ))
        ext=\${i#*$read1str}
        name=\${i%$read1str*} 
        # these names have to be right or it'll break
        name1=\${name}${read1str}
        name2=\${name}${read2str}    
        jname=\$(basename "\$name")\${ext}
        usegzip=0
        if [ "\${ext}" == ".gz" ]
        then
            usegzip=1
        fi
       qsub <<- MRGALL
        #PBS -S /bin/bash
        #PBS -q $queue
        #PBS -l $long_walltime
        #PBS -l nodes=1:ppn=1:AMD
        #PBS -l mem=24gb
        #PBS -M mzhibo@uga.edu
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_\${jname}_merge_\${countjobs}_${groupname}.log
        #PBS -j oe
        #PBS -N Mrg_\${countjobs}${groupname}
        #PBS -v name=\${name}
        #PBS -v name1=\${name1}
        #PBS -v name2=\${name2}
        #PBS -v ext=\${ext}
        #PBS -v countjobs=\${countjobs}

        date +"%Y-%m-%d %H:%M:%S"
        export LC_ALL=C

        # sort read 1 aligned file by readname 
        # remove header, add read end indicator to read name
        awk -v name1=\${name1} -v ext=\${ext} -v name=\${name} -f ${juiceDir}/scripts/read1sort.awk \${name1}\${ext}_sort.sam > \${name1}\${ext}_sort1.sam

        #echo awk 'NF >= 11{ \$1 = \$1"/1";print}' \${name1}\${ext}_sort.sam > \${name1}\${ext}_sort1.sam'

        awk -v name1=\${name2} -v ext=\${ext} -v name=\${name} -f ${juiceDir}/scripts/read2sort.awk \${name1}\${ext}_sort.sam > \${name2}\${ext}_sort1.sam

        #awk '{if (NF>=11) {\$1=\$1"read2"; print $0}}' \${name1}\${ext}_sort.sam > \${name1}\${ext}_sort1.sam

        #awk 'NF >= 11{ \$1 = \$1"/2";print}' \${name2}\${ext}_sort.sam > \${name2}\${ext}_sort1.sam    
        # merge the two sorted read end files
        sort -T $tmpdir -k1,1 -m \${name1}\${ext}_sort1.sam \${name2}\${ext}_sort1.sam > \${name}\${ext}.sam
        if [ $? -ne 0 ]
        then
            echo "***! Failure during merge of read files"
            echo "Merge of \${name}\${ext}.sam failed"
            exit 1
        else
            # rm \${name1}\${ext}.sa* \${name2}\${ext}.sa* \${name1}\${ext}_sort*.sam \${name2}\${ext}_sort*.sam
            echo "\${name}\$next.sam created successfully."
        fi 
MRGALL
    wait

    jID_3=\$(qstat | grep "Mrg_\${countjobs}${groupname}" | cut -c 1-7 )
    echo "merging align1 and align2 \${coutjobs} id is \${jID_3}"
    echo "starting chimeric step after alignment"
    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<- CHIMERIC
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l nodes=1:ppn=1:AMD
    #PBS -l mem=24gb
    #PBS -M mzhibo@uga.edu
    #PBS -m a
    #PBS -o ${logdir}/\${timestamp}_\${jname}_chimeric_\${countjobs}_${groupname}.log
    #PBS -j oe
    #PBS -N Chmr_\${countjobs}${groupname}
    #PBS -W depend=afterok:\${jID_3}
    #PBS -v name=\${name}
    #PBS -v ext=\${ext}

    date +"%Y-%m-%d %H:%M:%S"
    export LC_ALL=C    
    # call chimeric_blacklist.awk to deal with chimeric reads; sorted file is sorted by read name at this point
    awk -v "fname1"=\${name}\${ext}_norm.txt -v "fname2"=\${name}\${ext}_abnorm.sam -v "fname3"=\${name}\${ext}_unmapped.sam -f ${juiceDir}/scripts/chimeric_blacklist.awk \${name}\${ext}.sam

    if [ \$? -ne 0 ]                                              
    then                                       
        echo "***! Failure during chimera handling of \${name}\${ext}"
        echo "Chimera handling of \${name}\${ext}.sam failed."
        exit 1                                                     
    fi  
    # if any normal reads were written, find what fragment they correspond to and store that
    if [ -e "\${name}\${ext}_norm.txt" ] && [ "$site" != "none" ] 
    then  
        ${juiceDir}/scripts/fragment.pl \${name}\${ext}_norm.txt \${name}\${ext}.frag.txt $site_file
    elif [ "$site" == "none" ]
    then
        awk '{printf("%s %s %s %d %s %s %s %d", \$1, \$2, \$3, 0, \$4, \$5, \$6, 1); for (i=7; i<=NF; i++) {printf(" %s",\$i);}printf("\n");}' \${name}\${ext}_norm.txt > \${name}\${ext}.frag.txt 
    else 
        echo "***! No \${name}\${ext}_norm.txt file created"
        echo "Creation of \${name}\${ext}_norm.txt failed. for results"
        exit 1
    fi
    if [ \$? -ne 0 ] 
    then 
        echo "***! Failure during fragment assignment of \${name}\${ext}"
        echo "Fragment assignment of \${name}\${ext}.sam failed."
        exit 1 
    fi
    # sort by chromosome, fragment, strand, and position
    sort -T $tmpdir -k2,2d -k6,6d -k4,4n -k8,8n -k1,1n -k5,5n -k3,3n \${name}\${ext}.frag.txt > \${name}\${ext}.sort.txt
    if [ \$? -ne 0 ]
    then  
        echo "***! Failure during sort of \${name}\${ext}"
        echo "Sort of \${name}\${ext}.frag.txt failed."
        exit 1                                                       
    else
        rm \${name}\${ext}_norm.txt \${name}\${ext}.frag.txt
    fi    
CHIMERIC
    wait

    jID_4=\$(qstat | grep "Chmr_\${countjobs}${groupname}" | cut -c 1-7)
    echo "chimeric \$countjobs id is \$jID_4"
    jobIDstring="\${jobIDstring}:\${jID_4}"
    echo "jobIDstring \$countjobs is \${jobIDstring}"

    # done looping over all fastq split files
    done


    # if error occored, we will kill the remaining jobs
    # output an error message of error detection and killing the remaining jobs
    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<- CKALIGNFAIL
    #PBS -S /bin/bash
    #PBS -q $queue  
    #PBS -l nodes=1:ppn=1:AMD
    #PBS -l mem=2gb
    #PBS -l $walltime
    #PBS -M mzhibo@uga.edu
    #PBS -m a
    #PBS -o ${logdir}/\${timestamp}_check_alnOK_${groupname}.log
    #PBS -j oe
    #PBS -W depend=afterok\${jobIDstring}
    #PBS -N AlnOK_${groupname}

    date +"%Y-%m-%d %H:%M:%S"
    echo "Sucess: All alignment jobs were successfully finished without failure!"
CKALIGNFAIL

    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<- CKALIGNFAILCLN
    #PBS -S /bin/bash
    #PBS -q $queue  
    #PBS -l select=1:ncpus=1:mem=4gb
    #PBS -l $walltime
    #PBS -o ${logdir}/\${timestamp}_alignfailclean_${groupname}.log
    #PBS -j oe
    #PBS -W depend=afternotok\${jobIDstring}
    #PBS -N Alncln_${groupname}

    date +"%Y-%m-%d %H:%M:%S"
    echo "Error with alignment jobs, deleting all remaining jobs of this pipeline."
    remrem=\$(qstat |grep "$groupname" |grep " Q \| H \| R " | awk 'BEGIN{FS=" "}{print $1}')                                                                echo $remrem                                                                                                                                             RemJob=\$(qstat |grep "$groupname" |grep " Q \| H \| R " | awk 'BEGIN{FS=" "}{print $1}'| cut -d '.' -f 1)                                               echo ${RemJob}  
    RemJob=\$(qstat |grep ${groupname} |grep " Q \| H \| R " | awk 'BEGIN{FS=" "}{print \$1}'| cut -c 1-7)
    qdel \$RemJob

CKALIGNFAILCLN

ALIGNWRAP
    #done fastq alignment && alignment jobs failure checking.

Shell qsub From line 1 of scripts/juicer_mergepart.sh

usageHelp="Usage: ${0} [-h] -j <juicer_tools_file_path> -i <hic_file_path> -m <bed_file_dir> -g <genome ID>"

printHelpAndExit() {
    echo "$usageHelp"
    exit $1
}

#set defaults
genomeID="9"
hic_file_path="$(pwd)/aligned/inter_30.hic"
juicer_tools_path="/opt/juicer/scripts/juicer_tools"
bed_file_dir="/opt/juicer/references/motif"

while getopts "h:g:j:i:m:" opt; do
    case $opt in
	h) printHelpAndExit 0;;
	j) juicer_tools_path=$OPTARG ;;
	i) hic_file_path=$OPTARG ;;
	m) bed_file_dir=$OPTARG ;; 
	g) genomeID=$OPTARG ;;
	[?]) printHelpAndExit 1;;
    esac
done

## Check that juicer_tools exists 
if [ ! -e "${juicer_tools_path}" ]; then
    echo "***! Can't find juicer tools in ${juicer_tools_path}";
    exit 1
fi

## Check that hic file exists    
if [ ! -e "${hic_file_path}" ]; then
    echo "***! Can't find inter.hic in ${hic_file_path}";
    exit 1
fi

echo -e "${juicer_tools_path} is post-processing Hi-C for ${genomeID}\nData read from ${hic_file_path}.\nMotifs read from ${bed_file_dir}\n"
echo -e "ARROWHEAD:\n"
${juicer_tools_path} arrowhead ${hic_file_path} ${hic_file_path%.*}"_contact_domains.txt"
if [ $? -ne 0 ]; then
    echo "***! Problem while running Arrowhead";
    exit 1
fi
echo -e "\nHiCCUPS:\n"
if hash nvcc 2>/dev/null 
then 
    ${juicer_tools_path} hiccups ${hic_file_path} ${hic_file_path%.*}"_loops.txt"
    if [ $? -ne 0 ]; then
	echo "***! Problem while running HiCCUPS";
	exit 1
    fi
else 
    echo "GPUs are not installed so HiCCUPs cannot be run";
fi

if [ -f ${hic_file_path%.*}"_loops.txt" ]
then
    echo -e "\nAPA:\n"
    ${juicer_tools_path} apa ${hic_file_path} ${hic_file_path%.*}"_loops.txt" "apa_results"
    ## Check that bed folder exists    
    if [ ! -e "${bed_file_dir}" ]; then
	echo "***! Can't find folder ${bed_file_dir}";
	echo "***! Not running motif finder";
    else
	echo -e "\nMOTIF FINDER:\n"
	${juicer_tools_path} motifs ${genomeID} ${bed_file_dir} ${hic_file_path%.*}"_loops.txt"
    fi
    echo -e "\n(-: Feature annotation successfully completed (-:"
else
  # if loop lists do not exist but Juicer Tools didn't return an error, likely 
  # too sparse
    echo -e "\n(-: Postprocessing successfully completed, maps too sparse to annotate or GPUs unavailable (-:"
fi

Shell From line 32 of scripts/juicer_postprocessing.sh

shopt -s extglob
juicer_version="1.5.6"

# timestamp the pipeline and report the juicer app version, and shell
date +"%Y-%m-%d %H:%M:%S"
echo "Juicer version:$juicer_version"
echo "$0 $@"

## Set the following variables to work with your system
# path additionals, make sure paths are correct for your system
#myPath=/bin:$PATH
# set global tmpdir so no problems with /var/tmp
## use cluster load commands:
#usePath=""
load_bwa="module load bwa/0.7.13"
load_java='module load Java/1.8.0_45'
#load_cluster=""
#load_coreutils=""
#load_cuda='module load cuda/7.5.18/gcc/4.4.7'

# Juicer directory, contains scripts/, references/, and restriction_sites/
# can also be set in options via -D
juiceDir="/30days/nathanielpeterbutterworth/juice/"
# default queue, can also be set in options via -q
queue="workq"
# default queue time, can also be set in options via -Q
walltime="walltime=24:00:00"
# default long queue, can also be set in options via -l
long_queue="workq"
# default long queue time, can also be set in options via -L
long_walltime="walltime=120:00:00"
# size to split fastqs. adjust to match your needs. 4000000=1M reads per split
# can also be changed via the -C flag
# give your email address to be used in #PBS -M to receive notifications when job error occurs.
# Must be either set with an email address or skipped
# This email is not included in the launch stat and postprocessing steps, add manually if needed
EMAIL='' #'#PBS -M nathaniel.butterworth@sydney.edu.au'
project='NCMAS-ch81' 
splitsize=90000000
# fastq files should look like filename_R1.fastq and filename_R2.fastq
# if your fastq files look different, change this value
read1str="_R1"
read2str="_R2"


# unique groupname for jobs submitted in each run. lab initial with an timestamp
# Length of $groupname in this PBS version needs to be no longer than 8 characters.
# Groupname Length limitatioin is critical, because jobID will be obtained through qstat output using jobName.
# The string being searched from qstat output is "jobstr_${groupname}", where "jobstr" is
# the job specific name string in the qsub -N option. max length of "jobstr" in this PBS version script is 7.
# our PBS cluster has an maximum jobName column width of 16.
# Change the $groupname max length accordingly based on your PBS cluster qstat output setup.
groupname="C$(date "+%s"|cut -c 6-7)"
## Default options, overridden by command line arguments

# top level directory, can also be set in options
topDir=$(pwd)
# restriction enzyme, can also be set in options
site="MboI"
# genome ID, default to human, can also be set in options
genomeID="hg19"
# normally both read ends are aligned with long read aligner;
# if one end is short, this is set
shortreadend=0
# description, default empty
about=""
nofrag=0

## Read arguments
usageHelp="Usage: ${0##*/} [-W group_list=genomeID] [-d topDir] [-q queue] [-l long queue] [-s site]\n                 [-a about] [-R end] [-S stage] [-p chrom.sizes path]\n                 [-y restriction site file] [-z reference genome file]\n                 [-C chunk size] [-D Juicer scripts directory]\n                 [-Q queue time limit] [-L long queue time limit] [-r] [-h] [-x]"
genomeHelp="* [genomeID] must be defined in the script, e.g. \"hg19\" or \"mm10\" (default \n  \"$genomeID\"); alternatively, it can be defined using the -z command"
dirHelp="* [topDir] is the top level directory (default\n  \"$topDir\")\n     [topDir]/fastq must contain the fastq files\n     [topDir]/splits will be created to contain the temporary split files\n     [topDir]/aligned will be created for the final alignment"
queueHelp="* [queue] is the queue for running alignments (default \"$queue\")"
longQueueHelp="* [long queue] is the queue for running longer jobs such as the hic file\n  creation (default \"$long_queue\")"
siteHelp="* [site] must be defined in the script, e.g.  \"HindIII\" or \"MboI\" \n  (default \"$site\")"
aboutHelp="* [about]: enter description of experiment, enclosed in single quotes"
shortHelp="* -r: use the short read version of the aligner, bwa aln\n  (default: long read, bwa mem)"
shortHelp2="* [end]: use the short read aligner on read end, must be one of 1 or 2 "
stageHelp="* [stage]: must be one of \"merge\", \"dedup\", \"final\", \"postproc\", or \"early\".\n    -Use \"merge\" when alignment has finished but the merged_sort file has not\n     yet been created.\n    -Use \"dedup\" when the files have been merged into merged_sort but\n     merged_nodups has not yet been created.\n    -Use \"final\" when the reads have been deduped into merged_nodups but the\n     final stats and hic files have not yet been created.\n    -Use \"postproc\" when the hic files have been created and only\n     postprocessing feature annotation remains to be completed.\n    -Use \"early\" for an early exit, before the final creation of the stats and\n     hic files"
pathHelp="* [chrom.sizes path]: enter path for chrom.sizes file"
siteFileHelp="* [restriction site file]: enter path for restriction site file (locations of\n  restriction sites in genome; can be generated with the script\n  misc/generate_site_positions.py)"
chunkHelp="* [chunk size]: number of lines in split files, must be multiple of 4\n  (default ${splitsize}, which equals $(awk -v ss=${splitsize} 'BEGIN{print ss/4000000}') million reads)"
scriptDirHelp="* [Juicer scripts directory]: set the Juicer directory,\n  which should have scripts/ references/ and restriction_sites/ underneath it\n  (default ${juiceDir})"
refSeqHelp="* [reference genome file]: enter path for reference sequence file, BWA index\n  files must be in same directory"
queueTimeHelp="* [queue time limit]: time limit for queue, i.e. -l 12:00 is 12 hours\n  (default ${walltime})"
longQueueTimeHelp="* [long queue time limit]: time limit for long queue, i.e. -l 168:00 is one week\n  (default ${long_walltime})"
excludeHelp="* -x: exclude fragment-delimited maps from hic file creation"
helpHelp="* -h: print this help and exit"

printHelpAndExit() {
    echo -e "$usageHelp"
    echo -e "$genomeHelp"
    echo -e "$dirHelp"
    echo -e "$queueHelp"
    echo -e "$longQueueHelp"
    echo -e "$siteHelp"
    echo -e "$aboutHelp"
    echo -e "$shortHelp"
    echo -e "$shortHelp2"
    echo -e "$stageHelp"
    echo -e "$pathHelp"
    echo -e "$siteFileHelp"
    echo -e "$refSeqHelp"
    echo -e "$chunkHelp"
    echo -e "$scriptDirHelp"
    echo -e "$queueTimeHelp"
    echo -e "$longQueueTimeHelp"
    echo "$excludeHelp"
    echo "$helpHelp"
    exit "$1"
}

while getopts "d:g:R:k:a:hrq:s:p:l:y:z:S:C:D:Q:L:x" opt; do
    case $opt in
    g) genomeID=$OPTARG ;;
    h) printHelpAndExit 0;;
    d) topDir=$OPTARG ;;
    l) long_queue=$OPTARG ;;
    q) queue=$OPTARG ;;
    s) site=$OPTARG ;;
    R) shortreadend=$OPTARG ;;
    r) shortread=1 ;;  #use short read aligner
    a) about=$OPTARG ;;
    p) genomePath=$OPTARG ;;
    y) site_file=$OPTARG ;;
    z) refSeq=$OPTARG ;;
    S) stage=$OPTARG ;;
    C) splitsize=$OPTARG ;;
    D) juiceDir=$OPTARG ;;
    Q) walltime=$OPTARG ;;
    L) long_walltime=$OPTARG ;;
    x) nofrag=1 ;; #no fragment maps
    [?]) printHelpAndExit 1;;
    esac
done

if [ ! -z "$stage" ]
then
    case $stage in
        merge) merge=1 ;;
        dedup) dedup=1 ;;
        early) earlyexit=1 ;;
        final) final=1 ;;
        postproc) postproc=1 ;;
        *)  echo "$usageHelp"
        echo "$stageHelp"
        exit 1
    esac
fi

## Set reference sequence based on genome ID
if [ -z "$refSeq" ]
then
    case $genomeID in
    mm9) refSeq="${juiceDir}/references/Mus_musculus_assembly9_norandom.fasta";;
    mm10) refSeq="${juiceDir}/references/Mus_musculus_assembly10.fasta";;
    hg38) refSeq="${juiceDir}/references/hg38.fa";;
    hg19) refSeq="${juiceDir}/references/Homo_sapiens_assembly19.fasta";;
    *)  echo "$usageHelp"
        echo "$genomeHelp"
        exit 1
    esac
else
    # Reference sequence passed in, so genomePath must be set for the .hic file
    # to be properly created
    if [ -z "$genomePath" ]
        then
        echo "***! You must define a chrom.sizes file via the \"-p\" flag that delineates the lengths of the chromosomes in the genome at $refSeq";
        exit 1;
    fi
fi

## Check that refSeq exists
if [ ! -e "$refSeq" ]; then
    echo "***! Reference sequence $refSeq does not exist";
    exit 1;
fi

## Check that index for refSeq exists
if [ ! -e "${refSeq}.bwt" ]; then
    echo "***! Reference sequence $refSeq does not appear to have been indexed. Please run bwa index on this file before running juicer.";
    exit 1;
fi

## Set ligation junction based on restriction enzyme
case $site in
    HindIII) ligation="AAGCTAGCTT";;
    DpnII) ligation="GATCGATC";;
    MboI) ligation="GATCGATC";;
    NcoI) ligation="CCATGCATGG";;
    none) ligation="XXXX";;
    *)  ligation="XXXX"
    echo "$site not listed as recognized enzyme. Using $site_file as site file"
    echo "Ligation junction is undefined"
    exit 1
esac

## If DNAse-type experiment, no fragment maps
if [ "$site" == "none" ]
then
    nofrag=1;
fi

## If short read end is set, make sure it is 1 or 2
case $shortreadend in
    0) ;;
    1) ;;
    2) ;;
    *)    echo "$usageHelp"
    echo "$shortHelp2"
    exit 1
esac

if [ -z "$site_file" ]
then
    site_file="${juiceDir}/restriction_sites/${genomeID}_${site}.txt"
fi

## Check that site file exists, needed for fragment number for merged_nodups
if [ ! -e "$site_file" ] && [ "$nofrag" -ne 1 ]
then
    echo "***! $site_file does not exist. It must be created before running this script."
    exit 1
fi

## Directories to be created and regex strings for listing files
splitdir=${topDir}"/splits"
donesplitdir=$topDir"/done_splits"
fastqdir=${topDir}"/fastq/*_R*.fastq*"
outputdir=${topDir}"/aligned"
tmpdir=${topDir}"/HIC_tmp"
logdir=${topDir}"/logs"

echo "dirs:"$fastqdir

## Check that fastq directory exists and has proper fastq files
if [ ! -d "${topDir}/fastq" ]; then
    echo "Directory \"${topDir}/fastq\" does not exist."
    echo "Create \"$topDir/fastq\" and put fastq files to be aligned there."
    echo "Type \"juicer.sh -h\" for help"
    exit 1
else
    if stat -t ${fastqdir} >/dev/null 2>&1
    then
    echo "(-: Looking for fastq files...fastq files exist"
    else
    if [ ! -d "$splitdir" ]; then
        echo "***! Failed to find any files matching ${fastqdir}"
        echo "***! Type \"juicer.sh -h \" for help"
        exit 1
    fi
    fi
fi

## Create output directory, only if not in merge, dedup, final, or postproc stages
if [[ -d "$outputdir" && -z "$final" && -z "$merge" && -z "$dedup" && -z "$postproc" ]]
then
    echo "***! Move or remove directory \"$outputdir\" before proceeding."
    echo "***! Type \"juicer.sh -h \" for help"
    exit 1
else
    if [[ -z "$final" && -z "$dedup" && -z "$merge" && -z "$postproc" ]]; then
        mkdir "$outputdir" || { echo "***! Unable to create ${outputdir}, check permissions." ; exit 1; }
    fi
fi

## Create split directory
if [ -d "$splitdir" ]; then
    splitdirexists=1
else
    mkdir "$splitdir" || { echo "***! Unable to create ${splitdir}, check permissions." ; exit 1; }
fi

## Create temporary directory, used for sort later
if [ ! -d "$tmpdir" ] && [ -z "$final" ] && [ -z "$dedup" ] && [ -z "$postproc" ]; then
    mkdir "$tmpdir"
    chmod 777 "$tmpdir"
fi

## Create a log directory, used to store log files (standout and standerr of each submitted jobs)
if [ ! -d "$logdir" ] && [ -z "$final" ] && [ -z "$dedup" ] && [ -z "$postproc" ]; then
    mkdir "$logdir"
    chmod 777 "$tmpdir"
fi

## Arguments have been checked and directories created. Now begins the real work of the pipeline
#source $usePath
#$load_cluster

## jobIDstring holds the jobIDs as they are submitted
countjobs=0
jobIDstring=""
#check the total fastq input files size, determine the threads number
#threads value could be customized according to user's job and cluster resources
fastqsize=$(ls -lL ${fastqdir} | awk '{sum+=$5}END{print sum}')
threads=2
if [ "$fastqsize" -gt "2592410750" ]
then
    threads=8
fi

testname=$(ls -l ${fastqdir} | awk 'NR==1;{print $9}')
if [ "${testname: -3}" == ".gz" ]
then
    skipsplit=1
    read1=${splitdir}"/*${read1str}*.fastq.gz"
else
    read1=${splitdir}"/*${read1str}*.fastq"
fi



## Not in merge, dedup, final, or postproc stage, i.e. need to split and align files.
if [[ -z "$final" && -z "$merge" && -z "$dedup" && -z "$postproc" ]]
then
    echo -e "(-: Aligning files matching $fastqdir\n in queue $queue to genome $genomeID with site file $site_file"

    ## Split fastq files into smaller portions for parallelizing alignment
    ## Do this by creating a text script file for the job on STDIN and then
    ## sending it to the cluster
    if [ ! $splitdirexists ]
    then
        echo "(-: Created $splitdir and $outputdir."
        if [ -n "$threads" ] && [ -z "$skipsplit" ]
        then
            echo " Splitting files"
            jIDs_spit=""
            for i in ${fastqdir}
            do
                filename=$(basename "$i")
                filename=${filename%.*}

                #wait till the previous qsub job has finished sucessfully
                #submitted job might get delayed due to time in the queue.
                timestamp=$(date +"%s" | cut -c 4-10)
                jID_split=$(qsub <<SPLITEND
                #PBS -S /bin/bash
                #PBS -q $queue
                #PBS -l $walltime
		#PBS -l select=1:ncpus=1:mem=20gb                
		#PBS -A $project
                #PBS -m a
                #PBS -o ${logdir}/${timestamp}_split_${filename}_${groupname}.log
                #PBS -j oe
                #PBS -N split_${filename}_${groupname}

                date +"%Y-%m-%d %H:%M:%S"
                echo "Split file: $filename"
                #Following line is used in coreutils >= 8.16, if not, simpler version is used below.
                #split -a 3 -l $splitsize -d --additional-suffix=.fastq ${i} $splitdir/$filename
                split -a 3 -l $splitsize -d ${i} ${splitdir}/${filename}
SPLITEND
)
                jIDs_split="${jIDs_split}:${jID_split}"
           done

            ## wait till the fastq spliting job has sucessfully finished
            ## Once split succeeds, rename the splits as fastq files
            ## PBS users change queue below to $queue
            timestamp=$(date +"%s" | cut -c 4-10)
            jID_splitmv=$(qsub << SPLITMV
            #PBS -S /bin/bash
            #PBS -q $queue
            #PBS -l $walltime
            #PBS -l select=1:ncpus=1:mem=20gb    
            #PBS -A $project
            #PBS -m a
            #PBS -o ${logdir}/${timestamp}_move_${groupname}.log
            #PBS -j oe
            #PBS -N move_${groupname}
            #PBS -W depend=afterok${jIDs_split}
            date +"%Y-%m-%d %H:%M:%S"
            for j in ${splitdir}/*;do mv \${j} \${j}.fastq;done
SPLITMV
)

        else # we're not splitting but just copying
            cp -rs ${fastqdir} ${splitdir}
            wait
        fi
    else # split dir already exists, no need to re-split fastqs
        echo -e "---  Using already created files in $splitdir\n"
    fi

    ## Launch job. Once split/move is done, set the parameters for the launch.

    echo "(-: Starting job to launch other jobs once splitting is complete"

    # Loop over all read1 fastq files and create jobs for aligning read1,
    # aligning read2, and merging the two. Keep track of merge names for final
    # merge. When merge jobs successfully finish, can launch final merge job.

    if [ -n "$threads" ] && [ -z "$skipsplit" ]
    then
        waitstring_alnwrp="#PBS -W depend=afterok:${jID_splitmv}"
    fi
    echo "waitstring_alnwrp is ${waitstring_alnwrp}"

    timestamp=$(date +"%s" | cut -c 4-10)
    qsub <<ALIGNWRAP
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l select=1:ncpus=1:mem=6gb
    #PBS -o ${logdir}/${timestamp}_alnwrap_${groupname}.log
    #PBS -j oe
    #PBS -N AlnWrp${groupname}
    #PBS -A $project
    $waitstring_alnwrp
    #PBS -m a
    for i in ${read1}
    do

        countjobs=\$(( countjobs + 1 ))
        ext=\${i#*$read1str}
        name=\${i%$read1str*}
        # these names have to be right or it'll break
        name1=\${name}${read1str}
        name2=\${name}${read2str}
        jname=\$(basename "\$name")\${ext}
        usegzip=0
        if [ "\${ext: -3}" == ".gz" ]
        then
            usegzip=1
        fi

        ## count ligations
        timestamp=\$(date +"%s" | cut -c 4-10)
        qsub <<-CNTLIG
        #PBS -S /bin/bash
        #PBS -q $queue
        #PBS -l $walltime
        #PBS -l select=1:ncpus=1:mem=4gb
	#PBS -A $project 
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_\${jname}_CntLig_\${countjobs}_${groupname}.log
        #PBS -j oe
        #PBS -N CtLig\${countjobs}${groupname}
        #PBS -v name=\${name}
        #PBS -v name1=\${name1}
        #PBS -v name2=\${name2}
        #PBS -v ext=\${ext}
        #PBS -v ligation=${ligation}
        #PBS -v usezip=\${usezip}

        date +"%Y-%m-%d %H:%M:%S"
        export usegzip=\${usegzip}
        export name=\${name}
        export name1=\${name1}
        export name2=\${name2}
        export ext=\${ext}
        export ligation=${ligation}
        ${juiceDir}/scripts/countligations.sh
CNTLIG
        jID_cntlig=\$(qstat | grep "CtLig\${countjobs}${groupname}" | cut -d '.' -f 1 )
        echo "jID_cntlig \${countjobs} id is \${jID_cntlig}"
        ## Align read1
        # align read1 fastq
        # allocate memory for alignment according to threads number
        # set-up the max memory according to the cluster set-up
        # this local cluster uses shared mem per node when in resource quest
        alloc_mem_value=\$(( threads * 5))
        echo "allocmem:"\$alloc_mem_value
	alloc_mem_vale=100
        alloc_mem=\${alloc_mem_value}"gb"
        #set the max mem cap accordingly to ensure the job resource request meet the cluster-setup
        if [ \$alloc_mem_value -gt 240 ]
        then
            alloc_mem=240gb
        fi

        echo "starting read1 alignemnt"
	echo "threads:"\$threads "mem:"\$allocmem
        timestamp=\$(date +"%s" | cut -c 4-10)
        qsub <<ALGNR1
        #PBS -S /bin/bash
        #PBS -q $queue
        #PBS -l $walltime
        # #PBS -l select=1:ncpus=\${threads}:mem=\${alloc_mem}
        #PBS -l select=1:ncpus=8:mem=80gb
        #PBS -A $project
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_\${jname}_align1_\${countjobs}_${groupname}.log
        #PBS -j oe
        #PBS -N ALN1\${countjobs}${groupname}
        #PBS -W depend=afterok:\$jID_cntlig
        #PBS -v name=\${name}
        #PBS -v name1=\${name1}
        #PBS -v ext=\${ext}

        date +"%Y-%m-%d %H:%M:%S"
        $load_bwa
        if [ -n "$shortread" ] || [ "$shortreadend" -eq 1 ]
        then
            echo 'Running command bwa aln -q 15 $refSeq \${name1}\${ext} > \${name1}\${ext}.sai && bwa samse $refSeq \${name1}\${ext}.sai \${name1}\${ext} > \${name1}\${ext}.sam'
            bwa aln -q 15 $refSeq \${name1}\${ext} > \${name1}\${ext}.sai && bwa samse $refSeq \${name1}\${ext}.sai \${name1}\${ext} > \${name1}\${ext}.sam
            if [ \$? -ne 0 ]
            then
                echo "Alignment of \${name1}\${ext} failed. Check ${topDir}/pbs.out for results"
                exit 1
            else
                echo " Short align of \${name1}\${ext}.sam done successfully"
            fi
        else
            echo 'Running command bwa mem \$threads $refSeq \${name1}\${ext} > \${name1}\${ext}.sam '
            bwa mem -t 8 $refSeq \${name1}\${ext} > \${name1}\${ext}.sam
            if [ \$? -ne 0 ]
            then
                echo "***!Error, failed to align \${name1}\${ext}"
                exit 1
            else
                echo "Mem align of \${name1}\${ext}.sam done successfully"
            fi
        fi
        echo "below is the number of lines in align1 .sam file"
        cat \${name1}\${ext}.sam |wc -l
ALGNR1
        wait
        # Get the jobID from qstat ouput by searching job specific string,"read1\${countjobs}" , in job Name.
        jID_1=\$( qstat | grep "ALN1\${countjobs}${groupname}" |cut -d '.' -f 1 )
        echo "align1 \$countjobs id is: \${jID_1}"
        # Align read2
        echo "starting read2 alignment"
        # align read2 fastq
        timestamp=\$(date +"%s" | cut -c 4-10)
        qsub <<ALGNR2
        #PBS -S /bin/bash
        #PBS -q $queue
        #PBS -l $walltime
        # #PBS -l select=1:ncpus=${threads}:mem=\$alloc_mem
        #PBS -l select=1:ncpus=8:mem=80gb
        #PBS -A $project 
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_\${jname}_align2_\${countjobs}_${groupname}.log
        #PBS -j oe
        #PBS -N ALN2\${countjobs}${groupname}
        #PBS -W depend=afterok:\$jID_cntlig
        #PBS -v name=\${name}
        #PBS -v name2=\${name2}
        #PBS -v ext=\${ext}

        date +"%Y-%m-%d %H:%M:%S"
        $load_bwa
        if [ -n "$shortread" ] || [ "$shortreadend" -eq 2 ]
        then
            echo 'Running command bwa aln -q 15 $refSeq \${name2}\${ext} > \${name2}\${ext}.sai && bwa samse $refSeq \${name2}\${ext}.sai \${name2}\${ext} > \${name2}\${ext}.sam '
            bwa aln -q 15 $refSeq \${name2}\${ext} > \${name2}\${ext}.sai && bwa samse $refSeq \${name2}\${ext}.sai \${name2}\${ext} > \${name2}\${ext}.sam
            if [ \$? -ne 0 ]
            then
                echo "Alignment of \${name2}\${ext} failed. Check ${topDir}/pbs.out for results"
                exit 1
            else
                echo "Short align of \${name2}\${ext}.sam done successfully"
            fi

        else
            echo 'Running command bwa mem \$threads $refSeq \${name2}\${ext} > \${name2}\${ext}.sam'
            bwa mem -t 8 $refSeq \${name2}\${ext} > \${name2}\${ext}.sam
            if [ \$? -ne 0 ]
            then
                exit 1
            else
                echo "Mem align of \${name2}\${ext}.sam done successfully"
            fi
        fi
        echo "below is the number of lines in aligned .sam file before sorting"
        cat \${name2}\${ext}.sam |wc -l
ALGNR2
        wait

        # Get the jobID of job above from qstat ouput using job specific string,"read2\${countjobs}" , in jobName.
        jID_2=\$(qstat | grep "ALN2\${countjobs}${groupname}" |cut -d '.' -f 1 )
        echo "align2 \${countjobs} id is: \${jID_2}"
        echo "starting merging from read1 and read2"
        # wait for align1 and align2 jobs finish,then merge
        timestamp=\$(date +"%s" | cut -c 4-10)
        qsub <<- MRGALL
        #PBS -S /bin/bash
        #PBS -q $queue
        #PBS -l $long_walltime
	#PBS -l select=1:ncpus=1:mem=24gb        
        #PBS -A $project
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_\${jname}_merge_\${countjobs}_${groupname}.log
        #PBS -j oe
        #PBS -N Mrg\${countjobs}${groupname}
        #PBS -W depend=afterok:\${jID_1}:\${jID_2}
        #PBS -v name=\${name}
        #PBS -v name1=\${name1}
        #PBS -v name2=\${name2}
        #PBS -v ext=\${ext}
        #PBS -v countjobs=\${countjobs}

        date +"%Y-%m-%d %H:%M:%S"
        export LC_ALL=C

        # sort read 1 aligned file by readname
        sort -T $tmpdir -k1,1f \${name1}\${ext}.sam > \${name1}\${ext}_sort.sam
        if [ \$? -ne 0 ]
        then
            echo "***! Error while sorting \${name1}\${ext}.sam"
            echo "Sort of \${name1}\${ext}.sam failed."
            exit 1
        else
            echo "Sort read 1 aligned file by readname completed."
        fi
        echo "below is the number of lines in sorted read1 .sam files"
        cat \${name1}\${ext}_sort.sam | wc -l

        # sort read 2 aligned file by readname
        sort -T $tmpdir -k1,1f \${name2}\${ext}.sam > \${name2}\${ext}_sort.sam
        if [ \$? -ne 0 ]
        then
            echo "***! Error while sorting \${name2}\${ext}.sam"
            echo "Sort of \${name2}\${ext}.sam failed."
            exit 1
        else
            echo "Sort read 2 aligned file by readname completed."
        fi
        echo "below is the number of lines in sorted read2 .sam files"
        cat \${name2}\${ext}_sort.sam | wc -l

        # remove header, add read end indicator to read name
        awk -f ${juiceDir}/scripts/read1_sortproc.awk \${name1}\${ext}_sort.sam > \${name1}\${ext}_sort1.sam
        awk -f ${juiceDir}/scripts/read2_sortproc.awk \${name2}\${ext}_sort.sam > \${name2}\${ext}_sort1.sam
        echo "below is the number of lines in \${name1}\${ext}_sort1.sam"
        cat \${name1}\${ext}_sort1.sam | wc -l
        echo "below is the number of lines in \${name2}\${ext}_sort1.sam"
        cat \${name2}\${ext}_sort1.sam | wc -l

        # merge the two sorted read end files
        sort -T $tmpdir -k1,1f -m \${name1}\${ext}_sort1.sam \${name2}\${ext}_sort1.sam > \${name}\${ext}.sam
        if [ $? -ne 0 ]
        then
            echo "***! Failure during merge of read files"
            echo "Merge of \${name}\${ext}.sam failed"
            exit 1
        else
            rm \${name1}\${ext}.sa* \${name2}\${ext}.sa* \${name1}\${ext}_sort*.sam \${name2}\${ext}_sort*.sam
            echo "\${name}\$\{ext}.sam created successfully."
        fi
        echo "below is the number of lines in \${name}\${ext}.sam after merging sorted aln1 and aln2"
        cat \${name}\${ext}.sam | wc -l

MRGALL
    wait

    jID_3=\$(qstat | grep "Mrg\${countjobs}${groupname}" | cut -d '.' -f 1 )
    echo "merging align1 and align2 \${coutjobs} id is \${jID_3}"
    echo "starting chimeric step after alignment"
    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<- CHIMERIC
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l select=1:ncpus=1:mem=24gb
    #PBS -A $project
    #PBS -m a
    #PBS -o ${logdir}/\${timestamp}_\${jname}_chimeric_\${countjobs}_${groupname}.log
    #PBS -j oe
    #PBS -N Chmr\${countjobs}${groupname}
    #PBS -W depend=afterok:\${jID_3}
    #PBS -v name=\${name}
    #PBS -v ext=\${ext}

    date +"%Y-%m-%d %H:%M:%S"
    export LC_ALL=C
    # call chimeric_blacklist.awk to deal with chimeric reads; sorted file is sorted by read name at this point
    awk -v fname1="\${name}\${ext}_norm.txt" -v fname2="\${name}\${ext}_abnorm.sam" -v fname3="\${name}\${ext}_unmapped.sam" -f ${juiceDir}/scripts/chimeric_blacklist.awk \${name}\${ext}.sam

    if [ \$? -ne 0 ]
    then
        echo "***! Failure during chimera handling of \${name}\${ext}"
        echo "Chimera handling of \${name}\${ext}.sam failed."
        exit 1
    fi
    # if any normal reads were written, find what fragment they correspond to and store that
    if [ -e "\${name}\${ext}_norm.txt" ] && [ "$site" != "none" ]
    then
        ${juiceDir}/scripts/fragment.pl \${name}\${ext}_norm.txt \${name}\${ext}.frag.txt $site_file
    elif [ "$site" == "none" ]
    then
        awk -f ${juiceDir}/scripts/chimeric_nonsites.awk \${name}\${ext}_norm.txt > \${name}\${ext}.frag.txt
    else
        echo "***! No \${name}\${ext}_norm.txt file created"
        echo "Creation of \${name}\${ext}_norm.txt failed."
        exit 1
    fi
    if [ \$? -ne 0 ]
    then
        echo "***! Failure during fragment assignment of \${name}\${ext}"
        echo "Fragment assignment of \${name}\${ext}.sam failed."
        exit 1
    fi
    # sort by chromosome, fragment, strand, and position
    echo "Sorting..."
    echo \${name}\${ext}
    sort -T $tmpdir -k2,2d -k6,6d -k4,4n -k8,8n -k1,1n -k5,5n -k3,3n \${name}\${ext}.frag.txt > \${name}\${ext}.sort.txt
    if [ \$? -ne 0 ]
    then
        echo "***! Failure during sort of \${name}\${ext}"
        echo "Sort of \${name}\${ext}.frag.txt failed."
        exit 1
    else
        echo "removing temperary files \${name}\${ext}_norm.txt \${name}\${ext}.frag.txt"
        rm \${name}\${ext}_norm.txt \${name}\${ext}.frag.txt
    fi
CHIMERIC
    wait

    jID_4=\$(qstat | grep "Chmr\${countjobs}${groupname}" | cut -d '.' -f 1)
    echo "chimeric \$countjobs id is \$jID_4"
    exitstatus=\$(qstat -f \${jID_4} |grep "exit_status" )
    echo "the exit status of \${jID_4} is \${exitstatus}"
    jobIDstring="\${jobIDstring}:\${jID_4}"
    echo "jobIDstring \$countjobs is \${jobIDstring}"

    # done looping over all fastq split files
    done


    # if error occored, we will kill the remaining jobs
    # output an error message of error detection and killing the remaining jobs
    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<- CKALIGNFAIL
    #PBS -S /bin/bash
    #PBS -q $queue  
    #PBS -l select=1:ncpus=1:mem=2gb   
    #PBS -l $walltime
    #PBS -A $project
    #PBS -m a
    #PBS -o ${logdir}/\${timestamp}_check_alnOK_${groupname}.log
    #PBS -j oe
    #PBS -W depend=afterok\${jobIDstring}
    #PBS -N AlnOK_${groupname}

    date +"%Y-%m-%d %H:%M:%S"
    echo "Sucess: All alignment jobs were successfully finished without failure!"
CKALIGNFAIL

    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<- CKALIGNFAILCLN
    #PBS -S /bin/bash
    #PBS -q $queue  
    #PBS -l select=1:ncpus=1:mem=4gb    
    #PBS -l $walltime
    #PBS -A $project
    #PBS -o ${logdir}/\${timestamp}_alignfailclean_${groupname}.log
    #PBS -j oe
    #PBS -W depend=afternotok\${jobIDstring}
    #PBS -N Alncln${groupname}

    date +"%Y-%m-%d %H:%M:%S"
    echo "Error with alignment jobs, deleting all remaining jobs of this pipeline."
    echo "next line.."
    RemJob=\$(qstat |grep "$groupname" |grep " Q \| H \| R " | awk 'BEGIN{FS=" "}{print $1}'| cut -d '.' -f 1)    
    echo \${RemJob}
    echo "trying qdel with no slash"
    qdel ${RemJob}
    echo "trying qdel with slash"
    qdel \${RemJob}

CKALIGNFAILCLN

ALIGNWRAP
    #done fastq alignment && alignment jobs failure checking.

fi
jID_alignwrap=$( qstat | grep AlnWrp${groupname} | cut -d '.' -f 1 )
if [ -z $merge ]
then
    waitstring_mrgsrtwrp="#PBS -W depend=afterok:${jID_alignwrap}"
fi
echo "below is the jID_alignwrap jobid"
echo ${jID_alignwrap}
echo ${waitstring_mrgsrtwrp}
if [ -z $final ] && [ -z $dedup ] && [ -z $postproc ]
then
    ## merge the sorted files into one giant file that is also sorted.
    ## change queue below to $long_queue
    timestamp=$(date +"%s" | cut -c 4-10)
    qsub <<MRGSRTWRAP
    #PBS -S /bin/bash
    #PBS -q $queue  
    #PBS -l select=1:ncpus=1:mem=24gb
    #PBS -l $walltime
    #PBS -A $project
    #PBS -m a
    #PBS -o ${logdir}/${timestamp}_mergesortwrap_${groupname}.log
    #PBS -j oe
    #PBS -N MStWrp${groupname}
    ${waitstring_mrgsrtwrp}
    date +"%Y-%m-%d %H:%M:%S"
    echo "all alignment done, all splitting and alignment jobs succeeded!" 
    jID_alnOK=\$( qstat | grep AlnOK_${groupname} | cut -d '.' -f 1 )
    echo "jID_aln-OK job id is \$jID_alnOK "
    timestamp=\$(date +"%s" | cut -c 4-10)
    if [ -z $merge ]
    then
        waitstring_alnOK="#PBS -W depend=afterok:\${jID_alnOK}"
    fi
    echo "waitstring_anlOK is \${waitstring_alnOK}"
    echo "below without backslash"
    echo ${waitstring_alnOK}
    echo "below with backslash"
    echo \${waitstring_alnOK}
    qsub <<MRGSRT
        #PBS -S /bin/bash
        #PBS -q $queue  
	#PBS -l select=1:ncpus=1:mem=24gb        
        #PBS -l $walltime
        #PBS -A $project
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_fragmerge_${groupname}.log
        #PBS -j oe
        #PBS -N frgmrg${groupname}
        \${waitstring_alnOK}
        date +"%Y-%m-%d %H:%M:%S"
        export LC_ALL=C
        if [ -d $donesplitdir ]
        then
            mv $donesplitdir/* $splitdir/.
        fi
        if ! sort -T $tmpdir -m -k2,2d -k6,6d -k4,4n -k8,8n -k1,1n -k5,5n -k3,3n $splitdir/*.sort.txt  > $outputdir/merged_sort.txt
        then
            echo "***! Some problems occurred somewhere in creating  sorted align files."
        else
            echo "Finished sorting all sorted files into a single merge."
            rm -r ${tmpdir}
        fi
MRGSRT
        jID_mrgsrt=\$( qstat | grep frgmrg${groupname} | cut -d '.' -f 1 )

        ##kill all remaining jobs if previous mergesort step exited with error
        timestamp=\$(date +"%s" | cut -c 4-10)
        qsub <<MRGSRTFAILCK
        #PBS -S /bin/bash
        #PBS -q $queue  
	#PBS -l select=1:ncpus=1:mem=2gb        
        #PBS -l $walltime
        #PBS -A $project
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_clean1_${groupname}.log
        #PBS -j oe
        #PBS -N clean1${groupname}
        #PBS -W depend=afternotok:\${jID_mrgsrt}

        date +"%Y-%m-%d %H:%M:%S"
        echo "Error with merging sorted files job, ${jID_mrgsort}, deleting all remaining jobs of this pipeline."
        RemJob1=\$(qstat |grep "$groupname" |grep " Q \| H \| R " | awk 'BEGIN{FS=" "}{print $1}'| cut -d '.' -f 1)
        qdel \${RemJob1}
MRGSRTFAILCK
MRGSRTWRAP
fi
wait
jID_mrgsrtwrap=$( qstat| grep MStWrp${groupname} | cut -d '.' -f 1 )
if [ -z $dedup ]
then
    waitstring_RDpWrp="#PBS -W depend=afterok:${jID_mrgsrtwrap}"
fi
echo "waitstring_RDpWrp is below:"
echo ${waitstring_RDpWrp}
wait
if [ -z $final ] && [ -z $postproc ]
then
    ##remove duplicates from the big sorted file if merge sorted job exited successfully
    timestamp=$(date +"%s" | cut -c 4-10)
    qsub <<RMDUPWRAP
    #PBS -S /bin/bash
    #PBS -q $queue  
    #PBS -l select=1:ncpus=1:mem=4gb
    #PBS -l $walltime
    #PBS -A $project
    #PBS -m a
    #PBS -o ${logdir}/${timestamp}_rmdupwrap_${groupname}.log
    #PBS -j oe
    #PBS -N RDpWrp${groupname}
    ${waitstring_RDpWrp}
    date +"%Y-%m-%d %H:%M:%S"
    echo ${waitstring_RDpWrp}
    jID_mrgsrt=\$( qstat | grep frgmrg${groupname} | cut -d '.' -f 1 )
    if [ -z $dedup ]
    then
        waitstring_osplit="#PBS -W depend=afterok:\${jID_mrgsrt}"
    fi
    echo "jID_mrgsrt jobid is \$jID_mrgsrt "
    echo "waitstring_osplit is:\${waitstring_osplit}"
    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<RMDUPLICATE
        #PBS -S /bin/bash
        #PBS -q $queue  
	#PBS -l select=1:ncpus=1:mem=4gb        
        #PBS -l $walltime
        #PBS -A $project
        #PBS -m a
        #PBS -o ${logdir}/\${timestamp}_osplit_${groupname}.log
        #PBS -j oe
        #PBS -N osplit${groupname}
        #PBS -v timestamp=\${timestamp}
        \${waitstring_osplit}
        date +"%Y-%m-%d %H:%M:%S"
        echo "Sucess: All mergefragments jobs were successfully finished!"
        echo "now starts to remove duplicates from the big sorted file"
        awk -v project=${project} -v queue=${long_queue} -v outfile=${logdir}/\${timestamp}_awksplit_rmdunps -v juicedir=${juiceDir} -v dir=$outputdir -v groupname=$groupname -v walltime=$long_walltime -f ${juiceDir}/scripts/split_rmdups.awk $outputdir/merged_sort.txt
RMDUPLICATE

RMDUPWRAP
fi

jID_rmdupwrap=$( qstat | grep RDpWrp${groupname} | cut -d '.' -f 1 )
echo "jID_rmdupwrap ID: $jID_rmdupwrap"
wait
if [ -z "$genomePath" ]
then
    #If no path to genome is given, use genome ID as default.
    genomePath=$genomeID
fi

# if early exit, we stop here, once the merged_nodups.txt file is created.
if [ -z "$earlyexit" ]
then
    waitstring0="#PBS -W depend=afterok:${jID_rmdupwrap}"

    #Skip if post-processing only is required
    if [ -z $postproc ]
    then
        if [ -z $final ]
        then
            echo "final not set, superwrap1 job depend=afterok:jID_rmdupwrap"
        else
            waitstring0=""
        fi
        echo "waitstring0 is: $waitstring0"
        timestamp=$(date +"%s" | cut -c 4-10)
		qsub <<SUPERWRAP1
        #PBS -S /bin/bash
        #PBS -q $queue
	#PBS -l select=1:ncpus=1:mem=1gb        
        #PBS -l $walltime
        #PBS -A $project
        #PBS -m a
        #PBS -o ${logdir}/${timestamp}_superwrap1_${groupname}.log
        #PBS -j oe
        #PBS -N SpWrp1${groupname}
        ${waitstring0}
        timestamp=$(date +"%s" | cut -c 4-10)
        echo "start submitting the lauch job!"
        export groupname=$groupname
        export juiceDir=$juiceDir
        export load_java="${load_java}"
        export about=$about
        export site_file=$site_file
        export ligation=$ligation
        export logdir=${logdir}
        export outputdir=$outputdir
        export splitdir=$splitdir
        export nofrag=$nofrag
        export genomePath=$genomePath
        export final=$final
        export queue=$queue
        export walltime=$walltime
        export long_walltime=$long_walltime
        export nofrag=$nofrag
        export project=$project
	${juiceDir}/scripts/launch_stats.sh
SUPERWRAP1
    fi
    jID_superwrap1="$( qstat | grep SpWrp1${groupname} | cut -d '.' -f 1 )"
    wait
    if [ -z $postproc ]
    then
        waitstring6="#PBS -W depend=afterany:${jID_superwrap1}"
    fi
    wait
    echo "waitstring6 is : ${waitstring6}"

    timestamp=$(date +"%s" | cut -c 4-10)
    qsub <<SUPERWRAP2
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l select=1:ncpus=1:mem=4gb   
    #PBS -o ${logdir}/${timestamp}_super_wrap2_${groupname}.log
    #PBS -j oe
    #PBS -N SpWrp2${groupname}
    #PBS -A $project
    #PBS -m a
    ${waitstring6}
    wait
    export groupname=$groupname
    export load_java="${load_java}"
    export load_cuda="${load_cuda}"
    export juiceDir=$juiceDir
    export genomeID=$genomeID
    export outputdir=$outputdir
    export logdir=${logdir}
    export splitdir=$splitdir
    export queue=$queue
    export walltime=$walltime
    export long_walltime=$long_walltime
    export postproc=$postproc
    export project=$project
    jID_launch=\$(qstat | grep Lnch_${groupname} | cut -d '.' -f 1)
    echo \$jID_launch
    echo "waitstring3 is :\${waitstring3} : probably blank here, executing postprocessing.sh"
    ${juiceDir}/scripts/postprocessing.sh
    echo "done postprocessing.sh"
SUPERWRAP2



## After removing duplicates,if early exit is set,we directly go to the final check step.
else
    echo "earlyexit is set, stat,hic, and postprocess were not done."
    timestamp=$(date +"%s" | cut -c 4-10)
    qsub <<FINCK2
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l select=1:ncpus=1:mem=1gb
    #PBS -o ${logdir}/${timestamp}_prep_done_${groupname}.out
    #PBS -j oe
    #PBS -A $project
    #PBS -m a
    #PBS -N prepd_${groupname}
    #PBS -W depend=afterok:${jID_rmdupwrap}
    date +"%Y-%m-%d %H:%M:%S"

    jID_osplit=\$( qstat | grep osplit${groupname} | cut -d '.' -f 1 )
    jID_rmsplit=\$( qstat | grep RmSplt${groupname} | cut -d '.' -f 1)        
    wait
    timestamp=\$(date +"%s" | cut -c 4-10)
    qsub <<PREPDONE
        #PBS -S /bin/bash
        #PBS -q $queue
        #PBS -l $walltime
	#PBS -l select=1:ncpus=1:mem=1gb
        #PBS -o ${logdir}/\${timestamp}_done_${groupname}.log
        #PBS -j oe
        #PBS -N ${groupname}_done
        #PBS -A $project
        #PBS -m a
        #PBS -W depend=afterok:\${jID_osplit}:\${jID_rmsplit}

        date +"%Y-%m-%d %H:%M:%S"
        export splitdir=${splitdir}
        export outputdir=${outputdir}
        ${juiceDir}/scripts/check.sh
PREPDONE
FINCK2

fi
echo "Finished adding all jobs... please wait while processing."
qstat

Shell BWA oschmod qsub extglob From line 66 of scripts/juicer.sh

echo "check the variables:"
echo "${juiceDir}/scripts/launch_stats.sh"
echo "genomename: $groupname"
echo "juiceDir: $juiceDir"
echo "about: $about"
echo "site_file: $site_file"
echo "logdir: ${logdir}"
echo "ligation: $ligation"
echo "outputdir: $outputdir"
echo "splitdir: $splitdir"
echo "nofrag: $nofrag"
echo "final: $final"
echo "genomePath: $genomePath"
echo "queue $queue"
echo "resolutions: $resolutions"
echo "walltime: $walltime"
echo "long_walltime: $long_walltime"
echo "project: $project"
echo $load_java
jID_osplit=$( qstat | grep osplit${groupname} | cut -d '.' -f 1 )

if [ -z $final ]
then
	 waitstring2="#PBS -W depend=afterok:${jID_osplit}"
fi

echo "jID_osplit: $jID_osplit"
echo "waitstring2 is"
echo $waitstring2
timestamp=$(date +"%s" | cut -c 4-10)
qsub <<DOSTAT
#PBS -S /bin/bash
#PBS -q $queue
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l $walltime
#PBS -o ${logdir}/${timestamp}_launch_${groupname}.log
#PBS -j oe
#PBS -N Lnch_${groupname}
#PBS -A $project
${waitstring2}

date +"%Y-%m-%d %H:%M:%S"
echo "Alignment and merge done, launching the stats job"
#jID_osplit=\$( qstat | grep osplit${groupname} | cut -d '.' -f 1 )
jID_rmsplit=\$( qstat | grep RmSplt${groupname} | cut -d '.' -f 1)

if [ -z $final ]
then
    waitstring22="#PBS -W depend=afterok:\${jID_rmsplit}"
fi
echo "waitstring22 is below"
echo ${waitstring22}
echo "jID_rmsplit value is: \${jID_rmsplit}"
echo "this is the value of waitstring22: \${waitstring22}"
timestamp=\$(date +"%s" | cut -c 4-10)
echo "start sbumitting stats job"
qsub <<STATS0
	#PBS -S /bin/bash
	#PBS -q $queue
	#PBS -l select=1:ncpus=1:mem=20gb
	#PBS -l $walltime
	\${waitstring22}
	#PBS -o ${logdir}/\${timestamp}_stats0_${groupname}.log
	#PBS -j oe
	#PBS -N stats0${groupname}
	#PBS -A $project

	date +"%Y-%m-%d %H:%M:%S"
	$load_java
	export _JAVA_OPTIONS=-Xmx16384m; 
	export LC_ALL=en_US.UTF-8

	tail -n1 $headfile | awk '{printf"%-1000s\n", \\\$0}' > $outputdir/inter.txt 
	${juiceDir}/scripts/statistics.pl -s $site_file -l $ligation -o $outputdir/stats_dups.txt $outputdir/dups.txt
	cat $splitdir/*.res.txt | awk -f ${juiceDir}/scripts/stats_sub.awk >> $outputdir/inter.txt
	java -cp ${juiceDir}/scripts/ LibraryComplexity $outputdir inter.txt >> $outputdir/inter.txt
	${juiceDir}/scripts/statistics.pl -s $site_file -l $ligation -o $outputdir/inter.txt -q 1 $outputdir/merged_nodups.txt

STATS0

qsub <<STATS30
	#PBS -S /bin/bash
	#PBS -q $queue
	#PBS -l select=1:ncpus=1:mem=20gb 
	#PBS -l $walltime
	#PBS -o ${logdir}/\${timestamp}_stats_${groupname}.log
	#PBS -j oe
	#PBS -N stats30${groupname}
	#PBS -A $project
	\${waitstring22}

	date +"%Y-%m-%d %H:%M:%S"
	$load_java
	export _JAVA_OPTIONS=-Xmx16384m; 
	export LC_ALL=en_US.UTF-8
	echo -e 'Experiment description: $about' > $outputdir/inter_30.txt; 
	tail -n1 $headfile | awk '{printf"%-1000s\n", \\\$0}' > $outputdir/inter_30.txt 
	cat $splitdir/*.res.txt | awk -f ${juiceDir}/scripts/stats_sub.awk >> $outputdir/inter_30.txt
	java -cp ${juiceDir}/scripts/ LibraryComplexity $outputdir inter_30.txt >> $outputdir/inter_30.txt
	${juiceDir}/scripts/statistics.pl -s $site_file -l $ligation -o $outputdir/inter_30.txt -q 30 $outputdir/merged_nodups.txt

STATS30


qsub <<- ABNORMAL
	#PBS -S /bin/bash
	#PBS -q $queue
	#PBS -l select=1:ncpus=1:mem=60gb 
	#PBS -l $long_walltime
	#PBS -o ${logdir}/\${timestamp}_abnormal_${groupname}.log
	#PBS -j oe
	#PBS -N abnorm_${groupname}
	#PBS -A $project
	\$waitstring22

	cat $splitdir/*_abnorm.sam > $outputdir/abnormal.sam
	cat $splitdir/*_unmapped.sam > $outputdir/unmapped.sam
	awk -f ${juiceDir}/scripts/collisions.awk $outputdir/abnormal.sam > $outputdir/collisions.txt
ABNORMAL

jID_stats0=\$( qstat | grep stats0${groupname} | cut -d '.' -f 1)

wait
echo "this is the value of jID_stats: \${jID_stats}"
timestamp=\$(date +"%s" | cut -c 4-10)
echo "start submitting hic job"
qsub <<- HICWORK
	#PBS -S /bin/bash
	#PBS -q $queue
	#PBS -l select=1:ncpus=2:mem=60gb
	#PBS -l $long_walltime
	#PBS -o ${logdir}/\${timestamp}_hic0_${groupname}.log
	#PBS -j oe
	#PBS -N hic0_${groupname}
	#PBS -W depend=afterok:\${jID_stats0}
	#PBS -A $project

	date +"%Y-%m-%d %H:%M:%S"
	echo "finished stats job,now launching the hic job."    
	${load_java}
	export _JAVA_OPTIONS=-Xmx16384m
	if [ \"$nofrag\" -eq 1 ]
	then
		echo "running juicer_tools 1st..." 
		${juiceDir}/scripts/juicer_tools pre -s $outputdir/inter.txt -g $outputdir/inter_hists.m -q 1 $outputdir/merged_nodups.txt $outputdir/inter.hic $genomePath
	else 
		echo "running juicer_tools 2nd..."
		${juiceDir}/scripts/juicer_tools pre -f $site_file -s $outputdir/inter.txt -g $outputdir/inter_hists.m -q 1 $outputdir/merged_nodups.txt $outputdir/inter.hic $genomePath
	fi
HICWORK
jID_stats30=\$( qstat | grep stats30${groupname} | cut -d '.' -f 1)
timestamp=\$(date +"%s" | cut -c 4-10)
echo "start submitting hic30 job"
qsub <<- HIC30WORK
	#PBS -S /bin/bash
	#PBS -q $queue
	#PBS -l select=1:ncpus=2:mem=60gb
	#PBS -l $long_walltime
	#PBS -o ${logdir}/\${timestamp}_hic30_${groupname}.log
	#PBS -j oe
	#PBS -N hic30_${groupname}
	#PBS -W depend=afterok:\${jID_stats30}
	#PBS -A $project

	date +"%Y-%m-%d %H:%M:%S"
	$load_java
	export _JAVA_OPTIONS=-Xmx16384m
	export LC_ALL=en_US.UTF-8
	echo "running hic30"
	echo $nofrag
	echo -e 'Experiment description: $about' > $outputdir/inter_30.txt
	cat $splitdir/*.res.txt | awk -f ${juiceDir}/scripts/stats_sub.awk >> $outputdir/inter_30.txt
	java -cp ${juiceDir}/scripts/ LibraryComplexity $outputdir inter_30.txt >> $outputdir/inter_30.txt
	${juiceDir}/scripts/statistics.pl -s $site_file -l $ligation -o $outputdir/inter_30.txt -q 30 $outputdir/merged_nodups.txt
	if [  \"$nofrag\" -eq 1 ]
	then
		echo "running hic30 1st..." 
	${juiceDir}/scripts/juicer_tools pre -s $outputdir/inter_30.txt -g $outputdir/inter_30_hists.m -q 30 $outputdir/merged_nodups.txt $outputdir/inter_30.hic $genomePath
	else 
		echo "running hic30 2nd..."
		${juiceDir}/scripts/juicer_tools pre -f $site_file -s $outputdir/inter_30.txt -g $outputdir/inter_30_hists.m -q 30 $outputdir/merged_nodups.txt $outputdir/inter_30.hic $genomePath
	fi
HIC30WORK

DOSTAT

Shell qsub From line 2 of scripts/launch_stats.sh

juicer_version="1.5" 
## Set the following variables to work with your system

## use cluster load commands:
#usePath=""
load_java="module load java/jdk1.8.0_131"
load_cuda="module load cuda/7.5.18/gcc/4.4.7"
# Juicer directory, contains scripts/ and restriction_sites/
juiceDir="/lustre1/mzhibo/hic/apps/juicer"
# default queue and time
#queue="batch"
#walltime="walltime=12:00:00"

# unique name for jobs in this run
groupname="C$(date +"%s"|cut -c 6-11)"


## Default options, overridden by command line arguments

# top level directory, can also be set in options
topDir=$(pwd)
# restriction enzyme, can also be set in options
site="MboI"
# genome ID, default to human, can also be set in options
genomeID="hg19"

## Read arguments
usageHelp="Usage: ${0##*/} -g genomeID [-d topDir] [-s site] [-r resolutions] [-hx]"
genomeHelp="   genomeID must be defined in the script, e.g. \"hg19\" or \"mm10\" (default \"$genomeID\")"
dirHelp="   [topDir] is the top level directory (default \"$topDir\") and must contain links to all merged_nodups files underneath it"
siteHelp="   [site] must be defined in the script, e.g.  \"HindIII\" or \"MboI\" (default \"$site\"); alternatively, this can be the restriction site file"
resolutionsHelp="   [resolutions] is a comma-delimited list of resolutions, such as 10000,5000,1000,5f (default is 2.5M,1M,500K,250K,100K,50K,25K,10K,5K in base pair and 500f,250f,100f,50f,25f,10f,5f,2f,1f)"
excludeHelp="   -x: exclude fragment-delimited maps from Hi-C mega map (will run much faster)"
helpHelp="   -h: print this help and exit"

printHelpAndExit() {
    echo "$usageHelp"
    echo "$genomeHelp"
    echo "$dirHelp"
    echo "$siteHelp"
    echo "$resolutionsHelp"
    echo "$excludeHelp"
    echo "$helpHelp"
    exit "$1"
}

while getopts "d:g:r:h:x:s" opt; do
    case $opt in
	g) genomeID=$OPTARG ;;
	h) printHelpAndExit 0;;
	d) topDir=$OPTARG ;;
	s) site=$OPTARG ;;
	x) exclude=1 ;;
	r) resolutions=$OPTARG ;;
	[?]) printHelpAndExit 1;;
    esac
done
echo $site
## Set ligation junction based on restriction enzyme
case $site in
    HindIII) ligation="AAGCTAGCTT";;
    DpnII) ligation="GATCGATC";;
    MboI) ligation="GATCGATC";;
    none) ligation="XXXX";;
    *)  ligation="XXXX"
      site_file=$site
      echo "$site not listed as recognized enzyme, so trying it as site file."
      echo "Ligation junction is undefined";;
esac
echo $ligation
if [ -z "$site_file" ]
then
    site_file="${juiceDir}/restriction_sites/${genomeID}_${site}.txt"
fi
echo $site_file
## Check that site file exists, needed for fragment number for merged_nodups
if [ ! -e "$site_file" ] && [ "$site" != "none" ]
then
    echo "***! $site_file does not exist. It must be created before running this script."
    echo "The site file is used for statistics even if fragment delimited maps are excluded"
    exit 100
fi

## Directories to be created and regex strings for listing files
megadir=${topDir}"/mega"
outputdir=${megadir}"/aligned"
tmpdir=${megadir}"/HIC_tmp"
# set global tmpdir so no problems with /var/tmp
export TMPDIR=${tmpdir}
#output messages
logdir=${megadir}"/debug"

## Check for existing merge_nodups files:

merged_count=`find -L ${topDir} | grep merged_nodups.txt | wc -l`
if [ "$merged_count" -lt "1" ]
then
	echo "***! Failed to find at least one merged_nodups files under ${topDir}"
	exit 100
fi

merged_names1=$(find -L ${topDir} | grep merged_nodups.txt)
merged_names=$(echo $merged_names1 | tr '\n' ' ')
inter_names=$(find -L ${topDir} | grep inter.txt | tr '\n' ' ')

if [[ $merged_names == *".txt.gz"* ]]
then
    gzipped=1
    echo "***! Mega map of gzipped files not yet supported, please unzip before running."
    exit 100
    # we need to unzip here
    for i in $merged_names1
    do
	if [[ $i != *".txt.gz"* ]]
	then
	    echo "***! Mixture of gzipped and unzipped merged_nodups files"
	    echo "Ensure that the merged_nodups are all either unzipped or gzipped then rerun"
	    echo "Files: $merged_names"
	    exit 100
	fi
    done
fi
## Create output directory, exit if already exists
if [[ -d "${outputdir}" ]] 
then
    echo "***! Move or remove directory \"${outputdir}\" before proceeding."
    exit 101			
else
    mkdir -p ${outputdir}
fi

## Create temporary directory
if [ ! -d "$tmpdir" ]; then
    mkdir $tmpdir
    chmod 777 $tmpdir
fi

## Create log directory
if [ ! -d "$logdir" ]; then
    mkdir $logdir
    chmod 777 $logdir
fi

if [ -n "$resolutions" ]; then
    resolutions="-r $resolutions"
fi

## Arguments have been checked and directories created. Now begins
## the real work of the pipeline
qsub -o ${logdir}/header.log -j oe -q batch -N ${groupname}_cmd <<-EOF
  date
  echo "Juicer version:$juicer_version"
  echo "$0 $@"
EOF

jid1=$(qsub -o ${logdir}/topstats.log -j oe -N ${groupname}_Tstats -l mem=20gb -l walltime=24:00:00 -l nodes=1:ppn=1:AMD -q batch <<-TOPSTATS
export LC_ALL=C
if ! awk -f ${juiceDir}/scripts/makemega_addstats.awk ${inter_names} > ${outputdir}/inter.txt
then  
    echo "***! Some problems occurred somewhere in creating top stats files."
    exit 100
else
    cp ${outputdir}/inter.txt ${outputdir}/inter_30.txt
fi
TOPSTATS
)
jobIDstr=${jid1}
# Merge all merged_nodups.txt files found under current dir
jid2=$(qsub -o ${logdir}/merge.log -j oe -q batch -N ${groupname}_merge -l mem=20gb -l walltime=24:00:00 -l nodes=1:ppn=1:AMD <<- MRGSRT
if ! sort -T ${tmpdir} -m -k2,2d -k6,6d ${merged_names} > ${outputdir}/merged_nodups.txt
then 
echo "***! Some problems occurred somewhere in merging sorted merged_nodups files."
    exit 100
else
echo "Finished sorting all merged_nodups files into a single merge."
  rm -r ${tmpdir}
fi
MRGSRT
)
jobIDstr="${jobIDstr}:${jid2}"

# Create statistics files for MQ > 0
jid3=$(qsub -o ${logdir}/inter0.log -j oe -q batch -N ${groupname}_inter0 -l mem=20gb -l walltime=24:00:00 -l nodes=1:ppn=1:AMD -W depend=afterok:${jid1}:${jid2} <<- INTER0
${juiceDir}/scripts/statistics.pl -q 1 -o${outputdir}/inter.txt -s $site_file -l $ligation ${outputdir}/merged_nodups.txt
INTER0
)
jobIDstr="${jobIDstr}:${jid3}"

# Create statistics files for MQ > 30
jid4=$(qsub -o ${logdir}/inter30.log -j oe -q batch -N ${groupname}_inter30 -l mem=20gb -l walltime=24:00:00 -l nodes=1:ppn=1:AMD -W depend=afterok:${jid1}:${jid2}  <<- INTER30
${juiceDir}/scripts/statistics.pl -q 30 -o${outputdir}/inter_30.txt -s $site_file -l $ligation ${outputdir}/merged_nodups.txt 
INTER30
)
jobIDstr="${jobIDstr}:${jid4}"

# Create HIC maps file for MQ > 0
jid5=$(qsub -o ${logdir}/hic0_${groupname}.log -j oe -q batch -M ${EMAIL} -m ae -N ${groupname}_hic0 -l mem=40gb -l walltime=168:00:00 -l nodes=1:ppn=1:AMD -W depend=afterok:${jid4} <<- HIC0
$load_java
if [ -z "$exclude" ]
then
  echo "Launching ${juiceDir}/scripts/juicer_tools pre ${resolutions} -f ${site_file} -s ${outputdir}/inter.txt -g ${outputdir}/inter_hists.m -q 1 ${outputdir}/merged_nodups.txt ${outputdir}/inter.hic ${genomeID}"
  ${juiceDir}/scripts/juicer_tools pre ${resolutions} -f ${site_file} -s ${outputdir}/inter.txt -g ${outputdir}/inter_hists.m -q 1 ${outputdir}/merged_nodups.txt ${outputdir}/inter.hic ${genomeID}
else
  echo "Launching ${juiceDir}/scripts/juicer_tools pre ${resolutions} -s ${outputdir}/inter.txt -g ${outputdir}/inter_hists.m -q 1 ${outputdir}/merged_nodups.txt ${outputdir}/inter.hic ${genomeID}"
  ${juiceDir}/scripts/juicer_tools pre ${resolutions} -s ${outputdir}/inter.txt -g ${outputdir}/inter_hists.m -q 1 ${outputdir}/merged_nodups.txt ${outputdir}/inter.hic ${genomeID}
fi
HIC0
)
jobIDstr="${jobIDstr}:${jid5}"
# Create HIC maps file for MQ > 30
jid6=$(qsub -o ${logdir}/hic30_${groupname}.log -j oe -q batch -M ${EMAIL} -m ae -N ${groupname}_hic30 -l mem=60gb -l walltime=168:00:00 -l nodes=1:ppn=1:AMD -W depend=afterok:${jid4} <<- HIC30
$load_java
if [ -z "${exclude}" ]
then
    echo "Launching ${juiceDir}/scripts/juicer_tools pre ${resolutions} -f ${site_file} -s ${outputdir}/inter_30.txt -g ${outputdir}/inter_30_hists.m -q 30 ${outputdir}/merged_nodups.txt ${outputdir}/inter_30.hic ${genomeID}"
    ${juiceDir}/scripts/juicer_tools pre ${resolutions} -f ${site_file} -s ${outputdir}/inter_30.txt -g ${outputdir}/inter_30_hists.m -q 30 ${outputdir}/merged_nodups.txt ${outputdir}/inter_30.hic ${genomeID}
else
   echo "Launching ${juiceDir}/scripts/juicer_tools pre ${resolutions} -s ${outputdir}/inter_30.txt -g ${outputdir}/inter_30_hists.m -q 30 ${outputdir}/merged_nodups.txt ${outputdir}/inter_30.hic ${genomeID}"
   ${juiceDir}/scripts/juicer_tools pre ${resolutions} -s ${outputdir}/inter_30.txt -g ${outputdir}/inter_30_hists.m -q 30 ${outputdir}/merged_nodups.txt ${outputdir}/inter_30.hic ${genomeID}
fi
HIC30
)
jobIDstr="${jobIDstr}:${jid6}"

# Create loop and domain lists file for MQ > 30
jid7=$(qsub  -o ${logdir}/hiccups.log -j oe -q batch -N ${groupname}hiccups -l mem=60gb -l walltime=100:00:00 -l nodes=1:ppn=1:gpus=1:GPU -W depend=afterok:${jid6} <<- HICCUPS
$load_java
$load_cuda
echo $PBS_GPUFILE
export _JAVA_OPTIONS=-Xmx16384m;
export LC_ALL=C
${juiceDir}/scripts/juicer_hiccups.sh -j ${juiceDir}/scripts/juicer_tools -i ${outputdir}/inter_30.hic -m ${juiceDir}/references/motif -g ${genomeID}
B
HICCUPS

)

jobIDstr="${jobIDstr}:${jid7}"

jid8=$(qsub -o ${logdir}/arrowhead.log -j oe -q batch -N ${groupname}_ArwHead -l mem=60gb -l walltime=100:00:00 -l nodes=1:ppn=1:gpus=1:GPU -W depend=afterok:${jid6} <<- ARROWHEAD
$load_java
$load_cuda
echo $PBS_GPUFILE
export _JAVA_OPTIONS=-Xmx16384m;
export LC_ALL=C
${juiceDir}/scripts/juicer_arrowhead.sh -j ${juiceDir}/scripts/juicer_tools -i ${outputdir}/inter_30.hic
ARROWHEAD
)

qsub -o ${logdir}/done.log -j oe -q batch -N ${groupname}_done -W depend=afterok:${jobIDstr} <<- FINAL
echo "All jobs finished processing!"
FINAL
qsub -o ${logdir}/done.out -j oe -q batch -N ${groupname}_fail -W depend=afternotok:${jobIDstr} <<- FINAL
echo "Error occored in placing the jobs. Please check err file of each step to find out"
FINAL

Shell oschmod qsub From line 35 of scripts/mega.sh

jID_launch=$( qstat | grep Lnch_${groupname} | cut -d '.' -f 1 )
if [ -z $postproc ]
then
    waitstring3="#PBS -W depend=afterok:${jID_launch}"
fi
echo "waitstring3 is: ${waitstring3}"
echo $project
timestamp=$(date +"%s" | cut -c 4-10)
echo "running postprocwrap"
qsub <<- POSTPROCWRAP
#PBS -S /bin/bash
#PBS -q $queue
#PBS -A $project
#PBS -l $walltime
#PBS -l select=1:ncpus=1:mem=4g
#PBS -o ${logdir}/${timestamp}_postproc_wrap_${groupname}.log
#PBS -j oe
#PBS -N PPrWrp${groupname}
${waitstring3}

date +"%Y-%m-%d %H:%M:%S"

jID_hic30=\$(qstat | grep hic30_${groupname} |cut -d '.' -f 1)
echo \$jID_hic30
echo "try this..."
echo \$(qstat | grep hic30_${groupname} )
echo "--------"

if [ -z $postproc ]
then
    waitstring4="#PBS -W depend=afterok:\${jID_hic30}" 
fi
echo "waitstring4 is : \${waitstring4}"

timestamp=\$(date +"%s" | cut -c 4-10)
echo "running postprocess"
qsub <<POSTPROCESS
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $long_walltime
    #PBS -l select=1:ncpus=1:mem=60g
    #PBS -o ${logdir}/\${timestamp}_postproc_${groupname}.log
    #PBS -j oe
    #PBS -N PProc_${groupname}
    #PBS -A $project
    \$waitstring4
	#This formerly had an ngpus=1
    date +"%Y-%m-%d %H:%M:%S"
    $load_java
    $load_cuda
    module list 
    export _JAVA_OPTIONS=-Xmx16384m
    export LC_ALL=en_US.UTF-8

    ${juiceDir}/scripts/juicer_postprocessing.sh -j ${juiceDir}/scripts/juicer_tools -i ${outputdir}/inter_30.hic -m ${juiceDir}/references/motif -g $genomeID
echo "done postprocess"
POSTPROCESS
echo "done postprocwrap"
POSTPROCWRAP

jID_postprocwrap=$( qstat |grep PPrWrp${groupname} | cut -d '.' -f 1 )
echo $jID_postprowrap
wait
timestamp=$(date +"%s" | cut -c 4-10)
echo "running finck"
qsub <<- FINCK
#PBS -S /bin/bash
#PBS -q $queue  
#PBS -l $walltime
#PBS -o ${logdir}/${timestamp}_prep_done_${groupname}.log
#PBS -j oe
#PBS -N Pdone_${groupname}
#PBS -W depend=afterok:${jID_postprocwrap}
#PBS -A $project
#PBS -l select=1:ncpus=1:mem=50gb

date +"%Y-%m-%d %H:%M:%S"    
jID_hic30=\$(qstat | grep hic30_${groupname} |cut -d '.' -f 1)
jID_stats0=\$(qstat | grep stats0${groupname} |cut -d '.' -f 1)
jID_stats30=\$(qstat | grep stats30${groupname} |cut -d '.' -f 1)
jID_hic=\$(qstat | grep hic0_${groupname} |cut -d '.' -f 1)
jID_postproc=\$(qstat | grep PProc_${groupname} |cut -d '.' -f 1)

waitstring5="#PBS -W depend=afterok:\${jID_postproc}"
if [ -z $postproc ]
then
    waitstring5="#PBS -W depend=afterok:\${jID_hic30}:\${jID_stats0}:\${jID_stats30}:\${jID_hic}:\${jID_postproc}"
fi
timestamp=\$(date +"%s" | cut -c 4-10)
echo "running done"
qsub <<DONE
    #PBS -S /bin/bash
    #PBS -q $queue
    #PBS -l $walltime
    #PBS -l select=1:ncpus=1:mem=4g
    #PBS -o ${logdir}/\${timestamp}_done_${groupname}.log
    #PBS -j oe
    #PBS -N done_${groupname}
    #PBS -A $project
    \${waitstring5}

    date +"%Y-%m-%d %H:%M:%S"    
    export splitdir=${splitdir}
    export outputdir=${outputdir}
    ${juiceDir}/scripts/check.sh
echo "done done"
DONE
echo "done finck"
FINCK

Shell qsub From line 2 of scripts/postprocessing.sh

ls -l splits > ls_splits

awk '($9 ~/_R1/ && $9 ~/fastq$/) || ($9 ~/_R1/ && $9 ~/gz$/) {split($9, a, "_R1");  print a[1]a[2], $9}($9 ~/_R2/ && $9 ~/fastq$/) || ($9 ~/_R2/ && $9 ~/gz$/){split($9,a,"_R2");print a[1]a[2], $9}' ls_splits > fastq.txt
awk '{name="splits/"$1"_norm.txt.res.txt"; if ((getline line < name) > 0){ split(line, a); if (length(a)<6 || a[2] != a[3]+a[4]+a[5]+a[6] || a[2] == 0){print "mv splits/"$2, "not_done; rm -f splits/"$1"*; rm -f splits/"$2"*;"} close(name);}else {print "mv splits/"$2, "not_done; rm -f splits/"$1"*; rm -f splits/"$2"*;"}}' fastq.txt > mv_me.sh
mkdir not_done
chmod 755 mv_me.sh
./mv_me.sh
if [ "$(ls -A not_done)" ]
    then
        if [ -d done_splits ]
        then
            mv splits/* done_splits/.
            rmdir splits
        else
            mv splits done_splits
        fi
        mv not_done splits
else
    rmdir not_done
    echo "Fastqs are aligned, run juicer.sh with -S merge flag";
fi
rm fastq.txt ls_splits mv_me.sh

Shell oschmod From line 29 of scripts/relaunch_prep.sh

use File::Basename;
use POSIX;
use List::Util qw[min max];
use Getopt::Std;
use vars qw/ $opt_s $opt_l $opt_d $opt_o $opt_q $opt_h /;

# Check arguments
getopts('s:l:o:q:h');

my $site_file = "/opt/juicer/restriction_sites/hg19_DpnII2.txt";
my $ligation_junction = "GATCGATC";
my $stats_file = "stats.txt";
my $mapq_threshold = 1;

if ($opt_h) {
    print "Usage: statistics.pl -s[site file] -l[ligation] -o[stats file] -q[mapq threshold] <infile>\n";
    print " <infile>: file in intermediate format to calculate statistics on, can be stream\n";
    print " [site file]: list of HindIII restriction sites, one line per chromosome (default $site_file)\n";
    print " [ligation]: ligation junction (default $ligation_junction)\n";
    print " [stats file]: output file containing total reads, for library complexity (default $stats_file)\n";
    print " [mapq threshold]: mapping quality threshold, do not consider reads < threshold (default $mapq_threshold)\n";
    exit;
}
if ($opt_s) {
  $site_file = $opt_s;
}
if ($opt_l) {
  $ligation_junction = $opt_l;
}
if ($opt_o) {
  $stats_file = $opt_o;
}
if ($opt_q) {
  $mapq_threshold = $opt_q;
}
if (scalar(@ARGV)==0) {
  print STDOUT "No input file specified, reading from input stream\n";
}

my $dangling_junction = substr $ligation_junction, length($ligation_junction)/2;

# Global variables for calculating statistics
my %chromosomes;
my %hindIII;
my %mapQ;
my %mapQ_inter;
my %mapQ_intra;
my %innerM;
my %outerM;
my %rightM;
my %leftM;
my $three_prime_end=0;
my $five_prime_end=0;
my $total = 0;
my $dangling = 0;
my $ligation = 0;
my $inner = 0;
my $outer = 0;
my $left = 0;
my $right = 0;
my $inter = 0;
my $intra = 0;
my $small = 0;
my $large = 0;
my $very_small = 0;
my $very_small_dangling = 0;
my $small_dangling = 0;
my $large_dangling = 0;
my $inter_dangling = 0;
my $true_dangling_intra_small = 0;
my $true_dangling_intra_large = 0;
my $true_dangling_inter = 0;
my $total_current = 0;
my $under_mapq = 0;
my $intra_fragment = 0;
my $unique = 0;
# logspace bins
my @bins = (10,12,15,19,23,28,35,43,53,66,81,100,123,152,187,231,285,351,433,534,658,811,1000,1233,1520,1874,2310,2848,3511,4329,5337,6579,8111,10000,12328,15199,18738,23101,28480,35112,43288,53367,65793,81113,100000,123285,151991,187382,231013,284804,351119,432876,533670,657933,811131,1000000,1232847,1519911,1873817,2310130,2848036,3511192,4328761,5336699,6579332,8111308,10000000,12328467,15199111,18738174,23101297,28480359,35111917,43287613,53366992,65793322,81113083,100000000,123284674,151991108,187381742,231012970,284803587,351119173,432876128,533669923,657933225,811130831,1000000000,1232846739,1519911083,1873817423,2310129700,2848035868,3511191734,4328761281,5336699231,6579332247,8111308308,10000000000);

if (index($site_file, "none") != -1) {
   #no restriction enzyme, no need for RE distance
 }
else {
  # read in restriction site file and store as multidimensional array
  open FILE, $site_file or die $!;
  while (<FILE>) {
    my @locs = split;
    my $key = shift(@locs);
    my $ref = \@locs;
    $chromosomes{$key} = $ref;
  }
  close(FILE);
}
# read in infile and calculate statistics
#open FILE, $infile or die $!;
while (<>) {
  $unique++;
	my @record = split;
	my $num_records = scalar(@record);
  # don't count as Hi-C contact if fails mapq or intra fragment test
  my $countme = 1;

  if (($record[1] eq $record[5]) && $record[3] == $record[7]) {
    $intra_fragment++;
    $countme = 0;
  }
	elsif ($num_records > 8) {
		my $mapq_val = min($record[8],$record[11]);
    if ($mapq_val < $mapq_threshold) {
      $under_mapq++;
      $countme = 0;
    }
  }


  if ($countme) {
    $total_current++;
    # position distance
    my $pos_dist = abs($record[2] - $record[6]);

    my $hist_dist = &bsearch($pos_dist,\@bins);	 

    my $is_dangling = 0;
    # one part of read pair has unligated end
    if ($num_records > 8 && ($record[10] =~ m/^$dangling_junction/ || $record[13] =~ m/^$dangling_junction/)) {
      $dangling++;
      $is_dangling=1;
    }
    # look at chromosomes
    if ($record[1] eq $record[5]) {
      $intra++;
      # determine right/left/inner/outer ordering of chromosomes/strands
      if ($record[0] == $record[4]) {
        if ($record[0] == 0) {
          if ($pos_dist >= 20000) {
            $right++;
          }
          $rightM{$hist_dist}++;
        }
        else {
          if ($pos_dist >= 20000) {
            $left++;
          }
          $leftM{$hist_dist}++;
        }
      }
      else {
        if ($record[0] == 0) {
          if ($record[2] < $record[6]) {
            if ($pos_dist >= 20000) {
              $inner++;
            }
            $innerM{$hist_dist}++;
          }
          else {
            if ($pos_dist >= 20000) {
              $outer++;
            }
            $outerM{$hist_dist}++;
          }
        }
        else {
          if ($record[2] < $record[6]) {
            if ($pos_dist >= 20000) {
              $outer++;
            }
            $outerM{$hist_dist}++;
          }
          else {
            if ($pos_dist >= 20000) {
              $inner++;
            }
            $innerM{$hist_dist}++;
          }
        }
      }
      # intra reads less than 20KB apart
      if ($pos_dist < 10) {
        $very_small++;
        if ($is_dangling) {
          $very_small_dangling++;
        }
      }
      elsif ($pos_dist < 20000) {
        $small++;
        if ($is_dangling) {
          $small_dangling++;
        }
      }
      else {
        $large++;
        if ($is_dangling) {
          $large_dangling++;
        }
      }
    }
    else {
      $inter++;
      if ($is_dangling) {
        $inter_dangling++;
      }
    }
    if ($num_records > 8) {
      my $mapq_val = min($record[8],$record[11]);
      if ($mapq_val <= 200) {
        $mapQ{$mapq_val}++;
        if ($record[1] eq $record[5]) {
          $mapQ_intra{$mapq_val}++;
        }
        else {
          $mapQ_inter{$mapq_val}++;
        }
      }
      # read pair contains ligation junction
      if ($record[10] =~ m/$ligation_junction/ || $record[13] =~ m/$ligation_junction/) {
        $ligation++;
      }
    }
    # determine distance from nearest HindIII site, add to histogram
    if (index($site_file, "none") == -1) {
    my $report = (($record[1] != $record[5]) || ($pos_dist >= 20000));
    my $dist = &distHindIII($record[0], $record[1], $record[2], $record[3], $report);
    if ($dist <= 2000) {
      $hindIII{$dist}++;
    }

    $dist = &distHindIII($record[4], $record[5], $record[6], $record[7], $report);
    if ($dist <= 2000) {
      $hindIII{$dist}++;
    }
    }
    if ($is_dangling) {
      if ($record[10] =~ m/^$dangling_junction/) {
        $dist = &distHindIII($record[0], $record[1], $record[2], $record[3], 1);
      }
      else { #	$record[13] =~ m/^$dangling_junction/) 
        $dist = &distHindIII($record[4], $record[5], $record[6], $record[7], 1);
      }
      if ($dist == 1) {
        if ($record[1] == $record[5]) {
          if ($pos_dist < 20000) {
            $true_dangling_intra_small++;
          }
          else {
            $true_dangling_intra_large++;
          }
        }
        else {
          $true_dangling_inter++;
        }
      }
    }
  }
 }

my($statsfilename, $directories, $suffix)= fileparse($stats_file, qr/\.[^.]*/);
my $histsfile = $directories . $statsfilename . "_hists.m";

my $seq=0;
if (-e $stats_file) {
  open FILE, $stats_file or die $!;
  while (<FILE>) {
    if (/Sequenced/g) {
      ($label, $reads) = split(':', $_);
      $seq=1;
      $reads =~ s/,//g;
      $reads =~ s/ //g;
    }
  } 
  close FILE;
}
open FILE, " >> $stats_file", or die $!;
if ($unique==0) {
	$unique=1;
}

print FILE "Intra-fragment Reads: " . commify($intra_fragment);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $intra_fragment*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $intra_fragment*100/$unique; 

print FILE "Below MAPQ Threshold: " . commify($under_mapq);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $under_mapq*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $under_mapq*100/$unique; 

print FILE "Hi-C Contacts: " . commify($total_current);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $total_current*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $total_current*100/$unique; 

printf FILE " Ligation Motif Present: %s ", commify($ligation);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $ligation*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $ligation*100/$unique; 

if ($five_prime_end + $three_prime_end > 0) {
  my $f1 = $three_prime_end*100/($five_prime_end + $three_prime_end);
  my $f2 = $five_prime_end*100/($five_prime_end + $three_prime_end);
  printf FILE " 3' Bias (Long Range): %0.0f\%", $f1;
  printf FILE " - %0.0f\%\n", $f2;
}
else {
  print FILE " 3' Bias (Long Range): 0\% \- 0\%\n";
}
if ($large > 0) {
	printf FILE " Pair Type %(L-I-O-R): %0.0f\%", $left*100/$large;
  printf FILE " - %0.0f\%", $inner*100/$large;
  printf FILE " - %0.0f\%", $outer*100/$large;
  printf FILE " - %0.0f\%\n", $right*100/$large;
}
else {
	print FILE " Pair Type %(L-I-O-R): 0\% - 0\% - 0\% - 0\%\n";
}

printf FILE "Inter-chromosomal: %s ", commify($inter);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $inter*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $inter*100/$unique; 

printf FILE "Intra-chromosomal: %s ", commify($intra);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $intra*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $intra*100/$unique; 

printf FILE "Short Range (<20Kb): %s ", commify($small);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $small*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $small*100/$unique; 

printf FILE "Long Range (>20Kb): %s ", commify($large);
if ($seq == 1) {
  printf FILE " (%0.2f\% / ", $large*100/$reads; 
}
else {
  print FILE "(";
}
printf FILE "%0.2f\%)\n", $large*100/$unique; 

close FILE;

open FILE, "> $histsfile", or die $!; 
print FILE "A = [\n";
for (my $i=1; $i <= 2000; $i++) {
	my $tmp =	 $hindIII{$i} || 0;
	print FILE "$tmp ";
}
print FILE "\n];\n";
print FILE "B = [\n";
for (my $i=0; $i <= 200; $i++) {
	my $tmp = $mapQ{$i} || 0;
	my $tmp2 = $mapQ_intra{$i} || 0;
	my $tmp3 = $mapQ_inter{$i} || 0;
	print FILE "$tmp $tmp2 $tmp3\n ";
}
print FILE "\n];\n";
print FILE "D = [\n";
for (my $i=0; $i < scalar(@bins); $i++) {
	my $tmp = $innerM{$i} || 0;
	print FILE "$tmp ";
	$tmp = $outerM{$i} || 0;
	print FILE "$tmp ";
	$tmp = $rightM{$i} || 0;
	print FILE "$tmp ";
	$tmp = $leftM{$i} || 0;
	print FILE "$tmp\n";
}

print FILE "\n];";
print FILE "x = [\n";
for (my $i=0; $i < scalar(@bins); $i++) {
	print FILE "$bins[$i] ";
}
print FILE "\n];\n";
close FILE;

# Find distance to nearest HindIII restriction site
sub distHindIII {
	# find upper index of position in sites array via binary search
	my $index = $_[3];
	# get distance to each end of HindIII fragment
	my $dist1;
	if ($index == 0) {
		# first fragment, distance is position
		$dist1 =	$_[2];	
	}
	else {
		$dist1 = abs($_[2] - $chromosomes{$_[1]}[$index-1]);
	}
	my $dist2 = abs($_[2] - $chromosomes{$_[1]}[$index]);

	# get minimum value -- if (dist1 <= dist2), it's dist1, else dist2
	my $retval = $dist1 <= $dist2 ? $dist1 : $dist2; 
	# get which end of the fragment this is, 3' or 5' (depends on strand)
	if ($retval == $dist1 && $_[4]) {
		$_[0] == 0 ? $five_prime_end++ : $three_prime_end++;
	}
	elsif ($retval == $dist2 && $_[4]) {
		$_[0] == 16 ? $five_prime_end++ : $three_prime_end++;
	}
	return $retval;
}

# Binary search, array passed by reference
# search array of integers a for given integer x
# return index where found or upper index if not found
sub bsearch {
	my ($x, $a) = @_;		 # search for x in array a
	my ($l, $u) = (0, @$a - 1);	 # lower, upper end of search interval
	my $i;	      		 # index of probe
	while ($l <= $u) {
		$i = int(($l + $u)/2);
		if ($a->[$i] < $x) {
	    $l = $i+1;
		}
		elsif ($a->[$i] > $x) {
	    $u = $i-1;
		}
		else {
	    return $i; # found
		}
	}
	return $l;				 # not found, return upper
}
sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}