IRIS: Isoform peptides from RNA splicing for Immunotherapy target Screening

Version: v2.0.1

Quick guide

Dependencies

Core dependencies (required for major IRIS functions/steps - format, screen, and predict)

Other dependencies (required for processing raw RNA-Seq and MS data)

Installation

1. Download

1.1 Download IRIS program

The IRIS program can be downloaded directly from the repository, as shown below:

git clone https://github.com/Xinglab/IRIS.git
cd IRIS

IRIS is designed to make use of a computing cluster to improve performance. For users who want to enable cluster execution for functions that support it (see Configure for details), please update the contents of snakemake_profile/ to ensure compatibility with the available compute environment.

1.2 Download IRIS db

IRIS loads a big-data reference database of splicing events and other genomic annotations. These data are included in IRIS_data.v2.0.0 (available via a Google Drive link; the entire folder is ~400 GB; users can select reference groups to download). The files need to be placed under ./IRIS_data/ .

The files can be automatically downloaded with google_drive_download.py . Downloading a large amount of data with the API requires authentication:

  • https://cloud.google.com/docs/authentication/production

  • https://cloud.google.com/bigquery/docs/authentication/service-account-file

To use the script, first create a service account:

  • Go to google cloud console -> IAM & Admin -> Service Accounts -> create service account

  • Give the new account: role=owner

  • Click the new service account email on the service account page

  • Download a .json key by clicking: keys -> add key -> create new key -> json

That .json key is passed to google_drive_download.py

1.3 Download IEDB MHC I prediction tools

Download IEDB_MHC_I-2.15.5.tar.gz from the IEDB website (see Dependencies ). Create a folder named IEDB/ in the IRIS folder, then move the downloaded .tar.gz file into IEDB/ . From http://tools.iedb.org/main/download/ :

  • click "MHC Class I"

  • click "previous version"

  • find and download version 2.15.5

The manual download is needed because a license must be accepted.

2. Install

./install can automatically install most dependencies to conda environments:

  • conda must already be installed for the script to work

    • https://docs.conda.io/en/latest/miniconda.html
  • The install script will check if IRIS_data/ has been downloaded

  • The install script will check if the IEDB tools have been downloaded

Under the IRIS folder, to install IRIS core dependencies , do:

./install core

To install optional dependencies not needed for the most common IRIS usage:

./install all

3. Configure for compute cluster

Snakefile describes the IRIS pipeline. The configuration for running jobs can be set by editing snakemake_profile/ . The provided configuration adapts IRIS to use Slurm. Other compute environments can be supported by updating this directory.

4. Known issues

  • The conda install of Python 2 may give an error like ImportError: No module named _sysconfigdata_x86_64_conda_linux_gnu

    • Check for the error by activating conda_env_2 and running python

    • Resolve with commands similar to

      • cd conda_env_2/lib/python2.7/

      • cp _sysconfigdata_x86_64_conda_cos6_linux_gnu.py _sysconfigdata_x86_64_conda_linux_gnu.py

  • IRIS uses --label-string to determine which fastq files are read 1 and read 2

    • To avoid issues, name your fastq files so that they end with 1.fastq and 2.fastq to indicate which mate of the read pair each file represents
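As a sketch of this naming convention (not IRIS's actual --label-string implementation), fastq paths can be grouped into read pairs by those suffixes:

```python
# Sketch: group fastq paths into (read 1, read 2) pairs using the
# recommended "1.fastq"/"2.fastq" suffix convention described above.
# This mirrors the idea behind --label-string but is a hypothetical
# helper, not part of IRIS.
import os
from collections import defaultdict

def pair_fastqs(paths):
    """Return {shared_prefix: {'1': read1_path, '2': read2_path}}."""
    pairs = defaultdict(dict)
    for path in paths:
        name = os.path.basename(path)
        for mate in ('1', '2'):
            suffix = mate + '.fastq'
            if name.endswith(suffix):
                pairs[name[:-len(suffix)]][mate] = path
                break
        else:
            # A file not following the convention cannot be assigned
            raise ValueError('cannot tell read 1 from read 2: ' + path)
    return dict(pairs)
```

For example, `pair_fastqs(['/d/sampleA_1.fastq', '/d/sampleA_2.fastq'])` groups both files under the shared prefix `sampleA_`.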

Usage

  • For streamlined AS-derived target discovery, please follow the major functions below and run the corresponding toy example.

  • For customized pipeline development, please check all functions of IRIS.

This flowchart shows how the IRIS functions are organized:

[Figure: iris_diagram]

Individual functions

IRIS provides individual functions/steps, allowing users to build pipelines for their customized needs. IRIS_functions.md describes each module/step, including RNA-seq preprocessing, HLA typing, proteo-transcriptomic MS searching, visualization, etc.

usage: IRIS [-h] [--version]
positional arguments:
 {format,screen,predict,epitope_post,process_rnaseq,makesubsh_mapping,makesubsh_rmats,makesubsh_rmatspost,exp_matrix,makesubsh_extract_sjc,extract_sjc,sjc_matrix,index,translate,pep2epitope,screen_plot,screen_sjc,append_sjc,annotate_ijc,screen_cpm,append_cpm,screen_novelss,screen_sjc_plot,makesubsh_hla,parse_hla,ms_makedb,ms_search,ms_parse,visual_summary}
 format                Format AS matrices from rMATS, followed by indexing for IRIS
 screen                Identify AS events of varying degrees of tumor association and specificity using an AS reference panel
 predict               Predict and annotate AS-derived TCR (pre-prediction) and CAR-T targets
 epitope_post          Post-prediction step to summarize predicted TCR targets
 process_rnaseq        Process RNA-Seq FASTQ files to quantify gene expression and AS
 makesubsh_mapping     Make submission shell scripts for running 'process_rnaseq'
 makesubsh_rmats       Make submission shell scripts for running the rMATS-turbo 'prep' step
 makesubsh_rmatspost   Make submission shell scripts for running the rMATS-turbo 'post' step
 exp_matrix            Make a merged gene expression matrix from multiple cufflinks results
 makesubsh_extract_sjc Make submission shell scripts for running 'extract_sjc'
 extract_sjc           Extract SJ counts from a STAR-aligned BAM file and annotate SJs with the number of uniquely mapped reads that support the splice junction
 sjc_matrix            Make an SJ count matrix by merging SJ count files from a specified list of samples; indexes the merged file
 index                 Index AS matrices for IRIS
 translate             Translate AS junctions into junction peptides
 pep2epitope           Wrapper to run IEDB for peptide-HLA binding prediction
 screen_plot           Make stacked/individual violin plots for a list of AS events
 screen_sjc            Identify AS events of varying degrees of tumor specificity by comparing the presence-absence of splice junctions using a reference of SJ counts
 append_sjc            Append 'screen_sjc' results as an annotation to PSI-based screening results and epitope prediction results in a specified screening output folder
 annotate_ijc          Annotate inclusion junction count info onto PSI-based screening results or epitope prediction results in a specified screening output folder; can be called from append_sjc to save time
 screen_cpm            Identify AS events of varying degrees of tumor association and specificity using an AS reference panel based on normalized splice junction read counts
 append_cpm            Append 'screen_cpm' results as an annotation to PSI-based screening results and epitope prediction results in a specified screening output folder
 screen_novelss        Identify AS events of varying degrees of tumor association and specificity, including events with unannotated splice junctions detected by the rMATS 'novelss' option
 screen_sjc_plot       Make stacked/individual barplots of the percentage of samples expressing a splice junction for a list of AS events
 makesubsh_hla         Make submission shell scripts for running seq2HLA for HLA typing using RNA-Seq
 parse_hla             Summarize seq2HLA results of all input samples into matrices for IRIS use
 ms_makedb             Generate a proteo-transcriptomic database for MS search
 ms_search             Wrapper to run MSGF+ for MS search
 ms_parse              Parse MS search results to generate tables of identified peptides
 visual_summary        Make a graphic summary of IRIS results
optional arguments:
 -h, --help            show this help message and exit
 --version             show program's version number and exit
For command line options of each sub-command, type: IRIS COMMAND -h

Streamlined functions of major modules

The core of IRIS immunotherapy target discovery comprises four steps across three major modules. For a quick test, see Example , which uses Snakemake to run a small data set.

  • Step 1 . Generate and index PSI-based AS matrix from rMATS output (RNA-seq data processing module)

    • IRIS format option -d should be used to save the generated PSI-based AS matrix to the downloaded IRIS DB.

    • Example files for rmats_mat_path_manifest and rmats_sample_order can be found under example/example_of_input_to_format_step/ .

    • IRIS index will create an index for the IRIS format generated PSI-based AS matrix, and -o should be the path to the folder containing the generated AS matrix.

    • The --novelSS option is experimental and not fully validated. It takes output from the corresponding experimental rMATS function to identify events with unannotated splice sites. Please refer to the latest rMATS (> v4.0.0) for details.

usage: IRIS format [-h] -t {SE,RI,A3SS,A5SS} -n DATA_NAME -s {1,2}
 [-c COV_CUTOFF] [-i] [-e] [-d IRIS_DB_PATH] [--novelSS]
 [--gtf GTF]
 rmats_mat_path_manifest rmats_sample_order
usage: IRIS index [-h] -t {SE,RI,A3SS,A5SS} -n DATA_NAME
 [-c COV_CUTOFF] [-o OUTDIR]
 splicing_matrix
  • Step 2 . Screen and translate tumor-associated events (IRIS screening module: 'tumor-association screen' + optional 'tumor-recurrence screen')

    • Description of the PARAMETER_FIN input file can be found at example/parameter_file_description.txt , and an example file can be found at example/NEPC_test.para .

    • To perform an optional tumor-recurrence screen, include a 'tumor reference' in the PARAMETER_FIN input file.

    • Users can also run an optional secondary tumor-association screen (not included in Snakemake) by calling IRIS screen_cpm . This test accounts for the joint effects of overall gene expression and AS. The commands to run it and the output format are similar to those of the tumor-specificity test in 'Step 4' below.

    • Option -t in IRIS screen runs IRIS translate to generate SJ peptides, a required step for the IRIS target prediction module.

usage: IRIS screen [-h] -p PARAMETER_FIN
 --splicing-event-type {SE,RI,A3SS,A5SS} -o OUTDIR [-t]
 [-g GTF] [--all-reading-frames] [--ignore-annotation]
 [--remove-early-stop] [--min-sample-count MIN_SAMPLE_COUNT]
 [--use-existing-test-result]
usage: IRIS translate [-h] -g REF_GENOME
 --splicing-event-type {SE,RI,A3SS,A5SS} --gtf GTF
 -o OUTDIR [--all-orf] [--ignore-annotation]
 [--remove-early-stop] [-c DELTAPSI_COLUMN] [-d DELTAPSI_CUT_OFF]
 [--no-tumor-form-selection] [--check-novel]
 as_input 
  • Step 3 . Predict both extracellular targets and epitopes (designed for cluster execution) (IRIS target prediction module)

    • IRIS predict can generate CAR-T annotation results and prepare a job array submission for TCR epitope prediction. TCR prediction preparation is optional and can be disabled by using --extracellular-only .

    • IRIS epitope_post will summarize TCR epitope prediction results after the TCR epitope prediction jobs from IRIS predict are submitted and finished (the job array submission step can be done manually or using Snakemake).

    • MHC_LIST and MHC_BY_SAMPLE can be generated by running HLA typing (within or outside of IRIS). Note that predictions need not be restricted to HLA types detected from the input RNA samples. Users can instead specify dummy files containing only HLA types of interest or common HLA types, as long as the HLA types in the dummy hla_types.list and hla_patient.tsv are consistent. Example files for hla_types.list and hla_patient.tsv can be found at example/hla_types_test.list and example/hla_patient_test.tsv , respectively.
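The consistency requirement above can be satisfied by construction. A minimal sketch, assuming hla_types.list holds one allele per line and hla_patient.tsv holds a sample name followed by its alleles, tab-separated (verify both formats against example/hla_types_test.list and example/hla_patient_test.tsv):

```python
# Sketch: generate a consistent pair of dummy HLA inputs covering only
# HLA types of interest. Assumed formats (check the example files):
# hla_types.list = one allele per line;
# hla_patient.tsv = sample name then its alleles, tab-separated.
def make_dummy_hla_files(samples, alleles):
    hla_list = '\n'.join(alleles) + '\n'
    rows = ['\t'.join([sample] + list(alleles)) for sample in samples]
    hla_patient = '\n'.join(rows) + '\n'
    return hla_list, hla_patient

# Every allele in hla_patient.tsv appears in hla_types.list by construction.
hla_list, hla_patient = make_dummy_hla_files(
    ['sample_1', 'sample_2'],
    ['HLA-A*02:01', 'HLA-B*07:02'])
```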

usage: IRIS predict [-h] --task-dir TASK_DIR -p PARAMETER_FIN
 -t {SE,RI,A3SS,A5SS} [--iedb-local IEDB_LOCAL]
 [-m MHC_LIST] [--extracellular-only] [--tier3-only]
 [--gene-exp-matrix GENE_EXP_MATRIX] [-c DELTAPSI_COLUMN]
 [-d DELTAPSI_CUT_OFF] [-e EPITOPE_LEN_LIST] [--all-reading-frames]
 [--extracellular-anno-by-junction]
 IRIS_screening_result_path
usage: IRIS epitope_post [-h] -p PARAMETER_FIN -o OUTDIR
 -t {SE,RI,A3SS,A5SS} -m MHC_BY_SAMPLE
 -e GENE_EXP_MATRIX [--tier3-only] [--keep-exist]
 [--epitope-len-list EPITOPE_LEN_LIST]
 [--no-match-to-canonical-proteome]
 [--no-uniqueness-annotation]
 [--ic50-cut-off IC50_CUT_OFF]
  • Step 4 . Perform the tumor-specificity screen, a more stringent screen comparing the presence-absence of a given SJ between tumor and normal tissues (IRIS screening module: 'tumor-specificity screen')

    • IRIS append_sjc combines screen and screen_sjc results (by appending screen_sjc outputs to screen outputs). This 'integrated' output contains annotations for tumor-specific targets.

    • IRIS append_sjc -i option can be used to execute both IRIS append_sjc and IRIS annotate_ijc functions. If -i option is used, -p and -e arguments are required.

usage: IRIS screen_sjc [-h] -p PARAMETER_FIN
 --splicing-event-type {SE,RI,A3SS,A5SS}
 -e EVENT_LIST_FILE -o OUTDIR
 [--use-existing-test-result]
 [--tumor-read-cov-cutoff TUMOR_READ_COV_CUTOFF]
 [--normal-read-cov-cutoff NORMAL_READ_COV_CUTOFF]
usage: IRIS append_sjc [-h] --sjc-summary SJC_SUMMARY
 --splicing-event-type {SE,RI,A3SS,A5SS} -o OUTDIR
 [-i] [-u] [-p PARAMETER_FILE]
 [-e SCREENING_RESULT_EVENT_LIST]
 [--inc-read-cov-cutoff INC_READ_COV_CUTOFF]
 [--event-read-cov-cutoff EVENT_READ_COV_CUTOFF]
usage: IRIS annotate_ijc [-h] -p PARAMETER_FILE
 --splicing-event-type {SE,RI,A3SS,A5SS}
 -e SCREENING_RESULT_EVENT_LIST -o OUTDIR
 [--inc-read-cov-cutoff INC_READ_COV_CUTOFF]
 [--event-read-cov-cutoff EVENT_READ_COV_CUTOFF]

Snakemake

The Snakemake workflow can be run with ./run . First set the configuration values in snakemake_config.yaml :

  • Set the resources to allocate for each job:

    • {job_name}_{threads}

    • {job_name}_{mem_gb}

    • {job_name}_{time_hr}

  • Set the reference files:

    • Provide the file names for gtf_name: and fasta_name:

    • Either place the files in ./references/ or provide a URL under reference_files: to download the (potentially gzipped) files:

gtf_name: 'some_filename.gtf'
fasta_name: 'other_filename.fasta'
reference_files:
  some_filename.gtf.gz:
    url: 'protocol://url/for/some_filename.gtf.gz'
  other_filename.fasta.gz:
    url: 'protocol://url/for/other_filename.fasta.gz'
  • Set the input files:

    • sample_fastqs: Set the read 1 and read 2 fastq files for each sample. For example:
sample_fastqs:
  sample_name_1:
  - '/path/to/sample_1_read_1.fq'
  - '/path/to/sample_1_read_2.fq'
  sample_name_2:
  - '/path/to/sample_2_read_1.fq'
  - '/path/to/sample_2_read_2.fq'
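Before launching a run it can help to validate the sample_fastqs mapping. A minimal sketch (a hypothetical check, not part of IRIS): each sample should list exactly two fastq paths, and the config expects full paths:

```python
# Sketch: validate a sample_fastqs mapping as loaded from
# snakemake_config.yaml. Hypothetical helper, not part of IRIS.
def check_sample_fastqs(sample_fastqs):
    """Return a list of problem descriptions (empty if the mapping looks ok)."""
    problems = []
    for sample, fastqs in sample_fastqs.items():
        if len(fastqs) != 2:
            problems.append('{}: expected 2 fastq paths, got {}'
                            .format(sample, len(fastqs)))
        for path in fastqs:
            if not path.startswith('/'):
                problems.append('{}: not a full path: {}'.format(sample, path))
    return problems
```

For example, `check_sample_fastqs({'sample_name_1': ['/path/to/r1.fq', '/path/to/r2.fq']})` returns an empty list.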
  • blocklist : an optional blocklist of AS events similar to IRIS/data/blocklist.brain_2020.txt

  • mapability_bigwig : an optional file for evaluating splice region mappability similar to IRIS_data/resources/mappability/wgEncodeCrgMapabilityAlign24mer.bigWig

  • mhc_list : required if not starting with fastq files, similar to example/hla_types_test.list

  • mhc_by_sample : required if not starting with fastq files, similar to example/hla_patient_test.tsv

  • gene_exp_matrix : optional tsv file with geneName as the first column and the expression for each sample in the remaining columns

  • splice_matrix_txt : optional output file from IRIS index that can be used as a starting point

  • splice_matrix_idx : the index file for splice_matrix_txt

  • sjc_count_txt : optional output file from IRIS sjc_matrix that can be used as a starting point. Only relevant if should_run_sjc_steps

  • sjc_count_idx : the index file for sjc_count_txt
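The gene_exp_matrix option above expects a tsv with geneName as the first column and one expression column per sample. A minimal sketch of writing such a file (the gene names and values are placeholders):

```python
# Sketch: build a gene_exp_matrix tsv with geneName as the first column
# and one expression column per sample, as described above.
def write_gene_exp_matrix(sample_names, rows):
    """rows: iterable of (gene_name, [expression value per sample])."""
    lines = ['\t'.join(['geneName'] + sample_names)]
    for gene, values in rows:
        lines.append('\t'.join([gene] + [str(v) for v in values]))
    return '\n'.join(lines) + '\n'

# Placeholder content for illustration only
tsv_text = write_gene_exp_matrix(['sample_1', 'sample_2'],
                                 [('GENE_A', [1.5, 2.0])])
```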

  • Set other options

    • run_core_modules : set to true to start with existing IRIS format output and HLA lists

    • run_all_modules : set to true to start with fastq files

    • should_run_sjc_steps : set to true to enable splice junction based evaluation steps

    • star_sjdb_overhang : used by STAR alignment. Should ideally be read_length - 1, but the STAR manual says that 100 works well as a default

    • run_name : used to name output files that will be written to IRIS_data/

    • splice_event_type : one of [SE, RI, A3SS, A5SS]

    • comparison_mode : one of [group, individual]

    • stat_test_type : one of [parametric, nonparametric]

    • use_ratio : set to true to require a ratio of reference groups to pass the checks rather than a fixed count

    • tissue_matched_normal_..._{cutoff} : set cutoffs for the tissue matched normal reference group (tier 1)

    • tissue_matched_normal_reference_group_names : a comma separated list of directory names under IRIS_data/db

    • tumor_..._{cutoff} : set cutoffs for the tumor reference group (tier 2)

    • tumor_reference_group_names : a comma separated list of directory names under IRIS_data/db

    • normal_..._{cutoff} : set cutoffs for the normal reference group (tier 3)

    • normal_reference_group_names : a comma separated list of directory names under IRIS_data/db
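The star_sjdb_overhang value above can be derived from the fastq read length. A minimal sketch, assuming a uniform read length across records (falls back to STAR's documented default of 100 otherwise):

```python
# Sketch: derive star_sjdb_overhang (read_length - 1) from the first
# record of a fastq, given as an iterable of lines. Assumes uniform
# read length; returns the STAR default when that cannot be determined.
def sjdb_overhang_from_fastq(lines, default=100):
    records = list(lines)
    if len(records) < 2:
        return default
    # Line 2 of a fastq record is the read sequence
    read_length = len(records[1].strip())
    return read_length - 1 if read_length > 1 else default
```

For example, a fastq with 100 bp reads yields an overhang of 99.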

Example

Snakemake is configured to run the IRIS streamlined major functions above using example/ . For customized pipeline development, we recommend referring to the Snakefile , which defines the steps of the pipeline. To run the example, update the /path/to/ values with full paths in snakemake_config.yaml , and make any adjustments to snakemake_profile/ . Then

./run

As mentioned in Usage , the full example is designed to be run with a compute cluster. It will take < 5 min for the formatting and screening steps and usually < 15 min for the prediction step (depending on available cluster resources).

A successful test run will generate the following result files in ./results/NEPC_test/screen/ (the number before each file name is its row count):

 0 NEPC_test.SE.notest.txt
 1 NEPC_test.SE.test.all_guided.txt
 1 NEPC_test.SE.tier1.txt
 1 NEPC_test.SE.tier1.txt.integratedSJC.txt
 4 NEPC_test.SE.tier2tier3.txt.ExtraCellularAS.txt
 4 NEPC_test.SE.tier2tier3.txt.ExtraCellularAS.txt.integratedSJC.txt
 6 NEPC_test.SE.tier2tier3.txt
 6 NEPC_test.SE.tier2tier3.txt.ijc_info.txt
 6 NEPC_test.SE.tier2tier3.txt.integratedSJC.txt
11 NEPC_test.SE.test.all_voted.txt
 4 SE.tier2tier3/epitope_summary.junction-based.txt
 4 SE.tier2tier3/epitope_summary.junction-based.txt.integratedSJC.txt
 9 SE.tier2tier3/epitope_summary.peptide-based.txt
 9 SE.tier2tier3/epitope_summary.peptide-based.txt.integratedSJC.txt
11 SE.tier2tier3/pred_filtered.score500.txt
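A quick way to sanity-check a run against the listing above is to compare row counts (a hypothetical helper, not part of IRIS; adapt the expected paths and counts to your run):

```python
# Sketch: verify that expected result files exist and have the listed
# row counts. Hypothetical helper, not part of IRIS.
import os

def check_results(result_dir, expected):
    """expected: {relative_path: row_count}; returns a list of mismatches."""
    mismatches = []
    for rel_path, rows in expected.items():
        path = os.path.join(result_dir, rel_path)
        if not os.path.exists(path):
            mismatches.append(rel_path + ': missing')
            continue
        with open(path) as handle:
            count = sum(1 for _ in handle)
        if count != rows:
            mismatches.append('{}: {} rows, expected {}'
                              .format(rel_path, count, rows))
    return mismatches
```

For example, `check_results('./results/NEPC_test/screen', {'NEPC_test.SE.tier1.txt': 1})` returns an empty list when the file exists with one row.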

A summary graphic is generated at ./results/NEPC_test/visualization/summary.png

Example output

Final reports are shown in bold font.

Default screen (tumor-associated screen) results

[TASK/DATA_NAME].[AS_TYPE].test.all_guided.txt : All AS events tested by IRIS screening with a tissue-matched normal tissue reference panel available. A one-sided test is used to generate the p-value.

[TASK/DATA_NAME].[AS_TYPE].test.all_voted.txt : All AS events tested by IRIS screening without a tissue-matched normal tissue reference panel. A two-sided test is used to generate the p-value for comparisons to normal panels.

[TASK/DATA_NAME].[AS_TYPE].notest.txt : AS events skipped during screening due to no variance or no available comparisons

[TASK/DATA_NAME].[AS_TYPE].tier1.txt : Tumor-associated AS events after comparison to tissue-matched normal panel ('tier1' events)

[TASK/DATA_NAME].[AS_TYPE].tier2tier3.txt : Tumor-associated AS events after comparison to tissue-matched normal panel, tumor panel, and normal tissue panel ('tier3' AS events)

CAR-T annotation reports

[TASK/DATA_NAME].[AS_TYPE].tier1.txt.ExtraCellularAS.txt : Tumor-associated AS events in 'tier1' set that are associated with protein extracellular annotation and may be used for CAR-T targets

[TASK/DATA_NAME].[AS_TYPE].tier2tier3.txt.ExtraCellularAS.txt : Tumor-associated AS events in 'tier3' set that are associated with protein extracellular annotation and may be used for CAR-T targets

TCR prediction reports

[AS_TYPE].tier1/pred_filtered.score500.txt : IEDB prediction outputs for SJ peptides from 'tier1' set with HLA-peptide binding IC50 values passing user-defined cut-off

[AS_TYPE].tier1/epitope_summary.peptide-based.txt : AS-derived epitopes from 'tier1' set that are predicted to bind user-defined HLA types

[AS_TYPE].tier1/epitope_summary.junction-based.txt : AS events from 'tier1' set that are predicted to bind user-defined HLA types

[AS_TYPE].tier2tier3/pred_filtered.score500.txt : IEDB prediction outputs for AS junction peptides from 'tier3' set with HLA-peptide binding IC50 value passing user-defined cut-off

[AS_TYPE].tier2tier3/epitope_summary.peptide-based.txt : AS-derived epitopes from 'tier3' set that are predicted to bind user-defined HLA types

[AS_TYPE].tier2tier3/epitope_summary.junction-based.txt : AS events from 'tier3' set that are predicted to bind user-defined HLA types

Tumor-specificity screen reports

Screening or prediction outputs that integrate screen and screen_sjc results contain annotations for tumor-specific targets. These output files are indicated by .integratedSJC.txt , such as [TASK/DATA_NAME].[AS_TYPE].tier2tier3.txt.integratedSJC.txt and [AS_TYPE].tier2tier3/epitope_summary.peptide-based.txt.integratedSJC.txt .

Contact

Yang Pan [email protected]

Eric Kutschera [email protected]

Beatrice Zhang [email protected]

Yi Xing [email protected]

Publication

Pan Y*, Phillips JW*, Zhang BD*, Noguchi M*, Kutschera E, McLaughlin J, Nesterenko PA, Mao Z, Bangayan NJ, Wang R, Tran W, Yang HT, Wang Y, Xu Y, Obusan MB, Cheng D, Lee AH, Kadash-Edmondson KE, Champhekar A, Puig-Saus C, Ribas A, Prins RM, Seet CS, Crooks GM, Witte ON+, Xing Y+. (2023) IRIS: Big data-informed discovery of cancer immunotherapy targets arising from pre-mRNA alternative splicing. PNAS, in press. (+joint corresponding authors; *joint first authors)

Code Snippets

shell:
    'curl -L \'{params.url}\''
    ' -o {output.ref_file}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 193 of master/Snakefile
shell:
    ' gunzip -c {input.gz}'
    ' 1> {output.un_gz}'
    ' 2> {log.err}'
SnakeMake From line 214 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_3} python {params.script}'
    ' --out-path {output.param_file}'
    ' --group-name {params.group_name}'
    ' --iris-db {params.iris_db}'
    ' --psi-p-value-cutoffs'
    ' {params.matched_psi_cut},{params.tumor_psi_cut},{params.normal_psi_cut}'
    ' --sjc-p-value-cutoffs'
    ' {params.matched_sjc_cut},{params.tumor_sjc_cut},{params.normal_sjc_cut}'
    ' --delta-psi-cutoffs'
    ' {params.matched_delta_psi_cut},{params.tumor_delta_psi_cut},{params.normal_delta_psi_cut}'
    ' --fold-change-cutoffs'
    ' {params.matched_fc_cut},{params.tumor_fc_cut},{params.normal_fc_cut}'
    ' --group-count-cutoffs'
    ' {params.matched_group_cut},{params.tumor_group_cut},{params.normal_group_cut}'
    ' --reference-names-tissue-matched-normal {params.matched_ref_names}'
    ' --reference-names-tumor {params.tumor_ref_names}'
    ' --reference-names-normal {params.normal_ref_names}'
    ' --comparison-mode {params.comparison_mode}'
    ' --statistical-test-type {params.stat_test_type}'
    ' {params.use_ratio}'
    ' {params.blocklist}'
    ' {params.bigwig}'
    ' {params.genome}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 295 of master/Snakefile
shell:
    'cp {input.splice_txt} {output.splice_txt}'
    ' && cp {input.splice_idx} {output.splice_idx}'
SnakeMake From line 342 of master/Snakefile
shell:
    'cp {input.count_txt} {output.count_txt}'
    ' && cp {input.count_idx} {output.count_idx}'
SnakeMake From line 364 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} STAR'
    ' --runMode genomeGenerate'
    ' --runThreadN {threads}'
    ' --genomeDir {params.out_dir}'
    ' --genomeFastaFiles {input.fasta}'
    ' --sjdbGTFfile {input.gtf}'
    ' --sjdbOverhang {params.overhang}'
    ' 1> {log.out}'
    ' 2> {log.err}'
run:
    import os
    import os.path

    out_dir = params.out_dir
    if os.path.isdir(out_dir):
        files = os.listdir(out_dir)
        if files:
            raise Exception('organize_fastqs: {} already contains files'
                            .format(out_dir))

    for i, sample_name in enumerate(params.sample_names):
        sample_dir = os.path.join(out_dir, sample_name)
        orig_fastq_path = input.fastqs[i]
        fastq_basename = os.path.basename(orig_fastq_path)
        new_fastq_path = os.path.join(sample_dir, fastq_basename)
        os.makedirs(sample_dir, exist_ok=True)
        os.symlink(orig_fastq_path, new_fastq_path)
SnakeMake From line 460 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS makesubsh_mapping'
    ' --fastq-folder-dir {params.fastq_dir}'
    ' --starGenomeDir {params.star_dir}'
    ' --gtf {input.gtf}'
    ' --data-name {params.run_name}'
    ' --outdir {params.out_dir}'
    ' --label-string {params.label_string}'
    ' --task-dir {params.task_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 562 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {input.star_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 605 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {input.cuff_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 644 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS makesubsh_hla'
    ' --fastq-folder-dir {params.fastq_dir}'
    ' --data-name {params.run_name}'
    ' --outdir {params.out_dir}'
    ' --label-string {params.label_string}'
    ' --task-dir {params.task_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 712 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {input.hla_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 753 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS parse_hla'
    ' --outdir {params.out_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 786 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS makesubsh_rmats'
    ' --rMATS-path {params.rmats_path}'
    ' --bam-dir {params.bam_dir}'
    ' --gtf {input.gtf}'
    ' --data-name {params.run_name}'
    ' --task-dir {params.task_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 845 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {input.rmats_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 887 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_3} python {params.script}'
    ' --parent-dir {params.parent_dir}'
    ' --run-name {params.run_name}'
    ' --out {output.read_lengths}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 911 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS makesubsh_rmatspost'
    ' --rMATS-path {params.rmats_path}'
    ' --bam-dir {params.bam_dir}'
    ' --gtf {input.gtf}'
    ' --data-name {params.run_name}'
    ' --task-dir {params.task_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 949 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {params.post_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 997 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_3} python {params.script}'
    ' --matrix-out {output.matrix}'
    ' --sample-out {output.sample}'
    ' --summaries {input.summaries}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1041 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS format'
    ' {input.matrix_list}'
    ' {input.sample_list}'
    ' --splicing-event-type {params.splice_type}'
    ' --data-name {params.run_name}'
    ' --sample-name-field {params.sample_name_field}'
    ' --sample-based-filter'
    ' --iris-db-path {params.iris_db}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1082 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_3} python {params.script}'
    ' --out-manifest {output.manifest}'
    ' --fpkm-files {input.fpkm_files}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1125 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS exp_matrix'
    ' --outdir {params.out_dir}'
    ' --data-name {params.run_name}'
    ' {input.manifest}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1163 of master/Snakefile
shell:
    'echo {params.bam_dir} > {output.bam_list}'
    ' && {params.conda_wrapper} {params.conda_env_2} IRIS'
    ' makesubsh_extract_sjc'
    ' --bam-folder-list {output.bam_list}'
    ' --task-name {params.task_name}'
    ' --gtf {input.gtf}'
    ' --genome-fasta {input.fasta}'
    ' --BAM-prefix {params.bam_prefix}'
    ' --task-dir {params.task_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1214 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {input.extract_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1255 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_3} python {params.script}'
    ' --sj-out {output.sj_list}'
    ' --sj-files {input.sj_files}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1291 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS sjc_matrix'
    ' --file-list-input {input.sj_list}'
    ' --data-name {params.run_name}'
    ' --sample-name-field {params.sample_name_field}'
    ' --iris-db-path {params.db_sjc}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1326 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS screen'
    ' --parameter-fin {input.parameter_file}'
    ' --splicing-event-type {params.splice_event_type}'
    ' --outdir {params.out_dir}'
    ' --translating'  # runs IRIS translate within IRIS screen
    ' --gtf {input.gtf}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1377 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_3} python {params.script}'
    ' --out-list {output.predict_task_list}'
    ' --task-dir {params.task_dir}'
    ' --splice-type {params.splice_type}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1494 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} bash'
    ' {input.predict_task}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1517 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS epitope_post'
    ' --parameter-fin {input.parameter_file}'
    ' --outdir {params.out_dir}'
    ' --splicing-event-type {params.splice_event_type}'
    ' --mhc-by-sample {input.mhc_by_sample}'
    ' {params.gene_exp}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1600 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS screen_sjc'
    ' --parameter-file {input.parameter_file}'
    ' --splicing-event-type {params.splice_event_type}'
    ' --event-list-file {input.splice_txt}'
    ' --outdir {params.out_dir}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1642 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS append_sjc'
    ' --sjc-summary {input.screen_sjc_out}'
    ' --splicing-event-type {params.splice_event_type}'
    ' --outdir {params.out_dir}'
    ' --add-ijc-info'  # runs IRIS annotate_ijc within IRIS append_sjc
    ' --parameter-file {input.parameter_file}'
    ' --screening-result-event-list {input.event_list}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1710 of master/Snakefile
shell:
    '{params.conda_wrapper} {params.conda_env_2} IRIS visual_summary'
    ' --parameter-fin {input.parameter_file}'
    ' --screening-out-dir {params.screen_dir}'
    ' --out-file-name {output.summary}'
    ' --splicing-event-type {params.splice_event_type}'
    ' 1> {log.out}'
    ' 2> {log.err}'
SnakeMake From line 1759 of master/Snakefile