3DGB is a workflow to build 3D models of genomes from HiC data
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
- Lack of a description for a new keyword tool/pypi/iced .
- Lack of a description for a new keyword tool/pypi/pastis .
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
3D genome builder (3DGB) is a workflow to build 3D models of genomes from HiC raw data and to integrate omics data on the produced models for further visual exploration. 3DGB bundles HiC-Pro , PASTIS and custom Python scripts into a unified Snakemake workflow with limited inputs (see Preparing Required Files ). 3DGB produces annotated 3D models of genome in PDB and G3D formats.
Download this repository
git clone https://github.com/data-fun/3d-genome-builder.git
cd 3d-genome-builder
Install dependencies
Singularity
Download the latest version here
Install Singularity:
sudo apt install -y ./singularity-container_3.8.7_amd64.deb
Verify version:
$ singularity --version
singularity version 3.8.7
Conda environment
Install conda .
Install mamba:
conda install mamba -n base -c conda-forge
Create conda environment and install dependendies:
mamba env create -f binder/environment.yml
Load conda environment:
conda activate 3DGB
Download HiC-Pro Singularity image
wget --ciphers=DEFAULT:@SECLEVEL=1 https://zerkalo.curie.fr/partage/HiC-Pro/hicpro_3.1.0_ubuntu.img -P images
Verify HiC-Pro version with:
$ singularity exec images/hicpro_3.1.0_ubuntu.img HiC-Pro --version
[...]
HiC-Pro version 3.1.0
and bowtie2 version:
$ singularity exec images/hicpro_3.1.0_ubuntu.img bowtie2 --version 2>/dev/null | head -n 1
/usr/local/conda/envs/hicpro/bin/bowtie2-align-s version 2.4.4
Prepare required files
Create the config file
Create and edit a configuration file in
yaml
format. See for instance the template
config_template.yml
Add the reference genome
The reference genome fasta file must be located in
WORKING_DIR/genome.fasta
where
WORKING_DIR
is the name of the working directory as specified in your config file.
Add FASTQ files (optional)
If you already have fastq files stored locally or some fastq files are not available on GEO or SRA, you can use these files providing they are in the proper directory structure:
WORKING_DIR/
├── fastq_files
│ ├── ID1
│ │ ├── ID1_R1.fastq.gz
│ │ └── ID1_R2.fastq.gz
│ ├── ID2
│ │ ├── ID2_R1.fastq.gz
│ │ └── ID2_R2.fastq.gz
│ ├── ID3
│ │ ├── ID3_R1.fastq.gz
│ │ └── ID3_R2.fastq.gz
│ └── ID4
│ ├── ID4_R1.fastq.gz
│ └── ID4_R2.fastq.gz
└── genome.fasta
-
WORKING_DIR
is the name of the working directory as specified in your config file. -
Paired-end fastq files are in the directory
WORKING_DIR/fastq_files/IDx
withIDx
the identifier of the paired fastq files. Fastq identifiers are reported in the config file. Please note fastq files have to follow the pattern<sample ID>_R<1 or 2>.fastq.gz
.
Note
Please strictly follow this file organization as it is required by the 3DGB workflow.
Build model
Run 3DGB:
snakemake --profile smk_profile -j 4 --configfile YOUR-CONFIG.yml
Note
Adapt
YOUR-CONFIG.yml
to the exact name of the config file you created.Option
-j 4
tells Snakemake to use up to 4 cores. If you are more cores available, you can increase this value ( e.g.-j 16
).
Or with debugging options:
snakemake --profile smk_profile_debug -j 4 --configfile YOUR-CONFIG.yml --verbose
Depending on the number and size of fastq files, the 3D construction will take a couple of hours to run.
For troubleshooting, have a look to log files in
WORKING_DIR/logs
, where
WORKING_DIR
is the name of the working directory as specified in your config file.
Map quantitative values on the 3D model
To map quantitative values on the model run:
python ./scripts/map_parameter.py --pdb path/to/structure.pdb --bedgraph path/to/annotation.bedgraph --output path/to/output.pdb
Quantitative values should be formatted in a 4-column bedgraph file (chromosome/start/stop/value):
chr1 0 50000 116.959
chr1 50000 100000 48.4495
chr1 100000 150000 22.8726
chr1 150000 200000 84.3106
chr1 200000 250000 113.109
Each bead of the model will be assigned a quantitative value. The resolution in the bedgraph file should match the resolution used to build the model.
Get results
Upon completion, the
WORKING_DIR
should look like this:
WORKING_DIR/
├── contact_maps
├── dense_matrix
├── fastq_files
├── HiC-Pro
├── logs
├── pastis
├── sequence
└── structure
The following paths contain the most interesting results:
-
WORKING_DIR/contact_maps/*.png
: contact maps. -
WORKING_DIR/HiC-Pro/output/hic_results/pic/*/*.pdf
: graphical summaries of read alignments produced by Hi-C Pro. -
WORKING_DIR/pastis/structure_RESOLUTION.pdb
: raw 3D models (in PDB format) produced by Pastis. -
WORKING_DIR/structure/RESOLUTION/structure_cleaned.*
: final (annotated) 3D models in PDB and G3D formats.
Note
WORKING_DIR
is the name of the working directory as specified in your config file.
RESOLUTION
is the resolution of the Hi-C data specified in the config file.
Examples
Visualize 3D model structures
To visualize 3D model structures (.pdb and .g3d files), follow this quick tutorial .
Build DAG graph
For visualization purpose, you can build the graph of all computational steps involved in the 3D construction of the genome.
snakemake --profile smk_profile --configfile YOUR-CONFIG.yml --rulegraph | dot -Tpdf > rules.pdf
where
YOUR-CONFIG.yml
should be replaced by the name of the config file you created.
With wildcards:
snakemake --profile smk_profile --configfile YOUR-CONFIG.yml --dag | dot -Tpdf > dag.pdf
Code Snippets
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 | import argparse from biopandas.pdb import PandasPdb def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments. """ parser = argparse.ArgumentParser(add_help=False) required = parser.add_argument_group("required arguments") optional = parser.add_argument_group("optional arguments") required.add_argument( "--input-pdb", action="store", type=str, help="Input PDB file containing the 3D structure of the genome.", required=True, ) required.add_argument( "--output-pdb", action="store", type=str, help="Output PDB file containing the completed 3D structure of the genome.", required=True, ) # Add help. optional.add_argument( "-h", "--help", action="help", default=argparse.SUPPRESS, help="Show this help message and exit.", ) return parser.parse_args() def interpolate_missing_coordinates(pdb_name_in, pdb_name_out): """Interpolate missing coordinates in PDB file containing a 3D genome structure. 3D coordinates of the missing beads are interpolated one by one. Parameters ---------- pdb_name_in : str PDB file containing the 3D structure of the genome pdb_name_out : str Output PDB file containing the annotated 3D structure of the genome """ pdb = PandasPdb().read_pdb(pdb_name_in) atoms = pdb.df["ATOM"] print(f"Found {atoms.shape[0]} beads in {pdb_name_in}") # Interpolate missing coordinates chromosome by chromosome. coord_columns= ["x_coord", "y_coord", "z_coord"] for residue_number in atoms["residue_number"].unique(): missing_beads = ( atoms.query(f"residue_number == {residue_number}") [coord_columns] .isna() .sum() # Sum NAs over columns (coordinates). .sum() # Sum NAs over lines. ) // 3 # 1 missing atom has 3 missing coordinates. chromosome = atoms.query(f"residue_number == {residue_number}") if not missing_beads: continue print(f"Chromosome {residue_number} has {missing_beads} missing beads") # Interpolate coordinates on the three coordinates (x, y, z). # Only interpolation between existing beads. # No extrapolation at chromosome ends. interpolate_chromosome = chromosome.copy() interpolate_chromosome[coord_columns] = chromosome[coord_columns].interpolate( method="pchip", axis=0, limit_area="inside" ) # Print number of missing beads that will later remove. deleted_atoms_number = interpolate_chromosome["x_coord"].isna().sum() print( f"Removed {deleted_atoms_number} beads from chromosome {residue_number} extremities" ) # Save interpolated chromosome into the full structure. atoms.loc[atoms["residue_number"] == residue_number, coord_columns] = interpolate_chromosome[coord_columns] # Remove in one step all beads with missing coordinates # located at chromosome extremities. atoms = atoms.dropna(subset=coord_columns) print( f"Found {atoms.shape[0]} beads after interpolation and removal of extremities missing beads." ) pdb.df["ATOM"] = atoms pdb.to_pdb(pdb_name_out) print(f"Wrote {pdb_name_out}") if __name__ == "__main__": ARGS = get_cli_arguments() interpolate_missing_coordinates(ARGS.input_pdb, ARGS.output_pdb) |
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 | import argparse import math from pathlib import Path from Bio import SeqIO from biopandas.pdb import PandasPdb def is_file(parser, file_path): """Check file exists. Parameters ---------- parser : argparse.ArgumentParser Command line argument parser. file_path : str File path Returns ------- str File path. """ if not Path(file_path).is_file(): parser.error(f"The file {file_path} does not exist") else: return file_path def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments. """ parser = argparse.ArgumentParser(add_help=False) required = parser.add_argument_group("required arguments") optional = parser.add_argument_group("optional arguments") required.add_argument( "--pdb", action="store", type=lambda name: is_file(parser, name), help="PDB file containing the 3D structure of the genome.", required=True, ) required.add_argument( "--fasta", action="store", type=lambda name: is_file(parser, name), help="FASTA file containing the sequence of the genome.", required=True, ) required.add_argument( "--resolution", action="store", type=int, help="HiC resolution from which the structure has been generated.", required=True, ) required.add_argument( "--output", action="store", type=str, help="Output PDB file containing the annotated 3D structure of the genome.", required=True, ) # Add help. optional.add_argument( "-h", "--help", action="help", default=argparse.SUPPRESS, help="Show this help message and exit.", ) return parser.parse_args() def extract_chromosome_length(fasta_name): """Extract chromosome length from a FASTA file. Parameters ---------- fasta_name : str Name of Fasta file containing the sequence of the genome. Returns ------- list List of chromosome lengths. """ chromosome_length_lst = [] with open(fasta_name, "r") as fasta_file: print(f"Reading {fasta_name}") for record in SeqIO.parse(fasta_file, "fasta"): length = len(record.seq) print(f"Found chromosome {record.id} with {length} bases") chromosome_length_lst.append(length) return chromosome_length_lst def assign_chromosome_number( pdb_name_in, chromosome_length, HiC_resolution, pdb_name_out ): """Assign chromosome numbers to the whole genome 3D structure. Note: - The PDB file produced by Pastis is not readable by Biopython because the residue number column is missing. - We use instead the biopandas library: http://rasbt.github.io/biopandas/ Parameters ---------- pdb_name_in : str PDB file containing the 3D structure of the genome. chromosome_length : list List with chromosome lengths. HiC_resolution : int HiC resolution. pdb_name_out : str Output PDB file containing the annotated 3D structure of the genome. """ coordinates = PandasPdb().read_pdb(pdb_name_in) coordinates_df = coordinates.df["ATOM"] # Verify the number of beads in the structure # and the number of beads deduced from the sequence and resolution # are the same. print(f"Number of beads read from structure: {coordinates_df.shape[0]}") beads_per_chromosome = [ math.ceil(length / HiC_resolution) for length in chromosome_length ] print( f"Number of beads deduced from sequence and HiC resolution: {sum(beads_per_chromosome)}" ) if coordinates_df.shape[0] != sum(beads_per_chromosome): raise ValueError( "Number of beads in structure and " "number of beads deduced from sequence and resolution " "are different." ) # Add residue (chromosome) number. residue_number = [] for index, bead_number in enumerate(beads_per_chromosome): residue_number = residue_number + [index + 1] * bead_number coordinates_df["residue_number"] = residue_number # Add residue name based on residue number. coordinates_df["residue_name"] = coordinates_df["residue_number"].apply( lambda x: f"C{x:02d}" ) # Add chain (chromosome) name. # Chromosome 1 -> Chain A # Chromosome 2 -> Chain B # ... # Chromosome 26 -> Chain Z # Chromosome 27 -> Chain A # Chromosome 28 -> Chain B # ... letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" # Handle case where there are more than 26 chromosomes. repeat_factor = math.ceil(len(chromosome_length) / len(letters)) residue_to_chain = dict( zip(coordinates_df["residue_number"].unique(), letters * repeat_factor) ) # Map residue number to chain id. coordinates_df["chain_id"] = coordinates_df["residue_number"].map(residue_to_chain) # Fix atom number starting at 1 (instead of 0). coordinates_df["atom_number"] = coordinates_df["atom_number"] + 1 # Write new PDB file to disk. coordinates.df["ATOM"] = coordinates_df coordinates.to_pdb(path=pdb_name_out, records=None, gz=False, append_newline=True) print(f"Wrote {pdb_name_out}") if __name__ == "__main__": ARGS = get_cli_arguments() # Read FASTA file and get chromosome length. CHROMOSOME_LENGTH = extract_chromosome_length(ARGS.fasta) # Assign chromosome number. assign_chromosome_number(ARGS.pdb, CHROMOSOME_LENGTH, ARGS.resolution, ARGS.output) |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | import argparse import matplotlib matplotlib.use('Agg') import matplotlib.pyplot as plt import numpy as np def get_cli_arguments(): """Parse command line. Returns ------- argparse.Namespace Object containing arguments """ parser = argparse.ArgumentParser() parser.add_argument( "--contacts", action="store", type=str, help="Input contact file", required=True, ) parser.add_argument( "--map", action="store", type=str, help="Output contact map image", required=True, ) return parser.parse_args() def create_contact_map(contacts_file, map_file): """Create contact map image. Parameters ---------- contacts_file : str Contact file map_file : str Contact map image """ contacts = np.loadtxt(contacts_file) plt.imshow(np.log2(contacts+1), cmap="hot") plt.colorbar() plt.savefig(map_file, dpi=300) if __name__ == "__main__": ARGS = get_cli_arguments() create_contact_map(ARGS.contacts, ARGS.map) |
8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | import argparse from Bio import SeqIO def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments """ parser = argparse.ArgumentParser(add_help=False) required = parser.add_argument_group("required arguments") optional = parser.add_argument_group("optional arguments") required.add_argument( "--fasta", action="store", type=str, help="Fasta file containing the sequence of the genome.", required=True, ) required.add_argument( "--output", action="store", type=str, help="Output text file containing chromosome sizes.", required=True, ) # Add help. optional.add_argument( "-h", "--help", action="help", default=argparse.SUPPRESS, help="Show this help message and exit.", ) return parser.parse_args() if __name__ == "__main__": # Parse command line arguments. ARGS = get_cli_arguments() print(f"Opening {ARGS.fasta}") with open(ARGS.fasta, "r") as fasta_file, open(ARGS.output, "w") as size_file: for record in SeqIO.parse(fasta_file, "fasta"): print(f"{record.id}: {len(record.seq)} bases") size_file.write(f"{record.id}\t{len(record.seq)}\n") |
9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 | import argparse import math import pandas as pd from pathlib import Path from Bio import SeqIO from biopandas.pdb import PandasPdb def is_file(parser, file_path): """Check file exists. Parameters ---------- parser : argparse.ArgumentParser Command line argument parser file_path : str File path Returns ------- str File path """ if not Path(file_path).is_file(): parser.error(f"The file {file_path} does not exist") else: return file_path def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments """ parser = argparse.ArgumentParser() parser.add_argument( "--pdb", action="store", type=lambda name: is_file(parser, name), help="PDB file containing the 3D structure of the genome", required=True, ) parser.add_argument( "--fasta", action="store", type=lambda name: is_file(parser, name), help="Fasta file containing the sequence of the genome", required=True, ) parser.add_argument( "--resolution", action="store", type=int, help="HiC resolution", required=True, ) parser.add_argument( "--output", action="store", type=str, help="Output g3d file containing the annotated 3D structure of the genome", required=True, ) return parser.parse_args() def extract_chromosome_length(fasta_name): """Extract chromosome length from a FASTA file. Parameters ---------- fasta_name : str Name of Fasta file containing the sequence of the genome Returns ------- list List of chromosome lengthes """ chromosome_length_lst = [] with open(fasta_name, "r") as fasta_file: print(f"Reading {fasta_name}") for record in SeqIO.parse(fasta_file, "fasta"): length = len(record.seq) print(f"Found chromosome {record.id} with {length} bases") chromosome_length_lst.append(length) return chromosome_length_lst def convert_to_g3d(pdb_name_in, chromosome_length, HiC_resolution, g3d_name_out): """Assign chromosome numbers to the whole genome 3D structure. Note: - The PDB file produced by Pastis is not readable by Biopython because the residue number column is missing. - We use instead the biopandas library. http://rasbt.github.io/biopandas/ Parameters ---------- pdb_name_in : str PDB file containing the 3D structure of the genome chromosome_length : list List with chromosome lengths HiC_resolution : int HiC resolution g3d_name_out : str Output g3d file containing the annotated 3D structure of the genome """ pdb = PandasPdb().read_pdb(pdb_name_in) atoms = pdb.df["ATOM"] print(f"Number of beads read from structure: {pdb.df['ATOM'].shape[0]}") # Deduce chromosomal position of each atoms bp_coordinates = list() for i in range(len(chromosome_length)): bp_coordinates_chrom_x = list(range(0, chromosome_length[i], HiC_resolution)) bp_coordinates = bp_coordinates + bp_coordinates_chrom_x # Get the index of the beads without the ones already removed from the pdb file (by interpolate_missing_coordinates.py ) atom_number = list(atoms["atom_number"]-1) # Delete the bp coordinates values corresponding to removed beads bp_coordinates = pd.Series(bp_coordinates) bp_coordinates = bp_coordinates.iloc[atom_number] bp_coordinates.reset_index(drop=True, inplace=True) # Extract chromosomes and 3d coordinates information g3d_data = { 'chrom': atoms["residue_number"], 'locus':bp_coordinates, 'X3D_x': atoms["x_coord"], 'X3D_y': atoms["y_coord"], 'X3D_z': atoms["z_coord"] } result = pd.DataFrame(g3d_data) #save as g3d file result.to_csv(g3d_name_out, sep="\t", index=False) if __name__ == "__main__": ARGS = get_cli_arguments() # Read Fasta file and get chromosome length CHROMOSOME_LENGTH = extract_chromosome_length(ARGS.fasta) # Assign chromosome number convert_to_g3d(ARGS.pdb, CHROMOSOME_LENGTH, ARGS.resolution, ARGS.output) |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 | import argparse from pathlib import Path import jinja2 def is_file(parser, file_path): """Check file exists. Parameters ---------- parser : argparse.ArgumentParser Command line argument parser. file_path : str File path Returns ------- str File path. """ if not Path(file_path).is_file(): parser.error(f"The file {file_path} does not exist") else: return file_path def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments. """ parser = argparse.ArgumentParser(add_help=False) required = parser.add_argument_group("required arguments") optional = parser.add_argument_group("optional arguments") required.add_argument( "--template", action="store", type=lambda name: is_file(parser, name), help="Template file for HiC-Pro configuration.", required=True, ) required.add_argument( "--chromosome-sizes", action="store", type=lambda name: is_file(parser, name), help="File with chromosome sizes.", required=True, ) required.add_argument( "--genome-fragment", action="store", type=lambda name: is_file(parser, name), help="File with genome restriction sites.", required=True, ) required.add_argument( "--genome-index-path", action="store", type=str, help="Path to bowtie2 indexes.", required=True, ) required.add_argument( "--resolutions", action="store", type=int, nargs="+", help="HiC resolutions.", required=True, ) required.add_argument( "--output", action="store", type=str, help="HiC-Pro configuration file.", required=True, ) # Add help. optional.add_argument( "-h", "--help", action="help", default=argparse.SUPPRESS, help="Show this help message and exit.", ) return parser.parse_args() if __name__ == "__main__": ARGS = get_cli_arguments() with open(ARGS.template, "r") as template_file, \ open(ARGS.output, "w") as out_file: template = jinja2.Template(template_file.read()) out_file.write( template.render( genome_index_path=str(Path(ARGS.genome_index_path).resolve()) + "/", chromosome_sizes=Path(ARGS.chromosome_sizes).resolve(), genome_fragment=Path(ARGS.genome_fragment).resolve(), resolutions=" ".join([str(res) for res in ARGS.resolutions]) ) ) |
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 | import argparse import numpy as np import pandas as pd from biopandas.pdb import PandasPdb # Allow to print wide dataframes. pd.set_option("display.max_columns", None) pd.set_option("display.width", 200) def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments. """ parser = argparse.ArgumentParser(add_help=False) required = parser.add_argument_group("required arguments") optional = parser.add_argument_group("optional arguments") required.add_argument( "--input-pdb", action="store", type=str, help="Input PDB file containing the 3D structure of the genome.", required=True, ) required.add_argument( "--output-pdb", action="store", type=str, help="Output PDB file containing the completed 3D structure of the genome.", required=True, ) optional.add_argument( "--threshold", action="store", type=float, help="Distance threshold to detect flipped contigs. Default: 3", required=False, default=2.0, ) # Add help. optional.add_argument( "-h", "--help", action="help", default=argparse.SUPPRESS, help="Show this help message and exit.", ) return parser.parse_args() def delete_outlier_beads(pdb_name_in, pdb_name_out, threshold): """Delete outlier coordinates in a PDB file containing a 3D genome structure. Parameters ---------- pdb_name_in : str PDB file containing the 3D structure of the genome pdb_name_out : str Output PDB file containing the annotated 3D structure of the genome """ pdb = PandasPdb().read_pdb(pdb_name_in) atoms = pdb.df["ATOM"] print(f"Found {atoms.shape[0]} beads in {pdb_name_in}") coord_columns = ["x_coord", "y_coord", "z_coord"] ATOMS = pd.DataFrame() # delete outlier coordinates chromosome by chromosome. for residue_number in atoms["residue_number"].unique(): # Select beads of one chromosome chromosome_df = atoms.query(f"residue_number == {residue_number}").reset_index( drop=True ) # Compute Euclidean distances between bead n and bead n+1 chromosome_df["distance"] = np.linalg.norm( chromosome_df[coord_columns].to_numpy() - chromosome_df[coord_columns].shift(-1).to_numpy(), axis=1, ) # The last distance is Nan because there is no bead n+1 for the last bead. # We replace it by the distance between bead n and bead n-1. column_index = chromosome_df.columns.get_loc("distance") chromosome_df.iat[-1, column_index] = chromosome_df.iat[-2, column_index] # Compute median distance with possible Nan values median_distance = chromosome_df["distance"].median(skipna=True) print(f"Median distance between beads: {median_distance:.2f}") # Select outlier beads. # i.e. the beads with the greater ditances. # Output beads coordinates with distances outlier_beads = chromosome_df["distance"] >= threshold * median_distance if not outlier_beads.sum(): print(f"Chromosome {residue_number} has no outlier beads") chromosome_df = chromosome_df.drop("distance", axis=1) # Save chromosome into the full stucture. ATOMS = pd.concat([ATOMS, chromosome_df]) else: # Print number of outlier beads that will be later removed. print( f"Chromosome {residue_number} has {outlier_beads.sum()} outlier beads" ) deleted_atoms = chromosome_df[ chromosome_df["distance"] >= threshold * median_distance ] print( deleted_atoms[ [ "record_name", "atom_number", "residue_name", "residue_number", "x_coord", "y_coord", "z_coord", "distance", ] ] ) kept_atoms = chromosome_df[ chromosome_df["distance"] < threshold * median_distance ] kept_atoms = kept_atoms.drop("distance", axis=1) # Save chromosome into the full stucture. ATOMS = pd.concat([ATOMS, kept_atoms]) ATOMS.reset_index(inplace=True, drop=True) ATOMS["line_idx"] = ATOMS.index print(f"Found {ATOMS.shape[0]} beads after removal of outlier beads.") pdb.df["ATOM"] = ATOMS pdb.to_pdb(pdb_name_out) print(f"Wrote {pdb_name_out}") if __name__ == "__main__": ARGS = get_cli_arguments() delete_outlier_beads(ARGS.input_pdb, ARGS.output_pdb, ARGS.threshold) |
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | from watermark import watermark def list_versions(): """List package versions and describe environment.""" print( watermark( machine=True, python=True, packages="numpy,pandas,scipy,biopandas,matplotlib,Bio,jinja2,sklearn,iced,pastis", watermark=True, conda=True ) ) if __name__ == "__main__": list_versions() print("-" * 40) |
8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 | import argparse import numpy as np import pandas as pd from pathlib import Path import iced from iced import io from pastis.optimization import mds from pastis import dispersion from pastis.optimization import negative_binomial_structure from pastis.io.write_struct.pdb import writePDB from scipy import sparse def is_file(parser, file_path): """Check file exists. Parameters ---------- parser : argparse.ArgumentParser Command line argument parser file_path : str File path Returns ------- str File path """ if not Path(file_path).is_file(): parser.error(f"The file {file_path} does not exist") else: return file_path def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments """ parser = argparse.ArgumentParser() parser.add_argument( "--matrix", action="store", type=lambda name: is_file(parser, name), help="Name of Matrix file", required=True, ) parser.add_argument( "--bed", action="store", type=lambda name: is_file(parser, name), help="Name of Bed file", required=True, ) parser.add_argument( "--output", action="store", type=str, help="Name of output PDB file containing the 3D structure of the genome", required=True, ) return parser.parse_args() def run_pastis_nb(matrix_filename, bed_filename, output_filename, seed=0, percentage_to_filter=0.02): """Build 3D structure of genome with Pastis NB algorithme. Parameters ---------- matrix_filename : str Name of Matrix file bed_filename : str Name of Bed file output_filename : str Name of output PDB file with 3D structure of the genome seed : int (default: 0) Random generator seed percentage_to_filter : float (default: 0.02) Percentage to filter out data """ ############################################################################### #Load chromosome lengths and count data lengths = io.load_lengths(bed_filename) ############################################################################### # The sparse matrix generated by HiC-Pro seems to randomly miss a row. # The dense matrix generated by HiC-Pro_3.1.0/bin/utils/sparseToDense.py is always complete. # The 3D model is thus inferred from it. #counts = io.load_counts(matrix_filename, base=1) counts_maps = pd.read_csv(matrix_filename, sep='\t', header=None) counts_maps = counts_maps.values # to solve : File "iced/_filter_.pyx", line 15, in iced._filter_._filter_csr ValueError: Buffer dtype mismatch, expected 'DOUBLE' but got 'long' counts_maps = counts_maps.astype("double") #counts_maps = np.load(matrix_filename) counts_maps[np.isnan(counts_maps)] = 0 counts = sparse.coo_matrix(counts_maps) counts.setdiag(0) counts.eliminate_zeros() counts = counts.tocoo() ############################################################################### # Normalize the data counts = iced.filter.filter_low_counts( counts, percentage=percentage_to_filter, sparsity=False) normed, bias = iced.normalization.ICE_normalization( counts, max_iter=300, output_bias=True) # Save the normalization data and biases just for other purposes # io.write_counts("./nb_normalized.matrix", normed) # np.savetxt("./nb.biases", bias) random_state = np.random.RandomState(seed) ############################################################################### # First estimate MDS for initialization X = mds.estimate_X(normed, random_state=random_state) ############################################################################### # Estimate constant dispersion parameters dispersion_ = dispersion.ExponentialDispersion(degree=0) _, mean, variance, weights = dispersion.compute_mean_variance(counts, lengths, bias=bias) dispersion_.fit(mean, variance, sample_weights=(weights**0.5)) ############################################################################### # Now perform NB 3D inference. alpha = -3 beta = 1 counts = counts.tocoo() print("Estimating structure") X = negative_binomial_structure.estimate_X( counts, alpha, beta, bias=bias, lengths=lengths, dispersion=dispersion_, use_zero_entries=True, ini=X.flatten()) ############################################################################### # Remove beads that were not infered mask = (np.array(counts.sum(axis=0)).flatten() + np.array(counts.sum(axis=1)).flatten() == 0) mask = mask.flatten() X_ = X.copy() X_[mask] = np.nan ############################################################################### np.savetxt(output_filename + ".txt", X_) print("Results written to", output_filename) writePDB(X_, output_filename) if __name__ == "__main__": ARGS = get_cli_arguments() run_pastis_nb( matrix_filename=ARGS.matrix, bed_filename=ARGS.bed, output_filename=ARGS.output, seed=0, percentage_to_filter=0.02 ) |
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 | import argparse import math from operator import invert import shutil import sys from Bio import SeqIO from Bio.Seq import Seq from biopandas.pdb import PandasPdb import numpy as np import pandas as pd def get_cli_arguments(): """Command line argument parser. Returns ------- argparse.Namespace Object containing arguments """ parser = argparse.ArgumentParser(add_help=False) required = parser.add_argument_group("required arguments") optional = parser.add_argument_group("optional arguments") required.add_argument( "--pdb", action="store", type=str, help="PDB file containing the 3D structure of the genome.", required=True, ) required.add_argument( "--fasta", action="store", type=str, help="Fasta file containing the sequence of the genome.", required=True, ) required.add_argument( "--resolution", action="store", type=int, help="HiC resolution from which the structure has been generated.", required=True, ) required.add_argument( "--output-pdb", action="store", type=str, help="Output PDB file containing the fixed 3D structure of the genome.", required=True, ) required.add_argument( "--output-fasta", action="store", type=str, help="Output FASTA file containing the fixed sequence of the genome.", required=True, ) optional.add_argument( "--threshold", action="store", type=float, help="Distance threshold to detect flipped contigs. Default: 3", required=False, default=3.0, ) optional.add_argument( "--debug", action="store_true", help="Debug flag. Output one TSV file per chromosome with bead distances.", required=False, default=False, ) # Trigger to run the script or not. optional.add_argument( "--run", action="store", choices=("True", "False"), help="Run the script. Default: False.", required=False, default="False", ) # Add help. optional.add_argument( "-h", "--help", action="help", default=argparse.SUPPRESS, help="Show this help message and exit.", ) return parser.parse_args() def extract_chromosome_name_length(fasta_name): """Extract chromosome name and length from a FASTA file. Parameters ---------- fasta_name : str Name of the FASTA file containing the sequence of the genome. Returns ------- tuple List of chromosome names. List of chromosome lengthes. """ chromosome_name_lst = [] chromosome_length_lst = [] with open(fasta_name, "r") as fasta_file: print(f"Reading genome sequence in {fasta_name}") for record in SeqIO.parse(fasta_file, "fasta"): name = record.id length = len(record.seq) print(f"Found chromosome {name} with {length} bases") chromosome_name_lst.append(name) chromosome_length_lst.append(length) return chromosome_name_lst, chromosome_length_lst def find_inverted_contigs( pdb_name_in, chromosome_lengths, HiC_resolution, threshold ): """Find inverted contigs. Inverted contigs are detected based on the eucledian distance between adjacent beads in the 3D structure of the genome. Parameters ---------- pdb_name_in : str PDB file containing the 3D structure of the genome. chromosome_lengths : list List with chromosome lengths. HiC_resolution : int HiC resolution. threshold : float Threshold to detect flipped contigs. Returns ------- inverted_contigs : dict Dictionnary with inverted contigs. """ pdb_structure = PandasPdb().read_pdb(pdb_name_in) structure_df = pdb_structure.df["ATOM"] print(f"Number of beads read from structure: {structure_df.shape[0]}") coord_columns = ["x_coord", "y_coord", "z_coord"] missing_beads_number = structure_df[coord_columns].isna().sum().sum() // 3 print( f"Found {missing_beads_number} missing beads in structure" ) beads_per_chromosome = [ math.ceil(length / HiC_resolution) for length in chromosome_lengths ] print( f"Number of expected beads deduced from sequence and HiC resolution: {sum(beads_per_chromosome)}" ) if structure_df.shape[0] != sum(beads_per_chromosome): sys.exit( f"Cannot process structure {pdb_name_in} because it contains " f"{structure_df.shape[0]} beads " f"instead of {sum(beads_per_chromosome)}" ) inverted_contigs = {} for chrom_num in structure_df["residue_number"].unique(): print(f"\nLooking for inverted contigs into chromosome {chrom_num}") # Select beads of one chromosome chromosome_df = structure_df.query( f"residue_number == {chrom_num}" ).reset_index(drop=True) # Compute Euclidean distances between bead n and bead n+1 chromosome_df["distance"] = np.linalg.norm( chromosome_df[coord_columns].to_numpy() - chromosome_df[coord_columns].shift(-1).to_numpy(), axis=1 ) # Compute median distance with possible Nan values median_distance = chromosome_df["distance"].median(skipna=True) print(f"Median distance between beads: {median_distance:.2f}") # Select extremities of inverted contigs # i.e. beads with distance above a given threshold. # Output beads coordinates with distances if ARGS.debug: filename = f"chr_{chrom_num}.tsv" target_columns = [ "record_name", "atom_number", "atom_name", "residue_name", "chain_id", "residue_number", "x_coord", "y_coord", "z_coord", "line_idx", "distance", ] print(f"DEBUG: writing {filename} with distances.") chromosome_df[target_columns].to_csv( filename, sep="\t", index=False, na_rep="nan" ) beads_selection = ( chromosome_df["distance"] > threshold * median_distance ) inversion_limits = chromosome_df.loc[ beads_selection, "atom_number" ].values if len(inversion_limits) % 2 != 0: print("WARNING: odd number of inversion limits found") print( "WARNING: this might lead to a wrong detection of inverted contigs" ) print(inversion_limits) if len(inversion_limits) != 0: for limit_1, limit_2 in zip( inversion_limits[0::2], inversion_limits[1::2] ): print( f"Chromosome {chrom_num}: found inverted contig between bead {limit_1+1} and bead {limit_2}" ) if chrom_num in inverted_contigs: inverted_contigs[chrom_num].append((limit_1 + 1, limit_2)) else: inverted_contigs[chrom_num] = [(limit_1 + 1, limit_2)] else: inverted_contigs[chrom_num] = [] return inverted_contigs def flip_inverted_contigs_in_structure( inverted_contigs, pdb_name_in, pdb_name_out ): """Flip inverted contigs in the 3D structure of the genome. Parameters ---------- inverted_contigs : dict Dictionnary with inverted contigs pdb_name_in : str PDB file containing the 3D structure of the genome pdb_name_out : str Output PDB file containing the 3D structure of the genome """ pdb_structure = PandasPdb().read_pdb(pdb_name_in) coordinates = pdb_structure.df["ATOM"] if sum(map(len, inverted_contigs.values()))== 0: pdb_structure.to_pdb( path=pdb_name_out, records=None, gz=False, append_newline=True ) print("\nNothing to fix. Structure is fine.") return print("\nFlipping contigs.") for chrom_num in inverted_contigs: for contig in inverted_contigs[chrom_num]: contig_start, contig_end = contig print( f"Structure of chromosome {chrom_num}: " f"flip contig between beads {contig_start} " f"and {contig_end}" ) contig_start_index = coordinates[ (coordinates["residue_number"] == chrom_num) & (coordinates["atom_number"] == contig_start) ].index[0] contig_end_index = coordinates[ (coordinates["residue_number"] == chrom_num) & (coordinates["atom_number"] == contig_end) ].index[0] contig_before_df = coordinates.loc[: contig_start_index - 1, :] contig_df = coordinates.loc[contig_start_index:contig_end_index, :] contig_after_df = coordinates.loc[contig_end_index + 1 :, :] # Flip contig. contig_df = contig_df[::-1] # Assemble genome structure. coordinates = pd.concat( [contig_before_df, contig_df, contig_after_df] ) coordinates = coordinates.reset_index(drop=True) # The 'line_idx' column keeps the real order of atoms in the PDB file. coordinates["line_idx"] = coordinates.index pdb_structure.df["ATOM"] = coordinates pdb_structure.to_pdb( path=pdb_name_out, records=None, gz=False, append_newline=True ) def flip_inverted_contigs_in_sequence( inverted_contigs, chromosome_names, fasta_name_in, HiC_resolution, fasta_name_out, ): """Flip inverted contigs in the genome 3D structure and sequence. Parameters ---------- inverted_contigs : dict Dictionnary with inverted contigs. chromosome_names : list List with chromosome names. fasta_name_in : str Name of Fasta file containing the sequence of the genome. HiC_resolution : int HiC resolution. fasta_name_out : str Output FASTA file containing the fixed sequence (at the 3D structure resolution!). """ # Flip inverted contigs in the genome sequence. genome_fasta = SeqIO.to_dict(SeqIO.parse(fasta_name_in, "fasta")) if bool([a for a in inverted_contigs.values() if a == []]): with open(fasta_name_out, "w") as fasta_file: SeqIO.write(genome_fasta.values(), fasta_file, "fasta") print("Nothing to fix. Sequence is fine.") return for chrom_num in inverted_contigs: chrom_name = chromosome_names[chrom_num - 1] chrom_sequence = str(genome_fasta[chrom_name].seq) for contig in inverted_contigs[chrom_num]: contig_start = contig[0] * HiC_resolution contig_end = contig[1] * HiC_resolution print( f"Sequence of chromosome {chrom_num}: " f"flip inverted contig between base {contig_start} " f"and {contig_end}" ) contig_sequence = chrom_sequence[contig_start : contig_end + 1] # Flip contig. contig_sequence = contig_sequence[::-1] # Reassemble chromosome sequence. chrom_sequence = ( chrom_sequence[:contig_start] + contig_sequence + chrom_sequence[contig_end + 1 :] ) genome_fasta[chrom_name].seq = Seq(chrom_sequence) # Write genome sequence. with open(fasta_name_out, "w") as fasta_file: SeqIO.write(genome_fasta.values(), fasta_file, "fasta") if __name__ == "__main__": # Parse command line arguments. ARGS = get_cli_arguments() if ARGS.run != "True": print("Do not verify inverted contigs.") print(f"Copying {ARGS.pdb} to {ARGS.output_pdb}.") shutil.copyfile(ARGS.pdb, ARGS.output_pdb) print(f"Copying {ARGS.fasta} to {ARGS.output_fasta}.") shutil.copyfile(ARGS.fasta, ARGS.output_fasta) sys.exit() # Read Fasta file and extract chromosome names and lengths. CHROMOSOME_NAMES, CHROMOSOME_LENGTHS = extract_chromosome_name_length( ARGS.fasta ) # Find inverted contigs. INVERTED_CONTIGS = find_inverted_contigs( ARGS.pdb, CHROMOSOME_LENGTHS, ARGS.resolution, ARGS.threshold ) # Flip inverted contigs in the genome 3D structure and sequence. flip_inverted_contigs_in_structure( INVERTED_CONTIGS, ARGS.pdb, ARGS.output_pdb ) flip_inverted_contigs_in_sequence( INVERTED_CONTIGS, CHROMOSOME_NAMES, ARGS.fasta, ARGS.resolution, ARGS.output_fasta, ) |
31 32 33 34 | shell: "python ../scripts/describe_conda_env.py >{output} &&" "fasterq-dump --version >>{output} &&" "pigz --version >>{output} " |
46 47 48 49 50 51 | shell: "/usr/local/bin/HiC-Pro_3.1.0/bin/utils/digest_genome.py --help >{output} && " "printf '\n\n\n' >>{output} && " "bowtie2-build --version >>{output} &&" "printf '\n\n\n' >>{output} && " "LC_ALL=C; HiC-Pro --version >>{output} " |
74 75 76 | shell: "fasterq-dump {wildcards.sra_id} --threads {threads} --log-level {params.loglevel} " "{params.progress} --outdir {params.outdir} >{log} 2>&1 " |
87 88 | shell: "mv {input} {output}" |
122 123 124 125 126 | shell: "python ../scripts/calculate_chromosome_sizes.py " "--fasta {input} " "--output {output} " ">{log} 2>&1 " |
142 143 144 145 146 147 | shell: "/usr/local/bin/HiC-Pro_3.1.0/bin/utils/digest_genome.py " "-r {config[hicpro_restriction_sites]} " "-o {output} " "{input} " ">{log} 2>&1 " |
163 164 | shell: "bowtie2-build {input} HiC-Pro/bowtie2_index/genome >{log} 2>&1" |
184 185 186 187 188 189 190 191 192 | shell: "python ../scripts/create_HiC_Pro_config.py " "--template {input.template} " "--chromosome-sizes {input.chromosome_sizes} " "--genome-fragment {input.genome_fragment} " "--genome-index-path {params.genome_index_path} " "--resolutions {params.resolutions} " "--output {output} " ">{log} 2>&1" |
212 213 214 215 216 217 | shell: "LC_ALL=C; echo 'y' | HiC-Pro " "-i fastq_files " "-o HiC-Pro/output " "-c {input.config} " ">{log} 2>&1 " |
233 234 | shell: "cp {input} {output}" |
257 258 259 260 261 262 263 264 265 | shell: "HiC-Pro " "-i HiC-Pro/merged_samples " "-o HiC-Pro/merged_output " "-c {input.config} " "-s merge_persample " "-s build_contact_maps " "-s ice_norm " ">{log} 2>&1 " |
280 281 282 283 284 285 | shell: "/usr/local/bin/HiC-Pro_3.1.0/bin/utils/sparseToDense.py " "--bins {input.bed} " "--output {output} " "{input.matrix} " ">{log} 2>&1 " |
300 301 302 303 304 305 | shell: "/usr/local/bin/HiC-Pro_3.1.0/bin/utils/sparseToDense.py " "--bins {input.bed} " "--output {output} " "{input.matrix} " ">{log} 2>&1 " |
319 320 321 322 323 | shell: "python ../scripts/build_contact_maps.py " "--contacts {input} " "--map {output} " ">{log} 2>&1 " |
338 339 340 341 342 343 | shell: "python ../scripts/infer_structures_nb.py " "--matrix {input.matrix} " "--bed {input.bed} " "--output {output} " ">{log} 2>&1 " |
358 359 360 361 362 363 364 | shell: "python ../scripts/assign_chromosomes.py " "--pdb {input.structure} " "--fasta {input.sequence} " "--resolution {wildcards.resolution} " "--output {output} " ">{log} 2>&1 " |
382 383 384 385 386 387 388 389 390 | shell: "python ../scripts/verify_inverted_contigs.py " "--run {params.verify} " "--pdb {input.structure} " "--fasta {input.sequence} " "--resolution {wildcards.resolution} " "--output-pdb {output.structure} " "--output-fasta {output.sequence} " ">{log} 2>&1 " |
404 405 406 407 408 | shell: "python ../scripts/add_missing_beads.py " "--input-pdb {input} " "--output-pdb {output} " ">{log} 2>&1 " |
422 423 424 425 426 | shell: "python ../scripts/delete_outlier_beads.py " "--input-pdb {input} " "--output-pdb {output} " ">{log} 2>&1 " |
441 442 443 444 445 446 447 | shell: "python ../scripts/convert_to_g3d.py " "--pdb {input.structure} " "--fasta {input.sequence} " "--resolution {wildcards.resolution} " "--output {output} " ">{log} 2>&1 " |
Support
- Future updates
Related Workflows





