Diffusion-Based Parcellation of Piriform Cortex with Snakemake Workflow


Snakemake workflow overview for diffusion-based parcellation of the piriform cortex:

  1. rule hcp7T_Diffusion_bedpostx_gpu : Generate white matter fibre orientations within each brain voxel for each subject's HCP 7T diffusion data via FSL's bedpostx_gpu .

  2. rule hcp_mmk : Generate target segmentation 3D NIfTI volumes (bilateral) from HCP 32k surface GIFTI files (180 regions)

  3. rule diffparc_smk : Resample the piriform seed and hcp-mmk180 targets to the HCP 7T diffusion resolution, then perform probabilistic tractography from the piriform seed in each subject's space to the hcp-mmk180 targets via FSL's probtrackx2_gpu

    3b. Bring the connectivity data for each subject's seed voxels into template space and perform spectral clustering on the concatenated feature vectors to parcellate the seed into k regions

    3c. Create node and edge tables to use as input to the Gephi software for each clustering solution

  4. rule tractmap : Perform probabilistic tractography from each individual cluster following spectral clustering to generate per-cluster tractography maps (useful for visualizing each cluster's individual connectivity)
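
Taken together, the stages above form a single Snakemake DAG. As a rough sketch of how a top-level target rule might tie them together (hypothetical output paths and wildcard values for illustration only; the actual Snakefile defines its own rule names and targets):

# Hypothetical sketch only -- illustrative paths, not the repository's actual targets.
subjects = ["100610", "102311"]     # read from config/participants.tsv
seeds = ["rpiriform"]               # one entry per piriform hemisphere (template_prob_seg)
max_k = 4                           # from config: max_k

rule all:
    input:
        expand("results/bedpostx/sub-{subject}.bedpostX", subject=subjects),
        expand("results/diffparc/clusters/seed-{seed}_k-{k}_clusters.nii.gz",
               seed=seeds, k=range(2, max_k + 1)),
        expand("results/tractmap/seed-{seed}_k-{k}_tractmaps", seed=seeds, k=range(2, max_k + 1))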

Inputs in config/config.yml:

  1. participants.tsv : Target subject IDs that have HCP 7T diffusion data and that were used to create the subject-specific template from ants_build_template_smk (see input 5)

  2. template_prob_seg : Probabilistic segmentation, as a 3D NIfTI, of the individual piriform hemisphere to process

  3. prob_seg_threshold : Value at which to threshold the probabilistic segmentation (default is 0.5; a lower value, e.g. 0.2, keeps all voxels with probability >= 0.2 and thus retains more of the probseg)

  4. template : Name of template to use for filenaming within the pipeline

  5. ANTS transformations from template T1w space to/from each subject's T1w, i.e. from: ants_build_template_smk

    • ants_affine_mat : Forward affine from subject-->template

    • ants_invwarp_nii : Warp from template-->subject

    • ants_warp_nii : Warp from subject-->template

    • ants_ref_nii : Study-specific T1w created from: [ants_build_template_smk]

  6. lhrh_targets_independent_condition : When set to "true" (358 regions), analyze connectivity of the piriform segmentation within each hemisphere independently (each lh and rh region is treated as a separate target); when set to "false" (179 regions), analyze connectivity of the piriform segmentation irrespective of hemisphere (each lh and rh region pair is combined into a single target across hemispheres)

  7. targets_txt_lh-rh_separate_358regions : List of ROIs in the atlas segmentations to use as targets for piriform connectivity (358 regions: lh and rh separate)

  8. targets_txt_lh-rh_combined_179regions : List of ROIs in the atlas segmentations to use as targets for piriform connectivity (179 regions: lh and rh combined)

  9. max_k : Maximum number of clusters to create during spectral clustering (will create all cluster solutions <= max_k)

  10. probtrack : Tractography parameters; seed_resolution defines the resolution to which all other inputs are resampled

  11. probtrack_tractmap : Tractography parameters to generate individual tractmaps for each cluster following spectral clustering (useful for visualization)

  12. in_freesurfer : Path to the subject's FreeSurfer directory from the HCP_1200 release

  13. dwi_preproc_dir : Path to base directory containing the HCP 7T diffusion data for all subjects

  • in_dwi_nii

  • in_dwi_bval

  • in_dwi_bvec

  • in_dwi_grad_dev

  • in_dwi_mask_nii
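
These keys are consumed directly by the workflow's rules. As a minimal sketch of how the seed-related entries might be wired in (rule and output names here are hypothetical, and the real config may structure these keys differently; the fslmaths thresholding command itself appears in the code snippets below):

configfile: "config/config.yml"

rule binarize_template_seed:
    input:
        prob_seg = config["template_prob_seg"]
    params:
        threshold = config["prob_seg_threshold"]
    output:
        seed_bin = "results/template_seed_thresholded_bin.nii.gz"
    shell:
        "fslmaths {input.prob_seg} -thr {params.threshold} -bin {output.seed_bin}"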

Required Singularity Containers:

  • singularity_neuroglia : FSL, ANTS, NiftyReg, gnu-parallel

  • singularity_freesurfer : Freesurfer

  • singularity_connectome_workbench : Connectome Workbench

  • singularity_cuda : FSL6 with CUDA container (for running gpu versions of bedpostx and probtrackx2)
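
These images are referenced from the rules either through Snakemake's container support (picked up with --use-singularity; newer Snakemake versions use the container: directive) or through an explicit singularity exec, as the bedpostx_gpu and probtrackx2_gpu snippets below do. A hedged sketch, assuming the config keys above hold paths to the image files:

rule example_fsl_step:
    input:
        "results/example_in.nii.gz"
    output:
        "results/example_out.nii.gz"
    singularity: config["singularity_neuroglia"]
    shell:
        "fslmaths {input} -bin {output}"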

Usage:

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

  1. Create a new github repository using this workflow as a template.

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yml to configure the workflow execution, and participants.tsv to specify your subjects.
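
For example, assuming a single participant_id column (check the example file shipped in config/ for the exact header expected), participants.tsv could look like:

participant_id
100610
102311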

Step 3: Install and Activate Snakemake

Install Snakemake using conda: conda create -c bioconda -c conda-forge -n snakemake snakemake

Activate the conda environment: conda activate snakemake (or add this line to your bash startup script).

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

  • Option A) Execute the workflow locally via snakemake --use-conda --use-singularity --cores $N (using $N cores), or run it in a cluster environment via snakemake --use-conda --use-singularity --cluster qsub --jobs 100 or snakemake --use-conda --use-singularity --drmaa --jobs 100

  • Option B) If you are using Compute Canada, you can use the cc-slurm profile, which submits jobs and takes care of requesting the correct resources per job (including GPUs). Once it is set-up with cookiecutter, run: snakemake --profile cc-slurm

  • Option C) Or, with neuroglia-helpers, you can get a 1-GPU, 8-core, 32GB node and run locally on it. First, get a node with a GPU (default: 8 cores, 32GB, 3-hour limit) via regularInteractive -g . Then, run: snakemake --use-conda --use-singularity --cores 8 --resources gpu=1 mem=32000
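
Note that --resources gpu=1 mem=32000 in Option C only constrains scheduling for rules that declare matching resources. A minimal sketch of how a rule can declare them (an illustrative rule; the repository's actual rules set their own values):

rule example_gpu_rule:
    output:
        "results/example_gpu_done.txt"
    resources:
        gpu = 1,
        mem = 32000
    shell:
        "touch {output}"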

Step 4b: Executing workflow on Compute Canada via the cc-slurm profile in iterations

If you have a large number of subjects, we recommend you run the workflow in the following stages:

  1. bedpostx_gpu: via snakemake --profile cc-slurm --until perform_bedpostx_gpu

  2. generate target segs and probtrackx2_gpu: via snakemake --profile cc-slurm --until run_probtrack

  3. piriform diffusion parcellation with spectral_clustering in template space on concatenated subjects and creation of Gephi input nodes/edges tables: via snakemake --profile cc-slurm --until create_gephi_input_nodes_and_edge_tables

  4. probtrackx2_gpu run again on each cluster to generate visualizations of each cluster's connectivity: via snakemake --profile cc-slurm --until vote_tractmap_template

After running each iteration, run snakemake --profile cc-slurm --until <iteration name previously run> -np to ensure that all jobs completed and all outputs were successfully created before moving on to the next iteration.

Code Snippets

shell: 'fslmaths {input.lh} -max {input.rh} {output.lh_rh} &> {log}'
shell: 'fslmaths {input.lh} -max {input.rh} {output.lh_rh} &> {log}'
shell:
    'cp -v {input} {output} &> {log}'
shell:
    'fslmaths {input} -thr {params.threshold} -bin {output} &> {log}'
shell:
    'fslmaths {input.dwi} -bin {output.mask} &> {log}'
shell:
    'fslmaths {input} -thr {params.threshold} -bin {output} &> {log}'
shell:
    'fslmaths {input.seed_res_bin} -mul 1000 -max {input.targets_res} -uthr 999 {output.targets_pir_overlap_removed} &> {log}'
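
The line above removes any target voxels that overlap the piriform seed: multiplying the binarized seed by 1000 makes seed voxels larger than any target label, -max overwrites the target labels at those voxels with 1000, and -uthr 999 then zeroes them. A rough numpy equivalent, for illustration only:

import numpy as np

seed_bin = np.array([0, 1, 1, 0])               # binarized, resampled piriform seed
targets = np.array([5, 7, 0, 12])               # target label volume (labels 1..179 or 1..358)
merged = np.maximum(seed_bin * 1000, targets)   # seed voxels now carry the value 1000
cleaned = np.where(merged <= 999, merged, 0)    # -uthr 999: zero out seed-overlapping voxels
print(cleaned)                                  # [ 5  0  0 12]
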
shell:
    'mkdir -p {output} && parallel  --jobs {threads} fslmaths {input.targets_res} -thr {{1}} -uthr {{1}} -bin {{2}} &> {log} ::: {params.target_nums} :::+ {params.target_seg}'
run:
    f = open(output.target_txt,'w')
    for s in params.target_seg:
        f.write(f'{s}\n')
    f.close()
shell:
    'mkdir -p {output.probtrack_dir} && singularity exec -e --nv {params.container} probtrackx2 --samples={params.bedpost_merged} --mask={input.mask} --seed={input.seed_res} --targetmasks={input.target_txt} --seedref={input.seed_res} --nsamples={params.nsamples} --dir={output.probtrack_dir} {params.probtrack_opts} -V 1  &> {log}'
shell:
    'mkdir -p {output.probtrack_dir} && singularity exec -e --nv {params.container} probtrackx2_gpu --samples={params.bedpost_merged} --mask={input.mask} --seed={input.seed_res} --targetmasks={input.target_txt} --seedref={input.seed_res} --nsamples={params.nsamples} --dir={output.probtrack_dir} {params.probtrack_opts} -V 2  &> {log}' 
shell:
    'mkdir -p {output} && ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 parallel  --jobs {threads} antsApplyTransforms -d 3 --interpolation Linear -i {{1}} -o {{2}}  -r {input.ref} -t {input.warp} -t {input.affine} &> {log} :::  {params.in_connmap_3d} :::+ {params.out_connmap_3d}' 
script: '../scripts/save_connmap_template_npz.py'
script: '../scripts/gather_connmap_group.py'
script: '../scripts/spectral_clustering.py'
script: '../scripts/save_connmapClusters_template_npz.py'
script: '../scripts/gather_connmapClusters_group.py'
script: '../scripts/gather_connmapClusters_group.py' 
script: '../scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py'
script: '../scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py'
shell:
	'cp {input.dwi7T_nii} {output.dwi7T_nii_copy} && '
	'cp {input.dwi7T_bval} {output.dwi7T_bval_copy} && '
	'cp {input.dwi7T_bvec} {output.dwi7T_bvec_copy} && '
	'cp {input.dwi7T_grad_dev} {output.dwi7T_grad_dev_copy} && '
	'cp {input.dwi7T_mask_nii} {output.dwi7T_mask_nii_copy}'
shell:
	'mkdir -p {output.outdir} && singularity exec -e --nv {params.container} bedpostx_gpu {input.indir} -g &> {log}'
shell: 'FS_LICENSE={params.license} mris_convert {input} {output} &> {log}'
shell: 'FS_LICENSE={params.license} mri_convert {input} {output} &> {log}'
shell: 'FS_LICENSE={params.license} mri_info {input.t1} --tkr2scanner > {output.tkr2scanner} 2> {log}'
shell: 'wb_command -surface-apply-affine {input.surf} {input.tkr2scanner} {output.surf} &> {log}'
shell: 'wb_command -surface-average {output.midthickness} -surf {input.white} -surf {input.pial} &> {log}'
shell: 'wb_command -surface-resample {input.surf} {input.current_sphere} {input.new_sphere} {params.method} {output.surf} &> {log}'
shell: 
    'wb_command -label-resample {input.label} {input.current_sphere} {input.new_sphere}'
    ' {params.method} {output.label}'
    ' -area-surfs {input.current_surf} {input.new_surf} &> {log}'
shell:
    'wb_command -label-to-volume-mapping {input.label} {input.surf} {input.vol_ref} {output.label_vol}'
    ' -ribbon-constrained {input.white_surf} {input.pial_surf}'
    ' -greedy &> {log}'
shell:
    'c3d {input.label_vol} -replace 110 0 -o {output.label_vol_removepir110} &> {log}'
shell:
    'fslmaths {input.label_vol_removepir110} -thr 111 -sub 1 {output.tempA_thrBelow111_sub1} && '
    'fslmaths {input.label_vol_removepir110} -uthr 109 {output.tempB_thrAbove109} && '
    'fslmaths {output.tempA_thrBelow111_sub1} -max {output.tempB_thrAbove109} {output.relabelled_vol_removepir110} &> {log}'
shell:
    'fslmaths {input.relabelled_rh_vol_removepir110} -add 179 -thr 180 {output.relabelled_rh_vol_removepir110_add179} &> {log}'
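
Worked example of the relabelling above: the piriform label (110) is first removed from the 180-region volume; labels 111-180 are then shifted down by one and merged with labels 1-109, leaving a gap-free 1-179 labelling per hemisphere (so, for instance, original label 112 becomes 111). For the independent lh/rh case, adding 179 to the right-hemisphere volume maps its labels to 180-358, and the -thr 180 zeroes the background voxels, which the addition had turned into 179.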

scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nibabel as nib
from functools import reduce
import os


#Set the name of the seed you want to create the Gephi input for
seeds = snakemake.params.seeds_py

#Set the k parcellation you want to create the Gephi input for
#this_k = 3
cluster_range = range(2,snakemake.params.max_k+1)

#Set the path to the location of the directory containing the connmapGroup directories for each seed and each k 
#(i.e. connmapClusters dir from diffparc located in /scratch/nchris5/HCP_template_piriform_segmentations/diffparc/results)
path = snakemake.params.connmapClusters_dir


#Read in the Glasser_2016_Table with the 'SectionsBOLD' column manually included, which indicates which of the 22 overarching regions each of the 180 regions belongs to; 
#and the 'ColourRGB' column manually included, which indicates the RGB for that specific SectionsBOLD
GlasserTable = pd.read_csv(snakemake.params.GlasserTable_22regions_colours, header=0, usecols=range(1,8))
GlasserTable.index = range(1,snakemake.params.num_targets+1)
print('Original GlasserTable : \n', GlasserTable)
#Sort by the section values from 1-->22
GlasserTable_sorted_PrimarySection = GlasserTable.sort_values(by='SectionsBOLD', ascending=True)
#Change the actual index to 1-->179
GlasserTable_sorted_PrimarySection.index = range(1,snakemake.params.num_targets+1)
#Insert a column that describes the sorted dataframe new index from 1-->180
GlasserTable_sorted_PrimarySection.insert(loc=1, column='Sorted_PrimarySection_Index', value=range(1,snakemake.params.num_targets+1))
#Create the list of target ID's with leading zeros so all ID's have 3 digits (i.e. 001, 010, 100), as for gephi to read in proper order
GlasserTable_sorted_PrimarySection_Index_ThreeDigits = [str(indID).zfill(3) for indID in GlasserTable_sorted_PrimarySection.Sorted_PrimarySection_Index]
print('GlasserTable_sorted_PrimarySection : \n', GlasserTable_sorted_PrimarySection)

print('\n ------ Creating Nodes df ------ \n ')
#Includes the Id, Label, color, and SectionsBOLD for each of the 179 or 358 Glasser targets (001-179 or 001-358)
Targets = pd.DataFrame({'Id': GlasserTable_sorted_PrimarySection_Index_ThreeDigits, 'Label': GlasserTable_sorted_PrimarySection.AreaName, 'color': GlasserTable_sorted_PrimarySection.ColourRGB, 'Modularity Class': GlasserTable_sorted_PrimarySection.SectionsBOLD}, index=range(1,snakemake.params.num_targets+1))
print('Targetsdf : \n', Targets)

for seed in seeds:
    for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
        #Set the RBG colours for each of the clusters within this_k to the same colour that is present in the clustering output in itksnap (i.e. diffparc spectral_clustering output)
        if this_k==2:
            this_k_ColourRGB = ['255,0,0','0,255,0'] #Cluster01=Red, #Cluster02=Green
        elif this_k==3:
            this_k_ColourRGB = ['255,0,0','0,255,0','0,0,255'] #Cluster01=Red, #Cluster02=Green, #Cluster03=Blue
        elif this_k==4:
            this_k_ColourRGB = ['255,0,0','0,255,0','0,0,255', '255,255,0'] #Cluster01=Red, #Cluster02=Green, #Cluster03=Blue #Yellow
        #Create the Ids and labels for the individual clusters (e.g. Cluster01, Cluster02, Cluster03), assigning Id values above the number of targets (179 or 358), e.g. 180=Cluster01, 181=Cluster02, 182=Cluster03
        IdlistClusters = list()
        labellist = list()
        for i,this_label in enumerate(range(1,this_k+1)):
            IdlistClusters.append(snakemake.params.num_targets+1+i)
            labellist.append('Cluster{this_label:02d}'.format(this_label=this_label))
        #Set the Modularity Class for each individual cluster to be distinct (i.e. Cluster01=23, Cluster02=24, Cluster03=25)
        ModularityClassClusters = list(range(23,this_k+23))
        #Create the dataframe for the clusters to add to the list of targets
        individualclusterstoadd = pd.DataFrame({'Id': IdlistClusters, 'Label': labellist, 'color': this_k_ColourRGB, 'Modularity Class': ModularityClassClusters}, index=range(snakemake.params.num_targets,snakemake.params.num_targets+this_k))
        print('\n individualclusterstoadd : \n', individualclusterstoadd)

        #Append the cluster labels dataframe to the bottom of the targets list dataframe to yield the final Nodesdf
        Nodesdf = Targets.append(individualclusterstoadd)
        print('\n Nodesdf : \n ', Nodesdf)




        print('\n ------ Creating Edges df ------ \n ')
        print('------ Creating Edges df SourceId and SourceName columns ------ \n ')

        #Multiply the targets df by number of clusters (i.e. first 180 will represent cluster 1, next 180 will represent cluster 2, etc.)
        EdgesSourceId = np.zeros((snakemake.params.num_targets,this_k), dtype=int)
        EdgesSourceNames = list()
        for i,this_label in enumerate(range(1,this_k+1)):
            EdgesSourceId[:,i] = this_label+snakemake.params.num_targets
            tempSourceNames = list()
            for j in range(snakemake.params.num_targets):
                tempSourceNames.append('Cluster{this_label:02d}'.format(this_label=this_label))
            EdgesSourceNames.append(tempSourceNames)
        #Use functools reduce to remove iterative lists into a single list i.e. [179xCluster01], [179xCluster02], [179xCluster03] to [179xCluster01, 179xCluster02, 179xCluster03]
        EdgesSourceNames = reduce(lambda x,y: x+y, EdgesSourceNames)
        #Reshape the edges source col current with shape (179,numClusters) to 179*numClusters with order 'F' (i.e. 179x'181' --> 179x'182' --> 179x'183')
        EdgesSourceId = np.reshape(EdgesSourceId, (snakemake.params.num_targets*this_label), order='F')

        TargetsMultiplied = pd.concat([Targets]*this_k, ignore_index=True)
        print('TargetsMultiplied Shape = ', TargetsMultiplied.shape)
        print('TargetsMultiplied = \n', TargetsMultiplied)

        print('\n ------ Creating Edges df Columns:   Source, SLabel, Target, TLabel, ConnectivityScore, ConnectivityScoreNormalized, Rank_in_Cluster, Weight ------ \n ')

        #Create list of paths to individual clusternpz files for this_k
        connmapClustersDirList = list()
        thiskString = str(this_k)
        #Only select the directories that have the specified seed and also end with k=this_k
        for connmapClustersDir in os.listdir(path):
            if ((connmapClustersDir.find(seed) != -1) and (connmapClustersDir.endswith(thiskString))):
                connmapClustersDirList.append(connmapClustersDir)
                newpath = path + '/' + connmapClustersDirList[0]
                individualClusters_thisk_npzList = list()
                for individualClusters_thisk_npz in os.listdir(newpath):
                    if individualClusters_thisk_npz.endswith('.npz'):
                        fullpath = newpath + '/' + individualClusters_thisk_npz
                        individualClusters_thisk_npzList.append(fullpath)
                print(individualClusters_thisk_npzList)

        #Initiate an empty dataframe that will be appended to include the values from each cluster
        df_ALLCLUSTERS_InsertRank = pd.DataFrame(columns = ['targetnames', 'ConnectivityScore', 'ConnectivityScoreNormalized', 'SortedRank'])

        #Create a list of ranks from 1(lowest connectivity) to 180(highest connectivity)
        for i,thisk_thisCluster_npz in enumerate(individualClusters_thisk_npzList):
            data_thisCluster = np.load(thisk_thisCluster_npz)
            conngroup_thisCluster = data_thisCluster['conn_group']

            connAvg_across_subs_thisCluster = conngroup_thisCluster.mean(axis=0) #Shape of input is (#sub,#voxels,#targets)
            targetAvg_across_voxels_thisCluster = connAvg_across_subs_thisCluster.mean(axis=0) #Shape of input is (#voxels,#targets)
            sum_targetAvg_across_voxels_thisCluster = np.sum(targetAvg_across_voxels_thisCluster) #Sum all the targets to get sum of conn scores
            quotients = [number / sum_targetAvg_across_voxels_thisCluster for number in targetAvg_across_voxels_thisCluster]

            df_thisCluster = pd.DataFrame({'targetnames': GlasserTable.AreaName, 'ConnectivityScore': targetAvg_across_voxels_thisCluster, 'ConnectivityScoreNormalized': quotients}, index=range(1,snakemake.params.num_targets+1))
            #Insert a column the SectionsBold column from the GlasserTable
            df_thisCluster.insert(loc=1, column='SectionsBOLD', value=GlasserTable.SectionsBOLD)
            print('\n df_thisCluster_InsertedSectionsBold k-%s and cluster-%s : \n %s' % (this_k, i+1, df_thisCluster))
            df_thisCluster_sorted_PrimarySection =  df_thisCluster.sort_values(by='SectionsBOLD', ascending=True)
            df_thisCluster_sorted_PrimarySection.index = range(1,snakemake.params.num_targets+1)
            #Insert a column that describes the sorted dataframe new index from 1-->179
            df_thisCluster_sorted_PrimarySection.insert(loc=1, column='Sorted_PrimarySection_Index', value=range(1,snakemake.params.num_targets+1))
            print('\n df_thisCluster_sorted_PrimarySection k-%s and cluster-%s : \n %s' % (this_k, i+1, df_thisCluster_sorted_PrimarySection))

            #Assign weight based on ConnectivityScore
            df_sorted_thisCluster = df_thisCluster_sorted_PrimarySection.sort_values(by='ConnectivityScoreNormalized', ascending=False)
            print('\n df_sorted_thisCluster k-%s and cluster-%s : \n %s' % (this_k, i+1, df_sorted_thisCluster))

            df_sorted_thisCluster_InsertRank = df_sorted_thisCluster
            df_sorted_thisCluster_InsertRank['SortedRank'] = pd.Series(reversed(range(1,snakemake.params.num_targets+1)),index=(df_sorted_thisCluster.index))

            df_thisCluster_sorted_PrimarySection_InsertRank = df_thisCluster_sorted_PrimarySection
            df_thisCluster_sorted_PrimarySection_InsertRank['SortedRank'] = pd.Series(reversed(range(1,snakemake.params.num_targets+1)),index=(df_sorted_thisCluster.index))
            print('\n df_sorted_thisCluster k-%s and cluster-%s with Rank : \n %s' % (this_k, i+1, df_sorted_thisCluster_InsertRank))
            print('\n df_thisCluster k-%s and cluster -%s with Rank : \n %s' % (this_k, i+1, df_thisCluster_sorted_PrimarySection_InsertRank))

            df_ALLCLUSTERS_InsertRank = df_ALLCLUSTERS_InsertRank.append(df_thisCluster_sorted_PrimarySection_InsertRank, ignore_index=True)
            print('CURRENT df_ALLCLUSTERS_InsertRank Shape = ', df_ALLCLUSTERS_InsertRank.shape)
        print('FINAL df_ALLCLUSTERS_InsertRank Shape = ', df_ALLCLUSTERS_InsertRank.shape)
        print('FINAL df_ALLCLUSTERS_InsertRank : \n', df_ALLCLUSTERS_InsertRank)

        #Add in the edge weight column
        #Multiply the SortedRank*ConnectivityScoreNormalized to get the Weight of each edge, round to integer, then add 1 to ensure that lowest edge weight value is > 0
        df_ALLCLUSTERS_InsertRank['Weight'] = df_ALLCLUSTERS_InsertRank['SortedRank'] * df_ALLCLUSTERS_InsertRank['ConnectivityScoreNormalized']
        df_ALLCLUSTERS_InsertRank['Weight'] = df_ALLCLUSTERS_InsertRank['Weight'].apply(lambda x: round(x, 0) + 1)

        Edgesdf = pd.DataFrame({'Source': EdgesSourceId, 'SLabel': EdgesSourceNames, 'Target': TargetsMultiplied.Id, 'TLabel': TargetsMultiplied.Label, 'ConnectivityScore': df_ALLCLUSTERS_InsertRank.ConnectivityScore, 'ConnectivityScoreNormalized': df_ALLCLUSTERS_InsertRank.ConnectivityScoreNormalized, 'Rank_in_Cluster': df_ALLCLUSTERS_InsertRank.SortedRank, 'Weight': df_ALLCLUSTERS_InsertRank.Weight}, index=range(0,this_k*snakemake.params.num_targets))
        print('\n Shape of this Edges is %s and Edgesdf = \n %s  ' % (Edgesdf.shape, Edgesdf))


        print('\n ------ Saving Nodesdf and Edgesdf to csv ------ \n ')

        if not os.path.exists('results/gephi_input_NodesandEdges_tables_' + seed):
            os.mkdir('results/gephi_input_NodesandEdges_tables_' + seed)

        #Create savepaths
        savepathNodesdf = 'results/gephi_input_NodesandEdges_tables_' + seed + '/GephiNodes_' + seed + '_k-' + str(this_k) + '_Sorted_Overarching22Categories_WithRGBColours.csv'
        savepathEdgesdf = 'results/gephi_input_NodesandEdges_tables_' + seed + '/GephiEdges_' + seed + '_k-' + str(this_k) + '_Sorted_Overarching22Categories_NormalizedWeight.csv'

        print('savepathNodesdf = ', savepathNodesdf)
        print('savepathEdgesdf = ', savepathEdgesdf)

        #Write to Nodesdf and Edgesdf to CSV files
        Nodesdf.to_csv(savepathNodesdf, index = False, header=True)
        Edgesdf.to_csv(savepathEdgesdf, index = False, header=True)

scripts/gather_connmapClusters_group.py:

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from scipy import stats

targetnames = pd.read_table('results/diffparc/sub-100610/target_images_from-hcp7Tsubj_rpiriform_overlap_removed.txt', names=['TargetNames'], header=None)
targetnamesList = targetnames.TargetNames.to_list()
targetnamesList = [ line.replace('results/diffparc/sub-100610/lh_rh_targets_from-hcp7Tsubj_rpiriform_space-native_resampled_dwi_resolution_pir-overlap-removed/','') for line in targetnamesList ] #Remove that string from each line
targetnamesList = [ line.replace('_ROI.nii.gz','') for line in targetnamesList ] #Remove this string from each
cluster_range = range(2,snakemake.params.max_k+1)
for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
    label = 1
    thiskdirslist = list()
    thiskString = str(this_k)
    for j,thiskdir in enumerate(snakemake.input.connmap_Clusters_npz_dir):
        if thiskdir.endswith(thiskString):
            thiskdirslist.append(thiskdir)
    newClusterGroupDir = '{snakeout}_k-{thisk}'.format(snakeout=snakemake.params.connmap_Clusters_group_npz_dir_base,thisk=this_k)
    if not os.path.exists(newClusterGroupDir):
        os.mkdir(newClusterGroupDir)
    fig = plt.figure(figsize=[6.4*this_k,4.8])
    while label <= this_k:
        thislabel_namesList = list()
        labelString = str(label) + '_connmap.npz'
        print('labelString = ',labelString)
        for l,eachthiskdir in enumerate(thiskdirslist):
            thislabel_names = [f for f in os.listdir(path=eachthiskdir) if f.endswith(labelString)]
            thislabel_namesString = ''.join(thislabel_names)
            thislabel_fullpath = eachthiskdir + '/' + thislabel_namesString
            thislabel_namesList.append(thislabel_fullpath)
        thisk_thislabelonesubj = thislabel_namesList[0]
        print('thisk_thislabelonesubj = ',thisk_thislabelonesubj)
        data_thisk_thislabelonesubj = np.load(thisk_thislabelonesubj)
        affine = data_thisk_thislabelonesubj['affine']
        mask = data_thisk_thislabelonesubj['mask']
        conn_shape = data_thisk_thislabelonesubj['conn'].shape 
        nsubjects = len(thislabel_namesList)
        print('nsubjects = ',nsubjects)
        conn_group = np.zeros([nsubjects,conn_shape[0],conn_shape[1]])
        savestring = '{newClusterGroupD}/cluster{k_i:02d}_connmap_group.npz'.format(newClusterGroupD=newClusterGroupDir,k_i=label)
        for sub,npz in enumerate(thislabel_namesList):
            data_thisk_thislabel_eachsub = np.load(npz)
            conn_group[sub,:,:] = data_thisk_thislabel_eachsub['conn']
        np.savez(savestring, conn_group=conn_group,mask=mask,affine=affine)
        connmatrix_for_each_target_avggroup = conn_group.mean(axis=0) #Average across rows (subjects) to get connmap matrix average across all subjects
        print('connmatrix_for_each_target_avggroup shape = ',connmatrix_for_each_target_avggroup.shape)
        avg_conn_for_each_target_avggroup = connmatrix_for_each_target_avggroup.mean(axis=0) #Average across rows (voxels in piriform, indexed) to get connmap matrix average across all subjects
        sum_avg_conn_for_each_target_avggroup = np.sum(avg_conn_for_each_target_avggroup)
        quotients = [number / sum_avg_conn_for_each_target_avggroup for number in avg_conn_for_each_target_avggroup]
        #Create dataframe that includes columns target names and the values of the targets
        df = pd.DataFrame({'targetnames': targetnamesList, 'ConnectivityScore': avg_conn_for_each_target_avggroup, 'ConnectivityScoreNormalized': quotients}, index=range(0,snakemake.params.num_targets))
        print("Targets prior to sorting based on avg conn for k-%s and label %s = %s" % (this_k, label, df))
        dfsorted = df.sort_values(by='ConnectivityScoreNormalized', ascending=False) #Sort the target regions descending
        print("Targets sorted based on avg conn for k-%s and label %s = %s" % (this_k, label, dfsorted))
        #Plot the 10 targets with the highest connectivity to the piriform
        x = np.linspace(1,10,10) #start,stop,number of ticks
        y = np.linspace(0,1,11) #Makes it increment by 0.1
        ax = fig.add_subplot(1,this_k,label)
        ax.set_title("k-%s Cluster-0%s" % (this_k, label))
        ax.set_xticks(x)
        ax.set_yticks(y)
        ax.set_ylim([0,0.5])
        ax.set_xticklabels(dfsorted.targetnames[0:10], rotation='vertical')
        ax.set_xlabel('Target Name')
        ax.set_ylabel('Average Connectivity to Target')
        ax.bar(x,dfsorted.ConnectivityScoreNormalized[0:10]) #Gives the target regions with the top 10 highest average connectivites from the group averaged piriform seed
        label +=1
    figuresavestring = '{newClusterGroupDir}/k-{thisk}_Individual_Clusters_Top10connectivity_to_targets.png'.format(newClusterGroupDir=newClusterGroupDir,thisk=this_k)
    print('figuresavestring is : ',figuresavestring)
    plt.savefig(figuresavestring, bbox_inches = "tight")

scripts/gather_connmap_group.py:

import numpy as np

#load first file to get shape
data = np.load(snakemake.input.connmap_npz[0])
affine = data['affine']
mask = data['mask']
conn_shape = data['conn'].shape
nsubjects = len(snakemake.input.connmap_npz)
conn_group = np.zeros([nsubjects,conn_shape[0],conn_shape[1]])

for i,npz in enumerate(snakemake.input.connmap_npz):
    data = np.load(npz)
    conn_group[i,:,:] = data['conn']

#save conn_group, mask and affine
np.savez(snakemake.output.connmap_group_npz, conn_group=conn_group,mask=mask,affine=affine)

scripts/save_connmapClusters_template_npz.py:

import nibabel as nib
import numpy as np
import os

#kclusters_nii_list_names = os.listdir(path=snakemake.input.clusters_template_space_dir)
#kclusters_nii_list = list()
#for filename in kclusters_nii_list_names:
#    fullpath = os.path.join(snakemake.input.clusters_template_space_dir, filename)
#    kclusters_nii_list.append(fullpath)

kclusters_nii_list = list()
for a,thiskclustervol in enumerate(snakemake.input.clusters_template_space):
    kclusters_nii_list.append(thiskclustervol)
print('kclusters_nii_list = ',kclusters_nii_list)

cluster_range = range(2,snakemake.params.max_k+1) #Range of the clusters [2, 3, 4]
ntargets = len(snakemake.params.connmap_3d) #Number of targets = 179

for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
    label = 1 #Lowest label value is 1
    newClusterDir = '{snakeout}_k-{thisk}'.format(snakeout=snakemake.params.connmap_Clusters_npz,thisk=this_k)
    if not os.path.exists(newClusterDir):
        os.mkdir(newClusterDir)
    while label <= this_k: #For each individual label in that cluster file, stop after label hits that current k value
        print("Label val for k-%s prior to creating indices = %s" % (this_k, label))
        clustmask_nib = nib.load(kclusters_nii_list[i]) #Load in cluster 2, 3, 4
        print("This k-%scluster = %s" % (this_k, kclusters_nii_list[i]))
        clustmask_vol = clustmask_nib.get_fdata()
        clustmask_indices = clustmask_vol==label #Indices within the clustmask_vol for label=label (e.g. indices in the k-2 volume where label = 1)
        maskedCount = clustmask_vol[clustmask_indices] #Number of voxels in a list that contain this label
        nvoxels = len(maskedCount) #Count of the number of voxels
        print("nvoxels for k-%s and label %s = %s" % (this_k, label, nvoxels))
        maskedVoxelsIndividualCluster = clustmask_vol
        maskedVoxelsIndividualCluster[maskedVoxelsIndividualCluster!=label] = 0 #Any of the voxels that do not contain the label value become 0 (e.g. if this_k = 2, and label = 1, then label = 2 becomes 0)
        conn = np.zeros((nvoxels,ntargets)) #Create conn matrix of size nvoxels in this cluster,ntargets
        savestring = '{newClustDir}/cluster{k_i:02d}_connmap.npz'.format(newClustDir=newClusterDir,k_i=label)
        print("savestring for k-%s and label %s = %s" % (this_k, label, savestring))
        for j,conn_file in enumerate(snakemake.params.connmap_3d):
            targvol = nib.load(conn_file).get_fdata()
            maskedtargvol = targvol[clustmask_indices].T
            conn[:,j] = maskedtargvol
        print("Conn shape for k-%s and label %s =  %s \n" % (this_k, label, conn.shape))
        np.savez(savestring, conn=conn,mask=maskedVoxelsIndividualCluster,affine=clustmask_nib.affine)
        label +=1 #Label becomes 2 and loop while loop continues, then label becomes 3 and the loop exits

scripts/save_connmap_template_npz.py:

import nibabel as nib
import numpy as np
mask_nib = nib.load(snakemake.input.mask)
mask_vol = mask_nib.get_fdata()
mask_indices = mask_vol > 0 
masked = mask_vol[mask_indices]

nvoxels = len(masked)
ntargets = len(snakemake.params.connmap_3d)
conn = np.zeros((nvoxels,ntargets))
for i,conn_file in enumerate(snakemake.params.connmap_3d):
    vol = nib.load(conn_file).get_fdata()
    masked =  vol[mask_indices].T
    conn[:,i] = masked
np.savez(snakemake.output.connmap_npz, conn=conn,mask=mask_vol,affine=mask_nib.affine)

scripts/spectral_clustering.py:

from sklearn.cluster import SpectralClustering
import numpy as np
import nibabel as nib

# Define a function for saving niftis 
def save_label_nii (labels,mask,affine,out_nifti):
    labels_vol = np.zeros(mask.shape)
    labels_vol[mask > 0] = labels+1 #add a 1 so label 0 is diff from bgnd
    labels_nib = nib.Nifti1Image(labels_vol,affine)
    nib.save(labels_nib,out_nifti)

data = np.load(snakemake.input.connmap_group_npz)
cluster_range = range(2,snakemake.params.max_k+1)
out_nii_list = snakemake.output

conn_group = data['conn_group']
mask = data['mask']
affine = data['affine']

# Concat subjects
conn_group_m = np.moveaxis(conn_group,0,2)
conn_concat = conn_group_m.reshape([conn_group_m.shape[0],conn_group_m.shape[1]*conn_group_m.shape[2]])

# Run spectral clustering and save output nifti
for i,k in enumerate(cluster_range):
    clustering = SpectralClustering(n_clusters=k, assign_labels="discretize",random_state=0,affinity='cosine').fit(conn_concat)
    print(f'i={i}, k={k},saving {out_nii_list[i]}')
    save_label_nii(clustering.labels_,mask,affine,out_nii_list[i])
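
The concatenation step above gives each seed voxel a single feature vector containing its connectivity to every target in every subject, so the clustering is run once on group data in template space rather than per subject. A toy shape check (illustrative sizes only):

import numpy as np

nsubjects, nvoxels, ntargets = 3, 5, 4                      # toy sizes; real data is much larger
conn_group = np.random.rand(nsubjects, nvoxels, ntargets)   # same layout as conn_group above
conn_concat = np.moveaxis(conn_group, 0, 2).reshape(nvoxels, ntargets * nsubjects)
print(conn_concat.shape)                                    # (5, 12): one row per seed voxel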

URL: https://github.com/nchris5/diffparc-smk_piriform
Name: diffparc-smk_piriform