Diffusion-Based Parcellation of Piriform Cortex with Snakemake Workflow


Snakemake workflow overview for diffusion-based parcellation of the piriform cortex:

  1. rule hcp7T_Diffusion_bedpostx_gpu : Generate white matter fibre orientations within each brain voxel for each subject's HCP 7T diffusion data via FSL's bedpostx_gpu .

  2. rule hcp_mmk : Generate target segmentation 3D NIfTI volumes (bilateral) from HCP 32k surface GIFTI files (180 regions)

  3. rule diffparc_smk : Resample the piriform seed and hcp-mmk180 targets to the HCP 7T diffusion resolution, then perform probabilistic tractography from the piriform seed in each subject's space to the hcp-mmk180 targets via FSL's probtrackx2_gpu

    3b. Bring the connectivity data for each subject's seed voxels into template space and perform spectral clustering on the concatenated feature vectors to parcellate the seed into k regions

    3c. Create node and edge tables to use as input to the Gephi software for each clustering solution

  4. rule tractmap : Perform probabilistic tractography from each individual cluster following spectral clustering to generate per-cluster tractography maps (useful for visualizing each cluster's individual connectivity)
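
Taken together, the stages above form a single Snakemake DAG. As a rough sketch of how a top-level target rule might tie them together (hypothetical output paths and wildcard values for illustration only; the actual Snakefile defines its own rule names and targets):

# Hypothetical sketch only -- illustrative paths, not the repository's actual targets.
subjects = ["100610", "102311"]     # read from config/participants.tsv
seeds = ["rpiriform"]               # one entry per piriform hemisphere (template_prob_seg)
max_k = 4                           # from config: max_k

rule all:
    input:
        expand("results/bedpostx/sub-{subject}.bedpostX", subject=subjects),
        expand("results/diffparc/clusters/seed-{seed}_k-{k}_clusters.nii.gz",
               seed=seeds, k=range(2, max_k + 1)),
        expand("results/tractmap/seed-{seed}_k-{k}_tractmaps", seed=seeds, k=range(2, max_k + 1))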

Inputs in config/config.yml:

  1. participants.tsv : Target subject IDs that have HCP 7T diffusion data and that were used to create the subject-specific template from ants_build_template_smk (see input 5)

  2. template_prob_seg : Probabilistic segmentation, as a 3D NIfTI, of the individual piriform hemisphere to process

  3. prob_seg_threshold : Value at which to threshold the probabilistic segmentation (default is 0.5; a lower value, e.g. 0.2, keeps all voxels with probability >= 0.2 and thus retains more of the probseg)

  4. template : Name of template to use for filenaming within the pipeline

  5. ANTS transformations from template T1w space to/from each subject's T1w, i.e. from: ants_build_template_smk

    • ants_affine_mat : Forward affine from subject-->template

    • ants_invwarp_nii : Warp from template-->subject

    • ants_warp_nii : Warp from subject-->template

    • ants_ref_nii : Study-specific T1w created from: [ants_build_template_smk]

  6. lhrh_targets_independent_condition : When set to "true" (358 regions), analyze connectivity of the piriform segmentation within each hemisphere independently (each lh and rh region is treated as a separate target); when set to "false" (179 regions), analyze connectivity of the piriform segmentation irrespective of hemisphere (each lh and rh region pair is combined into a single target across hemispheres)

  7. targets_txt_lh-rh_separate_358regions : List of ROIs in the atlas segmentations to use as targets for piriform connectivity (358 regions: lh and rh separate)

  8. targets_txt_lh-rh_combined_179regions : List of ROIs in the atlas segmentations to use as targets for piriform connectivity (179 regions: lh and rh combined)

  9. max_k : Maximum number of clusters to create during spectral clustering (will create all cluster solutions <= max_k)

  10. probtrack : Tractography parameters; seed_resolution defines the resolution to which all other inputs are resampled

  11. probtrack_tractmap : Tractography parameters to generate individual tractmaps for each cluster following spectral clustering (useful for visualization)

  12. in_freesurfer : Path to the subject's FreeSurfer directory from the HCP_1200 release

  13. dwi_preproc_dir : Path to base directory containing the HCP 7T diffusion data for all subjects

  • in_dwi_nii

  • in_dwi_bval

  • in_dwi_bvec

  • in_dwi_grad_dev

  • in_dwi_mask_nii
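
These keys are consumed directly by the workflow's rules. As a minimal sketch of how the seed-related entries might be wired in (rule and output names here are hypothetical, and the real config may structure these keys differently; the fslmaths thresholding command itself appears in the code snippets below):

configfile: "config/config.yml"

rule binarize_template_seed:
    input:
        prob_seg = config["template_prob_seg"]
    params:
        threshold = config["prob_seg_threshold"]
    output:
        seed_bin = "results/template_seed_thresholded_bin.nii.gz"
    shell:
        "fslmaths {input.prob_seg} -thr {params.threshold} -bin {output.seed_bin}"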

Required Singularity Containers:

  • singularity_neuroglia : FSL, ANTS, NiftyReg, gnu-parallel

  • singularity_freesurfer : Freesurfer

  • singularity_connectome_workbench : Connectome Workbench

  • singularity_cuda : FSL6 with CUDA container (for running gpu versions of bedpostx and probtrackx2)
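
These images are referenced from the rules either through Snakemake's container support (picked up with --use-singularity; newer Snakemake versions use the container: directive) or through an explicit singularity exec, as the bedpostx_gpu and probtrackx2_gpu snippets below do. A hedged sketch, assuming the config keys above hold paths to the image files:

rule example_fsl_step:
    input:
        "results/example_in.nii.gz"
    output:
        "results/example_out.nii.gz"
    singularity: config["singularity_neuroglia"]
    shell:
        "fslmaths {input} -bin {output}"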

Usage:

If you use this workflow in a paper, don't forget to give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

  1. Create a new github repository using this workflow as a template.

  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the files in the config/ folder. Adjust config.yml to configure the workflow execution, and participants.tsv to specify your subjects.
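
For example, assuming a single participant_id column (check the example file shipped in config/ for the exact header expected), participants.tsv could look like:

participant_id
100610
102311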

Step 3: Install and Activate Snakemake

Install Snakemake using conda: conda create -c bioconda -c conda-forge -n snakemake snakemake

Activate the conda environment: conda activate snakemake (or add this line to your bash startup script).

For installation details, see the instructions in the Snakemake documentation.

Step 4: Execute workflow

  • Option A) Execute the workflow locally via snakemake --use-conda --use-singularity --cores $N (using $N cores), or run it in a cluster environment via snakemake --use-conda --use-singularity --cluster qsub --jobs 100 or snakemake --use-conda --use-singularity --drmaa --jobs 100

  • Option B) If you are using Compute Canada, you can use the cc-slurm profile, which submits jobs and takes care of requesting the correct resources per job (including GPUs). Once it is set-up with cookiecutter, run: snakemake --profile cc-slurm

  • Option C) Or, with neuroglia-helpers, you can get a 1-GPU, 8-core, 32GB node and run locally on it. First, get a node with a GPU (default: 8 cores, 32GB, 3-hour limit) via regularInteractive -g . Then, run: snakemake --use-conda --use-singularity --cores 8 --resources gpu=1 mem=32000
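
Note that --resources gpu=1 mem=32000 in Option C only constrains scheduling for rules that declare matching resources. A minimal sketch of how a rule can declare them (an illustrative rule; the repository's actual rules set their own values):

rule example_gpu_rule:
    output:
        "results/example_gpu_done.txt"
    resources:
        gpu = 1,
        mem = 32000
    shell:
        "touch {output}"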

Step 4b: Executing workflow on Compute Canada via the cc-slurm profile in iterations

If you have a large number of subjects, we recommend you run the workflow in the following stages:

  1. bedpostx_gpu: via snakemake --profile cc-slurm --until perform_bedpostx_gpu

  2. generate target segs and probtrackx2_gpu: via snakemake --profile cc-slurm --until run_probtrack

  3. piriform diffusion parcellation with spectral_clustering in template space on concatenated subjects and creation of Gephi input nodes/edges tables: via snakemake --profile cc-slurm --until create_gephi_input_nodes_and_edge_tables

  4. probtrackx2_gpu run again on each cluster to generate visualizations of each cluster's connectivity: via snakemake --profile cc-slurm --until vote_tractmap_template

After running each iteration, run snakemake --profile cc-slurm --until <iteration name previously run> -np to ensure that all jobs completed and all outputs were successfully created before moving on to the next iteration.

Code Snippets

shell: 'fslmaths {input.lh} -max {input.rh} {output.lh_rh} &> {log}'
shell: 'fslmaths {input.lh} -max {input.rh} {output.lh_rh} &> {log}'
shell:
    'cp -v {input} {output} &> {log}'
shell:
    'fslmaths {input} -thr {params.threshold} -bin {output} &> {log}'
shell:
    'fslmaths {input.dwi} -bin {output.mask} &> {log}'
shell:
    'fslmaths {input} -thr {params.threshold} -bin {output} &> {log}'
shell:
    'fslmaths {input.seed_res_bin} -mul 1000 -max {input.targets_res} -uthr 999 {output.targets_pir_overlap_removed} &> {log}'
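
The line above removes any target voxels that overlap the piriform seed: multiplying the binarized seed by 1000 makes seed voxels larger than any target label, -max overwrites the target labels at those voxels with 1000, and -uthr 999 then zeroes them. A rough numpy equivalent, for illustration only:

import numpy as np

seed_bin = np.array([0, 1, 1, 0])               # binarized, resampled piriform seed
targets = np.array([5, 7, 0, 12])               # target label volume (labels 1..179 or 1..358)
merged = np.maximum(seed_bin * 1000, targets)   # seed voxels now carry the value 1000
cleaned = np.where(merged <= 999, merged, 0)    # -uthr 999: zero out seed-overlapping voxels
print(cleaned)                                  # [ 5  0  0 12]
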
shell:
    'mkdir -p {output} && parallel  --jobs {threads} fslmaths {input.targets_res} -thr {{1}} -uthr {{1}} -bin {{2}} &> {log} ::: {params.target_nums} :::+ {params.target_seg}'
run:
    f = open(output.target_txt,'w')
    for s in params.target_seg:
        f.write(f'{s}\n')
    f.close()
shell:
    'mkdir -p {output.probtrack_dir} && singularity exec -e --nv {params.container} probtrackx2 --samples={params.bedpost_merged} --mask={input.mask} --seed={input.seed_res} --targetmasks={input.target_txt} --seedref={input.seed_res} --nsamples={params.nsamples} --dir={output.probtrack_dir} {params.probtrack_opts} -V 1  &> {log}'
shell:
    'mkdir -p {output.probtrack_dir} && singularity exec -e --nv {params.container} probtrackx2_gpu --samples={params.bedpost_merged} --mask={input.mask} --seed={input.seed_res} --targetmasks={input.target_txt} --seedref={input.seed_res} --nsamples={params.nsamples} --dir={output.probtrack_dir} {params.probtrack_opts} -V 2  &> {log}' 
shell:
    'mkdir -p {output} && ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 parallel  --jobs {threads} antsApplyTransforms -d 3 --interpolation Linear -i {{1}} -o {{2}}  -r {input.ref} -t {input.warp} -t {input.affine} &> {log} :::  {params.in_connmap_3d} :::+ {params.out_connmap_3d}' 
script: '../scripts/save_connmap_template_npz.py'
script: '../scripts/gather_connmap_group.py'
script: '../scripts/spectral_clustering.py'
script: '../scripts/save_connmapClusters_template_npz.py'
script: '../scripts/gather_connmapClusters_group.py'
script: '../scripts/gather_connmapClusters_group.py' 
script: '../scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py'
script: '../scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py'
shell:
	'cp {input.dwi7T_nii} {output.dwi7T_nii_copy} && '
	'cp {input.dwi7T_bval} {output.dwi7T_bval_copy} && '
	'cp {input.dwi7T_bvec} {output.dwi7T_bvec_copy} && '
	'cp {input.dwi7T_grad_dev} {output.dwi7T_grad_dev_copy} && '
	'cp {input.dwi7T_mask_nii} {output.dwi7T_mask_nii_copy}'
shell:
	'mkdir -p {output.outdir} && singularity exec -e --nv {params.container} bedpostx_gpu {input.indir} -g &> {log}'
shell: 'FS_LICENSE={params.license} mris_convert {input} {output} &> {log}'
shell: 'FS_LICENSE={params.license} mri_convert {input} {output} &> {log}'
shell: 'FS_LICENSE={params.license} mri_info {input.t1} --tkr2scanner > {output.tkr2scanner} 2> {log}'
shell: 'wb_command -surface-apply-affine {input.surf} {input.tkr2scanner} {output.surf} &> {log}'
shell: 'wb_command -surface-average {output.midthickness} -surf {input.white} -surf {input.pial} &> {log}'
shell: 'wb_command -surface-resample {input.surf} {input.current_sphere} {input.new_sphere} {params.method} {output.surf} &> {log}'
shell: 
    'wb_command -label-resample {input.label} {input.current_sphere} {input.new_sphere}'
    ' {params.method} {output.label}'
    ' -area-surfs {input.current_surf} {input.new_surf} &> {log}'
shell:
    'wb_command -label-to-volume-mapping {input.label} {input.surf} {input.vol_ref} {output.label_vol}'
    ' -ribbon-constrained {input.white_surf} {input.pial_surf}'
    ' -greedy &> {log}'
shell:
    'c3d {input.label_vol} -replace 110 0 -o {output.label_vol_removepir110} &> {log}'
shell:
    'fslmaths {input.label_vol_removepir110} -thr 111 -sub 1 {output.tempA_thrBelow111_sub1} && '
    'fslmaths {input.label_vol_removepir110} -uthr 109 {output.tempB_thrAbove109} && '
    'fslmaths {output.tempA_thrBelow111_sub1} -max {output.tempB_thrAbove109} {output.relabelled_vol_removepir110} &> {log}'
shell:
    'fslmaths {input.relabelled_rh_vol_removepir110} -add 179 -thr 180 {output.relabelled_rh_vol_removepir110_add179} &> {log}'
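
Worked example of the relabelling above: the piriform label (110) is first removed from the 180-region volume; labels 111-180 are then shifted down by one and merged with labels 1-109, leaving a gap-free 1-179 labelling per hemisphere (so, for instance, original label 112 becomes 111). For the independent lh/rh case, adding 179 to the right-hemisphere volume maps its labels to 180-358, and the -thr 180 zeroes the background voxels, which the addition had turned into 179.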

scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nibabel as nib
from functools import reduce
import os


#Set the name of the seed you want to create the Gephi input for
seeds = snakemake.params.seeds_py

#Set the k parcellation you want to create the Gephi input for
#this_k = 3
cluster_range = range(2,snakemake.params.max_k+1)

#Set the path to the location of the directory containing the connmapGroup directories for each seed and each k 
#(i.e. connmapClusters dir from diffparc located in /scratch/nchris5/HCP_template_piriform_segmentations/diffparc/results)
path = snakemake.params.connmapClusters_dir


#Read in the Glasser_2016_Table with the 'SectionsBOLD' column manually included, which indicates which of the 22 overarching regions each of the 180 regions belongs to; 
#and the 'ColourRGB' column manually included, which indicates the RGB for that specific SectionsBOLD
GlasserTable = pd.read_csv(snakemake.params.GlasserTable_22regions_colours, header=0, usecols=range(1,8))
GlasserTable.index = range(1,snakemake.params.num_targets+1)
print('Original GlasserTable : \n', GlasserTable)
#Sort by the section values from 1-->22
GlasserTable_sorted_PrimarySection = GlasserTable.sort_values(by='SectionsBOLD', ascending=True)
#Change the actual index to 1-->179
GlasserTable_sorted_PrimarySection.index = range(1,snakemake.params.num_targets+1)
#Insert a column that describes the sorted dataframe new index from 1-->180
GlasserTable_sorted_PrimarySection.insert(loc=1, column='Sorted_PrimarySection_Index', value=range(1,snakemake.params.num_targets+1))
#Create the list of target ID's with leading zeros so all ID's have 3 digits (i.e. 001, 010, 100), as for gephi to read in proper order
GlasserTable_sorted_PrimarySection_Index_ThreeDigits = [str(indID).zfill(3) for indID in GlasserTable_sorted_PrimarySection.Sorted_PrimarySection_Index]
print('GlasserTable_sorted_PrimarySection : \n', GlasserTable_sorted_PrimarySection)

print('\n ------ Creating Nodes df ------ \n ')
#Includes the Id, Label, color, and SectionsBOLD for each of the 179 or 358 Glasser targets (001-179 or 001-358)
Targets = pd.DataFrame({'Id': GlasserTable_sorted_PrimarySection_Index_ThreeDigits, 'Label': GlasserTable_sorted_PrimarySection.AreaName, 'color': GlasserTable_sorted_PrimarySection.ColourRGB, 'Modularity Class': GlasserTable_sorted_PrimarySection.SectionsBOLD}, index=range(1,snakemake.params.num_targets+1))
print('Targetsdf : \n', Targets)

for seed in seeds:
    for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
        #Set the RBG colours for each of the clusters within this_k to the same colour that is present in the clustering output in itksnap (i.e. diffparc spectral_clustering output)
        if this_k==2:
            this_k_ColourRGB = ['255,0,0','0,255,0'] #Cluster01=Red, #Cluster02=Green
        elif this_k==3:
            this_k_ColourRGB = ['255,0,0','0,255,0','0,0,255'] #Cluster01=Red, #Cluster02=Green, #Cluster03=Blue
        elif this_k==4:
            this_k_ColourRGB = ['255,0,0','0,255,0','0,0,255', '255,255,0'] #Cluster01=Red, #Cluster02=Green, #Cluster03=Blue #Yellow
        #Create the Ids and labels for the individual clusters (e.g. Cluster01, Cluster02, Cluster03), assigning Id values above the number of targets (179 or 358), e.g. 180=Cluster01, 181=Cluster02, 182=Cluster03
        IdlistClusters = list()
        labellist = list()
        for i,this_label in enumerate(range(1,this_k+1)):
            IdlistClusters.append(snakemake.params.num_targets+1+i)
            labellist.append('Cluster{this_label:02d}'.format(this_label=this_label))
        #Set the Modularity Class for each individual cluster to be distinct (i.e. Cluster01=23, Cluster02=24, Cluster03=25)
        ModularityClassClusters = list(range(23,this_k+23))
        #Create the dataframe for the clusters to add to the list of targets
        individualclusterstoadd = pd.DataFrame({'Id': IdlistClusters, 'Label': labellist, 'color': this_k_ColourRGB, 'Modularity Class': ModularityClassClusters}, index=range(snakemake.params.num_targets,snakemake.params.num_targets+this_k))
        print('\n individualclusterstoadd : \n', individualclusterstoadd)

        #Append the cluster labels dataframe to the bottom of the targets list dataframe to yield the final Nodesdf
        Nodesdf = Targets.append(individualclusterstoadd)
        print('\n Nodesdf : \n ', Nodesdf)




        print('\n ------ Creating Edges df ------ \n ')
        print('------ Creating Edges df SourceId and SourceName columns ------ \n ')

        #Multiply the targets df by number of clusters (i.e. first 180 will represent cluster 1, next 180 will represent cluster 2, etc.)
        EdgesSourceId = np.zeros((snakemake.params.num_targets,this_k), dtype=int)
        EdgesSourceNames = list()
        for i,this_label in enumerate(range(1,this_k+1)):
            EdgesSourceId[:,i] = this_label+snakemake.params.num_targets
            tempSourceNames = list()
            for j in range(snakemake.params.num_targets):
                tempSourceNames.append('Cluster{this_label:02d}'.format(this_label=this_label))
            EdgesSourceNames.append(tempSourceNames)
        #Use functools reduce to remove iterative lists into a single list i.e. [179xCluster01], [179xCluster02], [179xCluster03] to [179xCluster01, 179xCluster02, 179xCluster03]
        EdgesSourceNames = reduce(lambda x,y: x+y, EdgesSourceNames)
        #Reshape the edges source col current with shape (179,numClusters) to 179*numClusters with order 'F' (i.e. 179x'181' --> 179x'182' --> 179x'183')
        EdgesSourceId = np.reshape(EdgesSourceId, (snakemake.params.num_targets*this_label), order='F')

        TargetsMultiplied = pd.concat([Targets]*this_k, ignore_index=True)
        print('TargetsMultiplied Shape = ', TargetsMultiplied.shape)
        print('TargetsMultiplied = \n', TargetsMultiplied)

        print('\n ------ Creating Edges df Columns:   Source, SLabel, Target, TLabel, ConnectivityScore, ConnectivityScoreNormalized, Rank_in_Cluster, Weight ------ \n ')

        #Create list of paths to individual clusternpz files for this_k
        connmapClustersDirList = list()
        thiskString = str(this_k)
        #Only select the directories that have the specified seed and also end with k=this_k
        for connmapClustersDir in os.listdir(path):
            if ((connmapClustersDir.find(seed) != -1) and (connmapClustersDir.endswith(thiskString))):
                connmapClustersDirList.append(connmapClustersDir)
                newpath = path + '/' + connmapClustersDirList[0]
                individualClusters_thisk_npzList = list()
                for individualClusters_thisk_npz in os.listdir(newpath):
                    if individualClusters_thisk_npz.endswith('.npz'):
                        fullpath = newpath + '/' + individualClusters_thisk_npz
                        individualClusters_thisk_npzList.append(fullpath)
                print(individualClusters_thisk_npzList)

        #Initiate an empty dataframe that will be appended to include the values from each cluster
        df_ALLCLUSTERS_InsertRank = pd.DataFrame(columns = ['targetnames', 'ConnectivityScore', 'ConnectivityScoreNormalized', 'SortedRank'])

        #Create a list of ranks from 1(lowest connectivity) to 180(highest connectivity)
        for i,thisk_thisCluster_npz in enumerate(individualClusters_thisk_npzList):
            data_thisCluster = np.load(thisk_thisCluster_npz)
            conngroup_thisCluster = data_thisCluster['conn_group']

            connAvg_across_subs_thisCluster = conngroup_thisCluster.mean(axis=0) #Shape of input is (#sub,#voxels,#targets)
            targetAvg_across_voxels_thisCluster = connAvg_across_subs_thisCluster.mean(axis=0) #Shape of input is (#voxels,#targets)
            sum_targetAvg_across_voxels_thisCluster = np.sum(targetAvg_across_voxels_thisCluster) #Sum all the targets to get sum of conn scores
            quotients = [number / sum_targetAvg_across_voxels_thisCluster for number in targetAvg_across_voxels_thisCluster]

            df_thisCluster = pd.DataFrame({'targetnames': GlasserTable.AreaName, 'ConnectivityScore': targetAvg_across_voxels_thisCluster, 'ConnectivityScoreNormalized': quotients}, index=range(1,snakemake.params.num_targets+1))
            #Insert a column the SectionsBold column from the GlasserTable
            df_thisCluster.insert(loc=1, column='SectionsBOLD', value=GlasserTable.SectionsBOLD)
            print('\n df_thisCluster_InsertedSectionsBold k-%s and cluster-%s : \n %s' % (this_k, i+1, df_thisCluster))
            df_thisCluster_sorted_PrimarySection =  df_thisCluster.sort_values(by='SectionsBOLD', ascending=True)
            df_thisCluster_sorted_PrimarySection.index = range(1,snakemake.params.num_targets+1)
            #Insert a column that describes the sorted dataframe new index from 1-->179
            df_thisCluster_sorted_PrimarySection.insert(loc=1, column='Sorted_PrimarySection_Index', value=range(1,snakemake.params.num_targets+1))
            print('\n df_thisCluster_sorted_PrimarySection k-%s and cluster-%s : \n %s' % (this_k, i+1, df_thisCluster_sorted_PrimarySection))

            #Assign weight based on ConnectivityScore
            df_sorted_thisCluster = df_thisCluster_sorted_PrimarySection.sort_values(by='ConnectivityScoreNormalized', ascending=False)
            print('\n df_sorted_thisCluster k-%s and cluster-%s : \n %s' % (this_k, i+1, df_sorted_thisCluster))

            df_sorted_thisCluster_InsertRank = df_sorted_thisCluster
            df_sorted_thisCluster_InsertRank['SortedRank'] = pd.Series(reversed(range(1,snakemake.params.num_targets+1)),index=(df_sorted_thisCluster.index))

            df_thisCluster_sorted_PrimarySection_InsertRank = df_thisCluster_sorted_PrimarySection
            df_thisCluster_sorted_PrimarySection_InsertRank['SortedRank'] = pd.Series(reversed(range(1,snakemake.params.num_targets+1)),index=(df_sorted_thisCluster.index))
            print('\n df_sorted_thisCluster k-%s and cluster-%s with Rank : \n %s' % (this_k, i+1, df_sorted_thisCluster_InsertRank))
            print('\n df_thisCluster k-%s and cluster -%s with Rank : \n %s' % (this_k, i+1, df_thisCluster_sorted_PrimarySection_InsertRank))

            df_ALLCLUSTERS_InsertRank = df_ALLCLUSTERS_InsertRank.append(df_thisCluster_sorted_PrimarySection_InsertRank, ignore_index=True)
            print('CURRENT df_ALLCLUSTERS_InsertRank Shape = ', df_ALLCLUSTERS_InsertRank.shape)
        print('FINAL df_ALLCLUSTERS_InsertRank Shape = ', df_ALLCLUSTERS_InsertRank.shape)
        print('FINAL df_ALLCLUSTERS_InsertRank : \n', df_ALLCLUSTERS_InsertRank)

        #Add in the edge weight column
        #Multiply the SortedRank*ConnectivityScoreNormalized to get the Weight of each edge, round to integer, then add 1 to ensure that lowest edge weight value is > 0
        df_ALLCLUSTERS_InsertRank['Weight'] = df_ALLCLUSTERS_InsertRank['SortedRank'] * df_ALLCLUSTERS_InsertRank['ConnectivityScoreNormalized']
        df_ALLCLUSTERS_InsertRank['Weight'] = df_ALLCLUSTERS_InsertRank['Weight'].apply(lambda x: round(x, 0) + 1)

        Edgesdf = pd.DataFrame({'Source': EdgesSourceId, 'SLabel': EdgesSourceNames, 'Target': TargetsMultiplied.Id, 'TLabel': TargetsMultiplied.Label, 'ConnectivityScore': df_ALLCLUSTERS_InsertRank.ConnectivityScore, 'ConnectivityScoreNormalized': df_ALLCLUSTERS_InsertRank.ConnectivityScoreNormalized, 'Rank_in_Cluster': df_ALLCLUSTERS_InsertRank.SortedRank, 'Weight': df_ALLCLUSTERS_InsertRank.Weight}, index=range(0,this_k*snakemake.params.num_targets))
        print('\n Shape of this Edges is %s and Edgesdf = \n %s  ' % (Edgesdf.shape, Edgesdf))


        print('\n ------ Saving Nodesdf and Edgesdf to csv ------ \n ')

        if not os.path.exists('results/gephi_input_NodesandEdges_tables_' + seed):
            os.mkdir('results/gephi_input_NodesandEdges_tables_' + seed)

        #Create savepaths
        savepathNodesdf = 'results/gephi_input_NodesandEdges_tables_' + seed + '/GephiNodes_' + seed + '_k-' + str(this_k) + '_Sorted_Overarching22Categories_WithRGBColours.csv'
        savepathEdgesdf = 'results/gephi_input_NodesandEdges_tables_' + seed + '/GephiEdges_' + seed + '_k-' + str(this_k) + '_Sorted_Overarching22Categories_NormalizedWeight.csv'

        print('savepathNodesdf = ', savepathNodesdf)
        print('savepathEdgesdf = ', savepathEdgesdf)

        #Write to Nodesdf and Edgesdf to CSV files
        Nodesdf.to_csv(savepathNodesdf, index = False, header=True)
        Edgesdf.to_csv(savepathEdgesdf, index = False, header=True)

scripts/gather_connmapClusters_group.py:

import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from scipy import stats

targetnames = pd.read_table('results/diffparc/sub-100610/target_images_from-hcp7Tsubj_rpiriform_overlap_removed.txt', names=['TargetNames'], header=None)
targetnamesList = targetnames.TargetNames.to_list()
targetnamesList = [ line.replace('results/diffparc/sub-100610/lh_rh_targets_from-hcp7Tsubj_rpiriform_space-native_resampled_dwi_resolution_pir-overlap-removed/','') for line in targetnamesList ] #Remove that string from each line
targetnamesList = [ line.replace('_ROI.nii.gz','') for line in targetnamesList ] #Remove this string from each
cluster_range = range(2,snakemake.params.max_k+1)
for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
    label = 1
    thiskdirslist = list()
    thiskString = str(this_k)
    for j,thiskdir in enumerate(snakemake.input.connmap_Clusters_npz_dir):
        if thiskdir.endswith(thiskString):
            thiskdirslist.append(thiskdir)
    newClusterGroupDir = '{snakeout}_k-{thisk}'.format(snakeout=snakemake.params.connmap_Clusters_group_npz_dir_base,thisk=this_k)
    if not os.path.exists(newClusterGroupDir):
        os.mkdir(newClusterGroupDir)
    fig = plt.figure(figsize=[6.4*this_k,4.8])
    while label <= this_k:
        thislabel_namesList = list()
        labelString = str(label) + '_connmap.npz'
        print('labelString = ',labelString)
        for l,eachthiskdir in enumerate(thiskdirslist):
            thislabel_names = [f for f in os.listdir(path=eachthiskdir) if f.endswith(labelString)]
            thislabel_namesString = ''.join(thislabel_names)
            thislabel_fullpath = eachthiskdir + '/' + thislabel_namesString
            thislabel_namesList.append(thislabel_fullpath)
        thisk_thislabelonesubj = thislabel_namesList[0]
        print('thisk_thislabelonesubj = ',thisk_thislabelonesubj)
        data_thisk_thislabelonesubj = np.load(thisk_thislabelonesubj)
        affine = data_thisk_thislabelonesubj['affine']
        mask = data_thisk_thislabelonesubj['mask']
        conn_shape = data_thisk_thislabelonesubj['conn'].shape 
        nsubjects = len(thislabel_namesList)
        print('nsubjects = ',nsubjects)
        conn_group = np.zeros([nsubjects,conn_shape[0],conn_shape[1]])
        savestring = '{newClusterGroupD}/cluster{k_i:02d}_connmap_group.npz'.format(newClusterGroupD=newClusterGroupDir,k_i=label)
        for sub,npz in enumerate(thislabel_namesList):
            data_thisk_thislabel_eachsub = np.load(npz)
            conn_group[sub,:,:] = data_thisk_thislabel_eachsub['conn']
        np.savez(savestring, conn_group=conn_group,mask=mask,affine=affine)
        connmatrix_for_each_target_avggroup = conn_group.mean(axis=0) #Average across rows (subjects) to get connmap matrix average across all subjects
        print('connmatrix_for_each_target_avggroup shape = ',connmatrix_for_each_target_avggroup.shape)
        avg_conn_for_each_target_avggroup = connmatrix_for_each_target_avggroup.mean(axis=0) #Average across rows (voxels in piriform, indexed) to get connmap matrix average across all subjects
        sum_avg_conn_for_each_target_avggroup = np.sum(avg_conn_for_each_target_avggroup)
        quotients = [number / sum_avg_conn_for_each_target_avggroup for number in avg_conn_for_each_target_avggroup]
        #Create dataframe that includes columns target names and the values of the targets
        df = pd.DataFrame({'targetnames': targetnamesList, 'ConnectivityScore': avg_conn_for_each_target_avggroup, 'ConnectivityScoreNormalized': quotients}, index=range(0,snakemake.params.num_targets))
        print("Targets prior to sorting based on avg conn for k-%s and label %s = %s" % (this_k, label, df))
        dfsorted = df.sort_values(by='ConnectivityScoreNormalized', ascending=False) #Sort the target regions descending
        print("Targets sorted based on avg conn for k-%s and label %s = %s" % (this_k, label, dfsorted))
        #Plot the 10 targets with the highest connectivity to the piriform
        x = np.linspace(1,10,10) #start,stop,number of ticks
        y = np.linspace(0,1,11) #Makes it increment by 0.1
        ax = fig.add_subplot(1,this_k,label)
        ax.set_title("k-%s Cluster-0%s" % (this_k, label))
        ax.set_xticks(x)
        ax.set_yticks(y)
        ax.set_ylim([0,0.5])
        ax.set_xticklabels(dfsorted.targetnames[0:10], rotation='vertical')
        ax.set_xlabel('Target Name')
        ax.set_ylabel('Average Connectivity to Target')
        ax.bar(x,dfsorted.ConnectivityScoreNormalized[0:10]) #Gives the target regions with the top 10 highest average connectivites from the group averaged piriform seed
        label +=1
    figuresavestring = '{newClusterGroupDir}/k-{thisk}_Individual_Clusters_Top10connectivity_to_targets.png'.format(newClusterGroupDir=newClusterGroupDir,thisk=this_k)
    print('figuresavestring is : ',figuresavestring)
    plt.savefig(figuresavestring, bbox_inches = "tight")

scripts/gather_connmap_group.py:

import numpy as np

#load first file to get shape
data = np.load(snakemake.input.connmap_npz[0])
affine = data['affine']
mask = data['mask']
conn_shape = data['conn'].shape
nsubjects = len(snakemake.input.connmap_npz)
conn_group = np.zeros([nsubjects,conn_shape[0],conn_shape[1]])

for i,npz in enumerate(snakemake.input.connmap_npz):
    data = np.load(npz)
    conn_group[i,:,:] = data['conn']

#save conn_group, mask and affine
np.savez(snakemake.output.connmap_group_npz, conn_group=conn_group,mask=mask,affine=affine)

scripts/save_connmapClusters_template_npz.py:

import nibabel as nib
import numpy as np
import os

#kclusters_nii_list_names = os.listdir(path=snakemake.input.clusters_template_space_dir)
#kclusters_nii_list = list()
#for filename in kclusters_nii_list_names:
#    fullpath = os.path.join(snakemake.input.clusters_template_space_dir, filename)
#    kclusters_nii_list.append(fullpath)

kclusters_nii_list = list()
for a,thiskclustervol in enumerate(snakemake.input.clusters_template_space):
    kclusters_nii_list.append(thiskclustervol)
print('kclusters_nii_list = ',kclusters_nii_list)

cluster_range = range(2,snakemake.params.max_k+1) #Range of the clusters [2, 3, 4]
ntargets = len(snakemake.params.connmap_3d) #Number of targets = 179

for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
    label = 1 #Lowest label value is 1
    newClusterDir = '{snakeout}_k-{thisk}'.format(snakeout=snakemake.params.connmap_Clusters_npz,thisk=this_k)
    if not os.path.exists(newClusterDir):
        os.mkdir(newClusterDir)
    while label <= this_k: #For each individual label in that cluster file, stop after label hits that current k value
        print("Label val for k-%s prior to creating indices = %s" % (this_k, label))
        clustmask_nib = nib.load(kclusters_nii_list[i]) #Load in cluster 2, 3, 4
        print("This k-%scluster = %s" % (this_k, kclusters_nii_list[i]))
        clustmask_vol = clustmask_nib.get_fdata()
        clustmask_indices = clustmask_vol==label #Indices within the clustmask_vol for label=label (e.g. indices in the k-2 volume where label = 1)
        maskedCount = clustmask_vol[clustmask_indices] #Number of voxels in a list that contain this label
        nvoxels = len(maskedCount) #Count of the number of voxels
        print("nvoxels for k-%s and label %s = %s" % (this_k, label, nvoxels))
        maskedVoxelsIndividualCluster = clustmask_vol
        maskedVoxelsIndividualCluster[maskedVoxelsIndividualCluster!=label] = 0 #Any of the voxels that do not contain the label value become 0 (e.g. if this_k = 2, and label = 1, then label = 2 becomes 0)
        conn = np.zeros((nvoxels,ntargets)) #Create conn matrix of size nvoxels in this cluster,ntargets
        savestring = '{newClustDir}/cluster{k_i:02d}_connmap.npz'.format(newClustDir=newClusterDir,k_i=label)
        print("savestring for k-%s and label %s = %s" % (this_k, label, savestring))
        for j,conn_file in enumerate(snakemake.params.connmap_3d):
            targvol = nib.load(conn_file).get_fdata()
            maskedtargvol = targvol[clustmask_indices].T
            conn[:,j] = maskedtargvol
        print("Conn shape for k-%s and label %s =  %s \n" % (this_k, label, conn.shape))
        np.savez(savestring, conn=conn,mask=maskedVoxelsIndividualCluster,affine=clustmask_nib.affine)
        label +=1 #Label becomes 2 and loop while loop continues, then label becomes 3 and the loop exits

scripts/save_connmap_template_npz.py:

import nibabel as nib
import numpy as np
mask_nib = nib.load(snakemake.input.mask)
mask_vol = mask_nib.get_fdata()
mask_indices = mask_vol > 0 
masked = mask_vol[mask_indices]

nvoxels = len(masked)
ntargets = len(snakemake.params.connmap_3d)
conn = np.zeros((nvoxels,ntargets))
for i,conn_file in enumerate(snakemake.params.connmap_3d):
    vol = nib.load(conn_file).get_fdata()
    masked =  vol[mask_indices].T
    conn[:,i] = masked
np.savez(snakemake.output.connmap_npz, conn=conn,mask=mask_vol,affine=mask_nib.affine)

scripts/spectral_clustering.py:

from sklearn.cluster import SpectralClustering
import numpy as np
import nibabel as nib

# Define a function for saving niftis 
def save_label_nii (labels,mask,affine,out_nifti):
    labels_vol = np.zeros(mask.shape)
    labels_vol[mask > 0] = labels+1 #add a 1 so label 0 is diff from bgnd
    labels_nib = nib.Nifti1Image(labels_vol,affine)
    nib.save(labels_nib,out_nifti)

data = np.load(snakemake.input.connmap_group_npz)
cluster_range = range(2,snakemake.params.max_k+1)
out_nii_list = snakemake.output

conn_group = data['conn_group']
mask = data['mask']
affine = data['affine']

# Concat subjects
conn_group_m = np.moveaxis(conn_group,0,2)
conn_concat = conn_group_m.reshape([conn_group_m.shape[0],conn_group_m.shape[1]*conn_group_m.shape[2]])

# Run spectral clustering and save output nifti
for i,k in enumerate(cluster_range):
    clustering = SpectralClustering(n_clusters=k, assign_labels="discretize",random_state=0,affinity='cosine').fit(conn_concat)
    print(f'i={i}, k={k},saving {out_nii_list[i]}')
    save_label_nii(clustering.labels_,mask,affine,out_nii_list[i])
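
The concatenation step above gives each seed voxel a single feature vector containing its connectivity to every target in every subject, so the clustering is run once on group data in template space rather than per subject. A toy shape check (illustrative sizes only):

import numpy as np

nsubjects, nvoxels, ntargets = 3, 5, 4                      # toy sizes; real data is much larger
conn_group = np.random.rand(nsubjects, nvoxels, ntargets)   # same layout as conn_group above
conn_concat = np.moveaxis(conn_group, 0, 2).reshape(nvoxels, ntargets * nsubjects)
print(conn_concat.shape)                                    # (5, 12): one row per seed voxel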

URL: https://github.com/nchris5/diffparc-smk_piriform
Name: diffparc-smk_piriform