African Demography Review

This repository provides a Snakemake workflow used for the data analysis presented in Pfennig et al., Evolutionary genetics and admixture in African populations, Genome Biology and Evolution, 2023; https://doi.org/10.1093/gbe/evad054.

Datasets

We combine and harmonize genomic datasets from:

  • 20 European, Middle Eastern, and African populations from the Simons Genome Diversity Project (SGDP) [1]

  • 36 African populations from Crawford et al. (2017) [2] and Scheinfeldt et al. (2019) [3] that were genotyped on the Illumina 1M array (these datasets were merged by Matt Hansen (unpublished work) and kindly provided to us)

  • 16 Sudanese populations from Hollfelder et al. (2017) [4]

  • 8 North African populations from Arauna et al. (2017) [5]

  • 14 Sahelian populations from Fortes-Lima et al. (2022) [6]

  • 10 Southern African populations from Schlebusch et al. (2012) [7]

Populations with more than 2 samples are downsampled to 2 individuals to avoid effects of variable sample sizes. If a population was included in more than one dataset, only samples from one dataset were retained for that population. We merge the datasets on biallelic common SNPs. Conflicting SNPs are strand-flipped once, and SNPs that remain in conflict are excluded. A minor allele frequency threshold of 0.05, a per-SNP missingness threshold of 0.1, and LD pruning (max r2 = 0.05) were applied (these parameters can be changed in config/config.yaml). Ancestry proportions were estimated using ADMIXTURE [8] with different values of K. For visualization, ADMIXTURE results are aligned across all values of K using CLUMPAK [9] prior to plotting (Figure 2). Ancestry proportions for the best K were then interpolated in space using ordinary Kriging, and effective migration rates were estimated using FEEMS [10] (Figure 3).
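As a minimal sketch, the filtering steps correspond to plink2 commands like the following (the file prefixes here are hypothetical placeholders; the actual commands are in the Snakefile snippets below):

plink2 --bfile merged --maf 0.05 --geno 0.1 --make-bed --out merged_qc          # MAF >= 0.05, per-SNP missingness <= 0.1
plink2 --bfile merged_qc --indep-pairwise 50 1 0.05 --out merged_qc             # LD pruning: 50-SNP window, step 1, max r2 = 0.05
plink2 --bfile merged_qc --extract merged_qc.prune.in --make-bed --out merged_pruned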

Software dependencies

  • bcftools v1.13 (path can be updated in config/config.yaml)

  • All other software is provided via conda environments, which are automatically created and used by Snakemake.

Data dependencies

  • The datasets from Crawford et al. [2] (dbGaP: phs001396.v1.p1) and Scheinfeldt et al. [3] (dbGaP: phs001780.v1.p1) need to be downloaded manually, merged, and placed in the data directory (see the directory sketch after this list). The plink prefix of the files can be updated in config/config.yaml. We provide coordinates of populations used in our analysis in data/coordinates_crawford_et_al_and_scheinfeldt_et_al.txt. Importantly, we only used samples that were genotyped on the Illumina 1M array.

  • The dataset from Fortes-Lima et al. [6] needs to be downloaded separately from https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-12243?query=E-MTAB-12243 (the dataset was kindly made available by the authors prior to publication). Population coordinates were extracted from Table S2 in the corresponding publication and are stored in data/coordinates_fortes_lima_et_al.txt. The plink prefix can be updated in config/config.yaml.

  • Geographic coordinates of the 18 Sudanese populations from Hollfelder et al. (2017) [4] were kindly made available by the original authors. We provide the coordinates in data/coordinates_hollfelder_et_al.txt.

  • If city information was available for populations in Arauna et al. (2017) [5], the coordinates of that city were used. For the populations from Libya and Egypt, the center of the respective country was taken (see data/coordinates_arauna_et_al.txt). The data file was kindly made available by the original authors.

  • All other required data is retrieved automatically.
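As a rough sketch, the data directory could look like this before running the workflow (the plink prefixes are hypothetical and must match config/config.yaml; the coordinate files ship with the repository):

ls data/
# crawford_scheinfeldt_merged.{bed,bim,fam}   <- manually merged dbGaP datasets
# fortes_lima_2022.{bed,bim,fam}              <- downloaded from ArrayExpress (E-MTAB-12243)
# coordinates_crawford_et_al_and_scheinfeldt_et_al.txt
# coordinates_fortes_lima_et_al.txt
# coordinates_hollfelder_et_al.txt
# coordinates_arauna_et_al.txt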

Comments

The Kriging plot of admixture results currently only works for K <= 4, as we observed the lowest CV error at K=4. To compare CV errors across different values of K, run grep CV {results_path}/{prefix}.*.log (the CV error is written to the ADMIXTURE log files).
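For example (results path and prefix are placeholders for the values in your config):

grep "CV error" results/admixture/merged.*.log | sort -t: -k2 -g   # lowest CV error first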

As of now, this Snakemake workflow is not written in a general way that allows downloading and merging data from arbitrary datasets; it only supports those mentioned above.

Executing the Snakemake workflow

After ensuring that the software and data requirements are satisfied, the Snakemake workflow can be executed with the following command:

snakemake -c<threads> --use-conda
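For example, using 8 threads (the -n flag performs a dry run that lists the planned jobs first):

snakemake -c8 --use-conda -n
snakemake -c8 --use-conda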

Citation

Pfennig et al., Evolutionary genetics and admixture in African populations, Genome Biology and Evolution, 2023; https://doi.org/10.1093/gbe/evad054

References

  1. Mallick S, Li H, Lipson M, et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016). https://doi.org/10.1038/nature18964

  2. Crawford NG, Kelly DE, Hansen MEB, et al. (2017) Loci associated with skin pigmentation identified in African populations. Science 358, eaan8433. https://doi.org/10.1126/science.aan8433

  3. Scheinfeldt LB, et al. (2019) Genomic evidence for shared common ancestry of East African hunting-gathering populations and insights into local adaptation. Proc. Natl. Acad. Sci. 116:4166–4175. https://doi.org/10.1073/pnas.1817678116

  4. Hollfelder N, Schlebusch CM, Günther T, et al. (2017) Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. PLoS Genet 13(8): e1006976. https://doi.org/10.1371/journal.pgen.1006976

  5. Arauna LR, Mendoza-Revilla J, Mas-Sandoval A, et al. (2017) Recent Historical Migrations Have Shaped the Gene Pool of Arabs and Berbers in North Africa, Molecular Biology and Evolution, Volume 34, Issue 2, https://doi.org/10.1093/molbev/msw218

  6. Fortes-Lima C, Tříska P, Čížková M, et al. (2022) Demographic and Selection Histories of Populations Across the Sahel/Savannah Belt, Molecular Biology and Evolution, Volume 39, Issue 10, msac209, https://doi.org/10.1093/molbev/msac209

  7. Schlebusch CM, et al. (2012) Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338:374–379. https://doi.org/10.1126/science.1227721

  8. Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19:1655–1664. https://doi.org/10.1101/gr.094052.109

  9. Kopelman, N, Mayzel, J, Jakobsson, M, Rosenberg, N, Mayrose, I (2015) CLUMPAK: a program for identifying clustering modes and packaging population structure inferences across K. Molecular Ecology Resources 15(5): 1179-1191, https://doi.org/10.1111/1755-0998.12387

  10. Marcus J, Ha W, Barber RF, Novembre J (2021) Fast and flexible estimation of effective migration surfaces. eLife 10:e61927. https://doi.org/10.7554/eLife.61927

Code Snippets

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import sys 
import argparse
import pandas as pd
import numpy as np
import os

### This script is purely for explorative purposes. The labelling is based roughly on geographic coordinates and may not reflect precise regional boundaries.


def visualize_pcs(data, outfile, pc_a=1, pc_b=2):
    """
    Visualize PCA done with plink
    :param data: df, DataFrame with eigenvectors and geographic coordinates
    :param outfile: str, where to save Figure to
    :param pc_a: int, first PC to plot
    :param pc_b: int, second PC to plot
    """
    np.random.seed(42)
    # Roughly assign a region to each sample based on its sampling coordinates;
    # points matching no region keep the fallback '0' (a grayscale color string in matplotlib)
    colors = np.where(
        (data.Longitude >= -16.7) & (data.Longitude <= 17) & (data.Latitude >= 2.5) & (data.Latitude <= 20), 'green', 0)
    colors = np.where((data.Longitude > 17) & (data.Latitude >= -5.0) & (data.Latitude <= 20), 'blue', colors)
    colors = np.where((data.Latitude <= -5), 'orange', colors)
    colors = np.where((data.Latitude > 15) & (data.Longitude >= 35), 'pink', colors)
    colors = np.where((data.Latitude > 27) & (data.Longitude < 15), 'red', colors)
    color_dict = {'Middle East': 'pink',
                  "West Africa": "green",
                  "East Africa": "blue",
                  "South Africa": "orange",
                  "Europe": "red"}
    data['colors'] = colors
    fig, ax = plt.subplots()
    ax.scatter(data.loc[:, f'PC{pc_a}'], data.loc[:, f'PC{pc_b}'].values, c=data.colors, alpha=0.5)
    ax.set_xlabel(f'PC{pc_a}')
    ax.set_ylabel(f'PC{pc_b}')
    legend_elements = [Line2D([0], [0], color=col, marker='o', ls='', label=pop) for pop, col in color_dict.items()]
    ax.legend(handles=legend_elements, loc='upper center', ncol=4, bbox_to_anchor=(0.5, -.14))
    fig.savefig(outfile, bbox_inches='tight')
    plt.close()


def main(argv):
    parser = argparse.ArgumentParser(description='Visualize PCA performed with plink2')
    parser.add_argument('--eigenvec', help='.eigenvec file that was generated by plink. ')
    parser.add_argument('--coords', help='file containing geographic coordinates of sample locations.')
    parser.add_argument('--plot_dir', help='Directory where to save plots')
    args = parser.parse_args()
    if args.plot_dir.endswith('/'):
        plot_dir = args.plot_dir
    else:
        plot_dir = args.plot_dir + '/'
    if not os.path.isdir(plot_dir):
        os.makedirs(plot_dir)
    eigenvectors = pd.read_csv(args.eigenvec, sep='\t')
    eigenvectors.set_index('IID', inplace=True)
    coordinates = pd.read_csv(args.coords, sep='\t', header=None, names=['FID', "IID", "Latitude", "Longitude"])
    # join
    data = eigenvectors.join(coordinates.set_index('IID'))
    outfile = plot_dir + args.eigenvec.split('/')[-1].split('.eigenvec')[0] + "_pca.png"
    visualize_pcs(data, outfile)


if __name__ == '__main__':
    main(sys.argv[1:])
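A hypothetical stand-alone invocation (file paths are placeholders; the workflow itself calls this script from the PCA plotting rule further below):

python scripts/plot_pca.py --eigenvec results/merged.eigenvec --coords data/population_geo_coords.tab --plot_dir results/plots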
import numpy as np
import pkg_resources
from sklearn.impute import SimpleImputer
from pandas_plink import read_plink
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from feems.utils import prepare_graph_inputs
from feems import SpatialGraph, Viz
from feems.cross_validation import run_cv
import sys
import argparse
import pandas as pd

# change matplotlib fonts
plt.rcParams["font.family"] = "Arial"
plt.rcParams["font.sans-serif"] = "Arial"


def create_spatial_graph(data_path, prefix, coords_file):
    """
    Create SpatialGraph object
    :param data_path: str, data directory
    :param prefix: str, prefix for plink files in data directory
    :param coords_file: str, path to file with sampling coordinates
    :return: SpatialGraph
    """
    data_path_feems = pkg_resources.resource_filename("feems", "data/")
    # read the genotype data and mean impute missing data
    (bim, fam, G) = read_plink("{}/{}".format(data_path, prefix))
    imp = SimpleImputer(missing_values=np.nan, strategy="mean")
    genotypes = imp.fit_transform((np.array(G)).T)

    # setup graph
    coords = pd.read_csv(coords_file, sep='\t', header=None,
                         names=['fid_c', 'iid_c', 'latitude', 'longitude'])
    coords = fam.set_index('iid').join(coords.set_index('iid_c')).loc[:, ["longitude", "latitude"]].values
    grid_path = "{}/grid_100.shp".format(data_path_feems)  # path to discrete global grid

    # graph input files
    _, edges, grid, _ = prepare_graph_inputs(coord=coords,
                                             ggrid=grid_path,
                                             translated=False,
                                             buffer=0,)

    # # construct spatial graph object
    sp_graph = SpatialGraph(genotypes, coords, grid, edges, scale_snps=True)
    return sp_graph


def cross_validate(sp_graph):
    """
    Perform cross validation to find optimal lambda
    :param sp_graph: SpatialGraph
    :return: float, lambda
    """
    # reverse the order of lambdas and alphas for warmstart
    lamb_grid = np.geomspace(1e-6, 1e2, 20)[::-1]

    # run cross-validation
    cv_err = run_cv(sp_graph, lamb_grid, n_folds=sp_graph.n_observed_nodes, factr=1e10)

    # average over folds
    mean_cv_err = np.mean(cv_err, axis=0)

    # argmin of cv error
    lamb_cv = float(lamb_grid[np.argmin(mean_cv_err)])
    return lamb_cv


def fit_final_graph(sp_graph, lamb_cv, prefix, output_path):
    """
    Fit final SpatialGraph using optimal lambda inferred from cross-validation
    :param sp_graph: SpatialGraph
    :param lamb_cv: float, lambda
    :param prefix: str, prefix for the output figure name
    :param output_path: str, directory where to save the output figure named <prefix>_feems_plot.pdf
    """
    sp_graph.fit(lamb_cv)
    projection = ccrs.PlateCarree()
    # Plot the FEEMS result
    fig = plt.figure(figsize=(9, 16), dpi=600)
    ax = fig.add_subplot(1, 1, 1, projection=projection)
    ax.set_extent([-25, 60, -40, 50], crs=ccrs.PlateCarree())
    v = Viz(ax, sp_graph, projection=projection, edge_width=.5,
            edge_alpha=1, edge_zorder=100, sample_pt_size=20,
            obs_node_size=7.5, sample_pt_color="black",
            cbar_font_size=5, cbar_ticklabelsize=4)
    v.draw_map()
    v.draw_edges(use_weights=True)
    v.draw_obs_nodes(use_ids=False)
    v.draw_edge_colorbar()
    fig.savefig(output_path + prefix + '_feems_plot.pdf', bbox_inches='tight', dpi=600)


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--data_path', help='Data directory. default=./data/', default='./data/')
    parser.add_argument('-p', '--prefix', help='Prefix to plink files')
    parser.add_argument('-c', '--coords', help='Path to file with coordinates of samples. '
                                               'default=data/population_geo_coords.tab',
                        default='data/population_geo_coords.tab')
    parser.add_argument('-o', '--output_path', help='Output directory. Will save a figure called '
                                                    '<prefix>_feems_plot.pdf in this directory. '
                                                    'default=./results/', default='./results/')
    args = parser.parse_args()
    data_path = args.data_path
    prefix = args.prefix
    coords = args.coords
    output_path = args.output_path
    if not data_path.endswith('/'):
        data_path += '/'
    if not output_path.endswith('/'):
        output_path += '/'
    sp_graph = create_spatial_graph(data_path, prefix, coords)
    lamb_cv = cross_validate(sp_graph)
    fit_final_graph(sp_graph, lamb_cv, prefix, output_path)


if __name__ == '__main__':
    main(sys.argv[1:])
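A hypothetical stand-alone invocation matching the FEEMS rule in the Snakefile (the plink prefix is a placeholder):

python scripts/run_feems.py -d data/ -p merged_pruned -c data/population_geo_coords.tab -o results/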
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import glob
import sys
import argparse
import seaborn as sns
import numpy.ma as ma
from pykrige.ok import OrdinaryKriging
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shpreader
import shapely.geometry as sgeom
from shapely.ops import unary_union
from shapely.prepared import prep
from itertools import permutations


def is_land(x, y, land):
    """
    Function to determine if a given point is on land or not
    :param x: float, longitude
    :param y: float, latitude
    :param land:
    :return: boolean
    """
    return land.contains(sgeom.Point(x, y))


def build_mask(x, y, land):
    """
    Create a mask of points that are on land and should be plotted
    :param x: np.array, longitude values
    :param y: np.array, latitude values
    :param land:
    :return: np.array, boolean
    """
    # mesh
    xv, yv = np.meshgrid(x, y, indexing='xy')
    # initialize mask
    mask = np.zeros(xv.shape, dtype=bool)
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            # check if point is land
            mask[j, i] = is_land(xv[j, i], yv[j, i], land)
    return mask


def sort_individuals(df, K):
    """
    Sort individuals by modal ancestry component and its intensity
    :param df: pd.DataFrame, data frame containing ancestry proportions
    :param K: int, K used to run admixture
    :return: pd.DataFrame, sorted df
    """
    # get mean ancestry per component for each population
    mean_ancestry_pops = df.groupby('population').mean()
    # identify modal component for each population, i.e., the component with the highest ancestry proportion
    modal_clusters_pops = mean_ancestry_pops.idxmax(axis=1).sort_values()
    individual_orders = []
    # iterate over all k
    for i in range(K):
        # get all populations for which the current component is modal
        pops = modal_clusters_pops[modal_clusters_pops == i].index.values.tolist()
        # sort the populations by their mean ancestry proportions of the modal component
        pops_sorted = mean_ancestry_pops.loc[pops, i].sort_values(ascending=False).index.values
        inds = []
        # sort the individuals within a population by ancestry proportion
        for pop in pops_sorted:
            inds.extend(df.loc[df.population == pop, i].sort_values(ascending=False).index.values.tolist())
        individual_orders.extend(inds)

    df = df.loc[individual_orders, :]
    return df


def get_best_k_and_plot_best_k(qfiles, Kvals, fam, log_dir, colors, prefix):
    """
    Find best K across different ADMIXTURE runs using different K and plot admixture plot of best K separately
    :param qfiles: list, list of paths to aligned .Q files
    :param Kvals: list-like, K values that were used
    :param fam: df, pd.DataFrame of plink .fam file
    :param log_dir: str, path to directory with log files of ADMIXTURE runs
    :param colors: list, colors used for plotting
    :param prefix: str, prefix of the output figure; "_admixture_plot_best_K.pdf" will be appended
    :return: array-like, cross-validation error for each K
    """
    min_k = min(Kvals)
    # get CV error of ADMIXTURE runs
    cv_error = np.zeros(len(Kvals))
    logfiles = glob.glob(log_dir + '*.log')
    # read CV error
    for log in logfiles:
        k = int(log.split('.')[-2])
        with open(log, 'r') as fp:
            for line in fp:
                if line.startswith("CV error"):
                    cv_error[k - min_k] = float(line.split(':')[1])
                    break
    # get best k
    best_k = Kvals[np.argmin(cv_error)]
    df = pd.read_csv(qfiles[np.argmin(cv_error)], sep=' ', header=None, index_col=None)
    df = df.iloc[:, 6:]
    df.columns = [i for i in range(best_k)]
    df['population'] = fam.fid.values
    # sort individuals
    df_sorted = sort_individuals(df, best_k)
    # plot ADMIXTURE proportions for the best K
    plot_best_k(df_sorted, best_k, min(cv_error), colors, prefix)
    return cv_error


def sort_individuals_specified_K(qfile, K, fam):
    """
    Sort individuals based on estimated ancestry proportions given a K
    :param qfile: str, path to aligned Q file
    :param K: int, K value
    :param fam: df, pd.DataFrame of plink .fam file
    :return: array-like,  order of individuals

    """
    df = pd.read_csv(qfile, sep=' ', header=None, index_col=None)
    df = df.iloc[:, 6:]
    df.columns = [i for i in range(K)]
    df['population'] = fam.fid.values
    # sort individuals
    df_sorted = sort_individuals(df, K)
    # get order of individuals
    individual_order = df_sorted.index.values
    return individual_order


def plot_kriging(qfile, K, path_to_coords, vmin, fam, colors, prefix):
    """
    Interpolate admixture proportion in space using Kriging method
    :param qfile: str, path to qfile corresponding to K
    :param K: int, K used for admixture run
    :param path_to_coords: str, path to file with population coordinates
    :param vmin: float, minimum admixture proportion required for Kriging method
    :param fam: df, pd.DataFrame of plink .fam file
    :param colors: list, colors used for plotting
    :param prefix: str, prefix for figure
    """
    # The parameters are tailored to our specific case and may not give meaningful results in other cases
    if K > 4:
        raise AttributeError("Only K<= 4 is supported")
    land_shp_fname = shpreader.natural_earth(resolution='50m',
                                             category='physical', name='land')

    land_geom = unary_union(list(shpreader.Reader(land_shp_fname).geometries()))
    # create land object
    land = prep(land_geom)

    # read data
    coords = pd.read_csv(path_to_coords, sep='\t', header=None, names=['fid', 'lat', 'long'], usecols=[0, 2, 3])
    coords.drop_duplicates('fid', inplace=True)
    df = pd.read_csv(qfile, sep=' ', header=None, index_col=None)
    df = df.iloc[:, 6:]
    df.columns = [i for i in range(K)]
    df['population'] = fam.fid.values
    df = df.join(coords.set_index('fid'), on='population')

    # longitude and latitude coords of interest
    long_coords = np.arange(-25, 60, 0.15, dtype=float)
    lat_coords = np.arange(-40, 50, 0.15, dtype=float)
    # color maps
    cmaps = [
        sns.color_palette(f"blend:white,{colors[0]}", as_cmap=True),
        sns.color_palette(f"blend:white,{colors[1]}", as_cmap=True),
        sns.color_palette(f"blend:white,{colors[2]}", as_cmap=True),
        sns.color_palette(f"blend:white,{colors[3]}", as_cmap=True),
    ]

    projection = ccrs.PlateCarree()

    fig = plt.figure(figsize=(9, 16))
    ax = fig.add_subplot(1, 1, 1, projection=projection)
    ax.set_extent([-25, 60, -40, 50], crs=ccrs.PlateCarree())

    # run kriging for each K
    perms = list(set(permutations([i for i in range(K - 1)])))
    perms = [perm + (K-1,) for perm in perms]
    for perm in perms:
        for i in perm:
            cmap = cmaps[i]
            proportions = df.loc[:, i].values
            ok = OrdinaryKriging(df.long.values,
                                 df.lat.values,
                                 proportions, variogram_model='linear',
                                 coordinates_type='geographic', nlags=2)
            z, ss = ok.execute('grid', long_coords, lat_coords)
            # get mask
            mask = build_mask(long_coords, lat_coords, land)
            # normalize data
            data = (z.data - z.data.min()) / (z.data.max() - z.data.min())
            # update mask
            mask[data < vmin] = False
            masked_data = ma.masked_array(data, ~mask)
            # plot
            ax.imshow(np.flip(masked_data, 0), extent=[-25, 60, -40, 50], cmap=cmap,
                      interpolation='none', vmax=1, vmin=vmin, alpha=1 / len(perms))
    # formatting
    ax.add_feature(cfeature.COASTLINE, color='black')
    ax.add_feature(cfeature.BORDERS, linestyle=':', color='black')
    # plot populations
    ax.scatter(coords.long, coords.lat, color='black', s=8)
    ax.axis('off')
    fig.savefig(f"{prefix}_admixture_kriging_K{K}.pdf", bbox_inches='tight', dpi=600)


def plot_best_k(df, K, cv, colors, prefix):
    """
    Create horizontal bar plot of ADMIXTURE results
    :param df: pd.DataFrame, sorted DataFrame with ancestry proportions
    :param K: int, K used for admixture run
    :param colors: list, list of colors to use
    :param cv: float, CV error of ADMIXTURE run
    :param prefix: str, prefix for figure
    """
    fig, ax = plt.subplots(figsize=(2, 8))
    populations = df.population.values
    y_coords = [0]
    y_ticks = []
    y_labels = []
    prev_pop = populations[0]
    pop_coord_start = 0
    cumulative_padding = 0
    padding = 0.15
    for i, pop in enumerate(populations[1:], 1):
        if prev_pop != pop:
            # get ytick positions and labels
            y_ticks.append((pop_coord_start + i + cumulative_padding - 1) / 2)
            y_labels.append(prev_pop)
            prev_pop = pop
            # add white space between populations
            cumulative_padding += padding
            pop_coord_start = i + cumulative_padding
        # determine y coords --> add white space between populations
        y_coords.append(i + cumulative_padding)
    y_ticks.append((pop_coord_start + i + cumulative_padding) / 2)
    y_labels.append(prev_pop)
    prev_k = []
    # do plotting
    for i in range(K):
        if i > 0:
            ax.barh(y_coords, df.loc[:, i], left=df.loc[:, prev_k].sum(axis=1), height=1, color=colors[i])
        else:
            ax.barh(y_coords, df.loc[:, i], height=1, color=colors[i])
        prev_k.append(i)
    # formatting
    ax.set_yticks(y_ticks)
    ax.set_yticklabels([x.replace('Sabue', 'Chabu').replace('Xun', '!Xun').replace('_ERR_', "_Errachidia_")\
                        .replace('Juhoansi', "Ju|'Hoan").replace('SEBantu', 'BantuSAfrica')\
                        .replace('SWBantu', 'Herero').replace('Haihom', 'Hai||om').replace("_TIZ_", "_Tiznit_")\
                        .replace('_Sen_', '_Sened_').replace('_Chen_', '_Chenini_').replace('_', " ")\
                        .replace("BER", "Berber")
                        for x in y_labels], fontsize=5)
    ax.set_xticks([])
    ax.set_xticklabels([])
    ax.invert_yaxis()
    ax.set_xlabel(f"K={K}\nCV={round(cv, 4)}", fontsize=8)
    ax.set_ylim([len(populations) + cumulative_padding - 0.5, -0.5])
    ax.set_xlim([0, 1])
    fig.savefig(f"{prefix}_admixture_plot_best_K.pdf", bbox_inches='tight', dpi=600)


def plot_all_k(qfiles, Kvals, cv_error, fam, individual_order, colors, prefix):
    """
    Plot admixture proportions across all K
    :param qfiles: list, list of paths to aligned .Q files
    :param Kvals: list-like, K values that were used
    :param cv_error: array-like, cross-validation errors of individual runs
    :param fam: pd.DataFrame, df of plink .fam file
    :param individual_order: list, order of individuals in which to plot them to guarantee consistency
    :param colors: list, colors used for plotting
    :param prefix: str, prefix of the output figure; "_admixture_plots.pdf" will be appended
    """
    fig, ax = plt.subplots(1, len(qfiles), figsize=(8, 11))
    plt.subplots_adjust(wspace=0.15)
    min_k = min(Kvals)
    for qfile, K, cv in zip(qfiles, Kvals, cv_error):
        df = pd.read_csv(qfile, sep=' ', header=None, index_col=None)
        df = df.iloc[:, 6:]
        df.columns = [i for i in range(K)]
        df['population'] = fam.fid.values
        df_sorted = df.loc[individual_order]
        plot_admixture_proportions(df_sorted, K, ax[K - min_k], cv, colors, min_k)
    fig.savefig(prefix + "_admixture_plots.pdf", bbox_inches='tight', dpi=600)
    plt.close()


def plot_admixture_proportions(df, K, ax, cv, colors, min_k):
    """
    Helper to plot admixture proportions for a specific value of K
    :param df: pd.DataFrame, sorted DataFrame with ancestry proportions
    :param K: int, K used for admixture run
    :param ax: ax object to plot on
    :param cv: float, CV error of ADMIXTURE run
    :param colors: list, list of colors to use
    :param min_k: int, minimum k that was used in any run
    :return: ax
    """
    populations = df.population.values
    y_coords = [0]
    y_ticks = []
    y_labels = []
    prev_pop = populations[0]
    pop_coord_start = 0
    cumulative_padding = 0
    padding = 0.15
    for i, pop in enumerate(populations[1:], 1):
        if prev_pop != pop:
            # get ytick positions and labels
            y_ticks.append((pop_coord_start + i + cumulative_padding - 1) / 2)
            y_labels.append(prev_pop)
            prev_pop = pop
            # add white space between populations
            cumulative_padding += padding
            pop_coord_start = i + cumulative_padding
        # determine y coords --> add white space between populations
        y_coords.append(i + cumulative_padding)
    y_ticks.append((pop_coord_start + i + cumulative_padding) / 2)
    y_labels.append(prev_pop)
    prev_k = []
    # do plotting
    for i in range(K):
        if i > 0:
            ax.barh(y_coords, df.loc[:, i], left=df.loc[:, prev_k].sum(axis=1), height=1, color=colors[i])
        else:
            ax.barh(y_coords, df.loc[:, i], height=1, color=colors[i])
        prev_k.append(i)
    # formatting
    if K > min_k:
        ax.set_yticks([])
    else:
        ax.set_yticks(y_ticks)
        ax.set_yticklabels([x.replace('Sabue', 'Chabu').replace('Xun', '!Xun').replace('_ERR_', "_Errachidia_")\
                           .replace('Juhoansi', "Ju|'Hoan").replace('SEBantu', 'BantuSAfrica')\
                           .replace('SWBantu', 'Herero').replace('Haihom', 'Hai||om').replace("_TIZ_", "_Tiznit_")\
                           .replace('_Sen_', '_Sened_').replace('_Chen_', '_Chenini_').replace('_', " ")\
                           .replace("BER", "Berber")
                            for x in y_labels], fontsize=5)
    ax.set_xticks([])
    ax.set_xticklabels([])
    ax.invert_yaxis()
    ax.set_xlabel(f"K={K}\nCV={round(cv, 4)}", fontsize=5)
    ax.set_ylim([len(populations) + cumulative_padding - 0.5, -0.5])
    ax.set_xlim([0, 1])
    return ax


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input_dir',
                        help='Input directory of "converted" files, i.e., .Q files aligned with CLUMPAK')
    parser.add_argument('-l', '--logfile_dir',
                        help='Directory containing log files from admixture runs to extract CV error.')
    parser.add_argument('-f', '--fam', help='Plink fam file to extract individual and family IDs')
    parser.add_argument('-c', '--coords', help='File with individual geographic coordinates')
    parser.add_argument('--vmin', type=float,
                        help='Only admixture proportions greater vmin are considered for Kriging method. default=0.5',
                        default=0.5)
    parser.add_argument('-k', '--K_kriging', type=int, help='K to use for plotting Kriging')
    parser.add_argument('-o', '--output_prefix', help='Output prefix')
    args = parser.parse_args()
    input_dir = args.input_dir
    log_dir = args.logfile_dir
    if not input_dir.endswith('/'):
        input_dir += '/'
    if not log_dir.endswith('/'):
        log_dir += '/'
    # get input q files
    aligned_q_files = glob.glob(input_dir + "*.converted")
    # read fam file
    fam = pd.read_csv(args.fam, sep='\t', names=['fid', 'iid'], usecols=[0, 1])
    # get k values
    Kvals = [int(f.split('.')[-3]) for f in aligned_q_files]
    # sort based on K
    aligned_q_files = np.array(aligned_q_files)[np.argsort(Kvals)]
    Kvals = np.sort(Kvals)
    max_k = max(Kvals)

    # initialize colors
    colors = ["purple", "orange", "blue", "green"]
    # pad colors
    additional_colors = []
    while max_k > len(colors) + len(additional_colors):
        additional_colors.append((np.random.random(), np.random.random(), np.random.random()))
        additional_colors = list(set(additional_colors))
    colors.extend(additional_colors)

    cv_error = get_best_k_and_plot_best_k(aligned_q_files, Kvals, fam, log_dir, colors, args.output_prefix)
    individual_order = sort_individuals_specified_K(aligned_q_files[np.where(Kvals == args.K_kriging)[0][0]],
                                                    args.K_kriging, fam)
    plot_all_k(aligned_q_files, Kvals, cv_error, fam, individual_order, colors, args.output_prefix)
    plot_kriging(aligned_q_files[np.where(Kvals == args.K_kriging)[0][0]], args.K_kriging, args.coords, args.vmin, fam,
                 colors, args.output_prefix)


if __name__ == '__main__':
    main(sys.argv[1:])
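A hypothetical stand-alone invocation matching the corresponding Snakefile rule (paths and prefix are placeholders; -k 4 corresponds to the best K reported above):

python scripts/visualize_admixture.py -i results/clumpak/aligned.files/ -f data/merged.fam -l results/admixture/ -c data/population_geo_coords.tab -k 4 -o results/merged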
shell:
    "wget -q {params.url}; unzip {params.zip}; rm {params.zip}; mv {output} {output}_tmp; "
    "unzip -d {output}_tmp {output}_tmp/26_03_2015_CLUMPAK.zip; mv {output}_tmp/26_03_2015_CLUMPAK/CLUMPAK .;"
    "rm -r {output}_tmp; chmod +x {output}/distruct/*; chmod +x {output}/CLUMPP/*; chmod +x {output}/mcl/*"
shell:
    "wget -q -P {params.path} {params.url}"
SnakeMake From line 64 of main/Snakefile
shell:
    "wget -q -O- {params.url} | awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print $3, $5, $6, $7, $8, $12, $13, $2}}' | "
    "grep -E \"{params.pops_of_interest}\" > {output}"
SnakeMake From line 74 of main/Snakefile
shell:
    "mkdir -p  {params.dir}; tar xvf {input} -C {params.dir}"
SnakeMake From line 85 of main/Snakefile
shell:
    "{params.bcftools} merge -0 -r {params.chrom} -Oz -o {output} --threads {threads} {input}"
SnakeMake From line 103 of main/Snakefile
shell:
    "plink2 --vcf {input} --snps-only --max-alleles 2 --make-bed --set-all-var-ids @:# "
    "--out {params.base} --threads {threads}"
shell:
    "cut -f2 {input.fam} | tr \"\n\" \"|\" | xargs -I {{}} grep -Ew {{}} {input.metadata} | "
    "awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print 0, $NF}}' > {output};"
SnakeMake From line 126 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input.samples} --make-bed --out {params.output_base} --threads {threads}"
shell:
    "cut -f2 {input.old} | tr \"\n\" \"|\" | xargs -I {{}} grep -E {{}} {input.metadata} | "
    "awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print $3, $2}}' | paste {input.old} - > {output}"
SnakeMake From line 152 of main/Snakefile
shell:
    'plink2 --bfile {params.input_base} --update-ids {input.ids} --make-bed --out {params.output_base} '
    '--threads {threads}'
shell:
    "cut -f1 {input.fam} | sort | uniq | xargs -I {{}} bash -c \"grep {{}} {input.fam} | cut -f1,2 | "
    "shuf -n{params.n_samples}\" bash > {output}"
SnakeMake From line 181 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input[0]} --make-bed --out {params.output_base} --threads {threads}"
shell:
    "cut -f1,2 {input.coords} > {output.to_keep}"
SnakeMake From line 206 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input.keep} --make-bed --out {params.output_base} --threads {threads}"
shell:
    "cut -f2 {input.fam} | xargs -I {{}} grep {{}} {input.meta} | "
    "awk -F \"\\t\" 'BEGIN{{OFS=\"\\t\"}}{{print $3, $2}}' | sed -e 's/\///g' | "
    "paste <(cut -f1,2 {input.fam} | sed 's/\\s/\\t/g') - > {output}"
SnakeMake From line 230 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --update-ids {input.ids} --make-bed --threads {threads} "
    "--out {params.output_base}"
shell:
    "cat {input.samples} | sed -e 's/\///g' -e 's/|/|\\\/' | "
    "cut -f3 | sort | uniq | xargs -I {{}} bash -c \"grep -w \\\"{{}}\\\" <( cat {input.samples} | sed -e 's/\///g') | "
    "shuf -n {params.n_samples} | cut -f3,4 >> {output}\" sh"
SnakeMake From line 259 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input.samples} --make-bed --out {params.output_base} "
    "--threads {threads}"
shell:
    "wget -q -P {params.data} {params.url}"
SnakeMake From line 286 of main/Snakefile
shell:
    "tar xvzf {input} -C {params.data_path}"
SnakeMake From line 296 of main/Snakefile
shell:
    "cut -d ' ' -f1 {input.fam} | sort | uniq | xargs -I {{}} bash -c \"grep {{}} {input.fam} | cut -d ' ' -f1,2 | "
    "shuf -n {params.n_samples}\" bash | grep -v -E \"{params.exclude}\" > {output}"
SnakeMake From line 307 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input.to_keep} --make-bed --set-all-var-ids @:# "
    "--out {params.output_base} --threads {threads}"
shell:
    "cut -f2 {input} | sort | uniq -d > {output}"
SnakeMake From line 332 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --exclude {input.ids} --make-bed --out {params.outputbase} "
    "--threads {threads}"
shell:
    "wget -q -P {params.data_path} {params.url_fam}; mv {params.fam_file} {output.fam}; "
    "wget -q -P {params.data_path} {params.url_tped}; "
    "tar -xzvf {params.tped_file}.tar.gz; mv {params.tped_file} {output.tped}; "
    "rm {params.tped_file}.tar.gz; "
SnakeMake From line 361 of main/Snakefile
shell:
    "plink --tfile {params.base} --recode --make-bed --out {params.base} --threads {threads}"
shell:
    "cut -d ' ' -f1 {input.fam} | sort | uniq | xargs -I {{}} bash -c \"grep {{}} {input.fam} | cut -d ' ' -f1,2 | "
    "shuf -n {params.n_samples}\" bash | grep -v -E \"{params.exclude}\" > {output}"
SnakeMake From line 388 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input.to_keep} --make-bed --set-all-var-ids @:# "
    "--snps-only --max-alleles 2 --out {params.output_base} --threads {threads}"
shell:
    "wget -q {params.url_bed}; mv {params.name_bed} {params.data_path}arauna_et_al_2017.bed; "
    "wget -q {params.url_bim}; mv {params.name_bim} {params.data_path}arauna_et_al_2017.bim; "
    "wget -q {params.url_fam}; mv {params.name_fam} {params.data_path}arauna_et_al_2017.fam;"
SnakeMake From line 419 of main/Snakefile
shell:
    "cut -d ' ' -f1 {input.coords} | tail -n+2 | xargs -I {{}} bash -c \"grep -w {{}} {input.fam} | "
    "shuf -n{params.n_samples} | cut -d ' ' -f1,2\" > {output}"
SnakeMake From line 433 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --keep {input.to_keep} --make-bed --set-all-var-ids @:# "
    "--out {params.output_base} --threads {threads}"
shell:
    "cut -f2 {input} | sort > {output}"
SnakeMake From line 478 of main/Snakefile
shell:
    "comm -12 {input} > {output}"
SnakeMake From line 493 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --extract {input.snps} --make-bed --out {params.output_base}"
shell:
    "if [[ -f {params.output_base}_trial1-merge.missnp ]]; " # second trial --> flip SNPs once
    "then "
        "plink --bfile {params.base_b} --flip {params.output_base}_trial1-merge.missnp --make-bed --out "
        "{params.base_b}_tmp --threads {threads}; " # flip once
        "rm {params.output_base}_trial1-merge.missnp; "
        "plink --bfile {params.base_a} --bmerge {params.base_b}_tmp --make-bed --geno {params.geno} "
        "--maf {params.maf} --out {params.output_base}_trial2 --threads {threads}; "
        "mv {params.output_base}_trial2.bed {params.output_base}.bed; "
        "mv {params.output_base}_trial2.bim {params.output_base}.bim; "
        "mv {params.output_base}_trial2.fam {params.output_base}.fam; "
    "elif [[ -f {params.output_base}_trial2-merge.missnp ]]; " # last trial --> excluding SNPS that caused problems
    "then "
        "plink --bfile {params.base_a} --exclude {params.output_base}_trial2-merge.missnp --make-bed "
        "--out {params.base_a}_tmp --threads {threads}; "
        "plink --bfile {params.base_b}_tmp --exclude {params.output_base}_trial2-merge.missnp --make-bed "
        "--out {params.base_b}_tmp1 --threads {threads}; "
        "plink --bfile {params.base_a}_tmp --bmerge {params.base_b}_tmp1 --make-bed --geno {params.geno} "
        "--maf {params.maf} --out {params.output_base} --threads {threads}; "
        "rm {params.base_a}_tmp.*; "
        "rm {params.base_b}_tmp.*; "
        "rm {params.base_b}_tmp1.*; "
        "rm {params.output_base}_trial2-merge.missnp; "
    "else " # first pass
        "plink --bfile {params.base_a} --bmerge {params.base_b} --make-bed --geno {params.geno} "
        "--maf {params.maf} --out {params.output_base}_trial1 --threads {threads}; "
        "mv {params.output_base}_trial1.bed {params.output_base}.bed; "
        "mv {params.output_base}_trial1.bim {params.output_base}.bim; "
        "mv {params.output_base}_trial1.fam {params.output_base}.fam; "
    "fi"
shell:
    "awk -F '\\t' '{{if ($5 == \"A\" && $6 != \"T\" || $5 != \"A\") print $0}}' {input} | "
    "awk -F '\\t' '{{if ($5 == \"T\" && $6 != \"A\" || $5 != \"T\") print $0}}' | "
    "awk -F '\\t' '{{if ($5 == \"C\" && $6 != \"G\" || $5 != \"C\") print $0}}' | "
    "awk -F '\\t' '{{if ($5 == \"G\" && $6 != \"C\" || $5 != \"G\") print $2}}' | sort > {output}"
SnakeMake From line 575 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --extract {input[0]} --make-bed --out {params.output_base} "
    "--threads {threads}"
shell:
    "awk  -F '\\t' 'BEGIN{{OFS=\"\\t\"}}{{if ($5 == \".\" || $6 == \".\") print $2}}' {input} > {output}"
SnakeMake From line 602 of main/Snakefile
shell:
    "grep -v -w -f {input.exclude} {input.bim} | cut -f2 | sort > {output}"
SnakeMake From line 611 of main/Snakefile
shell:
    "comm -12 {input} > {output}"
SnakeMake From line 620 of main/Snakefile
shell:
    'cut -f2 {input} | sort > {output}'
SnakeMake From line 668 of main/Snakefile
shell:
    'cut -f2 {input} | sort > {output}'
SnakeMake From line 676 of main/Snakefile
shell:
    "comm -12 {input} > {output}"
SnakeMake From line 685 of main/Snakefile
shell:
    "cut -f2 {input} | sort > {output}"
SnakeMake From line 731 of main/Snakefile
shell:
    "cut -f2 {input} | sort > {output}"
SnakeMake From line 739 of main/Snakefile
shell:
    "comm -12 {input} > {output}"
SnakeMake From line 748 of main/Snakefile
shell:
    "cut -f2 {input} | sort > {output}"
SnakeMake From line 799 of main/Snakefile
shell:
    "cut -f2 {input} | sort > {output}"
SnakeMake From line 807 of main/Snakefile
shell:
    "comm -12 {input} > {output}"
SnakeMake From line 816 of main/Snakefile
shell:
    "plink2 --bfile {params.input_base} --indep-pairwise 50 1 {params.max_r2} --out {params.input_base} "
    "--threads {threads}"
shell:
    "plink2 --bfile {params.input_base} --extract {input.extract} --make-bed --out {params.output_base} "
    "--threads {threads}"
shell:
    "plink2 --bfile {params.base} --pca 20 --out {params.base} --threads {threads}"
shell:
    "scripts/plot_pca.py --eigenvec {input.eigenvec} --plot_dir {params.output_dir} --coords {input.coords}"
SnakeMake From line 917 of main/Snakefile
shell:
    "cut -f2 {input.sgdp_fam} | xargs -I {{}} bash -c \"grep {{}} {input.meta_sgdp} | cut -f6,7\" | "
    "paste <(cut -f1,2 {input.sgdp_fam}) - > {output}; "
    "cut -f2 {input.crawford_and_scheinfeldt_fam} | "
    "xargs -I {{}} bash -c \"grep {{}} {input.meta_crawford_and_scheinfeldt}\" | cut -f4,5 | "
    "paste <(cut -f1,2 {input.crawford_and_scheinfeldt_fam}) - >> {output}; "
    "cut -f1 {input.hollfelder_fam} | xargs -I {{}} bash -c \"grep {{}} {input.meta_hollfelder}\" | cut -f6,7 | "
    "paste <(cut -f1,2 {input.hollfelder_fam}) - >> {output}; "
    "cut -f1 {input.arauna_fam} | xargs -I {{}} bash -c \"grep -w {{}} {input.meta_arauna}\" | cut -d ' ' -f2,3 | "
    "paste <(cut -f1,2 {input.arauna_fam}) - | sed -e 's/\\s/\\t/g' >> {output}; "
    "cut -f1 {input.fortes_lima_fam} | xargs -I {{}} bash -c \"grep -w {{}} {input.meta_fortes_lima}\" | "
    "cut -d ' ' -f2,3 | paste <(cut -f1,2 {input.fortes_lima_fam}) - | sed -e 's/\\s/\\t/g' >> {output}; "
    "cut -f1 {input.schlebusch_fam} | xargs -I {{}} bash -c \"grep -w {{}} {input.meta_schlebusch}\" | "
    "cut -d ' ' -f2,3 | paste <(cut -f1,2 {input.schlebusch_fam}) - | sed -e 's/\\s/\\t/g' >> {output}; "
SnakeMake From line 937 of main/Snakefile
shell:
    "admixture --cv {input} {wildcards.K} -j{threads} > {params.prefix}.{wildcards.K}.log; "
    "mkdir -p {params.results}; "
    "mv {params.prefix}.{wildcards.K}.* {params.results}"
SnakeMake From line 965 of main/Snakefile
shell:
    "cd {params.results_path}; zip {params.zip} {params.files}"
SnakeMake From line 979 of main/Snakefile
shell:
    "cut -f1 {input} > {output}"
SnakeMake From line 987 of main/Snakefile
shell:
    "cd {input.clumpak_dir}; cpanm List::Permutor; "
    "perl distructForManyKs.pl --id 1 --dir ../{output} --file ../{input.qfiles} "
    "--inputtype admixture --indtopop ../{input.pops}"
SnakeMake From line 999 of main/Snakefile
shell:
    "scripts/visualize_admixture.py -i {input.clumpak}/aligned.files/ -f {input.fam} -l {params.results_path} "
    "-c {input.coords} -k {params.K_kriging} -o {params.output_prefix}"
SnakeMake From line 1019 of main/Snakefile
shell:
    "scripts/run_feems.py -d {params.data_path} -p {params.prefix} -c {input.pop_coords} -o {params.output_path}"
SnakeMake From line 1036 of main/Snakefile
Repository: https://github.com/LachanceLab/AfricanPopulationStructure