Data processing and analytics code for project "Biodiversity and ecosystem services trade-offs"


Trade-offs between biodiversity and ecosystem services in Europe

Version: 0.2.0

Introduction

Installation

The project relies on both Python and R scripts for pre- and post-processing the data as well as for running some of the analyses. While the project might run on Windows machines, it has never been tested on one. Your safest bet is to run everything on an Ubuntu 14.04 (Trusty) machine. Everything apart from Zonation should run on other Linux distributions as well, but Zonation currently works nicely only on Ubuntu 14.04 (though it is possible to compile it on 16.04 as well).

1. Getting this project

You first need to have git installed on the system you want to run this project on. On Ubuntu, install git with:

sudo apt-get install git

Next, get everything in this project using git:

git clone https://github.com/VUEG/bdes_to.git

2. Installing necessary dependencies using conda

This project uses the conda package, dependency and environment management system for setting everything up. It comes with a simple installer script, bootsrap_conda.sh, that will install the right version of the conda command line program for you. To run it, type:

# Get into the project directory
cd bdes_to
# Install conda
./bootsrap_conda.sh

After the installation is finished and assuming everything went well, you can create a new environment with all the necessary (Python and R) packages installed by doing the following (still in the same directory):

conda env create -n bdes_to

This command will create a new virtual environment called bdes_to and install all the dependencies listed in environment.yml into it. Activate the environment with source activate bdes_to (or conda activate bdes_to on newer conda versions) before running the workflow.

3. Installing Zonation

Zonation is not yet available through conda, so you will have to install it separately and system-wide. Follow Zonation's own installation instructions.

Running the processing and analysis workflow

This project uses the snakemake workflow management system to run the different stages in sequence. Snakemake has already been installed as part of the conda environment created in step 2. You can list all the individual snakemake rules (i.e. stages) by typing:

snakemake -l

or run the whole sequence (not recommended, as this will also run all the Zonation analyses) with:

snakemake
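
Each rule in the Snakefile ties together input files, output files and the code that produces them. As a rough illustration of what such a stage looks like (the rule below is hypothetical and only shows the structure; the project's real rules are the ones printed by snakemake -l):

rule example_rule:
    input:
        "data/external/some_dataset.tif"
    output:
        "data/processed/some_dataset.tif"
    log:
        "logs/example_rule.log"
    run:
        # Any Python pre- or post-processing can go here; shell() runs
        # external commands such as the GDAL utilities.
        shell("cp {input} {output}")

A single rule (stage) can be run on its own by naming it, e.g. snakemake example_rule, and snakemake -n shows what would be executed without actually running anything.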

Project organization

├── LICENSE
├── environment.yml    <- Conda environment file
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   └── processed      <- The final, canonical data sets for modeling.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results-oriented visualizations
│       └── visualize.py
│
└── tests              <- Test scripts

License

See the LICENSE file in the repository.

Contributors

Code Snippets
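
The snippets below are excerpts from the project's master/Snakefile; the "From line ..." notes refer to line numbers in that file.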

run:
    llogger = utils.get_local_logger("pprocess_nuts0", log[0])
    # Read in the bounds as used in harmonize_data rule
    bleft = PROJECT_EXTENT["left"] + OFFSET[0]
    bbottom = PROJECT_EXTENT["bottom"] + OFFSET[1]
    bright = PROJECT_EXTENT["right"] + OFFSET[2]
    btop = PROJECT_EXTENT["top"] + OFFSET[3]
    bounds = "{0} {1} {2} {3}".format(bleft, bbottom, bright, btop)
    # Reproject to EPSG:3035 from EPSG:4258
    input_shp = utils.pick_from_list(input.shp, ".shp")
    reprojected_shp = utils.pick_from_list(output.reprojected, ".shp")
    cmd_str = 'ogr2ogr {0} -t_srs "EPSG:{1}" {2}'.format(reprojected_shp, PROJECT_CRS, input_shp)
    shell(cmd_str)
    llogger.info("Reprojected NUTS level 0 data from EPSG:4258 to EPSG:3035")
    llogger.debug(cmd_str)

    # NUTS 0 data has a "NUTS_ID" field, but it's a character field.
    # Convert it to integer for rasterization.
    enhanced_shp = utils.pick_from_list(output.enhanced, ".shp")
    with fiona.drivers():
        with fiona.open(reprojected_shp) as source:
            meta = source.meta
            meta['schema']['geometry'] = 'Polygon'
            # Insert new fields
            meta['schema']['properties']['ID'] = 'int'
            meta['schema']['properties']['mask'] = 'int'

            ID = 1
            with fiona.open(enhanced_shp, 'w', **meta) as sink:
                # Loop over features
                for f in source:
                    f['properties']['ID'] = ID
                    ID += 1
                    # Create a mask ID (same for each feature) that can
                    # later be used in creating a mask.
                    f['properties']['mask'] = 1
                    # Write the record out.
                    sink.write(f)

    # Clip shapefile using ogr2ogr, syntax:
    # ogr2ogr output.shp input.shp -clipsrc <left> <bottom> <right> <top>
    processed_shp = utils.pick_from_list(output.processed, ".shp")
    # Do 2 things at the same time:
    #  1. Select a subset of countries (defined by PROJECT_COUNTRIES)
    #  2. Clip output to an extent (given by bounds)
    # Build the -where clause for ogr2ogr
    where_clause = "NUTS_ID IN ({})".format(", ".join(["'" + item + "'" for item in PROJECT_COUNTRIES]))
    shell('ogr2ogr -where "{where_clause}" {processed_shp} {enhanced_shp} -clipsrc {bounds}')
    llogger.debug("Clipped NUTS data to analysis bounds: {}".format(bounds))
    llogger.debug("Selected only a subset of eurostat countries:")
    llogger.debug(" " + ", ".join(PROJECT_COUNTRIES))
    llogger.debug("Resulting file: {}".format(processed_shp))
SnakeMake From line 118 of master/Snakefile
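
A note on the helpers used throughout these snippets: functions such as utils.get_local_logger(), utils.pick_from_list() and utils.process_stdout() come from the project's own utility module under src/ and are not shown on this page. Judging from how it is called, pick_from_list() appears to return the first element of a Snakemake input/output list that ends with a given suffix; a minimal sketch of that assumed behaviour (not the project's actual implementation):

def pick_from_list(items, suffix):
    # Illustrative sketch only: return the first item ending with the given
    # suffix (e.g. ".shp"), or None if no such item exists.
    for item in items:
        if str(item).endswith(suffix):
            return item
    return None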
run:
    llogger = utils.get_local_logger("pprocess_nuts2", log[0])
    # Read in the bounds as used in harmonize_data rule
    bleft = PROJECT_EXTENT["left"] + OFFSET[0]
    bbottom = PROJECT_EXTENT["bottom"] + OFFSET[1]
    bright = PROJECT_EXTENT["right"] + OFFSET[2]
    btop = PROJECT_EXTENT["top"] + OFFSET[3]
    bounds = "{0} {1} {2} {3}".format(bleft, bbottom, bright, btop)
    # Reproject to EPSG:3035 from EPSG:4258 and clip
    input_shp = utils.pick_from_list(input.shp, ".shp")
    reprojected_shp = utils.pick_from_list(output.reprojected, ".shp")
    cmd_str = 'ogr2ogr {0} -t_srs "EPSG:{1}" {2}'.format(reprojected_shp, PROJECT_CRS, input_shp)
    shell(cmd_str)
    llogger.debug("Reprojected NUTS level 2 data from EPSG:4258 to EPSG:3035")
    llogger.debug(cmd_str)

    # Clip shapefile using ogr2ogr, syntax:
    # ogr2ogr output.shp input.shp -clipsrc <left> <bottom> <right> <top>
    clipped_shp = utils.pick_from_list(output.clipped, ".shp")
    # Clip output to an extent (given by bounds)
    shell('ogr2ogr {clipped_shp} {reprojected_shp} -clipsrc {bounds}')

    # Two pre-processing steps need to be done:
    #  1. Tease apart country code from field NUTS_ID
    #  2. Create a running ID field that can be used as value in the
    #     rasterized version
    processed_shp = utils.pick_from_list(output.processed, ".shp")
    with fiona.drivers():
        with fiona.open(clipped_shp) as source:
            meta = source.meta
            meta['schema']['geometry'] = 'Polygon'
            # Insert new fields
            meta['schema']['properties']['ID'] = 'int'
            meta['schema']['properties']['country'] = 'str'

            ID = 1
            with fiona.open(processed_shp, 'w', **meta) as sink:
                # Loop over features
                for f in source:
                    # Check the country code part of NUTS_ID (first 2
                    # characters). NOTE: we're effectively doing filtering
                    # here.
                    country_code = f['properties']['NUTS_ID'][0:2]
                    if country_code in PROJECT_COUNTRIES:
                        f['properties']['ID'] = ID
                        ID += 1
                        f['properties']['country'] = country_code
                        # Write the record out.
                        sink.write(f)

    llogger.debug("Clipped NUTS level 2 data to analysis bounds: {}".format(bounds))
    llogger.debug("Selected only a subset of eurostat countries:")
    llogger.debug(" " + ", ".join(PROJECT_COUNTRIES))
    llogger.debug("Resulting file: {}".format(processed_shp))
SnakeMake From line 182 of master/Snakefile
run:
    llogger = utils.get_local_logger("rasterize_nuts0", log[0])
    input_shp = utils.pick_from_list(input, ".shp")
    layer_shp = os.path.basename(input_shp).replace(".shp", "")

    # Construct extent
    bounds = "{0} {1} {2} {3}".format(PROJECT_EXTENT["left"],
                                      PROJECT_EXTENT["bottom"],
                                      PROJECT_EXTENT["right"],
                                      PROJECT_EXTENT["top"])
    # 1) Rasterize land mask
    cmd_str = "gdal_rasterize -l {} ".format(layer_shp) + \
              "-a ID -tr 1000 1000 -te {} ".format(bounds) + \
              "-ot Int16 -a_nodata -32768 -co COMPRESS=DEFLATE " + \
              "{0} {1}".format(input_shp, output.land_mask)
    llogger.debug(cmd_str)
    for line in utils.process_stdout(shell(cmd_str, read=True)):
        llogger.debug(line)
    # 2) Rasterize common data mask
    cmd_str = "gdal_rasterize -l {} ".format(layer_shp) + \
              "-a mask -tr 1000 1000 -te {} ".format(bounds) + \
              "-ot Int8 -a_nodata -128 -co COMPRESS=DEFLATE " + \
              "{0} {1}".format(input_shp, output.data_mask)
    llogger.debug(cmd_str)
    for line in utils.process_stdout(shell(cmd_str, read=True)):
        llogger.debug(line)
SnakeMake From line 248 of master/Snakefile
run:
    llogger = utils.get_local_logger("rasterize_nuts2", log[0])
    input_shp = utils.pick_from_list(input, ".shp")
    layer_shp = os.path.basename(input_shp).replace(".shp", "")

    # Construct extent
    bounds = "{0} {1} {2} {3}".format(PROJECT_EXTENT["left"],
                                      PROJECT_EXTENT["bottom"],
                                      PROJECT_EXTENT["right"],
                                      PROJECT_EXTENT["top"])
    # Rasterize
    cmd_str = "gdal_rasterize -l {} ".format(layer_shp) + \
              "-a ID -tr 1000 1000 -te {} ".format(bounds) + \
              "-ot Int16 -a_nodata -32768 -co COMPRESS=DEFLATE " + \
              "{0} {1}".format(input_shp, output[0])
    llogger.debug(cmd_str)
    for line in utils.process_stdout(shell(cmd_str, read=True)):
        llogger.debug(line)
SnakeMake From line 284 of master/Snakefile
run:
    llogger = utils.get_local_logger("clip_udr_data", log[0])
    nsteps = len(input.external)
    for i, s_raster in enumerate(input.external):
        # Target raster
        clipped_raster = s_raster.replace("external", "processed/features")
        prefix = utils.get_iteration_prefix(i+1, nsteps)

        llogger.info("{0} Clipping dataset {1}".format(prefix, s_raster))
        llogger.debug("{0} Target dataset {1}".format(prefix, clipped_raster))
        # Clip data. NOTE: UDR species rasters do not have a SRS defined,
        # but they are in EPSG:3035
        cmd_str = 'gdalwarp -s_srs EPSG:3035 -t_srs EPSG:3035 -cutline {0} {1} {2} -co COMPRESS=DEFLATE'.format(input.clip_shp, s_raster, clipped_raster)
        for line in utils.process_stdout(shell(cmd_str, read=True), prefix=prefix):
            llogger.debug(line)
SnakeMake From line 313 of master/Snakefile
run:
    llogger = utils.get_local_logger("create_clc_mask", log[0])

    minx = PROJECT_EXTENT["left"]
    miny = PROJECT_EXTENT["bottom"]
    maxx = PROJECT_EXTENT["right"]
    maxy = PROJECT_EXTENT["top"]

    # STEP 1: Aggregate CLC data
    cmd_warp_str = "gdalwarp -te {} {} {} {} ".format(minx, miny, maxx, maxy) + \
                   "-multi -tr {} {} -r mode ".format(PROJECT_RES, PROJECT_RES) + \
                   "-s_srs 'EPSG:{}' -t_srs 'EPSG:{}' ".format(PROJECT_CRS, PROJECT_CRS) + \
                   "{} {}".format(input.clc_2012, output.agg_clc)
    llogger.info(" [1/3] Aggregating CLC data")
    for line in utils.process_stdout(shell(cmd_warp_str, read=True)):
        llogger.debug(line)

    # STEP 2: Remove aquatic CLC classes:
    # 39 Intertidal flats
    # 40 Water courses
    # 41 Water bodies
    # 42 Coastal lagoons
    # 43 Estuaries
    # 44 Sea and ocean
    #
    # Also binarize the selection
    clc_tmp_1 = os.path.basename(output.clc_mask).split(".")[0] + "_tmp1.tif"
    clc_tmp_1 = os.path.join(os.path.dirname(output.agg_clc), clc_tmp_1)
    cmd_calc_str = 'rio calc "(where (< (* (>= (read 1) 39) 255) 255) 1 0) " ' + \
                   '{} {} --co "compress=DEFLATE"'.format(output.agg_clc, clc_tmp_1)
    llogger.info(" [2/3] Subsetting and binarizing CLC data")
    for line in utils.process_stdout(shell(cmd_calc_str, read=True)):
        llogger.debug(line)

    # STEP 3: Overlay with NUTS0 mask
    cmd_ovr_str = 'rio calc "(where (== (+ (take a 1) (take b 1)) 2) 1 0) " ' + \
                  '--name "a={}" --name "b={}" '.format(clc_tmp_1, input.nuts0) + \
                  '{} --co "compress=DEFLATE"'.format(output.clc_mask)
    llogger.debug(cmd_ovr_str)
    llogger.info(" [3/3] Overlaying with NUTS0 mask")
    for line in utils.process_stdout(shell(cmd_ovr_str, read=True)):
        llogger.debug(line)

    # Remove temporary files
    os.remove(clc_tmp_1)
SnakeMake From line 340 of master/Snakefile
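
For readers unfamiliar with the Lisp-like rio calc expressions above, step 2 simply builds a binary mask in which aggregated CLC classes below 39 (terrestrial) become 1 and classes 39-44 (the aquatic classes listed above) become 0. A rough rasterio/numpy equivalent of that step, shown here only for clarity (the file names are placeholders; the workflow itself uses rio calc):

import numpy as np
import rasterio

with rasterio.open("agg_clc.tif") as src:   # aggregated CLC raster from step 1
    clc = src.read(1)
    profile = src.profile.copy()

# 1 for terrestrial classes (< 39), 0 for the aquatic classes (39-44)
mask = np.where(clc < 39, 1, 0).astype("uint8")

profile.update(dtype="uint8", count=1, compress="deflate")
with rasterio.open("clc_mask_tmp.tif", "w", **profile) as dst:
    dst.write(mask, 1)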
run:
    llogger = utils.get_local_logger("harmonize_data", log[0])
    nsteps = len(input.external)
    for i, s_raster in enumerate(input.external):

        # The assumption is that zips don't need anything else but
        # extraction
        if s_raster.endswith(".zip"):
            target_dir = s_raster.replace("external", "processed/features_flow_zones")
            target_dir = os.path.dirname(target_dir)
            # Get rid of the last path component to avoid repetition
            target_dir = os.path.sep.join(target_dir.split(os.path.sep)[:-1])
            if not os.path.exists(target_dir):
                os.mkdir(target_dir)
            prefix = utils.get_iteration_prefix(i+1, nsteps)
            llogger.info("{0} Unzipping dataset {1}".format(prefix, s_raster))
            shell("unzip -o {} -d {} >& {}".format(s_raster, target_dir, log[0]))
        else:
            ## WARP
            # Target raster
            warped_raster = s_raster.replace("external", "interim/warped")
            # No need to process the snap raster, just copy it
            prefix = utils.get_iteration_prefix(i+1, nsteps)
            if s_raster == input.like_raster:
                llogger.info("{0} Copying dataset {1}".format(prefix, s_raster))
                llogger.debug("{0} Target dataset {1}".format(prefix, warped_raster))
                ret = shell("cp {s_raster} {warped_raster}", read=True)
            else:
                llogger.info("{0} Warping dataset {1}".format(prefix, s_raster))
                llogger.debug("{0} Target dataset {1}".format(prefix, warped_raster))
SnakeMake From line 401 of master/Snakefile
run:
    llogger = utils.get_local_logger("process_flowzones", log[0])

    for in_raster, outdir in zip(input.src, output):
        cmd_str = "src/01_pre_processing/cutter.py {} {} {} -f {} -c {}".format(in_raster,
                                                                                input.flow_zone_units,
                                                                                outdir,"ID",
                                                                                    threads)
        print(cmd_str)
        for line in utils.process_stdout(shell(cmd_str, read=True)):
            llogger.debug(line)
SnakeMake From line 475 of master/Snakefile
run:
    llogger = utils.get_local_logger("calculate_flowzone_weights", log[0])

    llogger.info(" [1/2] Calculating zonal stats for {}".format(input.cli_agro))
    stats = zonal_stats(input.flow_zone_units, input.cli_agro,
                        stats=['sum'])
    print(stats)
SnakeMake From line 499 of master/Snakefile
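
The snippet above is truncated by the catalog page: only the first of the two logged steps is visible, and it simply prints the raw results. zonal_stats() (from the rasterstats package) returns one dict per flow-zone polygon, so turning the per-zone sums into relative weights could, for example, look like the sketch below (illustrative only, not the project's actual weighting):

# 'stats' is the list of dicts returned by zonal_stats(..., stats=['sum']);
# zones with no valid pixels have a sum of None.
sums = [s["sum"] or 0.0 for s in stats]
total = sum(sums)
weights = [s / total if total else 0.0 for s in sums]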
run:
    llogger = utils.get_local_logger("match_coverages", log[0])

    coverage.expand_value_coverage(input.car, input.expand_raster,
                                   output.car, logger=llogger)
    coverage.expand_value_coverage(input.esc, input.expand_raster,
                                   output.esc, logger=llogger)
    coverage.expand_value_coverage(input.esf, input.expand_raster,
                                   output.esf, logger=llogger)
    coverage.expand_value_coverage(input.bio_car, input.expand_raster,
                                   output.bio_car, logger=llogger)
    coverage.expand_value_coverage(input.bio_esc, input.expand_raster,
                                   output.bio_esc, logger=llogger)
    coverage.expand_value_coverage(input.bio_esf, input.expand_raster,
                                   output.bio_esf, logger=llogger)
SnakeMake From line 527 of master/Snakefile
run:
    llogger = utils.get_local_logger("generate_range_data", log[0],
                                     debug=True)

    range_stats = pd.DataFrame({"feature": [], "count": [], "sum": [],
                                "q25_ol": [], "mean_ol": [],
                                "median_ol": [], "q75_ol": []})

    features = ES_DST_DATASETS + BD_DST_DATASETS
    # Remove potential zip files
    features = [feature for feature in features if not feature.endswith(".zip")]
    for i, feature in enumerate(features):
        prefix = utils.get_iteration_prefix(i+1, len(features))

        llogger.info("{} Processing {}".format(prefix, feature))

        feature_stats = spatutils.get_range_size(feature, logger=llogger)
        range_stats = pd.concat([range_stats, feature_stats])

    llogger.info(" Saving results to {}".format(output.csv))
    range_stats.to_csv(output.csv, columns=["feature", "count", "sum",
                                            "q25_ol", "mean_ol",
                                            "median_ol", "q75_ol"],
                                            index=False)
SnakeMake From line 574 of master/Snakefile