Codes to reproduce the paper "How unprecedented was the February 2021 Texas cold snap?" by James Doss-Gollin, David J. Farnham, Upmanu Lall, and Vijay Modi
Welcome to the code repository for the paper "How unprecedented was the February 2021 Texas cold snap?" by James Doss-Gollin (Rice), David J. Farnham (Carnegie Institute for Science), Upmanu Lall (Columbia), and Vijay Modi (Columbia).
Note: Some edits to this repository have been made since the paper was published (to add analysis of summer extremes and to run for more years). For the exact version used to generate our published results, a permanent repository is available on Zenodo.
How to cite
This paper is available OPEN ACCESS in the journal Environmental Research Letters. To cite our results and/or methods, please use a citation like:
@article{doss-gollin_txtreme:2021,
title = {How Unprecedented Was the {{February}} 2021 {{Texas}} Cold Snap?},
author = {{Doss-Gollin}, James and Farnham, David J. and Lall, Upmanu and Modi, Vijay},
year = {2021},
issn = {1748-9326},
doi = {10.1088/1748-9326/ac0278},
journal = {Environmental Research Letters},
}
Several summaries of this work are available. For a high-level overview, we suggest this Twitter thread by James Doss-Gollin, a summary by Rice University, or a Columbia Earth Institute blog post by all authors. You can also view the poster summarizing this work included in this repository.
For researchers
We strive to make our work accessible to an inquiring public and to the scientific community. Thus we use only publicly available data sets. All code is posted on this repository.
The following sections outline the steps you can take to examine and reproduce our work.
Repository organization
- `LICENSE` describes the terms of the GNU GENERAL PUBLIC LICENSE under which our code is licensed. This is a "free, copyleft license".
- `README.md` is what you are looking at.
- `Snakefile` implements our workflow (see the minimal rule sketch after this list). From the Snakemake documentation: "The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language."
- `codebase/` provides a Python package that various scripts use. Modules are provided to read the GHCN data (`read_ghcn`), to keep track of the directory path structure (`path`), to parse the input data sources (`data`), to perform common calculations (`calc`), and to add features for interacting with figures (`fig`).
- `data/` contains only raw inputs, stored in `data/raw`. If you reproduce our codes (see instructions below), approximately 60 GB of data will be downloaded to `data/processed`.
- `doc/` contains LaTeX files for our paper submissions. You shouldn't need to compile these because our paper is available open access, but you're welcome to browse. One useful thing you might do here is to identify which figure (in `fig/`) is used in the text!
- `environment.yml` specifies the conda environment used. This should be sufficient to install all required packages for reproducibility. In case you run into issues, `conda.txt` specifies the exact versions of all packages used.
- `fig/` contains final versions of all our figures in both vector graphic (`.pdf`) and image (`.jpg`) formats. All are generated by our source code except for `EGOVA.pdf`, which is generated from the EGOVA tool produced by Edgar Virgüez. If you want to use these figures, you are responsible for complying with the policies of Environmental Research Letters, but we mainly ask that you cite our paper when you do so.
- `scripts/` contains the Python scripts used to get and parse raw data, process data, and produce outputs. All figures are produced within Jupyter notebooks; they are the last step of the analysis. Please note that many scripts import modules from `codebase`, the internal package described above.
- `setup.py` makes `codebase` available for installation.
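To give a sense of how the workflow is wired together, here is a minimal sketch of a Snakemake rule in the spirit of the `Snakefile`. The rule name and output path are hypothetical (chosen for illustration), but the shell command is taken verbatim from the Code Snippets listed at the end of this README.

```python
# Minimal sketch of a Snakemake rule; the rule name and output path are
# illustrative, not necessarily the exact ones used in the Snakefile.
rule download_ghcn_stations:
    output: "data/raw/ghcnd-stations.txt"
    shell: "wget -O {output} https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt"
```

When you run `snakemake`, it builds a dependency graph from rules like this one and executes only the steps whose outputs are missing or out of date.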
To browse the codes
If you want to browse our code, this section is for you.
You will find four Jupyter notebooks in `scripts/`. You can open them and they will render in GitHub. This will show you how we produced all figures in our paper, along with some additional commentary.
If you want to dig deeper, but not to run our codes, then you may want to look at the Python scripts in `scripts/` and/or the module in `codebase/`.
To run the codes
If you want to reproduce or modify our results, this section is for you.
Please note: running this will require approximately 60 GB of disk space. All commands here assume a standard UNIX terminal; Windows may be subtly different.
First, `git clone` the repository to your machine.
Next, you will need to install conda (we recommend miniconda) and `wget`.
Next, you need to create the conda environment:
conda env create --file environment.yml
If this gives you any trouble, you can use the exact package versions that we did (this worked on an Apple M1 MacBook emulating osx-64, but your mileage may vary on other systems):
conda create --name txtreme --file conda.txt
Once you have created the environment, then activate it:
conda activate txtreme
You will also need to install our custom module in `codebase`:
pip install -e .
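As an optional sanity check (not part of the original instructions), you can confirm that the package is importable:

```bash
# should exit silently if codebase was installed correctly
python -c "import codebase"
```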
In order to run, you will need to do two things to access required data.
1. Download the GPWV4 data. See instructions in `data/raw/gpwv4/README.md`.
2. Register for a CDS API key with the ECMWF. This key is required to access the ERA-5 reanalysis data; if you do not properly install it, you will not be able to download that data. A sketch of the credentials file follows this list.
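For the second step, the `cdsapi` client reads its credentials from a `~/.cdsapirc` file. A minimal sketch is below; the UID/key values are placeholders, and you should copy the exact `url` and `key` lines shown on your CDS (Copernicus Climate Data Store) account page, since the endpoint may change over time.

```
url: https://cds.climate.copernicus.eu/api/v2
key: <UID>:<API-KEY>
```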
Now you can run:
snakemake --cores <some number>
where `<some number>` specifies the number of cores to use (if you have no idea what this means, try 3: `snakemake --cores 3`).
We again remind you that running will use nearly 60 GB of disk space; a fast internet connection will be helpful.
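One suggestion not in the original instructions: before launching the full workflow, you can ask Snakemake for a dry run, which lists the jobs it would execute without downloading or computing anything:

```bash
# preview the workflow; nothing is downloaded or written
snakemake --cores 3 --dry-run
```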
Issues and comments
- If you have issues related to the software, please raise an issue in the Issues tab.
- If you have comments, please contact the corresponding author, James Doss-Gollin, directly.
Code Snippets
These are the shell directives from the rules in the `Snakefile`:

```
shell: "wget -O {output} https://www.eia.gov/electricity/data/eia860m/archive/xls/november_generator2020.xlsx"
shell: "python {input.script} --year {wildcards.year} -o {output}"
shell: "python {input.script} --year {wildcards.year} -o {output}"
shell: "wget -O {output} http://berkeleyearth.lbl.gov/auto/Global/Gridded/Complete_{wildcards.var}_Daily_LatLong1_{wildcards.decade}.nc"
shell: "wget -O {output} https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt"
shell: "wget https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd_all.tar.gz -O - | tar -xz -C data/raw"
shell: "python {input.script} -i {input.infiles} -o {output}"
shell: "python {input.script} -i {input.infiles} -o {output}"
shell: "python {input.script} --population {input.population} --temperature {input.temperature} -o {output}"
shell: "python {input.script} --boundary {input.interconnect} --hdd {input.hdd} -o {output}"
shell: "python {input.script} -i {input.infiles} -o {output}"
shell: "python {input.script} -i {input.files} -o {output}"
shell: "python {input.script} --tmin {input.tmin} --tmax {input.tmax} -o {output}"
shell: "python {input.script} -i {input.stations} -o {output}"
shell: "python {input.script} -i {input.stations} -o {output}"
```
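If the placeholders above look unfamiliar: Snakemake fills `{input...}`, `{output}`, and `{wildcards...}` in from each rule's declarations before running the shell command. Purely for illustration (the script and file names below are hypothetical, not the actual paths in this repository), the second snippet might expand to something like:

```bash
# hypothetical expansion of: python {input.script} --year {wildcards.year} -o {output}
python scripts/download_era5.py --year 2020 -o data/raw/era5/temp_2020.nc
```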