BatchConvert: Efficient Image Data Conversion into OME-TIFF or OME-Zarr Formats with Parallel Processing and Automated Storage Transfer


BatchConvert

A command line tool for converting image data into either of the standard file formats OME-TIFF or OME-Zarr.

The tool wraps the dedicated file converters bfconvert and bioformats2raw to convert to OME-TIFF or OME-Zarr, respectively. The workflow management system NextFlow is used to run the conversions in parallel for batches of images.
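
For orientation, the calls below show what each wrapped converter does for a single image; BatchConvert automates such calls across whole batches. The input file name is a placeholder and both converters accept many further options not shown here:

# Convert one image to OME-TIFF with Bio-Formats bfconvert
bfconvert input.czi output.ome.tiff
# Convert one image to OME-Zarr with bioformats2raw
bioformats2raw input.czi output.ome.zarr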

The tool also wraps S3 and Aspera clients (go-mc and aspera-cli, respectively), so input and output locations can be specified as local or remote storage and file transfer is performed automatically. The conversion can also be run on HPC clusters with Slurm.
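
To illustrate what the automatic transfer amounts to, the sketch below shows the kind of go-mc commands that are run under the hood; the endpoint, credentials, bucket and paths are placeholders, and BatchConvert issues the equivalent commands for you:

# Register the remote S3 endpoint under an alias
mc alias set myremote https://s3.example.org ACCESS_KEY SECRET_KEY
# Mirror a converted dataset into a bucket on that remote
mc mirror ./converted_output myremote/mybucket/converted_output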

Installation & Dependencies

Important note: the package has so far only been tested on Ubuntu 20.04.

The minimal dependency to run the tool is NextFlow, which should be installed and made accessible from the command line.
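
If you prefer to set up NextFlow yourself, the standard installer from the NextFlow project can be used as in the sketch below; moving the launcher to ~/bin assumes that directory is on your PATH:

# Download the NextFlow launcher and make it executable
curl -s https://get.nextflow.io | bash
chmod +x nextflow
# Put it on your PATH and verify the installation
mv nextflow ~/bin/
nextflow -version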

If conda exists on your system, you can install BatchConvert together with NextFlow using the following script:

git clone https://github.com/Euro-BioImaging/BatchConvert.git && \
source BatchConvert/installation/install_with_nextflow.sh

If you already have NextFlow installed and accessible from the command line (or if you prefer to install it manually, e.g. as shown here), you can also install BatchConvert alone, using the following script:

git clone https://github.com/Euro-BioImaging/BatchConvert.git && \
source BatchConvert/installation/install.sh

Other dependencies (which will be automatically installed):

  • bioformats2raw (entrypoint bioformats2raw)
  • bftools (entrypoint bfconvert)
  • go-mc (entrypoint mc)
  • aspera-cli (entrypoint ascp)

These dependencies will be pulled and cached automatically at the first execution of the conversion command. The mode of dependency management can be specified with the command line option --profile or -pf. Depending on how this option is set, the dependencies will be acquired and run either via conda or via docker/singularity containers.

Specifying --profile conda (default) will install the dependencies to an environment at ./.condaCache and use this environment to run the workflow. This option requires that miniconda/anaconda is installed on your system.

Alternatively, specifying --profile docker or --profile singularity will pull a docker or singularity image with the dependencies, respectively, and use this image to run the workflow. These options assume that the respective container runtime (docker or singularity) is available on your system. If singularity is being used, a cache directory will be created at the path ./.singularityCache where the singularity image is stored.

Finally, you can still choose to install the dependencies manually and use your own installations to run the workflow. In this case, you should specify --profile standard and make sure the entrypoints specified above are recognised by your shell.
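
For illustration, the profile is selected at conversion time. The hypothetical invocations below assume the batchconvert entrypoint with an omezarr subcommand and positional input/output paths; check batchconvert --help for the exact interface of your installation:

# Convert a folder of images to OME-Zarr with conda-managed dependencies (default)
batchconvert omezarr -pf conda /path/to/input_dir /path/to/output_dir
# The same conversion, but pulling the dependencies as a docker image
batchconvert omezarr -pf docker /path/to/input_dir /path/to/output_dir
# Use your own locally installed bfconvert/bioformats2raw/mc/ascp
batchconvert omezarr -pf standard /path/to/input_dir /path/to/output_dir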

Code Snippets

Lines 22–24:
"""
batchconvert_cli.sh "$inpath.name" "${inpath.baseName}.ome.tiff"
"""
Lines 42–51:
"""
if [[ -d "${inpath}/tempdir" ]];
    then
        batchconvert_cli.sh "${inpath}/tempdir/${pattern_file}" "${pattern_file.baseName}.ome.tiff"
    else
        batchconvert_cli.sh "$inpath/$pattern_file.name" "${pattern_file.baseName}.ome.tiff"
fi
# rm -rf ${inpath}/tempdir &> /dev/null
# rm -rf ${inpath}/*pattern &> /dev/null
"""
Lines 67–69:
"""
batchconvert_cli.sh "$inpath.name" "${inpath.baseName}.ome.zarr"
"""
Lines 88–97:
"""
if [[ -d "${inpath}/tempdir" ]];
    then
        batchconvert_cli.sh "${inpath}/tempdir/${pattern_file.name}" "${pattern_file.baseName}.ome.zarr"
    else
        batchconvert_cli.sh "$inpath/$pattern_file.name" "${pattern_file.baseName}.ome.zarr"
fi
# rm -rf ${inpath}/tempdir &> /dev/null
# rm -rf ${inpath}/*pattern &> /dev/null
"""
Lines 109–113:
"""
sleep 5;
mc -C "./mc" alias set "${params.S3REMOTE}" "${params.S3ENDPOINT}" "${params.S3ACCESS}" "${params.S3SECRET}" &> /dev/null;
parse_s3_filenames.py "${params.S3REMOTE}/${params.S3BUCKET}/${source}/"
"""
Lines 126–136:
"""
sleep 5;
localname="\$(basename $local)" && \
mc -C "./mc" alias set "${params.S3REMOTE}" "${params.S3ENDPOINT}" "${params.S3ACCESS}" "${params.S3SECRET}";
if [ -f $local ];then
    mc -C "./mc" cp $local "${params.S3REMOTE}"/"${params.S3BUCKET}"/"${params.out_path}"/"\$localname";
elif [ -d $local ];then
    mc -C "./mc" mirror $local "${params.S3REMOTE}"/"${params.S3BUCKET}"/"${params.out_path}"/"\$localname";
fi
echo "${params.S3REMOTE}"/"${params.S3BUCKET}"/"${params.out_path}"/$local > "./transfer_report.txt";
"""
Lines 145–149:
"""
sleep 5;
mc -C "./mc" alias set "${params.S3REMOTE}" "${params.S3ENDPOINT}" "${params.S3ACCESS}" "${params.S3SECRET}";
mc -C "./mc" mirror "${params.S3REMOTE}"/"${params.S3BUCKET}"/"${source}" "transferred/${source}";
"""
Lines 160–164:
"""
sleep 5;
mc -C "./mc" alias set "${params.S3REMOTE}" "${params.S3ENDPOINT}" "${params.S3ACCESS}" "${params.S3SECRET}";
mc -C "./mc" cp "${s3path}" "${s3name}";
"""
Lines 173–176:
"""
ascp -P33001 -l 500M -k 2 -i $BIA_SSH_KEY -d $local [email protected]:${params.BIA_REMOTE}/${params.out_path};
echo "${params.BIA_REMOTE}"/"${params.out_path}" > "./transfer_report.txt";
"""
Lines 186–188:
"""
ascp -P33001 -l 500M -k 2 -i $BIA_SSH_KEY -d [email protected]:${params.BIA_REMOTE}/$source ".";
"""
Lines 198–200:
"""
ascp -P33001 -i $BIA_SSH_KEY [email protected]:$source transferred;
"""
Lines 271–281:
"""
if [[ "${params.pattern}" == '' ]] && [[ "${params.reject_pattern}" == '' ]];then
    create_hyperstack --concatenation_order ${params.concatenation_order} ${inpath}
elif [[ "${params.reject_pattern}" == '' ]];then
    create_hyperstack --concatenation_order ${params.concatenation_order} --select_by ${params.pattern} ${inpath}
elif [[ "${params.pattern}" == '' ]];then
    create_hyperstack --concatenation_order ${params.concatenation_order} --reject_by ${params.reject_pattern} ${inpath}
else
    create_hyperstack --concatenation_order ${params.concatenation_order} --select_by ${params.pattern} --reject_by ${params.reject_pattern} ${inpath}
fi
"""
Lines 290–300:
"""
if [[ "${params.pattern}" == '' ]] && [[ "${params.reject_pattern}" == '' ]];then
    create_hyperstack --concatenation_order ${params.concatenation_order} ${inpath}
elif [[ "${params.reject_pattern}" == '' ]];then
    create_hyperstack --concatenation_order ${params.concatenation_order} --select_by ${params.pattern} ${inpath}
elif [[ "${params.pattern}" == '' ]];then
    create_hyperstack --concatenation_order ${params.concatenation_order} --reject_by ${params.reject_pattern} ${inpath}
else
    create_hyperstack --concatenation_order ${params.concatenation_order} --select_by ${params.pattern} --reject_by ${params.reject_pattern} ${inpath}
fi
"""
Lines 318–321:
"""
rm -rf "${inpath}/tempdir" &> /dev/null
rm -rf "${inpath}/*pattern" &> /dev/null
"""
Lines 332–335:
"""
mc alias set "${params.S3REMOTE}" "${params.S3ENDPOINT}" "${params.S3ACCESS}" "${params.S3SECRET}";
mc mirror "${params.S3REMOTE}"/"${params.S3BUCKET}"/"${source}" "transferred";
"""
Lines 377–394:
"""
if [[ "${params.merge_files}" == "True" ]];
    then
        create_hyperstack --concatenation_order ${params.concatenation_order} --select_by ${params.pattern} ${inpath};
        if [[ "${params.concatenation_order}" == "auto" ]];
            then
                batchconvert_cli.sh $inpath/*pattern "${inpath.baseName}.ome.zarr"
        elif ! [[ "${params.concatenation_order}" == "auto" ]];
            then
                batchconvert_cli.sh $inpath/tempdir/*pattern "${inpath.baseName}.ome.zarr"
        fi
elif [[ "${params.merge_files}" == "False" ]];
    then
        batchconvert_cli.sh $inpath "${inpath.baseName}.ome.zarr"
fi
rm -rf "${inpath}/tempdir" &> /dev/null
rm -rf "${inpath}/*pattern" &> /dev/null
"""

URL: https://github.com/Euro-BioImaging/BatchConvert.git
Name: batchconvert
Version: main @ 03e32fe
Copyright: Public Domain
License: MIT License