The executable script - run.sh
The execution of flex_extract is done by the run.sh shell script, which is a wrapper for the top-level Python script submit.py. The Python script is the entry point to ECMWF data retrievals with flex_extract and controls the program flow.
submit.py has two (or three) sources of input parameters with information about the program flow and the ECMWF data selection: the so-called CONTROL file, the command-line parameters, and the so-called ECMWF_ENV file. Command-line parameters override parameters specified in the CONTROL file.
Based on this input information, flex_extract applies one of its application modes. It either retrieves the ECMWF data via a web API on a local machine, or submits a job script to an ECMWF server, retrieves the data there and finally sends the files to the local system.
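The override rule can be pictured with a minimal Python sketch. It assumes that the CONTROL file and the command line have already been parsed into dictionaries keyed by parameter name. The helper `merge_parameters` and this dictionary representation are hypothetical and do not reproduce flex_extract's internal code. The ECMWF_ENV file is left out.

```python
from typing import Any, Dict, Optional


def merge_parameters(control: Dict[str, Any],
                     cmdline: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return the effective settings (hypothetical helper, not flex_extract code).

    Start from the CONTROL file values and let every command-line value
    that was actually given (i.e. is not None) take precedence.
    """
    merged = dict(control)  # copy, so the CONTROL settings stay untouched
    for key, value in cmdline.items():
        if value is not None:  # options left at None do not override
            merged[key] = value
    return merged


control_file = {"START_DATE": "20180101", "DATE_CHUNK": 3, "DEBUG": 0}
command_line = {"START_DATE": "20180510", "DATE_CHUNK": None, "DEBUG": 1}
print(merge_parameters(control_file, command_line))
# {'START_DATE': '20180510', 'DATE_CHUNK': 3, 'DEBUG': 1}
```

Treating `None` as "not given" mirrors the `(default: None)` entries in the submit.py help text. An explicit value such as `0` still overrides.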
Submission parameters

PARAMETER | Format | Possible values | Default | Description | Specifics / Conditions |
---|---|---|---|---|---|
START_DATE | String (YYYYMMDD) | depends on dataset | None | The first day of the retrieval period. | If END_DATE is set, START_DATE must be earlier than or equal to END_DATE. |
END_DATE | String (YYYYMMDD) | depends on dataset | None | The last day of the retrieval period. For a one-day retrieval it must be the same date as START_DATE. If not set, it defaults to START_DATE. | Optional. If set, it must be later than or equal to START_DATE. |
DATE_CHUNK | Integer | depends on resolution | 3 | Maximum number of days retrieved within one MARS request. | This number is limited by the maximum memory and time allowed for one MARS request, so change it with care. It can be larger for reanalysis data but may be too large for very high-resolution retrievals. |
JOB_CHUNK | Integer | depends on resolution | None | Number of days to be retrieved within a single job. | Allows starting the submit script once and letting it automatically divide the time period into smaller job chunks. This is useful, for example, for retrieving one month at 0.1° spatial and 1 h temporal resolution, where only one day per job is possible. |
CONTROLFILE | String | any CONTROL file | CONTROL_EA5 | The file with all CONTROL parameters. | |
BASETIME | Integer | 0; 12 | None | Intended for half-day retrievals: only half a day is retrieved, starting from BASETIME and going back 12 hours. E.g. 20180510 with BASETIME = 00 leads to a retrieval from 20180509 12h until 20180510 00h. | Can only be set to 00 or 12. |
STEP | Blank-separated list of integers (ii ii … ii) or string (start/to/end) | 00 - max available STEP in data set | None | The forecast time step in hours for each corresponding field type (TYPE). Steps are counted from the forecast time (TIME). E.g. in ERA-Interim, the fields at 03, 06 and 09 UTC are taken from the 00 UTC forecast (TIME) with STEP 3, 6 and 9. | Must have the same number of values as TYPE and TIME. For analysis (AN) fields, STEP must always be 00. It is more easily set in the CONTROL file. For pure forecast modes it can be set here, e.g. as 0/to/36. |
LEVELIST | String (start/to/end) | 1/to/137; depends on dataset | None | List of vertical levels for the MARS request. It can be a subset of levels but must include the maximum level (end). | If the full list of levels is needed and the parameter LEVEL is set, LEVELIST is not needed. “end” must be the maximum number of available levels and must equal LEVEL, if specified. |
AREA | Double (f/f/f/f) | any float within latitude and longitude boundaries | None | Area defined as north/west/south/east. | |
DEBUG | Integer | 0; 1 | 0 | Debug mode: leaves temporary files intact. | By default, only the final FLEXPART input files are kept. |
OPER | Integer | 0; 1 | 0 | Operational mode: prepares dates with environment variables. | |
REQUEST | Integer | 0; 1; 2 | 0 | Lists all MARS requests with their specific values in the file mars_requests.csv: 0 – no file; 1 – only mars_requests.csv; 2 – file and extraction. | Useful for documentation or debugging. |
PUBLIC | Integer | 0; 1 | 0 | Public mode: retrieves the public datasets. | IMPORTANT: Every public user must set this! |
RRINT | Integer | 0; 1 | 0 | Selection of old or new precipitation interpolation: 0 – old method; 1 – new method (additional subgrid points). | IMPORTANT: With the new method, each GRIB file contains 3 fields for large-scale and 3 fields for convective precipitation. They can be distinguished by the “stepRange” key in the GRIB messages: stepRange = 0: original time step; stepRange = 1: first subgrid point; stepRange = 2: second subgrid point. |
INPUTDIR | String | any path | None | Path to the temporary directory for the retrieved GRIB files and other processing files. | The temporary directory is created if it does not already exist. |
OUTPUTDIR | String | any path | None | Path to the final directory where the FLEXPART-ready input files are stored. | The final output directory is created if it does not already exist. |
PPID | Integer | | None | Parent process ID of a single flex_extract run, used to identify its files. It is the second number in the GRIB file names. | Usually only necessary if the GRIB data were already retrieved and prepare_flexpart has to be rerun. PPID is then used to select the files. |
JOB_TEMPLATE | String | jobscript.template | jobscript.template | The job template file that is adapted for submission to the batch system on the ECMWF server. | |
QUEUE | String | ecgate, cca, ccb | None | The ECMWF server name for submitting the job script to the batch system. | |

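Several rules in the table can be expressed in a few lines of Python: the END_DATE default, the START_DATE/END_DATE ordering, the BASETIME half-day window and the splitting of a period into DATE_CHUNK or JOB_CHUNK days. The following is only a sketch under these assumptions. The function names are hypothetical, and flex_extract's own chunking may differ in detail.

```python
from datetime import date, datetime, timedelta
from typing import List, Optional, Tuple


def resolve_period(start_date: str,
                   end_date: Optional[str] = None) -> Tuple[date, date]:
    """Apply the START_DATE/END_DATE rules (hypothetical helper)."""
    start = datetime.strptime(start_date, "%Y%m%d").date()
    # END_DATE is optional and falls back to START_DATE (one-day retrieval)
    end = datetime.strptime(end_date, "%Y%m%d").date() if end_date else start
    if start > end:
        raise ValueError("START_DATE must be earlier than or equal to END_DATE")
    return start, end


def basetime_window(day: str, basetime: int) -> Tuple[datetime, datetime]:
    """Return the half-day period that ends at BASETIME on `day`."""
    if basetime not in (0, 12):
        raise ValueError("BASETIME can only be 0 or 12")
    end = datetime.strptime(day, "%Y%m%d") + timedelta(hours=basetime)
    return end - timedelta(hours=12), end  # reach back 12 hours


def split_into_chunks(start: date, end: date,
                      chunk_days: int) -> List[Tuple[date, date]]:
    """Split [start, end] into consecutive pieces of at most chunk_days days."""
    if chunk_days < 1:
        raise ValueError("chunk size must be at least one day")
    chunks = []
    current = start
    while current <= end:
        last = min(current + timedelta(days=chunk_days - 1), end)
        chunks.append((current, last))
        current = last + timedelta(days=1)
    return chunks


start, end = resolve_period("20180501", "20180507")
for first, last in split_into_chunks(start, end, 3):  # e.g. DATE_CHUNK = 3
    print(first.isoformat(), last.isoformat())
# 2018-05-01 2018-05-03
# 2018-05-04 2018-05-06
# 2018-05-07 2018-05-07
print([t.isoformat() for t in basetime_window("20180510", 0)])
# ['2018-05-09T12:00:00', '2018-05-10T00:00:00']
```

The last line reproduces the BASETIME example from the table: 20180510 with BASETIME = 00 covers 20180509 12h until 20180510 00h.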
Content of run.sh
```bash
#!/bin/bash
#
# @Author: Anne Philipp
#
# @Date: October, 4 2018
#
# @Description:
#    This script defines the available command-line parameters
#    for running flex_extract and combines them for the execution
#    of the Python program. It also does some checks to
#    guarantee necessary parameters were set and consistent.
#
# @Licence:
#    (C) Copyright 2014-2020.
#
#    SPDX-License-Identifier: CC-BY-4.0
#
#    This work is licensed under the Creative Commons Attribution 4.0
#    International License. To view a copy of this license, visit
#    http://creativecommons.org/licenses/by/4.0/ or send a letter to
#    Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
#
# -----------------------------------------------------------------
# AVAILABLE COMMANDLINE ARGUMENTS TO SET
#
# THE USER HAS TO SPECIFY THESE PARAMETERS:
QUEUE='ecgate'
START_DATE=None
END_DATE=None
DATE_CHUNK=None
JOB_CHUNK=3
BASETIME=None
STEP=None
LEVELIST=None
AREA=None
INPUTDIR=None
OUTPUTDIR=None
PP_ID=None
JOB_TEMPLATE='submitscript.template'
CONTROLFILE='CONTROL_EA5'
DEBUG=0
REQUEST=2
PUBLIC=0
# -----------------------------------------------------------------
#
# AFTER THIS LINE THE USER DOES NOT HAVE TO CHANGE ANYTHING !!!
#
# -----------------------------------------------------------------
# PATH TO SUBMISSION SCRIPT
pyscript=../Source/Python/submit.py
# INITIALIZE EMPTY PARAMETERLIST
parameterlist=""
# CHECK IF ON ECMWF SERVER;
if [[ $HOST == *"ecgb"* ]] || [[ $HOST == *"cca"* ]] || [[ $HOST == *"ccb"* ]]; then
    # LOAD PYTHON3 MODULE
    module load python3
fi
# CHECK FOR MORE PARAMETER
if [ -n "$START_DATE" ]; then
    parameterlist+=" --start_date=$START_DATE"
fi
if [ -n "$END_DATE" ]; then
    parameterlist+=" --end_date=$END_DATE"
fi
if [ -n "$DATE_CHUNK" ]; then
    parameterlist+=" --date_chunk=$DATE_CHUNK"
fi
if [ -n "$JOB_CHUNK" ]; then
    parameterlist+=" --job_chunk=$JOB_CHUNK"
fi
if [ -n "$BASETIME" ]; then
    parameterlist+=" --basetime=$BASETIME"
fi
if [ -n "$STEP" ]; then
    parameterlist+=" --step=$STEP"
fi
if [ -n "$LEVELIST" ]; then
    parameterlist+=" --levelist=$LEVELIST"
fi
if [ -n "$AREA" ]; then
    parameterlist+=" --area=$AREA"
fi
if [ -n "$INPUTDIR" ]; then
    parameterlist+=" --inputdir=$INPUTDIR"
fi
if [ -n "$OUTPUTDIR" ]; then
    parameterlist+=" --outputdir=$OUTPUTDIR"
fi
if [ -n "$PP_ID" ]; then
    parameterlist+=" --ppid=$PP_ID"
fi
if [ -n "$JOB_TEMPLATE" ]; then
    parameterlist+=" --job_template=$JOB_TEMPLATE"
fi
if [ -n "$QUEUE" ]; then
    parameterlist+=" --queue=$QUEUE"
fi
if [ -n "$CONTROLFILE" ]; then
    parameterlist+=" --controlfile=$CONTROLFILE"
fi
if [ -n "$DEBUG" ]; then
    parameterlist+=" --debug=$DEBUG"
fi
if [ -n "$REQUEST" ]; then
    parameterlist+=" --request=$REQUEST"
fi
if [ -n "$PUBLIC" ]; then
    parameterlist+=" --public=$PUBLIC"
fi
# -----------------------------------------------------------------
# CALL SCRIPT WITH DETERMINED COMMANDLINE ARGUMENTS
$pyscript $parameterlist
```
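run.sh only tests for non-empty strings. A variable left at the placeholder None is therefore still passed on, e.g. as `--start_date=None`. Because `$parameterlist` is expanded unquoted, a value containing blanks is split into several arguments. A blank-separated STEP list such as `0 3 6` is one example. The following minimal Python sketch builds the same kind of option list, skips unset values, and keeps each option as one list element. The helper `build_argument_list` is hypothetical and not part of flex_extract. Whether submit.py accepts a blank-separated STEP on the command line is not verified here.

```python
import shlex
from typing import Dict, List, Optional


def build_argument_list(settings: Dict[str, Optional[str]]) -> List[str]:
    """Turn settings into submit.py options (hypothetical helper).

    Like the if-blocks in run.sh, every set value becomes --<name>=<value>.
    Unlike run.sh, values left at None are skipped instead of being passed
    on as the literal string "None".
    """
    args = []
    for name, value in settings.items():
        if value is None or value == "":
            continue
        args.append(f"--{name}={value}")
    return args


settings = {
    "start_date": "20180510",
    "end_date": None,   # not set: no --end_date option is generated
    "step": "0 3 6",    # blank-separated list, kept as ONE argument
    "queue": "ecgate",
}
cmd = ["../Source/Python/submit.py"] + build_argument_list(settings)
print(shlex.join(cmd))
# ../Source/Python/submit.py --start_date=20180510 '--step=0 3 6' --queue=ecgate
# subprocess.run(cmd, check=True) would start submit.py without a shell;
# it requires a flex_extract installation and is therefore left out here.
```

Passing a list to `subprocess.run` avoids shell word splitting. `shlex.join` (Python 3.8+) is used only to display the equivalent shell command.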
Usage of submit.py (optional)
It is also possible to start flex_extract directly from the command line by using the submit.py script instead of the wrapper shell script run.sh. This top-level script is located in flex_extract_vX.X/Source/Python and is executable. The --help parameter lists all possible command-line parameters again:
```bash
submit.py --help
```

```text
usage: submit.py [-h] [--start_date START_DATE] [--end_date END_DATE]
                 [--date_chunk DATE_CHUNK] [--job_chunk JOB_CHUNK]
                 [--controlfile CONTROLFILE] [--basetime BASETIME]
                 [--step STEP] [--levelist LEVELIST] [--area AREA]
                 [--debug DEBUG] [--oper OPER] [--request REQUEST]
                 [--public PUBLIC] [--rrint RRINT] [--inputdir INPUTDIR]
                 [--outputdir OUTPUTDIR] [--ppid PPID]
                 [--job_template JOB_TEMPLATE] [--queue QUEUE]

Retrieve FLEXPART input from ECMWF MARS archive

optional arguments:
  -h, --help            show this help message and exit
  --start_date START_DATE
                        start date YYYYMMDD (default: None)
  --end_date END_DATE   end_date YYYYMMDD (default: None)
  --date_chunk DATE_CHUNK
                        # of days to be retrieved at once (default: None)
  --job_chunk JOB_CHUNK
                        # of days to be retrieved within a single job
                        (default: None)
  --controlfile CONTROLFILE
                        The file with all CONTROL parameters. (default:
                        CONTROL_EA5)
  --basetime BASETIME   base such as 0 or 12 (for half day retrievals)
                        (default: None)
  --step STEP           Forecast steps such as 00/to/48 (default: None)
  --levelist LEVELIST   Vertical levels to be retrieved, e.g. 30/to/60
                        (default: None)
  --area AREA           area defined as north/west/south/east (default: None)
  --debug DEBUG         debug mode - leave temporary files intact (default:
                        None)
  --oper OPER           operational mode - prepares dates with environment
                        variables (default: None)
  --request REQUEST     list all mars requests in file mars_requests.dat
                        (default: None)
  --public PUBLIC       public mode - retrieves the public datasets (default:
                        None)
  --rrint RRINT         Selection of old or new precipitation interpolation: 0
                        - old method 1 - new method (additional subgrid
                        points) (default: None)
  --inputdir INPUTDIR   Path to the temporary directory for the retrieval grib
                        files and other processing files. (default: None)
  --outputdir OUTPUTDIR
                        Path to the final directory where the final FLEXPART
                        ready input files are stored. (default: None)
  --ppid PPID           This is the specify parent process id of a single
                        flex_extract run to identify the files. It is the
                        second number in the GRIB files. (default: None)
  --job_template JOB_TEMPLATE
                        The job template file which are adapted to be
                        submitted to the batch system on ECMWF server.
                        (default: job.temp)
  --queue QUEUE         The ECMWF server name for submission of the job script
                        to the batch system (e.g. ecgate | cca | ccb)
                        (default: None)
```
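The help text shows that STEP, LEVELIST and AREA are passed as slash-separated strings such as 00/to/48, 30/to/60 or north/west/south/east. The following minimal Python sketch shows how such values could be expanded. It assumes MARS-style `start/to/end[/by/increment]` ranges and also accepts the blank-separated STEP form from the parameter table. Both helper names are hypothetical, and the latitude check in `parse_area` is an illustrative assumption, not a documented flex_extract check.

```python
from typing import List, Tuple


def expand_mars_range(value: str) -> List[int]:
    """Expand a STEP/LEVELIST string into integers (hypothetical helper).

    Handles start/to/end, start/to/end/by/increment, slash-separated
    lists (a/b/c) and blank-separated lists (a b c).
    """
    tokens = value.split("/") if "/" in value else value.split()
    if len(tokens) >= 3 and tokens[1].lower() == "to":
        first, last, increment = int(tokens[0]), int(tokens[2]), 1
        if len(tokens) == 5 and tokens[3].lower() == "by":
            increment = int(tokens[4])
        elif len(tokens) != 3:
            raise ValueError(f"unsupported range syntax: {value!r}")
        return list(range(first, last + 1, increment))  # end is inclusive
    return [int(token) for token in tokens]


def parse_area(value: str) -> Tuple[float, float, float, float]:
    """Split AREA given as north/west/south/east into four floats."""
    north, west, south, east = (float(part) for part in value.split("/"))
    if not -90.0 <= south <= north <= 90.0:
        raise ValueError("expected -90 <= south <= north <= 90")
    return north, west, south, east


print(len(expand_mars_range("00/to/48")))   # 49 forecast steps
print(expand_mars_range("0/to/12/by/3"))    # [0, 3, 6, 9, 12]
print(expand_mars_range("0 3 6"))           # [0, 3, 6]
print(parse_area("60/-10/40/30"))           # (60.0, -10.0, 40.0, 30.0)
```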