The executable script - run.sh

flex_extract is run through the run.sh shell script, a wrapper around the top-level Python script submit.py. The Python script is the entry point for ECMWF data retrievals with flex_extract and controls the program flow.

submit.py reads its input parameters, which describe the program flow and the ECMWF data selection, from two (or three) sources: the so-called CONTROL file, the command line parameters, and the so-called ECMWF_ENV file. Command line parameters override parameters specified in the CONTROL file.
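
The precedence rule can be illustrated with a minimal Python sketch. It is not the code used by submit.py: the function name merge_parameters and the dictionary representation of the two sources are assumptions made for illustration. A command line value of None is taken to mean "not given" and leaves the CONTROL file value untouched.

```python
def merge_parameters(control, cmdline):
    """Return CONTROL file settings overridden by command line settings.

    Hypothetical helper: both inputs are plain dicts keyed by parameter
    name; a command line value of None means "not given".
    """
    merged = dict(control)
    for name, value in cmdline.items():
        if value is not None:
            merged[name] = value
    return merged


# Illustrative values only
control = {"START_DATE": "20180510", "DATE_CHUNK": 3, "DEBUG": 0}
cmdline = {"START_DATE": "20180601", "DATE_CHUNK": None, "DEBUG": 1}
print(merge_parameters(control, cmdline))
```

Here START_DATE and DEBUG come from the command line, while DATE_CHUNK keeps its CONTROL file value.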

Based on this input, flex_extract applies one of its application modes: it either retrieves the ECMWF data via a web API on a local machine, or it submits a job script to an ECMWF server, retrieves the data there, and finally sends the files to the local system.

Submission parameters

Parameters for submission

START_DATE
    Format: String (YYYYMMDD)
    Possible values: depends on dataset
    Default: None
    Description: The first day of the retrieval period.
    Specifics / Conditions: If END_DATE is set, START_DATE must be earlier than or equal to END_DATE.

END_DATE
    Format: String (YYYYMMDD)
    Possible values: depends on dataset
    Default: None
    Description: The last day of the retrieval period. For a one-day retrieval it has to be the same date as START_DATE. If not set, it is automatically set equal to START_DATE.
    Specifics / Conditions: Optional. If set, it has to be later than or equal to START_DATE.

DATE_CHUNK
    Format: Integer
    Possible values: depends on resolution
    Default: 3
    Description: Maximum number of days retrieved within one MARS request.
    Specifics / Conditions: This number is limited by the maximum memory and time allowed for one MARS request. Be careful when changing it. It can be larger for reanalysis data but may be too large for very high resolution retrievals.

JOB_CHUNK
    Format: Integer
    Possible values: depends on resolution
    Default: None
    Description: Number of days to be retrieved within a single job.
    Specifics / Conditions: Can be set so that the submit script is started once and automatically divides the time period into smaller job chunks. This is useful, for example, when retrieving one month with 0.1° spatial resolution and 1 h temporal resolution, where only 1 day per job is possible.

CONTROLFILE
    Format: String
    Possible values: any CONTROL file
    Default: CONTROL_EA5
    Description: The file with all CONTROL parameters.

BASETIME
    Format: Integer
    Possible values: 0; 12
    Default: None
    Description: This parameter is intended for half-day retrievals. Only half a day is retrieved, starting from BASETIME and going back 12 hours. E.g. 20180510 with BASETIME = 00 leads to a data retrieval from 20180509 12h until 20180510 00h.
    Specifics / Conditions: Can be set to 00 or 12 only.

STEP
    Format: Blank-separated list of Integers (ii ii … ii) or String (start/to/end)
    Possible values: 00 - max available STEP in data set
    Default: None
    Description: The forecast time step in hours for each corresponding field type (TYPE). Counting of the steps starts from the forecast times. E.g. in ERA-Interim, for forecasts at 3, 6, 9 UTC the STEPs 3, 6 and 9 are used together with the forecast TIME 00 UTC.
    Specifics / Conditions: Has to have the same number of values as TYPE and TIME! For analysis (AN) fields the STEP always has to be 00! It is more easily set in the CONTROL file. For pure forecast modes it might be set here as e.g. 0/to/36.

LEVELIST
    Format: String (start/to/end)
    Possible values: 1/to/137; depends on dataset
    Default: None
    Description: List of vertical levels for the MARS request. It can be a subset of levels but it has to include the maximum level (end).
    Specifics / Conditions: If the full list of levels is needed and the parameter LEVEL is set, the LEVELIST parameter is not needed. “end” has to be the maximum number of possible levels and has to be the same as LEVEL, if specified.

AREA
    Format: Double (f/f/f/f)
    Possible values: any float within lat and lon boundaries
    Default: None
    Description: Area defined as north/west/south/east.

DEBUG
    Format: Integer
    Possible values: 0; 1
    Default: 0
    Description: Debug mode - leave temporary files intact.
    Specifics / Conditions: Usually only the final FLEXPART input files are saved.

OPER
    Format: Integer
    Possible values: 0; 1
    Default: 0
    Description: Operational mode - prepares dates with environment variables.

REQUEST
    Format: Integer
    Possible values: 0; 1; 2
    Default: 0
    Description: List all MARS requests with their specific values in the file mars_requests.dat. 0: no file; 1: only mars_requests.csv; 2: file and extraction.
    Specifics / Conditions: Very useful for documentation or debugging.

PUBLIC
    Format: Integer
    Possible values: 0; 1
    Default: 0
    Description: Public mode - retrieves the public datasets.
    Specifics / Conditions: IMPORTANT: Every PUBLIC user has to select this!

RRINT
    Format: Integer
    Possible values: 0; 1
    Default: 0
    Description: Selection of old or new precipitation interpolation. 0: old method; 1: new method (additional subgrid points).
    Specifics / Conditions: IMPORTANT: If the new method is used, each single GRIB file will contain 3 fields for the large-scale and 3 fields for the convective precipitation. They can be distinguished by the “stepRange” keyword in the GRIB messages. stepRange = 0: original time step; stepRange = 1: first subgrid point; stepRange = 2: second subgrid point.

INPUTDIR
    Format: String
    Possible values: any path
    Default: None
    Description: Path to the temporary directory for the retrieved GRIB files and other processing files.
    Specifics / Conditions: The temporary directory will be created if it does not already exist.

OUTPUTDIR
    Format: String
    Possible values: any path
    Default: None
    Description: Path to the final directory where the FLEXPART-ready input files are stored.
    Specifics / Conditions: The final output directory will be created if it does not already exist.

PPID
    Format: Integer
    Default: None
    Description: The parent process ID of a single flex_extract run, used to identify the files. It is the second number in the GRIB files.
    Specifics / Conditions: This is usually only necessary if the GRIB data were already retrieved and prepare_flexpart has to be rerun. PPID is then used to select the files.

JOB_TEMPLATE
    Format: String
    Possible values: jobscript.template
    Default: jobscript.template
    Description: The job template file, which is adapted and then submitted to the batch system on the ECMWF server.

QUEUE
    Format: String
    Possible values: ecgate, cca, ccb
    Default: None
    Description: The ECMWF server name for submission of the job script to the batch system.
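
The date-related conditions listed for START_DATE, END_DATE, DATE_CHUNK and BASETIME can be sketched in Python. This is a minimal illustration under stated assumptions, not flex_extract's own implementation: the function names resolve_period, split_into_chunks and basetime_window are hypothetical, and the chunking simply cuts the period into consecutive blocks of at most the given number of days.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d"


def resolve_period(start_date, end_date=None):
    """Parse START_DATE and END_DATE; END_DATE defaults to START_DATE."""
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT) if end_date else start
    if start > end:
        raise ValueError("START_DATE must be earlier than or equal to END_DATE")
    return start, end


def split_into_chunks(start, end, chunk_days):
    """Cut [start, end] into consecutive blocks of at most chunk_days days."""
    if chunk_days < 1:
        raise ValueError("chunk size must be at least 1 day")
    chunks = []
    first = start
    while first <= end:
        last = min(first + timedelta(days=chunk_days - 1), end)
        chunks.append((first.strftime(DATE_FORMAT), last.strftime(DATE_FORMAT)))
        first = last + timedelta(days=1)
    return chunks


def basetime_window(date, basetime):
    """Return the 12-hour window that ends at BASETIME on the given date."""
    if basetime not in (0, 12):
        raise ValueError("BASETIME can be set to 0 or 12 only")
    window_end = datetime.strptime(date, DATE_FORMAT) + timedelta(hours=basetime)
    return window_end - timedelta(hours=12), window_end


# One week split into blocks of at most 3 days (the DATE_CHUNK default above)
start, end = resolve_period("20180501", "20180507")
print(split_into_chunks(start, end, 3))

# The BASETIME example from the table: 20180509 12h until 20180510 00h
print(basetime_window("20180510", 0))
```

The same splitting idea applies to JOB_CHUNK, with the difference that each block would correspond to one submitted job rather than one MARS request.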

Content of run.sh

run.sh
#!/bin/bash
#
# @Author: Anne Philipp
#
# @Date: October, 4 2018
#
# @Description: 
#    This script defines the available command-line parameters
#    for running flex_extract and combines them for the execution  
#    of the Python program. It also does some checks to 
#    guarantee necessary parameters were set and consistent.
#
# @Licence:
#    (C) Copyright 2014-2020.
#
#    SPDX-License-Identifier: CC-BY-4.0
#
#    This work is licensed under the Creative Commons Attribution 4.0
#    International License. To view a copy of this license, visit
#    http://creativecommons.org/licenses/by/4.0/ or send a letter to
#    Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
#
# -----------------------------------------------------------------
# AVAILABLE COMMANDLINE ARGUMENTS TO SET
# 
# THE USER HAS TO SPECIFY THESE PARAMETERS:

QUEUE='ecgate'
START_DATE=None
END_DATE=None
DATE_CHUNK=None
JOB_CHUNK=3
BASETIME=None
STEP=None
LEVELIST=None
AREA=None
INPUTDIR=None
OUTPUTDIR=None
PP_ID=None
JOB_TEMPLATE='submitscript.template' 
CONTROLFILE='CONTROL_EA5' 
DEBUG=0
REQUEST=2
PUBLIC=0

# -----------------------------------------------------------------
#
# AFTER THIS LINE THE USER DOES NOT HAVE TO CHANGE ANYTHING !!!
#
# -----------------------------------------------------------------

# PATH TO SUBMISSION SCRIPT
pyscript=../Source/Python/submit.py

# INITIALIZE EMPTY PARAMETERLIST
parameterlist=""

# CHECK IF ON ECMWF SERVER; 
if [[ $HOST == *"ecgb"* ]] || [[ $HOST == *"cca"* ]] || [[ $HOST == *"ccb"* ]]; then
# LOAD PYTHON3 MODULE
  module load python3
fi 

# CHECK FOR MORE PARAMETER 
if [ -n "$START_DATE" ]; then
  parameterlist+=" --start_date=$START_DATE"
fi
if [ -n "$END_DATE" ]; then
  parameterlist+=" --end_date=$END_DATE"
fi
if [ -n "$DATE_CHUNK" ]; then
  parameterlist+=" --date_chunk=$DATE_CHUNK"
fi
if [ -n "$JOB_CHUNK" ]; then
  parameterlist+=" --job_chunk=$JOB_CHUNK"
fi
if [ -n "$BASETIME" ]; then
  parameterlist+=" --basetime=$BASETIME"
fi
if [ -n "$STEP" ]; then
  parameterlist+=" --step=$STEP"
fi
if [ -n "$LEVELIST" ]; then
  parameterlist+=" --levelist=$LEVELIST"
fi
if [ -n "$AREA" ]; then
  parameterlist+=" --area=$AREA"
fi
if [ -n "$INPUTDIR" ]; then
  parameterlist+=" --inputdir=$INPUTDIR"
fi
if [ -n "$OUTPUTDIR" ]; then
  parameterlist+=" --outputdir=$OUTPUTDIR"
fi
if [ -n "$PP_ID" ]; then
  parameterlist+=" --ppid=$PP_ID"
fi
if [ -n "$JOB_TEMPLATE" ]; then
  parameterlist+=" --job_template=$JOB_TEMPLATE"
fi
if [ -n "$QUEUE" ]; then
  parameterlist+=" --queue=$QUEUE"
fi
if [ -n "$CONTROLFILE" ]; then
  parameterlist+=" --controlfile=$CONTROLFILE"
fi
if [ -n "$DEBUG" ]; then
  parameterlist+=" --debug=$DEBUG"
fi
if [ -n "$REQUEST" ]; then
  parameterlist+=" --request=$REQUEST"
fi
if [ -n "$PUBLIC" ]; then
  parameterlist+=" --public=$PUBLIC"
fi

# -----------------------------------------------------------------
# CALL SCRIPT WITH DETERMINED COMMANDLINE ARGUMENTS

$pyscript $parameterlist
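
For readers more familiar with Python, the parameter assembly done by run.sh can be restated as follows. This is an illustrative sketch, not part of flex_extract; build_parameter_list and the settings dictionary are hypothetical names. Note that in bash the test [ -n "$VAR" ] is true for any non-empty string, so a variable set to the literal text None still produces an option such as --start_date=None. The sketch reproduces that behaviour.

```python
def build_parameter_list(settings):
    """Build '--name=value' options for every non-empty setting.

    Mirrors the `[ -n "$VAR" ]` checks in run.sh: only empty strings
    are skipped, so the literal text "None" is passed through.
    Assumption: keys already match the option names (for example PPID,
    not the PP_ID variable name used inside run.sh).
    """
    options = []
    for name, value in settings.items():
        if str(value) != "":
            options.append("--{}={}".format(name.lower(), value))
    return options


# Illustrative values only
settings = {
    "START_DATE": "20180510",
    "END_DATE": "None",   # non-empty text, so it is still passed on
    "QUEUE": "",          # empty, so it is skipped
    "DEBUG": 0,
}
print(build_parameter_list(settings))
```

Whether the receiving program treats the text None as "not set" depends on how it parses its arguments.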

Usage of submit.py (optional)

It is also possible to start flex_extract directly from the command line by using the submit.py script instead of the wrapper shell script run.sh. This top-level script is located in flex_extract_vX.X/Source/Python and is executable. The --help parameter lists all available command line parameters.

submit.py --help

usage: submit.py [-h] [--start_date START_DATE] [--end_date END_DATE]
              [--date_chunk DATE_CHUNK] [--job_chunk JOB_CHUNK]
              [--controlfile CONTROLFILE] [--basetime BASETIME]
              [--step STEP] [--levelist LEVELIST] [--area AREA]
              [--debug DEBUG] [--oper OPER] [--request REQUEST]
              [--public PUBLIC] [--rrint RRINT] [--inputdir INPUTDIR]
              [--outputdir OUTPUTDIR] [--ppid PPID]
              [--job_template JOB_TEMPLATE] [--queue QUEUE]

 Retrieve FLEXPART input from ECMWF MARS archive

 optional arguments:
   -h, --help            show this help message and exit
   --start_date START_DATE
                         start date YYYYMMDD (default: None)
   --end_date END_DATE   end_date YYYYMMDD (default: None)
   --date_chunk DATE_CHUNK
                         # of days to be retrieved at once (default: None)
   --job_chunk JOB_CHUNK
                         # of days to be retrieved within a single job
                         (default: None)
   --controlfile CONTROLFILE
                         The file with all CONTROL parameters. (default:
                         CONTROL_EA5)
   --basetime BASETIME   base such as 0 or 12 (for half day retrievals)
                         (default: None)
   --step STEP           Forecast steps such as 00/to/48 (default: None)
   --levelist LEVELIST   Vertical levels to be retrieved, e.g. 30/to/60
                         (default: None)
   --area AREA           area defined as north/west/south/east (default: None)
   --debug DEBUG         debug mode - leave temporary files intact (default:
                         None)
   --oper OPER           operational mode - prepares dates with environment
                         variables (default: None)
   --request REQUEST     list all mars requests in file mars_requests.dat
                         (default: None)
   --public PUBLIC       public mode - retrieves the public datasets (default:
                         None)
   --rrint RRINT         Selection of old or new precipitation interpolation: 0
                         - old method 1 - new method (additional subgrid
                         points) (default: None)
   --inputdir INPUTDIR   Path to the temporary directory for the retrieval grib
                         files and other processing files. (default: None)
   --outputdir OUTPUTDIR
                         Path to the final directory where the final FLEXPART
                         ready input files are stored. (default: None)
   --ppid PPID           This is the specify parent process id of a single
                         flex_extract run to identify the files. It is the
                         second number in the GRIB files. (default: None)
   --job_template JOB_TEMPLATE
                         The job template file which are adapted to be
                         submitted to the batch system on ECMWF server.
                         (default: job.temp)
   --queue QUEUE         The ECMWF server name for submission of the job script
                         to the batch system (e.g. ecgate | cca | ccb)
                         (default: None)
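
The "(default: ...)" suffixes in this help text match the format that Python's argparse module produces with argparse.ArgumentDefaultsHelpFormatter. The following sketch shows how a small parser using that formatter could also accept the literal text None, as passed on by run.sh, and turn it into Python's None. It covers only three options and is an illustration under assumptions: none_or_str, none_or_int and build_parser are hypothetical names, and the real submit.py may handle this differently.

```python
import argparse


def none_or_str(value):
    """Convert the literal text 'None' to None; keep other strings."""
    return None if value == "None" else value


def none_or_int(value):
    """Convert the literal text 'None' to None; otherwise parse an integer."""
    return None if value == "None" else int(value)


def build_parser():
    """Build a reduced parser with three of the options listed above."""
    parser = argparse.ArgumentParser(
        description="Retrieve FLEXPART input from ECMWF MARS archive",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--start_date", type=none_or_str, default=None,
                        help="start date YYYYMMDD")
    parser.add_argument("--date_chunk", type=none_or_int, default=None,
                        help="# of days to be retrieved at once")
    parser.add_argument("--controlfile", type=none_or_str,
                        default="CONTROL_EA5",
                        help="The file with all CONTROL parameters.")
    return parser


# Parse an argument list like the one run.sh would assemble
args = build_parser().parse_args(["--start_date=None", "--date_chunk=3"])
print(args.start_date, args.date_chunk, args.controlfile)
```

With this approach a wrapper script can always pass every option, and the parser decides which of them count as unset.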