Using the CIF in Docker

A Docker image has been generated to easily run the CIF within a Docker container. This image allows new users to try academic examples with a Gaussian plume model. In principle, more complex cases with full-physics numerical transport models can also be run in the Docker container, but this has only been tested with the model CHIMERE so far, and it requires some extra knowledge of Docker to use datasets from your cluster.

What is Docker

Docker allows you to run applications inside a controlled pre-defined environment. It guarantees portability and reproducibility of your experiments.

Within the CIF, Docker images are used for automatic testing, as well as for dissemination. Academic cases can be run with almost no effort, which helps new users get accustomed to the CIF.

Installing Docker

You need Docker installed on your machine or cluster to proceed. Please find instructions for installing Docker here.

Depending on your permissions on your machine and on how Docker was installed, it may be recommended to install Docker in rootless mode, which enables Docker for non-root users.
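
As an illustration, the following commands are one possible way to install Docker and enable rootless mode on a recent Ubuntu host (refer to the official instructions linked above for your own system; the convenience script is only one of several supported installation methods):

# Install Docker using the official convenience script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optionally, set up rootless mode for the current non-root user
# (requires the uidmap package on Debian/Ubuntu)
sudo apt-get install -y uidmap
dockerd-rootless-setuptool.sh install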

What is inside the CIF Docker image?

The CIF Docker image is built on an Ubuntu Linux environment. It includes Python 3.9 and all the packages needed to run the CIF.

Fortran compilers and important Fortran libraries (MPI, NetCDF) are also included so that some transport models can be run.

Note

For developers, additional libraries may be required. You can extend the Dockerfile below accordingly to build your own Docker image. Please don't forget to share the updated image so that other users can run your model in the Docker.

# Build a Docker image that contains all requirements to run the PyCIF CI.
FROM ubuntu:20.04

RUN apt update \
    && DEBIAN_FRONTEND=noninteractive apt install -y software-properties-common rsync git wget \
    && apt update \
# Install Python
    && DEBIAN_FRONTEND=noninteractive apt install -y python3.9 python3-pip \
# Install GDAL
    && apt-get install -y gdal-bin libgdal-dev \
    && pip install numpy GDAL \
# Install important packages
    && DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential xmlstarlet gfortran libeccodes0 graphviz \
    && pip install scipy tox==3.23.1 sphinx sphinxcontrib-plantuml graphviz f90nml \
                   pyproj cython pandas matplotlib==3.3.4 bottleneck h5py xarray \
# Install MPI
    && apt-get install -y libopenmpi-dev openmpi-bin libhdf5-openmpi-dev \
# Install NetCDF
    && apt-get install -y libnetcdf-dev libnetcdff-dev;

# Install python packages to install pycif locally
RUN apt update \
    && apt-get install -y python3-yaml \
    && pip install pytest pytest-html pyyaml virtualenv==20.4.7 \
    && pip install numpy --upgrade ;

# Install pycif on the container
COPY ./ /tmp/CIF

WORKDIR /tmp/CIF/

RUN python3 setup.py develop

# Command to run at execution
COPY ./bin/docker_entrypoint.sh /bin/

# Create directories
RUN mkdir /workdir/ \
    && mkdir /config/;

ENTRYPOINT ["docker_entrypoint.sh"]

Download the CIF image

The CIF Docker image is stored on Docker Hub. The image is publicly available and can be downloaded using the command:

docker pull pycif/pycif-ubuntu:0.2
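
You can check that the image is now available locally with:

docker image ls pycif/pycif-ubuntu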

Other images are available for specific usages of the CIF, especially for some CTMs. The corresponding images can be found here.

Warning

Please note that image tags can change over time and may not be updated in this documentation. Be sure to visit the CIF Docker Hub page to check the latest image tag.

Please also look at the file .gitlab-ci.yml to check which Docker image is used with which test.

Running the CIF inside the image

The CIF Docker image can be used as a black box to run and test the CIF with configuration files from your machine.

An example script, bin/docker_debug.sh, is provided with the CIF sources. It can simply be used as follows:

cd CIF_root_dir/bin
./docker_debug.sh path_to_my_config_file.yml

The script contains the following lines:

#!/usr/bin/env bash

# Provide local path to your yaml configuration file
config_file=$1
pycif_root_dir=`pwd`/../
extra_volumns=$2

# Convert to absolute paths
config_file=`echo "$(cd "$(dirname "$config_file")"; pwd)/$(basename "$config_file")"`
pycif_root_dir=`echo "$(cd "$pycif_root_dir"; pwd)"`

# Fetch workdir to mount it in the container for saving outputs
workdir=`python3 -W ignore -c \
"
from pycif.utils.yml import ordered_load;
with open('$config_file', 'r') as f:
    config = ordered_load(f)

print('AAAAAAAAAAAA')
print(config['workdir']); "`

workdir=`echo $workdir | awk -F "AAAAAAAAAAAA" '{print $2}'`

mkdir -p $workdir

# Run the configuration into the container and writing outputs to workdir
docker run -it -v $config_file:/config/config.yml \
    -v $pycif_root_dir:/tmp/CIF/ \
    -v $workdir:/workdir/ \
    -v /home/aberchet/Projects/PYCIF_DATA_TEST/:/tmp/PYCIF_DATA_TEST/ \
    $extra_volumns \
    --entrypoint /bin/bash pycif/pycif-ubuntu:0.3
#    pycif/pycif-ubuntu:0.3
#    --entrypoint /bin/bash pycif/pycif-ubuntu:0.2


#pip install h5py
#python3 -m pycif /config/config.yml
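
Note that the script opens a bash shell inside the container (via the --entrypoint option). From there, pyCIF can be launched on the mounted configuration file, as suggested by the commented lines at the end of the script:

cd /tmp/CIF/
python3 -m pycif /config/config.yml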

We recommend testing the tutorials for the Gaussian plume model in the Docker to quickly get started with the CIF and explore outputs.

Running automatic tests inside the image

Automatic tests are run each time a git push is made to gitlab.in2p3.fr. It can happen that tests do not pass and that the returned exceptions are not obvious to solve. To explore the tests in a dynamic way, it is possible to run the automatic pyCIF tests manually inside a Docker container.

To do so, create a script with the following lines in ${CIF_root_dir}/bin, where ${PYCIF_DATA_TEST} is the path to the test data (see the note below for where to download the test data):

#!/usr/bin/env bash

# Provide local path to your yaml configuration file
pycif_root_dir=`pwd`/../
extra_volumns="-v ${PYCIF_DATA_TEST}:/tmp/PYCIF_DATA_TEST/"

# Run the configuration into the container and writing outputs to workdir
docker run \
    -v $pycif_root_dir:/tmp/CIF/ \
    $extra_volumns \
    -it --entrypoint /tmp/CIF/bin/tox_command.sh pycif/pycif-tm5:0.2

The script above calls another script, tox_command.sh, which must be created in ${CIF_root_dir}/bin:

#!/usr/bin/env bash

cd /tmp/CIF/
pip freeze
tox -e py38 -e coverage -- -m 'test_in_ci and dummy'

In the above example, the tests related to the Gaussian toy model will be run.

It is possible to enter the Docker container and explore the corresponding results after the run. This is especially useful if bugs are detected. To do so:

# First find the ID of the container in which the tests were run:
docker ps -a

# Then restart the container
docker start [container-id]

# Now enter the resurrected container
docker exec -it [container-id] /bin/bash
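
When you are done exploring, the container can be stopped again (and removed if it is no longer needed):

# Stop the container
docker stop [container-id]

# Optionally remove it
docker rm [container-id]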

Note

Tests are programmed for the Gaussian toy model, CHIMERE, FLEXPART and TM5.

Tests for TM5 should be run with the Docker image pycif/pycif-tm5:0.2.

Tests for all other models can be run with the Docker image pycif/pycif-ubuntu:0.2.

The tox keywords to run one model or the other are:

  1. Gaussian Toy model:

    tox -e py38 -e coverage -- -m 'test_in_ci and dummy'
    
  2. CHIMERE:

    tox -e py38 -e coverage -- -m 'test_in_ci and chimere'
    
  3. FLEXPART:

    tox -e py38 -e coverage -- -m 'test_in_ci and flexpart'
    
  4. TM5:

    tox -e py38 -e coverage -- -m 'test_in_ci and tm5'
    

External data are needed to run CHIMERE, FLEXPART and TM5. They can be downloaded at the following links: CHIMERE, FLEXPART, TM5, auxiliary data (e.g., EDGAR; needed for CHIMERE).

All external data should be extracted under a given directory that should be set as the environment variable PYCIF_DATA_TEST before running the tests.
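
For example, assuming the archives were downloaded to the current directory (archive names and paths below are purely illustrative), the data can be extracted and the variable exported as follows:

# Extract all downloaded archives into a single directory
mkdir -p ${HOME}/PYCIF_DATA_TEST
tar -xf chimere_test_data.tar.gz -C ${HOME}/PYCIF_DATA_TEST   # illustrative archive name

# Point the pyCIF tests to that directory
export PYCIF_DATA_TEST=${HOME}/PYCIF_DATA_TEST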

Further details on the tox commands used in the CIF and on which data is downloaded for which test can be found in the file .gitlab-ci.yml.

For complicated bugs, it is possible to enter the Python debugger inside tox. To do so, insert the following lines where you want to enter the debugger:

import pudb
pudb.set_trace()

To allow tox to enter the debugger, it is necessary to add the following lines to the file tox.ini at the root of the CIF repository, in the section [testenv], under the deps= definition (below numpy):

pudb
pytest-pudb
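
For illustration, the relevant part of tox.ini would then look something like this (the snippet below is a sketch, not an exact copy of the file):

[testenv]
deps =
    numpy
    pudb
    pytest-pudb
    ...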

File structure in the image

There are a few tweaks to know about setting up paths in the configuration file compared to running the CIF directly on your machine.

The working directory (workdir) you define in the Yaml must be valid on your host machine. A virtual link is established so that the Docker container can interact with the host.

All the other paths should be valid from the Docker point of view. That means, for instance, that paths to sources in the CIF should follow the local Docker directory tree:

Docker Container
├── bin
│   └── docker_entrypoint.sh
├── tmp
│   └── CIF
│       └── CIF sources
├── config
│   └── where your configuration file will be stored
└── workdir
    └── where your case will be run (then linked back to your local workdir)

In that case, paths to the CIF sources should start with /tmp/CIF/...

Using more complex cases with external data from the host

If one wants to go beyond simple cases, external data (meteorological files, emission data sets, observations, etc.) may be necessary and are not included in the Docker image.

To make data from the host available in the Docker container, one needs to modify bin/docker_debug.sh and add extra volumes to the docker run command.

In the docker run command, the option -v ${path_on_host}:${path_on_docker} mounts virtual volumes in the Docker container. Therefore, a data volume can be added as follows:

docker run -v $config_file:/config/config.yml \
    -v $pycif_root_dir:/tmp/CIF/ \
    -v $workdir:/workdir/ \
    -v ${path_to_my_data}:/data/ \
    -it pycif/pycif-ubuntu:0.1

With that example, please remember that the paths in your configuration file pointing to your data will not be translated automatically and need to be updated to /data/...
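
For instance, with the mount shown above, a file stored on the host as ${path_to_my_data}/my_observations.nc (file name purely illustrative) must be referred to in the Yaml from the container's point of view:

# Path on the host
${path_to_my_data}/my_observations.nc

# Corresponding path to use in the configuration file
/data/my_observations.nc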