Using the CIF in Docker

A Docker image is provided to make it easy to run the CIF inside a Docker container. This image lets new users try academic examples with a Gaussian plume model. In principle, more complex cases with full-physics numerical transport models could also run in the container, but this has only been tested with the CHIMERE model so far, and it requires some additional Docker knowledge to access datasets stored on your cluster.

What is Docker?

Docker allows you to run applications inside a controlled, pre-defined environment. This makes your experiments portable and easier to reproduce.

Within the CIF, Docker images are used for automatic testing as well as for dissemination. Academic cases can be run with almost no setup, so that new users can get accustomed to the CIF.

Installing Docker

Docker must be installed on your machine or cluster before you continue. See the Docker documentation for installation instructions.

Depending on your permissions and on how Docker was installed on your machine, it may be preferable to install Docker in rootless mode, which makes Docker usable by non-root users.
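
Before going further, it can help to confirm that the docker client is reachable from your shell. The sketch below is a minimal check only; the helper name have_cmd is hypothetical and not part of the CIF.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the CIF): succeed if a command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
    # Print the client version; this does not require a running daemon.
    docker --version
else
    echo "docker was not found on PATH; install it before continuing" >&2
fi
```

If the client is found, running `docker run hello-world` is a common way to confirm that the daemon can actually start containers.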

What is inside the CIF Docker image?

The CIF Docker image is built on an Ubuntu Linux environment. It includes Python 3.9 and all packages needed to run the CIF.

Fortran compilers and important Fortran libraries (MPI, NetCDF) are also included to be able to run some transport models.

Note

Developers may need additional libraries. In that case, you can extend the Dockerfile below to build your own Docker image. Please remember to share the updated image so that other users can run your model in Docker.

# Build a Docker image that contains all requirements to run the PyCIF CI.
FROM ubuntu:20.10

RUN apt update \
    && DEBIAN_FRONTEND=noninteractive apt install -y software-properties-common rsync git \
    && apt update \
# Install Python
    && DEBIAN_FRONTEND=noninteractive apt install -y python3.9 python3-pip \
# Install GDAL
    && apt-get install -y gdal-bin libgdal-dev \
    && pip install numpy GDAL \
# Install important packages
    && DEBIAN_FRONTEND=noninteractive apt-get install -y build-essential xmlstarlet gfortran libeccodes0 graphviz \
    && pip install scipy tox sphinx sphinxcontrib-plantuml graphviz \
                   pyproj cython numpy pandas bottleneck \
# Install MPI
    && apt-get install -y libopenmpi-dev openmpi-bin libhdf5-openmpi-dev \
# Install NetCDF
    && cd /tmp \
    && apt install -y wget \
    && wget https://www.unidata.ucar.edu/downloads/netcdf/ftp/netcdf-fortran-4.5.3.tar.gz \
    && tar -xf netcdf-fortran-4.5.3.tar.gz \
    && cd netcdf-fortran-4.5.3 \
    && ./configure --prefix=${NCDIR} --disable-netcdf-4 \
    && make install;

RUN apt update \
    && apt-get install -y python3-yaml \
    && pip install pytest pytest-html pyyaml;

# Install pycif on the container
COPY ./ /tmp/CIF

WORKDIR /tmp/CIF/

RUN python3 setup.py develop

# Command to run at execution
COPY ./bin/docker_entrypoint.sh /bin/

# Create directories
RUN mkdir /workdir/ \
    && mkdir /config/;

ENTRYPOINT ["docker_entrypoint.sh"]

Download the CIF image

The CIF Docker image is stored on Docker Hub. The image is publicly available and can be downloaded using the command:

docker pull pycif/pycif-ubuntu:0.1

Other images are available for specific usage of the CIF, especially for some CTMs. Corresponding images can be found here.

Running the CIF inside the image

The CIF Docker image can be used as a black box to run and test the CIF with configuration files from your machine.

An example script, bin/docker_debug.sh, is provided with the CIF sources. It can be used as follows:

cd CIF_root_dir/bin
./docker_debug.sh path_to_my_config_file.yml

The script contains the following lines:

#!/usr/bin/env bash

# Provide local path to your yaml configuration file
config_file=$1
pycif_root_dir=`pwd`/../
extra_volumns=$2

# Convert to absolute paths
config_file=`echo "$(cd "$(dirname "$config_file")"; pwd)/$(basename "$config_file")"`
pycif_root_dir=`echo "$(cd "$pycif_root_dir"; pwd)"`

# Fetch workdir to mount it in the container for saving outputs
workdir=`python3 -W ignore -c \
"
from pycif.utils.yml import ordered_load;
with open('$config_file', 'r') as f:
    config = ordered_load(f)

print('AAAAAAAAAAAA')
print(config['workdir']); "`

workdir=`echo $workdir | awk -F "AAAAAAAAAAAA" '{print $2}'`

mkdir -p $workdir

# Run the configuration into the container and writing outputs to workdir
docker run -it -v $config_file:/config/config.yml \
    -v $pycif_root_dir:/tmp/CIF/ \
    -v $workdir:/workdir/ \
    $extra_volumns \
    pycif/pycif-tm5:0.2
#    --entrypoint /bin/bash pycif/pycif-tm5:0.2
#    --entrypoint /bin/bash pycif/pycif-ubuntu:0.1
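
The script above reads workdir by importing pycif on the host, which assumes the CIF is installed there as well. As a possible alternative (this is not what docker_debug.sh does), the sketch below extracts the value with sed. It assumes that workdir is a top-level key written on a single unindented line, without quotes or a trailing comment; the function name get_workdir is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the value of the top-level "workdir" key.
# Assumes a single unindented line of the form "workdir: /some/path",
# with no quotes and no trailing comment. Trailing blanks are trimmed.
get_workdir() {
    sed -n 's/^workdir:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$1" \
        | head -n 1
}
```

Under those assumptions, the Python call could be replaced with `workdir=$(get_workdir "$config_file")`. A configuration that uses quoting, anchors, or a nested workdir would still need a real YAML parser.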

We recommend running the Gaussian plume model tutorials in the Docker container to get started quickly with the CIF and explore its outputs.

Running automatic tests inside the image

It is possible to run the automatic tests for pyCIF manually inside a Docker container.

To do so, create a script with the following lines in ${CIF_root_dir}/bin, where ${PYCIF_DATA_TEST} is the path to the test data:

#!/usr/bin/env bash

# Path to the CIF sources (run this script from ${CIF_root_dir}/bin)
pycif_root_dir="$(pwd)/../"
# Extra volume giving the container access to the test data
extra_volumes="-v ${PYCIF_DATA_TEST}:/tmp/PYCIF_DATA_TEST/"

# Run the tests in the container, with the CIF sources and test data mounted
docker run \
    -v "$pycif_root_dir":/tmp/CIF/ \
    $extra_volumes \
    -it --entrypoint /tmp/CIF/bin/tox_command.sh pycif/pycif-tm5:0.2

The script above calls a second script, tox_command.sh, which must also be created in ${CIF_root_dir}/bin and made executable:

#!/usr/bin/env bash

cd /tmp/CIF/
pip freeze
tox -e py38 -e coverage -- -m 'test_in_ci and dummy'

In the example above, the tests related to the Gaussian model are run.

After the run, it is possible to enter the Docker container and explore the corresponding results. This is especially useful when tracking down bugs. To do so:

# First find the ID of the container in which the tests were run:
docker ps -a

# Then restart the container
docker start [container-id]

# Now enter the restarted container
docker exec -it [container-id] /bin/bash

File structure in the image

There are a few things to know about setting paths in the configuration file, compared to running the CIF directly on your machine.

The working directory (workdir) you define in the YAML file must be valid on your host machine. It is mounted as a volume in the container, so outputs written by the container end up in that directory on the host.

All other paths must be valid from the container's point of view. That means, for instance, that paths to sources in the CIF should follow the directory tree inside the container:

Docker Container
├── bin
│   └── docker_entrypoint.sh
├── tmp
│   └── CIF
│       └── CIF sources
├── config
│   └── where your configuration file will be stored
└── workdir
    └── where your case will be run (then linked back to your local workdir)

In that case, paths to CIF sources should start with /tmp/CIF/...
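
The mapping between host paths and container paths can be made explicit with a small helper. The sketch below is illustrative only: to_container_path is a hypothetical function, and it assumes the CIF sources are mounted at /tmp/CIF as in the tree above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: translate a host path located under the CIF root
# into the path seen from inside the container (sources mounted at /tmp/CIF).
to_container_path() {
    local host_root="${1%/}"   # CIF root on the host, trailing slash removed
    local host_path="$2"       # host path to translate
    case "$host_path" in
        "$host_root")   printf '%s\n' "/tmp/CIF" ;;
        "$host_root"/*) printf '%s\n' "/tmp/CIF/${host_path#"$host_root"/}" ;;
        *)  echo "not under $host_root: $host_path" >&2; return 1 ;;
    esac
}
```

The function fails with a non-zero status for paths outside the CIF root, which is a reminder that such paths are not visible in the container unless they are mounted separately.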

Using more complex cases with external data from the host

To go beyond simple cases, external data (meteorological files, emission data sets, observations, etc.) may be necessary; these are not included in the Docker image.

To make data from the host available in the container, edit bin/docker_debug.sh and add extra volumes to the docker run command.

In the docker run command at the end of the script, the option -v ${path_on_host}:${path_on_docker} mounts a host directory as a volume in the container. A data volume can therefore be added as follows:

docker run -v $config_file:/config/config.yml \
    -v $pycif_root_dir:/tmp/CIF/ \
    -v $workdir:/workdir/ \
    -v ${path_to_my_data}:/data/ \
    -it pycif/pycif-ubuntu:0.1

With this example, remember that the paths in your configuration file that point to your data are not translated automatically and must be updated to /data/...
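
Updating data paths by hand is error-prone when a configuration contains many of them. One possible approach, sketched below, rewrites a host prefix to /data in a copy of the configuration file. The function name remap_data_paths and the example paths are hypothetical, and the sketch assumes the prefix contains neither the | character (used as the sed delimiter) nor regular-expression metacharacters.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy a configuration file, replacing every occurrence
# of a host data prefix by the container mount point /data.
# Assumes the prefix contains no "|" and no regular-expression metacharacters.
remap_data_paths() {
    local host_prefix="${1%/}"   # e.g. /home/me/data (trailing slash removed)
    local src="$2"               # original configuration file
    local dst="$3"               # rewritten copy to mount in the container
    sed "s|${host_prefix}/|/data/|g" "$src" > "$dst"
}
```

For instance, `remap_data_paths "${path_to_my_data}" config.yml config_docker.yml` would leave the original file untouched and produce a copy suitable for mounting at /config/config.yml. The workdir entry should not live under the remapped prefix, since it must stay valid on the host.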