Advanced users

Optimised performance using Numba (the use_numba flag)

tl;dr - should I set use_numba to True or False?

If performance is important (e.g. if running on HPC), set it to True. Otherwise, you are OK without.

If you need to debug your run, then keep use_numba False.
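
In runscript terms this boils down to one flag (a sketch, not a complete model setup; use_numba is the variable described above):

    # Fast production run (e.g. on HPC): compile the physics with Numba.
    use_numba = True

    # Debugging run: plain Python, with ordinary tracebacks.
    # use_numba = False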

Why Numba?

MONARCHS is written in Python for its ease of use and its portability across Windows, Mac and Linux, among other reasons. However, one drawback of Python is that it is slow compared to low-level languages such as C and Fortran. A compromise is to make use of Numba, a just-in-time (JIT) compiler for Python. This significantly narrows the performance gap with these languages, at the cost of somewhat more complex code.
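
As a minimal illustration (the function here is an arbitrary placeholder, not MONARCHS code), Numba compiles a decorated function to machine code the first time it is called:

    import numpy as np
    from numba import njit

    @njit
    def column_mean(values):
        # Compiled to machine code on first call; subsequent calls
        # run at near-C speed.
        total = 0.0
        for v in values:
            total += v
        return total / values.size

    column_mean(np.random.rand(1_000_000))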

In many cases, this is “free”. However, since we need hybrd from MINPACK to solve the heat equation, and the standard Python implementation (scipy.optimize.fsolve) is not Numba-compatible, we instead make use of NumbaMinpack, a Python library that calls MINPACK from compiled Fortran via Numba’s ctypes compatibility. This makes the resulting source code a little more complex.
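
For illustration, here is a minimal sketch of the NumbaMinpack calling pattern, following its documentation; the equations are arbitrary placeholders rather than the MONARCHS heat equation:

    import numpy as np
    from numba import njit, cfunc
    from NumbaMinpack import hybrd, minpack_sig

    # The residual function is compiled as a cfunc with minpack_sig, so
    # the Fortran hybrd routine can call it through a raw pointer.
    @cfunc(minpack_sig)
    def residuals(x, fvec, args):
        fvec[0] = x[0] ** 2 - args[0]
        fvec[1] = x[0] * x[1] - args[1]

    funcptr = residuals.address  # function pointer passed to hybrd

    @njit
    def solve():
        x_init = np.array([1.0, 1.0])
        args = np.array([4.0, 6.0])
        # sol contains the solution vector plus residual and
        # convergence information.
        sol = hybrd(funcptr, x_init, args)
        return sol

    print(solve())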

Numba also has good support for parallel Python via OpenMP-style threading, which we use in MONARCHS. Since the physics in a single column does not affect other columns, the column loop parallelises very efficiently.
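
A minimal sketch of this pattern, with step_column as a hypothetical stand-in for the single-column physics:

    import numpy as np
    from numba import njit, prange

    @njit
    def step_column(t):
        # Hypothetical stand-in for the single-column physics.
        return t + 0.1

    @njit(parallel=True)
    def timestep_all_columns(temps):
        # Columns are independent, so the outer loop parallelises
        # cleanly; prange compiles to an OpenMP-style parallel loop.
        for i in prange(temps.shape[0]):
            for j in range(temps.shape[1]):
                temps[i, j] = step_column(temps[i, j])
        return temps

    timestep_all_columns(np.zeros((50, 50)))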

What are the drawbacks?

This makes code development more complex, since Numba requires strict static typing and supports only a subset of built-in Python and numpy functions. Important libraries such as scipy are not yet supported. The code has been written to hide most of this where possible, but some design choices made during the development of MONARCHS are inelegant or un-Pythonic in order to accommodate Numba.

In this vein, feedback or suggestions on how to improve the readability of the MONARCHS source code are appreciated.

Additionally, Numba code is significantly harder to debug, since errors inside compiled code do not produce a normal Python traceback. A practical compromise is to run your code with Numba and the model dumping flags enabled; then, if the code crashes, rerun the model from this dump with parallel = False and use_numba = False in your runscript to debug.
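
As a sketch, a debug restart would set both flags in the runscript (use_numba and parallel are the variables named above; the exact mechanism for restarting from a dump depends on your model setup):

    # Restart from the dump serially and without Numba, so any crash
    # produces an ordinary Python traceback.
    use_numba = False
    parallel = False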

What parts of the code are actually different if using Numba?

  • timestep_loop, and all functions called by timestep_loop or deeper, are compiled with Numba’s jit function, which is equivalent to decorating them with the @jit decorator. This is controlled by the jit_modules function in monarchs.core.configuration.

  • The core building block of MONARCHS, the IceShelf class, is converted to a numba.experimental.jitclass, a Numba-compatible version of a standard Python class. Using a jitclass requires type specification, which can be found in the spec variable in core.iceshelf_class. If you add new variables to the MONARCHS code, ensure that you also add them to spec if you want the code to run with Numba. Arrays are specified using e.g. float64[:,:] for a 2D array of dtype float64 (see the sketch after this list).

  • The model grid is converted from a standard Python list to a numba.typed.List. This ensures type compatibility but makes little practical difference (also shown in the sketch after this list).

  • Different versions of the firn heat equation, lake surface energy balance, and lid heat equation/surface energy balance solver functions are selected, because the scipy and Numba implementations have different input formatting requirements. The non-Numba implementation solves these using scipy.optimize.fsolve; the Numba implementation uses the external library NumbaMinpack (developed by Nicholas Wogan) to call the Fortran MINPACK library’s hybrd function.
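
As referenced in the list above, here is a minimal sketch of the jitclass spec pattern and a numba.typed.List of instances. The class name and fields are illustrative placeholders, not the real IceShelf spec:

    import numpy as np
    from numba import float64, int32
    from numba.experimental import jitclass
    from numba.typed import List

    # Every attribute must be declared with its Numba type.
    # float64[:, :] declares a 2D array of dtype float64.
    spec = [
        ('firn_depth', float64),
        ('firn_temperature', float64[:]),    # 1D float64 array
        ('meltwater_grid', float64[:, :]),   # 2D float64 array
        ('num_layers', int32),
    ]

    @jitclass(spec)
    class Column:
        def __init__(self, depth, n_layers):
            self.firn_depth = depth
            self.num_layers = n_layers
            self.firn_temperature = np.full(n_layers, 260.0)
            self.meltwater_grid = np.zeros((n_layers, n_layers))

    # The model grid analogue: a numba.typed.List of jitclass instances.
    grid = List()
    for _ in range(4):
        grid.append(Column(40.0, 10))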

Are there any differences between the two versions?

The model flow and algorithms are exactly the same in the non-Numba and Numba versions. However, there may be some small differences in the numerics, which can accumulate over time. In general, the output of the two versions is extremely close, with no significant divergence observed over multi-year runs.

Running with MPI

It is possible to run MONARCHS across multiple nodes on HPC systems using MPI. This is achieved without significant MPI code by using an mpi4py.futures pool, which spawns MPI processes as and when they are needed, much like a multiprocessing Pool, and keeps the code largely free of complex MPI directives and boilerplate. Even on single-node systems, this does not add significant overhead compared to using multiprocessing.
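
A sketch of this pattern with mpi4py.futures, assuming a hypothetical step_column stand-in for the per-column work (MONARCHS wires this up internally):

    from mpi4py.futures import MPIPoolExecutor

    def step_column(column_id):
        # Hypothetical stand-in for the per-column model physics.
        return column_id ** 2

    if __name__ == '__main__':
        # Launch with: mpirun -n 1 python this_script.py
        # Workers are spawned on demand, much like multiprocessing.Pool.
        with MPIPoolExecutor(max_workers=8) as pool:
            results = list(pool.map(step_column, range(100)))
        print(results[:5])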

Currently, it is not possible to use MPI together with Numba. Support for this is a work in progress, but it will not be ready on release.

Running MONARCHS with MPI may differ slightly from how you have used MPI in the past, because of this pool/spawning approach. You should launch only a single MPI process, i.e. do:

mpirun -n 1 python run_MONARCHS.py -i <model_setup_path>

The number of MPI processes is controlled by the model setup variable cores. If running on HPC, you likely want to use cores = 'all' to ensure that you use all of the cores you have available in your job.
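
For example, in the model setup (a sketch; cores is the setup variable named above):

    # Spawn one MPI worker per available core in the job; MONARCHS
    # handles the process spawning itself.
    cores = 'all'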

Attempting to run mpirun with more than one process will result in several processes each running the whole code, with each of the N processes you create attempting to spawn its own <cores> worker processes.

Currently this implementation does not work on ARCHER2 - this is a known issue and is being worked on.