Parallelization

socca supports parallel likelihood evaluation to speed up sampling runs. Two parallelization modes are available: multi-process parallelization on a single machine, and MPI-based distributed parallelization across multiple nodes. Both modes are supported by the nautilus, dynesty, and pocomc sampling backends.

Important

Parallelization is most beneficial when the likelihood evaluation is computationally expensive (e.g., involving large images, complex models, or PSF convolution on fine grids). If the likelihood is fast to evaluate, the overhead of inter-process communication can outweigh the speedup, resulting in slower runs than serial execution. This is especially true for dynesty and pocomc, where the sampler frequently synchronizes with the worker pool. nautilus is less affected due to its batch-oriented point proposal, but even in that case the gains diminish for cheap likelihoods, resulting in a sub-linear scaling with the number of processes/cores.

As a rule of thumb, parallelization pays off when a single likelihood call takes at least a few milliseconds.

Multi-process parallelization

The simplest way to parallelize a sampling run is to pass the ncores argument to the run() method. This spawns a pool of worker processes on the local machine, each evaluating the likelihood function independently:

>>> fit = socca.fitter(img=img, mod=mod)
>>> fit.run(method='nautilus', ncores=4)

Under the hood, this creates a MultiPool from socca.pool.mp, using loky for robust process management. Workers do not re-import __main__, so no if __name__ == "__main__": guard should be needed in user scripts.

The ncores argument is accepted by all three sampling backends:

>>> fit.run(method='nautilus', ncores=4)
>>> fit.run(method='dynesty', ncores=4, nlive=500)
>>> fit.run(method='pocomc', ncores=4)

Alternatively, you can pass a pre-built pool object via the pool argument. An integer value for pool is treated identically to ncores:

>>> fit.run(method='nautilus', pool=4)  # equivalent to ncores=4

Please note that specifying ncores and pool at the same time will raise a ValueError.

MPI parallelization

For distributed computing across multiple nodes (e.g., on HPC clusters), socca supports MPI-based parallelization via mpi4py. MPI parallelization is activated automatically when the script is launched with mpirun or mpiexec:

mpirun -np 8 python my_fit_script.py

No changes to the Python script are required. socca detects the MPI environment by checking for standard environment variables set by common MPI launchers (Open MPI, MPICH, Intel MPI, SLURM). When multiple MPI ranks are detected, an MPIPool is created automatically and the likelihood evaluation is distributed across all ranks.

The rank-0 process (master) drives the sampler, while worker ranks enter a loop that receives batches of likelihood evaluations. After sampling completes, the results (samples, weights, logw, logz, logz_prior) are broadcast to all ranks.

Installing MPI support

MPI support requires mpi4py:

pip install mpi4py

If mpi4py is not installed but the process is launched under an MPI environment, socca will raise an ImportError with installation instructions.

Sampler compatibility

While all three backends support MPI, nautilus is the recommended choice for distributed runs. Dynesty’s sequential point proposal does not benefit as much from MPI parallelization, and a warning is printed when this combination is detected:

Dynesty might not benefit from MPI parallelization due to its sequential
point proposal. Consider using method='nautilus' for MPI runs.

NumPyro NUTS

The numpyro backend does not support MPI parallelization (an error is raised if attempted). However, the ncores or pool arguments can be passed to control the number of parallel MCMC chains:

>>> fit.run(method='numpyro', ncores=4)  # runs 4 chains in parallel

MPI-safe post-processing

When running under MPI, several post-processing methods are restricted to rank 0 to avoid redundant output or file conflicts. These include parameters(), dump(), savemodel(), and bmc(). On non-root ranks, these methods return None silently.

Summary

Method

Multi-process

MPI

Notes

'nautilus'

ncores / pool

automatic

Recommended for MPI runs

'dynesty'

ncores / pool

automatic

MPI supported, but might not be efficient

'pocomc'

ncores / pool

automatic

MPI supported, but might not be efficient

'numpyro'

ncores (chains)

not supported

ncores sets number of chains

'optimizer'

not supported

not supported

Single-process only