Parallelization¶

socca supports parallel likelihood evaluation to speed up sampling runs. Two parallelization modes are available: multi-process parallelization on a single machine, and MPI-based distributed parallelization across multiple nodes. Both modes are supported by the nautilus, dynesty, and pocomc sampling backends.

Important

Parallelization is most beneficial when the likelihood evaluation is computationally expensive (e.g., involving large images, complex models, or PSF convolution on fine grids). If the likelihood is fast to evaluate, the overhead of inter-process communication can outweigh the speedup, resulting in slower runs than serial execution. This is especially true for dynesty and pocomc, where the sampler frequently synchronizes with the worker pool. nautilus is less affected due to its batch-oriented point proposal, but even in that case the gains diminish for cheap likelihoods, resulting in a sub-linear scaling with the number of processes/cores.

As a rule of thumb, parallelization pays off when a single likelihood call takes at least a few milliseconds.

Multi-process parallelization¶

The simplest way to parallelize a sampling run is to pass the ncores argument to the run() method. This spawns a pool of worker processes on the local machine, each evaluating the likelihood function independently:

>>> fit = socca.fitter(img=img, mod=mod)
>>> fit.run(method='nautilus', ncores=4)

Under the hood, this creates a MultiPool from socca.pool.mp, using loky for robust process management. Workers do not re-import __main__, so no if __name__ == "__main__": guard should be needed in user scripts.

The ncores argument is accepted by all three sampling backends:

>>> fit.run(method='nautilus', ncores=4)
>>> fit.run(method='dynesty', ncores=4, nlive=500)
>>> fit.run(method='pocomc', ncores=4)

Alternatively, you can pass a pre-built pool object via the pool argument. An integer value for pool is treated identically to ncores:

>>> fit.run(method='nautilus', pool=4)  # equivalent to ncores=4

Please note that specifying ncores and pool at the same time will raise a ValueError.

MPI parallelization¶

For distributed computing across multiple nodes (e.g., on HPC clusters), socca supports MPI-based parallelization via mpi4py. MPI parallelization is activated automatically when the script is launched with mpirun or mpiexec:

mpirun -np 8 python my_fit_script.py

No changes to the Python script are required. socca detects the MPI environment by checking for standard environment variables set by common MPI launchers (Open MPI, MPICH, Intel MPI, SLURM). When multiple MPI ranks are detected, an MPIPool is created automatically and the likelihood evaluation is distributed across all ranks.

The rank-0 process (master) drives the sampler, while worker ranks enter a loop that receives batches of likelihood evaluations. After sampling completes, the results (samples, weights, logw, logz, logz_prior) are broadcast to all ranks.

Installing MPI support¶

MPI support requires mpi4py:

pip install mpi4py

If mpi4py is not installed but the process is launched under an MPI environment, socca will raise an ImportError with installation instructions.

Sampler compatibility¶

While all three backends support MPI, nautilus is the recommended choice for distributed runs. Dynesty’s sequential point proposal does not benefit as much from MPI parallelization, and a warning is printed when this combination is detected:

Dynesty might not benefit from MPI parallelization due to its sequential
point proposal. Consider using method='nautilus' for MPI runs.

JAX vectorization (pocomc only)¶

For pocomc, an alternative to process-based parallelization is JAX vectorization via vectorize=True. Instead of dispatching one likelihood call per particle to a pool of workers, the entire batch of active particles is evaluated in a single jax.vmap-compiled call:

>>> fit.run(method='pocomc', vectorize=True)

This eliminates the Python-level overhead of N separate likelihood dispatches and leverages SIMD instructions (or GPU parallelism if available), without spawning any worker processes. It is particularly effective when the likelihood is a JAX-compiled function, as in socca.

Important

vectorize=True is incompatible with pool/ncores and with MPI. pocomc’s internal vectorization path bypasses the pool entirely, so combining them raises a ValueError. Use one or the other:

Single node, JAX model → vectorize=True
Multi-node cluster → MPI (without vectorize)

NumPyro NUTS¶

The numpyro backend does not support MPI parallelization (an error is raised if attempted). However, the ncores or pool arguments can be passed to control the number of parallel MCMC chains:

>>> fit.run(method='numpyro', ncores=4)  # runs 4 chains in parallel

MPI-safe post-processing¶

When running under MPI, several post-processing methods are restricted to rank 0 to avoid redundant output or file conflicts. These include parameters(), dump(), savemodel(), and bmc(). On non-root ranks, these methods return None silently.

Summary¶

Method	Multi-process	MPI	Notes
`'nautilus'`	`ncores` / `pool`	automatic	Recommended for MPI runs
`'dynesty'`	`ncores` / `pool`	automatic	MPI supported, but might not be efficient
`'pocomc'`	`ncores` / `pool`	automatic	MPI supported, but might not be efficient; also supports `vectorize=True` (incompatible with pool/MPI)
`'numpyro'`	`ncores` (chains)	not supported	`ncores` sets number of chains
`'optimizer'`	not supported	not supported	Single-process only