socca.fitting.methods¶
Sampling methods for Bayesian inference.
This module provides various backends for posterior sampling:
Nested Sampling¶
nautilus: Neural network-accelerated nested sampling
dynesty: Nested sampling
Monte Carlo Sampling¶
pocomc: Preconditioned Monte Carlo sampling
emcee: Affine-invariant ensemble MCMC
numpyro: NUTS Hamiltonian Monte Carlo
Optimization¶
optimizer: Maximum a posteriori optimization
- socca.fitting.methods.run_nautilus(self, log_likelihood, log_prior, prior_transform, checkpoint, resume, getzprior, **kwargs)[source]¶
Run nested sampling using the Nautilus sampler.
Performs Bayesian parameter estimation using neural network-accelerated nested sampling via the Nautilus package. Supports checkpointing, resuming, and optional prior evidence computation.
- Parameters:
log_likelihood (callable) – Function that computes the log-likelihood given parameters.
log_prior (callable) – Function that computes the log-prior given parameters.
prior_transform (callable) – Function that transforms unit hypercube to parameter space.
checkpoint (str or None) – Path to checkpoint file for saving/resuming the sampler state.
resume (bool) – If True and checkpoint file exists, resume from saved state.
getzprior (bool) – If True, run a second nested sampling to estimate the prior evidence for Bayesian model comparison with prior deboosting.
**kwargs (dict) –
Additional keyword arguments passed to nautilus.Sampler and its run method. Common options include:
nlive/n_live : int, number of live points (default: 1000)
flive/f_live : float, stopping criterion (default: 0.01)
discard_exploration : bool, discard exploration phase samples (default: True)
Set Attributes
sampler (nautilus.Sampler) – The main Nautilus sampler object.
samples (ndarray) – Posterior samples from nested sampling.
logw (ndarray) – Log-weights for each sample.
weights (ndarray) – Normalized importance weights for each sample.
logz (float) – Log-evidence (marginal likelihood) estimate.
sampler_prior (nautilus.Sampler or None) – Prior sampler object if getzprior=True, else None.
logz_prior (float or None) – Prior evidence if getzprior=True, else None.
Notes
Nautilus uses neural networks to learn the iso-likelihood contours, making it efficient for high-dimensional problems. The method prints elapsed time after completion.
References
Lange, J. U., MNRAS, 525, 3181 (2023)
Nautilus documentation: https://nautilus-sampler.readthedocs.io/en/latest/
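The prior_transform callable passed to this method maps points from the unit hypercube to parameter space via the inverse CDF of each prior. A minimal sketch (the two parameters and their priors are purely illustrative, not part of socca):

```python
from statistics import NormalDist

def prior_transform(u):
    """Map a point u in the unit hypercube [0, 1]^2 to parameter space.

    Illustrative priors: amplitude ~ Uniform(0, 10),
    offset ~ Normal(0, 1) via the inverse normal CDF.
    """
    amplitude = 10.0 * u[0]                      # Uniform(0, 10)
    offset = NormalDist(0.0, 1.0).inv_cdf(u[1])  # Normal(0, 1)
    return [amplitude, offset]
```

The same transform shape is expected by the Dynesty backend below.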
- socca.fitting.methods.run_dynesty(self, log_likelihood, log_prior, prior_transform, checkpoint, resume, getzprior, **kwargs)[source]¶
Run nested sampling using the Dynesty sampler.
Performs Bayesian parameter estimation using nested sampling via the Dynesty package. Supports checkpointing, resuming, and optional prior evidence computation.
- Parameters:
log_likelihood (callable) – Function that computes the log-likelihood given parameters.
log_prior (callable) – Function that computes the log-prior given parameters.
prior_transform (callable) – Function that transforms unit hypercube to parameter space.
checkpoint (str or None) – Path to checkpoint file for saving/resuming the sampler state. If None, no checkpointing is performed.
resume (bool) – If True and checkpoint file exists, resume from saved state.
getzprior (bool) – If True, run a second nested sampling to estimate the prior evidence for Bayesian model comparison with prior deboosting.
**kwargs (dict) –
Additional keyword arguments passed to dynesty.NestedSampler and its run_nested method. Common options include:
nlive : int, number of live points (default: 1000)
dlogz : float, stopping criterion (default: 0.01)
Set Attributes
sampler (dynesty.NestedSampler) – The main Dynesty sampler object.
samples (ndarray) – Posterior samples from nested sampling.
weights (ndarray) – Importance weights for each sample.
logz (float) – Log-evidence (marginal likelihood) estimate.
sampler_prior (dynesty.NestedSampler or None) – Prior sampler object if getzprior=True, else None.
logz_prior (float or None) – Prior evidence if getzprior=True, else None.
Notes
The method automatically extracts valid kwargs for NestedSampler and run_nested based on their function signatures.
References
Speagle, J. S., MNRAS, 493, 3132 (2020)
Dynesty documentation: https://dynesty.readthedocs.io/en/v3.0.0/
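The signature-based kwarg routing described in the Notes can be sketched with inspect.signature; the two stub functions below stand in for dynesty.NestedSampler and its run_nested method and are not socca's actual code:

```python
import inspect

def split_kwargs(func, kwargs):
    """Return the subset of kwargs accepted by func's signature."""
    valid = inspect.signature(func).parameters
    return {k: v for k, v in kwargs.items() if k in valid}

# Stand-ins for dynesty.NestedSampler / run_nested (illustrative only).
def nested_sampler(loglike, ptform, ndim, nlive=500): ...
def run_nested(dlogz=0.01, maxiter=None): ...

kwargs = {"nlive": 1000, "dlogz": 0.05}
init_kw = split_kwargs(nested_sampler, kwargs)  # routed to the constructor
run_kw = split_kwargs(run_nested, kwargs)       # routed to run_nested
```

This lets a single **kwargs dict serve both the sampler constructor and its run method without manual bookkeeping.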
- socca.fitting.methods.run_pocomc(self, log_likelihood, log_prior, checkpoint, resume, getzprior, **kwargs)[source]¶
Run preconditioned Monte Carlo sampling using the pocoMC sampler.
Performs Bayesian parameter estimation using preconditioned Monte Carlo sampling via the pocoMC package. Supports checkpointing, resuming, and optional prior evidence computation.
- Parameters:
log_likelihood (callable) – Function that computes the log-likelihood given parameters.
log_prior (callable) – Function that computes the log-prior given parameters.
checkpoint (str or None) – Path prefix for checkpoint files. If None and resume=True, defaults to “run”. Checkpoint files are saved in a directory named “{checkpoint}_pocomc_dump”.
resume (bool) – If True, resume from the latest saved state in the checkpoint directory.
getzprior (bool) – If True, run a second nested sampling to estimate the prior evidence for Bayesian model comparison with prior deboosting.
**kwargs (dict) –
Additional keyword arguments passed to pocomc.Sampler and its run method. Common options include:
nlive/n_live/n_effective : int, effective sample size (default: 1000)
n_active : int, number of active particles (default: nlive // 2)
save_every : int, save state every N iterations (default: 10)
seed : int, random seed (default: 0)
Set Attributes
sampler (pocomc.Sampler) – The main pocoMC sampler object.
samples (ndarray) – Posterior samples from nested sampling.
logw (ndarray) – Log-weights for each sample.
weights (ndarray) – Normalized importance weights for each sample.
logz (float) – Log-evidence (marginal likelihood) estimate.
sampler_prior (pocomc.Sampler or None) – Prior sampler object if getzprior=True, else None.
logz_prior (float or None) – Prior evidence if getzprior=True, else None.
Notes
pocoMC uses normalizing flows for preconditioned sampling, making it efficient for complex, multimodal posteriors. Prior distributions must be NumPyro distributions.
References
Karamanis, M., Beutler, F., Peacock, J. A., Nabergoj, D., Seljak, U., MNRAS, 516, 1644 (2022)
Karamanis, M., Nabergoj, D., Beutler, F., Peacock, J. A., Seljak, U., arXiv:2207.05660 (2022)
pocoMC documentation: https://pocomc.readthedocs.io/en/latest/
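The checkpoint-path convention described above can be sketched as a small helper; this is a hypothetical function mirroring the documented behavior, including the assumption that checkpoint=None with resume=False means no checkpointing at all:

```python
def pocomc_checkpoint_dir(checkpoint, resume):
    """Resolve the pocoMC dump directory per the documented convention.

    Hypothetical helper: a None checkpoint with resume=True falls back
    to the prefix "run"; files land in "{checkpoint}_pocomc_dump".
    """
    if checkpoint is None:
        if not resume:
            return None  # assumption: no checkpointing in this case
        checkpoint = "run"
    return f"{checkpoint}_pocomc_dump"
```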
- socca.fitting.methods.run_numpyro(self, log_likelihood, **kwargs)[source]¶
Run Hamiltonian Monte Carlo sampling using NumPyro’s NUTS.
Performs Bayesian parameter estimation using the No-U-Turn Sampler (NUTS), a variant of Hamiltonian Monte Carlo (HMC), via the NumPyro package.
- Parameters:
log_likelihood (callable) – Function that computes the log-likelihood given parameters.
**kwargs (dict) –
Additional keyword arguments passed to numpyro.infer.NUTS and numpyro.infer.MCMC. Common options include:
n_warmup/nwarmup/num_warmup : int, number of warmup iterations (default: 1000)
n_samples/nsamples/num_samples : int, number of posterior samples (default: 2000)
seed : int, random seed (default: 0)
Set Attributes
samples (ndarray) – Posterior samples from MCMC, shape (n_samples, n_params).
weights (ndarray) – Uniform weights (all ones) since MCMC samples are unweighted.
Notes
NUTS automatically tunes step size and number of leapfrog steps during the warmup phase. This method does not compute evidence, so it cannot be used for Bayesian model comparison. The method requires parameter priors to be NumPyro distributions.
References
numpyro documentation: https://num.pyro.ai/en/stable/
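Several backends above accept multiple spellings for the same option (e.g. n_warmup/nwarmup/num_warmup). The alias resolution can be sketched as follows; resolve_alias is a hypothetical helper, not socca's actual code:

```python
def resolve_alias(kwargs, aliases, default):
    """Return the value of the first alias present in kwargs, else default."""
    for name in aliases:
        if name in kwargs:
            return kwargs[name]
    return default

kw = {"nwarmup": 500, "seed": 42}
num_warmup = resolve_alias(kw, ("n_warmup", "nwarmup", "num_warmup"), 1000)
num_samples = resolve_alias(kw, ("n_samples", "nsamples", "num_samples"), 2000)
```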
- socca.fitting.methods.run_emcee(self, log_likelihood, log_prior, checkpoint, resume, **kwargs)[source]¶
Run ensemble MCMC sampling using the emcee package.
Performs Bayesian parameter estimation using the affine-invariant ensemble sampler from the emcee package. Supports checkpointing via HDF5 backends and parallelization via MPI or multiprocessing.
- Parameters:
log_likelihood (callable) – Function that computes the log-likelihood given parameters.
log_prior (callable) – Function that computes the log-prior given parameters.
checkpoint (str or None) – Path to the HDF5 backend file for checkpointing. If None and resume=True, defaults to “run.hdf5”.
resume (bool) – If True, resume from the checkpoint file.
**kwargs (dict) –
Additional keyword arguments passed to emcee.EnsembleSampler and its run_mcmc method. Common options include:
nwalkers : int, number of walkers (default: 2 * ndim)
nsteps/n_steps/num_steps : int, number of MCMC steps (default: 5000). Acts as a maximum when converge=True.
discard/nburn/n_burn : int, number of burn-in steps to discard (default: 0). When converge=True and not set, auto-set to 2 * max(tau).
thin : int, thinning factor (default: 1)
seed : int, random seed (default: None)
converge : bool, if True run until convergence based on autocorrelation time estimates (default: False)
check_every : int, check convergence every N steps (default: 100)
tau_factor : float, require chain length > tau_factor * tau for convergence (default: 50)
tau_rtol : float, relative tolerance for tau stability (default: 0.01)
thin_factor : float, thinning factor applied to tau when checking convergence (default: 0.50)
discard_factor : float, factor multiplied by tau to determine burn-in discard when converge=True (default: 2.0)
Set Attributes
sampler (emcee.EnsembleSampler) – The emcee sampler object.
samples (ndarray) – Posterior samples, shape (n_samples, n_params).
weights (ndarray) – Uniform weights (all ones) since MCMC samples are unweighted.
tau (ndarray or None) – Integrated autocorrelation time per parameter, if computed.
Notes
emcee uses an affine-invariant ensemble of walkers to explore the posterior. It does not compute evidence, so it cannot be used for Bayesian model comparison. Initial walker positions are drawn from the prior distributions.
When converge=True, the sampler runs in batches and checks the integrated autocorrelation time after each batch. Convergence is declared when (1) the chain is longer than tau_factor * tau for all parameters, and (2) the tau estimate has stabilized (relative change < tau_rtol). The burn-in (discard) is then auto-set to 2 * max(tau) unless explicitly provided.
References
Foreman-Mackey, D., Hogg, D. W., Lang, D., Goodman, J., PASP, 125, 306 (2013)
emcee documentation: https://emcee.readthedocs.io/en/stable/
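The convergence rule described in the Notes can be sketched as a pure function of the chain length and two successive tau estimates; this is a sketch of the documented criterion, not socca's actual implementation:

```python
def is_converged(nsteps, tau, tau_prev, tau_factor=50, tau_rtol=0.01):
    """Check the two documented conditions for every parameter:

    (1) the chain is longer than tau_factor * tau;
    (2) the tau estimate has stabilized (relative change < tau_rtol).
    """
    long_enough = all(nsteps > tau_factor * t for t in tau)
    stable = all(abs(t - tp) / t < tau_rtol for t, tp in zip(tau, tau_prev))
    return long_enough and stable
```

With tau estimates of [40, 50] from a 6000-step chain and a previous estimate of [40.2, 50.1], both conditions hold at the defaults; at 2000 steps the length condition fails.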
- socca.fitting.methods.run_optimizer(self, pinits, **kwargs)[source]¶
Run maximum likelihood optimization using scipy.optimize.
Finds the maximum likelihood estimate (MLE) or maximum a posteriori (MAP) estimate using L-BFGS-B optimization with automatic differentiation via JAX.
- Parameters:
pinits (array_like, str) –
Initial parameter values. Can be:
array_like : specific initial values in parameter space (will be transformed to unit hypercube internally)
“median” : start from the median of each prior (0.5 in unit hypercube)
“random” : start from random values in unit hypercube
**kwargs (dict) –
Additional keyword arguments passed to scipy.optimize.minimize. Common options include:
tol : float, tolerance for termination
options : dict, solver-specific options
Set Attributes
results (scipy.optimize.OptimizeResult) –
Optimization result object containing:
x : optimal parameters in unit hypercube
fun : negative log-likelihood at optimum
success : whether optimization succeeded
message : description of termination cause
- Raises:
ValueError – If pinits is a string other than “median” or “random”.
Notes
The optimization is performed in the unit hypercube space with bounds [0, 1] for each parameter. The objective function is the negative log-likelihood, and gradients are computed automatically using JAX. The L-BFGS-B method is used for box-constrained optimization.
References
Byrd, R. H., Lu, P., Nocedal, J., SIAM J. Sci. Comput., 16, 1190 (1995)
Zhu, C., Byrd, R. H., Nocedal, J., ACM TOMS, 23, 550 (1997)
scipy.optimize.minimize documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
L-BFGS-B documentation: https://docs.scipy.org/doc/scipy/reference/optimize.minimize-lbfgsb.html#optimize-minimize-lbfgsb
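The handling of pinits described above can be sketched as follows; initial_hypercube_point and to_unit_cube are hypothetical names standing in for socca's internal logic and parameter-space-to-hypercube transform:

```python
import random

def initial_hypercube_point(pinits, ndim, to_unit_cube=None):
    """Resolve pinits into a starting point in the unit hypercube.

    Sketch of the documented options: "median" maps to 0.5 per
    parameter, "random" draws uniformly in [0, 1), and array_like
    inputs go through the (hypothetical) to_unit_cube transform.
    """
    if isinstance(pinits, str):
        if pinits == "median":
            return [0.5] * ndim
        if pinits == "random":
            return [random.random() for _ in range(ndim)]
        raise ValueError("pinits must be 'median', 'random', or array_like")
    return to_unit_cube(pinits)
```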
Nested Sampling¶
nautilus¶
Nautilus neural network-accelerated nested sampling backend.
dynesty¶
Dynesty nested sampling backend.
Monte Carlo Sampling¶
pocomc¶
pocoMC preconditioned Monte Carlo sampling backend.
emcee¶
emcee affine-invariant ensemble MCMC backend.
numpyro¶
NumPyro NUTS Hamiltonian Monte Carlo backend.
Optimization¶
optimizer¶
L-BFGS-B maximum likelihood and MAP optimization backend.
Utilities¶
Utility functions for sampler output processing.
- socca.fitting.methods.utils.get_imp_weights(logw, logz=None)[source]¶
Compute importance weights from log-weights and log-evidence.
Converts log-weights to normalized importance weights using the log-evidence for numerical stability. The weights are normalized such that they sum to 1.0.
- Parameters:
logw (array_like) – Log-weights from importance-weighted sampling.
logz (float or array_like, optional) – Log-evidence value(s). If None, uses the maximum log-weight. If not None and not iterable, converts to a single-element list. Default is None.
- Returns:
weights – Normalized importance weights in linear space.
- Return type:
ndarray
Notes
The importance weights are computed as:
\[w_i = \exp\left[(\log w_i - \log Z) - \log\sum_j \exp(\log w_j - \log Z)\right]\]
where \(\log Z\) is the log-evidence (logz[-1]).
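A pure-Python sketch of this normalization, treating logz as a scalar for simplicity (the documented behavior uses logz[-1] when given a sequence, and falls back to the maximum log-weight when logz is None):

```python
import math

def imp_weights(logw, logz=None):
    """Normalized importance weights from log-weights.

    Subtracting a reference value (logz, or max(logw) when logz is
    None) before exponentiating avoids overflow; the result is then
    normalized to sum to 1.
    """
    ref = max(logw) if logz is None else logz
    w = [math.exp(lw - ref) for lw in logw]
    total = sum(w)
    return [wi / total for wi in w]
```

For example, log-weights [0, log 3] normalize to [0.25, 0.75] regardless of the reference value, since the normalization cancels any constant shift.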