openscm_calibration.emcee_utils#

Helpers for emcee

Classes:

Name	Description
`AutoCorrelationInfo`	Information about auto-correlation within an MCMC chain
`ChainProgressInfo`	Information about the progress of MCMC chains

Functions:

Name	Description
`get_acceptance_fractions`	Get acceptance fraction in each chain of an MCMC ensemble of chains
`get_autocorrelation_info`	Get info about autocorrelation in chains
`get_labelled_chain_data`	Get labelled chain data
`get_neg_log_prior`	Get a function for calculating the negative log prior
`get_start`	Get starting point for emcee sampling
`neg_log_info`	Get negative log probability and likelihood information

AutoCorrelationInfo #

Information about auto-correlation within an MCMC chain

Methods:

Name	Description
`any_non_nan_tau`	Check if any of the tau information is not nan

Attributes:

Name	Type	Description
`converged`	`NDArray[bool]`	Whether, based on `convergence_ratio`, the chains for each parameter have converged
`convergence_ratio`	`float`	Convergence ratio used to assess convergence.
`steps_post_burnin`	`int`	The number of steps in the chains post burn-in.
`tau`	`NDArray[float64]`	Auto-correlation for each parameter

Source code in src/openscm_calibration/emcee_utils.py

@define
class AutoCorrelationInfo:
    """Information about auto-correlation within an MCMC chain"""

    steps_post_burnin: int
    """The number of steps in the chains post burn-in."""

    tau: nptype.NDArray[np.float64]
    """
    Auto-correlation for each parameter

    I.e. the first value is the auto-correlation for the first parameter,
    second value is the auto-correlation for the second parameter,
    ...
    nth value is the auto-correlation for the nth parameter,
    ...
    """

    convergence_ratio: float
    """
    Convergence ratio used to assess convergence.

    For details, see
    [`get_autocorrelation_info`][openscm_calibration.emcee_utils.get_autocorrelation_info].
    """

    converged: nptype.NDArray[np.bool]
    """
    Whether, based on `convergence_ratio`, the chains for each parameter have converged

    The first value is whether the chain for the first parameter has converged,
    the second value is whether the chain for the second parameter has converged,
    ...
    nth value is whether the chain for the nth parameter has converged,
    ...
    """

    def any_non_nan_tau(self) -> bool:
        """
        Check if any of the tau information is not nan

        Returns
        -------
        :
            Whether any of the tau values are non-nan
        """
        return bool(np.any(np.logical_not(np.isnan(self.tau))))

converged `instance-attribute` #

converged: NDArray[bool]

Whether, based on convergence_ratio, the chains for each parameter have converged

The first value is whether the chain for the first parameter has converged, the second value is whether the chain for the second parameter has converged, ... nth value is whether the chain for the nth parameter has converged, ...

convergence_ratio `instance-attribute` #

convergence_ratio: float

Convergence ratio used to assess convergence.

For details, see get_autocorrelation_info.

steps_post_burnin `instance-attribute` #

steps_post_burnin: int

The number of steps in the chains post burn-in.

tau `instance-attribute` #

tau: NDArray[float64]

Auto-correlation for each parameter

I.e. the first value is the auto-correlation for the first parameter, second value is the auto-correlation for the second parameter, ... nth value is the auto-correlation for the nth parameter, ...

any_non_nan_tau #

any_non_nan_tau() -> bool

Check if any of the tau information is not nan

Returns:

Type	Description
`bool`	Whether any of the tau values are non-nan

Source code in src/openscm_calibration/emcee_utils.py

def any_non_nan_tau(self) -> bool:
    """
    Check if any of the tau information is not nan

    Returns
    -------
    :
        Whether any of the tau values are non-nan
    """
    return bool(np.any(np.logical_not(np.isnan(self.tau))))

ChainProgressInfo #

Information about the progress of MCMC chains

Attributes:

Name	Type	Description
`acceptance_fraction`	`float`	Acceptance fraction of proposed steps
`steps`	`int`	Number of steps in the chain
`steps_post_burnin`	`int`	Number of steps after the chain's burn-in period

Source code in src/openscm_calibration/emcee_utils.py

@define
class ChainProgressInfo:
    """Information about the progress of MCMC chains"""

    steps: int
    """Number of steps in the chain"""

    steps_post_burnin: int
    """Number of steps after the chain's burn-in period"""

    acceptance_fraction: float
    """Acceptance fraction of proposed steps"""

acceptance_fraction `instance-attribute` #

acceptance_fraction: float

Acceptance fraction of proposed steps

steps `instance-attribute` #

steps: int

Number of steps in the chain

steps_post_burnin `instance-attribute` #

steps_post_burnin: int

Number of steps after the chain's burn-in period

get_acceptance_fractions #

get_acceptance_fractions(
    chains: NDArray[floating[Any] | integer[Any]],
) -> NDArray[float64]

Get acceptance fraction in each chain of an MCMC ensemble of chains

Parameters:

Name	Type	Description	Default
`chains`	`NDArray[floating[Any] \| integer[Any]]`	Chains. We expected that the the axes are ["step", "chain", "parameter"]	required

Returns:

Type	Description
`Acceptance fraction in each chain`

Source code in src/openscm_calibration/emcee_utils.py

def get_acceptance_fractions(
    chains: nptype.NDArray[np.floating[Any] | np.integer[Any]],
) -> nptype.NDArray[np.float64]:
    """
    Get acceptance fraction in each chain of an MCMC ensemble of chains

    Parameters
    ----------
    chains
        Chains. We expected that the the axes are ["step", "chain",
        "parameter"]

    Returns
    -------
        Acceptance fraction in each chain
    """
    # This is complicated, in short:
    # 1. find differences between steps across all calibration parameters
    # 2. check, in each chain, if there is any difference in any of the
    #   parameter values (where this happens, it means that the step was
    #   accepted)
    # 3. sum up the number of accepted steps in each chain
    accepted: nptype.NDArray[np.int_] = np.sum(
        np.any(np.diff(chains, axis=0), axis=2), axis=0
    )
    n_proposals = chains.shape[0] - 1  # first step isn't a proposal
    acceptance_fractions = accepted / np.float64(n_proposals)

    return acceptance_fractions

get_autocorrelation_info #

get_autocorrelation_info(
    inp: Backend | EnsembleSampler,
    burnin: int,
    thin: int = 1,
    autocorr_tol: int = 0,
    convergence_ratio: float = 50,
) -> AutoCorrelationInfo

Get info about autocorrelation in chains

Parameters:

Name	Type	Description	Default
`inp`	`Backend \| EnsembleSampler`	Object of which to check autocorrelation	required
`burnin`	`int`	Number of iterations to treat as burn-in	required
`thin`	`int`	Thinning to apply to the chains. Emcee handles this such that the returned output is in steps, not thinned steps.	`1`
`autocorr_tol`	`int`	Tolerance for auto-correlation calculations. Set to zero to force calculation to never fail	`0`
`convergence_ratio`	`float`	Convergence ratio to apply when checking for convergence If the number of iterations (excluding burn-in) is greater than `convergence_ratio` multiplied by the autocorrelation (averaged over all the chains), we assume the chain has converged.	`50`

Returns:

Type	Description
`AutoCorrelationInfo`	Results of calculation

Source code in src/openscm_calibration/emcee_utils.py

def get_autocorrelation_info(
    inp: emcee.backends.Backend | emcee.ensemble.EnsembleSampler,
    burnin: int,
    thin: int = 1,
    autocorr_tol: int = 0,
    convergence_ratio: float = 50,
) -> AutoCorrelationInfo:
    """
    Get info about autocorrelation in chains

    Parameters
    ----------
    inp
        Object of which to check autocorrelation

    burnin
        Number of iterations to treat as burn-in

    thin
        Thinning to apply to the chains.

        Emcee handles this such that the returned output is in steps,
        not thinned steps.

    autocorr_tol
        Tolerance for auto-correlation calculations.

        Set to zero to force calculation to never fail

    convergence_ratio
        Convergence ratio to apply when checking for convergence

        If the number of iterations (excluding burn-in)
        is greater than `convergence_ratio`
        multiplied by the autocorrelation (averaged over all the chains),
        we assume the chain has converged.

    Returns
    -------
    :
        Results of calculation
    """
    tau = inp.get_autocorr_time(discard=burnin, tol=autocorr_tol, thin=thin)

    converged = inp.iteration - burnin > convergence_ratio * tau

    out = AutoCorrelationInfo(
        steps_post_burnin=inp.iteration - burnin,
        tau=tau,
        convergence_ratio=convergence_ratio,
        converged=converged,
    )

    return out

get_labelled_chain_data #

get_labelled_chain_data(
    inp: Backend | EnsembleSampler,
    parameter_order: tuple[str, ...],
    neg_log_likelihood_name: str | None = None,
    burnin: int = 0,
    thin: int = 0,
) -> dict[str, NDArray[floating[Any] | integer[Any]]]

Get labelled chain data

Parameters:

Name	Type	Description	Default
`inp`	`Backend \| EnsembleSampler`	Object from which to plot the state	required
`parameter_order`	`tuple[str, ...]`	Order of model parameters. This must match the order used by `inp`.	required
`neg_log_likelihood_name`	`str \| None`	Name to use for the negative log likelihood data. If not provided, negative log likelihood information is not returned.	`None`
`burnin`	`int`	Number of iterations to treat as burn in	`0`
`thin`	`int`	Thinning to use when sampling the chains	`0`

Returns:

Type	Description
`dict[str, NDArray[floating[Any] \| integer[Any]]]`	Chain data, labelled with parameter names and, if requested, `neg_log_likelihood_name`

Source code in src/openscm_calibration/emcee_utils.py

def get_labelled_chain_data(
    inp: emcee.backends.Backend | emcee.ensemble.EnsembleSampler,
    parameter_order: tuple[str, ...],
    neg_log_likelihood_name: str | None = None,
    burnin: int = 0,
    thin: int = 0,
) -> dict[str, np.typing.NDArray[np.floating[Any] | np.integer[Any]]]:
    """
    Get labelled chain data

    Parameters
    ----------
    inp
        Object from which to plot the state

    parameter_order
        Order of model parameters. This must match the order used by  `inp`.

    neg_log_likelihood_name
        Name to use for the negative log likelihood data.

        If not provided, negative log likelihood information is not returned.

    burnin
        Number of iterations to treat as burn in

    thin
        Thinning to use when sampling the chains

    Returns
    -------
    :
        Chain data, labelled with parameter names
        and, if requested, `neg_log_likelihood_name`
    """
    all_samples = inp.get_chain(discard=burnin, thin=thin)

    out = {para: all_samples[:, :, i] for i, para in enumerate(parameter_order)}

    if neg_log_likelihood_name:
        all_neg_ll = inp.get_log_prob(discard=burnin, thin=thin)
        out[neg_log_likelihood_name] = all_neg_ll

    return out

get_neg_log_prior #

get_neg_log_prior(
    parameter_order: ParameterOrder,
    kind: Literal["uniform"] = "uniform",
) -> SupportsLikelihoodCalculation

Get a function for calculating the negative log prior

Parameters:

Name	Type	Description	Default
`parameter_order`	`ParameterOrder`	The parameters being calibrated	required
`kind`	`Literal['uniform']`	The kind of log prior to create. Options: "uniform": a uniform prior within the bounds for each parameter	`'uniform'`

Returns:

Type	Description
`SupportsLikelihoodCalculation`	Function that, given a parameter vector, returns the negative log prior

Source code in src/openscm_calibration/emcee_utils.py

def get_neg_log_prior(
    parameter_order: ParameterOrder,
    kind: Literal["uniform"] = "uniform",
) -> SupportsLikelihoodCalculation:
    """
    Get a function for calculating the negative log prior

    Parameters
    ----------
    parameter_order
        The parameters being calibrated

    kind
        The kind of log prior to create.

        Options:

        - "uniform": a uniform prior within the bounds for each parameter

    Returns
    -------
    :
        Function that, given a parameter vector, returns the negative log prior
    """
    if kind == "uniform":
        bounds = np.array(
            [parameter.bounds_m() for parameter in parameter_order.parameters]
        )

        def neg_log_prior(x: nptype.NDArray[np.float64]) -> float:
            """
            Get log prior for a given parameter vector, `x`

            This was generated in `get_neg_log_prior_from_bounds`
            and assumes a uniform distribution for parameters
            within their bounds.

            Future uses must match the bounds order provided to
            `get_neg_log_prior_from_bounds`.

            Parameters
            ----------
            x
                Parameter vector.

            Returns
            -------
            :
                Negative log prior of `x`
            """
            in_bounds = (x > bounds[:, 0]) & (x < bounds[:, 1])
            if np.all(in_bounds):
                return 0

            return -np.inf

    else:
        raise NotImplementedError(kind)

    return neg_log_prior

get_start #

get_start(
    sampler: EnsembleSampler,
    start: NDArray[float64] | None = None,
) -> NDArray[float64]

Get starting point for emcee sampling

Parameters:

Name	Type	Description	Default
`sampler`	`EnsembleSampler`	Sampler which will do the sampling	required
`start`	`NDArray[float64] \| None`	Starting point to use. This is only used if the sampler has not already performed some iterations.	`None`

Returns:

Type	Description
`NDArray[float64]`	Starting point for the sampling

Raises:

Type	Description
`TypeError`	`start` is `None` and the sampler has not performed any iterations yet.

Warns:

Type	Description
`UserWarning`	`start` is not `None` but the sampler has already performed iterations. In this case, we use the sampler's last iteration and ignore the value of `start`.

Source code in src/openscm_calibration/emcee_utils.py

def get_start(
    sampler: emcee.ensemble.EnsembleSampler,
    start: nptype.NDArray[np.float64] | None = None,
) -> nptype.NDArray[np.float64]:
    """
    Get starting point for emcee sampling

    Parameters
    ----------
    sampler
        Sampler which will do the sampling

    start
        Starting point to use.

        This is only used if the sampler has not already performed some iterations.

    Returns
    -------
    :
        Starting point for the sampling

    Raises
    ------
    TypeError
        `start` is `None` and the sampler has not performed any iterations yet.

    Warns
    -----
    UserWarning
        `start` is not `None` but the sampler has already performed iterations.

        In this case, we use the sampler's last iteration
        and ignore the value of `start`.
    """
    if sampler.iteration < 1:
        # Haven't used any samples yet, hence must have start
        if start is None:
            msg = (
                "The sampler has not performed any iterations yet. "
                "You must provide a value for `start`. "
                f"Received {start=}."
            )
            raise TypeError(msg)

        res = start

    else:
        # Sampler has already done iterations, hence ignore start
        if start is not None:
            # Avoidable user side-effect, hence warn
            # (see https://docs.python.org/3/howto/logging.html#when-to-use-logging)
            warn_msg = (
                "The sampler has already performed iterations. "
                "We will use its last sample as the starting point "
                "rather than the provided value for `start`."
                "(If you with to re-start the sampling, reset the sampler)."
            )
            warnings.warn(warn_msg)

        res = sampler.get_last_sample()

    return res

neg_log_info #

neg_log_info(
    x: NDArray[float64],
    neg_log_prior: Callable[[NDArray[float64]], float],
    model_runner: SupportsModelRun[DataContainer],
    negative_log_likelihood_calculator: SupportsNegativeLogLikelihoodCalculation[
        DataContainer
    ],
) -> tuple[float, float, float | None]

Get negative log probability and likelihood information

Specifically, negative log probability, log prior and log likelihood for a given parameter vector.

Parameters:

Name	Type	Description	Default
`x`	`NDArray[float64]`	Parameter vector. Be careful with the order and units of parameters. No checks of parameter order or units are performed in this function.	required
`neg_log_prior`	`Callable[[NDArray[float64]], float]`	Function that calculates the negative log prior for a given value of `x`	required
`model_runner`	`SupportsModelRun[DataContainer]`	Runner of the model for a given value of `x`	required
`negative_log_likelihood_calculator`	`SupportsNegativeLogLikelihoodCalculation[DataContainer]`	Calculator of negative log likelihood based on the outputs of the model run	required

Returns:

Type Description

tuple[float, float, float | None]

Negative log probability of x, negative log prior of x and negative log likelihood of x.

If the negative log probability of x is -np.inf, then no likelihood calculation is performed and None is returned for the negative log likelihood of x.

If the log likelihood calculation raises a ValueError, we assume that the likelihood calculation failed and return -np.inf for the log probability and negative log likelihood of x.

Source code in src/openscm_calibration/emcee_utils.py

def neg_log_info(
    x: nptype.NDArray[np.float64],
    neg_log_prior: Callable[[nptype.NDArray[np.float64]], float],
    model_runner: SupportsModelRun[DataContainer],
    negative_log_likelihood_calculator: SupportsNegativeLogLikelihoodCalculation[
        DataContainer
    ],
) -> tuple[float, float, float | None]:
    """
    Get negative log probability and likelihood information

    Specifically, negative log probability, log prior and log likelihood
    for a given parameter vector.

    Parameters
    ----------
    x
        Parameter vector.

        Be careful with the order and units of parameters.
        No checks of parameter order or units are performed in this function.

    neg_log_prior
        Function that calculates the negative log prior for a given value of `x`

    model_runner
        Runner of the model for a given value of `x`

    negative_log_likelihood_calculator
        Calculator of negative log likelihood based on the outputs of the model run

    Returns
    -------
    :
        Negative log probability of `x`,
        negative log prior of `x`
        and negative log likelihood of `x`.

        If the negative log probability of `x` is `-np.inf`,
        then no likelihood calculation is performed
        and `None` is returned for the negative log likelihood of `x`.

        If the log likelihood calculation raises a `ValueError`,
        we assume that the likelihood calculation failed
        and return `-np.inf` for the log probability
        and negative log likelihood of `x`.
    """
    neg_log_prior_x = neg_log_prior(x)

    if not np.isfinite(neg_log_prior_x):
        return -np.inf, neg_log_prior_x, None

    try:
        model_results = model_runner.run_model(x)
    except ValueError:
        return -np.inf, neg_log_prior_x, -np.inf

    neg_log_likelihood_x = (
        negative_log_likelihood_calculator.calculate_negative_log_likelihood(
            model_results
        )
    )
    neg_log_prob = neg_log_prior_x + neg_log_likelihood_x

    return neg_log_prob, neg_log_prior_x, neg_log_likelihood_x

openscm_calibration.emcee_utils#

AutoCorrelationInfo #

converged instance-attribute #

convergence_ratio instance-attribute #

steps_post_burnin instance-attribute #

tau instance-attribute #

any_non_nan_tau #

ChainProgressInfo #

acceptance_fraction instance-attribute #

steps instance-attribute #

steps_post_burnin instance-attribute #

get_acceptance_fractions #

get_autocorrelation_info #

get_labelled_chain_data #

get_neg_log_prior #

get_start #

neg_log_info #

converged `instance-attribute` #

convergence_ratio `instance-attribute` #

steps_post_burnin `instance-attribute` #

tau `instance-attribute` #

acceptance_fraction `instance-attribute` #

steps `instance-attribute` #

steps_post_burnin `instance-attribute` #