GPCalibration

class pycalib.calibration_methods.GPCalibration(n_classes, logits=False, mean_function=None, kernel=None, likelihood=None, n_inducing_points=10, maxiter=1000, n_monte_carlo=100, max_samples_monte_carlo=10000000, inf_mean_approx=False, session=None, random_state=1, verbose=False)

Bases: pycalib.calibration_methods.CalibrationMethod

Probability calibration using a latent Gaussian process

Gaussian process calibration [1] is a non-parametric approach to calibrate posterior probabilities from an arbitrary classifier based on a hold-out data set. Inference is performed using a sparse variational Gaussian process (SVGP) [2] implemented in GPflow [3].
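To make the structure of the calibration map concrete, here is a minimal NumPy sketch. It assumes, following [1], that one shared latent function g is applied to every class score and the results are passed through a softmax. The function `g` below is a hypothetical stand-in for the fitted GP's latent mean; this is not pycalib's implementation, which also accounts for the GP's uncertainty.

```python
import numpy as np


def softmax(a):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def calibration_map(Z, g):
    """Apply the same latent function g to every class score, then normalize.

    Z : array of shape (n_samples, n_classes) with class scores, e.g. logits.
    g : vectorized callable R -> R, a hypothetical stand-in for the latent GP mean.
    """
    return softmax(g(Z))


def g(z):
    # Hypothetical latent function: a linear rescaling of the scores.
    return 0.5 * z


Z = np.array([[2.0, 0.0, -1.0],
              [0.1, 0.3, 0.2]])
P = calibration_map(Z, g)
print(P.sum(axis=1))  # each row sums to 1
```

With g equal to the identity, the map reduces to a plain softmax of the scores. A linear g(z) = z / T reproduces temperature scaling, so the shared latent function can be read as a flexible, data-driven replacement for that single temperature.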

Parameters

n_classes (int) – Number of classes in calibration data.
logits (bool, default=False) – Whether the inputs for calibration are logits (e.g. from a neural network) rather than probabilities.
mean_function (GPflow object, default=None) – Mean function of the latent GP.
kernel (GPflow object, default=None) – Kernel function of the latent GP.
likelihood (GPflow object, default=None) – Likelihood giving a prior on the class prediction.
n_inducing_points (int, default=10) – Number of inducing points for the variational approximation.
maxiter (int, default=1000) – Maximum number of iterations for the likelihood optimization procedure.
n_monte_carlo (int, default=100) – Number of Monte Carlo samples for the inference procedure.
max_samples_monte_carlo (int, default=10**7) – Maximum number of Monte Carlo samples to draw in one batch when predicting. Setting this value too large can cause memory issues.
inf_mean_approx (bool, default=False) – If True, only the mean of the latent Gaussian process, not its covariance, is taken into account when inferring calibrated probabilities.
session (tf.Session, default=None) – TensorFlow session to use.
random_state (int, default=1) – Random seed for reproducibility. Needed for the Monte Carlo sampling routine.
verbose (bool, default=False) – Print information on the optimization routine.
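The interplay of n_monte_carlo and max_samples_monte_carlo can be illustrated with a hypothetical helper that splits the prediction inputs into row batches, so that no batch requires more than max_samples_monte_carlo draws. The counting rule used here (rows × n_monte_carlo × n_classes) is an assumption for illustration; pycalib's internal bookkeeping may differ.

```python
import math


def monte_carlo_batches(n_samples, n_classes, n_monte_carlo, max_samples):
    """Split row indices into batches whose Monte Carlo cost stays within max_samples.

    Hypothetical counting rule: predicting one row draws
    n_monte_carlo * n_classes latent samples.
    """
    draws_per_row = n_monte_carlo * n_classes
    if draws_per_row > max_samples:
        raise ValueError("a single row already exceeds max_samples")
    rows_per_batch = max_samples // draws_per_row
    n_batches = math.ceil(n_samples / rows_per_batch)
    return [range(i * rows_per_batch, min((i + 1) * rows_per_batch, n_samples))
            for i in range(n_batches)]


# 50,000 test points, 10 classes, 100 Monte Carlo samples, default cap of 10**7
batches = monte_carlo_batches(50_000, 10, 100, 10**7)
print(len(batches), len(batches[0]))  # 5 10000
```

Lowering the cap yields more, smaller batches and a lower peak memory footprint at the cost of more loop iterations.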

References

[1] Wenger, J., Kjellström, H. & Triebel, R. Non-Parametric Calibration for Classification. In Proceedings of AISTATS (2020).
[2] Hensman, J., Matthews, A. G. d. G. & Ghahramani, Z. Scalable Variational Gaussian Process Classification. In Proceedings of AISTATS (2015).
[3] Matthews, A. G. d. G., van der Wilk, M., et al. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research 18, 1–6 (Apr. 2017).

Methods Summary

`fit`(X, y)	Fit the calibration method based on the given uncalibrated class probabilities or logits X and ground truth labels y.
`get_params`([deep])	Get parameters for this estimator.
`latent`(z)	Evaluate the latent function f(z) of the GP calibration method.
`plot`(filename[, xlim])	Plot the calibration map.
`plot_latent`(z, filename[, plot_classes])	Plot the latent function of the calibration method.
`predict`(X)	Predict the class of new samples after scaling.
`predict_proba`(X[, mean_approx])	Compute calibrated posterior probabilities for a given array of posterior probabilities from an arbitrary classifier.
`set_params`(**params)	Set the parameters of this estimator.

Methods Documentation

fit(X, y)

Fit the calibration method based on the given uncalibrated class probabilities or logits X and ground truth labels y.

Parameters

X (array-like, shape (n_samples, n_classes)) – Training data, i.e. predicted probabilities or logits of the base classifier on the calibration set.
y (array-like, shape (n_samples,)) – Target classes.

Returns

self – Returns an instance of self.

Return type

object
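Before calling fit, it can help to check that X and y have the documented shapes. The helper below is a hypothetical pre-flight check written for illustration, not part of pycalib. It additionally assumes that labels are integers in 0, …, n_classes − 1 and that, with logits=False, each row of X is a probability vector.

```python
import numpy as np


def check_calibration_data(X, y, n_classes, logits=False, atol=1e-6):
    """Validate a hold-out calibration set (hypothetical helper, not a pycalib API)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if X.ndim != 2 or X.shape[1] != n_classes:
        raise ValueError(f"X must have shape (n_samples, {n_classes}), got {X.shape}")
    if y.shape != (X.shape[0],):
        raise ValueError(f"y must have shape ({X.shape[0]},), got {y.shape}")
    if not np.issubdtype(y.dtype, np.integer) or y.min() < 0 or y.max() >= n_classes:
        raise ValueError("y must contain integer class labels in [0, n_classes)")
    if not logits:
        # Probabilities must be non-negative and sum to one per row.
        if np.any(X < 0) or not np.allclose(X.sum(axis=1), 1.0, atol=atol):
            raise ValueError("with logits=False, each row of X must be a probability vector")
    return X, y


# Illustrative probabilities from a base classifier on a hold-out set (made-up numbers).
X_cal = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y_cal = np.array([0, 1, 1])
X_cal, y_cal = check_calibration_data(X_cal, y_cal, n_classes=3)
```

If the check passes, the arrays would then be passed on as in GPCalibration(n_classes=3).fit(X_cal, y_cal).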

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
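This method follows the scikit-learn estimator convention (that CalibrationMethod builds on scikit-learn's BaseEstimator is an assumption here). The effect of deep is easiest to see on a scikit-learn Pipeline. The example uses only scikit-learn classes, not pycalib.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(C=1.0))])

shallow = pipe.get_params(deep=False)  # only the Pipeline's own constructor arguments
deep = pipe.get_params(deep=True)      # also nested parameters as <component>__<parameter>

print("clf__C" in shallow, "clf__C" in deep)  # False True
```

For a flat estimator without sub-estimators, deep=True and deep=False return the same constructor parameters.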

latent(z)

Evaluate the latent function f(z) of the GP calibration method.

Parameters

z (array-like, shape=(n_evaluations,)) – Input confidences at which to evaluate the latent function.

Returns

f (array-like, shape=(n_evaluations,)) – Values of the latent function at z.
f_var (array-like, shape=(n_evaluations,)) – Variance of the latent function at z.
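Since latent returns both the mean f and the variance f_var, a pointwise uncertainty band follows directly. The sketch below uses a hypothetical stand-in, `fake_latent`, in place of calling latent on a fitted instance, so it runs without pycalib; the latent values are made up.

```python
import numpy as np


def fake_latent(z):
    """Hypothetical stand-in for a fitted latent(z): returns mean and variance arrays."""
    f = 2.0 * z - 1.0                    # made-up latent mean
    f_var = 0.05 + 0.1 * (z - 0.5) ** 2  # made-up latent variance, always positive
    return f, f_var


z = np.linspace(0.0, 1.0, 11)
f, f_var = fake_latent(z)

# Approximate 95% pointwise band of a Gaussian: mean +/- 1.96 standard deviations.
std = np.sqrt(f_var)
lower, upper = f - 1.96 * std, f + 1.96 * std
```

The band is pointwise: it describes the marginal uncertainty at each z, not a joint band over the whole curve.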

plot(filename, xlim=[0, 1], **kwargs)

Plot the calibration map.

Parameters

filename – Filename or path where the output is saved.
xlim (array-like, default=[0, 1]) – Range of inputs of the calibration map to be plotted.
**kwargs – Additional arguments passed on to matplotlib.pyplot.plot().
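For a binary problem, a calibration map over xlim can be traced by feeding inputs [1 − p, p] for a grid of p and reading off the calibrated probability of class 1. The sketch assumes this is what the plotted curve represents. It uses a hypothetical calibrator, `sharpen` (temperature scaling of log-probabilities), in place of a fitted method's predict_proba, and returns the curve data rather than drawing it.

```python
import numpy as np


def calibration_map_curve(predict_proba_fn, xlim=(0.0, 1.0), n_points=101):
    """Return (p, calibrated p) pairs for a binary calibration map over xlim."""
    p = np.linspace(xlim[0], xlim[1], n_points)
    X = np.column_stack([1.0 - p, p])  # uncalibrated binary probabilities
    return p, predict_proba_fn(X)[:, 1]


def sharpen(X, T=0.5):
    """Hypothetical calibrator: temperature scaling of log-probabilities (T < 1 sharpens)."""
    logits = np.log(np.clip(X, 1e-12, 1.0)) / T
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


p, p_cal = calibration_map_curve(sharpen)
```

The resulting (p, p_cal) pairs are what one would hand to matplotlib.pyplot.plot, together with any extra keyword arguments.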

plot_latent(z, filename, plot_classes=True, **kwargs)

Plot the latent function of the calibration method.

Parameters

z (array-like, shape=(n_evaluations,)) – Input confidences at which to plot the latent function.
filename – Filename or path where the output is saved.
plot_classes (bool, default=True) – Whether to also plot the classes.
**kwargs – Additional arguments passed on to matplotlib.pyplot.subplots.
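A figure in the spirit of plot_latent can be sketched with Matplotlib: the latent mean with a two-standard-deviation band, saved to a file, with extra keyword arguments forwarded to matplotlib.pyplot.subplots. The helper `plot_latent_sketch`, the made-up latent values and the styling are assumptions; pycalib's own figure may look different.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_latent_sketch(z, f, f_var, filename, **kwargs):
    """Plot a latent mean with a +/- 2 std band and save it (hypothetical helper)."""
    std = np.sqrt(f_var)
    fig, ax = plt.subplots(**kwargs)  # e.g. figsize=(4, 3)
    ax.plot(z, f, label="latent mean")
    ax.fill_between(z, f - 2 * std, f + 2 * std, alpha=0.3, label="mean +/- 2 std")
    ax.set_xlabel("input confidence z")
    ax.set_ylabel("latent f(z)")
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)


z = np.linspace(0.0, 1.0, 50)
f, f_var = 2.0 * z - 1.0, np.full_like(z, 0.04)  # made-up latent values
out = os.path.join(tempfile.mkdtemp(), "latent.png")
plot_latent_sketch(z, f, f_var, out, figsize=(4, 3))
```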

predict(X)

Predict the class of new samples after scaling. Predictions are identical to the ones from the uncalibrated classifier.

Parameters: X (array-like, shape (n_samples, n_classes)) – The uncalibrated posterior probabilities.
Returns: C – The predicted classes.
Return type: array, shape (n_samples,)
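Assuming the predicted class is the argmax of the probabilities, the sketch below illustrates why an order-preserving calibration map leaves predictions unchanged. Applying the same strictly increasing function to every class score before a softmax keeps the ranking of the classes. The function `g` is hypothetical.

```python
import numpy as np


def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def g(z):
    # Hypothetical strictly increasing latent function (derivative > 0 everywhere).
    return np.tanh(z) + 0.3 * z


rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 5))  # random logits for 1000 samples, 5 classes

pred_uncal = np.argmax(Z, axis=1)
pred_cal = np.argmax(softmax(g(Z)), axis=1)
print(np.array_equal(pred_uncal, pred_cal))  # True
```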

predict_proba(X, mean_approx=False)

Compute calibrated posterior probabilities for a given array of posterior probabilities from an arbitrary classifier.

Parameters

X (array-like, shape=(n_samples, n_classes)) – The uncalibrated posterior probabilities.
mean_approx (bool, default=False) – If True, inference is performed using only the mean of the latent Gaussian process, not its covariance. Note: if the estimator was constructed with inf_mean_approx=True, the mean approximation is always used and this option has no effect.

Returns

P – The predicted probabilities.

Return type

array, shape (n_samples, n_classes)
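The difference between mean_approx=True and the default can be sketched as follows. The sketch assumes that each class score has a Gaussian latent marginal with mean f and variance f_var, as returned by latent. For simplicity it also samples these marginals independently, whereas the real method uses the GP covariance. The default path averages the softmax over n_monte_carlo latent draws; the mean approximation applies the softmax to the latent mean only. The flag handling mirrors the note on inf_mean_approx. All names are hypothetical.

```python
import numpy as np


def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def calibrated_proba(f_mean, f_var, n_monte_carlo=100, mean_approx=False,
                     inf_mean_approx=False, seed=1):
    """Calibrated probabilities from latent marginals of shape (n_samples, n_classes)."""
    if inf_mean_approx or mean_approx:
        # Mean approximation: ignore the latent variance entirely.
        return softmax(f_mean, axis=1)
    rng = np.random.default_rng(seed)
    # Draw (n_monte_carlo, n_samples, n_classes) latent samples (independent, a simplification).
    eps = rng.standard_normal((n_monte_carlo,) + f_mean.shape)
    samples = f_mean + np.sqrt(f_var) * eps
    # Average the softmax over the Monte Carlo draws.
    return softmax(samples, axis=2).mean(axis=0)


f_mean = np.array([[2.0, 0.5, -1.0], [0.0, 0.1, 0.2]])
f_var = np.full_like(f_mean, 0.5)
P_mc = calibrated_proba(f_mean, f_var)
P_mean = calibrated_proba(f_mean, f_var, mean_approx=True)
```

The mean approximation avoids the sampling cost but ignores the latent uncertainty; with zero latent variance both paths coincide.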

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: object
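The nested <component>__<parameter> form can be seen with a scikit-learn Pipeline. The example uses scikit-learn classes only. For GPCalibration itself, a flat call such as set_params(n_monte_carlo=200) would be the analogue, assuming the scikit-learn convention applies unchanged.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(C=1.0))])

# <component>__<parameter> addresses a parameter of a nested estimator.
pipe.set_params(clf__C=0.1, scale__with_mean=False)

print(pipe.named_steps["clf"].C)  # 0.1
```

Because set_params returns the estimator itself, calls can be chained or used inline when configuring a model.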