GPCalibration
-
class pycalib.calibration_methods.GPCalibration(n_classes, logits=False, mean_function=None, kernel=None, likelihood=None, n_inducing_points=10, maxiter=1000, n_monte_carlo=100, max_samples_monte_carlo=10000000, inf_mean_approx=False, session=None, random_state=1, verbose=False)
Bases: pycalib.calibration_methods.CalibrationMethod
Probability calibration using a latent Gaussian process
Gaussian process calibration [1] is a non-parametric approach to calibrating the posterior probabilities of an arbitrary classifier based on a hold-out data set. Inference is performed using a sparse variational Gaussian process (SVGP) [2] implemented in GPflow [3].
- Parameters
n_classes (int) – Number of classes in calibration data.
logits (bool, default=False) – Whether the calibration inputs are logits (e.g. from a neural network) rather than probabilities.
mean_function (GPflow object, default=None) – Mean function of the latent GP.
kernel (GPflow object, default=None) – Kernel function of the latent GP.
likelihood (GPflow object, default=None) – Likelihood giving a prior on the class prediction.
n_inducing_points (int, default=10) – Number of inducing points for the variational approximation.
maxiter (int, default=1000) – Maximum number of iterations for the likelihood optimization procedure.
n_monte_carlo (int, default=100) – Number of Monte Carlo samples for the inference procedure.
max_samples_monte_carlo (int, default=10**7) – Maximum number of Monte Carlo samples to draw in one batch when predicting. Setting this value too large can cause memory issues.
inf_mean_approx (bool, default=False) – If True, when inferring calibrated probabilities, only the mean of the latent Gaussian process is taken into account, not its covariance.
session (tf.Session, default=None) – TensorFlow session to use.
random_state (int, default=1) – Random seed for reproducibility. Needed for the Monte Carlo sampling routine.
verbose (bool, default=False) – Print information on the optimization routine.
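The interplay of n_monte_carlo, max_samples_monte_carlo and inf_mean_approx can be sketched with plain NumPy. The sketch below assumes a softmax link and an independent Gaussian latent value per class; the function name calibrated_proba and its arguments mu and var are hypothetical stand-ins, not pycalib's actual implementation.

```python
import numpy as np


def softmax(f, axis=-1):
    """Numerically stable softmax along the given axis."""
    f = f - np.max(f, axis=axis, keepdims=True)
    e = np.exp(f)
    return e / np.sum(e, axis=axis, keepdims=True)


def calibrated_proba(mu, var, n_monte_carlo=100, max_samples=10**7,
                     mean_approx=False, random_state=1):
    """Approximate E[softmax(f)] for f ~ N(mu, diag(var)), row by row.

    mu, var: arrays of shape (n_samples, n_classes) holding a hypothetical
    latent mean and variance (assumption, for illustration only).
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    if mean_approx:
        # Mean approximation: ignore the latent variance entirely.
        return softmax(mu)

    rng = np.random.default_rng(random_state)
    n_samples, n_classes = mu.shape
    # Rows per batch so that one batch draws roughly at most max_samples
    # values, while always processing at least one row.
    rows = max(1, max_samples // (n_monte_carlo * n_classes))
    out = np.empty_like(mu)
    for start in range(0, n_samples, rows):
        stop = start + rows
        m, s = mu[start:stop], np.sqrt(var[start:stop])
        eps = rng.standard_normal((n_monte_carlo,) + m.shape)
        # Average the softmax over the Monte Carlo draws.
        out[start:stop] = softmax(m + s * eps).mean(axis=0)
    return out


mu = np.array([[2.0, 0.0, -1.0], [0.5, 0.4, 0.1]])
var = np.full_like(mu, 4.0)
p_mean = calibrated_proba(mu, var, mean_approx=True)
p_mc = calibrated_proba(mu, var, n_monte_carlo=2000, max_samples=3000)
```

In this sketch a smaller max_samples only changes how many rows are processed per batch, trading memory for more loop iterations. With a large latent variance, the Monte Carlo estimate for the top class is typically less confident than the mean approximation.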
References
[1] Wenger, J., Kjellström, H. & Triebel, R. Non-Parametric Calibration for Classification. In Proceedings of AISTATS (2020)
[2] Hensman, J., Matthews, A. G. d. G. & Ghahramani, Z. Scalable Variational Gaussian Process Classification. In Proceedings of AISTATS (2015)
[3] Matthews, A. G. d. G., van der Wilk, M., et al. GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research 18, 1–6 (Apr. 2017)
Methods Summary
fit(X, y) – Fit the calibration method based on the given uncalibrated class probabilities or logits X and ground truth labels y.
get_params([deep]) – Get parameters for this estimator.
latent(z) – Evaluate the latent function f(z) of the GP calibration method.
plot(filename[, xlim]) – Plot the calibration map.
plot_latent(z, filename[, plot_classes]) – Plot the latent function of the calibration method.
predict(X) – Predict the class of new samples after scaling.
predict_proba(X[, mean_approx]) – Compute calibrated posterior probabilities for a given array of posterior probabilities from an arbitrary classifier.
set_params(**params) – Set the parameters of this estimator.
Methods Documentation
-
fit(X, y)
Fit the calibration method based on the given uncalibrated class probabilities or logits X and ground truth labels y.
- Parameters
X (array-like, shape (n_samples, n_classes)) – Training data, i.e. predicted probabilities or logits of the base classifier on the calibration set.
y (array-like, shape (n_samples,)) – Target classes.
- Returns
self – Returns an instance of self.
- Return type
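The input shapes expected by fit can be illustrated with synthetic data. This is a minimal sketch: the random logits and labels below are stand-ins for a real classifier's outputs on a hold-out set, and the softmax helper is an assumption added for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise, numerically stable softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_samples, n_classes = 200, 3

# Synthetic stand-ins for a base classifier's outputs on a hold-out set.
logits = rng.normal(size=(n_samples, n_classes))
y = rng.integers(0, n_classes, size=n_samples)  # ground-truth labels

# With logits=False the inputs would be probabilities instead of logits.
probs = softmax(logits)

# Shapes expected by fit(X, y) according to the parameter list above.
assert logits.shape == probs.shape == (n_samples, n_classes)
assert y.shape == (n_samples,)
```

Going by the documented signature, these arrays would then be passed as GPCalibration(n_classes=3, logits=True).fit(logits, y), or with logits=False and probs as X. That call is left out of the sketch because it requires pycalib together with its GPflow and TensorFlow dependencies.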
-
get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
-
latent(z)
Evaluate the latent function f(z) of the GP calibration method.
- Parameters
z (array-like, shape=(n_evaluations,)) – Input confidence for which to evaluate the latent function.
- Returns
f (array-like, shape=(n_evaluations,)) – Values of the latent function at z.
f_var (array-like, shape=(n_evaluations,)) – Variance of the latent function at z.
-
plot(filename, xlim=[0, 1], **kwargs)
Plot the calibration map.
- Parameters
xlim (array-like) – Range of inputs of the calibration map to be plotted.
**kwargs – Additional arguments passed on to matplotlib.plot().
-
plot_latent(z, filename, plot_classes=True, **kwargs)
Plot the latent function of the calibration method.
- Parameters
z (array-like, shape=(n_evaluations,)) – Input confidence for which to plot the latent function.
filename – File name or path at which to save the output.
plot_classes (bool, default=True) – Whether the classes should also be plotted.
**kwargs – Additional arguments passed on to matplotlib.pyplot.subplots.
-
predict(X)
Predict the class of new samples after scaling. Predictions are identical to the ones from the uncalibrated classifier.
- Parameters
X (array-like, shape (n_samples, n_classes)) – The uncalibrated posterior probabilities.
- Returns
C – The predicted classes.
- Return type
array, shape (n_samples,)
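Predictions match those of the uncalibrated classifier whenever the calibration map preserves the ordering of the class scores within each row, since the predicted class is the row-wise argmax. A minimal sketch with a hypothetical order-preserving map (a square-root transform with renormalization, used as a stand-in and not the GP calibration map):

```python
import numpy as np

# Uncalibrated posterior probabilities, shape (n_samples, n_classes).
X = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6],
              [0.4, 0.5, 0.1]])

# Hypothetical order-preserving calibration map (stand-in, not the GP map):
# soften the probabilities and renormalize each row.
P = np.sqrt(X)
P /= P.sum(axis=1, keepdims=True)

# The predicted class is the index of the largest entry in each row.
uncalibrated_classes = np.argmax(X, axis=1)
calibrated_classes = np.argmax(P, axis=1)
```

Here the confidences change but the class ranking within each row does not, so both argmax results agree.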
-
predict_proba(X, mean_approx=False)
Compute calibrated posterior probabilities for a given array of posterior probabilities from an arbitrary classifier.
- Parameters
X (array-like, shape=(n_samples, n_classes)) – The uncalibrated posterior probabilities.
mean_approx (bool, default=False) – If True, inference is performed using only the mean of the latent Gaussian process, not its covariance. Note that if the estimator was constructed with inf_mean_approx=True, the value of this option is ignored.
- Returns
P – The predicted probabilities.
- Return type
array, shape (n_samples, n_classes)
-
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
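The <component>__<parameter> convention can be sketched in a few lines of plain Python. The Estimator class below is a hypothetical stand-in that mimics the behavior described above; it is not the scikit-learn or pycalib implementation.

```python
class Estimator:
    """Minimal stand-in for an estimator with get_params/set_params."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def get_params(self):
        return dict(self.__dict__)

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # "<component>__<parameter>": delegate to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


# A nested object whose "calibrator" component has its own parameters.
pipeline = Estimator(calibrator=Estimator(n_monte_carlo=100, maxiter=1000))
pipeline.set_params(calibrator__n_monte_carlo=500)
```

The double underscore splits the key at its first occurrence, so the outer object only needs to know the component name and forwards the rest of the key to that component.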