SyntheticBeta

class pycalib.benchmark.SyntheticBeta(run_dir, cal_methods, cal_method_names, beta_params, miscal_functions, miscal_function_names, size, marginal_probs=None, n_splits=10, test_size=0.9, train_size=None, random_state=None)

Bases: pycalib.benchmark.Benchmark

Model evaluation using synthetic data sampled from a Beta distribution.

Implements a data generation method that returns new evaluation data. Maximum posterior probabilities are sampled from a scaled Beta distribution, \(\hat{p}_{\max} \sim (1-\frac{1}{K})\text{Beta}(\alpha, \beta)+\frac{1}{K}\) with \(K = n_\text{classes}\), and the corresponding class labels are sampled from a Bernoulli distribution with parameter \(f(\hat{p}_{\max})\), where \(f : [\frac{1}{n_\text{classes}},1] \rightarrow [\frac{1}{n_\text{classes}},1]\) is a miscalibration function.
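The sampling scheme described above can be sketched with the standard library alone. This is a minimal illustration, not the code used by pycalib: the helper name `sample_confidences` is hypothetical, and it reads the Bernoulli draw as an indicator of whether the prediction is correct, which is an assumption.

```python
import random


def sample_confidences(alpha, beta, miscal_func, size, n_classes=2, seed=None):
    """Draw (p_max, correct) pairs; hypothetical helper, not part of pycalib."""
    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        # p_max ~ (1 - 1/K) * Beta(alpha, beta) + 1/K, so p_max lies in [1/K, 1]
        p_max = (1 - 1 / n_classes) * rng.betavariate(alpha, beta) + 1 / n_classes
        # Assumed reading: the prediction is correct with probability f(p_max)
        correct = rng.random() < miscal_func(p_max)
        samples.append((p_max, correct))
    return samples


# With the identity as miscalibration function, accuracy should track confidence.
calibrated = sample_confidences(2.0, 1.0, lambda p: p, size=1000, n_classes=4, seed=0)
mean_confidence = sum(p for p, _ in calibrated) / len(calibrated)
accuracy = sum(c for _, c in calibrated) / len(calibrated)
print(mean_confidence, accuracy)
```

Replacing the identity with any other function on \([\frac{1}{K}, 1]\) makes the average accuracy drift away from the average confidence, which is the miscalibration the benchmark is meant to expose.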

Parameters

run_dir (str) – Directory to run benchmarking in and save output and logs to.
cal_methods (list) – Calibration methods to benchmark.
cal_method_names (list) – Names of calibration methods.
beta_params (tuple or list, shape=(n,2)) – Parameters \((\alpha, \beta)\) of the Beta distribution.
miscal_functions (function or list) – Miscalibration function(s) \(f : [0,1] \rightarrow [0,1]\). When a function differs from the identity, the generated data is miscalibrated. Each function is automatically rescaled to \(f : [\frac{1}{n_\text{classes}},1] \rightarrow [\frac{1}{n_\text{classes}},1]\).
miscal_function_names (str or list) – Names of miscalibration functions.
size (int or list) – Size of data set.
marginal_probs (float or list, default=None) – Marginal class probabilities.
n_splits (int, default=10) – Number of splits for cross validation.
test_size (float, default=0.9) – Size of test set.
train_size (float, default=None) – Size of calibration set.
random_state (int, RandomState instance or None, default=None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
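The rescaling mentioned for miscal_functions is not spelled out in this reference. One plausible reading is an affine change of variables on both input and output, sketched below. The helper `rescale_miscal_function` is hypothetical, and the actual rescaling in pycalib may differ.

```python
def rescale_miscal_function(f, n_classes):
    """Map f: [0, 1] -> [0, 1] to g: [1/K, 1] -> [1/K, 1] (assumed affine rescaling)."""
    lo = 1.0 / n_classes   # lower end of the confidence range, 1/K
    width = 1.0 - lo       # length of the interval [1/K, 1]; requires n_classes >= 2

    def g(p):
        u = (p - lo) / width      # map the confidence from [1/K, 1] to [0, 1]
        return lo + width * f(u)  # map the output of f back to [1/K, 1]

    return g


# Example with a hypothetical miscalibration function f(u) = u**2 and K = 4.
g = rescale_miscal_function(lambda u: u ** 2, n_classes=4)
print(g(0.25), g(0.625), g(1.0))
```

Under this reading the identity stays the identity after rescaling, so a calibrated setup remains calibrated for any number of classes.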

Methods Summary

`data_gen`()	Returns the full dataset or a generator of datasets.
`plot`(**kwargs)	Plots the result of the benchmark experiment.
`plot_miscal_function`([function_names])	Plots the miscalibration functions.
`run`([n_jobs])	Train all models, evaluate on test data and save the results.
`sample_miscal_data`(alpha, beta, miscal_func, …)	Sample a synthetic data set based on the Beta distribution and a miscalibration function.

Methods Documentation

data_gen()

Returns the full dataset or a generator of datasets.

Returns: X, y – Uncalibrated predictions and corresponding classes.

plot(**kwargs)

Plots the result of the benchmark experiment.

Parameters: **kwargs – Additional arguments passed on to matplotlib.plot().

plot_miscal_function(function_names=None, **kwargs)

Plots the miscalibration functions.

Parameters

function_names (list) – Names of the miscalibration functions to plot.
**kwargs – Additional arguments passed on to matplotlib.plot().

run(n_jobs=None)

Train all models, evaluate on test data and save the results.

Parameters: n_jobs (int or None, optional (default=None)) – The number of CPUs to use to do the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

static sample_miscal_data(alpha, beta, miscal_func, miscal_func_name, size, marginal_probs, random_state=None)

Sample a synthetic data set based on the Beta distribution and a miscalibration function.

Parameters

alpha (float) – Parameter \(\alpha\) of the Beta distribution.
beta (float) – Parameter \(\beta\) of the Beta distribution.
miscal_func (function) – Function \(f : [\frac{1}{n_\text{classes}},1] \rightarrow [\frac{1}{n_\text{classes}},1]\) giving the accuracy for a given confidence. When this function differs from the identity, the generated data is miscalibrated.
miscal_func_name (str) – Name of the miscalibration function.
size (int) – Size of data set.
marginal_probs (np.ndarray, size=(n_classes,)) – Marginal class probabilities.
random_state (int, RandomState instance or None, default=None) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns

X, y – Uncalibrated predictions and corresponding classes.
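How the sampled confidences become full prediction vectors X and labels y is not described here. The sketch below shows one hypothetical construction, assuming the predicted class is drawn from marginal_probs, the remaining probability mass is spread evenly over the other classes, and an incorrect prediction gets a label drawn uniformly from the other classes. The helper `build_dataset` is illustrative only and is not pycalib's implementation.

```python
import random


def build_dataset(p_max_values, correct_flags, marginal_probs, seed=None):
    """Turn (p_max, correct) pairs into X, y; hypothetical construction."""
    rng = random.Random(seed)
    n_classes = len(marginal_probs)
    classes = list(range(n_classes))
    X, y = [], []
    for p_max, correct in zip(p_max_values, correct_flags):
        # Assumption: the predicted class follows the marginal class probabilities
        pred = rng.choices(classes, weights=marginal_probs, k=1)[0]
        # Assumption: the remaining mass is split evenly over the other classes
        rest = (1.0 - p_max) / (n_classes - 1)
        row = [rest] * n_classes
        row[pred] = p_max
        X.append(row)
        if correct:
            y.append(pred)
        else:
            # Assumption: a wrong prediction gets a uniformly chosen other label
            y.append(rng.choice([c for c in classes if c != pred]))
    return X, y


X, y = build_dataset([0.9, 0.5, 0.4], [True, False, True], [0.5, 0.3, 0.2], seed=1)
```

Because \(\hat{p}_{\max} \geq \frac{1}{K}\), the even split never exceeds the predicted class's probability, so the arg max of each row is the predicted class (up to ties at exactly \(\frac{1}{K}\)).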