SyntheticBeta

class pycalib.benchmark.SyntheticBeta(run_dir, cal_methods, cal_method_names, beta_params, miscal_functions, miscal_function_names, size, marginal_probs=None, n_splits=10, test_size=0.9, train_size=None, random_state=None)[source]

Bases: pycalib.benchmark.Benchmark

Model evaluation using synthetic data sampled from a Beta distribution.

Implements a data generation method that returns a new evaluation data set. Maximum posterior probabilities are sampled from a scaled Beta distribution \(\hat{p}_{\max} \sim (1-\frac{1}{K})\text{Beta}(\alpha, \beta)+\frac{1}{K}\), where \(K = n_\text{classes}\) is the number of classes. The corresponding class labels are sampled from a Bernoulli distribution with parameter \(f(\hat{p}_{\max})\), where \(f : [\frac{1}{n_\text{classes}},1] \rightarrow [\frac{1}{n_\text{classes}},1]\) is a miscalibration function.
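The sampling scheme described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation: the helper names `sample_max_confidence` and `sample_correctness` are hypothetical, and the Bernoulli draw is interpreted here as an indicator of whether the predicted class is correct, which is an assumption.

```python
import numpy as np


def sample_max_confidence(alpha, beta, n_classes, size, rng):
    """Draw p_max ~ (1 - 1/K) * Beta(alpha, beta) + 1/K (hypothetical helper)."""
    k_inv = 1.0 / n_classes
    return (1.0 - k_inv) * rng.beta(alpha, beta, size=size) + k_inv


def sample_correctness(p_max, miscal_func, rng):
    """Draw Bernoulli indicators with success probability f(p_max).

    Assumption: 1 means the predicted class is the true class.
    """
    accuracy = miscal_func(p_max)
    return rng.binomial(1, accuracy)


rng = np.random.RandomState(0)
p_max = sample_max_confidence(alpha=2.0, beta=1.0, n_classes=10, size=1000, rng=rng)

# Hypothetical overconfident model on [1/10, 1]: accuracy never exceeds confidence.
overconfident = lambda p: 0.1 + 0.9 * ((p - 0.1) / 0.9) ** 2
correct = sample_correctness(p_max, overconfident, rng)
```

With the identity as `miscal_func`, the expected accuracy equals the confidence, so the sampled predictions would be calibrated by construction.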

Parameters
  • run_dir (str) – Directory to run benchmarking in and save output and logs to.

  • cal_methods (list) – Calibration methods to benchmark.

  • cal_method_names (list) – Names of calibration methods.

  • beta_params (tuple or list, shape=(n,2)) – Parameters \((\alpha, \beta)\) of the Beta distribution.

  • miscal_functions (function or list) – Function(s) \(f : [0,1] \rightarrow [0,1]\) for miscalibration. When \(f\) is not the identity, the generated predictions are miscalibrated. The function is automatically rescaled to \(f : [\frac{1}{n_\text{classes}},1] \rightarrow [\frac{1}{n_\text{classes}},1]\).

  • miscal_function_names (str or list) – Names of miscalibration functions.

  • size (int or list) – Size of data set.

  • marginal_probs (float or list, default=None) – Marginal class probabilities.

  • n_splits (int, default=10) – Number of splits for cross validation.

  • test_size (float, default=0.9) – Size of test set.

  • train_size (float, default=None) – Size of calibration set.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
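The rescaling of a miscalibration function from \([0,1]\) to \([\frac{1}{n_\text{classes}},1]\) mentioned for miscal_functions could look like the sketch below. The affine mapping and the helper name `rescale_miscal_function` are assumptions made for illustration; the library may rescale differently.

```python
import numpy as np


def rescale_miscal_function(f, n_classes):
    """Wrap f: [0, 1] -> [0, 1] so that it maps [1/K, 1] -> [1/K, 1].

    Assumption: an affine change of variables on both input and output.
    """
    k_inv = 1.0 / n_classes

    def rescaled(p):
        p = np.asarray(p, dtype=float)
        u = (p - k_inv) / (1.0 - k_inv)      # map [1/K, 1] onto [0, 1]
        return k_inv + (1.0 - k_inv) * f(u)  # map [0, 1] back onto [1/K, 1]

    return rescaled


# The identity stays the identity; a squared function lowers accuracy
# in the interior while keeping both endpoints fixed.
identity = rescale_miscal_function(lambda u: u, n_classes=4)
squared = rescale_miscal_function(lambda u: u ** 2, n_classes=4)
```

Under this assumed mapping, the endpoints \(\frac{1}{n_\text{classes}}\) and 1 are fixed points whenever \(f(0)=0\) and \(f(1)=1\).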

Methods Summary

data_gen()

Returns the full dataset or a generator of datasets.

plot(**kwargs)

Plots the result of the benchmark experiment.

plot_miscal_function([function_names])

Plots the miscalibration functions.

run([n_jobs])

Train all models, evaluate on test data and save the results.

sample_miscal_data(alpha, beta, miscal_func, …)

Sample a synthetic data set based on the Beta distribution and a miscalibration function.

Methods Documentation

data_gen()[source]

Returns the full dataset or a generator of datasets.

Returns

X, y – Uncalibrated predictions and corresponding classes.

plot(**kwargs)[source]

Plots the result of the benchmark experiment.

Parameters

**kwargs – Additional arguments passed on to matplotlib.plot().

plot_miscal_function(function_names=None, **kwargs)[source]

Plots the miscalibration functions.

Parameters
  • function_names (list) – Names of the miscalibration functions to plot.

  • **kwargs – Additional arguments passed on to matplotlib.plot().

run(n_jobs=None)

Train all models, evaluate on test data and save the results.

Parameters

n_jobs (int or None, optional (default=None)) – The number of CPUs to use to do the computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.

static sample_miscal_data(alpha, beta, miscal_func, miscal_func_name, size, marginal_probs, random_state=None)[source]

Sample a synthetic data set based on the Beta distribution and a miscalibration function.

Parameters
  • alpha (float) – Parameter \(\alpha\) of the Beta distribution.

  • beta (float) – Parameter \(\beta\) of the Beta distribution.

  • miscal_func (function) – Function \(f : [\frac{1}{n_\text{classes}},1] \rightarrow [\frac{1}{n_\text{classes}},1]\) giving the accuracy for a given confidence. When \(f\) is not the identity, the generated predictions are miscalibrated.

  • miscal_func_name (str) – Name of the miscalibration function.

  • size (int) – Size of data set.

  • marginal_probs (np.ndarray, size=(n_classes,)) – Marginal class probabilities.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Returns

X, y – Uncalibrated predictions and corresponding classes.
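The three accepted forms of random_state can be illustrated with a small resolver. This is a sketch of the documented semantics only; `resolve_random_state` is a hypothetical helper, not part of pycalib, and the use of NumPy's module-level RandomState singleton for the None case is an assumption based on the parameter description.

```python
import numpy as np


def resolve_random_state(random_state=None):
    """Turn an int, RandomState instance, or None into a RandomState (hypothetical)."""
    if random_state is None:
        # Assumption: the global RandomState instance behind np.random.* functions.
        return np.random.mtrand._rand
    if isinstance(random_state, (int, np.integer)):
        # An int is used as the seed of a fresh generator.
        return np.random.RandomState(random_state)
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is used as-is.
        return random_state
    raise ValueError("random_state must be None, an int, or a RandomState instance")


# The same integer seed reproduces the same Beta draws.
draws_a = resolve_random_state(42).beta(2.0, 1.0, size=5)
draws_b = resolve_random_state(42).beta(2.0, 1.0, size=5)
```

Passing an integer seed is the simplest way to make repeated sampling runs reproducible.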