My research focuses on resource-efficient methods for large-scale probabilistic machine learning. Much of my work views numerical algorithms through the lens of probabilistic inference. This perspective enables the acceleration of learning algorithms via an explicit trade-off between computational efficiency and predictive precision.
I received my PhD in Computer Science from the University of Tübingen, advised by Philipp Hennig, and was an IMPRS-IS fellow at the Max Planck Institute for Intelligent Systems.
Variational Deep Learning via Implicit Regularization
Jonathan Wenger, Beau Coker, Juraj Marusic, and John P. Cunningham
arXiv preprint, 2025
Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters, and optimization procedure. However, deploying deep learning models out-of-distribution, in sequential decision-making tasks, or in safety-critical domains, necessitates reliable uncertainty quantification, not just a point estimate. The machinery of modern approximate inference – Bayesian deep learning – should answer the need for uncertainty quantification, but its effectiveness has been challenged by our inability to define useful explicit inductive biases through priors, as well as the associated computational burden. Instead, in this work we demonstrate, both theoretically and empirically, how to regularize a variational deep network implicitly via the optimization procedure, just as for standard deep learning. We fully characterize the inductive bias of (stochastic) gradient descent in the case of an overparametrized linear model as generalized variational inference and demonstrate the importance of the choice of parametrization. Finally, we show empirically that our approach achieves strong in- and out-of-distribution performance without tuning of additional hyperparameters and with minimal time and memory overhead over standard deep learning.
@misc{Wenger2025VariationalDeep,
  title = {Variational Deep Learning via Implicit Regularization},
  author = {Wenger, Jonathan and Coker, Beau and Marusic, Juraj and Cunningham, John P.},
  howpublished = {arXiv},
  year = {2025},
  doi = {10.48550/arXiv.2505.20235},
}
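To make the implicit-regularization premise of the abstract concrete, here is a minimal NumPy sketch of the classical effect it builds on: gradient descent on an overparametrized linear least-squares problem, initialized at zero, converges to the minimum-norm interpolating solution without any explicit regularizer. This is an illustration of that general phenomenon only, not the paper's variational method or its generalized-variational-inference characterization, and all names in the snippet are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparametrized linear model: more parameters (d) than observations (n).
n, d = 20, 100
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Plain gradient descent on the unregularized least-squares loss, starting at zero.
w = np.zeros(d)
learning_rate = 1e-2
for _ in range(50_000):
    grad = X.T @ (X @ w - y) / n
    w -= learning_rate * grad

# The minimum-norm interpolating solution (Moore-Penrose pseudoinverse).
w_min_norm = np.linalg.pinv(X) @ y

print("training residual:", np.linalg.norm(X @ w - y))                   # ~0: the model interpolates
print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~0: implicit bias of GD
```

Running the same loop from a nonzero initialization instead returns the interpolant closest to that initialization, a small illustration of why the parametrization and starting point shape the learned predictor.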
Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference
Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, and John P. Cunningham
Advances in Neural Information Processing Systems (NeurIPS), 2024
Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables – at the cost of quadratic complexity – an explicit tradeoff between computation and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty – a fundamental prerequisite for optimal decision-making.
@inproceedings{Wenger2024ComputationAwareGaussian,
  title = {Computation-{Aware} {Gaussian} {Processes}: {Model} {Selection} {And} {Linear}-{Time} {Inference}},
  author = {Wenger, Jonathan and Wu, Kaiwen and Hennig, Philipp and Gardner, Jacob R. and Pleiss, Geoff and Cunningham, John P.},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year = {2024},
  doi = {10.48550/arXiv.2411.01036},
}
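As a rough illustration of the computation-precision trade-off mentioned in the abstract, the sketch below implements, to the best of my understanding, the quadratic-cost computation-aware posterior from the earlier work the abstract refers to: a low-rank approximation of the inverse kernel matrix is built from a budget of i actions (random probe vectors here, standing in for a more informed policy), and the unexplored remainder appears as additional predictive uncertainty. It is not the linear-time algorithm or the model-selection loss introduced in this paper; the kernel, toy data, and variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sqdist = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * sqdist / lengthscale**2)

# Toy 1D regression problem.
n = 500
X = rng.uniform(-3.0, 3.0, size=(n, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = np.linspace(-3.0, 3.0, 100)[:, None]

noise = 0.1**2
K_hat = rbf_kernel(X, X) + noise * np.eye(n)   # K + sigma^2 I
K_star = rbf_kernel(X_test, X)
K_test = rbf_kernel(X_test, X_test)

# The computational budget: i actions (random probe vectors here).
i = 50
S = rng.standard_normal((n, i))

# Low-rank approximation of K_hat^{-1} built from i matrix-vector products with K_hat.
C_i = S @ np.linalg.solve(S.T @ (K_hat @ S), S.T)

mean = K_star @ (C_i @ y)              # approximate posterior mean
cov = K_test - K_star @ C_i @ K_star.T # combined mathematical + computational uncertainty

# More actions tighten the posterior; with i = n this recovers the exact GP posterior.
print(mean[:3], np.diag(cov)[:3])
```

Each action only requires matrix-vector products with the kernel matrix, which is what makes the trade-off explicit: a larger budget of actions means more computation and a tighter, less uncertain posterior.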
Non-Parametric Calibration for Classification
Jonathan Wenger, Hedvig Kjellström, and Rudolph Triebel
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process, and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the-art neural network architectures.
@inproceedings{Wenger2020NonParametricCalibration,
  title = {Non-{Parametric} {Calibration} for {Classification}},
  author = {Wenger, Jonathan and Kjellström, Hedvig and Triebel, Rudolph},
  booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year = {2020},
  doi = {10.48550/arXiv.1906.04933},
}
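For readers unfamiliar with calibration, the hedged sketch below shows the basic setup the abstract assumes: measuring the gap between a classifier's confidence and its accuracy (expected calibration error) and fitting a simple post-hoc calibration map on held-out logits. The map shown is temperature scaling, a standard parametric baseline, deliberately not the latent-Gaussian-process method proposed in the paper; the function names and the `val_logits`/`test_logits` inputs are placeholders of mine.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(confidences, correct, n_bins=15):
    """Average gap between confidence and accuracy over equally spaced confidence bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

def fit_temperature(logits, labels):
    """Choose T > 0 minimizing the negative log-likelihood of softmax(logits / T)."""
    def nll(T):
        probs = softmax(logits / T)
        return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x

# Usage (assuming val_logits, val_labels, test_logits, test_labels from some classifier):
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = softmax(test_logits / T)
# confidences, predictions = calibrated_probs.max(1), calibrated_probs.argmax(1)
# print(expected_calibration_error(confidences, predictions == test_labels))
```

As described in the abstract, the paper instead learns a non-parametric, multi-class calibration map via a latent Gaussian process rather than a single scalar temperature, and applies to any classifier that outputs confidence estimates.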