My research focuses on resource-efficient methods for large-scale probabilistic machine learning. Much of my work views numerical algorithms through the lens of probabilistic inference. This perspective enables the acceleration of learning algorithms via an explicit trade-off between computational efficiency and predictive precision.
I received my PhD in Computer Science from the University of Tübingen, advised by Philipp Hennig, and was an IMPRS-IS fellow at the Max Planck Institute for Intelligent Systems.
Variational Deep Learning via Implicit Regularization
Jonathan Wenger, Beau Coker, Juraj Marusic, and John P. Cunningham
NeurIPS Workshop on Structured Probabilistic Inference & Generative Modeling (SPIGM), 2025
Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization. Instead, current theory credits implicit regularization imposed by the choice of architecture, hyperparameters and optimization procedure. However, deep neural networks can be surprisingly non-robust, resulting in overconfident predictions and poor out-of-distribution generalization. Bayesian deep learning addresses this via model averaging, but typically requires significant computational resources as well as carefully elicited priors to avoid overriding the benefits of implicit regularization. Instead, in this work, we propose to regularize variational neural networks solely by relying on the implicit bias of (stochastic) gradient descent. We theoretically characterize this inductive bias in overparametrized linear models as generalized variational inference and demonstrate the importance of the choice of parametrization. Empirically, our approach demonstrates strong in- and out-of-distribution performance without additional hyperparameter tuning and with minimal computational overhead.
@inproceedings{Wenger2025VariationalDeep,
  title     = {Variational Deep Learning via Implicit Regularization},
  author    = {Wenger, Jonathan and Coker, Beau and Marusic, Juraj and Cunningham, John P.},
  booktitle = {NeurIPS Workshop on Structured Probabilistic Inference \& Generative Modeling (SPIGM)},
  year      = {2025},
  doi       = {10.48550/arXiv.2505.20235},
}
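To make the setup concrete, here is a minimal, hypothetical PyTorch sketch of the basic idea: a mean-field variational layer whose only training signal is the data-fit term, with no KL penalty or other explicit regularizer, so any regularization of the variational parameters must come from the optimizer's implicit bias. The layer, toy data, and hyperparameters are illustrative placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MeanFieldLinear(nn.Module):
    """Toy linear layer with a factorized Gaussian over its weights (illustrative only)."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.mean = nn.Parameter(torch.zeros(d_out, d_in))
        self.log_std = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        # Reparameterization trick: sample weights, then apply the linear map.
        eps = torch.randn_like(self.mean)
        weight = self.mean + self.log_std.exp() * eps
        return x @ weight.T

torch.manual_seed(0)
X = torch.randn(256, 10)
y = X @ torch.randn(10, 1) + 0.1 * torch.randn(256, 1)

model = MeanFieldLinear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(2000):
    optimizer.zero_grad()
    # Data-fit term only: no KL term is added; regularization of the
    # variational parameters is left to the implicit bias of (S)GD.
    loss = ((model(X) - y) ** 2).mean()
    loss.backward()
    optimizer.step()
```

Note that writing the scale as `log_std` is just one arbitrary choice here; the paper shows that the parametrization of the variational family materially affects the inductive bias.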
Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference
Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, and John P. Cunningham
Advances in Neural Information Processing Systems (NeurIPS), 2024
Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables – at the cost of quadratic complexity – an explicit tradeoff between computation and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty – a fundamental prerequisite for optimal decision-making.
@inproceedings{Wenger2024ComputationAwareGaussian,
  title     = {Computation-{Aware} {Gaussian} {Processes}: {Model} {Selection} {And} {Linear}-{Time} {Inference}},
  author    = {Wenger, Jonathan and Wu, Kaiwen and Hennig, Philipp and Gardner, Jacob R. and Pleiss, Geoff and Cunningham, John P.},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
  doi       = {10.48550/arXiv.2411.01036},
}
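The notion of computational uncertainty can be sketched in a few lines of NumPy. The toy snippet below illustrates a computation-aware GP posterior in the spirit of the earlier quadratic-cost approach mentioned in the abstract: a limited budget of actions S yields a low-rank surrogate C for the inverse kernel matrix, and the posterior covariance widens to account for computation not performed. The kernel, data, and random actions are placeholders; the paper's contributions (a new training loss for hyperparameter optimization and linear-time inference via carefully chosen actions) are not shown.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between row-wise inputs A and B."""
    sqdist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sqdist / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))              # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)   # noisy observations
Xs = np.linspace(-3.0, 3.0, 100)[:, None]               # test inputs

sigma = 0.1
K_hat = rbf(X, X) + sigma**2 * np.eye(len(X))  # K + sigma^2 I
K_sX = rbf(Xs, X)

# Spend a budget of i "computations", represented by actions S (n x i).
# Random actions are only a placeholder for the paper's action policy.
i = 20
S = rng.standard_normal((len(X), i))
C = S @ np.linalg.solve(S.T @ K_hat @ S, S.T)  # low-rank surrogate for K_hat^{-1}

mean = K_sX @ (C @ y)
# Combined covariance = mathematical + computational uncertainty: the smaller
# the budget i, the larger the reported posterior variance.
cov = rbf(Xs, Xs) - K_sX @ C @ K_sX.T
```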
Non-Parametric Calibration for Classification
Jonathan Wenger, Hedvig Kjellström, and Rudolph Triebel
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process, and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the-art neural network architectures.
@inproceedings{Wenger2020NonParametricCalibration,
  title     = {Non-{Parametric} {Calibration} for {Classification}},
  author    = {Wenger, Jonathan and Kjellström, Hedvig and Triebel, Rudolph},
  booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2020},
  doi       = {10.48550/arXiv.1906.04933},
}
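As a point of contrast, the sketch below shows the generic post-hoc calibration workflow with the simplest parametric map, temperature scaling: fit a rescaling of the logits on held-out data, apply it at test time, and measure the calibration gap via the expected calibration error (ECE). This is explicitly not the paper's method, which replaces the single temperature with a non-parametric latent Gaussian process acting on the confidence estimates; the toy data is a placeholder constructed to be overconfident.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """Gap between confidence and accuracy, averaged over confidence bins."""
    conf = probs.max(axis=1)
    correct = probs.argmax(axis=1) == labels
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

# Toy "overconfident classifier": logits are sharper than the distribution
# the labels were actually drawn from.
rng = np.random.default_rng(0)
logits = 3.0 * rng.standard_normal((2000, 5))
labels = np.array([rng.choice(5, p=p) for p in softmax(logits / 3.0)])

# Fit a single temperature on held-out data by minimizing the NLL.
def nll(T):
    p = softmax(logits[:1000] / T)
    return -np.log(p[np.arange(1000), labels[:1000]] + 1e-12).mean()

T = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded").x

before = expected_calibration_error(softmax(logits[1000:]), labels[1000:])
after = expected_calibration_error(softmax(logits[1000:] / T), labels[1000:])
print(f"ECE before: {before:.3f}, after temperature scaling: {after:.3f}")
```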