Publications
Publications and preprints in reverse chronological order.
2025
- Computation-Aware Kalman Filtering and Smoothing. Marvin Pförtner, Jonathan Wenger, Jon Cockayne, and Philipp Hennig. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2025.
Kalman filtering and smoothing are the foundational mechanisms for efficient inference in Gauss–Markov models. However, their time and memory complexities scale prohibitively with the size of the state space. This is particularly problematic in spatiotemporal regression problems, where the state dimension scales with the number of spatial observations. Existing approximate frameworks leverage low-rank approximations of the covariance matrix. But since they do not model the error introduced by the computational approximation, their predictive uncertainty estimates can be overly optimistic. In this work, we propose a probabilistic numerical method for inference in high-dimensional Gauss–Markov models which mitigates these scaling issues. Our matrix-free iterative algorithm leverages GPU acceleration and crucially enables a tunable trade-off between computational cost and predictive uncertainty. Finally, we demonstrate the scalability of our method on a large-scale climate dataset.
@inproceedings{Pfoertner2024ComputationAwareKalman,
  title     = {Computation-Aware {K}alman Filtering and Smoothing},
  author    = {Pf\"ortner, Marvin and Wenger, Jonathan and Cockayne, Jon and Hennig, Philipp},
  booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2025},
  doi       = {10.48550/arXiv.2405.08971},
}
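To make the cost the paper targets concrete, here is a minimal NumPy sketch of one classical Kalman filter predict/update step for a linear-Gaussian state-space model. These are the textbook equations, not the paper's computation-aware algorithm, and all names are my own; the dense d x d products and solves are what scales cubically in the state dimension d.

```python
import numpy as np

def kalman_step(m, P, y, A, Q, H, R):
    """One classical Kalman filter predict/update step for the model
    x_t = A x_{t-1} + q, q ~ N(0, Q)   and   y_t = H x_t + r, r ~ N(0, R)."""
    # Predict.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # Kalman gain  P_pred H^T S^{-1}
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new

# Toy usage with a random observation.
state_dim, obs_dim = 4, 2
rng = np.random.default_rng(0)
A, Q = np.eye(state_dim), 0.1 * np.eye(state_dim)
H, R = rng.normal(size=(obs_dim, state_dim)), 0.5 * np.eye(obs_dim)
m1, P1 = kalman_step(np.zeros(state_dim), np.eye(state_dim),
                     rng.normal(size=obs_dim), A, Q, H, R)
```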
2024
- Computation-Aware Gaussian Processes: Model Selection and Linear-Time Inference. Jonathan Wenger, Kaiwen Wu, Philipp Hennig, Jacob R. Gardner, Geoff Pleiss, and John P. Cunningham. In Advances in Neural Information Processing Systems (NeurIPS), 2024.
Model selection in Gaussian processes scales prohibitively with the size of the training dataset, both in time and memory. While many approximations exist, all incur inevitable approximation error. Recent work accounts for this error in the form of computational uncertainty, which enables – at the cost of quadratic complexity – an explicit tradeoff between computation and precision. Here we extend this development to model selection, which requires significant enhancements to the existing approach, including linear-time scaling in the size of the dataset. We propose a novel training loss for hyperparameter optimization and demonstrate empirically that the resulting method can outperform SGPR, CGGP and SVGP, state-of-the-art methods for GP model selection, on medium to large-scale datasets. Our experiments show that model selection for computation-aware GPs trained on 1.8 million data points can be done within a few hours on a single GPU. As a result of this work, Gaussian processes can be trained on large-scale datasets without significantly compromising their ability to quantify uncertainty – a fundamental prerequisite for optimal decision-making.
@inproceedings{Wenger2024ComputationAwareGaussian,
  title     = {Computation-{Aware} {Gaussian} {Processes}: {Model} {Selection} {And} {Linear}-{Time} {Inference}},
  author    = {Wenger, Jonathan and Wu, Kaiwen and Hennig, Philipp and Gardner, Jacob R. and Pleiss, Geoff and Cunningham, John P.},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2024},
  doi       = {10.48550/arXiv.2411.01036},
}
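For reference, the sketch below spells out the classical model-selection objective that this line of work scales past: the exact GP negative log marginal likelihood, whose Cholesky factorization costs O(n^3) time and O(n^2) memory. This is a generic baseline with my own naming, not the paper's linear-time, computation-aware training loss.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale, outputscale):
    sqdist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return outputscale * np.exp(-0.5 * sqdist / lengthscale**2)

def gp_negative_log_marginal_likelihood(X, y, lengthscale, outputscale, noise):
    """Exact GP model-selection objective via a Cholesky factorization
    of the n x n kernel matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, X, lengthscale, outputscale) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return 0.5 * (y @ alpha + logdet + n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
print(gp_negative_log_marginal_likelihood(X, y, lengthscale=1.0, outputscale=1.0, noise=0.01))
```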
- Large-Scale Gaussian Processes via Alternating Projection. Kaiwen Wu, Jonathan Wenger, Haydn Jones, Geoff Pleiss, and Jacob R. Gardner. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
Training and inference in Gaussian processes (GPs) require solving linear systems with n × n kernel matrices. To address the prohibitive O(n^3) time complexity, recent work has employed fast iterative methods, like conjugate gradients (CG). However, as datasets increase in magnitude, the kernel matrices become increasingly ill-conditioned and still require O(n^2) space without partitioning. Thus, while CG increases the size of datasets GPs can be trained on, modern datasets reach scales beyond its applicability. In this work, we propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching. Our algorithm, based on alternating projection, has O(n) per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets. Theoretically, we prove the method enjoys linear convergence. Empirically, we demonstrate its fast convergence in practice and robustness to ill-conditioning. On large-scale benchmark datasets with up to four million data points, our approach accelerates GP training and inference by speed-up factors up to 27× and 72×, respectively, compared to CG.
@inproceedings{Wu2024LargeScaleGaussian,
  title     = {Large-{Scale} {Gaussian} {Processes} via {Alternating} {Projection}},
  author    = {Wu, Kaiwen and Wenger, Jonathan and Jones, Haydn and Pleiss, Geoff and Gardner, Jacob R.},
  booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2024},
  doi       = {10.48550/arXiv.2310.17137},
}
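A hypothetical illustration of the kind of subblock-only update the abstract describes: block-coordinate (Gauss-Seidel-style) sweeps over K v = y that touch one block of rows of the kernel matrix at a time. The block schedule and convergence machinery of the paper's alternating-projection method differ; function and parameter names here are my own.

```python
import numpy as np

def block_gauss_seidel_solve(K, y, block_size=128, num_epochs=30):
    """Solve K v = y (K symmetric positive definite) by sweeping over index
    blocks, touching only one block of rows of K per update."""
    n = y.shape[0]
    v = np.zeros(n)
    blocks = [np.arange(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    for _ in range(num_epochs):
        for idx in blocks:
            residual = y[idx] - K[idx, :] @ v                       # one row block of K
            v[idx] += np.linalg.solve(K[np.ix_(idx, idx)], residual)
    return v

# Toy kernel system (K is dense here only for the demo; in practice the
# row block K[idx, :] would be formed on the fly to keep memory at O(n)).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1)) + 0.1 * np.eye(1000)
y = rng.normal(size=1000)
v = block_gauss_seidel_solve(K, y)
print(np.linalg.norm(K @ v - y))   # residual norm after the sweeps
```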
- Accelerating Generalized Linear Models by Trading off Computation for Uncertainty. Lukas Tatzel, Jonathan Wenger, Frank Schneider, and Philipp Hennig. arXiv preprint, 2024.
Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations in practice. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to modern parallel computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training compared to competitive baselines by trading off reduced computation for increased uncertainty.
@misc{Tatzel2024AcceleratingGeneralized,
  title        = {Accelerating {Generalized} {Linear} {Models} by {Trading} off {Computation} for {Uncertainty}},
  author       = {Tatzel, Lukas and Wenger, Jonathan and Schneider, Frank and Hennig, Philipp},
  howpublished = {arXiv},
  year         = {2024},
  doi          = {10.48550/arXiv.2310.20285},
}
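As a point of reference for the cost being traded off, here is a textbook Laplace approximation for Bayesian logistic regression, the kind of GLM inference the paper accelerates. Each Newton step requires a dense d x d solve; this is the classical baseline, not the authors' error-aware iterative method, and all names are my own.

```python
import numpy as np

def laplace_logistic_regression(X, t, prior_precision=1.0, num_newton_steps=15):
    """Classical Laplace approximation for Bayesian logistic regression:
    Newton/IRLS steps followed by a Gaussian approximation at the mode."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(num_newton_steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))                       # predicted probabilities
        grad = X.T @ (p - t) + prior_precision * w             # gradient of neg. log joint
        hess = X.T @ ((p * (1 - p))[:, None] * X) + prior_precision * np.eye(d)
        w = w - np.linalg.solve(hess, grad)                    # Newton step (dense solve)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    hess = X.T @ ((p * (1 - p))[:, None] * X) + prior_precision * np.eye(d)
    return w, np.linalg.inv(hess)                              # posterior mean and covariance

# Synthetic binary classification data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
t = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.5 * rng.normal(size=1000) > 0).astype(float)
w_map, cov = laplace_logistic_regression(X, t)
```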
2023
- On the Disconnect Between Theory and Practice of Neural Networks: Limits of the Neural Tangent Kernel Perspective. Jonathan Wenger, Felix Dangel, and Agustinus Kristiadi. arXiv preprint, 2023.
The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits, which can be demonstrated to hold in wide synthetic neural network architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This requirement raises concerns that architectures used in practice do not exhibit the behavior predicted by the NTK. Here, we supplement previous work on the NTK by empirically investigating whether the limiting regime predicts practically relevant behavior of large-width architectures. Our results demonstrate that this is not the case across multiple domains. This observed disconnect between theory and practice further calls into question to what degree NTK theory should inform architectural and algorithmic choices.
@misc{Wenger2023DisconnectTheory,
  title        = {On the {Disconnect} {Between} {Theory} and {Practice} of {Neural} {Networks}: Limits of the Neural Tangent Kernel Perspective},
  author       = {Wenger, Jonathan and Dangel, Felix and Kristiadi, Agustinus},
  howpublished = {arXiv},
  year         = {2023},
  doi          = {10.48550/arXiv.2310.00137},
}
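The object under study can be made concrete with a toy computation of the empirical neural tangent kernel, Theta(x, x') = J_theta f(x) @ J_theta f(x')^T, for a tiny network, with the parameter Jacobian estimated by finite differences. This only illustrates the definition; the network, sizes, and names are my own, not the paper's experimental setup.

```python
import numpy as np

def empirical_ntk(f, theta, X, eps=1e-5):
    """Empirical NTK of a scalar-output model f(theta, x), with the
    parameter gradient estimated by central finite differences."""
    def grad(x):
        g = np.zeros_like(theta)
        for i in range(theta.size):
            tp, tm = theta.copy(), theta.copy()
            tp[i] += eps
            tm[i] -= eps
            g[i] = (f(tp, x) - f(tm, x)) / (2 * eps)
        return g
    J = np.stack([grad(x) for x in X])   # (num_inputs, num_params)
    return J @ J.T                        # (num_inputs, num_inputs) kernel matrix

# Tiny two-layer network with parameters flattened into one vector.
def mlp(theta, x, width=8, d=2):
    W1 = theta[: width * d].reshape(width, d)
    b1 = theta[width * d : width * d + width]
    w2 = theta[width * d + width :]
    return w2 @ np.tanh(W1 @ x + b1)

rng = np.random.default_rng(0)
theta0 = rng.normal(size=8 * 2 + 8 + 8) / np.sqrt(8)
X = rng.normal(size=(5, 2))
print(empirical_ntk(mlp, theta0, X))   # 5 x 5 empirical NTK at initialization
```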
- Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers. Marvin Pförtner, Ingo Steinwart, Philipp Hennig, and Jonathan Wenger. arXiv preprint, 2023.
Linear partial differential equations (PDEs) are an important, widely applied class of mechanistic models, describing physical processes such as heat transfer, electromagnetism, and wave propagation. In practice, specialized numerical methods based on discretization are used to solve PDEs. They generally use an estimate of the unknown model parameters and, if available, physical measurements for initialization. Such solvers are often embedded into larger scientific models with a downstream application and thus error quantification plays a key role. However, by ignoring parameter and measurement uncertainty, classical PDE solvers may fail to produce consistent estimates of their inherent approximation error. In this work, we approach this problem in a principled fashion by interpreting solving linear PDEs as physics-informed Gaussian process (GP) regression. Our framework is based on a key generalization of the Gaussian process inference theorem to observations made via an arbitrary bounded linear operator. Crucially, this probabilistic viewpoint allows us to (1) quantify the inherent discretization error; (2) propagate uncertainty about the model parameters to the solution; and (3) condition on noisy measurements. Demonstrating the strength of this formulation, we prove that it strictly generalizes methods of weighted residuals, a central class of PDE solvers including collocation, finite volume, pseudospectral, and (generalized) Galerkin methods such as finite element and spectral methods. This class can thus be directly equipped with a structured error estimate. In summary, our results enable the seamless integration of mechanistic models as modular building blocks into probabilistic models by blurring the boundaries between numerical analysis and Bayesian inference.
@misc{Pfortner2023PhysicsInformedGaussian,
  title        = {Physics-{Informed} {Gaussian} {Process} {Regression} {Generalizes} {Linear} {PDE} {Solvers}},
  author       = {Pförtner, Marvin and Steinwart, Ingo and Hennig, Philipp and Wenger, Jonathan},
  howpublished = {arXiv},
  year         = {2023},
  doi          = {10.48550/arXiv.2212.12474},
}
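A minimal sketch of the core idea, conditioning a GP prior on observations made through a linear operator: RBF-kernel collocation for the 1D Poisson problem -u''(x) = f(x) with Dirichlet boundary conditions, written as ordinary Gaussian conditioning. The kernel-derivative formulas are the standard ones for the squared-exponential kernel; the problem, lengthscale, and variable names are my own, and this is not the authors' code.

```python
import numpy as np

l = 0.2  # RBF lengthscale of the GP prior u ~ GP(0, k)

def k(a, b):    # k(x, x') = exp(-(x - x')^2 / (2 l^2))
    r = a[:, None] - b[None, :]
    return np.exp(-0.5 * r**2 / l**2)

def Lk(a, b):   # (-d^2/dx'^2) k(x, x'): the differential operator applied to one argument
    r = a[:, None] - b[None, :]
    return (1 / l**2 - r**2 / l**4) * np.exp(-0.5 * r**2 / l**2)

def LLk(a, b):  # the operator applied to both arguments of k
    r = a[:, None] - b[None, :]
    return (3 / l**4 - 6 * r**2 / l**6 + r**4 / l**8) * np.exp(-0.5 * r**2 / l**2)

f = lambda x: np.pi**2 * np.sin(np.pi * x)   # right-hand side; exact solution is sin(pi x)
x_col = np.linspace(0, 1, 20)                # collocation points for -u'' = f
x_bnd = np.array([0.0, 1.0])                 # Dirichlet boundary u(0) = u(1) = 0
obs = np.concatenate([f(x_col), np.zeros(2)])

# Gram matrix of the observed quantities [-u''(x_col), u(x_bnd)] under the prior.
G = np.block([[LLk(x_col, x_col), Lk(x_col, x_bnd)],
              [Lk(x_bnd, x_col),  k(x_bnd, x_bnd)]]) + 1e-8 * np.eye(obs.size)

# Posterior mean of the solution u at test points = cross-covariance @ G^{-1} @ obs.
x_test = np.linspace(0, 1, 101)
crosscov = np.hstack([Lk(x_test, x_col), k(x_test, x_bnd)])
u_mean = crosscov @ np.linalg.solve(G, obs)
print(np.abs(u_mean - np.sin(np.pi * x_test)).max())   # error w.r.t. the exact solution
```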
2022
- Posterior and Computational Uncertainty in Gaussian Processes. Jonathan Wenger, Geoff Pleiss, Marvin Pförtner, Philipp Hennig, and John P. Cunningham. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method’s posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
@inproceedings{Wenger2022PosteriorComputational,
  title     = {Posterior and Computational Uncertainty in {Gaussian} Processes},
  author    = {Wenger, Jonathan and Pleiss, Geoff and Pf{\"o}rtner, Marvin and Hennig, Philipp and Cunningham, John P.},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2022},
  doi       = {10.48550/arXiv.2205.15449},
}
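Under my reading of the paper, the combined posterior can be sketched non-recursively as follows: a budget of "actions" S defines a low-rank matrix C = S (S^T Khat S)^{-1} S^T that stands in for the exact precision Khat^{-1}, and using C in its place yields a covariance that contains both the usual (mathematical) and the computational uncertainty. The kernel, action choice, and names below are a toy example of mine, not the authors' implementation.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def computation_aware_posterior(X, y, X_test, actions, noise=1e-2):
    """GP posterior from a limited budget of actions S: C approximates
    Khat^{-1}, and the leftover k(x, X)(Khat^{-1} - C)k(X, x') is the
    computational part of the returned covariance."""
    Khat = rbf(X, X) + noise * np.eye(len(X))
    C = actions @ np.linalg.solve(actions.T @ Khat @ actions, actions.T)
    k_test_X = rbf(X_test, X)
    mean = k_test_X @ (C @ y)
    combined_cov = rbf(X_test, X_test) - k_test_X @ C @ k_test_X.T
    return mean, combined_cov

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 300))
y = np.sin(X) + 0.1 * rng.normal(size=X.size)
X_test = np.linspace(-3, 3, 7)
S = rng.normal(size=(X.size, 15))        # only 15 matrix-vector "computations"
mean, cov = computation_aware_posterior(X, y, X_test, S)
print(np.diag(cov))                       # shrinks toward the exact GP as S gains columns
```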
- Preconditioning for Scalable Gaussian Process Hyperparameter Optimization. Jonathan Wenger, Geoff Pleiss, Philipp Hennig, John P. Cunningham, and Jacob R. Gardner. In International Conference on Machine Learning (ICML), 2022.
Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There our approach accelerates training by up to an order of magnitude.
@inproceedings{Wenger2022PreconditioningScalable,
  title     = {Preconditioning for Scalable {G}aussian Process Hyperparameter Optimization},
  author    = {Wenger, Jonathan and Pleiss, Geoff and Hennig, Philipp and Cunningham, John P. and Gardner, Jacob R.},
  booktitle = {International Conference on Machine Learning (ICML)},
  year      = {2022},
  doi       = {10.48550/arXiv.2107.00243},
}
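For context, here is textbook preconditioned conjugate gradients for an SPD kernel system, with a simple Jacobi (diagonal) preconditioner as a stand-in. The paper's contribution concerns which preconditioners to use and how preconditioning also accelerates the stochastic log-determinant and derivative estimates; this sketch covers only the linear solve, and the names are my own.

```python
import numpy as np

def pcg(A, b, prec_solve, tol=1e-6, max_iter=1000):
    """Textbook preconditioned conjugate gradients for A x = b with A SPD;
    prec_solve(r) applies the inverse of the preconditioner to a vector."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = prec_solve(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = prec_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Example: RBF kernel matrix with a Jacobi (diagonal) preconditioner.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1)) + 0.1 * np.eye(500)
b = rng.normal(size=500)
x = pcg(K, b, prec_solve=lambda r: r / np.diag(K))
```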
2021
- ProbNum: Probabilistic Numerics in Python. Jonathan Wenger, Nicholas Krämer, Marvin Pförtner, Jonathan Schmidt, Nathanael Bosch, Nina Effenberger, Johannes Zenn, Alexandra Gessner, Toni Karvonen, François-Xavier Briol, Maren Mahsereci, and Philipp Hennig. arXiv preprint, 2021.
Probabilistic numerical methods (PNMs) solve numerical problems via probabilistic inference. They have been developed for linear algebra, optimization, integration and differential equation simulation. PNMs naturally incorporate prior information about a problem and quantify uncertainty due to finite computational resources as well as stochastic input. In this paper, we present ProbNum: a Python library providing state-of-the-art probabilistic numerical solvers. ProbNum enables custom composition of PNMs for specific problem classes via a modular design as well as wrappers for off-the-shelf use. Tutorials, documentation, developer guides and benchmarks are available online at https://probnum.readthedocs.io/.
@misc{Wenger2021ProbNumProbabilistic,
  title        = {{ProbNum}: {Probabilistic} {Numerics} in {Python}},
  author       = {Wenger, Jonathan and Krämer, Nicholas and Pförtner, Marvin and Schmidt, Jonathan and Bosch, Nathanael and Effenberger, Nina and Zenn, Johannes and Gessner, Alexandra and Karvonen, Toni and Briol, François-Xavier and Mahsereci, Maren and Hennig, Philipp},
  howpublished = {arXiv},
  year         = {2021},
}
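A usage sketch, written from memory, of ProbNum's probabilistic linear solver. The exact function names, signatures, and return values may differ between versions, so treat the call below as an assumption and consult the documentation at https://probnum.readthedocs.io/.

```python
import numpy as np
import probnum as pn

# Solve a small SPD system with a probabilistic linear solver, which returns
# beliefs (random variables) over the solution rather than a point estimate.
# NOTE: API sketched from memory; check the ProbNum docs for the current signature.
rng = np.random.default_rng(42)
A = rng.normal(size=(25, 25))
A = A @ A.T + 25 * np.eye(25)          # symmetric positive definite system matrix
b = rng.normal(size=(25,))

x_belief, A_belief, Ainv_belief, info = pn.linalg.problinsolve(A, b)
print(x_belief.mean, info)             # posterior mean of the solution and solver info
```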
2020
- Probabilistic Linear Solvers for Machine Learning. Jonathan Wenger and Philipp Hennig. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
Linear systems are the bedrock of virtually all numerical computation. Machine learning poses specific challenges for the solution of such systems due to their scale, characteristic structure, stochasticity and the central role of uncertainty in the field. Unifying earlier work, we propose a class of probabilistic linear solvers which jointly infer the matrix, its inverse and the solution from matrix-vector product observations. This class emerges from a fundamental set of desiderata which constrains the space of possible algorithms and recovers the method of conjugate gradients under certain conditions. We demonstrate how to incorporate prior spectral information in order to calibrate uncertainty and experimentally showcase the potential of such solvers for machine learning.
@inproceedings{Wenger2020ProbabilisticLinear,
  title     = {Probabilistic {Linear} {Solvers} for {Machine} {Learning}},
  author    = {Wenger, Jonathan and Hennig, Philipp},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2020},
  doi       = {10.48550/arXiv.2010.09691},
}
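One way to see "inferring the solution from matrix-vector product observations" is plain Gaussian conditioning: given a prior x ~ N(x0, Sigma0) and the exact observations S^T A x = S^T b obtained from the products A S, condition on them. This simplified, solution-space sketch (my own notation, assuming A symmetric) is not the paper's full matrix-based model, which also infers A and its inverse and recovers conjugate gradients for particular choices of prior and actions.

```python
import numpy as np

def solve_by_gaussian_inference(A, b, S, x0=None, Sigma0=None):
    """Infer the solution of A x = b (A symmetric) from the matrix-vector
    products V = A S by conditioning the Gaussian prior x ~ N(x0, Sigma0)
    on the exact linear observations S^T A x = S^T b."""
    n = b.shape[0]
    x0 = np.zeros(n) if x0 is None else x0
    Sigma0 = np.eye(n) if Sigma0 is None else Sigma0
    V = A @ S                                    # observed matrix-vector products
    gain = Sigma0 @ V @ np.linalg.inv(V.T @ Sigma0 @ V)
    mean = x0 + gain @ (S.T @ (b - A @ x0))
    cov = Sigma0 - gain @ V.T @ Sigma0           # remaining uncertainty about x
    return mean, cov

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
A = A @ A.T + 50 * np.eye(50)                    # symmetric positive definite
b = rng.normal(size=50)
S = rng.normal(size=(50, 10))                    # ten probing directions
mean, cov = solve_by_gaussian_inference(A, b, S)
print(np.linalg.norm(A @ mean - b))              # residual after ten observations
```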
- Non-Parametric Calibration for Classification. Jonathan Wenger, Hedvig Kjellström, and Rudolph Triebel. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.
Many applications of classification methods not only require high accuracy but also reliable estimation of predictive uncertainty. However, while many current classification frameworks, in particular deep neural networks, achieve high accuracy, they tend to incorrectly estimate uncertainty. In this paper, we propose a method that adjusts the confidence estimates of a general classifier such that they approach the probability of classifying correctly. In contrast to existing approaches, our calibration method employs a non-parametric representation using a latent Gaussian process, and is specifically designed for multi-class classification. It can be applied to any classifier that outputs confidence estimates and is not limited to neural networks. We also provide a theoretical analysis regarding the over- and underconfidence of a classifier and its relationship to calibration, as well as an empirical outlook for calibrated active learning. In experiments we show the universally strong performance of our method across different classifiers and benchmark data sets, in particular for state-of-the-art neural network architectures.
@inproceedings{Wenger2020NonParametricCalibration,
  title     = {Non-{Parametric} {Calibration} for {Classification}},
  author    = {Wenger, Jonathan and Kjellström, Hedvig and Triebel, Rudolph},
  booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2020},
  doi       = {10.48550/arXiv.1906.04933},
}
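The notion of calibration the paper addresses is commonly quantified by the expected calibration error, the binned gap between a classifier's confidence and its empirical accuracy; the standard diagnostic is sketched below. This is not the paper's latent-GP calibration method, only the metric that motivates calibration, and the names are my own.

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=10):
    """Expected calibration error: weighted average, over confidence bins,
    of the gap between mean confidence and empirical accuracy."""
    bins = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = np.abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Synthetic example of an overconfident classifier.
rng = np.random.default_rng(0)
confidences = rng.uniform(0.5, 1.0, size=10000)
correct = (rng.uniform(size=10000) < confidences**2).astype(float)
print(expected_calibration_error(confidences, correct))
```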