Statistics Colloquium: Simon Mak, Duke University

This event is part of the Spring 2022 Statistics Colloquium


A Graphical Multi-Fidelity Gaussian Process Model, With Application to Emulation of Expensive Computer Simulations

Presented by Simon Mak, Assistant Professor, Department of Statistical Science, Duke University

Wednesday, February 16, 2022
4:00 p.m. ET
Online

With advances in scientific computing and mathematical modeling, complex phenomena can now be reliably simulated. Such simulations can, however, be very time-intensive, requiring millions of CPU hours to perform. One solution is multi-fidelity emulation, which uses data of varying accuracies (or fidelities) to train an efficient predictive model (or emulator) for the expensive simulator. In complex problems, simulation data of different fidelities are often connected scientifically via a directed acyclic graph (DAG), which is difficult to integrate into existing multi-fidelity emulator models. We thus propose a new Graphical Multi-fidelity Gaussian Process (GMGP) model, which embeds this DAG (capturing scientific dependencies) within a Gaussian process framework. We show that the GMGP has desirable modeling traits via two Markov properties, and admits a scalable formulation for recursive computation of the posterior predictive distribution along sub-graphs. We also present an experimental design framework over the DAG given an experimental budget, and propose a nonlinear extension of the GMGP model via deep Gaussian processes. The advantages of the GMGP model are then demonstrated via a suite of numerical experiments and an application to emulation of heavy-ion collisions, which can be used to study the conditions of matter in the Universe shortly after the Big Bang.
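To give a concrete feel for multi-fidelity emulation, the sketch below illustrates the classical two-fidelity autoregressive idea: model the expensive simulator as a scaled emulator of the cheap simulator plus a Gaussian process discrepancy. This is only the simplest (chain) special case; the GMGP in the talk generalizes this recursion to an arbitrary DAG of fidelities. The toy simulators, kernel hyperparameters, and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Toy sketch of two-fidelity autoregressive emulation:
#     f_hi(x) ~ rho * f_lo(x) + delta(x),
# with delta a Gaussian process. The GMGP extends this chain
# structure to a general DAG of fidelities; nothing below is
# the actual GMGP code, just the underlying idea.

def rbf_kernel(X1, X2, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel matrix between two sets of 1-D inputs."""
    d = X1[:, None] - X2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_fit_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel (jitter for stability)."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    return Ks @ alpha

# Synthetic simulators: the cheap code is a biased version of the expensive one.
f_hi = lambda x: np.sin(8 * x) * x            # "expensive" simulator
f_lo = lambda x: 0.8 * f_hi(x) + 0.2 * x      # "cheap" simulator

X_lo = np.linspace(0, 1, 25)                  # many cheap runs
X_hi = np.linspace(0, 1, 6)                   # few expensive runs
y_lo, y_hi = f_lo(X_lo), f_hi(X_hi)

# Step 1: emulate the low-fidelity code from its (plentiful) data.
mu_lo_at_hi = gp_fit_predict(X_lo, y_lo, X_hi)

# Step 2: estimate the scale rho by least squares, then compute the
# discrepancy delta = y_hi - rho * f_lo at the high-fidelity sites.
rho = float(mu_lo_at_hi @ y_hi / (mu_lo_at_hi @ mu_lo_at_hi))
delta = y_hi - rho * mu_lo_at_hi

# Step 3: predict the expensive code anywhere by composing the two emulators.
X_test = np.linspace(0, 1, 200)
pred = rho * gp_fit_predict(X_lo, y_lo, X_test) + gp_fit_predict(X_hi, delta, X_test)

print("max |error| of multi-fidelity emulator:", np.max(np.abs(pred - f_hi(X_test))))
```

The payoff of this construction is that many cheap runs anchor the overall trend while a handful of expensive runs correct it, so the combined emulator can be far more accurate than a GP trained on the expensive data alone. The GMGP applies this kind of recursion along each sub-graph of the fidelity DAG, which is what makes its posterior predictive computation scalable.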

Speaker Bio

Dr. Simon Mak is an Assistant Professor in the Department of Statistical Science at Duke University. Prior to joining Duke, he was a Postdoctoral Fellow at the Stewart School of Industrial & Systems Engineering at Georgia Tech. His research involves integrating domain knowledge (e.g., scientific theories, mechanistic models, financial principles) as prior information for statistical inference and prediction. This yields a holistic framework for interpretable statistical learning, providing a principled way for scientists to validate theories from data and for statisticians to integrate scientific knowledge. His research tackles the methodological, theoretical, and algorithmic challenges in this integration, which involves building probabilistic models on complex objects (e.g., functions, manifolds, networks) and developing efficient algorithms and data collection methods for model training.