This event is part of the Fall 2022 Statistics Colloquium.
Epsilon-Greedy strategy for Nonparametric Bandits
Presented by Sakshi Arya, Eberly Fellow and Postdoctoral Researcher, Department of Statistics, Penn State University
 Wednesday, October 12
4:00 p.m. ET
Virtual meeting - Webex meeting room
Contextual bandit algorithms are popular for sequential decision making in several practical applications, ranging from online advertisement recommendations to mobile health. The goal of such problems is to maximize cumulative reward over time for a set of choices/arms, while taking covariate (or contextual) information into account. Epsilon-Greedy is a popular heuristic for the Multi-Armed Bandits problem, however it is not one of the most studied algorithms theoretically in the presence of contextual information. We study the Epsilon-Greedy strategy in nonparametric bandits (when no parametric form is assumed for the reward functions). Firstly, we modify the strategy to incorporate delayed rewards, show strong consistency and establish finite-time regret bounds for the proposed strategy. Secondly, in order to address the curse of dimensionality for estimation using classical nonparametric estimation approaches, we propose a kernelized epsilon-greedy algorithm. More specifically, we assume that the similarities between the covariates and expected rewards can be modeled as arbitrary linear functions of the contexts’ images in a certain reproducing kernel Hilbert space (RKHS). We establish convergence rates for the estimation error and the regret, which are closely tied to the intrinsic dimensionality of the RKHS. We show that the rates closely match the optimal rates for linear contextual bandits when restricted to a finite dimensional RKHS. Lastly, we illustrate our results through simulation studies and real-data analysis.
Speaker Bio:
Sakshi Arya is an Eberly Fellow and Postdoctoral Researcher at Penn State University. Sakshi received her PhD in Statistics from the University of Minnesota in 2020. Her primary research area is sequential decision making. In particular, she has devised algorithms for contextual bandit problems using nonparametric estimation tools and established theoretical guarantees on the performance of these algorithms. More broadly, she is interested in building the necessary statistical methodology for solving problems using nonparametric tools of estimation. Other areas of interest include spatio-temporal methods with applications in climate modeling and infectious disease modeling.