Paper of the Month: March and April 2018 | Department of Statistics

Notes preparer: Jun Yan

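Even an undergraduate student taking a regression course knows how to do model selection or variable selection using criteria such as AIC (Akaike, 1974) or BIC (Schwarz, 1978). Both criteria penalize the measure of fit (the maximized log-likelihood) by a multiple of the number of free parameters to guard against overfitting. With maximized log-likelihood log L, k free parameters, and n observations, AIC = -2 log L + 2k and BIC = -2 log L + k log n, and smaller values are preferred. BIC therefore penalizes each parameter more heavily than AIC whenever log n > 2, that is, for any sample size of 8 or more.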
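A minimal sketch of the two formulas, assuming a linear regression with i.i.d. normal errors. For that model the maximized log-likelihood depends on the data only through the residual sum of squares (RSS). The function names and the RSS values in the example are hypothetical and chosen only to show that the two criteria can disagree. Note that k counts every free parameter, including the error variance.

```python
import math


def gaussian_max_loglik(rss: float, n: int) -> float:
    """Maximized log-likelihood of a linear model with i.i.d. normal errors,
    where the error variance is set to its MLE, rss / n."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def aic(loglik: float, k: int) -> float:
    """AIC = -2 log L + 2k; smaller is better."""
    return -2 * loglik + 2 * k


def bic(loglik: float, k: int, n: int) -> float:
    """BIC = -2 log L + k log n; smaller is better."""
    return -2 * loglik + k * math.log(n)


if __name__ == "__main__":
    n = 100
    # Hypothetical fits: model A has 2 coefficients + error variance (k = 3);
    # model B has 5 coefficients + error variance (k = 6) and a smaller RSS.
    fits = {"A": (120.0, 3), "B": (112.0, 6)}
    for name, (rss, k) in fits.items():
        ll = gaussian_max_loglik(rss, n)
        print(name, "AIC:", round(aic(ll, k), 2), "BIC:", round(bic(ll, k, n), 2))
```

With these made-up numbers, the drop in RSS is worth more than AIC's extra penalty of 2 per parameter, so AIC slightly favors model B. It is not worth BIC's penalty of log 100 ≈ 4.6 per parameter, so BIC favors model A.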

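Up to an additive constant, AIC estimates the expected Kullback-Leibler divergence between the fitted candidate model and the true data-generating distribution. BIC approximates -2 times the log marginal likelihood of a candidate model. Under equal prior probabilities on the candidates, it therefore approximates the model's posterior probability. AIC is generally preferred for prediction because it is asymptotically equivalent to leave-one-out cross-validation. BIC is generally preferred for explanation because it is consistent: if the true data-generating model is among the candidates, BIC selects it with probability tending to one as the sample size grows.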
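The posterior interpretation of BIC can be made concrete. Assume equal prior probabilities across the candidate models. Then the posterior probability of model j is approximately proportional to exp(-BIC_j / 2). The sketch below is a hypothetical helper rather than code from Schwarz (1978). It normalizes those weights over a list of BIC values.

```python
import math


def bic_posterior_probs(bics: list[float]) -> list[float]:
    """Approximate posterior model probabilities from BIC values, assuming
    equal prior probabilities: P(M_j | y) is proportional to exp(-BIC_j / 2)."""
    best = min(bics)
    # Shift by the smallest BIC before exponentiating to avoid underflow;
    # the shift cancels when the weights are normalized.
    weights = [math.exp(-(b - best) / 2) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this approximation, a BIC difference of Δ between two models corresponds to posterior odds of about exp(Δ/2) in favor of the model with the smaller BIC. For example, Δ = 6 gives odds of roughly 20 to 1.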

AIC and BIC are best learned together. It is natural to wonder why the penalties take the forms they do (2 per parameter for AIC, log n per parameter for BIC), a question most textbooks leave unanswered. Both original papers are among the most cited statistical papers. At the time of writing, Akaike (1974) had about 39K and Schwarz (1978) about 34K Google Scholar citations. We present Schwarz (1978) because, despite its importance, it is short and accessible: three pages of text, with its four references on a fourth page. The justifications were made even easier to follow by Cavanaugh (1997, Stat Prob Letters) for AIC and by Neath and Cavanaugh (2012, WIREs Comp Stat) for BIC. A StackExchange discussion of the question is also a fun read.
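The following is a compressed sketch of where the two penalties come from. It follows the standard heuristic arguments rather than reproducing Schwarz's or Akaike's exact derivations. It assumes n i.i.d. observations, k free parameters, the usual regularity conditions, a prior density π(θ) that is positive near the MLE, and, for AIC, a correctly specified model. For BIC, a Laplace approximation of the marginal likelihood does the work. The observed information at the MLE grows like n times a fixed matrix, so its determinant contributes n^k, which becomes k log n on the -2 log scale. The remaining terms, including log π(θ̂), do not grow with n. For AIC, the in-sample -2 log L is optimistic by about 2k relative to the same quantity evaluated on an independent copy y* of the data.

```latex
% BIC: Laplace approximation to the marginal likelihood of model M
p(y \mid M) = \int L(\theta)\,\pi(\theta)\,d\theta,
\qquad
\log p(y \mid M) = \log L(\hat\theta) - \frac{k}{2}\log n + O_p(1),
% hence
-2\log p(y \mid M) = \underbrace{-2\log L(\hat\theta) + k\log n}_{\mathrm{BIC}} + O_p(1).

% AIC: the in-sample deviance underestimates the expected out-of-sample deviance by about 2k
\mathrm{E}\bigl[-2\log L(\hat\theta)\bigr] \approx \mathrm{E}\bigl[-2\,\mathrm{E}_{y^*}\log L^{*}(\hat\theta)\bigr] - 2k,
\qquad
\mathrm{AIC} = -2\log L(\hat\theta) + 2k.
```

Here L^{*} denotes the likelihood of the independent copy y*. The expected out-of-sample deviance on the right-hand side equals twice the expected Kullback-Leibler divergence up to a model-independent constant, which is why AIC targets that divergence.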
