#### [ArXiv] Post Model Selection, Nov. 7, 2007

Today’s arxiv-stat email included papers by Poetscher and Leeb, who have been working on post-model-selection inference. Model selection is sometimes mistaken for a part of statistical inference; put simply, it is better regarded as a step prior to inference. How do you know whether your data come from a chi-square distribution or a gamma distribution? (This is a model selection problem with nested models, since a chi-square distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2.) Should I estimate the degrees of freedom k of the chi-square, or α and β of the gamma, to obtain the mean and its error? Will the errors of the mean be the same under both distributions?
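As a rough illustration of the last question, here is a minimal moment-based sketch (not a likelihood fit, and not taken from any of the papers below). It assumes a chi-square model implies variance 2k with k matched to the sample mean, while a two-parameter gamma model matched by moments reproduces the sample variance. The function names and the simulation settings are made up for this example.

```python
import math
import random


def se_of_mean_under_each_model(data):
    """Return (se_chi2, se_gamma): the standard error of the mean implied
    by a moment-matched chi-square model and a moment-matched gamma model."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    # Chi-square with k d.o.f.: mean = k, variance = 2k, so k_hat = mean.
    se_chi2 = math.sqrt(2.0 * mean / n)
    # Gamma(alpha, beta): mean = alpha*beta, variance = alpha*beta^2.
    # Matching both moments gives an implied variance equal to the sample variance.
    se_gamma = math.sqrt(var / n)
    return se_chi2, se_gamma


def simulate(shape, scale, n=5000, seed=1):
    """Draw gamma data (hypothetical settings) and compare the two errors."""
    rng = random.Random(seed)
    data = [rng.gammavariate(shape, scale) for _ in range(n)]
    return se_of_mean_under_each_model(data)


if __name__ == "__main__":
    # shape 2, scale 2 is a chi-square distribution with 4 degrees of freedom
    print("chi-square data:", simulate(shape=2.0, scale=2.0))
    # scale 3 is outside the chi-square family
    print("gamma data:     ", simulate(shape=2.0, scale=3.0))
```

When the data really are chi-square the two errors should roughly agree; when the scale differs from 2, the chi-square model misstates the error of the mean, which is why the choice of model matters before inference.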

Before estimating parameters and their errors, one wishes to choose a model in which the parameters of interest are properly embedded. *The problem is that one uses the same data to choose a model (e.g. choosing the model with the largest likelihood value or Bayes factor) and to perform statistical inference (estimating parameters, calculating confidence intervals, and testing hypotheses), which inevitably introduces* **bias.** Such bias has generally been neglected, because the model is taken as known a priori (e.g. the 2nd order polynomial is the absolute truth and the residuals are realizations of the error term; by the way, how can one be sure that the error follows a normal distribution?). Asymptotically, this bias is of order O(n^m) with m smaller than zero, so it shrinks as the sample size n grows. Estimating this bias has been popular since Akaike introduced AIC (one of the best-known model selection criteria). Numerous works are found in the field of robust penalized likelihood, and variable selection has been a very hot topic in recent decades. There are surely more approaches, beyond my knowledge, for keeping this bias from contaminating the inference results.
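The effect of reusing the data can be seen in a small simulation. The sketch below is only an illustration under assumed settings: two correlated regressors, a pretest that drops the second regressor when its t-statistic is not significant, and a naive 95% interval for the first coefficient computed as if the selected model had been fixed in advance. The function name, sample size, correlation, and coefficient values are all hypothetical.

```python
import math
import random

Z95 = 1.96  # normal approximation to the two-sided 95% critical value


def naive_coverage(beta2, n=50, beta1=1.0, rho=0.8, n_rep=2000, seed=0):
    """Fraction of naive 95% intervals for beta1 that cover the true value
    when the model is chosen by a t-test on beta2 using the same data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        sxx = szz = sxz = sxy = szy = syy = 0.0
        for _i in range(n):
            x = rng.gauss(0.0, 1.0)
            # z has correlation rho with x
            z = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            y = beta1 * x + beta2 * z + rng.gauss(0.0, 1.0)
            sxx += x * x
            szz += z * z
            sxz += x * z
            sxy += x * y
            szy += z * y
            syy += y * y
        # Full model y ~ x + z (no intercept), closed-form least squares
        det = sxx * szz - sxz ** 2
        b1 = (szz * sxy - sxz * szy) / det
        b2 = (sxx * szy - sxz * sxy) / det
        s2 = (syy - b1 * sxy - b2 * szy) / (n - 2)
        se1 = math.sqrt(s2 * szz / det)
        se2 = math.sqrt(s2 * sxx / det)
        if abs(b2 / se2) > Z95:
            est, se = b1, se1  # keep the full model
        else:
            # Restricted model y ~ x, refitted on the same data
            est = sxy / sxx
            s2r = (syy - est * sxy) / (n - 1)
            se = math.sqrt(s2r / sxx)
        hits += abs(est - beta1) <= Z95 * se
    return hits / n_rep


if __name__ == "__main__":
    print("coverage when beta2 = 0.0:", naive_coverage(0.0))
    print("coverage when beta2 = 0.3:", naive_coverage(0.3))
```

With a small but nonzero second coefficient, the pretest often drops a regressor that matters, and the naive interval should cover the true value noticeably less often than the nominal 95%.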

The works by Professors Poetscher and Leeb looked unique to me among efforts to resolve the intrinsic bias arising from inference after model selection. Instead of being listed in my weekly arxiv lists, their arxiv papers deserved a separate posting. I have also included some more general references.

The list of papers from today’s arxiv:

- [stat.TH:0702703] **Can one estimate the conditional distribution of post-model-selection estimators?** by H. Leeb and B. M. Pötscher
- [stat.TH:0702781] **The distribution of model averaging estimators and an impossibility result regarding its estimation** by B. M. Pötscher
- [stat.TH:0704.1466] **Sparse Estimators and the Oracle Property, or the Return of Hodges’ Estimator** by H. Leeb and B. M. Poetscher
- [stat.TH:0711.0660] **On the Distribution of Penalized Maximum Likelihood Estimators: The LASSO, SCAD, and Thresholding** by B. M. Poetscher and H. Leeb
- [stat.TH:0701781] **Learning Trigonometric Polynomials from Random Samples and Exponential Inequalities for Eigenvalues of Random Matrices** by K. Groechenig, B. M. Poetscher, and H. Rauhut

Other resources:

- Prof. Leeb’s website has other published papers
- Effects of Model Selection on Inference, B. M. Pötscher, Econometric Theory, Vol. 7, No. 2 (Jun., 1991), pp. 163-185
- The Effect of Model Selection on Confidence Regions and Prediction Regions, P. Kabaila, Econometric Theory, Vol. 11, No. 3 (Aug., 1995), pp. 537-549
- Model Selection and Multi-Model Inference: a book by Burnham and Anderson
- modelselection.org: a model selection website, although it looks like a pageant show website.

**[Added on Nov.8th]** There were a few more relevant papers from arxiv.

- [stat.AP:0711.0993] **Upper bounds on the minimum coverage probability of confidence intervals in regression after variable selection** by P. Kabaila and K. Giri
- [stat.ME:0710.1036] **Confidence Sets Based on Sparse Estimators Are Necessarily Large** by B. M. Pötscher
