The AstroStat Slog » education
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

A short note on Probability for astronomers
http://hea-www.harvard.edu/AstroStat/slog/2009/a-short-note-on-probability-for-astronomers/
Mon, 28 Dec 2009 03:13:02 +0000, hlee

It often irks me to see a function normalized over a feasible parameter space and then used as a probability density function (pdf) for further statistical inference. To be a proper pdf, the normalization has to be done over a measurable space, not over a feasible space. Such practice often yields biased best fits (biased estimators) and improper error bars. On the other hand, validating a measurable space under the physics seems complicated. To be precise, we are often lost in translation.
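
As a concrete illustration (a standard construction, stated here only for concreteness): if observations are restricted to a window [a, b] of the measurable space, the properly normalized density and the resulting log-likelihood are

    f_{[a,b]}(x \mid \theta) = \frac{f(x \mid \theta)}{\int_a^b f(u \mid \theta)\,du}, \qquad a \le x \le b,

    \ell(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta) \;-\; n \log \int_a^b f(u \mid \theta)\,du .

Dropping the second term, i.e. normalizing over the feasible window as if it were the whole sample space, is exactly what produces the biased best fits and improper error bars described above.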

When I was teaching statistics, even in undergraduate courses the classes drew both undergraduate and graduate students from various fields, but never astrophysics majors. I wondered why they were not encouraged to take some basic statistics while they were encouraged to take computer science courses. Since there are many astronomers good at programming and designing tools, I'm sure that recommending statistics courses to students would renovate astronomical data analysis procedures (beyond Bevington's book) and the theories behind them (statistics and mathematics per se, not physical laws).

Here’s an interesting lecture on developing a curriculum for the new era of computer science and on why basic probability theory and statistics are important for raising versatile computer scientists. It may be a bit outdated now, since I saw it several months ago.

A little more than halfway through the lecture, he emphasizes that a probability course belongs in the computer science curriculum. I wonder whether any astronomy professor makes similar arguments and stresses the need for young future astrophysicists to learn basic probability theory, in order to prevent the many misuses of statistics appearing in the astronomical literature. In particular, confusion between fitting (estimating) and inference (both model assessment and uncertainty quantification) is frequently observed in papers whose authors claim superior statistics and statistical data analysis. I sometimes attribute this confusion to the lack of distinction between what is random and what is deterministic, or to a strong belief that their observed and processed data are free of errors and of probabilistic nature.

Many introductory books use very interesting problems, many with historical origins, to introduce probability theory (plenty of anecdotes). One can check out the very basics, the probability axioms, and measurable functions on Wikipedia. With examples, probability is high-school-level (or lower) math that you already know, but because of the jargon you will want to recite the lexicon many times until you get used to the foundations, the basics, and the theory.
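
For quick reference, the axioms mentioned above can be stated in a few lines (standard material, not tied to any particular textbook): on a measurable space (\Omega, \mathcal{F}), a probability measure P satisfies

    P(A) \ge 0 \ \text{for all } A \in \mathcal{F}, \qquad P(\Omega) = 1, \qquad
    P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \ \text{for pairwise disjoint } A_i \in \mathcal{F},

and a random variable is then nothing more than a measurable function X: \Omega \to \mathbb{R}, meaning \{X \le x\} \in \mathcal{F} for every x.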

We often say “measurable” in order to discuss random variables, uncertainties, and distributions without verbosity. “Assume a measurable space … ” saves multiple paragraphs in an article and changes the structure of the writing. This short adjective carries many assumptions, depending on the statistical models and equations you are using for best fits and error bars.

Consider a luminosity function (LF) that is truncated due to observational limits. The common practice I have seen is to draw a histogram whose adaptive binning makes the overall shape look like a partial bell curve. Thanks to its smoothed look, scientists impose a Gaussian curve on the partially observed data and find parameter estimates that determine the shape of this Gaussian. There is no imputation step to fill in the unobserved points and so reconstitute the full probability space. The parameter space of the Gaussian frequently does not coincide with the physically feasible space; yet such discrepancies are rarely discussed in the astronomical literature, and the resulting biased estimates seem to be a taboo subject. A small numerical sketch of this bias is given below.
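
The following is a minimal Python sketch (mine, not from the post; the detection limit, sample size, and parameter values are all made up) of the bias just described: a Gaussian fitted to truncated data, first ignoring the truncation and then with the likelihood renormalized over the observable region only.

    # Fit a Gaussian to data truncated at a detection limit: naive fit vs.
    # a likelihood properly normalized over the observable region [limit, inf).
    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    mu_true, sigma_true, limit = 0.0, 1.0, 0.5       # hypothetical values
    x = rng.normal(mu_true, sigma_true, 100000)
    x = x[x > limit]                                 # only objects above the limit are "observed"

    # Naive fit: treat the truncated sample as if it were complete.
    print("naive fit    :", x.mean(), x.std(ddof=1))   # biased toward high mu, low sigma

    # Truncated-Gaussian likelihood: density divided by P(X > limit).
    def nll(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        return -(norm.logpdf(x, mu, sigma) - norm.logsf(limit, mu, sigma)).sum()

    fit = minimize(nll, x0=[x.mean(), np.log(x.std(ddof=1))])
    print("truncated fit:", fit.x[0], np.exp(fit.x[1]))  # close to mu_true, sigma_true

The difference between the two printed fits is the bias that goes undiscussed.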

Although astronomers emphasize the importance of uncertainties, neither the factorization nor the stratification of uncertainties has ever been made clear (model uncertainty; systematic uncertainty, or bias; statistical uncertainty, or variance). Hierarchical relationships or correlations among these different uncertainties are never addressed in full measure. The basics of probability theory and an understanding of random variables would help to characterize uncertainties in both the mathematical and the astrophysical sense. This knowledge would also assist in the appropriate quantification of these characterized uncertainties.
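
One elementary way to start the stratification asked for above (a textbook identity, offered only as a starting point) is the mean-squared-error decomposition of an estimator \hat\theta of \theta,

    \mathrm{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big]
      = \big(\mathbb{E}[\hat\theta] - \theta\big)^2 + \mathrm{Var}(\hat\theta),

where the first term plays the role of the systematic uncertainty (bias) and the second the statistical uncertainty (variance); model uncertainty enters when the expectation itself is taken under a misspecified model.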

Statistical models are rather simple compared to astrophysical models. However, statistics is the science of understanding uncertainty and randomness, and therefore some strategy for transcribing complicated astrophysical models into statistical models, in order to reflect the probabilistic nature of the observations (or of the parameters, in Bayesian modeling), is necessary. Both raw and processed data manifest the behavior of random variables. Their underlying processes determine not only the physics models but also the statistical models, written in terms of random variables and the link functions connecting the physics to the uncertainties. To the best of my understanding, bridging and inventing statistical models for astrophysical research is tough because of the lack of awareness of the basics of probability theory.
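
A minimal Python sketch of such a transcription (a hypothetical example of mine, not a prescription: the power-law spectrum, exposure, and channel grid are all made up) writes the physics model as the mean of a random variable, with the exposure acting as the link between the two.

    # Physics model: power-law photon spectrum amp * E^(-gamma).
    # Statistical model: observed counts are Poisson random variables whose
    # means are the exposure-weighted model prediction (the "link").
    import numpy as np
    from scipy.stats import poisson
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    energy = np.linspace(1.0, 10.0, 50)         # keV, channel centers (assumed)
    exposure = 5e4                              # effective exposure per channel (assumed)
    amp_true, gamma_true = 1e-3, 2.0
    mu = exposure * amp_true * energy**(-gamma_true)   # expected counts per channel
    counts = rng.poisson(mu)                    # the random variable actually observed

    def nll(theta):                             # Poisson negative log-likelihood
        log_amp, gamma = theta
        model = exposure * np.exp(log_amp) * energy**(-gamma)
        return -poisson.logpmf(counts, model).sum()

    fit = minimize(nll, x0=[np.log(1e-3), 1.5], method="Nelder-Mead")
    print("amp, gamma =", np.exp(fit.x[0]), fit.x[1])

Writing the model this way makes explicit which part is deterministic physics and which part is the random variable, which is precisely the distinction the paragraph above is concerned with.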

Once I had a chance to observe a Decadal Survey meeting, which covered very diverse areas of astronomy. They discussed new projects, advancing current projects, career development, and a little bit about educating professional astronomers apart from public outreach (which often receives more attention than the university curriculum; I also believe that widespread public awareness of astronomy is very important). What I missed while observing the meeting was any discussion of interdisciplinary knowledge transfer to broaden the field of astronomy and astrophysics, or of curriculum design. Because of its long history, I had thought astronomy was a science of everything. Marching along the same path for a long time has made astronomy more or less the most isolated and exclusive of sciences.

Perhaps asking astronomy majors to take multiple statistics courses is too burdensome; a more realistic arrangement, and the one I anticipate, is a data analysis course organized and taught by faculty who specialize in (statistical) data analysis and incorporating several hours of basic probability. With a few hours spent on the fundamental notions of random variables and probability, claims of “statistically rigorous methods and powerful results” would become more appropriate. Statistics is a science, but in the astronomy literature it looks more or less like an adjective that modifies methods and results, like “powerful”, “superior”, “excellent”, “better”, “useful,” and so on. The basics of probability are easily incorporated into an introduction to algorithms for designing experiments and into optimization methods, which are currently used in a brute-force fashion[1].

Occasionally I see gems on arXiv written by astronomers. Their expertise in astronomy and their interest in statistics have produced intriguing accounts of statistically rigorous data analysis and inference procedures. Their papers include explanations of the fundamentals of statistics and probability that are more appropriate for astronomers than statistics textbooks written for scientists and engineers in other fields. I wish more astronomers would join this venture, learning the basics and the diversity of statistics, to rectify the many unconscious misuses of statistics made while arguing that their choice of statistics is the most powerful one on the grounds of plausible results.

  1. What I mean by a brute-force fashion is trying all the methods listed in a software manual and then stating that method A gave the most plausible values, the ones that match the data in a scatter plot.

arxiv list
http://hea-www.harvard.edu/AstroStat/slog/2009/arxiv-list/
Thu, 10 Dec 2009 21:18:36 +0000, hlee

When I began subscribing to arXiv/astro-ph and arXiv/stat, I listed astro-ph papers featuring relatively advanced statistics (though only for a year), and I also kept papers relevant to astrostatistics beyond astro-ph or introducing hot topics in statistics and computer science for astronomical data applications. While creating my own arXiv list below, I hoped to write up short introductions to statistics that are unlikely to be known to most astronomers (like my MADS) and to match them with subjects and targets in astronomy. I thought such an effort could spawn new collaborations or expand the understanding of statistics among astronomers (see Magic Crystal). Well, I couldn’t keep up with the growth rate, and it’s about time to let go of that hope. Still, I thought some of these papers could be useful to slog subscribers. I hope they are.

  • [0704.1743] Fukugita, Nakamura, Okamura, et al (catalogue of morphologically classified galaxies from the SDSS database for trying various machine learning algorithms for automated classification)
  • [0911.1015] Gudendorf, Segers (Extreme-Value Copulas)
  • [0710.2024] Franz (Ratios: A short guide to confidence limits and proper use)
  • [0707.4473] Covey, Ivezic, Schlegel, Finkbeiner, et al. (Outliers in SDSS and 2MASS)
  • [0511503] (astro-ph) MNRAS, Nolan, Harva, Kaban, Raychaudhury, data-driven Bayesian approach
  • [0505017] (cs) Abellanas, Clavero, Hurtado, Delaunay depth
  • [0706.2704] (astro-ph) Wang, Zhang, Liu, Zhao (SDSS, kernel regression) Quantile regression can be applied
  • [0805.0056] Kong, Mizera, Quantile Tomography: using quantiles with multivariate data
  • [0907.5236] Ghosh, Resnick, Mean Excess Plots, Pareto
  • [0907.3454] Rinaldo, Wasserman (Low-Noise Density Clustering)
  • [0906.3979] Friendly (Golden Age of Statistical Graphics)
  • [0905.2819] Benjamini, Gavrilov (FDR control)
  • [0903.2651] Ambler, Silverman (Spatial point processes)
  • [0906.0562] Loubes, Rochet, Regularization with Approx. L^2 maximum entropy method
  • [0904.0430] Diederichs, Juditski, et al (Sparse NonGaussian Component Analysis)
  • [0905.0454] McWhirter,Proudler (eds) *Mathematics in Signal Processing V*
    [Tensor Decompositions, by *Pierre Comon*]
  • [0904.3842] Li, Dong (Dimension Reduction)
  • [0903.1283] Wiesel, Eldar, Hero III (Covariance estimation, graphical models)
  • [0904.1148] Reynaud-Bouret, Rivoirard
  • [0903.5147] Cai, Zhou (Data-driven block thresholding approach to wavelet estimation)
  • [0905.0483] Harmany, Marcia, Willet (Sparse Poisson intensity reconstruction)
  • [0904.2949] Hjort, McKeague, van Keilegom (Empirical Likelihood)
  • [0809.3373] (astro-ph) Bailer-Jones, Smith, et al. (GAIA, SVM)
  • [0904.0156] Berger, Bernardo, Sun (formal definition of reference priors)
  • [0703360] (math.st) Drton *(LRTs and singularities)*
  • [0807.3719] Shi, Belkin, Bin Yu
  • [0903.5480] Andrieu, Roberts
  • [0903.3620] Casella, Consonni (Reconciling Model Selection and Prediction)
  • [0903.0447] Alqallaf, van Aelst et al (propa. outliers in multivariate data)
  • [0903.2654] Ambler, Silverman (Bayesian wavelet thresholding)
  • [0206366] (astro-ph) van de Weygaert, *Cosmic Foam*
  • [0806.0560] Noble, Nowak, Beyond XSPEC, ISIS
  • [0908.3553] Liang, Stochastic approximation (SAMC), Bayesian model selection
  • [0804.3829] Liu, Li, *Hao,* Jin
  • [0802.2097] Roelofs, Bassa, et al
  • [0805.3983] Carlberg, Sullivan, et al (Clustering of SN Ia host galaxies)
  • [0808.0572] *Efron, Microarrays, Empirical Bayes, and Two groups model*
  • [0805.4264] Tempel, Einasto, Einasto, Saar, Anatomy of galaxy functions
  • [0909.0170] Estate, Khmaladze, Koul, (GoF problem for errors in nonparametric regression: dist’n free approach)
  • [0909.0608] *Liu, Lindsay*
  • [0702052] de Wit, Auchere (astro-ph, multispectral analysis of solar EUV images)
  • [0508651] Pires, Juin, Yvon, et al (astro-ph, Sunyaev-Zel’dovich clusters)
  • [0808.0012] Caticha (on slog, lectures on prob., entropy & stat. physics)
  • [0808.3587] Verbeke, Molenberghs, Beunckens, Model selection with incomplete data
  • [0806.1487] Schneider et al., Simulations and cosmological inference: a statistical model for power spectra means and covariances
  • [0807.4209] Adamakis, Morton-Jones, Walsh (solar physics, Bayes Factor)
  • [0808.3852] Diaconis, Khare, Saloff-Coste
  • [0807.3734] Rocha, Zhao, *Bin Yu* (SPLICE)
  • [0807.1005] Erven, Grunwald, Rooij ( … AIC-BIC dilemma)
  • [0805.2838] *E.L. Lehmann* (historical account)
  • [0805.4136] Genovese, Freeman, Wasserman, Nichol, Miller
  • [0806.3301] Tibshirani (not Robert, but Ryan)
  • [0706.3622] Wittek, Barko (physics,data-an)
  • [0805.4417] Georgakakis, et al (logN-logS, a bit fishy to me)
  • [0805.4141] Genovese, Perone-Pacifico, et al
  • [0806.3286] Chipman, George, McCulloch (BART)
  • [0710.2245] Efron (size, power, and FDR)
  • [0807.2900] Richards, Freeman, Lee, Schafer (PCA)
  • [0609042] (math.ST) Hoff (SVD)
  • [0707.0701] (cs.AI) Luss, d’Aspremont (Sparse PCA)
  • [0901.4252] Benko, Hardle, Kneip (Common Functional PC)
  • [0505017] (cs.CG) Abellanas, Claverol, Hutado (Delaunay depth)
  • [0906.1905] (astro-ph.IM) Guio, Achilleos, VOISE, Voronoi Image Segmentation algorithm
  • [0605610] (astro-ph) Sochting, Huber, Clowes, Howell (FSVS Cluster Catalogue, Voronoi Tessellation)
  • [0611473] (math.ST) Rigollet, Vert, Plug-in, Density Level Sets
  • [0707.0481] Lee, Nadler, Wasserman (Treelets)
  • [0805.2325] (astro-ph) Loh (block bootstrap, subsampling)
  • [0901.0751] Chen, Wu, Yi (Copula, Semiparametric Markov Model)
  • [0911.3944] White, Khudanpur, Wolfe (Likelihood based Semi-Supervised Model Selection with applications to Speech Processing)
  • [0911.4650] Varoquaux, Sadaghiani
  • [0803.2344] Vossen
  • [0805.0269] Leach et al (Component separation methods for the Planck mission: the appendix reviews various component separation/dimension reduction methods)
  • [0907.4728] Arlot, Celisse (survey of CV for model selection)
  • [0908.2503] Biau, Parta (sequential quantile prediction of time series)
  • [0905.4378] Ben-Haim, Eldar, (CRBound for Sparse Estimation)
  • [0906.3082] Cohen, Sackrowitz, Xu (Multiple Testing for dependent case)
  • [0906.3091] Sarkar, Guo (FDR)
  • [0903.5161] Rinner, Dickhaus, Roters (FDR)
  • [0810.4808] Huang, Chen (ANOVA, coefficient, F-test for local poly. regression)
  • [0901.4752] Chretien, (Robust est. of Gaussian mixtures)
  • [0908.2918] James, Wang, Zhu (Functional linear regression)
  • [0908.3961] Clifford, Cosma
  • [0906.3662] Lindquist (stat. anal. fMRI data)
  • [0706.1062] Clauset, Shalizi, Newman (PowerLaw dist’n)
  • [0712.0881] Zou, Hastie, Tibshirani (DoF, Lasso)
  • [0712.0901] Jiang, Luan, Wang
  • [0705.4020] Chattopadhyay, Misra, et al (GRB, classification, model based)
  • [0707.1891] Holmberg, Nordstrom, Anderson (isochrones, calibration, Geneva-Copenhagen)
  • [0708.1510] Cobb, Bailyn, Connecting GRBs and galaxies:
  • [0705.2774] Kelly
  • [0708.0302] Chambers, James, Lambert, Wiel (incremental quantile, monitoring)
  • [0708.0169] Mikhail, Data-driven goodness of fit tests, attempts to generalize the theory of score tests
  • [0706.1495] Huskova, Kirch, Bootstrapping CI for the change point of time series
  • [0708.4030] Richer, Dotter, et al (NGC6397, GC, CMD, LF)
  • [0708.1071] Shepp, Statistical thinking: From Tukey to Vardi and beyond
  • [0708.0499] *Hunter, Wang, Hettmansperger *
  • [0704.0781] Cabrera, Firmani et al (Swift, long GRBs)
  • [0706.2590] Ramos, *Extreme Value Theory and the solar cycle* (Pareto dist’n, survival)
  • [0706.2704] Wang, Zhang, Liu, Zhao (kernel regression, CV, redshift) <- quantile regression?
  • [0707.1611] Budavari, Szalay, (identification, Bayes factor)
  • [0707.1900] Vetere, Soffitta, et al. (GRB, BeppoSAX)
  • [0707.1982] Kim, *Liddle* (random matrix mass spectrum)
  • [0707.2064] Allen, (Star Formation, Bayesian)
  • [0011057] (hep-ex) Cranmer, Kernel Estimation in High Energy Physics
  • [0512484] (astro-ph) Mukherjee, Parkinson, Corasaniti, *Liddle* (model selection, dark energy)
  • [0701113] (astro-ph) Liddle (information criteria for astrophysical model selection)
  • [0810.2821] Cozman, concentration inequalities and LLNs under irrelevance of lower and upper expectations.
  • [0810.5275] Hall, Park, Samworth
  • [0709.1538] Einbeck, Evers, *Bailer-Jones*, localized principal components
  • [0804.4068] *Pires, Starck*, et al, FASTLens (weak lensing)
  • [0804.0713] Delaigle, Hall, Meister
  • [0802.0131] (astro-ph) Bobin, Starck, Ottensamer (*Compressed Sensing* in Astronomy)
  • [0803.1708] Taylor, Worsley, (Random Fields of Multivariate Test Statistics, shape analysis)
  • [0803.1736] Salibian-Barrera, Yohai (high breakdown point robust regression, censored data)
  • [0803.4026] Amini, Wainwright, (Sparse Principal Components)
  • [0803.1752] Ren, (weighted empirical likelihood)
  • [0803.3863] Efron (simultaneous inference)
  • [0801.3552] Clifford, Cosma, probabilistic counting algorithms
  • [0802.1406] Blanchard, Roquain (multiple testing)
  • [0707.2877] van de Weygaert
  • [0806.3932] Vavrek, Balazs, Meszaros, etal (testing the randomness in the sky distribution of GRBs), MNRAS, 391(3), 2008
  • [0911.3769] Chan, Spatial clustering, LRT
  • [0911.3749] Hall, Miller
  • [0909.0184] Chan, Hall robust nearest neighbor methods for classifying high dimensional data
  • [0911.3827] Jung, Marron, PCA High Dim
  • [0911.3531] Owen, Karl Pearson’s meta analysis revisited
  • [0911.3501] Wang, Zhu, Zhou, Quantile regression varying coefficient models
  • [0505200] (physica) *Pilla, Loader, Taylor*
  • [0501289] (math.ST) *Meinshausen, Rice* Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses.
  • [0806.1326] Velez, Ariste, Semel (PCA, Sun, magnetic fields)
  • [0906.4582] *Belabbas, Wolfe*, PCA, high-dimensional data
  • [0903.3002] Huang, Zhang, Metaxas Learning with Structured Sparsity
  • [9209010] (gr-qc) Finn, Detection, Measurement, and Gravitational Radiation
  • [0112467] (astro-ph) Petrosian
  • [0103040] (astro-ph) Peebles, N-point correlation functions
  • [9912329] (astro-ph) Kerscher, Stat. analysis of large scale structure in the universe Minkowski functional and J function
  • [0107417] Connolly, Scranton, et al. Early SDSS
  • [0511503] (math.ST) Pilla, Loader, Volume-of-Tube Formula: Perturbation tests, mixture models, and scan statistics
  • [0503033] (astro-ph) Battye, Moss
  • [0504022] (astro-ph) Trotta, Applications of Bayes Model Selection to cosmological parameters
  • [0304301] (astro-ph) Nakamichi, Morikawa, AIC, is galaxy dist’n non-extensive and non-gaussian?
  • [0110230] (astro-ph) Nichol, Chong, Connolly, et al
  • [0806.1506] (astro-ph) Unzicker, Fischer, 2D galaxy dist’n, SDSS
  • [0304005] (astro-ph) Maller, McIntosh, et al. (Angular correlation function, Power spectrum)
  • [0108439] (astro-ph) Boschan (angular and 3D correlation functions)
  • [9601103] (astro-ph) Maddox, Efstathiou, Sutherland (sys errors, angular correlation function)
  • [0806.0520] Vio, Andreani
  • [0807.4672] Zhang, Johnson, Little, Cao
  • [0911.4546] Hobert, Roy, Robert
  • [0911.4207] Calsaverini, Vicente (information theory and copula)
  • [0911.4021] Fan, Wu, Feng (Local Quasi-Likelihood with a parametric guide) *
  • [0911.4076] Hall, Jin, Miller
  • [0911.4080] Genovese, Jin, Wasserman
  • [0802.2174] Faure, Kneib, et al. (strong lense, COSMOS)
  • [0802.1213] Bridle et al (Great08 Challenge)
  • [0711.0690] Davies, Kovac, Meise (Nonparametric Regression, Confidence regions and regularization)
  • [0901.3245] Nadler
  • [0908.2901] Hong, Meeker, McCalley
  • [0501221] (math) Cadre (Kernel Estimation of Density Level Sets)
  • [0908.2926] Oreshkin, Coates (Error Propagation in Particle Filters)
  • [0811.1663] *Lyons* (Open Statistical Issues in Particle Physics)
  • [0901.4392] Johnstone, Lu (Sparse Principal Component Analysis)
  • [0803.2095] Hall, Jin (HC)
  • [0709.4078] Giffin (… Life after Shannon)
  • [0802.3364] Leeb (model selection and evaluation)
  • [0810.4752] Luxburg, Scholkopf (Stat. Learning Theory…)
  • [0708.1441] van de Weygaert, Schaap, The cosmic web: geometric analysis
  • [0804.2752] Buhlmann, Hothorn (Boosting algorithms…)
  • [0810.0944] Aydin, Pataki, Wang, Bullitt, Marron (PCA for trees)
  • [0711.0989] Chen (SDSS, volume limited sample)
  • [0709.1538] Einbeck, Evers, Bailer-Jones (Localized PC)
  • [0610835] (math.ST) Lehmann (On LRTs)
  • [0604410] (math.ST) Buntine, Discrete Component Analysis
  • [0707.4621] Hallin, Paindaveine (semiparametrically efficient rank-based inference I)
  • [0708.0079] Hallin, H. Oja, Paindaveine ( same as above II)
  • [0708.0976] Singh, Xie, Strawderman (confidence distribution)
  • [0706.3014] Gordon, Trotta (Bayesian calibrated significance levels.. the usage of p-values looks awkward)
  • [0709.0711] Quireza, Rocha-Pinto, Maciel
  • [0709.1208] Kuin, Rosen (measurement errors)
  • [0709.1359] Huertas-Company, et al (SVM, morphological classification)
  • [0708.2340] Miller, Kitching, Heymans, et. al. (Bayesian Galaxy Shape Measurement, weak lensing survey)
  • [0709.4316] Farchione, Kabaila (confidence intervals for the normal mean)
  • [0710.4245] Fearnhead, Papaspiliopoulos, Roberts (Particle Filters)
  • [0705.4199] (astro-ph) Leccardi, Molendi, an unbiased temp estimator for stat. poor X-ray spectra (can be improved… )
  • [0712.1663] Meinshausen, *Bickel, Rice* (efficient blind search)
  • [0706.4108] *Bickel, Kleijn, Rice* (Detecting Periodicity in Photon Arrival Times)
  • [0704.1584] Leeb, Potscher (estimate the unconditional distribution of post model selection estimator)
  • [0711.2509] Pope, Szapudi (Shrinkage Est. Power Spectrum Covariance matrix)
  • [0703746] (math.ST) Flegal, Haran, Jones (MCMC: can we trust the third significant figure?)
  • [0710.1965] (physics.soc-ph) Volchenkov, Blanchard, Sestieri of Venice
  • [0712.0637] Becker, Silvestri, Owen, Ivezic, Lupton (in pursuit of LSST science requirements)
  • [0703040] Johnston, Teodoro, *Martin Hendry* Completeness I: revisited, reviewed, and revived
  • [0910.5449] Friedenberg, Genovese (multiple testing, remote sensing, LSST)
  • [0903.0474] Nordman, Stationary Bootstrap’s Variance (Check Lahiri99)
  • [0706.1062] (physics.data-an) Clauset, Shalizi, Newman (power law distributions in empirical data)
  • [0805.2946] Kelly, Fan, Vestergaard (LF, Gaussian mixture, MCMC)
  • [0503373] (astro-ph) Starck, Pires, Refregier (weak lensing mass reconstruction using wavelets)
  • [0909.0349] Panaretos
  • [0903.5463] Stadler, Buhlmann
  • [0906.2128] Hall, Lee, Park, Paul
  • [0906.2530] Donoho, Tanner
  • [0905.3217] Hirakawa, Wolfe
  • [0903.0464] Clarke, Hall
  • [0701196] (math) Lee, Meng
  • [0805.4136] Genovese, Freeman, Wasserman, Nichol, Miller
  • [0705.2774] Kelly
  • [0910.1473] Lieshout
  • [0906.1698] Spokoiny
  • [0704.3704] Feroz, Hobson
  • [0711.2349] Muller, Welsh
  • [0711.3236] Kabaila, Giri
  • [0711.1917] Leng
  • [0802.0536] Wang
  • [0801.4627] Potscher, Scheider
  • [0711.0660] Potscher, Leeb
  • [0711.1036] Potscher
  • [0702781] (math.st) Potscher
  • [0711.0993] Kabaila, Giri
  • [0802.0069] Ghosal, Lember, Vaart
  • [0704.1466] Leeb, Potscher
  • [0701781] (math) Grochenig, Potscher, Rauhut
  • [0702703] (math.ST) Leeb, Potscher
  • [astro-ph:0911.1777] Computing the Bayesian Evidence from a Markov Chain Monte Carlo Simulation of the Posterior Distribution (Martin Weinberg)
  • [0812.4933] Wiaux, Jacques (Compressed sensing, interferometry)
  • [0708.2184] Sung, Geyer
  • [0811.1705] Meyer
  • [0811.1700] Witten, Tibshirani
  • [0706.1703] Land, Slosar
  • [0712.1458] Loh, Zhu
  • [0808.4042] Commenges
  • [0806.3978] Vincent Vu, Bin Yu, Robert Kass
  • [0808.4032] Stigler
  • [0805.1944] astro-ph
  • [0807.1815] Cabella, Marinucci
  • [0808.0777] Buja, Kunsch
  • [0809.1024] Xu, Grunwald
  • [0807.4081] Roquain, Wiel
  • [0806.4105] Hofling, Tibshirani
  • [0808.0657] Hubert, Rousseeuw, Aelst
  • [0112467] (astro-ph) Petrosian
  • [0808.2902] Robert, Casella, A History of MCMC
  • [0809.2754] Grunwald, Vitanyi, Algorithmic Information Theory
  • [0809.4866] Carter, Raich, Hero, An information geometric framework for Dimensionality reduction
  • [0809.5032] Allman, Matias, Rhodes
  • [0811.0528] Owen
  • [0811.0757] Chamandy, Tayler, Gosselin
  • [0810.3985] Stute, Wang
  • [0804.2996] Stigler
  • [0807.4086] Commenges, Sayyareh, Letenneur…
  • [0710.5343] Peng, Paul, MLE, functional PC, sparse longitudinal data
  • [0709.1648] Cator, Jongbloed, et al. *Asymptotics: Particles, Processes, and Inverse problems*
  • [0710.3478] *Hall, Qiu, Nonparametric Est. of a PSF in Multivariate Problems*
  • [0804.3034] Catalan, Isern, Garcia-Berro, Ribas (some stellar clusters, LF, Mass F, weighted least squares)
  • [0801.1081] Hernandez, Valls-Gabaud, estimation of basic parameters, stellar populations
  • [0410072] (math.ST) Donoho, Jin, HC, detecting sparse heterogeneous mixtures
  • [0803.3863] Efron
  • [0706.4190] Rondonotti, Marron, Park, SiZer for time series
  • [0709.0709] Lian, Bayes and empirical Bayes changepoint problems
  • [0802.3916] Carvalho, Rocha, Hobson, PowellSnakes
  • [0709.0300] Roger, Ferrera, Lahav, et al, Decoding the spectra of SDSS early-type galaxies
  • [0810.4807] Pesquet, et al. SURE, Signal/Image Deconvolution
  • [0906.0346] (cs.DM) Semiparametric estimation of a noise model with quantization errors
  • [0207026] (hep-ex) Barlow, Systematic Errors: Facts and Fictions
  • [0705.4199] Leccardi, Molendi, unbiased temperature estimator for statistically poor X-ray spectra
  • [0709.1208] Kuin, Rosen, measurement error Swift
  • [0708.4316] Farchione, *Kabaila*, confidence intervals for the normal mean utilizing prior information
  • [0708.0976] Singh, Xie, Strawderman, confidence distribution
  • [0901.0721] Albrecht, et al. (dark energy)
  • [0908.3593] Singh, Scott, Nowak, adaptive hausdorff estimation of density level sets
  • [0702052] (astro-ph) de Wit, Auchere, Multispectral analysis, sun, EUV, morphology
  • [0706.1580] Lopes, photometric redshifts, SDSS
  • [0106038] (astro-ph) Richards et al photometric redshifts of quasars

[Book] Elements of Information Theory
http://hea-www.harvard.edu/AstroStat/slog/2009/book-elements-of-information-theory/
Wed, 11 Mar 2009 17:04:26 +0000, hlee

by T. Cover and J. Thomas
website: http://www.elementsofinformationtheory.com/

Once, perhaps more than once, I mentioned this book in my post on the most celebrated paper by Shannon (see the posting). I have also recommended the book in response to offline inquiries, and it has always been on the list of favorites I like to use for teaching. So I am not shy about recommending this book to astronomers for its modern, objective perspective and its practicality. Before offering more praise, I must say that these admiring words do not imply that I understand every line and problem in the book. Like many fields, information theory has grown fast since Shannon’s monumental debut paper (1948), at something like the speed of astronomers’ observational techniques. Without the contents of this book, most of which came after Shannon (1948), the internet, wireless communication, compression, and so on could not have been conceived. Since the notion of “entropy”, the core of information theory, is familiar to astronomers (physicists), the book will probably be received better, and read more easily, by them than by statisticians.

My reason for recommending this book is that, I personally think, some knowledge of information theory (data compression and channel capacity in particular) would help in coping with limited bandwidth in the era of massive, unprecedented astronomical survey projects with satellites or ground-based telescopes.
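
As a small illustration of the compression side of that claim (my own toy example; the 8-bit quantization and the Poisson “image” are assumptions, not anything from the book), the empirical Shannon entropy of a quantized signal gives the minimum average number of bits per sample achievable by any lossless code:

    # Empirical Shannon entropy of a quantized signal: a lower bound, in
    # bits per sample, on any lossless compression of that signal.
    import numpy as np

    rng = np.random.default_rng(3)
    pixels = rng.poisson(12, size=100_000).clip(0, 255).astype(np.uint8)  # toy 8-bit "image"

    counts = np.bincount(pixels, minlength=256)
    p = counts[counts > 0] / counts.sum()
    H = -(p * np.log2(p)).sum()
    print(f"entropy ~ {H:.2f} bits/sample, versus 8 bits/sample raw")

A telemetry stream whose entropy is well below its raw bit depth is exactly the situation where the coding chapters of the book pay off.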

The content can be viewed from the standpoint of applied probability; the basics of probability theory, including distributions and uncertainties, therefore become familiar to astronomers more readily than by wading through probability textbooks.

Many of my [MADS] series are motivated by the content of this book, from which I learned many practical data processing ideas and objectives (data compression, data transmission, network information theory, ergodic theory, hypothesis testing, statistical mechanics, quantum mechanics, inference, probability theory, lossless coding/decoding, convex optimization, etc.), although those [MADS] postings are not visible on the slog yet (I hope I can get through most of them within several months; otherwise, someone should continue my [MADS] and keep introducing modern statistics to astronomers). The ideas most commonly practiced in engineering could help accelerate data processing procedures in astronomy and make astronomical inference more efficient and consistent, goals which have been neglected because of many other demands. Here I would rather defer discussing details of particular topics from the book and describing how astronomers have applied them (there are quite a few statistical jewels hidden in ADS that have not been well explored). Through [MADS] I will discuss further how information theory could help in processing astronomical data, from data collecting, pipelining, storing, extracting, and exploring to summarizing, modeling, estimating, inference, and prediction. Instead of discussing topics of the book here, I’d like to quote some interesting statements from its introductory chapter, to offer a taste of its flavor and to tempt you to read it.

… it [information theory] has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam’s Razor: The simplest explanation is best), and to probability and statistics (error exponents for optimal hypothesis testing and estimation).

… information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory), and computer science (algorithmic complexity).

There is a pleasing complementary relationship between algorithmic complexity and computational complexity. One can think about computational complexity (time complexity) and Kolmogorov complexity (program length or descriptive complexity) as two axes corresponding to program running time and program length. Kolmogorov complexity focuses on minimizing along the second axis, and computational complexity focuses on minimizing along the first axis. Little work has been done on the simultaneous minimization of the two.

The concept of entropy in information theory is related to the concept of entropy in statistical mechanics.

In addition to the book’s website, googling the title will turn up tons of links, spanning from gambling and portfolio building to computational complexity, with statistics, probability, statistical mechanics, communication theory, data compression, etc. in between (the order does not imply the relevance or importance of the subjects). This breadth is discussed in the introductory chapter. If you have the book in hand, regardless of edition, you might first want to check Fig. 1.1, “Relationship of information theory to other fields,” a diagram explaining the connections and similarities among these subjects.

Data analysis tools, methods, algorithms, and theories, including statistics (both exploratory data analysis and inference), should follow the direction of retrieving meaningful information from observations. Sometimes I feel that this priority is lost, like a ship without a captain, with statistics or information science treated as a black box without any interest in knowing what is inside.

I don’t know how many astronomy departments offer classes on data analysis, data mining, information theory, machine learning, or statistics for graduate students. I saw none at my alma mater, although it has recently offered the famous summer school. The closest class I had was computational physics, focused on how to solve differential equations (stochastic differential equations were not included) and on optimization (I learned game theory there, unexpectedly; overall, I am still fond of what I learned in that class). I haven’t seen astronomy graduate students in statistics classes, nor in EE/CS classes related to signal processing, information theory, and data mining (some departments offer statistics classes for their own students, like the course on experimental design for students of agricultural science). Not enough educational effort for the new information era and the big survey projects is what I feel is missing in astronomy. Yet I’m very happy to see some apprenticeships coping with these new patterns in astronomical science. I only hope it grows beyond a few small guilds, and I wish they had more resources to make their work efficient as time goes on.
