A tremendous amount of information is contained within the temporal variations of various measurable quantities, such as the energy distributions of the incident photons, the overall intensity of the source, and the spatial coherence of the variations. While the detection and interpretation of periodic variations is well studied, the same cannot be said for non-periodic behavior in a multi-dimensional domain. Methods to deal with such problems are still primitive, and any attempts at sophisticated analyses are carried out on a case-by-case basis. Some of the issues we seek to focus on are:

* Stochastic variability

* Chaotic Quasi-periodic variability

* Irregular data gaps/unevenly sampled data

* Multi-dimensional analysis

* Transient classification

Our goal is to present some basic questions that require sophisticated temporal analysis in order for progress to be made. We plan to bring together astronomers and statisticians working in many different subfields so that an exchange of ideas can occur and motivate the development of sophisticated, generally applicable algorithms for astronomical time series data. We will review the problems and issues with current methodology from an algorithmic and statistical perspective and then look for improvements or for new methods and techniques.
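For readers who want to experiment with the unevenly sampled case, here is a minimal numpy sketch of the classic Lomb-Scargle periodogram; the simulated signal, noise level, and frequency grid are made up for illustration:

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic Lomb-Scargle periodogram for unevenly sampled data.

    t, y  : sample times and values
    freqs : trial frequencies in cycles per unit time
    """
    y = y - y.mean()
    power = np.empty_like(freqs)
    for k, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        # time offset tau makes the two quadrature terms independent
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        power[k] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Uneven sampling: 200 random times over 50 days, signal at 0.2 cycles/day
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 50, 200))
y = np.sin(2 * np.pi * 0.2 * t) + 0.3 * rng.standard_normal(200)
freqs = np.linspace(0.01, 1.0, 2000)
best = freqs[np.argmax(lomb_scargle(t, y, freqs))]
print(f"recovered frequency: {best:.3f}")  # close to the true 0.2
```

Unlike the FFT-based periodogram, nothing here requires the times to fall on a regular grid, which is exactly what makes it the standard first tool for irregularly sampled light curves.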

Phillips Auditorium, CfA,

60 Garden St., Cambridge, MA 02138

URL: http://hea-www.harvard.edu/AstroStat/CAS2010

The California-Boston-Smithsonian Astrostatistics Collaboration plans to host a mini-workshop on Computational Astro-statistics. With the advent of new missions like the Solar Dynamics Observatory (SDO), the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), and the Large Synoptic Survey Telescope (LSST), astronomical data collection is fast outpacing our capacity to analyze it. Astrostatistical effort has generally focused on principled analysis of individual observations, on one or a few sources at a time. But the new era of data-intensive observational astronomy forces us to consider combining multiple datasets and inferring parameters that are common to entire populations. Many astronomers want to use every data point, and even non-detections, but this becomes problematic for many statistical techniques.

The goal of the Workshop is to explore new problems in astronomical data analysis that arise from data complexity. Our focus is on problems that have generally been considered intractable due to insufficient computational power or inefficient algorithms, but are now becoming tractable. Examples of such problems include: accounting for uncertainties in instrument calibration; classification, regression, and density estimation for massive data sets that may be truncated and contaminated with measurement errors and outliers; and designing statistical emulators to efficiently approximate the output from complex astrophysical computer models and simulations, thus making statistical inference on them tractable. We aim to present some issues to the statisticians and clarify difficulties with currently used methodologies, e.g. MCMC methods. The Workshop will consist of review talks on current statistical methods by statisticians, descriptions of data analysis issues by astronomers, and open discussions between astronomers and statisticians. We hope to define a path for the development of new algorithms that target specific issues, designed to help with applications to SDO, Pan-STARRS, LSST, and other survey data.
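Since MCMC is named above as a currently used methodology, here is a minimal random-walk Metropolis sketch on a toy Gaussian-mean posterior; the data, prior, and tuning constants are invented for illustration, and real survey-scale applications are of course far more demanding:

```python
import numpy as np

def metropolis(log_post, x0, n_steps, step=0.5, seed=0):
    """Random-walk Metropolis sampler (minimal sketch).

    log_post : function returning the log posterior density
    x0       : starting point
    """
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, lp = x0, log_post(x0)
    for i in range(n_steps):
        prop = x + step * rng.standard_normal()
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        chain[i] = x
    return chain

# Toy posterior: mean of a Gaussian with known sigma=1, 50 data points
rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, 50)
log_post = lambda mu: -0.5 * np.sum((data - mu) ** 2)  # flat prior
chain = metropolis(log_post, 0.0, 20000)[5000:]        # drop burn-in
print(f"posterior mean ~ {chain.mean():.2f}")          # near the data mean
```

The difficulty in the survey setting is not this core loop but the cost of each posterior evaluation and the dimension of the parameter space, which is why emulators and better algorithms are on the agenda.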

We hope you will be able to attend the workshop and present a brief talk on the scope of the data analysis problem that you confront in your project. The workshop will have presentations in the morning sessions, followed by a discussion session in the afternoons of both days.

As the post title [MADS] indicates, no abstract has the keyword **multiscale modeling.** It seems that only the jargon is missing from ADS, since “multiscale modeling” is certainly practiced in astronomy; one example is our group’s work. The CHASC scientists take Bayesian modeling approaches, which makes them unique, to my knowledge, in the astronomical community. I had expected constructing an intensity map through statistical inference (estimation), or “multiscale modeling,” to have become popular among astronomers in recent years. Yet none came up in my abstract keyword search.

Wikipedia also gives a very brief description of **multiscale modeling** and emphasizes that it is a fairly new interdisciplinary topic (wiki:multiscale_modeling). Tom Loredo kindly pointed me to some relevant references from ADS after my post [MADS] HMM. He mentioned that his search words were **Markov Random Fields**, which can be found in __stochastic geometry__, and listed:

- Quantifying Doubt and Confidence in Image “Deconvolution,” by Connors, A.; van Dyk, D.; Chiang, J.; CHASC
- Blind Bayesian restoration of adaptive optics telescope images using generalized Gaussian Markov random field models, by Jeffs, B. D.; Christou, J. C.
- Segmenting Chromospheric Images with Markov Random Fields (paper in SCMA II), by Turmon, M. J.; Pap, J. M.
- Bayesian deconvolution methods in astronomy, by Molina, R.; Katsaggelos, A. K.; Mateos, J.
- Compound Gauss-Markov random fields for astronomical image restoration, by Molina, R.; Katsaggelos, A. K.; Mateos, J.; Abad, J.
- Markov random field applications in image analysis, by Jain, A. K.; Nadabar, S. G. (I bet this “Jain” is the author of many celebrated papers in image processing and machine learning; I often find that well-known computer scientists are involved in astronomical research.)

The reason I was not able to find these papers is that they are not published in the four major astronomical journals plus Solar Physics. The reason for this limited search is that otherwise I was overwhelmed by the flood of search results, including arXiv. (I wonder if there is a way to do exclusive searches in ADS by excluding arxiv:comp, arxiv:phys, arxiv:math, etc.) Thank you, Tom, for providing these references.

Please check out the CHASC website for more results related to “multiscale modeling” from our group.

**[Added]** Nice tutorials related to Markov Random Fields (MRFs), recommended by an expert in the field and a friend (all are pdfs).
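For a hands-on feel of what an MRF prior buys you in image analysis, here is a minimal sketch of binary image restoration with an Ising MRF prior, optimized by iterated conditional modes (ICM); the toy image, coupling strength, and noise level are all made up for illustration:

```python
import numpy as np

def icm_denoise(noisy, beta=1.5, n_iters=5):
    """Binary image denoising with an Ising Markov random field prior,
    optimized by iterated conditional modes (ICM).

    noisy : 2-D array of +/-1 pixel values
    beta  : coupling strength of the smoothness prior
    """
    x = noisy.copy()
    h, w = x.shape
    for _ in range(n_iters):
        for i in range(h):
            for j in range(w):
                # sum over the 4-neighborhood (the Ising prior term)
                nb = sum(x[a, b] for a, b in
                         ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= a < h and 0 <= b < w)
                # local score: data agreement + smoothness; pick the
                # label (+1 or -1) that maximizes it
                x[i, j] = 1 if noisy[i, j] + beta * nb > 0 else -1
    return x

# Toy example: a half-dark, half-bright image with 10% flipped pixels
rng = np.random.default_rng(0)
clean = np.where(np.arange(32)[:, None] < 16, 1, -1) * np.ones((32, 32), int)
flips = rng.uniform(size=clean.shape) < 0.1
noisy = np.where(flips, -clean, clean)
restored = icm_denoise(noisy)
print("error rate:", (restored != clean).mean())  # well below the 10% noise
```

ICM is only a greedy mode-finder; the Bayesian papers listed above use far richer models and inference, but the neighborhood structure of the prior is the same idea.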

When it comes to a **survey,** the first thing that comes to my mind is the census packet. I have only seen it once (an easy way to disguise my age, but it is true), yet its questionnaire layout is so carefully and extensively done that it left a strong impression on me. Such a survey is designed prior to collecting data, so that after collection the data can be analyzed with statistical methodology suited to the design of the survey. Strategies for quantifying responses (yes/no as 0/1, responses on a 0 to 10 scale, bracketing salaries and age groups, handling missing data, and such) are also built in, so that an elaborate statistical analysis can avoid subjective data transformations and arbitrary outlier eliminations.

In contrast, a survey in astronomy means designing a mesh, not a questionnaire, and it cannot be transcribed into statistical models. This mesh has multiple layers, such as the telescope, the detector, and the source detection algorithm, and eventually produces a catalog. Designing statistical methodology that draws interpretable conclusions is not part of it; collecting whatever passes through that mesh is the astronomical survey. Analyzing the catalog does not necessarily involve sophisticated statistics, but oftentimes adopts chi-square fittings and casts away unpleasant/uninteresting data points.

As with other conflicts in jargon (the simplest example is H_{o}: I used to know it as the **Hubble constant**, but now I recognize it first as the notation for a **null hypothesis**), **survey** has been one of them. As with measurement error, some clarification of the term **survey** from knowledgeable astrostatisticians is needed to draw more statisticians into the grand survey projects soon to come. Luckily, the first opportunity comes soon at the Special Session: Meaning from Surveys and Population Studies: BYOQ, during the 213th AAS meeting at Long Beach, California, on Jan. 5th, 2009.

There are various tests applicable according to the needs and conditions of the data and source models, but it seems no popular astronomical lexicon carries these on-demand tests except the LRT. (I have seen the score test once since posting [ArXiv]s on the slog, and a few nonparametric rank-based tests over the years.) I am sure knowledgeable astronomers will soon point out that I jumped to a conclusion too quickly and will bring up counter-examples. Until then, be advised that your LRTs, χ^2 tests, and F-tests ask for your statistical attention prior to their application to any statistical inference. These tests are not magic crystals that produce the answers you are looking for. To encourage such care and attention, here is a paper with a thought-provoking title that I found some years ago.

The LRT is worthless for testing a mixture when the set of parameters is large

by J.-M. Azaïs, E. Gassiat, and C. Mercadier (click here: I found it on the internet, but the link seems to go on and off and is sometimes unavailable.)

Here, quotes replace theorems and their proofs^{[1]}:

- We prove in this paper that the LRT is worthless for testing a distribution against a two components mixture when the set of parameters is large.
- One knows that the traditional Chi-square theory of Wilks [16]^{[2]} does not apply to derive the asymptotics of the LRT due to a lack of __identifiability__ of the alternative under the null hypothesis.
- …for unbounded sets of parameters, the LRT statistic tends to infinity in probability, as Hartigan [7]^{[3]} first noted for normal mixtures.
- …the LRT cannot distinguish the null hypothesis (single gaussian) from any contiguous alternative (gaussian mixtures). In other words, the LRT is worthless^{[4]}.

For astronomers, large sets of parameters are of little concern thanks to theoretical constraints from physics; experience and theory keep the parameter set small. Sometimes, however, the distinction between small and large sets can be vague.

The behavior of the LRT is well established under a compactness assumption on the parameter set (compact or bounded), but trouble happens when the limit goes to the boundary. As cited a few times before on the slog, readers are recommended to read, for more rigorously presented ideas about the LRT in astronomy, Protassov, et al. (2002), Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test, ApJ, 571, p. 545.

- Readers might want to look for the mathematical statements and proofs in the paper.
- S. S. Wilks, The large sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Stat., 9:60-62, 1938.
- J. A. Hartigan, A failure of likelihood asymptotics for normal mixtures, in Proc. Berkeley Conf., Vol. II, pp. 807-810.
- Comment on Theorem 1: they proved the lack of worth of the LRT under more general settings; see Theorem 2.
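To see what the regular case covered by Wilks' theorem looks like, and hence what the mixture setting quoted above loses, here is a small simulation sketch; the sample size and number of replications are arbitrary. In this nested, identifiable, interior-point setting the LRT statistic really is chi-square, which is precisely the calibration that fails for mixtures and boundaries:

```python
import numpy as np

# Wilks' theorem in a regular case: testing H0: mu = 0 against a free
# mean mu in N(mu, 1).  The LRT statistic is -2 log Lambda = n * xbar^2,
# which should follow a chi-square(1) distribution under H0.
rng = np.random.default_rng(0)
n, n_sims = 50, 20000
xbar = rng.standard_normal((n_sims, n)).mean(axis=1)   # data under H0
lrt = n * xbar ** 2

# Compare the empirical 5% rejection rate to the chi-square(1) cutoff
chi2_95 = 3.8415                                       # chi2(1) 0.95 quantile
print(f"rejection rate at the 5% level: {(lrt > chi2_95).mean():.3f}")
```

Repeating the same experiment with a Gaussian-versus-mixture test would show the empirical distribution drifting away from chi-square(1), but that requires an EM fit at every replication and is beyond this sketch.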

[astro-ph:0712.2820]

The Production Rate and Employment of Ph.D. Astronomers, by T. S. Metcalfe

Related Comments:

- Many more jobs than I expected. Still, it cannot compete with the number of jobs in statistics.
- Three jobs, on average, before landing a stable one in astronomy. I do not know the figure in statistics.
- Astronomy Ph.D. students receive more care, in the sense that the job market is managed to guarantee a position for every student. In statistics, even without such care you can find something (not necessarily a research position).

Unrelated Comment on Correlation:

It’s a cultural difference. Maybe not. When I learned correlation years ago from a textbook, the procedure was: 1. compute the correlation and 2. do a t-test. In astronomical papers, it is: 1. do a regression and 2. plot the simple linear regression line with error bands and the data points. The computing procedure is the same, but the way the results are illustrated seems different.
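The two procedures are in fact numerically the same test, which can be checked directly; the toy data below are made up for illustration:

```python
import numpy as np

def corr_t_and_slope_t(x, y):
    """The t-statistic for Pearson's r equals the t-statistic for the
    slope in simple linear regression: the two cultures' procedures
    are the same significance test in different clothing."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t_corr = r * np.sqrt((n - 2) / (1 - r ** 2))

    # least-squares slope and its standard error
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    resid = y - (a + b * x)
    sxx = (n - 1) * np.var(x, ddof=1)
    se_b = np.sqrt(resid @ resid / (n - 2) / sxx)
    t_slope = b / se_b
    return t_corr, t_slope

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
y = 1.5 * x + rng.standard_normal(40)
t1, t2 = corr_t_and_slope_t(x, y)
print(f"t from correlation: {t1:.4f}, t from slope: {t2:.4f}")  # identical
```

So the cultural difference is genuinely about presentation (a test statistic versus a plot with error bands), not about the underlying inference.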

*I wonder what it would be like if we narrowed the job market to astrostatisticians.*

Not everyone agrees with this. I myself have tempered this biased opinion to some degree. However, I still think astronomers appear full of pride in the eyes of non-astronomers.

Having an education in both fields, while learning statistics and being away from astronomy for some years, I tried to talk to astronomers whenever I had the chance. For years I could not get rid of the impression that astronomers are arrogant, in the sense that they believe they can do statistics without the help of statisticians. When I was looking for a job as a statistician in astronomy, most astronomers said they only need technicians who are good at coding algorithms, including statistical ones. The implication behind this seemed to be that only astronomers do statistics and statistical data analysis with astronomical data.

However, as mentioned above, my opinion has changed to some degree after coming to the CfA and working with CHASC. The reasons for this change are:

- Assistants can serve arrogant chiefs (this is for a laugh).
- When you work with astronomers, you do not feel they are arrogant at all. Astronomers are very passionate about what they are doing and have very pure minds. The passion and objectivity of astronomers is different from that of statisticians, which put up a little barrier when I talked to astronomers, particularly when I was looking for a job.
- Most importantly, astronomers know their data best. The data are in general very expensive, and their peculiarities, partially discussed in the Quote of the Week, Aug. 31, 2007, hinder quick collaborations between the two communities. (Instead of explaining instruments and physics to statisticians for years and years, it could be quicker for astronomers to do the statistics themselves.)

Just as we separate probability, theoretical statistics, biostatistics, spatial statistics, bioinformatics, applied statistics, data mining, machine learning, etc., I hope astrostatistics can hold a position equivalent to the other subfields in the statistics community. I hope more astronomers explain astronomy to statisticians with patience. Eventually, the impression of proud astronomers will die, and many statistical methods for improved estimation and inference will be born.

[Addendum] A few senior statisticians, in a casual fashion, expressed that my interest in astrostatistics is reckless.

This paper presented an observational study of a globular cluster, NGC 6397, enhanced and more informative compared to previous observations in the sense that 1) a truncation in the white dwarf cooling sequence occurs at magnitude 28, 2) the cluster main sequence seems to terminate approximately at the hydrogen-burning limit predicted by two independent stellar evolution models, and 3) luminosity functions (LFs) and mass functions (MFs) are well defined. Nothing statistical here, but the ideas of defining color magnitude diagrams (CMDs) and LFs described in the paper, together with the improved measurements (ACS imaging) of stars in NGC 6397, will assist in developing suitable statistics for CMD and LF fitting problems.

Instead of adding details of the data properties and the calibration process, including the instrument characteristics, I would like to add a few things for statisticians. First, ACS stands for Advanced Camera for Surveys, and its information can be found at this link. Second, NGC is an abbreviation of New General Catalogue, one of astronomers’ cataloging systems (click for its wiki). Third, CMDs and LFs are the results of the data processing described in the paper, but they can be considered scatter plots and kernel density plots (histograms) to be analyzed for inferring physical parameters. This data processing, or calibration, requires multi-level transformations, which cause error propagation. Finally, the chi-square method is used to fit the LFs and MFs. Among numerous fitting methods, in astronomy only the chi-square is ubiquitously used (link to a discussion on the chi-square). **Could we develop more robust statistics for fitting astronomical (empirical) functions?**
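As an illustration of the ubiquitous chi-square practice questioned here, a sketch of fitting a power-law luminosity function to binned counts with the usual sqrt(N) errors; the slope, bins, and simulated sample are invented, and the point is that this is the common recipe, not a recommendation (Gaussian sqrt(N) errors are poor for sparse bins):

```python
import numpy as np

# Simulate luminosities from a power law N(L) ∝ L^(-alpha) on [1, 100]
# by inverse-CDF sampling, then fit alpha by minimizing chi-square.
rng = np.random.default_rng(3)
true_alpha = 1.8
u = rng.uniform(size=5000)
L = (1 - u * (1 - 100 ** (1 - true_alpha))) ** (1 / (1 - true_alpha))
counts, edges = np.histogram(L, bins=np.logspace(0, 2, 15))
centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers
widths = np.diff(edges)

def chisq(alpha):
    model = centers ** (-alpha) * widths    # expected counts per bin
    model *= counts.sum() / model.sum()     # fit the normalization
    good = counts > 0                       # avoid dividing by zero
    return np.sum((counts[good] - model[good]) ** 2 / counts[good])

alphas = np.linspace(1.0, 3.0, 401)
best = alphas[np.argmin([chisq(a) for a in alphas])]
print(f"best-fit alpha: {best:.2f}")  # near the true 1.8
```

A Poisson likelihood fit (Cash statistic) on the same binned counts would be the more principled alternative, especially once the faint-end bins run low on stars.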


David van Dyk (representing statistics culture):

Can’t you look at it again? Collect more data?

Vinay Kashyap (representing astronomy and physics culture):

…I can confidently answer this question: no, alas, we usually cannot look at it again!!

Ah. Hmm. To rephrase [the question]: if you have a “7.5 sigma” feature, with a day-long [imaging Markov Chain Monte Carlo] run you can only show that it is “>3 sigma”; but is it possible, even with that day-long run, to tell that the feature is really at 7.5 sigma — is that the question? Well, that would be nice, but I don’t understand how observing again will help.

No one believes any realistic test is properly calibrated that far into the tail. Using 5-sigma is really just a high bar, but the precise calibration will never be done. (This is a reason not to sweat the computation TOO much.)

Most other scientific areas set the bar lower (2 or 3 sigma) BUT don’t really believe the results unless they are replicated.

My assertion is that I find replicated results more convincing than extreme p-values. And the controversial part: Astronomers should aim for replication rather than worry about 5-sigma.

The authors applied MATCH (Dolphin, 2002^{[1]}; note that the year is corrected) to M13, M15, M92, NGC 2419, NGC 6229, and Pal 14 (well-known globular clusters), and to BooI, BooII, CVnI, CVnII, Com, Her, LeoIV, LeoT, Segu1, UMaI, UMaII, and Wil1 (newly discovered Milky Way satellites) from the Sloan Digital Sky Survey (SDSS), to fit the color magnitude diagrams (CMDs) of these stellar clusters and find the properties of the satellites.

A traditional CMD fitting begins with building synthetic CMDs: the completeness of SDSS Data Release 5, the Hess diagram (a bivariate histogram over a CMD), and the features in MATCH for CMD synthesis were taken into account. The synthetic CMDs of the well-known globular clusters were matched with the SDSS observations and compared to previous results to validate the modified MATCH on SDSS data sets. Afterwards, the method was applied to the newly discovered Milky Way satellites, and a discussion of the findings on these satellites was presented.

The paper provides plots that enhance the understanding of the age, metallicity, and other physical parameter distributions of the stellar clusters after they were fit with synthetic CMDs. The paper also describes steps and tricks (to a statistician, the process of simulating stars looks very technical, without a mathematical/probabilistic justification) for acquiring proper synthetic CMDs that match observations. The paper adopted the Padova database of stellar evolutionary tracks and isochrones (there are other databases beyond Padova).
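For statisticians, a Hess diagram is nothing more than a bivariate histogram over the color-magnitude plane. A sketch with invented photometry (the band names, ranges, and fake main-sequence slope are illustrative only):

```python
import numpy as np

# A Hess diagram: number density of stars in (color, magnitude) cells.
rng = np.random.default_rng(7)
n_stars = 10000
g = rng.uniform(14, 24, n_stars)                            # g-band magnitudes
g_r = 0.05 * (g - 14) + 0.1 * rng.standard_normal(n_stars)  # fake g-r sequence

hess, color_edges, mag_edges = np.histogram2d(
    g_r, g,
    bins=[np.linspace(-0.5, 1.5, 41), np.linspace(14, 24, 51)])

print("Hess diagram shape:", hess.shape)   # (color bins, magnitude bins)
print("stars binned:", int(hess.sum()))
```

CMD fitting then amounts to comparing this observed 2-D histogram, cell by cell, against synthetic ones generated from isochrones, which is where the choice of fit statistic discussed elsewhere on the slog enters.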

Lastly, I’d like to add a sentence from their paper, which supports my idea that a priori knowledge is necessary in choosing a proper isochrone database.

In the case of M15, this is due to the blue horizontal branch (BHB) stars that are not properly reproduced by the theoretical isochrones, causing the code to fit them as a younger turn-off.

- Numerical methods of star formation history measurement and applications to seven dwarf spheroidals, Dolphin (2002), MNRAS, 332, p. 91