The AstroStat Slog » Search Results » google+sky http://hea-www.harvard.edu/AstroStat/slog Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders Fri, 09 Sep 2011 17:05:33 +0000

News and related stories http://hea-www.harvard.edu/AstroStat/slog/2009/news-and-related-stories/ Mon, 27 Jul 2009 11:09:18 +0000 hlee

I’m falling behind these days because I’m chasing too many rabbits. One of those rabbits is the hunt for online lectures useful to everyone. Prof. Feynman’s lectures have a great reputation, but they have been hard to come by. I once listened to a pirated copy of one of his lecture tapes with horrible sound quality. Thanks to Bill Gates and Microsoft Research, and although this is belated news, I’m delighted to say “Feynman lectures are online.”

I once described how iconic Prof. Richard Feynman is (see Feynman and Statistics). At last, these lectures are publicly viewable through Project Tuva. Not knowing what Project Tuva is, I naturally checked Wikipedia, where I found that it is related to the WorldWide Telescope by Microsoft Research, which runs on Silverlight. The Virtual Observatory is one of the most sought-after projects in astronomy. Several postings related to Google Sky are available here, but not much about the WorldWide Telescope. I attribute the lack of discussion in the slog to its late debut. Also, the fact that renowned astronomers are working on site for the WorldWide Telescope has pressured me. Please visit worldwidetelescope.org.

accessing data, easier than before but… http://hea-www.harvard.edu/AstroStat/slog/2009/accessing-data/ Tue, 20 Jan 2009 17:59:56 +0000 hlee

Someone emailed me asking for the globular cluster data sets I used in a proceedings paper on how to determine multi-modality (multiple populations) with well-known and new information criteria, without binning the luminosity functions. I spent quite some time understanding the data sets with suspicious numbers of globular cluster populations. On the other hand, obtaining the data sets was easy thanks to data archives such as VizieR. Most data sets that appear in charts or tables I acquire from VizieR. To understand the science behind those data sets, I check ADS. Well, actually it happens the other way around: I check the scientific background first to assess whether there is room for statistics, then search for available data sets.

However, if you are interested in massive multivariate data, or if you want a subsample from a gigantic survey project that cannot be documented in full the way those individual small catalogs are, you might like to learn a little Structured Query Language (SQL). Some terabytes of data are available from SDSS, with nice examples and explanations. Instead of images in FITS format, one can get ASCII/table data sets (the variables for millions of objects include magnitudes and their errors; positions and their errors; classes like stars, galaxies, AGNs; types or subclasses like elliptical galaxies, spiral galaxies, type I AGN, type Ia, Ib, Ic, and II SNe, various spectral types, etc.; estimated variables like photo-z, which is my keen interest; and more). Furthermore, thousands of papers related to SDSS are available to satisfy your scientific cravings. (Here are Slog postings under the SDSS tag.)
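To give a flavor of what such a query looks like, here is a minimal sketch that only builds the SQL text and does not contact any server. The table and column names (`PhotoObj`, `objID`, `ra`, `dec`, `u` through `z`, `err_r`, `type`) and the convention that `type = 3` marks galaxies are assumptions based on the public SDSS SkyServer schema and should be checked against the schema browser before use. The helper name `build_sdss_query` and the magnitude cut are hypothetical, chosen only for illustration.

```python
def build_sdss_query(n_rows=1000, r_max=17.5, obj_type=3):
    """Return SQL text that selects a small photometric subsample.

    Assumptions (not from the post): the PhotoObj table and its column
    names as in the SDSS SkyServer schema, where type 3 denotes galaxies.
    """
    if n_rows <= 0:
        raise ValueError("n_rows must be positive")
    # Position, five-band magnitudes, and the r-band magnitude error.
    columns = ["objID", "ra", "dec", "u", "g", "r", "i", "z", "err_r"]
    lines = [
        f"SELECT TOP {int(n_rows)} {', '.join(columns)}",
        "FROM PhotoObj",
        f"WHERE type = {int(obj_type)} AND r < {float(r_max)}",
    ]
    return "\n".join(lines)


query = build_sdss_query(n_rows=10)
print(query)
```

The printed text can be pasted into whichever SQL search form the archive offers; the point is only that a subsample is defined by a column list, a table, and a few cuts.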

If you don’t want to limit yourself to ASCII tables, you may like to check the quick guide/tutorial of Gator, which aggregates archives of various missions: 2MASS (Two Micron All-Sky Survey), IRAS (Infrared Astronomical Satellite), Spitzer Space Telescope Legacy Science Programs, MSX (Midcourse Space Experiment), COSMOS (Cosmic Evolution Survey), DENIS (Deep Near Infrared Survey of the Southern Sky), and USNO-B (United States Naval Observatory B1 Catalog). You probably also want to check NED, the NASA/IPAC Extragalactic Database. As of today, the website says that 163 million objects, 170 million multiwavelength object cross-IDs, 188 thousand associations (candidate cross-IDs), 1.4 million redshifts, and 1.7 billion photometric measurements are accessible, which seems more than enough for data mining, exploring/summarizing data, and developing streaming/massive data analysis tools.

Astronomers might wonder why I’m not advertising the Chandra Data Archive (CDA) and its project-oriented catalogs/databases. All I can say is that it is not friendly to an independent statistician. It is very likely that I am the only statistician who has tried to use data from the CDA directly and bothered to understand the contents. I can assure you that without astronomers’ help, the archive is just a hot potato. You don’t want to touch it. I’ve been there. Regardless of how painful it is, I’ve kept trying to touch it, since it’s hard to resist after knowing what’s in there. Fortunately, there are other archives, friendlier to data scientists, that cause much less suffering than the CDA. There are plenty of things statisticians can do to improve astronomers’ decades-old data analysis algorithms based on the Gaussian distribution, the iid assumption, or the L2 norm, and to build robust analysis strategies that reflect the true nature of the data under more relaxed assumptions than the traditionally pursued parametric distributions with specific models (a distribution-free method is more robust than one based on the Gaussian distribution, but the latter is more efficient when the Gaussian assumption holds), not just with the CDA but with other astronomical data archives. The latter, like VizieR or SDSS, provide data sets that are less painful to explore without familiarity with astronomical software packages.

Computer scientists are well aware of the UCI machine learning archive, with which they can compare their new methods against previous ones and empirically show how superior their methods are. Statisticians are used to handling well-trimmed data; otherwise we suggest strategies for collecting data for statistical inference. Although tons of data collection and sampling protocols exist, most of them do not match the formats, types, and nature of astronomical data, or the way data are collected by observing the sky through complexly structured instruments. Some archives might be exclusive to funded researchers and their beneficiaries. Some archives might be super hot potatoes with which no statistician wants to get involved, even though they are free of charge. Overall, I’d like to warn you not to expect the well-tabulated simplicity of the textbook data sets found in exploratory data analysis and machine learning books.

Someone will raise another question: why do I not discuss VOs (virtual observatories, click for slog postings) and Google Sky (click for slog postings), which I have praised in the slog many times as good resources for exploring the sky and learning astronomy? Unfortunately, for direct statistical applications, neither VOs nor Google Sky may be as appealing as their names suggest. It is very likely that you will spend hours exploring these facilities and end up with one of the archives or web interfaces that I mentioned above. It would be easier to talk to your nearest astronomer, who hopefully is aware of the importance of statistics and could offer you a statistically challenging data set, so that you need not worry about how to process and clean raw data sets or how to build statistically suitable catalogs/databases. Every astronomer on a survey project builds his/her catalog and finds common factors/summary statistics of the catalog with the aim of understanding/summarizing data, the primary goal of executing statistical analyses.

I believe some astronomers want to advertise their archives and show off how public-friendly they are. Such advertising comments are very welcome, because I intentionally left room for them instead of listing more archives that I have heard of but have no hands-on experience with. My only wish is that more statisticians use astronomical data from these archives, so that the application sections of their papers are filled with such data. As with sunspots, I wish that more astronomical data sets could be used to validate methodologies, algorithms, and eventually theories. I sincerely wish that this happens soon, before I drift away from astrostatistics and can no longer preach about the benefits of astronomical data and their archives to make ends meet.

There is no single well-known data repository in astronomy like the UCI machine learning archive. Nevertheless, I can assure you that astronomical data and catalogs pose various statistical problems, many of which have never been formulated properly as statistical inference problems. There are so many statistical challenges residing in them. Not enough statisticians bother to look at these data because of the gigantic demand for statisticians from countless data-oriented scientific disciplines and the persistent shortage in supply.

Killer App http://hea-www.harvard.edu/AstroStat/slog/2008/killer-app/ Sun, 19 Oct 2008 03:27:19 +0000 vlk

The iPhone is an amazing device. I have heard that some people use it as a phone, too, but it really is an extraordinary portable computer. It is faster and more powerful than the Sparcstations I used as a grad student, and will fit into your pocket. And most importantly, you can fit an entire planetarium on it.

There are many good planetarium programs that you can access on laptops, but it is really not that much fun to lug them around on camping trips or even out on to the roof at night. But now, thanks to the iPhone (and the iPod Touch) there has been a great leap forward.

The iTunes AppStore now has a number of astronomy-themed apps, including apps that tell you the distance to the Moon correct to a meter. But the most impressive of the lot have to be the ones that produce skycharts and let you search for and find stars, constellations, and deep sky objects at any time, from anywhere. There are four such available now: Starmap, GoSkyWatch, iAstronomica, and iStellar.

I have only tried Starmap so far, and it is incredible. The developer says that there is a PRO version in the works, but this one is already plenty good for me.

It is quite well known that, unlike amateur astronomers, professional astronomers are quite ignorant of the night sky. Really, if someone turns us around to face North, we might figure out where Polaris is, but that’s it. Oh, and we can usually find the Moon. And in the daytime, we can point to where the Sun is, provided it is not cloudy, though it often is in New England. True story: I still haven’t set eyes on the star which formed the basis of my PhD thesis (α Trianguli Australis; in my defence, it is only visible from the southern hemisphere). But all that is in the past. Now I can rediscover my amateur roots; now I am feeling pretty confident that I can find anything, even dear old α TrA. All I need to do is cross the Equator and point with my tricorder.

The LRT is worthless for … http://hea-www.harvard.edu/AstroStat/slog/2008/the-lrt-is-worthless-for/ Fri, 25 Apr 2008 05:48:06 +0000 hlee

One of the speakers in the Google talk series illustrated model-based clustering and mentioned the likelihood ratio test (LRT) for determining the number of clusters. Since I’ve seen examples of ill-mannered LRT practice in astronomical journals, like testing two clusters vs. three or a higher number of components, I could not resist pointing out that the LRT appeared to be improperly used in his illustration. He replied that the citation regarding the LRT was different from his plot, and that the test was carried out for one component vs. two, which closely observes the regularity conditions. I was relieved not to find another example of the ill-used LRT.

There are various tests applicable according to the needs and conditions of the data and source models, but it seems that popular astronomical lexicons offer none of these on-demand tests except the LRT (since I started posting [ArXiv]s in the slog I have seen the score test once, and a few non-parametric rank-based tests over the years). I’m sure knowledgeable astronomers will soon point out that I jumped to a conclusion too quickly and bring up counter-examples. Until then, be advised that your LRTs, χ^2 tests, and F-tests ask for your statistical attention prior to their application to any statistical inference. These tests are not magic crystals producing the answers you are looking for. To encourage such care and attention, here’s a paper with a thought-provoking title that I found some years ago.

The LRT is worthless for testing a mixture when the set of parameters is large
J.-M. Azaïs, E. Gassiat, C. Mercadier (click here: I found it on the internet, but the link seems to go on and off and is sometimes unavailable.)

Here, quotes replace theorems and their proofs[1] :

  • We prove in this paper that the LRT is worthless from testing a distribution against a two components mixture when the set of parameters is large.
  • One knows that the traditional Chi-square theory of Wilks[16[2]] does not apply to derive the asymptotics of the LRT due to a lack of identifiability of the alternative under the null hypothesis.
  • …for unbounded sets of parameters, the LRT statistic tends to infinity in probability, as Hartigan[7[3]] first noted for normal mixtures.
  • …the LRT cannot distinguish the null hypothesis (single gaussian) from any contiguous alternative (gaussian mixtures). In other words, the LRT is worthless[4].

For astronomers, a large set of parameters is usually of no concern because of theoretical constraints from physics: experience and theory keep the parameter set small. Sometimes, however, the distinction between small and large sets can be vague.

The characteristics of the LRT are well established under the compactness assumption on the parameter set (either compact or bounded), but trouble arises when the limit approaches the boundary. As cited before in the slog a few times, for more rigorously presented ideas about the LRT from astronomy, readers are recommended to read Protassov et al. (2002), Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test, ApJ, 571, p. 545.
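The concern can be explored with a small simulation: draw samples from a single Gaussian (so the null is true), fit a two-Gaussian mixture by EM, and look at how often 2Δ log L exceeds the cutoff one would naively take from a χ² table. The sketch below is an illustration under assumptions of my own choosing, not the construction in Azaïs et al.: the EM starting values, the floor of 0.1 on each σ (which keeps the parameter set bounded), the sample sizes, and the cutoff 7.815 (the 95% point of χ² with 3 degrees of freedom, the naive count of extra parameters) are all illustrative choices.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of a Gaussian N(mu, sigma^2) at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2


def single_gaussian_loglik(data, sigma_floor=0.1):
    """Maximized log-likelihood under the null: one Gaussian."""
    n = len(data)
    mu = sum(data) / n
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in data) / n), sigma_floor)
    return sum(normal_logpdf(x, mu, sigma) for x in data)


def mixture_loglik(data, w, mu, sigma):
    """Mixture log-likelihood and each point's responsibility for component 0."""
    total, resp = 0.0, []
    for x in data:
        a = math.log(w[0]) + normal_logpdf(x, mu[0], sigma[0])
        b = math.log(w[1]) + normal_logpdf(x, mu[1], sigma[1])
        top = max(a, b)
        log_sum = top + math.log(math.exp(a - top) + math.exp(b - top))
        total += log_sum
        resp.append(math.exp(a - log_sum))
    return total, resp


def two_gaussian_loglik(data, n_iter=50, sigma_floor=0.1):
    """Approximate maximized log-likelihood of a two-Gaussian mixture (EM)."""
    xs = sorted(data)
    n, half = len(xs), len(xs) // 2
    mean = sum(xs) / n
    spread = max(math.sqrt(sum((x - mean) ** 2 for x in xs) / n), sigma_floor)
    # Start from the lower and upper halves of the sorted sample.
    w = [0.5, 0.5]
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    sigma = [spread, spread]
    best, resp = mixture_loglik(xs, w, mu, sigma)
    for _ in range(n_iter):
        n0 = sum(resp)
        n1 = n - n0
        if min(n0, n1) < 1e-6:  # one component has emptied out
            break
        mu = [sum(r * x for r, x in zip(resp, xs)) / n0,
              sum((1.0 - r) * x for r, x in zip(resp, xs)) / n1]
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigma = [max(math.sqrt(var0), sigma_floor), max(math.sqrt(var1), sigma_floor)]
        w = [n0 / n, n1 / n]
        loglik, resp = mixture_loglik(xs, w, mu, sigma)
        best = max(best, loglik)
    return best


def lrt_statistic(data):
    """2 * (log L_mixture - log L_single), clipped at zero.

    The single Gaussian is a special case of the mixture, so a negative
    gain can only mean that EM stopped at a poor local optimum.
    """
    gain = two_gaussian_loglik(data) - single_gaussian_loglik(data)
    return 2.0 * max(0.0, gain)


rng = random.Random(0)
stats = [lrt_statistic([rng.gauss(0.0, 1.0) for _ in range(100)]) for _ in range(50)]
threshold = 7.815  # 95% point of chi-square with 3 degrees of freedom
fraction = sum(s > threshold for s in stats) / len(stats)
print(f"fraction of null samples with LRT > {threshold}: {fraction:.2f}")
```

If Wilks’ theorem applied, the printed fraction would sit near 0.05. Changing `sigma_floor`, or letting the component means roam over a wider range, is one way to see how much the null distribution depends on how the parameter set is bounded.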

  1. Readers might want to look for the mathematical statements and proofs in the paper.
  2. S. S. Wilks, The large sample distribution of the likelihood ratio for testing composite hypotheses, Ann. Math. Stat., 9:60-62, 1938
  3. J. A. Hartigan, A failure of likelihood asymptotics for normal mixtures, in Proc. Berkeley Conf., Vol. II, pp. 807-810
  4. Comment to Theorem 1. They proved the lack of worth in the LRT under more general settings; see Theorem 2.
Google Sky http://hea-www.harvard.edu/AstroStat/slog/2008/google-sky/ Tue, 08 Apr 2008 20:32:59 +0000 vlk

For people in the Boston area, a cornucopia of talks on Google Sky in the near future.

  1. Hunting for Needles in Massive Astronomical Data Streams
    Wednesday, April 9, 2008 at 4pm
    Room 330, 60 Oxford St.
    Ryan Scranton, Google Sky Team
  2. Inside Google Sky
    Wednesday, April 9, 2008 at 8pm
    Room 105, Emerson Hall
    Andrew Connolly, Google Sky Team
  3. Sky in Google Earth
    Tuesday, April 15, 2008 at 1pm
    Phillips Auditorium, 60 Garden
    Alberto Conti & Carol Christian, STScI
Beyond Google Sky http://hea-www.harvard.edu/AstroStat/slog/2007/beyond-google-sky/ Sat, 08 Sep 2007 18:31:42 +0000 vlk

Google Sky is good for a quick look “what’s that you just saw over there?”, but not for anything more than that. Not yet anyway. Mind you, I think it is a good thing. It is easy to use, and definitely worth a look as an astronomy popularization tool. But there are a number of astro visualization programs that can (so to speak) beat the pants off Google Sky with one hand tied behind the back. Check these out (all open source):

XEphem : http://www.clearskyinstitute.com/xephem/
Celestia : http://www.shatters.net/celestia/
Stellarium : http://www.stellarium.org/

There are many more, of varying degrees of usefulness, user friendliness, and price. Your mileage will vary. But for sheer wow factor, hard to beat Celestia.

[Update 10/01]: The e-Astronomer considers how Google Sky could become more useful. Some interesting tie-ins to Virtual Observatory concepts.

[ArXiv] Google Sky, Sept. 05, 2007 http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-google-sky-sept-05-2007/ Fri, 07 Sep 2007 05:46:08 +0000 hlee

Ah.. Sky in Google Earth made an arXiv appearance [arxiv/astro-ph:0709.0752]: Sky in Google Earth: The Next Frontier in Astronomical Data Discovery and Visualization, by R. Scranton et al.

Photometric Redshifts http://hea-www.harvard.edu/AstroStat/slog/2007/photometric-redshifts/ Wed, 25 Jul 2007 06:28:40 +0000 hlee

Since I began subscribing to arxiv/astro-ph abstracts, one of the most frequent topics from an astrostatistical point of view has been photometric redshifts. The topic has grown popular as catalogs of photometric observations of remote objects multiply in volume and multi-band sky survey projects lead to virtual observatories (VO; I will discuss them in a later posting). Just searching for photometric redshifts in Google Scholar and arxiv.org returns more than 2000 articles since 2000.

The redshift is one of the key astronomical measurements, used to identify the type of an object as well as to provide its distance. Typically, measuring redshifts requires spectral data, which are quite expensive in many respects compared to photometric data. Let me explain a little about what spectral data and photometric data are, for the benefit of non-astronomers.

Collecting photometric data starts with taking pictures through different filters. Through blue, yellow, and red optical filters, or infrared, ultraviolet, and X-ray filters, objects look different (or have different light intensities), and various astronomical objects can be identified by investigating pictures from many filter combinations. Collecting spectral data, on the other hand, starts with dispersing light through a specially designed prism. Because of this dispersion, it takes longer to collect light from an object, and fewer objects are recorded on a picture plate than with photometric data. A nice feature of these expensive spectral data is that they reveal the physical condition of the object directly: first, the distance, from the relative shifts of spectral lines; second, the abundance (the metallic composition of the object), temperature, and type of the object, also from spectral lines. Therefore, using photometric data to infer quantities normally available only from spectral data is a very attractive topic in astronomy.
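For non-astronomers, the “relative shift of spectral lines” is just z = (λ_observed − λ_rest) / λ_rest. A minimal sketch follows; the rest wavelength of the Hα line (6562.8 Å in air) is a standard laboratory value, while the observed wavelength and the function name `redshift` are made up for illustration.

```python
H_ALPHA_REST = 6562.8  # rest wavelength of the H-alpha line in angstroms (air)


def redshift(observed, rest):
    """Relative spectral line shift: z = (observed - rest) / rest."""
    if rest <= 0:
        raise ValueError("rest wavelength must be positive")
    return (observed - rest) / rest


# Hypothetical measurement: H-alpha observed at 7219.08 angstroms.
z = redshift(7219.08, H_ALPHA_REST)
print(f"z = {z:.3f}")
```

A photometric redshift method has to recover this number without ever resolving the line, using only the brightness of the object through a handful of filters.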

However, there are many challenges. The massive volume of data and sampling biases*, like the Malmquist bias (wiki) and the Lutz-Kelker bias, hinder traditional regression techniques, so numerous statistical and machine learning methods have been introduced to make the most of these photometric data and infer distances economically and quickly.

*For a reference regarding these biases and astronomical distances, please check Distance Estimation in Cosmology by Hendry, M. A. and Simmons, J. F. L., Vistas in Astronomy, vol. 39, issue 3, pp. 297-314.
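The Malmquist bias mentioned above is easy to reproduce with a toy simulation: scatter objects uniformly in volume, give each a Gaussian absolute magnitude, and keep only those brighter than a survey limit. Everything numeric below (mean absolute magnitude −20, scatter 0.5 mag, maximum distance 500 Mpc, apparent magnitude limit 17) is an arbitrary illustrative choice, and the distance modulus ignores cosmological corrections.

```python
import math
import random


def malmquist_demo(n=20000, mean_abs_mag=-20.0, scatter=0.5,
                   d_max_mpc=500.0, mag_limit=17.0, seed=42):
    """Return (mean M of all objects, mean M of detected objects, number detected)."""
    rng = random.Random(seed)
    all_mags, detected = [], []
    for _ in range(n):
        # Uniform in volume: the distance CDF is (d / d_max)^3.
        u = 1.0 - rng.random()  # in (0, 1], avoids log10(0) below
        d_mpc = d_max_mpc * u ** (1.0 / 3.0)
        abs_mag = rng.gauss(mean_abs_mag, scatter)
        # Distance modulus with d in Mpc: m - M = 5 log10(d) + 25.
        app_mag = abs_mag + 5.0 * math.log10(d_mpc) + 25.0
        all_mags.append(abs_mag)
        if app_mag < mag_limit:  # brighter than the survey limit
            detected.append(abs_mag)
    mean_all = sum(all_mags) / len(all_mags)
    mean_detected = sum(detected) / len(detected)
    return mean_all, mean_detected, len(detected)


mean_all, mean_detected, n_detected = malmquist_demo()
print(f"all objects:      <M> = {mean_all:.2f}")
print(f"detected ({n_detected}): <M> = {mean_detected:.2f}")
```

The detected sample comes out brighter on average (a more negative mean M) than the parent population, because intrinsically bright objects are visible over a larger volume. A regression trained on the detected sample alone inherits that shift.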
