The AstroStat Slog » Classification
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy + Statistics + Computer Science + Engineering + Instrumentation, far beyond the growing borders

[ArXiv] classifying spectra
http://hea-www.harvard.edu/AstroStat/slog/2009/arxiv-classifying-spectra/
Fri, 23 Oct 2009, by hlee

[arXiv:stat.ME:0910.2585]
Variable Selection and Updating In Model-Based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications
by Murphy, Dean, and Raftery

Classifying or clustering spectra (or applying semi-supervised learning to them) is a very challenging problem, from collecting statistical-analysis-ready data to reducing the dimensionality without sacrificing the complex information in each spectrum. It is challenging not only to estimate spiky (non-differentiable) curves via statistically well-defined estimating-equation procedures, but also to transform the data so that they satisfy the regularity conditions assumed in statistics.

Another reason that classification and clustering of astrophysical spectroscopic data are more difficult is that the observed lines, with their intensities and FWHMs on top of the continuum, are tied to atomic databases and to latent variables or hyperparameters (distance, rotation, absorption, column density, temperature, metallicity, object type, system properties, etc.). Separating lines from one another and from the continuum frequently becomes a very challenging mixture problem (boundary and identifiability issues). This complexity appears only in astronomical spectroscopic data, because we get only indirect or uncontrolled data governed by physics, as opposed to the meat-species spectra in the paper. Spectroscopic data outside astronomy are rather smooth, are observed over a controlled wavelength range, and carry no worries about correcting for recession/radial velocity/redshift/extinction/lensing/etc.

Although the part most relevant to astronomers, i.e. spectroscopic data processing, is not discussed in this paper, the most important part, the application of statistical learning to complex curves such as spectral data, is well described. Some astronomers with appropriate data may want to try the variable selection strategy and check out these classification methods from statistics. If it works out, it might save space for storing spectral data and time for collecting high-resolution spectra. Please keep in mind that it is not necessary to use the same variable selection strategy. Astronomers can create better working versions for classification and clustering purposes, like hardness ratios, which are often used to reduce the dimensionality of spectral data since low-total-count spectra are not informative over the full energy (wavelength) range. Curse of dimensionality!
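As a concrete illustration of the kind of home-grown dimension reduction mentioned above, here is a minimal Python sketch of a fractional-difference hardness ratio. The band boundaries and count values are hypothetical, chosen only for illustration; they are not taken from the paper or from any particular instrument.

import numpy as np

def hardness_ratio(soft_counts, hard_counts):
    # Fractional-difference hardness ratio HR = (H - S) / (H + S):
    # collapses a full low-count spectrum into one robust number;
    # values near -1 are soft-dominated, values near +1 hard-dominated.
    soft = np.asarray(soft_counts, dtype=float)
    hard = np.asarray(hard_counts, dtype=float)
    return (hard - soft) / (hard + soft)

# Hypothetical counts summed in a "soft" (e.g. 0.5-2 keV) and a "hard"
# (e.g. 2-8 keV) band for three sources; the band edges are assumptions.
soft = [120, 15, 48]
hard = [30, 45, 50]
print(hardness_ratio(soft, hard))   # -> [-0.6, 0.5, 0.02]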

SINGS
http://hea-www.harvard.edu/AstroStat/slog/2009/sings/
Wed, 07 Oct 2009, by hlee

From SINGS (Spitzer Infrared Nearby Galaxies Survey): Isn’t it a beautiful Hubble tuning fork?

As a first-year graduate student in statistics, I enrolled in Prof. C. R. Rao's "multivariate analysis" class without thinking much, partly because of the rumor that he would not be teaching much longer and partly because of his fame as the most famous statistician alive. Everything was smooth and easy for him, and he had an incredible memory for equations and proofs. I, however, only grasped intuitive concepts, such as why a method works, not the details of the mathematics, the theorems, and their proofs. Instantly, I began to think about how these methods could be applied to astronomical data. After a few lessons, I desperately wanted to try out multivariate analysis methods to classify galactic morphology.

The dream died shortly thereafter, because there was no data set that could be properly fed into statistical methods for classification. I spent quite some time searching astronomical databases, including ADS. This was before SDSS or VizieR became as popular as they are now. Then I thought about applying these methods to classify supernovae, because understanding the pattern of their light curves tells us much about the history of our universe (Type Ia SNe are standard candles) and because I knew of some publicly available SN light curves. I immediately realized that individual light curves are biased from the sampling perspective, and I did not know how to correct them for multivariate analysis. I also thought about applying multivariate analysis to stellar spectral types and to stars in different mechanical systems (single, binary, association, etc.). I thought about applying the newly learned methods to every astronomical object I had studied, from sunspots to AGNs.

Regardless of the target objects to be scrutinized under this fascinating subject, "multivariate analysis," two factors kept discouraging me. One was that I did not have enough training to develop, in a couple of weeks, new statistical models reflecting the unique statistical challenges embedded in the data: missing values, irregularities, non-iid behavior, outliers, and other features that are hardly transcribed into a statistical setting. The other, more critical one was that there was no accessible astronomical database repository for statistical learning. Without deep knowledge of astronomy and trained skills for handling astronomical data, catalogs are generally useless. Those catalogs and archived data sets are different from the data sets in machine learning repositories (which are intuitive to use).

Astronomers may think that analyzing toy or mock data sets is not scientific, because it does not lead to the new discoveries they always make. From a data analyst's viewpoint, scientific advances also mean finding tools that summarize data in an optimal manner. As I argued in Astroinformatics, methods for retrieving information can be attempted and validated on well-understood data sets whose astrophysics has already been worked out. The Pythagorean theorem was not proved only once; there are 39 different ways to prove it.

Seeing this nice poster image (the full-resolution, 56MB image is available from the link) brought back memories of my enthusiasm for applying statistical learning methods to better knowledge discovery. As you can see, there are many different types of galaxies, and often there is no clear boundary between them. Consider classifying blurry galaxies by eye: a spiral can be classified as an irregular, for example. Although I wish for automatic classification of these astrophysical objects, because of the difficulty of composing a training set for classification, or of collecting data from distinctive manifold groups for clustering, machine learning procedures are as complicated to develop as the complexity this tuning fork shows. The complex topology of astronomical objects seems to be the primary reason statistical learning applications lag here compared to other fields.

Nonetheless, multivariate analysis can be useful for viewing relations from different perspectives, apart from known physics models. It may help to develop finer-tuned physics models by taking into account latent variables found through statistical learning. Such attempts, I believe, can assist astronomers in designing telescopes and in inventing efficient ways to collect and analyze data, by revealing which features are more significant than others for understanding the morphological shapes of galaxies, patterns in light curves, spectral types, and so on. When such experience accumulates, different physical insights can kick in, just as scientists once scrambled and assembled galaxies into a tuning fork that led to the development of various evolution models.

To make a long story short, you have two choices: one, just enjoy these beautiful pictures and appreciate the complexity of our universe; or two, let this picture of Hubble's tuning fork inspire you toward advances in astroinformatics. Whichever path you choose, it is time well spent.

[MADS] Parallel Coordinates
http://hea-www.harvard.edu/AstroStat/slog/2009/mads-parallel-coordinate/
Wed, 29 Jul 2009, by hlee

Speaking of XAtlas from my previous post, I tried another visualization tool, parallel coordinates, on these Capella observations and on two other stars with multiple observations (AR Lac and IM Peg). As discussed in [MADS] Chernoff face, a full description of the catalog is found on the XAtlas website. The reason for choosing these stars is that, among low-mass stars, Capella (16 observations shown here), IM Peg (HD 21648, 8 observations), and AR Lac (6 observations, although at different phases) are the most frequently observed. I was curious about which variation is dominant: within stars (statistical variation) or between stars (Capella, IM Peg, AR Lac). How would they look in the parameter space of high-resolution grating spectroscopy from Chandra?

With 13 X-ray line and/or continuum ratios, a typical data display would be the 13-choose-2 combinations of scatter plots, as follows. Note that the upper-left panels with three colors are drawn for the classification purpose (red: AR Lac, blue: IM Peg, green: Capella), while the lower-right ones are uncolored for the clustering-analysis purpose. Such scatter plots are essential to exploratory data analysis, but with this many panels they do not convey information efficiently. In astronomical journals, thanks to astronomers' a priori knowledge, a few pairs of important variables are selected and displayed, reducing the visualization complexity dramatically. Unfortunately, I cannot select only the physically important variables.

[Figure "pairs": pairwise scatter-plot matrix of the 13 line/continuum ratios]
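For readers who want to reproduce this kind of display, here is a minimal sketch, assuming the 13 ratios live in a pandas DataFrame with one column per ratio plus a 'star' label column; the file name 'xatlas_ratios.csv' and the column names are placeholders, not the actual XAtlas catalog format.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical table: 13 ratio columns plus a 'star' column whose values
# are 'Capella', 'AR Lac', or 'IM Peg'.
df = pd.read_csv("xatlas_ratios.csv")
ratio_cols = [c for c in df.columns if c != "star"]

# Colored by star for the classification view; drop hue= to obtain the
# uncolored version used for eyeballing clusters.
sns.pairplot(df, vars=ratio_cols, hue="star", plot_kws={"s": 15, "alpha": 0.7})
plt.show()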

I am not a deeply knowledgeable astronomer, but I believe in reducing dimensionality according to the research objective. The goal is set by asking, "what do you want from this multivariate data set?": classification (a classification rule or regression model that separates the three stars, Capella, AR Lac, and IM Peg), clustering (are the three stars naturally clustered into three groups, or is there a different number of clusters not visible in the scatter plots above?), hypothesis testing (are they the same type of star or different?), point estimation and confidence intervals (means and their error bars), or variable selection (dimension reduction). So far no statistical question is well defined (which can be a good thing for new discoveries). Prior to any confirmatory data analysis, we had better find a way to display this multidimensional data set efficiently. I thought parallel coordinates served the purpose well, but surprisingly the method has never been discussed in the astronomical literature; at least it did not appear in ADS.

[Figures "pc_n" and "pc_s": parallel coordinate plots of the 13 ratios, normalized (left) and standardized (right)]

Each of the 13 variables was either normalized (left) or standardized (right). The parallel coordinate plot looks both simpler and more informative. The Capella observations occupy a relatively separable region compared with the other stars. It is easy to see that one Capella observation is an obvious outlier relative to the rest, which is hardly visible in the scatter plots. It is also clear that discriminant analysis or classical support-vector-machine-type classification methods cannot separate AR Lac and IM Peg, and that clustering based on distance measures of dissimilarity cannot reveal a natural grouping of these two stars, whereas the Capella observations form their own cluster. In my opinion, parallel coordinates convey more information about multidimensional data (dim > 3) in a simpler way than scatter plots of multivariate data. They naturally show highly correlated variables within the same star's observations or across all target stars. This insight from visualization is a key to devising methods of variable selection or dimensionality reduction for the data set.
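Here is a minimal sketch of producing the two displays above with pandas' built-in parallel-coordinates plotter, continuing with the same hypothetical DataFrame `df` (13 ratio columns plus a 'star' label) assumed in the earlier scatter-plot sketch.

import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

ratio_cols = [c for c in df.columns if c != "star"]
x = df[ratio_cols]

normalized = (x - x.min()) / (x.max() - x.min())   # each axis mapped to [0, 1]
standardized = (x - x.mean()) / x.std()            # zero mean, unit variance

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, scaled, title in [(axes[0], normalized, "normalized"),
                          (axes[1], standardized, "standardized")]:
    # Reattach the class labels so each observation is drawn as one
    # colored polyline across the 13 parallel axes.
    parallel_coordinates(scaled.assign(star=df["star"]), "star", ax=ax, alpha=0.6)
    ax.set_title("Parallel coordinates (" + title + ")")
plt.tight_layout()
plt.show()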

My personal opinion is that, for lack of an efficient and informative tool for visualizing complex high-resolution spectra across many detailed metrics, smoothed bivariate (at most trivariate) summaries such as hardness ratios and quantiles are used instead to display X-ray spectral data. I am not saying that parallel coordinates are the ultimate answer to visualizing multivariate data, but I would like to emphasize that the method is more informative, more intuitive, and simpler for understanding the structure of a relatively high dimensional data cloud.

Parallel coordinates have a long history. The earliest discussion I found was from the 1880s. The method was popularized by Alfred Inselberg and gained recognition among statisticians through Edward Wegman (1990, Hyperdimensional Data Analysis Using Parallel Coordinates). Colorful images of the Sun, stars, galaxies, and their coronae, interstellar gas, and jets are the eye-catchers. I hope that data visualization tools gain an equal spotlight, since they summarize data and deliver a great deal of information. If images are well-decorated cakes, then these tools from EDA are sophisticated and well-baked cookies.

——————- [Added]
According to

[arxiv:0906.3979] The Golden Age of Statistical Graphics
Michael Friendly (2008)
Statistical Science, Vol. 23, No. 4, pp. 502-535

it is 1885. Since I do not know French (if I did, I would want to read the original paper before anything else), I cannot tell what the reference is about.

Classification and Clustering
http://hea-www.harvard.edu/AstroStat/slog/2008/classification-and-clusterin/
Thu, 18 Sep 2008, by hlee

Another conclusion I have drawn from reading preprints listed in arxiv/astro-ph is that astronomers tend to confuse classification with clustering and to mix up the methodologies. They tend to think that any algorithm from classification or clustering analysis will serve their purpose, since algorithms of either kind, no matter what, look like black boxes. By a black box I mean something like a neural network, which is one of the classification algorithms.

Simply put, classification is a regression problem and clustering is a mixture problem with an unknown number of components. Defining a classifier (a regression model) is the objective of classification, and determining the number of clusters is the objective of clustering. In classification, predefined classes exist, such as galaxy types and stellar types, and one wishes to know which predictor variables, and which function of them, separate quasars from stars without individual spectroscopic observations, relying only on a handful of variables from photometric data. In clustering analysis there is no predefined class, but some plots reveal multiple populations, and one wishes to determine the number of clusters mathematically, so as not to be subjective in the concluding remarks by saying that a plot shows two clusters after some subjective data cleaning. A good example: as photons from gamma-ray bursts accumulated, extracting features like F_{90} and F_{50} enabled scatter plots of many GRBs, which eventually led people to believe there are multiple populations of GRBs. Clustering algorithms back this hypothesis in a more objective manner, as opposed to the subjective manner of scatter plots with non-statistical outlier elimination.
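To make the contrast concrete, here is a minimal sketch on synthetic data only: a supervised rule (logistic regression) is fit when labels are known, while for clustering Gaussian mixtures with different numbers of components are fit and an objective criterion (BIC) chooses that number. This is an illustration with scikit-learn, not the procedure used in any of the papers mentioned here.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

# Synthetic data with three latent groups in two dimensions.
X, y = make_blobs(n_samples=600, centers=3, cluster_std=1.2, random_state=0)

# Classification: the labels y are known; the goal is the separating rule.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy of the classifier:", clf.score(X, y))

# Clustering: pretend the labels are unknown and ask how many mixture
# components the data support, using BIC as the model-selection criterion.
bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 7)}
print("BIC-preferred number of clusters:", min(bic, key=bic.get))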

However, there are challenges in making a clear cut between classification and clustering, both in statistics and in astronomy. In statistics, "missing data" is the phrase people use to describe this challenge. Fortunately, there is a field called semi-supervised learning that tackles it. (Supervised learning is equivalent to classification, and unsupervised learning to clustering.) Semi-supervised learning algorithms are applicable to data in which a portion has known class types and the rest are missing; astronomical catalogs with unidentified objects are a good candidate for applying semi-supervised learning algorithms.
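Here is a minimal sketch of that setting on synthetic data: only a small fraction of objects carry a class label and the rest are encoded as unlabeled (-1), after which a label-propagation model assigns classes to the unlabeled portion. The data and the 90% hiding fraction are assumptions for illustration only.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelPropagation

# Synthetic "catalog": 500 objects, 5 features, 3 classes.
X, y_true = make_classification(n_samples=500, n_features=5, n_informative=3,
                                n_classes=3, n_clusters_per_class=1,
                                random_state=1)

# Hide 90% of the labels, mimicking a catalog full of unidentified objects.
rng = np.random.default_rng(1)
hidden = rng.random(y_true.size) < 0.9
y_partial = y_true.copy()
y_partial[hidden] = -1                    # -1 marks an unlabeled object

model = LabelPropagation().fit(X, y_partial)
accuracy = (model.transduction_[hidden] == y_true[hidden]).mean()
print("accuracy on the unlabeled objects:", round(accuracy, 2))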

From the astronomy side, the fact that classes are not well defined, or are subjective, is the main cause of this confusion between classification and clustering, and also the origin of this challenge. For example, will astronomers A and B produce the same results when classifying galaxies according to Hubble's tuning fork?[1] We are not testing individual cognitive skills. Is there a consensus on where to make the cut between F9 stars and G0 stars? What makes a star F9.5 instead of G0? In the presence of error bars, how can one be sure that a star is F9 and not G0? I do not see any decision-theoretic explanation in survey papers when those stellar spectral classes are presented. Classification is generally for data with categorical responses, but astronomers tend to turn what used to be categorical into something continuous and still apply the same old classification algorithms designed for categorical responses.

From a clustering analysis perspective, this challenge is caused by outliers, or peculiar objects that do not belong to the majority. These peculiar objects may be numerous enough to make up a new, unprecedented class. Or their number may be so small that a strong belief prevails to discard these data points as observational mistakes. How much can we trim data with unavoidable and uncontrollable contamination (remember, we cannot control astronomical data, as opposed to earthly kinds)? What primarily determines the number of clusters: physics, statistics, astronomers' experience in processing and cleaning the data, ...?

Once the ambiguity between classification and clustering and the complexity of the data sets are resolved, another challenge is still waiting: which black box? For most classification algorithms, Pattern Recognition and Machine Learning by C. Bishop offers a broad spectrum of black boxes. Yet the book does not include the various clustering algorithms that statisticians have developed, nor outlier detection. To become more rigorous in selecting a black box for clustering analysis and outlier detection, one should also consult the statistics literature on those topics.

It seems to me that astronomers tend to be in haste, owing to the pressure to publish results immediately after a data release, and so overlook methodologies suitable for their survey data. There appears to be no time to consult machine learning specialists to verify the approaches they adopt. My personal prayer is that this haste does not settle into a trend in astronomical surveys and large data analyses.

  1. Check out the project, GALAXY ZOO
my first AAS. IV. clustering
http://hea-www.harvard.edu/AstroStat/slog/2008/my-first-aas-iv-clustering/
Fri, 20 Jun 2008, by hlee

Two attendees, whom I had met before the AAS, asked whether I could suggest clustering methods relevant to their projects. In the end, we spent quite some time clarifying the term clustering.

  • The statistician’s and the astronomer’s understandings of clustering are different:
    • classification vs. clustering, or supervised learning vs. unsupervised learning: the former term in each pair indicates that the scientist already knows the types of the objects in hand. A photometric data set with an additional column saying star, galaxy, quasar, or unknown is a target for classification or supervised learning. Simply put, classification is finding a rule based on photometric colors that can classify these different types of objects. If there is no such additional column, but scatter plots (or plots after dimension reduction) show grouping patterns, it is clustering or unsupervised learning, whose goal is finding hyperplanes that separate these clusters optimally; in other words, answering the questions "are there real clusters? If so, how many?" is the objective of clustering/unsupervised learning. Overall, rudimentarily, the presence of an extra column of types differentiates classification from clustering.
    • physical clustering vs. statistical clustering:
      Cosmologists and the like are interested in clusters/clumps of matter/particles/objects. For astrophysicists, clusters are associated with the spatial evolution of the universe. Inquiries about clustering from astronomers are therefore more likely about finding these spatial clumps statistically, which is a subject of stochastic geometry or spatial statistics. On the other hand, statisticians and data analysts like to investigate clusters in a reparameterized multi-dimensional space. The distances computed do not follow the fundamental laws of physics (gravitation, EM, weak, and strong) but reflect relationships in the multi-dimensional space; for example, in a CM diagram, stars of a kind are grouped. The consensus between the two communities about clustering is that the number of clusters is unknown, so the plethora of classification methods cannot be applied, and that the study objective is to seek methodologies for quantifying clusters.
  • Astronomers' clustering problems are either statistical classification (close to semi-supervised learning) or spatial statistics.
    The way of revealing noisy clusters in the universe, or of quantifying the current status of the matter distribution, leads to the very fundamentals of the birth of the universe, where spatial statistics can be a great partner. In the era of photometric redshifts, various classification techniques enhance the accuracy of prediction.
  • Astronomers' testing of the reality of clusters seems limited: cosmology problems have been tackled as inverse problems. Based on theoretical cosmological models, simulations are performed and the results are transformed into some surrogate parameters. These surrogates are generally represented by smooth curves or straight lines in a plot, where the observations make their debut as points with bidirectional error bars (so-called measurement errors). The judgment about the cosmological model under test is made by a simple regression (correlation) or by eye on these observed data points. If the observations and the curve from a cosmological model match well in a 2D plot, the given cosmological model is confirmed in the conclusion section. Personally, I think this procedure of testing cosmological models that account for the clustering of the universe could be developed in a more statistically rigorous fashion, instead of matching straight lines.
  • Challenges to statisticians in astronomy, measurement errors: In (statistical) learning, I believe, there has been no standard procedure for incorporating astronomers' measurement errors into modeling. I think measurement errors are, in general, ignored because systematic errors are not recognized in statistics. In astronomy, on the other hand, the measurement errors accompanying data are a very crucial piece of information, particularly for verifying the significance of the observations. Often these measurement errors become the denominator in the χ2 function, which is treated as following a χ2 distribution to obtain best fits and confidence intervals (see the sketch after this list).
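A minimal sketch of that kind of fit, on synthetic data only: per-point measurement errors enter the χ2 as the denominator when fitting a straight line, and the same errors propagate into the parameter error bars. The line model, the noise levels, and the use of scipy's curve_fit are illustrative assumptions, not any particular survey's pipeline.

import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    return slope * x + intercept

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 25)
sigma = 0.3 + 0.2 * rng.random(x.size)            # per-point measurement errors
y = line(x, 1.5, 2.0) + rng.normal(0.0, sigma)    # "observations": truth plus noise

# Weighted least squares: sigma enters the chi-square as the denominator.
popt, pcov = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))                     # 1-sigma parameter uncertainties

chi2 = np.sum(((y - line(x, *popt)) / sigma) ** 2)
dof = x.size - popt.size
print("best fit:", popt, "+/-", perr, "reduced chi2:", chi2 / dof)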

My personal lesson from these two short discussions at the AAS is that we need more collaboration between statisticians and astronomers: to include measurement errors in classification or semi-supervised learning, particularly nowadays when we are enjoying a plethora of data sets, and to move forward, with better help from statisticians, in testing and verifying the existence of clusters beyond fitting a straight line.

[ArXiv] 2nd week, May 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-2nd-week-may-2008/
Mon, 19 May 2008, by hlee

There is no particular opening remark this week, only my profound curiosity about the jackknife tests in [astro-ph:0805.1994]. Including that paper, a few deserve separate discussions from a statistical point of view, which shall be posted.

  • [astro-ph:0805.1290] R. Barnard, L. Shaw Greening, U. Kolb
    A multi-coloured survey of NGC 253 with XMM-Newton: testing the methods used for creating luminosity functions from low-count data

  • [astro-ph:0805.1469] Philip J. Marshall et al.
    Automated detection of galaxy-scale gravitational lenses in high resolution imaging data

  • [astro-ph:0805.1470] E. P. Kontar, E. Dickson, J. Kasparova
    Low-energy cutoffs in electron spectra of solar flares: statistical survey (It is not statistically rigorous, but the topic can be connected to dip tests or gap tests in statistics)

  • [astro-ph:0805.1936] J. Yee & B. Gaudi
    Characterizing Long-Period Transiting Planets Observed by Kepler (discusses uncertainty in light curves and Fisher matrix)

  • [astro-ph:0805.1994] the QUaD collaboration: C. Pryke et al.
    Second and third season QUaD CMB temperature and polarization power spectra (What are jackknife tests? A brief scan of the paper does not register with my understanding of jackknifing; it looks closer to cross-validation. Another slog topic shall come: bootstrap, cross-validation, jackknife, and resampling.)

  • [astro-ph:0805.2121] N. Cole et al.
    Maximum Likelihood Fitting of Tidal Streams With Application to the Sagittarius Dwarf Tidal Tails

  • [astro-ph:0805.2155] J Yoo & M Zaldarriaga
    Improved estimation of cluster mass profiles from the cosmic microwave background

  • [astro-ph:0805.2207] A.Vikhlinin et al.
    Chandra Cluster Cosmology Project II: Samples and X-ray Data Reduction (it mentions calibration uncertainty and background, can it be a reference to stacking, coadding, source detection, etc?)

  • [astro-ph:0805.2325] J.M. Loh
    A valid and fast spatial bootstrap for correlation functions

  • [astro-ph:0805.2326] T. Wickramasinghe, M. Struble, J. Nieusma
    Observed Bimodality of the Einstein Crossing Times of Galactic Microlensing Events
[ArXiv] 1st week, May 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-1st-week-may-2008/
Mon, 12 May 2008, by hlee

I think I have to review spatial statistics in astronomy, focusing on tessellation (void structure), point processes (extending the two- (three-) point correlation function), and marked point processes (the spatial distribution of hardness ratios of distant X-ray sources, or of different types of galaxies, marked not only by morphological differences but by other marks such as absolute magnitudes and the existence of particular features). When? Someday…

In addition to Bayesian methodologies, like those in this week's astro-ph, studies characterizing the empirical spatial distributions of voids and galaxies appear frequently, and I believe they can be enriched further with ideas from stochastic geometry and spatial statistics. Below is what appeared in arXiv this week.

  • [astro-ph:0805.0156] R. D’Abrusco, G. Longo, N. A. Walton
    Quasar candidates selection in the Virtual Observatory era

  • [astro-ph:0805.0201] S. Vegetti & L.V.E. Koopmans
    Bayesian Strong Gravitational-Lens Modelling on Adaptive Grids: Objective Detection of Mass Substructure in Galaxies (many will like to see this paper: nested sampling implemented; discusses penalty functions and tessellation)

  • [astro-ph:0805.0238] J. A. Carter et al.
    Analytic Approximations for Transit Light Curve Observables, Uncertainties, and Covariances

  • [astro-ph:0805.0269] S.M.Leach et al.
    Component separation methods for the Planck mission

  • [astro-ph:0805.0276] M. Grossi et al.
    The mass density field in simulated non-Gaussian scenarios

  • [astro-ph:0805.0790] Ceccarelli, Padilla, & Lambas
    Large-scale modulation of star formation in void walls
    [astro-ph:0805.0797] Ceccarelli et al.
    Voids in the 2dFGRS and LCDM simulations: spatial and dynamical properties

  • [astro-ph:0805.0875] S. Basilakos and L. Perivolaropoulos
    Testing GRBs as Standard Candles

  • [astro-ph:0805.0968] A. A. Stanislavsky et al.
    Statistical Modeling of Solar Flare Activity from Empirical Time Series of Soft X-ray Solar Emission
[ArXiv] 5th week, Apr. 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-5th-week-apr-2008/
Mon, 05 May 2008, by hlee

Ever since I first learned Hubble's tuning fork[1], I have wanted to classify galaxies (semi-supervised learning seems more suitable) based on their features (colors and spectra), instead of by labor-intensive human-eye classification. Ironically, at that time I did not know there was a field of computer science called machine learning, nor that statistics did such studies. Upon switching to statistics, with the hope of understanding the statistical packages implemented in IRAF and IDL and of better learning the contents of Numerical Recipes and Bevington's book, ignorance was not the enemy; the accessibility of data was.

I'm glad to see that this week presented a paper I had dreamed of many years ago, in addition to other interesting papers. Nowadays I realize more and more that astronomical machine learning is not as simple as what we see in the machine learning and statistical computation literature, which typically adopts data sets from repositories whose characteristics have been well known for many years (for example, the famous iris data; there are toy data sets and mock catalogs, so there is no shortage of data sets with public characteristics). As the long list of authors indicates, machine learning on massive astronomical data sets was never meant to be a little girl's dream. With a bit of my sentiment, I offer this week's list:

  • [astro-ph:0804.4068] S. Pires et al.
    FASTLens (FAst STatistics for weak Lensing) : Fast method for Weak Lensing Statistics and map making
  • [astro-ph:0804.4142] M.Kowalski et al.
    Improved Cosmological Constraints from New, Old and Combined Supernova Datasets
  • [astro-ph:0804.4219] M. Bazarghan and R. Gupta
    Automated Classification of Sloan Digital Sky Survey (SDSS) Stellar Spectra using Artificial Neural Networks
  • [gr-qc:0804.4144] E. L. Robinson, J. D. Romano, A. Vecchio
    Search for a stochastic gravitational-wave signal in the second round of the Mock LISA Data challenges
  • [astro-ph:0804.4483] C. Lintott et al.
    Galaxy Zoo : Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey
  • [astro-ph:0804.4692] M. J. Martinez Gonzalez et al.
    PCA detection and denoising of Zeeman signatures in stellar polarised spectra
  • [astro-ph:0805.0101] J. Ireland et al.
    Multiresolution analysis of active region magnetic structure and its correlation with the Mt. Wilson classification and flaring activity

A related slog post on machine learning for galaxy morphology can be found at "svm and galaxy morphological classification".

<Added: 3rd week, May 2008> [astro-ph:0805.2612] S. P. Bamford et al.
Galaxy Zoo: the independence of morphology and colour

  1. Wikipedia link: Hubble sequence
[ArXiv] 2nd week, Apr. 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-2nd-week-apr-2008/
Fri, 11 Apr 2008, by hlee

Markov chain Monte Carlo has become the most frequent and stable statistical application in astronomy. It would be useful to collect tutorials from both professions.

  • [astro-ph:0804.0620] Q. Wu et al.
    Late transient acceleration of the universe in string theory on $S^{1}/Z_{2}$ (MCMC)

  • [astro-ph:0804.0692] Corless, Dobke & King
    The Hubble constant from galaxy lenses: impacts of triaxiality and model degeneracies (MCMC, Bayesian Modeling)

  • [astro-ph:0804.0788] Zamfir, Sulentic, & Marziani
    New Insights on the QSO Radio-Loud/Radio-Quiet Dichotomy: SDSS Spectra in the Context of the 4D Eigenvector1 Parameter Space

  • [astro-ph:0804.0965] Bloom, Butler, & Perley
    Gamma-ray Bursts, Classified Physically (instead of statistics, it relies on physics to grow a (classification) tree)

  • [astro-ph:0804.1089] G.K.Skinner
    The sensitivity of coded mask telescopes

  • [astro-ph:0804.1197] Bagla, Prasad and Khandai
    Effects of the size of cosmological N-Body simulations on physical quantities – III: Skewness

  • [astro-ph:0804.1447] Marsh, Ireland, & Kucera
    Bayesian Analysis of Solar Oscillations

  • [astro-ph:0804.1532] C. López-Sanjuan, C. E. García-Dabó, M. Balcells
    A maximum likelihood method for bidimensional experimental distributions, and its application to the galaxy merger fraction

  • [astro-ph:0804.1536] V. J. Martinez (one of my favorite astronomers, who brings in mathematics and statistics)
    The Large Scale Structure in the Universe: From Power-Laws to Acoustic Peaks
[ArXiv] 2nd week, Mar. 2008
http://hea-www.harvard.edu/AstroStat/slog/2008/arxiv-2nd-week-mar-2007/
Fri, 14 Mar 2008, by hlee

Warning! The list is long this week, but diverse. Some items are of obvious interest to CHASC.

  • [astro-ph:0803.0997] V. Smolcic et al.
    A new method to separate star forming from AGN galaxies at intermediate redshift: The submillijansky radio population in the VLA-COSMOS survey
  • [astro-ph:0803.1048] T.A. Carroll and M. Kopf
    Zeeman-Tomography of the Solar Photosphere — 3-Dimensional Surface Structures Retrieved from Hinode Observations
  • [astro-ph:0803.1066] M. Beasley et al.
    A 2dF spectroscopic study of globular clusters in NGC 5128: Probing the formation history of the nearest giant Elliptical
  • [astro-ph:0803.1098] Z. Lorenzo
    A new luminosity function for galaxies as given by the mass-luminosity relationship
  • [astro-ph:0803.1199] D. Coe et al.
    LensPerfect: Gravitational Lens Massmap Reconstructions Yielding Exact Reproduction of All Multiple Images (could it be related to the GREAT08 Challenge?)
  • [astro-ph:0803.1213] H.Y. Wang et al.
    Reconstructing the cosmic density field with the distribution of dark matter halos
  • [astro-ph:0803.1420] E. Lantz et al.
    Multi-imaging and Bayesian estimation for photon counting with EMCCD’s
  • [astro-ph:0803.1491] Wu, Rozo, & Wechsler
    The Effect of Halo Assembly Bias on Self Calibration in Galaxy Cluster Surveys
  • [astro-ph:0803.1616] P. Mukherjee et al.
    Planck priors for dark energy surveys (some CHASCians would like to check!)
  • [astro-ph:0803.1738] P. Mukherjee and A. R. Liddle
    Planck and reionization history: a model selection view
  • [astro-ph:0803.1814] J. Cardoso et al.
    Component separation with flexible models. Application to the separation of astrophysical emissions
  • [astro-ph:0803.1851] A. R. Marble et al.
    The Flux Auto- and Cross-Correlation of the Lyman-alpha Forest. I. Spectroscopy of QSO Pairs with Arcminute Separations and Similar Redshifts
  • [astro-ph:0803.1857] R. Marble et al.
    The Flux Auto- and Cross-Correlation of the Lyman-alpha Forest. II. Modelling Anisotropies with Cosmological Hydrodynamic Simulations