The AstroStat Slog » SVM
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

More on Space Weather
Tue, 22 Sep 2009, hlee

Thanks to a Korean solar physicist[1], I was able to gather the following websites and some relevant information on Space Weather Forecast in action, not limited to literature or toy data.


These seem quite informative, and I believe more statisticians and data scientists (in signal and image processing, machine learning, computer vision, and data mining) could easily collaborate with solar physicists. As a matter of fact, all the complexity comes from processing the data to be fed into (machine, statistical) learning algorithms and from defining the objectives of learning. Once those are settled, one can easily apply the numerous methods of the field to these time-varying solar images.

I’m writing this short post because I finally found the interesting articles that I had collected for my previous post on Space Weather. After finding and scanning through them, I realized that, methodology-wise, they have only taken baby steps. You’ll see a limited number of keywords repeated, although there is a huge community of scientists and engineers in knowledge discovery and data mining.

Note that the objectives of these studies are quite similar. They describe machine learning for the purpose of automating the detection of features of interest on the Sun and possibly forecasting related phenomena that affect our own atmosphere through the associated solar activity.

  1. Automated Prediction of CMEs Using Machine Learning of CME – Flare Associations by Qahwaji et al. (2008) in Solar Phy. vol. 248, pp. 471-483.
  2. Automatic Short-Term Solar Flare Prediction using Machine Learning and Sunspot Associations by Qahwaji and Colak (2007) in Solar Phy. vol. 241, pp. 195-211.

    Space weather is defined by the U.S. National Space Weather Program (NSWP) as “conditions on the Sun and in the solar wind, magnetosphere, ionosphere, and thermosphere that can influence the performance and reliability of space-borne and ground-based technological systems and can endanger human life or health.”

    Personally, I think the “jackknife” section should be replaced with “cross-validation.”

  3. Automatic Detection and Classification of Coronal Mass Ejections by Qu et al. (2006) in Solar Phy. vol. 237, pp. 419-431.
  4. Automatic Solar Filament Detection Using Image Processing Techniques by Qu et al. (2005) in Solar Phy. vol. 228, pp. 119-135.
  5. Automatic Solar Flare Tracking Using Image-Processing Techniques by Qu et al. (2004) in Solar Phy. vol. 222, pp. 137-149.
  6. Automatic Solar Flare Detection Using MLP, RBF, and SVM by Qu et al. (2003) in Solar Phy. vol. 217, pp. 157-172.
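The jackknife remark above hinges on what each resampling scheme is for. The jackknife recomputes an estimator on leave-one-out subsamples to estimate its bias or standard error, whereas cross-validation holds out data to estimate a classifier's prediction error. A minimal Python sketch of both, assuming a hypothetical threshold classifier and made-up one-feature data (neither is taken from the papers above):

```python
from statistics import mean, stdev


def jackknife_se(data, estimator):
    """Jackknife standard error: recompute the estimator with each point left out."""
    n = len(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = mean(loo)
    return ((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo)) ** 0.5


def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (no shuffling, for reproducibility)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_error(xs, ys, k, fit, predict):
    """k-fold cross-validated misclassification rate of a fit/predict pair."""
    errors = 0
    for test_idx in kfold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors += sum(predict(model, xs[i]) != ys[i] for i in test_idx)
    return errors / len(xs)


def fit_threshold(xs, ys):
    """Hypothetical toy classifier: threshold halfway between the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    return (m0 + m1) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Made-up feature values and labels (1 = event, 0 = no event), interleaved by class.
xs = [0.2, 1.8, 0.5, 2.1, 0.9, 1.5, 0.3, 2.4]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
print("4-fold CV error:", cv_error(xs, ys, 4, fit_threshold, predict_threshold))
print("jackknife SE of mean:", jackknife_se(xs, mean), "vs s/sqrt(n):", stdev(xs) / len(xs) ** 0.5)
```

For the sample mean, the jackknife standard error coincides with the usual s/sqrt(n), which is a handy sanity check; for a classifier, the cross-validated error is the quantity one actually wants to report.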

I’d like to add a survey paper on another type of learning method beyond the Support Vector Machine (SVM), which is used in almost all the articles above. Luckily, this survey paper happens to address my concern about the “practices of background subtraction” in high energy astrophysics.

A Survey of Manifold-Based Learning Methods by Huo, Ni, and Smith
[Excerpt] What is Manifold-Based Learning?
It is an emerging and promising approach in nonparametric dimension reduction. The article reviews principal component analysis (PCA), multidimensional scaling (MDS), generative topographic mapping (GTM), locally linear embedding (LLE), ISOMAP, Laplacian eigenmaps, Hessian eigenmaps, and local tangent space alignment (LTSA). Apart from these reviews and comparisons, the survey paper is useful for understanding the danger of background subtraction: homogeneity does not mean a constant background that can simply be subtracted, and such subtraction often causes negative source observations.
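Why can subtracting a constant background produce negative source observations? A minimal sketch, assuming the on-source counts N are Poisson with mean s + b and that an integer background b is subtracted as if it were known exactly (both simplifying assumptions, not taken from the survey): the net count N - b is negative whenever N < b, which stays likely for faint sources.

```python
import math


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def prob_negative_net(source_mean, background):
    """Probability that on-source counts minus a constant background are negative.

    Assumes N ~ Poisson(source_mean + background) and an integer background that
    is subtracted as if known exactly: N - background < 0  <=>  N <= background - 1.
    """
    return poisson_cdf(background - 1, source_mean + background)


for s in (0, 2, 5, 10):
    print(f"source mean {s:2d}, background 10: P(net < 0) = {prob_negative_net(s, 10):.3f}")
```

With a background of 10 counts, the probability of a negative net count is roughly 0.46 for no source at all and still about 0.24 for a source mean of 2.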

More collaborations among multiple disciplines are desired in this relatively new field. For me, it is one of the best fields for data and information science in the 21st century, and any progress will benefit humankind.

  1. I must thank him for his kindness and patience. He was my Wikipedia for questions while I was studying the Sun.
space weather
Thu, 21 May 2009, hlee

Among the billions of objects in our Galaxy beyond the Earth, our Sun draws the most attention from astronomers. These astronomers are called solar physicists, and they enjoy the most abundant data, including 400 years of sunspot counts. Their joy stems not only from the fascinating, active, and unpredictable character of the Sun but also from its influence on our daily lives. Related to the latter, studying the conditions on the Sun is sometimes called space weather forecasting.

With my limited knowledge, I cannot lay out all the important aspects of solar physics, of climate change due to solar activity (not limited to our lower atmosphere but covering the space between the Sun and the Earth), or of the most important recent issues related to space weather. I can only emphasize that, compared to Earth’s climate/atmosphere or meteorology, the contribution of statisticians to space weather is almost nonexistent. I have frequently witnessed crude eyeballing, instead of statistics, being used in solar physics to analyze data and quantify images. Luckily, I found a few articles that discuss statistics, and my discussion focuses on these papers while leaving room for solar physicists to chip in on how space weather is treated statistically, for the sake of collaborating with statisticians.

By the way, I have no intention of degrading “eyeballing” in data analysis by astronomers. Statistical methods under EDA (exploratory data analysis, whose counterpart is CDA, confirmatory data analysis, or statistical inference) are basically “eyeballing” with technical jargon and basics from probability theory. EDA is important for questioning every step in astronomers’ chi-square methods. Without those diagnostics and visualizations, choosing the right statistical strategies for real data sets is almost impossible. I used “crude” because edges are drawn by hand via eyeballing instead of with “edge detection” algorithms. Another disclaimer: there are brilliant image processing/computer vision strategies developed by astronomers, which I am not going to present. I am focusing on a small area of statistics related to space weather and its forecasting.
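To illustrate what an “edge detection” algorithm does in place of hand-drawn edges, here is a minimal sketch of the textbook Sobel gradient operator on a plain 2D list, assuming a grayscale image and a hand-picked threshold. It is a generic operator, not a method from the solar physics papers discussed here.

```python
def sobel_magnitude(image):
    """Gradient magnitude from the 3x3 Sobel kernels; border pixels are left at 0."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    pixel = image[r + i - 1][c + j - 1]
                    gx += kx[i][j] * pixel
                    gy += ky[i][j] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


def edge_mask(image, threshold):
    """Mark pixels whose Sobel gradient magnitude exceeds a chosen threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in sobel_magnitude(image)]


# Toy "image": a dark region (0) next to a bright region (10).
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
for row in edge_mask(image, threshold=1.0):
    print(row)
```

The edge is located reproducibly from the data by the gradient and threshold rather than by eye; the threshold itself remains a choice that should be reported.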

Statistical Assessment of Photospheric Magnetic Features in Imminent Solar Flare Predictions by Song et al. (2009) in Solar Phy. vol. 254, p. 101.

Their forte is “logistic regression,” a statistical model that is not often used in astronomy. It is used to model a binary response (like heads or tails) or a categorical response (like agree, neutral, or disagree) with a bunch of predictors, i.e., classification with multiple features or variables (astronomers might prefer the term “parameters”). The issue of variable selection is also discussed, identifying L_{gnl} as the most powerful predictor. Their training set was carefully discussed from the solar physics perspective. Contrary to their claim that they used “logistic regression” to predict solar flares for the first time, another paper a few years earlier discussed “logistic regression” to predict geomagnetic storms or coronal mass ejections. My objection may not hold if flares and CMEs are regarded as mutually exclusive events.
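To make the logistic model concrete: it treats the flare/no-flare label as Bernoulli with P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 x))) and fits b0 and b1 by maximum likelihood. A minimal sketch with one predictor, fitted by plain gradient descent on the mean log-loss; the predictor and the data are hypothetical stand-ins for a feature such as L_{gnl}, not Song et al.’s measurements or fitting procedure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, n_iter=5000):
    """Fit P(y=1|x) = sigmoid(b0 + b1*x) by gradient descent on the mean log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = sigmoid(b0 + b1 * x) - y  # derivative of log-loss w.r.t. the linear term
            g0 += residual
            g1 += residual * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical toy data: x is a single magnetic-feature predictor, y = 1 if a flare followed.
xs = [0.1, 0.4, 0.7, 1.0, 1.3, 1.6, 1.9, 2.2]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
for x in (0.1, 1.15, 2.2):
    print(x, round(sigmoid(b0 + b1 * x), 3))
```

Because these toy classes overlap (one flare at low x, one non-flare at high x), the maximum-likelihood fit is finite; with perfectly separated classes the coefficients would grow without bound, a practical issue worth checking before trusting fitted coefficients.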

The Challenge of Predicting the Occurrence of Intense Storms by Srivastava (2006) in J. Astrophys. Astr. vol. 27, pp. 237-242

The probability of storm occurrence is the response in the logistic regression model, whose predictors are CME-related variables, including the latitude and longitude of the CME’s origin, and interplanetary inputs such as shock speed, ram pressure, and solar wind measures. Cross-validation was performed. The paper comments that the initial speed of a CME might be the most reliable predictor, but there is no extensive discussion of variable or model selection.

Personally, I think both publications[1] could be more statistically rigorous by discussing the various challenges of logistic regression from the statistical learning/classification perspective and from the model/variable selection perspective, in order to define better-behaved and statistically rigorous classifiers.

We often plan our days according to the weather forecast (although we grumble that forecasts are wrong, almost everyone relies on the numbers and predictions from weather people). Although they are not 100% reliable, those forecasts make our lives easier, and more reliable models are under development. Forecasting space weather with the help of statistics, on the other hand, is still almost unheard of. However, scientists and engineers understand that reliable space weather models help in planning space missions and in switching satellites into safe mode. At least I know this: with flare or CME forecasting models in place, fewer scientists and engineers need to wake up in the middle of the night because of otherwise unforeseen storms from the Sun.

  1. I thought I had collected more papers under “statistics” and “space weather,” not just these two. A few more are probably buried somewhere. It’s hard to believe such a rich field has not been touched by statisticians. I would very much appreciate it if you forwarded any relevant papers; I’ll add them gradually.
[ArXiv] 1st week, Feb. 2008
Sun, 10 Feb 2008, hlee

Review papers on Bayesian hierarchical modeling and LAR (least angle regression) appeared in this week’s stat arXiv, in addition to interesting astro-ph papers.

A review paper on LASSO and LAR: [stat.ME:0801.0964] T. Hesterberg et al.
   Least Angle and L1 Regression: A Review
Model checking for Bayesian hierarchical modeling: [stat.ME:0802.0743] M. J. Bayarri, M. E. Castellanos
   Bayesian Checking of the Second Levels of Hierarchical Models
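For readers meeting L1 regression for the first time: the LASSO estimate minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1, and one simple way to compute it is cyclic coordinate descent with soft-thresholding. The sketch below uses that algorithm rather than LAR (LAR with the lasso modification traces the same solution path but is more involved), on hypothetical centered toy data with no intercept.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0), the LASSO shrinkage operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows with no intercept column, so y and the columns should be
    centered beforehand; every column is assumed to be nonzero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # correlation of feature j with the partial residual that excludes feature j
            rho = 0.0
            for row, target in zip(X, y):
                pred_without_j = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-4, -2, 0, 2, 4]  # y = 2 * first column; the second column is irrelevant
for lam in (0.0, 1.0, 5.0):
    print(lam, lasso_cd(X, y, lam))
```

As lam grows, the relevant coefficient shrinks toward zero and eventually vanishes, while the irrelevant one stays at exactly zero; this exact-zero behavior is what makes the LASSO a variable selection tool.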

  • [astro-ph:0802.0042] Y. Kubo
    Statistical Models for Solar Flare Interval Distribution in Individual Active Regions (it discusses AIC)

  • [astro-ph:0802.0131] J. Bobin, J-L Starck and R. Ottensamer
    Compressed Sensing in Astronomy

  • [astro-ph:0802.0387] J. Gaite
    Geometry and scaling of cosmic voids

  • [astro-ph:0802.0400] R. Vio & P. Andreani
    A Modified ICA Approach for Signal Separation in CMB Maps

  • [astro-ph:0802.0498] V. Balasubramanian, K. Larjo and R. Sheth
    Experimental design and model selection: The example of exoplanet detection

  • [astro-ph:0802.0537] G. Dan, Z. Yanxia, & Z. Yongheng
    Support Vector Machines and Kd-tree for Separating Quasars from Large Survey Databases
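Kubo’s paper, listed first above, discusses AIC for interval distributions. A minimal sketch of that kind of comparison, AIC = 2k - 2 ln L with the smaller value preferred, assuming just two candidate models (exponential and lognormal, chosen here only because their maximum-likelihood estimates have closed forms; they are not necessarily the models Kubo considered) and hypothetical waiting times:

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln L (smaller is preferred)."""
    return 2 * n_params - 2 * log_likelihood


def exponential_loglik(waits):
    """Maximized log-likelihood of an exponential model (MLE rate = 1 / mean)."""
    n, total = len(waits), sum(waits)
    rate = n / total
    return n * math.log(rate) - rate * total


def lognormal_loglik(waits):
    """Maximized log-likelihood of a lognormal model (MLEs from the log data)."""
    n = len(waits)
    logs = [math.log(t) for t in waits]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # the MLE divides by n, not n - 1
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n


def compare_by_aic(waits):
    scores = {
        "exponential": aic(exponential_loglik(waits), 1),
        "lognormal": aic(lognormal_loglik(waits), 2),
    }
    return min(scores, key=scores.get), scores


# Hypothetical waiting times between flares (arbitrary units), not real data.
waits = [0.5, 1.2, 0.3, 2.8, 0.9, 1.7, 0.4, 3.5]
best, scores = compare_by_aic(waits)
print(best, {name: round(v, 2) for name, v in scores.items()})
```

The 2k term penalizes the lognormal model’s extra parameter, so it wins only if it improves the log-likelihood by more than one unit.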

[ArXiv] SVM and galaxy morphological classification, Sept. 10, 2007
Wed, 12 Sep 2007, hlee

From arxiv/astro-ph:0709.1359,
A robust morphological classification of high-redshift galaxies using support vector machines on seeing limited images. I. Method description by M. Huertas-Company et al.

Machine learning and statistical learning are becoming more and more popular in astronomy. Artificial Neural Networks (ANN) and Support Vector Machines (SVM) are rarely missing when the objective is classifying massive survey data. The authors provide a gentle tutorial on SVM for galaxy morphological classification. Their source code, GALSVM, is linked for interested readers.
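For readers who want to see the SVM idea in code: a linear SVM looks for a weight vector w that minimizes (lam/2)||w||^2 plus the average hinge loss max(0, 1 - y<w, x>), i.e., it separates the classes with a wide margin. A minimal sketch trained with Pegasos-style stochastic subgradient steps (visiting points in a fixed order rather than at random), with no kernel and no bias term, on hypothetical two-feature data. It is not GALSVM and not the authors’ setup.

```python
def pegasos_train(points, labels, lam=0.01, n_epochs=20):
    """Linear SVM (no bias term) via Pegasos-style stochastic subgradient steps.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)).
    Points are visited in a fixed cyclic order so the result is deterministic.
    """
    w = [0.0] * len(points[0])
    t = 0
    for _ in range(n_epochs):
        for x, y in zip(points, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # shrink w (subgradient of the regularizer) ...
            w = [(1 - eta * lam) * wi for wi in w]
            # ... and add the hinge-loss subgradient if the margin is violated
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Hypothetical two-feature data, centered so that no bias term is needed.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w = pegasos_train(points, labels)
print("weights:", w)
print("predictions:", [predict(w, p) for p in points])
```

Kernels, a bias term, and careful feature scaling would all be needed for real morphological parameters; the sketch only shows the margin-based objective at work.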

One of the biggest challenges in applying SVM or other classification methods in astronomy is the quantification of measures, i.e., how to define parameters and variables that are physically meaningful and machine interpretable at the same time. The authors of arxiv/astro-ph:0709.1359 followed the idea of Abraham et al. (1994), who introduced the concentration parameter. So far, however, my impression is that standardized indices (like economic indicators) are rarely found for classification purposes in astronomy. An Astronomical Machine Learning consortium would accelerate our understanding of the many populations in the Universe.
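As a rough illustration of a concentration-type parameter that is both physically meaningful and machine interpretable: the sketch below computes the fraction of the flux inside an outer circular aperture that falls within an inner aperture of radius alpha times the outer radius. The circular apertures, pixel-center counting, and alpha = 0.3 are assumptions made for simplicity; the published definition of Abraham et al. (1994) differs in its details.

```python
def concentration(image, center, outer_radius, alpha=0.3):
    """Fraction of the flux within outer_radius that lies within alpha * outer_radius.

    Simplified circular-aperture index; each pixel is assigned by its center.
    """
    cy, cx = center
    inner_r2 = (alpha * outer_radius) ** 2
    outer_r2 = outer_radius ** 2
    inner = outer = 0.0
    for y, row in enumerate(image):
        for x, flux in enumerate(row):
            r2 = (y - cy) ** 2 + (x - cx) ** 2
            if r2 <= outer_r2:
                outer += flux
                if r2 <= inner_r2:
                    inner += flux
    return inner / outer if outer > 0 else 0.0


# Toy 11x11 images centered at (5, 5): uniform vs centrally peaked.
flat = [[1.0] * 11 for _ in range(11)]
peaked = [[1.0 / (1 + (y - 5) ** 2 + (x - 5) ** 2) for x in range(11)] for y in range(11)]
print("flat:", round(concentration(flat, (5, 5), 5), 3))
print("peaked:", round(concentration(peaked, (5, 5), 5), 3))
```

A centrally peaked light profile scores higher than a uniform one, which is the property that makes such an index usable as a classification feature.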
