Thank you, Tom, for your explanation of what “bayes” in Sherpa does and why the LM algorithm does not work with “bayes.” Regarding the latter, it all comes down to the objective function and how it is defined; depending on the shape of the objective function, the optimization strategy must change. As a statistician, I would rather work on a robust one and make it work for spectral fitting. Instead of simply saying LM does not work, I’d like to give the reason why it does not work. However, not knowing what’s inside (“bayes” neither explained it nor pointed to references), I was curious. I hope the documentation you are preparing is finished soon, and that “bayes” will then carry a better explanation of its function and shed more light on Bayesian statistics.

By: TomLoredo http://hea-www.harvard.edu/AstroStat/slog/2008/it-bothers-me/comment-page-1/#comment-823 TomLoredo Sat, 06 Dec 2008 19:34:09 +0000 http://hea-www.harvard.edu/AstroStat/slog/?p=1232#comment-823

Hyunsook asks:

I would like to know why it’s not working with Levenberg-Marquardt (LM)

The LM algorithm uses the form of the chi**2 function to develop an approximation to derivatives of the fitting function, used to guide steps to improve the fit. Since “bayes” changes the fit function to something that is not in the chi**2 form (sum of weighted squared differences between data and model), standard LM can’t work with it (nor can it work with other non-Gaussian likelihoods). Put another way, LM is not a generic optimization algorithm; it is specifically tailored to chi**2 minimization. Powell is a generic algorithm, so it can work with the “bayes” marginal likelihood.
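To make this concrete, here is a minimal, purely illustrative sketch of the LM update (not Sherpa’s actual implementation): the step is built entirely out of a residual vector r and its Jacobian J, which exist only when the objective has the chi**2 form sum(r_i**2). For a marginal likelihood like “bayes” there is no such residual vector, so the construction has nothing to work with.

```python
import numpy as np

def lm_step(residual, jacobian, params, lam=1e-3):
    """One damped Gauss-Newton (Levenberg-Marquardt) step for chi2 = sum(r_i**2).

    Solves (J^T J + lam*I) delta = -J^T r.  J^T J approximates the Hessian
    of chi2 precisely because chi2 is a sum of squared residuals; a generic
    objective function offers no r and J to build this from.
    """
    r = residual(params)
    J = jacobian(params)
    A = J.T @ J + lam * np.eye(len(params))
    delta = np.linalg.solve(A, -J.T @ r)
    return params + delta

# Toy example (made-up data): straight-line fit with unit measurement errors.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x

def residual(p):                 # r_i = model_i - data_i (weights = 1)
    return (p[0] + p[1] * x) - y

def jacobian(p):                 # J_ij = d r_i / d p_j
    return np.column_stack([np.ones_like(x), x])

p = np.array([0.0, 0.0])
for _ in range(20):
    p = lm_step(residual, jacobian, p)
# p converges to the chi2 minimum, (2, 3)
```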

Also, in case it wasn’t clear from my earlier comment, the “bayes” marginal likelihood does not subtract the background; it marginalizes over it (analytically). If you follow the same procedure for Gaussian noise, it just so happens that the result can be expressed in terms of subtracting a background estimate, but that is just a convenient “accident” that comes from the form of the Gaussian. From a Bayesian point of view, the right thing to do is always to marginalize an uncertain background, not to subtract off a background estimate.
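The Gaussian “accident” is easy to check numerically. In this illustrative sketch (made-up numbers, not from any real analysis), a single datum d = s + b + noise has Gaussian noise of width sigma, and the background b has an independent Gaussian estimate bhat with error sigma_b. Marginalizing b out of the likelihood gives a Gaussian in (d - bhat - s) with variance sigma**2 + sigma_b**2, i.e. exactly “subtract the background estimate and add the errors in quadrature”:

```python
import numpy as np

# Hypothetical single-datum setup.
d, s, sigma = 10.0, 4.0, 1.0     # datum, trial signal, noise width
bhat, sigma_b = 5.5, 0.5         # background estimate and its error

# Marginal likelihood: integrate the product of the two Gaussians over b.
b = np.linspace(bhat - 8.0, bhat + 8.0, 20001)
integrand = (np.exp(-0.5 * ((d - s - b) / sigma) ** 2)
             * np.exp(-0.5 * ((b - bhat) / sigma_b) ** 2))
marginal = np.trapz(integrand, b)

# Analytic result of the same integral: a Gaussian in (d - bhat - s)
# with the variances added, sigma**2 + sigma_b**2.
norm = np.sqrt(2.0 * np.pi) * sigma * sigma_b / np.sqrt(sigma**2 + sigma_b**2)
analytic = norm * np.exp(-0.5 * (d - bhat - s) ** 2 / (sigma**2 + sigma_b**2))
# marginal and analytic agree to numerical precision
```

For Poisson counts no such closed Gaussian form exists, which is why marginalization and subtraction stop being interchangeable there.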

The only reference for the “bayes” algorithm is my paper in the first SCMA volume (ADS link: http://adsabs.harvard.edu/abs/1992scma.conf..275L), though I’m working on a more complete description of it (and some related algorithms). It’s also described in most of my CASt summer school lectures. I believe Harrison Prosper independently derived a similar algorithm (for particle physics applications) around the same time. Finally, I believe the quadrature version of the CHASC Bayesian hardness ratio work uses a similar algorithm as a first step; I think it’s described in the Appendix to that paper.
