[ArXiv] Identifiability and mixtures of distributions, Aug. 3, 2007

From arxiv/math.st: 0708.0499v1
Inference for mixtures of symmetric distributions by Hunter, Wang, and Hettmansperger, Annals of Statistics, 2007, Vol.35(1), pp.224-251.

Consider fitting a spectral line on top of a continuum, with the line modeled as either a delta function or a Gaussian (normal) density function. Among the many regularity conditions, the one I personally find most bothersome is identifiability. As the scale parameter (σ) goes to zero, we cannot tell which model, delta or Gaussian, is the right one. Furthermore, the likelihood ratio test cannot be applied to the delta function because of its discontinuity. In the classical confidence intervals and hypothesis tests that astronomers know from Numerical Recipes, identifiability and the set properties (topology) of the model parameter space receive little attention from astronomers who perform statistical inference on model parameters. I found a few astronomical papers that ignored identifiability yet used likelihood ratio tests to claim the discovery of an extra component. Clearly, this is statistical malpractice.

Although math.st:0708.0499 does not discuss spectral line fitting, it offers a nice review of identifiability in inference for mixtures of symmetric distributions.
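
To make the identifiability issue concrete, here is a minimal Python sketch (not taken from the paper) for a two-component normal mixture with a common scale. The function names, the grid, and the parameter values are illustrative assumptions. It checks two standard ways in which different parameter vectors give the same density: swapping the component labels, and giving the extra component zero weight, in which case its location can be anything. The second case is what happens under the null hypothesis when testing for an extra component.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weight, mu1, mu2, sigma=1.0):
    """Density of weight * N(mu1, sigma^2) + (1 - weight) * N(mu2, sigma^2)."""
    return (weight * normal_pdf(x, mu1, sigma)
            + (1.0 - weight) * normal_pdf(x, mu2, sigma))


# Hypothetical evaluation grid: x from -5 to 5 in steps of 0.1.
GRID = [i / 10.0 for i in range(-50, 51)]


def max_abs_diff(params_a, params_b):
    """Largest absolute difference between two mixture densities on GRID."""
    return max(abs(mixture_pdf(x, *params_a) - mixture_pdf(x, *params_b))
               for x in GRID)


# Label switching: (w, mu1, mu2) and (1 - w, mu2, mu1) give the same density.
label_switch_gap = max_abs_diff((0.3, -1.0, 2.0), (0.7, 2.0, -1.0))

# Vanishing component: with weight 1 on the first component,
# the location of the second component does not affect the density.
null_gap = max_abs_diff((1.0, 0.0, -3.0), (1.0, 0.0, 4.0))

# For contrast: a genuinely different location changes the density.
distinct_gap = max_abs_diff((0.3, -1.0, 2.0), (0.3, -1.0, 2.5))

print("label switching gap:", label_switch_gap)
print("vanishing component gap:", null_gap)
print("distinct parameters gap:", distinct_gap)
```

The first two gaps come out as zero up to rounding even though the parameter vectors differ, which is exactly the failure of "equal densities imply equal parameters"; the third gap is clearly nonzero.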

4 Comments
  1. vlk:

    Couple of questions –
    1, what exactly is “identifiability”?
    2, they use something called the Hodges-Lehmann estimator as a proxy for the median. What advantage does this H-L estimator have over the median?

    09-07-2007, 1:54 pm
  2. hlee:

    1. I haven’t fully grasped the idea of identifiability beyond its mathematical definition, and I’m unable to illustrate its deeper meaning with actual astronomical or empirical examples. An ApJ paper by Protassov et al. (2002, ApJ, 571, p.545) describes identifiability as one of the regularity conditions in the appendix.

    2. Without going into theoretical details, if I were asked to make a short comment on the H-L estimator: its asymptotic variance can be up to 3 times smaller than that of the median for symmetric unimodal distributions. In other words, the asymptotic relative efficiency of H-L to the median can reach 3; the exact factor depends on the underlying distribution (it is 3 for the uniform and 1.5 for the normal). I think this is related to U-statistics, a theory I haven’t fully comprehended.

    09-07-2007, 3:32 pm
  3. vlk:

    Re Identifiability: Protassov et al don’t explicitly use that word. Which of the regularity conditions listed by them is about identifiability?

    If not identifiability, perhaps you can more easily say what is non-identifiability?

    09-08-2007, 1:57 pm
  4. hlee:

    Beyond the likelihood ratio test (LRT) itself, Condition 2 in the paper implies identifiability, although the appendix doesn’t state it explicitly (I forgot this because identifiability is generally listed among the regularity conditions). For a simple hypothesis test, the null hypothesis is stated as H0: θ=θ* (I chose the same θ* as in the paper), and the statement about this θ* carries the idea that the parameters are identifiable. A textbook version of identifiability is that f(θ_1)=f(θ_2) implies θ_1 = θ_2, where the θ_i are parameters and f(.) is the density function. Fitting a mixture of normal distributions suffers from identifiability issues.

    09-09-2007, 11:51 pm
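
On the Hodges-Lehmann question raised in the comments, a small simulation may help. The sketch below uses the standard one-sample definition (the median of the Walsh averages (x_i + x_j)/2 for i <= j) and compares the sampling variance of the sample median with that of the H-L estimate. It is not code from the paper; the sample size, the number of replications, the seed, and the two test distributions are arbitrary assumptions. Asymptotically the variance ratio is 1.5 for normal data and 3 for uniform data, and the finite-sample ratios printed here will differ somewhat from those limits.

```python
import random
import statistics


def hodges_lehmann(sample):
    """One-sample Hodges-Lehmann estimate: median of the Walsh averages."""
    n = len(sample)
    walsh = [(sample[i] + sample[j]) / 2.0
             for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


def variance_ratio(draw, n=30, reps=1000, seed=0):
    """Monte Carlo estimate of Var(sample median) / Var(H-L estimate).

    `draw` is a function taking a random.Random instance and returning
    one observation; n, reps, and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    medians, hl_estimates = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        medians.append(statistics.median(sample))
        hl_estimates.append(hodges_lehmann(sample))
    return statistics.variance(medians) / statistics.variance(hl_estimates)


# Symmetric test distributions centered at zero (assumed for illustration).
ratio_normal = variance_ratio(lambda rng: rng.gauss(0.0, 1.0))
ratio_uniform = variance_ratio(lambda rng: rng.uniform(-1.0, 1.0))

print("variance ratio (median / H-L), normal data: ", ratio_normal)
print("variance ratio (median / H-L), uniform data:", ratio_uniform)
```

A ratio above 1 means the H-L estimate is the less variable of the two for that distribution. For heavy-tailed data such as the Laplace distribution the ordering reverses, so the efficiency gain should not be read as universal.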