PPS: Tom, I forgot we did indeed use the gamma quite recently. Given the posterior draws of a Poisson intensity obtained with BEHR, we fit (yea, fit) a gamma function to the sample to determine the parameters of the prior going forward to the next observation. You could rightly say that it is being used as a distribution, but for a tiny step in there after the sampling and before being set as the prior, it was indeed used as a function.

]]>Afraid not, Tom. Because even when such an “exponentially truncated power-law” might have been the appropriate function, people have used Gaussians instead.

So yes, I do mean fitting, with an honest-to-goodness function, the same way one might fit a spectral line with a Gaussian. Essentially to any quantity that does not extend to the -ve line. Such as the radial profile of a piled-up PSF to come up with a contrived example. I can give you a case where _I_ ought to have resorted to fitting with the gamma but didn’t — e.g., see Fig 8 of Chung et al. (2004, ApJ, 606, 1184). I will keep an eye peeled out for other cases to highlight.

Hyunsook, I don’t understand the distinction you make between fitting and modeling. It could be you are referring to some statistics-only definition of what a “model” means? In any case, I don’t see how error-estimates are dependent on the fitting _process_.

]]>*But I wanted to highlight the fact that it really is also a function, and can be used as such. Quite often we come across quantities that are defined on the +ve line, but people continue to use Gaussians to fit to them when in fact a gamma function would be more desirable.*

(Note it’s defined on the non-negative real line, not just +ve; it’s important that it includes zero.)

Can you give an example where it appears in astronomy as a function and not a distribution? Your example has it as a distribution. The only thing close to this that I can think is fitting unnormalized number counts, where one could use a function proportional to a gamma distribution as the intensity for an inhomogeneous point process (a Schechter function is in this vein). But in this case it’s still a distribution, albeit a measure that may not be normalized to unit area (a Poisson process intensity is a measure).

In any case, “gamma function” is already the name for \Gamma(x) (in TeX notation), the function that equals n! when x=n+1 for a nonnegative integer n. Both lower and upper case “gamma” (the word, not the symbol) are used for this in the literature (e.g., Numerical Recipes and the various Wolfram products, including Mathematica, use the lower case), so maybe you need to come up with a different name if you want to distinguish the functional form from the distribution. And in that case, you might as well drop the normalization constants. I have seen authors use “Gaussian” for any function proportional to exp(-x**2) (for appropriate x), and reserve “normal distribution” for the normalized form (including the sqrt(2pi), etc.). Perhaps we need to come up with a similar separate name for the functional form underlying the gamma distribution. Maybe just “truncated power law” or “exponentially truncated power law.”

Regarding the prior, I did *not* write that “priors putting [all] probability at infinity [are] equivalent to 0 counts in 0 time;” that’s nearly the opposite of what I wrote. Referring to your original statement that the (1,0) prior means “we expect to detect 0 counts in 0 time,” I stated that any prior that does *not* put all its mass at infinity says we expect to detect 0 counts in 0 time, since for any finite rate, the Poisson distribution for an interval of zero size puts all the probability at n=0 (“we expect to detect 0 counts in 0 time). So your statement cannot be a sound justification for the (1,0) prior. Sorry for the confusion; perhaps it’s clearer now.

Also, note that the gamma distribution is only defined for beta>0, so (1,0) is not a legitimate gamma distribution. This is where it’s useful to distinguish the distribution from the functional form (and perhaps was part of your motivation in doing so). The (1,0) limit of the distribution’s functional form (including the normalization constants in your definition) is 0, which isn’t a very good prior for anything. The (1,0) limit of the functional form *without* the normalization factors is a constant, and thus cannot be used as a probability distribution over the non-negative real line (in Bayesian language, it would be an improper prior). In other words, a flat prior is not a gamma distribution, though it is a special case of an “exponentially truncated power law.”

You are right of course that we use it mostly as a distribution, and indeed, the second line in the equation cannot apply to anything else. But I wanted to highlight the fact that it really is also a function, and can be used as such. Quite often we come across quantities that are defined on the +ve line, but people continue to use Gaussians to fit to them when in fact a gamma function would be more desirable.

I am confused though about priors putting probability at infinity always being equivalent to 0 counts in 0 time. The nice thing about alpha=1,beta=0 is that it is exactly like a Poisson likelihood for such an “observation”, and contrarily, there are other improper priors which go off to infinity but don’t have that nice correspondence with the Poisson likelihood (alpha=0.5, for instance, which apparently is the Fisher least information prior, or to take a completely different function, the Lorentzian). What is the difference between these cases then?

]]>Besides being the conjugate prior for the rate (or mean number of counts) of a constant Poisson counting process, the gamma distribution with argument (1/x) rather than x, dubbed the *inverse gamma distribution* for x, is the conjugate prior for the variance of a normal (Gaussian) distribution.

My further quibble is with the remark that the *gamma(alpha=1,beta=0)* prior “is like saying ‘we expect to detect 0 counts in 0 time’.” What prior (gamma or otherwise) that does not put all its probability at infinity does *not* say “we expect to detect 0 counts in 0 time?” This is a basic property of the Poisson distribution: the expected number of counts is proportional to the interval size (for any finite rate), so no matter what you say about the rate (a priori or a posteriori), you expect 0 events in an interval of 0 measure.

Here’s one way to look at what the *gamma(alpha=1,beta=0)* prior says: the probability I assign to seeing n counts in any particular interval of nonzero size is independent of n; that is, the prior expresses no preference for the number of counts expected a priori. I suppose this must be in the literature somewhere. I noted this justification for the flat (in x) prior back in my 1992 paper for the 1st SCMA conference. I’m not sure it made the printed version (which was highly abridged from the submitted one), but the long version is online: The Promise of Bayesian Inference for Astrophysics. The argument is on pp. 42-43, and makes use of the fact that an integral over a gamma distribution can be interpreted as a Laplace transform.

I stole this argument from Thomas Bayes. It is analogous to his justification for a flat prior for the single-trial probability parameter in a binomial distribution. He argued for a flat prior, not by arguing that uniformity of the prior density was in some sense a natural expression of “ignorance” (as is often attributed to him) but rather that uniformity of the *predictive distribution* for the number of successes was a useful description of ignorance. (See Stephen Stigler’s paper on Bayes’s famous paper for discussion of the misunderstanding of Bayes’s reasoning.)

A virtue of these predictive criteria for non-informative priors is that they are immune to the usual complaint that flatness of a prior for a parameter is parameterization-dependent. Of course, one could argue about “reparameterizing” the sample space. In these problems, with discrete sample spaces, such arguments seem more than a bit of a stretch. Unfortunately, I think such predictive arguments are only feasible for fairly simple problems. But some of those simple problems are important—like this one!

]]>I’m not sure if the above example is proper or not. Normal text books stop providing examples at Poisson or the exponential. Until graduate textbooks, no one would care the gamma density function is one of density functions, let alone it’s a conjugate prior of Poisson. Unless one is dealing with a Poisson likelihood and considering computing a posterior (astronomy posteriors require heavy computations in general), the gamma density function does not receive its spot light often. When I saw the title γ function, the first impression was Γ(x)=(x-1)!

]]>