Simple user model¶
This example works through a fit to a small one-dimensional dataset which includes errors. This means that, unlike the Simple Interpolation example, an analysis of the parameter errors can be made. The fit begins with the basic Sherpa models, but this turns out to be sub-optimal, since the model parameters do not match the form required by the problem, so a user model is created which recasts the Sherpa model parameters into the desired form. This also has the advantage of simplifying the model, avoiding the manual intervention required with the Sherpa version.
Introduction¶
For this example, a data set was chosen from the non-Astronomy literature, to show that Sherpa can be used in a variety of fields. No attempt is made here to interpret the results, and the model parameters, and their errors, derived here should not be assumed to have any meaning when compared to the results of the paper.
The data used in this example is taken from Zhao Y, Tang X, Zhao X, Wang Y (2017) Effect of various nitrogen conditions on population growth, temporary cysts and cellular biochemical compositions of Karenia mikimotoi. PLoS ONE 12(2): e0171996. doi:10.1371/journal.pone.0171996. The Supporting information section of the paper includes a spreadsheet containing the data for the figures, and this was downloaded and stored as the file pone.0171996.s001.xlsx.
The aim is to fit a similar model to that described in Table 5, that is

\[ y(t) = \frac{p}{1 + e^{a + bt}} \]

where \(t\) and \(y\) are the abscissa (independent axis) and ordinate (dependent axis), respectively, and \(p\), \(a\), and \(b\) are the model parameters. The idea is to see if we can get a similar result rather than to make any inferences based on the data. For this example only the “NaNO3” dataset is going to be used.
Setting up¶
Both NumPy and Matplotlib are required:
>>> import numpy as np
>>> import matplotlib.pyplot as plt
Reading in the data¶
The openpyxl package (version 2.4.2) is used to read in the data from the Excel spreadsheet. This is not guaranteed to be the optimal means of reading in the data (and relies on hard-coded knowledge of the column numbers):
>>> from openpyxl import load_workbook
>>> wb = load_workbook('pone.0171996.s001.xlsx')
>>> fig4 = wb['Fig4data']
>>> t = []; y = []; dy = []
>>> for r in list(fig4.values)[2:]:
... t.append(r[0])
... y.append(r[3])
... dy.append(r[4])
...
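Since openpyxl returns the cell values as generic Python objects, it can be worth converting the columns to NumPy arrays and checking that they have matching shapes before building the Sherpa data object. This step is optional (a sketch; Sherpa also accepts the lists directly):
>>> t = np.asarray(t, dtype=float)
>>> y = np.asarray(y, dtype=float)
>>> dy = np.asarray(dy, dtype=float)
>>> t.shape == y.shape == dy.shape
True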
With these arrays, a data object can be created:
>>> from sherpa.data import Data1D
>>> d = Data1D('NaNO_3', t, y, dy)
Unlike the first worked example, this data set includes an error column, so the data plot created by DataPlot contains error bars (although they are not obvious for the first point, which has an error of 0):
>>> from sherpa.plot import DataPlot
>>> dplot = DataPlot()
>>> dplot.prepare(d)
>>> dplot.plot()
The data can also be inspected directly (as there aren’t many data points):
>>> print(d)
name = NaNO_3
x = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
y = [1, 4.125, 7.5, 10.0417, 12.25, 13.75, 14.0833, 14.7917, 15.5, 14.6667, 14.875]
staterror = [0, 0.9214, 1.1273, 1.9441, 2.3363, 0.9289, 1.6615, 1.1726, 1.8066, 2.149, 1.983]
syserror = None
Restricting the data¶
Trying to fit the whole data set will fail because the first data point has an error of 0, so it is necessary to restrict, or filter out, this data point. The simplest way is to select a data range to ignore using ignore(), in this case everything where \(x < 1\):
>>> d.get_filter()
'0.0000:20.0000'
>>> d.ignore(None, 1)
>>> d.get_filter()
'2.0000:20.0000'
The get_filter() routine returns a text description of the filters applied to the data; it starts with all the data being included (0 to 20) and then, after excluding all points less than 1, the filter is now 2 to 20. The format can be changed to something more appropriate for this data set:
>>> d.get_filter(format='%d')
'2:20'
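The filter can also be inspected programmatically via the mask attribute of the data object, which is a boolean array with one element per data point; summing it counts the points that remain (a sketch; the int() call just keeps the output independent of the NumPy version):
>>> int(d.mask.sum())
10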
Creating the model¶
Table 5 lists the model fit to this dataset as

\[ y(t) = \frac{p}{1 + e^{a + bt}} \]

which can be constructed from components using the Const1D and Exp models, as shown below:
>>> from sherpa.models.basic import Const1D, Exp
>>> plateau = Const1D('plateau')
>>> rise = Exp('rise')
>>> mdl = plateau / (1 + rise)
>>> print(mdl)
(plateau / (1 + rise))
Param Type Value Min Max Units
----- ---- ----- --- --- -----
plateau.c0 thawed 1 -3.40282e+38 3.40282e+38
rise.offset thawed 0 -3.40282e+38 3.40282e+38
rise.coeff thawed -1 -3.40282e+38 3.40282e+38
rise.ampl thawed 1 0 3.40282e+38
The amplitude of the exponential is fixed at its default value of 1 (by freezing the parameter, as shown below), but the other terms will remain free in the fit, with plateau.c0 representing the normalization, and the rise.offset and rise.coeff terms controlling the exponent. The offset and coeff terms do not match the form used in the paper, namely \(a + b t\), which has some interesting consequences for the fit, as will be discussed below in the user-model section.
>>> rise.ampl.freeze()
>>> print(mdl)
(plateau / (1 + rise))
Param Type Value Min Max Units
----- ---- ----- --- --- -----
plateau.c0 thawed 1 -3.40282e+38 3.40282e+38
rise.offset thawed 0 -3.40282e+38 3.40282e+38
rise.coeff thawed -1 -3.40282e+38 3.40282e+38
rise.ampl frozen 1 0 3.40282e+38
The functional form of the exponential model provided by Sherpa, assuming an amplitude of unity, is

\[ f(x) = e^{{\rm coeff} \times (x - {\rm offset})} \]

which means that I expect the final values to be \({\rm coeff} \simeq -0.5\) and, as \(- {\rm coeff} \times {\rm offset} \simeq 1.9\), then \({\rm offset} \simeq 4\). The plateau value should be close to 15.
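This expectation comes from matching the two exponents: setting \(a + bt\) equal to \({\rm coeff} \times (t - {\rm offset})\) gives \(b = {\rm coeff}\) and \(a = - {\rm coeff} \times {\rm offset}\). A quick check using the Table 5 values referenced at the end of this example (\(a = 1.941\), \(b = -0.453\)):
>>> a, b = 1.941, -0.453
>>> coeff = b
>>> offset = -a / b
>>> round(coeff, 3), round(offset, 2)
(-0.453, 4.28)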
The model and data can be shown together, but as the fit has not yet been made, showing them on the same plot is not very instructive, so here are two plots, one above the other, created by mixing the Sherpa and Matplotlib APIs:
>>> from sherpa.plot import ModelPlot
>>> mplot = ModelPlot()
>>> mplot.prepare(d, mdl)
>>> plt.subplot(2, 1, 1)
>>> mplot.plot(clearwindow=True)
>>> plt.subplot(2, 1, 2)
>>> dplot.plot(clearwindow=True)
>>> plt.title('')
The title of the data plot was removed since it overlapped the X axis of the model plot above it.
Fitting the data¶
The main difference from fitting the first example is that the Chi2 statistic is used, since the data contains error values.
>>> from sherpa.stats import Chi2
>>> from sherpa.fit import Fit
>>> f = Fit(d, mdl, stat=Chi2())
>>> print(f)
data = NaNO_3
model = (plateau / (1 + rise))
stat = Chi2
method = LevMar
estmethod = Covariance
>>> print("Starting statistic: {}".format(f.calc_stat()))
Starting statistic: 633.2233812020354
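The statistic can also be computed by hand, since for chi-square it is just the sum of the squared, error-normalized residuals over the filtered data (a sketch; the filter=True arguments ensure that the ignored first point is excluded):
>>> xf = d.get_indep(filter=True)[0]
>>> yf = d.get_dep(filter=True)
>>> ef = d.get_staterror(filter=True)
>>> chi2 = np.sum(((yf - mdl(xf)) / ef) ** 2)
>>> np.allclose(chi2, f.calc_stat())
True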
The use of a Chi-square statistic means that the fit also calculates the reduced statistic (the final statistic value divided by the degrees of freedom), which should be \(\sim 1\) for a “good” fit, and an estimate of the probability (Q value) that the fit is good (this is also based on the statistic and number of degrees of freedom).
>>> fitres = f.fit()
>>> print(fitres.format())
Method = levmar
Statistic = chi2
Initial fit statistic = 633.223
Final fit statistic = 101.362 at function evaluation 17
Data points = 10
Degrees of freedom = 7
Probability [Q-value] = 5.64518e-19
Reduced statistic = 14.4802
Change in statistic = 531.862
plateau.c0 10.8792
rise.offset 457.221
rise.coeff 24.3662
The reduced chi-square value is large (as shown in the screen output above, and via the explicit access below), the probability value is essentially 0, and the parameters are nowhere near the expected values.
>>> print("Reduced chi square = {:.2f}".format(fitres.rstat))
Reduced chi square = 14.48
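Other fields of the fit summary are available in the same way, such as the Q value and the number of degrees of freedom:
>>> print("Q = {:.2e} for {} degrees of freedom".format(fitres.qval, fitres.dof))
Q = 5.65e-19 for 7 degrees of freedom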
Visually comparing the model and data values highlights how poor this fit is:
>>> mplot.prepare(d, mdl)
>>> dplot.plot()
>>> mplot.overplot()
Either the model has got caught in a local minimum, or it is not a good description of the data. To investigate further, a useful technique is to switch the optimiser and re-fit; the hope is that the different optimiser will be able to escape the local minima in the search space. The default optimiser used by Fit is LevMar, which is often a good choice for data with errors. The other standard optimiser provided by Sherpa is NelderMead, which is often slower than LevMar, as it requires more model evaluations, but is less likely to get stuck:
>>> from sherpa.optmethods import NelderMead
>>> f.method = NelderMead()
>>> fitres2 = f.fit()
>>> print(mdl)
(plateau / (1 + rise))
Param Type Value Min Max Units
----- ---- ----- --- --- -----
plateau.c0 thawed 10.8792 -3.40282e+38 3.40282e+38
rise.offset thawed 457.221 -3.40282e+38 3.40282e+38
rise.coeff thawed 24.3662 -3.40282e+38 3.40282e+38
rise.ampl frozen 1 0 3.40282e+38
An alternative to replacing the method attribute, as done above, would be to create a new Fit object, setting the method argument of the initializer, and use that to fit the model and data.
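A minimal sketch of that alternative (fnm is just an illustrative name; the new object shares the same data, model, and statistic as above):
>>> fnm = Fit(d, mdl, stat=Chi2(), method=NelderMead())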
As can be seen, the parameter values have not changed; the dstatval attribute contains the change in the statistic value, and as shown below, it has not improved:
>>> fitres2.dstatval
0.0
The failure of this fit is actually down to the coupling of the offset and coeff parameters of the Exp model, as will be discussed below, but a good solution can be found by tweaking the starting parameter values.
Restarting the fit¶
The reset() method will change the parameter values back to the last values you set them to, which may not be the same as their default settings (in this case the difference is in the state of the rise.ampl parameter, which has remained frozen):
>>> mdl.reset()
>>> print(mdl)
(plateau / (1 + rise))
Param Type Value Min Max Units
----- ---- ----- --- --- -----
plateau.c0 thawed 1 -3.40282e+38 3.40282e+38
rise.offset thawed 0 -3.40282e+38 3.40282e+38
rise.coeff thawed -1 -3.40282e+38 3.40282e+38
rise.ampl frozen 1 0 3.40282e+38
Note
It is not always necessary to reset the parameter values when trying to get out of a local minimum, but it can be a useful strategy to avoid getting trapped in the same area.
One of the simplest changes to make here is to set the plateau term to the maximum data value, as the intention is for this term to represent the asymptote of the curve.
>>> plateau.c0 = np.max(d.y)
>>> mplot.prepare(d, mdl)
>>> dplot.plot()
>>> mplot.overplot()
A new fit object could be created, but it is also possible to re-use the existing object. This leaves the optimiser set to NelderMead, although in this case the same parameter values would be found if the method attribute had been changed back to LevMar:
>>> fitres3 = f.fit()
>>> print(fitres3.format())
Method = neldermead
Statistic = chi2
Initial fit statistic = 168.42
Final fit statistic = 0.299738 at function evaluation 42
Data points = 10
Degrees of freedom = 7
Probability [Q-value] = 0.9999
Reduced statistic = 0.0428198
Change in statistic = 168.12
plateau.c0 14.9694
rise.offset 4.17729
rise.coeff -0.420696
These results already look a lot better than the previous attempt: the reduced statistic is much smaller, and the values are similar to the reported values. As shown in the plot below, the model also describes the data well:
>>> mplot.prepare(d, mdl)
>>> dplot.plot()
>>> mplot.overplot()
The residuals can also be displayed, in this case normalized by the error values, using a DelchiPlot plot:
>>> from sherpa.plot import DelchiPlot
>>> residplot = DelchiPlot()
>>> residplot.prepare(d, mdl, f.stat)
>>> residplot.plot()
>>> residplot.hline(0, overplot=True, linecolor='black')
Unlike the data and model plots, the prepare() method of the residual plot requires a statistic object, so the value in the fit object (using the stat attribute) is used. Given that the reduced statistic for the fit is a lot smaller than 1 (\(\sim 0.04\)), the residuals are all close to 0: the ordinate axis shows \((d - m) / e\), where \(d\), \(m\), and \(e\) are the data, model, and error values respectively, and the hline() method was used to draw a line at \({\rm sigma} = 0\).
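The plotted quantity can be reproduced with the filtered arrays created in the chi-square check above (a sketch, reusing xf, yf, and ef):
>>> delchi = (yf - mdl(xf)) / ef
>>> bool(np.all(np.abs(delchi) < 1))
True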
What happens at \(t = 0\)?¶
The filtering applied earlier can be removed, to see how the model behaves at low times. Calling the notice() method without any arguments removes any previous filter:
>>> d.notice()
>>> d.get_filter(format='%d')
'0:20'
For this plot, the FitPlot class is going to be used to show both the data and model, rather than doing it manually as above:
>>> from sherpa.plot import FitPlot
>>> fitplot = FitPlot()
>>> dplot.prepare(d)
>>> mplot.prepare(d, mdl)
>>> fitplot.prepare(dplot, mplot)
>>> fitplot.plot()
Note
The prepare method on the components of the Fit plot (in this case dplot and mplot) must be called with their appropriate arguments to ensure that the latest changes, such as filters and parameter values, are picked up.
Warning
Trying to create a residual plot for this new data range will end up with a division-by-zero error from the prepare call, as the first data point has an error of 0 and the residual plot shows \((d - m) / e\).
For the rest of this example the first data point has to be removed:
>>> d.ignore(None, 1)
Estimating parameter errors¶
The calc_stat_info()
method returns
an overview of the current fit:
>>> statinfo = f.calc_stat_info()
>>> print(statinfo)
name =
ids = None
bkg_ids = None
statname = chi2
statval = 0.2997382864907501
numpoints = 10
dof = 7
qval = 0.999900257643
rstat = 0.04281975521296431
It is another way of getting at some of the information in the FitResults object; for instance:
>>> statinfo.rstat == fitres3.rstat
True
Note
The FitResults object refers to the model at the time the fit was made, whereas calc_stat_info is calculated based on the current values, and so the results can be different.
The est_errors() method is used to estimate error ranges for the parameter values. It does this by varying the parameters around the best-fit location until the statistic value has increased by a set amount. The default method for estimating errors is Covariance:
>>> f.estmethod.name
'covariance'
which has the benefit of being fast, but may not be as robust as other techniques.
>>> coverrs = f.est_errors()
>>> print(coverrs.format())
Confidence Method = covariance
Iterative Fit Method = None
Fitting Method = levmar
Statistic = chi2
covariance 1-sigma (68.2689%) bounds:
Param Best-Fit Lower Bound Upper Bound
----- -------- ----------- -----------
plateau.c0 14.9694 -0.880442 0.880442
rise.offset 4.17729 -0.646012 0.646012
rise.coeff -0.420696 -0.12247 0.12247
As shown below, the error values can be extracted from the output of est_errors(). The default is to calculate “one sigma” error bounds (i.e. those that cover 68.3% of the expected parameter range), but this can be changed by altering the sigma attribute of the error estimator.
>>> f.estmethod.sigma
1
Changing this value to 1.6 means that the errors are close to the 90% bounds:
>>> f.estmethod.sigma = 1.6
>>> coverrs90 = f.est_errors()
>>> print(coverrs90.format())
Confidence Method = covariance
Iterative Fit Method = None
Fitting Method = neldermead
Statistic = chi2
covariance 1.6-sigma (89.0401%) bounds:
Param Best-Fit Lower Bound Upper Bound
----- -------- ----------- -----------
plateau.c0 14.9694 -1.42193 1.42193
rise.offset 4.17729 -1.04216 1.04216
rise.coeff -0.420696 -0.19679 0.19679
The covariance method uses the covariance matrix to estimate the error surface, and so the parameter errors are symmetric. A more robust, but often significantly slower, approach is to use the Confidence method:
>>> from sherpa.estmethods import Confidence
>>> f.estmethod = Confidence()
>>> conferrs = f.est_errors()
plateau.c0 lower bound: -0.804259
rise.offset lower bound: -0.590258
rise.coeff lower bound: -0.148887
rise.offset upper bound: 0.714407
plateau.c0 upper bound: 0.989664
rise.coeff upper bound: 0.103391
The error estimation for the confidence technique is run in parallel - if the machine has multiple cores usable by the Python multiprocessing module - which can mean that the screen output above is not always in the same order. As shown below, the confidence-derived error bounds are similar to the covariance bounds, but are not symmetric.
>>> print(conferrs.format())
Confidence Method = confidence
Iterative Fit Method = None
Fitting Method = neldermead
Statistic = chi2
confidence 1-sigma (68.2689%) bounds:
Param Best-Fit Lower Bound Upper Bound
----- -------- ----------- -----------
plateau.c0 14.9694 -0.804259 0.989664
rise.offset 4.17729 -0.590258 0.714407
rise.coeff -0.420696 -0.148887 0.103391
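If reproducible ordering of the screen output is needed, the parallel evaluation can be disabled before calling est_errors(). This is a sketch, assuming the Confidence method exposes the parallel configuration option as an attribute (in the same way as sigma above):
>>> f.estmethod.parallel = False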
The default is to use all thawed parameters in the error analysis, but the est_errors() method has a parlist argument which can be used to restrict the parameters used, for example to just the offset term:
>>> offseterrs = f.est_errors(parlist=(mdl.pars[1], ))
rise.offset lower bound: -0.590258
rise.offset upper bound: 0.714407
>>> print(offseterrs)
datasets = None
methodname = confidence
iterfitname = none
fitname = neldermead
statname = chi2
sigma = 1
percent = 68.2689492137
parnames = ('rise.offset',)
parvals = (4.177287700807689,)
parmins = (-0.59025803525842369,)
parmaxes = (0.71440700826435144,)
nfits = 8
The covariance and confidence limits can be compared by accessing the fields of the ErrorEstResults object:
>>> fmt = "{:13s} covar=±{:4.2f} conf={:+5.2f} {:+5.2f}"
>>> for i in range(len(conferrs.parnames)):
... print(fmt.format(conferrs.parnames[i], coverrs.parmaxes[i], \
... conferrs.parmins[i], conferrs.parmaxes[i]))
...
plateau.c0 covar=±0.88 conf=-0.80 +0.99
rise.offset covar=±0.65 conf=-0.59 +0.71
rise.coeff covar=±0.12 conf=-0.15 +0.10
The est_errors() method returns a range, but often it is important to visualize the error surface, which can be done using the interval-projection (for one parameter) and region-projection (for two parameters) routines. The one-dimensional version is created with the IntervalProjection class, as shown in the following, which shows how the statistic varies with the plateau term (the vertical dashed line indicates the best-fit location for the parameter, and the horizontal line the statistic value at the best-fit location):
>>> from sherpa.plot import IntervalProjection
>>> intproj = IntervalProjection()
>>> intproj.calc(f, plateau.c0)
>>> intproj.plot()
Unlike the previous plots, this requires calling the calc() method before plot(). As the prepare() method was not called, it used the default options to calculate the plot range (i.e. the range over which plateau.c0 would be varied), which turns out in this case to be close to the one-sigma limits.
The range, and number of points, can also be set explicitly:
>>> intproj.prepare(min=12.5, max=20, nloop=51)
>>> intproj.calc(f, plateau.c0)
>>> intproj.plot()
>>> s0 = f.calc_stat()
>>> for ds in [1, 4, 9]:
... intproj.hline(s0 + ds, overplot=True, linestyle='dot', linecolor='gray')
...
The horizontal lines indicate the statistic value for one, two, and three sigma limits for a single parameter value (and assuming a Chi-square statistic). The plot shows how, as the parameter moves away from its best-fit location, the search space becomes less symmetric.
Following the same approach, the RegionProjection class calculates the statistic value as two parameters are varied, displaying the results as a contour plot. It requires two parameters, and the visualization is created with the contour() method:
>>> from sherpa.plot import RegionProjection
>>> regproj = RegionProjection()
>>> regproj.calc(f, rise.offset, rise.coeff)
>>> regproj.contour()
The contours show the one, two, and three sigma levels, with the cross indicating the best-fit value. As with the interval-projection plot, the prepare() method can be used to define the grid of points to use; the values below are chosen to try and cover the full three-sigma range, as well as improve the smoothness of the contours by increasing the number of points that are looped over:
>>> regproj.prepare(min=(2, -1.2), max=(8, -0.1), nloop=(21, 21))
>>> regproj.calc(f, rise.offset, rise.coeff)
>>> regproj.contour()
Writing your own model¶
The approach above has provided fit results, but they do not match the parameterisation used in the paper, and it is hard to transform the values derived above, in particular their errors, into that form. An alternative approach is to create a model with the parameters in the required form, which requires a small amount of code (using the Exp class to do the actual model evaluation).
The following class (MyExp) creates a model that has two parameters (a and b) and represents \(f(x) = e^{a + b x}\). The starting values for these parameters are chosen to match the default values of the Exp parameters, where \({\rm coeff} = -1\) and \({\rm offset} = 0\):
from sherpa.models.basic import Exp
from sherpa.models.model import ArithmeticModel
from sherpa.models.parameter import Parameter


class MyExp(ArithmeticModel):
    """A simpler form of the Exp model.

    The model is f(x) = exp(a + b * x).
    """

    def __init__(self, name='myexp'):
        self.a = Parameter(name, 'a', 0)
        self.b = Parameter(name, 'b', -1)

        # The _exp instance is used to perform the model calculation,
        # as shown in the calc method.
        self._exp = Exp('hidden')

        ArithmeticModel.__init__(self, name, (self.a, self.b))

    def calc(self, pars, *args, **kwargs):
        """Calculate the model"""

        # Tell the exp model to evaluate the model, after converting
        # the parameter values to the required form, and order, of:
        # offset, coeff, ampl. Since exp(a + b * x) can be written as
        # exp(b * (x - (-a / b))), we have coeff = b and offset = -a / b.
        coeff = pars[1]
        offset = -1 * pars[0] / coeff
        ampl = 1.0

        return self._exp.calc([offset, coeff, ampl], *args, **kwargs)
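A quick numerical check, before using the class in a fit, that it evaluates \(e^{a + bx}\) as intended (a sketch using the default parameter values):
>>> m = MyExp('check')
>>> xgrid = np.linspace(2, 20, 10)
>>> np.allclose(m(xgrid), np.exp(m.a.val + m.b.val * xgrid))
True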
This can be used as any other Sherpa model:
>>> plateau2 = Const1D('plateau2')
>>> rise2 = MyExp('rise2')
>>> mdl2 = plateau2 / (1 + rise2)
>>> print(mdl2)
(plateau2 / (1 + rise2))
Param Type Value Min Max Units
----- ---- ----- --- --- -----
plateau2.c0 thawed 1 -3.40282e+38 3.40282e+38
rise2.a thawed 0 -3.40282e+38 3.40282e+38
rise2.b thawed -1 -3.40282e+38 3.40282e+38
>>> fit2 = Fit(d, mdl2, stat=Chi2())
>>> res2 = fit2.fit()
>>> print(res2.format())
Method = levmar
Statistic = chi2
Initial fit statistic = 633.223
Final fit statistic = 0.299738 at function evaluation 52
Data points = 10
Degrees of freedom = 7
Probability [Q-value] = 0.9999
Reduced statistic = 0.0428198
Change in statistic = 632.924
plateau2.c0 14.9694
rise2.a 1.75734
rise2.b -0.420685
>>> dplot.prepare(d)
>>> mplot2 = ModelPlot()
>>> mplot2.prepare(d, mdl2)
>>> dplot.plot()
>>> mplot2.overplot()
Unlike the initial attempt, this version did not require any manual intervention to find the best-fit solution. This is because the degeneracy between the two terms of the exponential in the Exp model has been broken in this version, and so the optimiser works better. It also has the advantage that the parameters match the problem, and so the parameter limits determined below can be used directly, without having to transform them.
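As a consistency check, the best-fit values found earlier with the Exp version (which the rise component still holds) transform into this parameterisation via \(b = {\rm coeff}\) and \(a = - {\rm coeff} \times {\rm offset}\):
>>> print("a = {:.3f}  b = {:.3f}".format(-rise.coeff.val * rise.offset.val, rise.coeff.val))
a = 1.757  b = -0.421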
>>> fit2.estmethod = Confidence()
>>> conferrs2 = fit2.est_errors()
plateau2.c0 lower bound: -0.804444
rise2.b lower bound: -0.148899
rise2.a lower bound: -0.38086
rise2.b upper bound: 0.10338
plateau2.c0 upper bound: 0.989623
rise2.a upper bound: 0.489919
>>> print(conferrs2.format())
Confidence Method = confidence
Iterative Fit Method = None
Fitting Method = levmar
Statistic = chi2
confidence 1-sigma (68.2689%) bounds:
Param Best-Fit Lower Bound Upper Bound
----- -------- ----------- -----------
plateau2.c0 14.9694 -0.804444 0.989623
rise2.a 1.75734 -0.38086 0.489919
rise2.b -0.420685 -0.148899 0.10338
The difference in the model parameterisation can also be seen in the various error-analysis plots, such as the region-projection contour plot (where the limits have been chosen to cover the three-sigma contour), and a marker has been added to show the result listed in Table 5 of Zhao et al:
>>> regproj2 = RegionProjection()
>>> regproj2.prepare(min=(0.5, -1.2), max=(5, -0.1), nloop=(21, 21))
>>> regproj2.calc(fit2, rise2.a, rise2.b)
>>> regproj2.contour()
>>> plt.plot(1.941, -0.453, 'ko', label='NaNO$_3$ Table 5')
>>> plt.legend(loc=1)