NAME

ks - perform Kolmogorov-Smirnov test on datasets


SYNOPSIS

ks [param=value ...]


PARAMETERS

ks uses an IRAF-compatible parameter interface. A template parameter file is in /proj/axaf/simul/lib/uparm/ks.par.

db1
The name of an rdb file containing sampled data. This must be a file (i.e., not a pipe).

db1_x
The column name to read from db1.

db2
The name of an rdb file containing the reference (or second) data.

db2_type
This specifies the type of data in the second database. It can be one of
sample: another set of sampled data
pdf: a probability distribution function
cdf: a cumulative distribution function

db2_x
The column name of the data if db2_type is sample.

db2_min
The column containing the minimum x values of the distribution function bins.

db2_max
The column containing the maximum x values of the distribution function bins.

db2_wt
The column containing the weights of the distribution function bins.

help boolean
Print this help information.

version boolean
Print version to the Unix standard output stream and exit.


DESCRIPTION

ks performs the Kolmogorov-Smirnov test to determine whether two sets of sampled data are drawn from the same parent distribution, or whether one set of sampled data is drawn from a specified probability or cumulative distribution function.

db1 is an rdb table containing unbinned sampled data. db2 is either another table of unbinned sampled data or a binned CDF or PDF. For sampled data, the name of the column containing the samples is required. For a CDF or PDF, the names of the columns containing the minimum and maximum limits of the bins, as well as the name of the column containing the value of the distribution function (here called the weight), are required. The distribution functions need not be normalized.

When working with a CDF or PDF, ks interpolates within a bin by assuming a uniform probability within the bin.
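
A uniform probability within a bin means the cumulative distribution rises linearly across that bin. The following Python sketch illustrates the idea for a binned PDF; it is not the code used by ks. The function name make_cdf is hypothetical, and the sketch assumes that the bins are sorted and non-overlapping and that each weight is the total (unnormalized) probability in its bin.

```python
from bisect import bisect_right
from itertools import accumulate


def make_cdf(bin_min, bin_max, weight):
    """Build a CDF evaluator from a binned, possibly unnormalized, PDF.

    Assumptions (not taken from ks itself): bins are sorted and do not
    overlap, and each weight is the total probability in its bin.
    """
    total = float(sum(weight))
    # Normalized cumulative probability at the upper edge of each bin.
    upper = [c / total for c in accumulate(weight)]

    def cdf(x):
        if x <= bin_min[0]:
            return 0.0
        if x >= bin_max[-1]:
            return 1.0
        i = bisect_right(bin_min, x) - 1  # last bin whose lower edge is <= x
        below = upper[i - 1] if i > 0 else 0.0
        if x >= bin_max[i]:
            # x lies in a gap between bins: the CDF stays flat there.
            return upper[i]
        # Uniform probability within the bin gives linear interpolation.
        frac = (x - bin_min[i]) / (bin_max[i] - bin_min[i])
        return below + frac * (upper[i] - below)

    return cdf


# Three unit-width bins with unnormalized weights 2, 6, 2.
cdf = make_cdf([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 6.0, 2.0])
print(cdf(0.5), cdf(1.5), cdf(3.0))
```

Dividing by the total weight is what lets the input go unnormalized.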

The output consists of two numbers: the maximum distance between the cumulative distribution functions determined for the two data sets, and the probability, under the null hypothesis that both are the same, of finding a distance at least this large. Small values of the probability indicate a significant difference between the data sets.
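
The two output numbers can be illustrated for the case of two sampled data sets with the Python sketch below. It uses the standard asymptotic series for the Kolmogorov-Smirnov probability together with the usual effective-sample-size correction; that ks computes its probability in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def ks_distance(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        x = min(a[i], b[j])
        # Step past every value equal to x so that ties are handled together.
        while i < n1 and a[i] == x:
            i += 1
        while j < n2 and b[j] == x:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    return d


def ks_probability(d, n_eff):
    """Asymptotic probability of a distance >= d under the null hypothesis."""
    root = math.sqrt(n_eff)
    lam = (root + 0.12 + 0.11 / root) * d
    total, sign, prev = 0.0, 2.0, 0.0
    for k in range(1, 101):
        # k-th term of 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)
        term = sign * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * total:
            return total
        sign = -sign
        prev = abs(term)
    return 1.0  # series did not converge: the distance is negligible


def ks_two_sample(a, b):
    """Return (distance, probability) for two unbinned samples."""
    d = ks_distance(a, b)
    n_eff = len(a) * len(b) / (len(a) + len(b))
    return d, ks_probability(d, n_eff)


d, p = ks_two_sample([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(d, p)
```

In this example the two samples do not overlap at all, so the distance is 1 and the probability is small.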


NOTES

This software is based upon routines from Numerical Recipes, Second Edition, pp. 623-626.


AUTHOR

Diab Jerius (djerius@cfa.harvard.edu)