Kolmogorov-Smirnov Distribution

#include <boost/math/distributions/kolmogorov_smirnov.hpp>
namespace boost{ namespace math{

template <class RealType = double,
          class Policy   = policies::policy<> >
class kolmogorov_smirnov_distribution;

typedef kolmogorov_smirnov_distribution<> kolmogorov_smirnov;

template <class RealType, class Policy>
class kolmogorov_smirnov_distribution
{
public:
   typedef RealType  value_type;
   typedef Policy    policy_type;

   // Constructor:
   kolmogorov_smirnov_distribution(RealType n);

   // Accessor to parameter:
   RealType number_of_observations()const;
};

}} // namespaces

The Kolmogorov-Smirnov test in statistics compares two empirical distributions, or an empirical distribution against any theoretical distribution.[1] It makes use of a specific distribution, informally known in the literature as the Kolmogorov-Smirnov distribution, which is implemented here.

Formally, if n observations are taken from a theoretical distribution G(x), and if G_n(x) represents the empirical CDF of those n observations, then the test statistic

$$D_n = \sup_x \left| G(x) - G_n(x) \right|$$

will be distributed according to a Kolmogorov-Smirnov distribution parameterized by n.
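As an illustration of the test statistic, the following is a minimal sketch that computes the supremum distance between a sample's empirical CDF and a continuous theoretical CDF. The name `ks_statistic` is a hypothetical helper introduced here and is not part of Boost. It relies on the fact that the empirical CDF is a step function, so the supremum can only be attained at, or just before, one of the sorted observations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Boost): one-sample Kolmogorov-Smirnov
// statistic, the largest distance between the theoretical CDF G and the
// empirical CDF of the sample. G is assumed to be continuous.
template <class CDF>
double ks_statistic(std::vector<double> sample, CDF G)
{
    assert(!sample.empty());
    std::sort(sample.begin(), sample.end());
    const double n = static_cast<double>(sample.size());
    double d = 0.0;
    for (std::size_t i = 0; i < sample.size(); ++i)
    {
        const double g = G(sample[i]);
        // At the i-th sorted observation (0-based) the empirical CDF
        // jumps from i/n to (i+1)/n; check the distance on both sides.
        const double above = (i + 1) / n - g;
        const double below = g - i / n;
        d = std::max(d, std::max(above, below));
    }
    return d;
}
```

For example, for the sample {0.4, 0.1, 0.7} tested against the uniform distribution on [0, 1], where G(x) = x, the largest gap occurs at the last observation, where the empirical CDF reaches 1 while G is 0.7.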

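The exact form of a Kolmogorov-Smirnov distribution is the subject of a large, decades-old literature.[2] In the interest of simplicity, Boost implements the first-order, limiting form of this distribution (the same form originally identified by Kolmogorov[3]), namely

$$\lim_{n \to \infty} F_n\!\left(x / \sqrt{n}\right) = 1 + 2 \sum_{k=1}^{\infty} (-1)^k e^{-2 k^2 x^2}$$

where F_n denotes the CDF of the test statistic for n observations.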

Note that while the exact distribution only has support over [0, 1], this limiting form has positive mass above unity, particularly for small n. The following graph illustrates how the distribution changes for different values of n:

[Graph: the Kolmogorov-Smirnov distribution for several values of n]
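To make the limiting form concrete, the sketch below sums the alternating series 1 + 2 Σ (-1)^k exp(-2 k² n x²) term by term. This is an illustration only: `ks_limiting_cdf` is a hypothetical name, and the direct summation is not how Boost evaluates the CDF (Boost uses a Jacobi theta function). Direct summation converges quickly only when n*x*x is not too small; for very small arguments it needs many terms and suffers from cancellation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not Boost's implementation): limiting-form CDF
// F_n(x) = 1 + 2 * sum_{k>=1} (-1)^k * exp(-2 k^2 n x^2).
double ks_limiting_cdf(double n, double x)
{
    assert(n > 0.0);
    if (x <= 0.0)
        return 0.0;
    const double t = 2.0 * n * x * x;
    double sum = 1.0;
    double sign = -1.0;
    for (int k = 1; k <= 100000; ++k)
    {
        const double term = std::exp(-t * k * k);
        sum += 2.0 * sign * term;
        sign = -sign;
        if (term < 1e-17)
            break; // remaining terms are negligible in double precision
    }
    // Rounding in the alternating sum can leave the result marginally
    // outside [0, 1]; clamp it.
    return std::min(1.0, std::max(0.0, sum));
}
```

Under this sketch, `ks_limiting_cdf(1.0, 1.0)` is roughly 0.73, so for n = 1 about 27% of the probability mass of the limiting form lies above unity. The value depends on n and x only through n*x*x, so `ks_limiting_cdf(4.0, 0.5)` gives the same result.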

Member Functions
kolmogorov_smirnov_distribution(RealType n);

Constructs a Kolmogorov-Smirnov distribution with n observations.

Requires n > 0, otherwise calls domain_error.

RealType number_of_observations()const;

Returns the parameter n from which this object was constructed.

Non-member Accessors

All the usual non-member accessor functions that are generic to all distributions are supported: Cumulative Distribution Function, Probability Density Function, Quantile, Hazard Function, Cumulative Hazard Function, mean, median, mode, variance, standard deviation, skewness, kurtosis, kurtosis_excess, range and support.

The domain of the random variable is [0, +∞].


Accuracy

The CDF of the Kolmogorov-Smirnov distribution is implemented in terms of the fourth Jacobi Theta function; please refer to the accuracy ULP plots for that function.

The PDF is implemented separately, and the following ULP plot illustrates its accuracy:

[ULP plot: accuracy of the PDF]

Because the PDF for n observations is the n = 1 PDF compressed horizontally and stretched vertically by the square root of n, that is, pdf_n(x) = sqrt(n) * pdf_1(sqrt(n) * x), the above plot is representative for all values of n. Note that for present purposes, "accuracy" refers to deviations from the limiting approximation, rather than deviations from the exact distribution.
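The square-root-of-n scaling of the PDF can be checked with a small sketch. Differentiating the limiting series 1 + 2 Σ (-1)^k exp(-2 k² n x²) term by term gives 8 n x Σ (-1)^(k-1) k² exp(-2 k² n x²). The function `ks_limiting_pdf` below is a hypothetical helper that sums this directly; it is not Boost's implementation and is only reliable when n*x*x is not too small.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not Boost's implementation): term-by-term
// derivative of the limiting CDF series,
// f_n(x) = 8 n x * sum_{k>=1} (-1)^(k-1) * k^2 * exp(-2 k^2 n x^2).
double ks_limiting_pdf(double n, double x)
{
    assert(n > 0.0);
    if (x <= 0.0)
        return 0.0;
    const double t = 2.0 * n * x * x;
    double sum = 0.0;
    double sign = 1.0;
    for (int k = 1; k <= 100000; ++k)
    {
        const double term = static_cast<double>(k) * k * std::exp(-t * k * k);
        sum += sign * term;
        sign = -sign;
        if (term < 1e-17)
            break; // remaining terms are negligible in double precision
    }
    return 8.0 * n * x * sum;
}
```

With this sketch, `ks_limiting_pdf(4.0, 0.5)` equals `2.0 * ks_limiting_pdf(1.0, 1.0)`: quadrupling n halves the scale of x and doubles the height of the density.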


Implementation

In the following table, n is the number of observations, x is the random variable, π is Archimedes' constant, and ζ(3) is Apéry's constant.


| Function | Implementation Notes |
|---|---|
| cdf | Using the relation: cdf = jacobi_theta4tau(0, 2*x*x*n/π) |
| pdf | Using a manual derivative of the CDF |
| cdf complement | When x*x*n == 0: 1. When 2*x*x*n <= π: 1 - jacobi_theta4tau(0, 2*x*x*n/π). When 2*x*x*n > π: -jacobi_theta4m1tau(0, 2*x*x*n/π) |
| quantile | Using a Newton-Raphson iteration |
| quantile from the complement | Using a Newton-Raphson iteration |
| mode | Using a run-time PDF maximizer |
| mean | sqrt(π/2) * ln(2) / sqrt(n) |
| variance | (π^2/12 - π/2*ln(2)^2)/n |
| skewness | (9/16*sqrt(π/2)*ζ(3)/n^(3/2) - 3 * mean * variance - mean^3) / (variance^(3/2)) |
| kurtosis | (7/720*π^4/n^2 - 4 * mean * skewness * variance^(3/2) - 6 * mean^2 * variance - mean^4) / (variance^2) |
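The closed-form moments can be transcribed directly, which also makes it easy to check them numerically. The sketch below is an assumption-laden illustration rather than Boost's code: `ks_moments` and `ks_limiting_moments` are hypothetical names. It uses the raw moments of the limiting form (the third is 9/16*sqrt(π/2)*ζ(3)/n^(3/2), the fourth is 7/720*π^4/n^2) and converts them to central moments with the standard identities; the third central moment is E[X^3] - 3*mean*variance - mean^3.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical aggregate (not part of Boost) holding the four moments.
struct ks_moments
{
    double mean;
    double variance;
    double skewness;
    double kurtosis;
};

// Hypothetical helper: closed-form moments of the limiting form.
ks_moments ks_limiting_moments(double n)
{
    assert(n > 0.0);
    const double pi = 3.14159265358979323846;
    const double zeta3 = 1.2020569031595942854; // Apery's constant
    const double ln2 = std::log(2.0);

    ks_moments m;
    m.mean = std::sqrt(pi / 2) * ln2 / std::sqrt(n);
    m.variance = (pi * pi / 12 - pi / 2 * ln2 * ln2) / n;

    const double var32 = m.variance * std::sqrt(m.variance); // variance^(3/2)
    const double mean2 = m.mean * m.mean;

    // Third raw moment, then skewness via the third central moment.
    const double ex3 = 9.0 / 16 * std::sqrt(pi / 2) * zeta3 / (n * std::sqrt(n));
    m.skewness = (ex3 - 3 * m.mean * m.variance - mean2 * m.mean) / var32;

    // Fourth raw moment, then kurtosis via the fourth central moment.
    const double ex4 = 7.0 / 720 * std::pow(pi, 4) / (n * n);
    m.kurtosis = (ex4 - 4 * m.mean * m.skewness * var32
                  - 6 * mean2 * m.variance - mean2 * mean2)
                 / (m.variance * m.variance);
    return m;
}
```

Because n only rescales the random variable, the skewness and kurtosis computed this way do not depend on n, while the mean scales as 1/sqrt(n) and the variance as 1/n.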

[3] Kolmogorov A (1933). "Sulla determinazione empirica di una legge di distribuzione". G. Ist. Ital. Attuari. 4: 83–91.