...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
#include <boost/math/distributions/arcsine.hpp>
namespace boost{ namespace math{ template <class RealType = double, class Policy = policies::policy<> > class arcsine_distribution; typedef arcsine_distribution<double> arcsine; // double precision standard arcsine distribution [0,1]. template <class RealType, class Policy> class arcsine_distribution { public: typedef RealType value_type; typedef Policy policy_type; // Constructor from two range parameters, x_min and x_max: arcsine_distribution(RealType x_min, RealType x_max); // Range Parameter accessors: RealType x_min() const; RealType x_max() const; }; }} // namespaces
The class type arcsine_distribution
represents an arcsine
probability
distribution function. The arcsine distribution is named because
its CDF uses the inverse sin-1 or arcsine.
This is implemented as a generalized version with support from x_min to x_max providing the 'standard arcsine distribution' as default with x_min = 0 and x_max = 1. (A few make other choices for 'standard').
The arcsine distribution is generalized to include any bounded support a <= x <= b by Wolfram and Wikipedia, but also using location and scale parameters by Virtual Laboratories in Probability and Statistics Arcsine distribution. The end-point version is simpler and more obvious, so we implement that. If desired, this outlines how the Beta Distribution can be used to add a shape factor.
The probability density function PDF for the arcsine distribution defined on the interval [x_min, x_max] is given by:
f(x; x_min, x_max) = 1 /(π⋅√((x - x_min)⋅(x_max - x_min))
For example, Wolfram Alpha arcsine distribution, from input of
N[PDF[arcsinedistribution[0, 1], 0.5], 50]
computes the PDF value
0.63661977236758134307553505349005744813783858296183
The Probability Density Functions (PDF) of generalized arcsine distributions are symmetric U-shaped curves, centered on (x_max - x_min)/2, highest (infinite) near the two extrema, and quite flat over the central region.
If random variate x is x_min or x_max, then the PDF is infinity. If random variate x is x_min then the CDF is zero. If random variate x is x_max then the CDF is unity.
The 'Standard' (0, 1) arcsine distribution is shown in blue and some generalized examples with other x ranges.
The Cumulative Distribution Function CDF is defined as
F(x) = 2⋅arcsin(√((x-x_min)/(x_max - x))) / π
arcsine_distribution(RealType x_min, RealType x_max);
constructs an arcsine distribution with range parameters x_min and x_max.
Requires x_min < x_max, otherwise domain_error is called.
For example:
arcsine_distribution<> myarcsine(-2, 4);
constructs an arcsine distribution with x_min = -2 and x_max = 4.
Default values of x_min = 0 and x_max =
1 and a typedef arcsine_distribution<double> arcsine;
mean that
arcsine as;
constructs a 'Standard 01' arcsine distribution.
RealType x_min() const; RealType x_max() const;
Return the parameter x_min or x_max from which this distribution was constructed.
So, for example:
using boost::math::arcsine_distribution; arcsine_distribution<> as(2, 5); // Constructs a double arcsine distribution. BOOST_ASSERT(as.x_min() == 2.); // as.x_min() returns 2. BOOST_ASSERT(as.x_max() == 5.); // as.x_max() returns 5.
All the usual non-member accessor functions that are generic to all distributions are supported: Cumulative Distribution Function, Probability Density Function, Quantile, Hazard Function, Cumulative Hazard Function, mean, median, mode, variance, standard deviation, skewness, kurtosis, kurtosis_excess, range and support.
The formulae for calculating these are shown in the table below, and at Wolfram Mathworld.
Note | |
---|---|
There are always two values for the mode, at x_min and at x_max, default 0 and 1, so instead we raise the exception domain_error. At these extrema, the PDFs are infinite, and the CDFs zero or unity. |
The arcsine distribution is useful to describe Random walks, (including drunken walks) Brownian motion, Weiner processes, Bernoulli trials, and their application to solve stock market and other ruinous gambling games.
The random variate x is constrained to x_min and x_max, (for our 'standard' distribution, 0 and 1), and is usually some fraction. For any other x_min and x_max a fraction can be obtained from x using
fraction = (x - x_min) / (x_max - x_min)
The simplest example is tossing heads and tails with a fair coin and modelling the risk of losing, or winning. Walkers (molecules, drunks...) moving left or right of a centre line are another common example.
The random variate x is the fraction of time spent on the 'winning' side. If half the time is spent on the 'winning' side (and so the other half on the 'losing' side) then x = 1/2.
For large numbers of tosses, this is modelled by the (standard [0,1]) arcsine distribution, and the PDF can be calculated thus:
std::cout << pdf(as, 1. / 2) << std::endl; // 0.637 // pdf has a minimum at x = 0.5
From the plot of PDF, it is clear that x = ½ is the minimum of the curve, so this is the least likely scenario. (This is highly counter-intuitive, considering that fair tosses must eventually become equal. It turns out that eventually is not just very long, but infinite!).
The most likely scenarios are towards the extrema where x = 0 or x = 1.
If fraction of time on the left is a ¼, it is only slightly more likely because the curve is quite flat bottomed.
std::cout << pdf(as, 1. / 4) << std::endl; // 0.735
If we consider fair coin-tossing games being played for 100 days (hypothetically continuously to be 'at-limit') the person winning after day 5 will not change in fraction 0.144 of the cases.
We can easily compute this setting x = 5./100 = 0.05
std::cout << cdf(as, 0.05) << std::endl; // 0.144
Similarly, we can compute from a fraction of 0.05 /2 = 0.025 (halved because we are considering both winners and losers) corresponding to 1 - 0.025 or 97.5% of the gamblers, (walkers, particles...) on the same side of the origin
std::cout << 2 * cdf(as, 1 - 0.975) << std::endl; // 0.202
(use of the complement gives a bit more clarity, and avoids potential loss of accuracy when x is close to unity, see why complements?).
std::cout << 2 * cdf(complement(as, 0.975)) << std::endl; // 0.202
or we can reverse the calculation by assuming a fraction of time on one side, say fraction 0.2,
std::cout << quantile(as, 1 - 0.2 / 2) << std::endl; // 0.976 std::cout << quantile(complement(as, 0.2 / 2)) << std::endl; // 0.976
Summary: Every time we toss, the odds are equal, so on average we have the same change of winning and losing.
But this is not true for an an individual game where one will be mostly in a bad or good patch.
This is quite counter-intuitive to most people, but the mathematics is clear, and gamblers continue to provide proof.
Moral: if you in a losing patch, leave the game. (Because the odds to recover to a good patch are poor).
Corollary: Quit while you are ahead?
A working example is at arcsine_example.cpp including sample output .
The arcsine distribution with x_min = 0 and x_max = 1 is special case of the Beta Distribution with α = 1/2 and β = 1/2.
This distribution is implemented using sqrt, sine, cos and arc sine and
cos trigonometric functions which are normally accurate to a few machine epsilon.
But all values suffer from loss
of significance or cancellation error for values of x
close to x_max. For example, for a standard [0, 1]
arcsine distribution as, the pdf is symmetric about
random variate x = 0.5 so that one would expect pdf(as, 0.01) ==
pdf(as, 0.99)
. But
as x nears unity, there is increasing loss
of significance. To counteract this, the complement versions of
CDF and quantile are implemented with alternative expressions using cos-1
instead of sin-1. Users should see why
complements? for guidance on when to avoid loss of accuracy by using
complements.
The results were tested against a few accurate spot values computed by Wolfram Alpha, for example:
N[PDF[arcsinedistribution[0, 1], 0.5], 50] 0.63661977236758134307553505349005744813783858296183
In the following table a and b are the parameters x_min and x_max, x is the random variable, p is the probability and its complement q = 1-p.
Function |
Implementation Notes |
---|---|
support |
x ∈ [a, b], default x ∈ [0, 1] |
|
f(x; a, b) = 1/(π⋅√(x - a)⋅(b - x)) |
cdf |
F(x) = 2/π⋅sin-1(√(x - a) / (b - a) ) |
cdf of complement |
2/(π⋅cos-1(√(x - a) / (b - a))) |
quantile |
-a⋅sin2(½π⋅p) + a + b⋅sin2(½π⋅p) |
quantile from the complement |
-a⋅cos2(½π⋅p) + a + b⋅cos2(½π⋅q) |
mean |
½(a+b) |
median |
½(a+b) |
mode |
x ∈ [a, b], so raises domain_error (returning NaN). |
variance |
(b - a)2 / 8 |
skewness |
0 |
kurtosis excess |
-3/2 |
kurtosis |
kurtosis_excess + 3 |
The quantile was calculated using an expression obtained by using Wolfram Alpha to invert the formula for the CDF thus
solve [p - 2/pi sin^-1(sqrt((x-a)/(b-a))) = 0, x]
which was interpreted as
Solve[p - (2 ArcSin[Sqrt[(-a + x)/(-a + b)]])/Pi == 0, x, MaxExtraConditions -> Automatic]
and produced the resulting expression
x = -a sin^2((pi p)/2)+a+b sin^2((pi p)/2)
Thanks to Wolfram for providing this facility.