Often you don't want the value of the CDF, but its complement, which is to say `1-p` rather than `p`. It is tempting to calculate the CDF and subtract it from `1`, but if `p` is very close to `1` then cancellation error will cause you to lose accuracy, perhaps totally.
See "Why bother with complements anyway?" below.
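The cancellation described above can be seen with the standard library alone. The following is a minimal sketch, not code from this library: it assumes a standard normal distribution, whose CDF can be written in terms of `std::erfc`, and the helper names `normal_cdf`, `normal_sf_naive`, and `normal_sf_direct` are hypothetical.

```cpp
#include <cmath>

// Standard normal CDF (assumed distribution for this sketch),
// written with erfc so that the lower tail stays accurate.
double normal_cdf(double x) {
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Upper tail computed the tempting way: 1 - CDF.
// For large x the CDF rounds to (or very near) 1, so the
// subtraction cancels almost every significant digit.
double normal_sf_naive(double x) {
    return 1.0 - normal_cdf(x);
}

// Upper tail computed directly, with no subtraction from 1.
double normal_sf_direct(double x) {
    return 0.5 * std::erfc(x / std::sqrt(2.0));
}
```

At `x = 10` the true upper tail is about 7.6e-24, far smaller than the spacing between doubles just below 1 (about 1.1e-16), so `normal_sf_naive(10.0)` cannot represent it, while `normal_sf_direct(10.0)` keeps full precision.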
In this library, whenever you want to receive a complement, just wrap all the function arguments in a call to `complement(...)`, for example:
    students_t dist(5);

    cout << "CDF at t = 1 is " << cdf(dist, 1.0) << endl;
    cout << "Complement of CDF at t = 1 is " << cdf(complement(dist, 1.0)) << endl;
But wait, now that we have a complement, we have to be able to use it as well. Any function that accepts a probability as an argument can also accept a complement by wrapping all of its arguments in a call to `complement(...)`, for example:
    students_t dist(5);

    for(double i = 10; i < 1e10; i *= 10)
    {
       // Calculate the quantile for a 1 in i chance:
       double t = quantile(complement(dist, 1/i));
       // Print it out:
       cout << "Quantile of students-t with 5 degrees of freedom\n"
               "for a 1 in " << i << " chance is " << t << endl;
    }
Tip | |
---|---|
Critical values are just quantiles. Some texts talk about quantiles, percentiles, or fractiles, and others about critical values. The basic rule is: lower critical values are the same as the quantile, and upper critical values are the same as the quantile from the complement of the probability. For example, suppose we have a Bernoulli process giving rise to a binomial distribution with success ratio 0.1 and 100 trials in total. The lower critical value for a probability of 0.05 is given by `quantile(binomial(100, 0.1), 0.05)` and the upper critical value is given by `quantile(complement(binomial(100, 0.1), 0.05))`, which return 4.82 and 14.63 respectively. |
Tip | |
---|---|
Why bother with complements anyway? It's very tempting to dispense with complements and simply subtract the probability from 1 when required. However, consider what happens when the probability is very close to 1: let's say the probability expressed at float precision is `0.999999940f`; then `1 - 0.999999940f = 5.96046448e-008`, but the result is accurate to just a single bit, the only bit that didn't cancel out. Or to look at this another way: consider that we want the risk of falsely rejecting the null hypothesis in the Student's t test to be 1 in 1 billion, for a sample size of 10,000. This gives a probability of 1 - 10^-9, which is exactly 1 when calculated at float precision. In this case calculating the quantile from the complement neatly solves the problem, so for example `quantile(complement(students_t(10000), 1e-9))` returns the expected t-statistic 6.00336, whereas `quantile(students_t(10000), 1 - 1e-9f)` raises an overflow error, since it is the same as `quantile(students_t(10000), 1)`, which has no finite result. With all distributions, even for more reasonable probabilities (unless the value of p can be represented exactly in the floating-point type), the loss of accuracy quickly becomes significant if you simply calculate the probability from 1 - p (because it will be mostly garbage digits for p ~ 1). So always avoid, for example, using a probability near unity like 0.99999, as in `quantile(my_distribution, 0.99999)`, and instead use `quantile(complement(my_distribution, 0.00001))`, since 1 - 0.99999 is not exactly equal to 0.00001 when using floating-point arithmetic. This assumes that the 0.00001 value is either a constant or can be computed by some means other than subtracting 0.99999 from 1. |
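The two floating-point effects described in the tip above can be checked without any distribution code. This is a minimal sketch using only the standard library; the helper names `one_minus` and `relative_error_of_subtraction` are hypothetical and are not part of this library.

```cpp
#include <cmath>

// Hypothetical helper: subtract a small float probability q from 1 and
// store the result in a float, as a float-precision caller would.
float one_minus(float q) {
    float p = 1.0f - q;
    return p;
}

// Hypothetical helper: relative error incurred by computing a small
// probability as 1 - p_near_one rather than using the known small
// value q_exact directly.
double relative_error_of_subtraction(double p_near_one, double q_exact) {
    double q_computed = 1.0 - p_near_one;
    return std::fabs(q_computed - q_exact) / q_exact;
}
```

With IEEE single precision, `one_minus(1e-9f)` compares equal to `1.0f`, which is why the non-complemented quantile call ends up being asked for a probability of exactly 1. In double precision, `relative_error_of_subtraction(0.99999, 0.00001)` is small but nonzero, and that error grows as the probability moves closer to 1.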