...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
#include <boost/math/special_functions/lambert_w.hpp>
namespace boost { namespace math { template <class T> calculated-result-type lambert_w0(T z); // W0 branch, default policy. template <class T> calculated-result-type lambert_wm1(T z); // W-1 branch, default policy. template <class T> calculated-result-type lambert_w0_prime(T z); // W0 branch 1st derivative. template <class T> calculated-result-type lambert_wm1_prime(T z); // W-1 branch 1st derivative. template <class T, class Policy> calculated-result-type lambert_w0(T z, const Policy&); // W0 with policy. template <class T, class Policy> calculated-result-type lambert_wm1(T z, const Policy&); // W-1 with policy. template <class T, class Policy> calculated-result-type lambert_w0_prime(T z, const Policy&); // W0 derivative with policy. template <class T, class Policy> calculated-result-type lambert_wm1_prime(T z, const Policy&); // W-1 derivative with policy. } // namespace boost } // namespace math
The Lambert W function is the solution of the equation W(z)e^{W(z)} = z. It is also called the Omega function, the inverse of f(W) = We^{W}.
On the interval [0, ∞), there is just one real solution. On the interval (-e^{-1},
0), there are two real solutions, generating two branches which we will denote
by W_{0} and W_{-1}. In Boost.Math, we call
these principal branches lambert_w0
and lambert_wm1
; their derivatives
are labelled lambert_w0_prime
and lambert_wm1_prime
.
There is a singularity where the branches meet at e^{-1} ≅ -0.367879
.
Approaching this point, the condition number of function evaluation tends to
infinity, and the only method of recovering high accuracy is use of higher
precision.
This implementation computes the two real branches W_{0} and
W_{-1}
with the functions lambert_w0
and lambert_wm1
, and their
derivatives, lambert_w0_prime
and lambert_wm1_prime
. Complex
arguments are not supported.
The final Policy argument is optional and can be used to control how the function deals with errors. Refer to Policies for more details and see examples below.
The Lambert W function has a myriad of applications. Corless et al. provide a summary of applications, from the mathematical, like iterated exponentiation and asymptotic roots of trinomials, to the real-world, such as the range of a jet plane, enzyme kinetics, water movement in soil, epidemics, and diode current (an example replicated here). Since the publication of their landmark paper, there have been many more applications, and also many new implementations of the function, upon which this implementation builds.
The most basic usage of the Lambert-W function is demonstrated below:
#include <boost/math/special_functions/lambert_w.hpp> // For lambert_w function. using boost::math::lambert_w0; using boost::math::lambert_wm1;
std::cout.precision(std::numeric_limits<double>::max_digits10); // Show all potentially significant decimal digits, std::cout << std::showpoint << std::endl; // and show significant trailing zeros too. double z = 10.; double r = lambert_w0(z); // Default policy for double. std::cout << "lambert_w0(z) = " << r << std::endl; // lambert_w0(z) = 1.7455280027406994
Other floating-point types can be used too, here float
,
including user-defined types like Boost.Multiprecision.
It is convenient to use a function like show_value
to display all (and only) potentially significant decimal digits, including
any significant trailing zeros, (std::numeric_limits<T>::max_digits10
) for the type T
.
float z = 10.F; float r; r = lambert_w0(z); // Default policy digits10 = 7, digits2 = 24 std::cout << "lambert_w0("; show_value(z); std::cout << ") = "; show_value(r); std::cout << std::endl; // lambert_w0(10.0000000) = 1.74552798
Example of an integer argument to lambert_w0
,
showing that an int
literal is
correctly promoted to a double
.
std::cout.precision(std::numeric_limits<double>::max_digits10); double r = lambert_w0(10); // Pass an int argument "10" that should be promoted to double argument. std::cout << "lambert_w0(10) = " << r << std::endl; // lambert_w0(10) = 1.7455280027406994 double rp = lambert_w0(10); std::cout << "lambert_w0(10) = " << rp << std::endl; // lambert_w0(10) = 1.7455280027406994 auto rr = lambert_w0(10); // C++11 needed. std::cout << "lambert_w0(10) = " << rr << std::endl; // lambert_w0(10) = 1.7455280027406994 too, showing that rr has been promoted to double.
Using Boost.Multiprecision types to get much higher precision is painless.
cpp_dec_float_50 z("10"); // Note construction using a decimal digit string "10", // NOT a floating-point double literal 10. cpp_dec_float_50 r; r = lambert_w0(z); std::cout << "lambert_w0("; show_value(z); std::cout << ") = "; show_value(r); std::cout << std::endl; // lambert_w0(10.000000000000000000000000000000000000000000000000000000000000000000000000000000) = // 1.7455280027406993830743012648753899115352881290809413313533156980404446940000000
Warning | |
---|---|
When using multiprecision, take very great care not to construct or assign
non-integers from |
Using multiprecision types, it is all too easy to get multiprecision precision wrong!
cpp_dec_float_50 z(0.7777777777777777777777777777777777777777777777777777777777777777777777777); // Compiler evaluates the nearest double-precision binary representation, // from the max_digits10 of the floating_point literal double 0.7777777777777777777777777777..., // so any extra digits in the multiprecision type // beyond max_digits10 (usually 17) are random and meaningless. cpp_dec_float_50 r; r = lambert_w0(z); std::cout << "lambert_w0("; show_value(z); std::cout << ") = "; show_value(r); std::cout << std::endl; // lambert_w0(0.77777777777777779011358916250173933804035186767578125000000000000000000000000000) // = 0.48086152073210493501934682309060873341910109230469724725005039758139532631901386
Note | |
---|---|
See spurious non-seven decimal digits appearing after digit #17 in the argument 0.7777777777777777...! |
And similarly constructing from a literal double
0.9
, with more random digits after digit
number 17.
cpp_dec_float_50 z(0.9); // Construct from floating_point literal double 0.9. cpp_dec_float_50 r; r = lambert_w0(0.9); std::cout << "lambert_w0("; show_value(z); std::cout << ") = "; show_value(r); std::cout << std::endl; // lambert_w0(0.90000000000000002220446049250313080847263336181640625000000000000000000000000000) // = 0.52983296563343440510607251781038939952850341796875000000000000000000000000000000 std::cout << "lambert_w0(0.9) = " << lambert_w0(static_cast<double>(0.9)) // lambert_w0(0.9) // = 0.52983296563343441 << std::endl;
Note how the cpp_float_dec_50
result is only as correct as from a double
= 0.9
.
Now see the correct result for all 50 decimal digits constructing from a decimal digit string "0.9":
cpp_dec_float_50 z("0.9"); // Construct from decimal digit string. cpp_dec_float_50 r; r = lambert_w0(z); std::cout << "lambert_w0("; show_value(z); std::cout << ") = "; show_value(r); std::cout << std::endl; // 0.90000000000000000000000000000000000000000000000000000000000000000000000000000000) // = 0.52983296563343441213336643954546304857788132269804249284012528304239956413801252
Note the expected zeros for all places up to 50 - and the correct Lambert W result!
(It is just as easy to compute even higher precisions, at least to thousands of decimal digits, but not shown here for brevity. See lambert_w_simple_examples.cpp for comparison of an evaluation at 1000 decimal digit precision with Wolfram Alpha).
Policies can be used to control what action to take on errors:
// Define an error handling policy: typedef policy< domain_error<throw_on_error>, overflow_error<ignore_error> // possibly unwise? > my_throw_policy; std::cout.precision(std::numeric_limits<double>::max_digits10); // Show all potentially significant decimal digits, std::cout << std::showpoint << std::endl; // and show significant trailing zeros too. double z = +1; std::cout << "Lambert W (" << z << ") = " << lambert_w0(z) << std::endl; // Lambert W (1.0000000000000000) = 0.56714329040978384 std::cout << "\nLambert W (" << z << ", my_throw_policy()) = " << lambert_w0(z, my_throw_policy()) << std::endl; // Lambert W (1.0000000000000000, my_throw_policy()) = 0.56714329040978384
An example error message:
Error in function boost::math::lambert_wm1<RealType>(<RealType>): Argument z = 1 is out of range (z <= 0) for Lambert W-1 branch! (Try Lambert W0 branch?)
Showing an error reported if a value is passed to lambert_w0
that is out of range, (and was probably meant to be passed to lambert_wm1
instead).
double z = +1.; double r = lambert_wm1(z); std::cout << "lambert_wm1(+1.) = " << r << std::endl;
The full source of these examples is at lambert_w_simple_examples.cpp
A typical example of a practical application is estimating the current flow through a diode with series resistance from a paper by Banwell and Jayakumar.
Having the Lambert W function available makes it simple to reproduce the plot in their paper (Fig 2) comparing estimates using with Lambert W function and some actual measurements. The colored curves show the effect of various series resistance on the current compared to an extrapolated line in grey with no internal (or external) resistance.
Two formulae relating the diode current and effect of series resistance can be combined, but yield an otherwise intractable equation relating the current versus voltage with a varying series resistance. This was reformulated as a generalized equation in terms of the Lambert W function:
Banwell and Jakaumar equation 5
I(V) = μ V_{T}/ R _{S} ․ W_{0}(I_{0} R_{S} / (μ V_{T}))
Using these variables
double nu = 1.0; // Assumed ideal. double vt = v_thermal(25); // v thermal, Shockley equation, expect about 25 mV at room temperature. double boltzmann_k = 1.38e-23; // joules/kelvin double temp = 273 + 25; double charge_q = 1.6e-19; // column vt = boltzmann_k * temp / charge_q; std::cout << "V thermal " << vt << std::endl; // V thermal 0.0257025 = 25 mV double rsat = 0.; double isat = 25.e-15; // 25 fA; std::cout << "Isat = " << isat << std::endl; double re = 0.3; // Estimated from slope of straight section of graph (equation 6). double v = 0.9; double icalc = iv(v, vt, 249., re, isat); std::cout << "voltage = " << v << ", current = " << icalc << ", " << log(icalc) << std::endl; // voltage = 0.9, current = 0.00108485, -6.82631
the formulas can be rendered in C++
double iv(double v, double vt, double rsat, double re, double isat, double nu = 1.) { // V thermal 0.0257025 = 25 mV // was double i = (nu * vt/r) * lambert_w((i0 * r) / (nu * vt)); equ 5. rsat = rsat + re; double i = nu * vt / rsat; // std::cout << "nu * vt / rsat = " << i << std::endl; // 0.000103223 double x = isat * rsat / (nu * vt); // std::cout << "isat * rsat / (nu * vt) = " << x << std::endl; double eterm = (v + isat * rsat) / (nu * vt); // std::cout << "(v + isat * rsat) / (nu * vt) = " << eterm << std::endl; double e = exp(eterm); // std::cout << "exp(eterm) = " << e << std::endl; double w0 = lambert_w0(x * e); // std::cout << "w0 = " << w0 << std::endl; return i * w0 - isat; } // double iv
to reproduce their Fig 2:
The plotted points for no external series resistance (derived from their published plot as the raw data are not publicly available) are used to extrapolate back to estimate the intrinsic emitter resistance as 0.3 ohm. The effect of external series resistance is visible when the colored lines start to curve away from the straight line as voltage increases.
See lambert_w_diode.cpp and lambert_w_diode_graph.cpp for details of the calculation.
The principal value of the Lambert W function is implemented
in the Wolfram
Language as ProductLog[k,
z]
,
where k
is the branch.
The symbolic algebra program Maple also computes Lambert W to an arbitrary precision.
double
and float
This implementation provides good precision and excellent speed for __fundamental
float
and double
.
All the functions usually return values within a few Unit in the last place (ULP) for the floating-point type, except for very small arguments very near zero, and for arguments very close to the singularity at the branch point.
By default, this implementation provides the best possible speed. Very slightly average higher precision and less bias might be obtained by adding a Halley step refinement, but at the cost of more than doubling the runtime.
For floating-point types with precision greater than double
and float
fundamental
(built-in) types, a double
evaluation is used as a first approximation followed by Halley refinement,
using a single step where it can be predicted that this will be sufficient,
and only using Halley
iteration when necessary. Higher precision types are always going to be very, very much slower.
The 'best' evaluation (the nearest representable)
can be achieved by static_cast
ing
from a higher precision type, typically a Boost.Multiprecision
type like cpp_bin_float_50
,
but at the cost of increasing run-time 100-fold; this has been used here to
provide some of our reference values for testing.
For example, we get a reference value using a high precision type, for example;
using boost::multiprecision::cpp_bin_float_50;
that uses Halley iteration to refine until it is as precise as possible for
this cpp_bin_float_50
type.
As a further check we can compare this with a Wolfram
Alpha computation using command N[ProductLog[10.], 50]
to get 50 decimal digits and similarly N[ProductLog[10.], 17]
to get the nearest representable for 64-bit double
precision.
using boost::multiprecision::cpp_bin_float_50; using boost::math::float_distance; cpp_bin_float_50 z("10."); // Note use a decimal digit string, not a double 10. cpp_bin_float_50 r; std::cout.precision(std::numeric_limits<cpp_bin_float_50>::digits10); r = lambert_w0(z); // Default policy. std::cout << "lambert_w0(z) cpp_bin_float_50 = " << r << std::endl; //lambert_w0(z) cpp_bin_float_50 = 1.7455280027406993830743012648753899115352881290809 // [N[productlog[10], 50]] == 1.7455280027406993830743012648753899115352881290809 std::cout.precision(std::numeric_limits<double>::max_digits10); std::cout << "lambert_w0(z) static_cast from cpp_bin_float_50 = " << static_cast<double>(r) << std::endl; // double lambert_w0(z) static_cast from cpp_bin_float_50 = 1.7455280027406994 // [N[productlog[10], 17]] == 1.7455280027406994 std::cout << "bits different from Wolfram = " << static_cast<int>(float_distance(static_cast<double>(r), 1.7455280027406994)) << std::endl; // 0
giving us the same nearest representable using 64-bit double
as 1.7455280027406994
.
However, the rational polynomial and Fukushima Schroder approximations are
so good for type float
and double
that negligible improvement is gained
from a double
Halley step.
This is shown with lambert_w_precision_example.cpp for Lambert W_{0}:
using boost::math::lambert_w_detail::lambert_w_halley_step; using boost::math::epsilon_difference; using boost::math::relative_difference; std::cout << std::showpoint << std::endl; // and show any significant trailing zeros too. std::cout.precision(std::numeric_limits<double>::max_digits10); // 17 decimal digits for double. cpp_bin_float_50 z50("1.23"); // Note: use a decimal digit string, not a double 1.23! double z = static_cast<double>(z50); cpp_bin_float_50 w50; w50 = lambert_w0(z50); std::cout.precision(std::numeric_limits<cpp_bin_float_50>::max_digits10); // 50 decimal digits. std::cout << "Reference Lambert W (" << z << ") =\n " << w50 << std::endl; std::cout.precision(std::numeric_limits<double>::max_digits10); // 17 decimal digits for double. double wr = static_cast<double>(w50); std::cout << "Reference Lambert W (" << z << ") = " << wr << std::endl; double w = lambert_w0(z); std::cout << "Rat/poly Lambert W (" << z << ") = " << lambert_w0(z) << std::endl; // Add a Halley step to the value obtained from rational polynomial approximation. double ww = lambert_w_halley_step(lambert_w0(z), z); std::cout << "Halley Step Lambert W (" << z << ") = " << lambert_w_halley_step(lambert_w0(z), z) << std::endl; std::cout << "absolute difference from Halley step = " << w - ww << std::endl; std::cout << "relative difference from Halley step = " << relative_difference(w, ww) << std::endl; std::cout << "epsilon difference from Halley step = " << epsilon_difference(w, ww) << std::endl; std::cout << "epsilon for float = " << std::numeric_limits<double>::epsilon() << std::endl; std::cout << "bits different from Halley step = " << static_cast<int>(float_distance(w, ww)) << std::endl;
with this output:
Reference Lambert W (1.2299999999999999822364316059974953532218933105468750) = 0.64520356959320237759035605255334853830173300262666480 Reference Lambert W (1.2300000000000000) = 0.64520356959320235 Rat/poly Lambert W (1.2300000000000000) = 0.64520356959320224 Halley Step Lambert W (1.2300000000000000) = 0.64520356959320235 absolute difference from Halley step = -1.1102230246251565e-16 relative difference from Halley step = 1.7207329236029286e-16 epsilon difference from Halley step = 0.77494921535422934 epsilon for float = 2.2204460492503131e-16 bits different from Halley step = 1
and then for W_{-1}:
using boost::math::lambert_w_detail::lambert_w_halley_step; using boost::math::epsilon_difference; using boost::math::relative_difference; std::cout << std::showpoint << std::endl; // and show any significant trailing zeros too. std::cout.precision(std::numeric_limits<double>::max_digits10); // 17 decimal digits for double. cpp_bin_float_50 z50("-0.123"); // Note: use a decimal digit string, not a double -1.234! double z = static_cast<double>(z50); cpp_bin_float_50 wm1_50; wm1_50 = lambert_wm1(z50); std::cout.precision(std::numeric_limits<cpp_bin_float_50>::max_digits10); // 50 decimal digits. std::cout << "Reference Lambert W-1 (" << z << ") =\n " << wm1_50 << std::endl; std::cout.precision(std::numeric_limits<double>::max_digits10); // 17 decimal digits for double. double wr = static_cast<double>(wm1_50); std::cout << "Reference Lambert W-1 (" << z << ") = " << wr << std::endl; double w = lambert_wm1(z); std::cout << "Rat/poly Lambert W-1 (" << z << ") = " << lambert_wm1(z) << std::endl; // Add a Halley step to the value obtained from rational polynomial approximation. double ww = lambert_w_halley_step(lambert_wm1(z), z); std::cout << "Halley Step Lambert W (" << z << ") = " << lambert_w_halley_step(lambert_wm1(z), z) << std::endl; std::cout << "absolute difference from Halley step = " << w - ww << std::endl; std::cout << "relative difference from Halley step = " << relative_difference(w, ww) << std::endl; std::cout << "epsilon difference from Halley step = " << epsilon_difference(w, ww) << std::endl; std::cout << "epsilon for float = " << std::numeric_limits<double>::epsilon() << std::endl; std::cout << "bits different from Halley step = " << static_cast<int>(float_distance(w, ww)) << std::endl;
with this output:
Reference Lambert W-1 (-0.12299999999999999822364316059974953532218933105468750) = -3.2849102557740360179084675531714935199110302996513384 Reference Lambert W-1 (-0.12300000000000000) = -3.2849102557740362 Rat/poly Lambert W-1 (-0.12300000000000000) = -3.2849102557740357 Halley Step Lambert W (-0.12300000000000000) = -3.2849102557740362 absolute difference from Halley step = 4.4408920985006262e-16 relative difference from Halley step = 1.3519066740696092e-16 epsilon difference from Halley step = 0.60884463935795785 epsilon for float = 2.2204460492503131e-16 bits different from Halley step = -1
double
evaluations
The distribution of differences from 'best' are shown in these graphs comparing
double
precision evaluations with
reference 'best' z50 evaluations using cpp_bin_float_50
type reduced to double
with static_cast<double(z50)
:
As noted in the implementation section, the distribution of these differences
is somewhat biased for Lambert W_{-1} and this might be reduced
using a double
Halley step at
small runtime cost. But if you are seriously concerned to get really precise
computations, the only way is using a higher precision type and then reduce
to the desired type. Fortunately, Boost.Multiprecision
makes this very easy to program, if much slower.
The domain of W_{0} is [-e^{-1}, ∞). Numerically,
lambert_w0(-1/e)
is exactly -1.
lambert_w0(z)
for
z <
-1/e
throws
a domain_error
, or returns
NaN
according to the policy.
lambert_w0(std::numeric_limits<T>::infinity())
throws an overflow_error
.
(An infinite argument probably indicates that something has already gone wrong,
but if it is desired to return infinity, this case should be handled before
calling lambert_w0
).
The domain of W_{-1} is [-e^{-1}, 0). Numerically,
lambert_wm1(-1/e)
is exactly -1.
lambert_wm1(0)
returns
-∞ (or the nearest equivalent if std::has_infinity
== false
).
lambert_wm1(-std::numeric_limits<T>::min())
returns the maximum (most negative) possible value of Lambert W
for the type T. double
:
lambert_wm1(-2.2250738585072014e-308) = -714.96865723796634 float
: lambert_wm1(-1.17549435e-38)
= -91.8567734
z <
-std::numeric_limits<T>::min()
, means that z is zero or denormalized
(if boost::math::detail::has_denorm_now<T>() == true
),
for example: r =
lambert_wm1(-std::numeric_limits<double>::denorm_min());
and an overflow_error exception is thrown, and will give a message like:
Error in function boost::math::lambert_wm1<RealType>(<RealType>): Argument z = -4.9406564584124654e-324 is too small (z < -std::numeric_limits<T>::min so denormalized) for Lambert W-1 branch!
Denormalized values are not supported for Lambert W_{-1} (because not all floating-point types denormalize), and anyway it only covers a tiny fraction of the range of possible z arguments values.
The lambert_w.hpp
code has been shown to work on most C++98
compilers. (Apart from requiring C++11 extensions for using of std::numeric_limits<>::max_digits10
in some diagnostics. Many old pre-c++11 compilers provide this extension but
may require enabling to use, for example using b2/bjam the lambert_w examples
use this command:
[ run lambert_w_basic_example.cpp : : : [ requires cxx11_numeric_limits ] ]
See jamfile.v2.)
For details of which compilers are expected to work see lambert_w tests and
examples in:
Boost
Test Summary report for master branch (used for latest release)
Boost
Test Summary report for latest developer branch.
As expected, debug mode is very much slower than release.
Several macros are provided to output diagnostic information (potentially much output). These can be statements, for example:
#define BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS
placed before the lambert_w
include statement
#include <boost/math/special_functions/lambert_w.hpp>
,
or defined on the project compile command-line: /DBOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS
,
or defined in a jamfile.v2: <define>BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS
// #define-able macros BOOST_MATH_INSTRUMENT_LAMBERT_W_HALLEY // Halley refinement diagnostics. BOOST_MATH_INSTRUMENT_LAMBERT_W_PRECISION // Precision. BOOST_MATH_INSTRUMENT_LAMBERT_WM1 // W1 branch diagnostics. BOOST_MATH_INSTRUMENT_LAMBERT_WM1_HALLEY // Halley refinement diagnostics only for W-1 branch. BOOST_MATH_INSTRUMENT_LAMBERT_WM1_TINY // K > 64, z > -1.0264389699511303e-26 BOOST_MATH_INSTRUMENT_LAMBERT_WM1_LOOKUP // Show results from W-1 lookup table. BOOST_MATH_INSTRUMENT_LAMBERT_W_SCHROEDER // Schroeder refinement diagnostics. BOOST_MATH_INSTRUMENT_LAMBERT_W_TERMS // Number of terms used for near-singularity series. BOOST_MATH_INSTRUMENT_LAMBERT_W_SINGULARITY_SERIES // Show evaluation of series near branch singularity. BOOST_MATH_INSTRUMENT_LAMBERT_W_SMALL_Z_SERIES BOOST_MATH_INSTRUMENT_LAMBERT_W_SMALL_Z_SERIES_ITERATIONS // Show evaluation of series for small z.
There are many previous implementations, each with increasing accuracy and/or speed. See references below.
For most of the range of z arguments, some initial approximation followed by a single refinement, often using Halley or similar method, gives a useful precision. For speed, several implementations avoid evaluation of a iteration test using the exponential function, estimating that a single refinement step will suffice, but these rarely get to the best result possible. To get a better precision, additional refinements, probably iterative, are needed for example, using Halley or Schröder methods.
For C++, the most precise results possible, closest to the nearest representable
for the C++ type being used, it is usually necessary to use a higher precision
type for intermediate computation, finally static-casting back to the smaller
desired result type. This strategy is used by Maple
and Wolfram Alpha, for example,
using arbitrary precision arithmetic, and some of their high-precision values
are used for testing this library. This method is also used to provide some
Boost.Test
values using Boost.Multiprecision,
typically, a 50 decimal digit type like cpp_bin_float_50
static_cast
to a float
, double
or long double
type.
For z argument values near the singularity and near zero, other approximations may be used, possibly followed by refinement or increasing number of series terms until a desired precision is achieved. At extreme arguments near to zero or the singularity at the branch point, even this fails and the only method to achieve a really close result is to cast from a higher precision type.
In practical applications, the increased computation required (often towards a thousand-fold slower and requiring much additional code for Boost.Multiprecision) is not justified and the algorithms here do not implement this. But because the Boost.Lambert_W algorithms has been tested using Boost.Multiprecision, users who require this can always easily achieve the nearest representation for fundamental (built-in) types - if the application justifies the very large extra computation cost.
One compact real-only implementation was based on an algorithm by Thomas
Luu, Thesis, University College London (2015), (see routine 11 on page
98 for his Lambert W algorithm) and his Halley refinement is used iteratively
when required. A first implementation was based on Thomas Luu's code posted
at Boost Trac #11027.
It has been implemented from Luu's algorithm but templated on RealType
parameter and result and handles
both fundamental
(built-in) types (float, double,
long double
),
Boost.Multiprecision,
and also has been tested successfully with a proposed fixed_point type.
A first approximation was computed using the method of Barry et al (see references 5 & 6 below). This was extended to the widely used TOMS443 FORTRAN and C++ versions by John Burkardt using Schroeder refinement(s). (For users only requiring an accuracy of relative accuracy of 0.02%, Barry's function alone might suffice, but a better rational function approximation method has since been developed for this implementation).
We also considered using Newton-Raphson iteration method.
f(w) = w e^w -z = 0 // Luu equation 6.37 f'(w) = e^w (1 + w), Wolfram alpha (d)/(dw)(f(w) = w exp(w) - z) = e^w (w + 1) if (f(w) / f'(w) -1 < tolerance w1 = w0 - (expw0 * (w0 + 1)); // Refine new Newton/Raphson estimate.
but concluded that since the Newton-Raphson method takes typically 6 iterations to converge within tolerance, whereas Halley usually takes only 1 to 3 iterations to achieve an result within 1 Unit in the last place (ULP), so the Newton-Raphson method is unlikely to be quicker than the additional cost of computing the 2nd derivative for Halley's method.
Halley refinement uses the simplified formulae obtained from Wolfram Alpha
[2(z exp(z)-w) d/dx (z exp(z)-w)] / [2 (d/dx (z exp(z)-w))^2 - (z exp(z)-w) d^2/dx^2 (z exp(z)-w)]
The most compact algorithm can probably be implemented using the log approximation of Corless et al. followed by Halley iteration (but is also slowest and least precise near zero and near the branch singularity).
More recently, the Tosio Fukushima has developed an even faster algorithm, avoiding any transcendental function calls as these are necessarily expensive. The current implementation of Lambert W_{-1} is based on his algorithm starting with a translation from Fukushima's FORTRAN into C++ by Darko Veberic.
Many applications of the Lambert W function make many repeated evaluations for Monte Carlo methods; for these applications speed is very important. Luu, and Chapeau-Blondeau and Monir provide typical usage examples.
Fukushima improves the important observation that much of the execution time
of all previous iterative algorithms was spent evaluating transcendental functions,
usually exp
. He has put a lot
of work into avoiding any slow transcendental functions by using lookup tables
and bisection, finishing with a single Schroeder refinement, without any check
on the final precision of the result (necessarily evaluating an expensive exponential).
Theoretical and practical tests confirm that Fukushima's algorithm gives Lambert W estimates with a known small error bound (several Unit in the last place (ULP)) over nearly all the range of z argument.
A mean difference was computed to express the typical error and is often about 0.5 epsilon, the theoretical minimum. Using the Boost.Math float_distance, we can also express this as the number of bits that are different from the nearest representable or 'exact' or 'best' value. The number and distribution of these few bits differences was studied by binning, including their sign. Bins for (signed) 0, 1, 2 and 3 and 4 bits proved suitable.
However, though these give results within a few machine epsilon of the nearest representable result, they do not get as close as is very often possible with further refinement, nearly always to within one or two machine epsilon.
More significantly, the evaluations of the sum of all signed differences using the Fukshima algorithm show a slight bias, being more likely to be a bit or few below the nearest representation than above; bias might have unwanted effects on some statistical computations.
Fukushima's method also does not cover the full range of z arguments of 'float' precision and above.
For this implementation of Lambert W_{0}, John Maddock used the Boost.Math Remez algorithm method program to devise a rational function for several ranges of argument for the W_{0} branch of Lambert W function. These minimax rational approximations are combined for an algorithm that is both smaller and faster.
Sadly it has not proved practical to use the same Remez algorithm method for Lambert W_{-1} branch and so the Fukushima algorithm is retained for this branch.
An advantage of both minimax rational Remez algorithm approximations is that the distribution from the reference values is reasonably random and insignificantly biased.
For example, table below a test of Lambert W_{0} 10000 values of argument covering the main range of possible values, 10000 comparisons from z = 0.0501 to 703, in 0.001 step factor 1.05 when module 7 == 0
Table 8.73. Fukushima Lambert W_{0} and typical improvement from a single Halley step.
Method |
Exact |
One_bit |
Two_bits |
Few_bits |
inexact |
bias |
---|---|---|---|---|---|---|
Schroeder W_{0} |
8804 |
1154 |
37 |
5 |
1243 |
-1193 |
after Halley step |
9710 |
288 |
2 |
0 |
292 |
22 |
Lambert W_{0} values computed using the Fukushima method with
Schroeder refinement gave about 1/6 lambert_w0
values that are one bit different from the 'best', and < 1% that are a few
bits 'wrong'. If a Halley refinement step is added, only 1 in 30 are even one
bit different, and only 2 two-bits 'wrong'.
Table 8.74. Rational polynomial Lambert W_{0} and typical improvement from a single Halley step.
Method |
Exact |
One_bit |
Two_bits |
Few_bits |
inexact |
bias |
---|---|---|---|---|---|---|
rational/polynomial |
7135 |
2863 |
2 |
0 |
2867 |
-59 |
after Halley step |
9724 |
273 |
3 |
0 |
279 |
5 |
With the rational polynomial approximation method, there are a third one-bit from the best and none more than two-bits. Adding a Halley step (or iteration) reduces the number that are one-bit different from about a third down to one in 30; this is unavoidable 'computational noise'. An extra Halley step would double the runtime for a tiny gain and so is not chosen for this implementation, but remains a option, as detailed above.
For the Lambert W_{-1} branch, the Fukushima algorithm is used.
Table 8.75. Lambert W_{-1} using Fukushima algorithm.
Method |
Exact |
One_bit |
Two_bits |
Few_bits |
inexact |
bias |
---|---|---|---|---|---|---|
Fukushima W_{-1} |
7167 |
2704 |
129 |
0 |
2962 |
-160 |
plus Halley step |
7379 |
2529 |
92 |
0 |
2713 |
549 |
For speed during the bisection, Fukushima's algorithm computes lookup tables
of powers of e and z for integral Lambert W. There are 64 elements in these
tables. The FORTRAN version (and the C++ translation by Veberic) computed these
(once) as static
data. This is
slower, may cause trouble with multithreading, and is slightly inaccurate because
of rounding errors from repeated(64) multiplications.
In this implementation the array values have been computed using Boost.Multiprecision
50 decimal digit and output as C++ arrays 37 decimal digit long
double
literals using max_digits10
precision
std::cout.precision(std::numeric_limits<cpp_bin_float_quad>::max_digits10);
The arrays are as const
and constexpr
and static
as possible (for the compiler version), using static constexpr macro. (See
lambert_w_lookup_table_generator.cpp
The precision was chosen to ensure that if used as long
double
arrays, then the values output
to lambert_w_lookup_table.ipp
will be the nearest representable value for the type chose by a typedef
in lambert_w.hpp.
typedef double lookup_t; // Type for lookup table (`double` or `float`, or even `long double`?)
This is to allow for future use at higher precision, up to platforms that use
128-bit (hardware or software) for their long
double
type.
The accuracy of the tables was confirmed using Wolfram
Alpha and agrees at the 37th decimal place, so ensuring that the value
is exactly read into even 128-bit long
double
to the nearest representation.
For types more precise than double
,
Fukushima reported that it was best to use the double
estimate as a starting point, followed by refinement using Halley
iterations or other methods; our experience confirms this.
Using Boost.Multiprecision it is simple to compute very high precision values of Lambert W at least to thousands of decimal digits over most of the range of z arguments.
For this reason, the lookup tables and bisection are only carried out at low
precision, usually double
, chosen
by the typedef double
lookup_t
. Unlike the FORTRAN version,
the lookup tables of Lambert_W of integral values are precomputed as C++ static
arrays of floating-point literals. The default is a typedef
setting the type to double
. To
allow users to vary the precision from float
to long double
these are computed to 128-bit precision to ensure that even platforms with
long double
do not lose precision.
The FORTRAN version and translation only permits the z argument to be the largest
items in these lookup arrays, wm0s[64]
= 3.99049
,
producing an error message and returning NaN
.
So 64 is the largest possible value ever returned from the lambert_w0
function. This is far from the std::numeric_limits<>::max()
for even float
s.
Therefore this implementation uses an approximation or 'guess' and Halley's
method to refine the result. Logarithmic approximation is discussed at length
by R.M.Corless et al. (page 349). Here we use the first two terms of equation
4.19:
T lz = log(z); T llz = log(lz); guess = lz - llz + (llz / lz);
This gives a useful precision suitable for Halley refinement.
Similarly, for Lambert W_{-1} branch, tiny values very near zero, W > 64 cannot be computed using the lookup table. For this region, an approximation followed by a few (usually 3) Halley refinements. See wm1_near_zero.
For the less well-behaved regions for Lambert W_{0} z arguments near zero, and near the branch singularity at -1/e, some series functions are used.
When argument z is small and near zero, there is an efficient
and accurate series evaluation method available (implemented in lambert_w0_small_z
). There is no equivalent
for the W_{-1} branch as this only covers argument z < -1/e
.
The cutoff used abs(z) <
0.05
is as found by trial and error by
Fukushima.
Coefficients of the inverted series expansion of the Lambert W function around
z =
0
are computed following Fukushima using
17 terms of a Taylor series computed using Wolfram
Mathematica with
InverseSeries[Series[z Exp[z],{z,0,17}]]
See Tosio Fukushima, Journal of Computational and Applied Mathematics 244 (2013), page 86.
To provide higher precision constants (34 decimal digits) for types larger
than long double
,
InverseSeries[Series[z Exp[z],{z,0,34}]]
were also computed, but for current hardware it was found that evaluating a
double
precision and then refining
with Halley's method was quicker and more accurate.
Decimal values of specifications for built-in floating-point types below are
21 digits precision == std::numeric_limits<T>::max_digits10
for long
double
.
Specializations for lambert_w0_small_z
are provided for float
, double
, long
double
, float128
and for Boost.Multiprecision
types.
The tag_type
selection is based
on the value std::numeric_limits<T>::max_digits10
(and not on the floating-point type T). This
distinguishes between long double
types that commonly vary between 64 and 80-bits, and also compilers that have
a float
type using 64 bits and/or
long double
using 128-bits.
As noted in the implementation section above, it is only possible to ensure the nearest representable value by casting from a higher precision type, computed at very, very much greater cost.
For multiprecision types, first several terms of the series are tabulated and
evaluated as a polynomial: (this will save us a bunch of expensive calls to
pow
). Then our series functor
is initialized "as if" it had already reached term 18, enough evaluation
of built-in 64-bit double and float (and 80-bit long
double
) types. Finally the functor is
called repeatedly to compute as many additional series terms as necessary to
achieve the desired precision, set from get_epsilon
(or terminated by evaluation_error
on reaching the set iteration limit max_series_iterations
).
A little more than one decimal digit of precision is gained by each additional series term. This allows computation of Lambert W near zero to at least 1000 decimal digit precision, given sufficient compute time.
Variants of Function lambert_w_singularity_series
are used to handle z arguments which are near to the singularity
at z =
-exp(-1)
= -3.6787944
where the branches W_{0} and
W_{-1} join.
T. Fukushima / Journal of Computational and Applied Mathematics 244 (2013) 77-89 describes using Wolfram Mathematica
InverseSeries\[Series\[sqrt\[2(p Exp\[1 + p\] + 1)\], {p,-1, 20}\]\]
to provide his Table 3, page 85.
This implementation used Wolfram Mathematica to obtain 40 series terms at 50 decimal digit precision
N\[InverseSeries\[Series\[Sqrt\[2(p Exp\[1 + p\] + 1)\], { p,-1,40 }\]\], 50\] -1+p-p^2/3+(11 p^3)/72-(43 p^4)/540+(769 p^5)/17280-(221 p^6)/8505+(680863 p^7)/43545600 ...
These constants are computed at compile time for the full precision for any
RealType T
using the original rationals from Fukushima Table 3.
Longer decimal digits strings are rationals pre-evaluated using Wolfram
Mathematica. Some integer constants overflow, so largest size available
is used, suffixed by uLL
.
Above the 14th term, the rationals exceed the range of unsigned
long long
and are replaced by pre-computed decimal values at least 21 digits precision
== max_digits10
for long double
.
A macro BOOST_MATH_TEST_VALUE
(defined in test_value.hpp)
taking a decimal floating-point literal was used to allow testing with both
built-in floating-point types like double
which have constructors taking literal decimal values like 3.14
,
and also multiprecision and other User-defined
Types that only provide full-precision construction from decimal digit strings
like "3.14"
. (Construction
of multiprecision types from built-in floating-point types only provides the
precision of the built-in type, like double
,
only 17 decimal digits).
Tip | |
---|---|
Be exceeding careful not to silently lose precision by constructing multiprecision
types from literal decimal types, usually |
Fukushima's implementation used 20 series terms; it was confirmed that using more terms does not usefully increase accuracy.
The lookup tables of Fukushima have only 64 elements, so that the z argument nearest zero is -1.0264389699511303e-26, that corresponds to a maximum Lambert W_{-1} value of 64.0. Fukushima's implementation did not cater for z argument values that are smaller (nearer to zero), but this implementation adds code to accept smaller (but not denormalised) values of z. A crude approximation for these very small values is to take the exponent and multiply by ln[10] ~= 2.3. We also tried the approximation first proposed by Corless et al. using ln(-z), (equation 4.19 page 349) and then tried improving by a 2nd term -ln(ln(-z)), and finally the ratio term -ln(ln(-z))/ln(-z).
For a z very close to z = -1.0264389699511303e-26 when W = 64, when effect of ln(ln(-z) term, and ratio L1/L2 is greatest, the possible 'guesses' are
z = -1.e-26, w = -64.02, guess = -64.0277, ln(-z) = -59.8672, ln(-ln(-z) = 4.0921, llz/lz = -0.0684
whereas at the minimum (unnormalized) z
z = -2.2250e-308, w = -714.9, guess = -714.9687, ln(-z) = -708.3964, ln(-ln(-z) = 6.5630, llz/lz = -0.0092
Although the addition of the 3rd ratio term did not reduce the number of Halley iterations needed, it might allow return of a better low precision estimate without any Halley iterations. For the worst case near w = 64, the error in the 'guess' is 0.008, ratio 0.0001 or 1 in 10,000 digits 10 ~= 4. Two log evaluations are still needed, but is probably over an order of magnitude faster.
Halley's method was then used to refine the estimate of Lambert W_{-1} from this guess. Experiments showed that although all approximations reached with Unit in the last place (ULP) of the closest representable value, the computational cost of the log functions was easily paid by far fewer iterations (typically from 8 down to 4 iterations for double or float).
After obtaining a double approximation, for double
,
long double
and quad
128-bit precision,
a single iteration should suffice because Halley iteration should triple the
precision with each step (as long as the function is well behaved - and it
is), and since we have at least half of the bits correct already, one Halley
step is ample to get to 128-bit precision.
The derivatives are computed using the formulae in Wikipedia.
Initial testing of the algorithm was done using a small number of spot tests.
After it was established that the underlying algorithm (including unlimited
Halley refinements with a tight terminating criterion) was correct, some tables
of Lambert W values were computed using a 100 decimal digit precision Boost.Multiprecision
cpp_dec_float_100
type and
saved as a C++ program that will initialise arrays of values of z arguments
and lambert_W0 (lambert_w_mp_high_values.ipp
and
lambert_w_mp_low_values.ipp
).
(A few of these pairs were checked against values computed by Wolfram Alpha to try to guard against mistakes; all those tested agreed to the penultimate decimal place, so they can be considered reliable to at least 98 decimal digits precision).
A macro BOOST_MATH_TEST_VALUE
was used to allow tests with any real type, both fundamental
(built-in) types and Boost.Multiprecision.
(This is necessary because fundamental
(built-in) types have a constructor from floating-point literals like
3.1459F, 3.1459 or 3.1459L whereas Boost.Multiprecision
types may lose precision unless constructed from decimal digits strings like
"3.1459").
The 100-decimal digits precision pairs were then used to assess the precision
of less-precise types, including Boost.Multiprecision
cpp_bin_float_quad
and cpp_bin_float_50
. static_cast
ing
from the high precision types should give the closest representable value of
the less-precise type; this is then be used to assess the precision of the
Lambert W algorithm.
Tests using confirm that over nearly all the range of z arguments, nearly all estimates are the nearest representable value, a minority are within 1 Unit in the last place (ULP) and only a very few 2 ULP.
For the range of z arguments over the range -0.35 to 0.5, a different algorithm
is used, but the same technique of evaluating reference values using a Boost.Multiprecision
cpp_dec_float_100
was used.
For extremely small z arguments, near zero, and those extremely near the singularity
at the branch point, precision can be much lower, as might be expected.
See source at: lambert_w_simple_examples.cpp test_lambert_w.cpp contains routine tests using Boost.Test. lambert_w_errors_graph.cpp generating error graphs.
A further method of testing over a wide range of argument z values was devised by Nick Thompson (cunningly also to test the recently written quadrature routines including Boost.Multiprecision !). These are definite integral formulas involving the W function that are exactly known constants, for example, LambertW0(1/(z²) == √(2π), see Definite Integrals. Some care was needed to avoid overflow and underflow as the integral function must evaluate to a finite result over the entire range.
The Lambert W has also been discussed in a Boost thread.
This also gives link to a prototype version by which also gives complex results
(x < -exp(-1)
, about -0.367879). Balazs
Cziraki 2016 Physicist, PhD student at Eotvos Lorand University, ELTE
TTK Institute of Physics, Budapest. has also produced a prototype C++ library
that can compute the Lambert W function for floating point and
complex number types. This is not implemented here but might be
completed in the future.
Initial guesses based on: