...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
A second example compares four generalized nth-root finding algorithms for
various n-th roots (5, 7 and 13) of a single value 28.0, for four floating-point
types, float
, double
, long
double
and a Boost.Multiprecision
type cpp_bin_float_50
. In
each case the target accuracy was set using our "recommended" accuracy
limits (or at least limits that make a good starting point - which is likely
to give close to full accuracy without resorting to unnecessary iterations).
Function |
Precision Requested |
---|---|
TOMS748 |
numeric_limits<T>::digits - 2 |
Newton |
floor(numeric_limits<T>::digits * 0.6) |
Halley |
floor(numeric_limits<T>::digits * 0.4) |
Schröder |
floor(numeric_limits<T>::digits * 0.4) |
Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code root_n_finding_algorithms.cpp.
The timing uncertainty (especially using MSVC) is at least 5% of normalized time 'Norm'.
To pick out the 'best' and 'worst' algorithms are highlighted in blue and red. More than one result can be 'best' when normalized times are indistinguishable within the uncertainty.
Fraction of full accuracy 1
Table 10.3. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
7 |
457 |
2.00 |
0 |
11 |
860 |
3.54 |
1 |
11 |
806 |
3.02 |
1 |
12 |
226875 |
8.11 |
0 |
||||
Newton |
3 |
228 |
1.00 |
0 |
4 |
243 |
1.00 |
-1 |
4 |
298 |
1.12 |
-1 |
6 |
27968 |
1.00 |
0 |
||||
Halley |
2 |
250 |
1.10 |
0 |
3 |
268 |
1.10 |
0 |
3 |
267 |
1.00 |
0 |
4 |
52812 |
1.89 |
0 |
||||
Schröder |
2 |
256 |
1.12 |
0 |
3 |
271 |
1.12 |
-1 |
3 |
270 |
1.01 |
-1 |
4 |
61406 |
2.20 |
0 |
Table 10.4. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
825 |
3.06 |
1 |
15 |
1145 |
4.06 |
2 |
15 |
1159 |
4.17 |
2 |
14 |
295781 |
8.12 |
0 |
||||
Newton |
5 |
270 |
1.00 |
0 |
6 |
282 |
1.00 |
0 |
6 |
278 |
1.00 |
0 |
8 |
36406 |
1.00 |
0 |
||||
Halley |
4 |
303 |
1.12 |
0 |
5 |
329 |
1.17 |
0 |
5 |
335 |
1.21 |
0 |
6 |
78281 |
2.15 |
0 |
||||
Schröder |
5 |
340 |
1.26 |
0 |
6 |
432 |
1.53 |
0 |
6 |
367 |
1.32 |
0 |
7 |
85156 |
2.34 |
0 |
Table 10.5. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X86_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
714 |
3.16 |
-2 |
14 |
909 |
4.19 |
2 |
14 |
793 |
3.69 |
2 |
17 |
211718 |
9.28 |
2 |
||||
Newton |
6 |
226 |
1.00 |
0 |
7 |
217 |
1.00 |
0 |
7 |
215 |
1.00 |
0 |
9 |
22812 |
1.00 |
0 |
||||
Halley |
4 |
262 |
1.16 |
-1 |
5 |
260 |
1.20 |
0 |
5 |
260 |
1.21 |
0 |
6 |
40781 |
1.79 |
0 |
||||
Schröder |
6 |
332 |
1.47 |
0 |
7 |
314 |
1.45 |
0 |
7 |
310 |
1.44 |
0 |
8 |
67187 |
2.95 |
0 |
Fraction of full accuracy 1
Table 10.6. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
7 |
239 |
1.50 |
0 |
11 |
451 |
2.53 |
1 |
11 |
439 |
2.49 |
1 |
12 |
90312 |
7.51 |
0 |
||||
Newton |
3 |
159 |
1.00 |
0 |
4 |
178 |
1.00 |
-1 |
4 |
176 |
1.00 |
-1 |
6 |
12031 |
1.00 |
0 |
||||
Halley |
2 |
168 |
1.06 |
0 |
3 |
203 |
1.14 |
0 |
3 |
198 |
1.13 |
0 |
4 |
20937 |
1.74 |
0 |
||||
Schröder |
2 |
173 |
1.09 |
0 |
3 |
206 |
1.16 |
-1 |
3 |
203 |
1.15 |
-1 |
4 |
26250 |
2.18 |
0 |
Table 10.7. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
385 |
2.19 |
1 |
15 |
635 |
3.13 |
2 |
15 |
621 |
3.17 |
2 |
14 |
114843 |
6.81 |
0 |
||||
Newton |
5 |
176 |
1.00 |
0 |
6 |
203 |
1.00 |
0 |
6 |
196 |
1.00 |
0 |
8 |
16875 |
1.00 |
0 |
||||
Halley |
4 |
209 |
1.19 |
0 |
5 |
254 |
1.25 |
0 |
5 |
246 |
1.26 |
0 |
6 |
32343 |
1.92 |
0 |
||||
Schröder |
5 |
223 |
1.27 |
0 |
6 |
273 |
1.34 |
0 |
6 |
275 |
1.40 |
0 |
7 |
45156 |
2.68 |
0 |
Table 10.8. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_AVX
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
467 |
2.42 |
-2 |
14 |
648 |
3.06 |
2 |
14 |
640 |
2.99 |
2 |
17 |
170000 |
8.85 |
2 |
||||
Newton |
6 |
193 |
1.00 |
0 |
7 |
212 |
1.00 |
0 |
7 |
214 |
1.00 |
0 |
9 |
19218 |
1.00 |
0 |
||||
Halley |
4 |
209 |
1.08 |
-1 |
5 |
256 |
1.21 |
0 |
5 |
250 |
1.17 |
0 |
6 |
32656 |
1.70 |
0 |
||||
Schröder |
6 |
248 |
1.28 |
0 |
7 |
306 |
1.44 |
0 |
7 |
298 |
1.39 |
0 |
8 |
53437 |
2.78 |
0 |
Fraction of full accuracy 1
Table 10.9. 5th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
7 |
206 |
2.24 |
0 |
11 |
460 |
4.04 |
1 |
9 |
554 |
4.40 |
0 |
12 |
57656 |
8.39 |
0 |
||||
Newton |
3 |
92 |
1.00 |
0 |
4 |
114 |
1.00 |
-1 |
5 |
126 |
1.00 |
0 |
6 |
6875 |
1.00 |
0 |
||||
Halley |
2 |
106 |
1.15 |
0 |
3 |
134 |
1.18 |
0 |
3 |
178 |
1.41 |
0 |
4 |
12500 |
1.82 |
0 |
||||
Schröder |
2 |
126 |
1.37 |
0 |
3 |
143 |
1.25 |
-1 |
3 |
198 |
1.57 |
0 |
4 |
15312 |
2.23 |
0 |
Table 10.10. 7th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
345 |
2.09 |
1 |
15 |
615 |
3.14 |
2 |
13 |
875 |
3.98 |
0 |
14 |
70937 |
7.32 |
0 |
||||
Newton |
5 |
165 |
1.00 |
0 |
6 |
196 |
1.00 |
0 |
7 |
220 |
1.00 |
0 |
8 |
9687 |
1.00 |
0 |
||||
Halley |
4 |
193 |
1.17 |
0 |
5 |
239 |
1.22 |
0 |
5 |
298 |
1.35 |
0 |
6 |
19062 |
1.97 |
0 |
||||
Schröder |
5 |
217 |
1.32 |
0 |
6 |
270 |
1.38 |
0 |
6 |
367 |
1.67 |
0 |
7 |
27343 |
2.82 |
0 |
Table 10.11. 11th root(28) for float, double, long double and cpp_bin_float_50 types, using _X64_SSE2
float |
double |
long d |
cpp50 |
|||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algo |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
Its |
Times |
Norm |
Dis |
||||
TOMS748 |
12 |
412 |
2.15 |
-2 |
14 |
646 |
2.96 |
2 |
14 |
1054 |
4.22 |
1 |
17 |
107187 |
9.53 |
2 |
||||
Newton |
6 |
192 |
1.00 |
0 |
7 |
218 |
1.00 |
0 |
7 |
250 |
1.00 |
0 |
9 |
11250 |
1.00 |
0 |
||||
Halley |
4 |
200 |
1.04 |
-1 |
5 |
243 |
1.11 |
0 |
5 |
345 |
1.38 |
0 |
6 |
19687 |
1.75 |
0 |
||||
Schröder |
6 |
254 |
1.32 |
0 |
7 |
321 |
1.47 |
0 |
7 |
471 |
1.88 |
0 |
8 |
33281 |
2.96 |
0 |
Some tentative conclusions can be drawn from this limited exercise.
double
.
double
allows convergence to the final
exact result in one or two iterations. So in this contrived example,
crudely dividing the exponent by N for a 'guess', it would be far better
to use a pow<double>
or , if more precise pow<long double>
,
function to estimate a 'guess'. The limitation of this tactic is that
the range of possible (exponent) values may be less than the multiprecision
type.
Clearly, your mileage will vary, but in summary, Newton-Raphson iteration seems the first choice of algorithm, and effort to find a good 'guess' the first speed-up target, especially for Boost.Multiprecision. And of course, compiler optimisation is crucial for speed.