E.V.E
v2023.02.15
 
Loading...
Searching...
No Matches
Solving quadratic equations

This example demonstrates solving multiple quadratic equations in parallel using EVE.

For each quadratic equation \(ax^2 + bx + c = 0\), we compute the discriminant \(b^2 - 4ac\) and then calculate the solution using the maximal absolute value root by the usual quadratic formula: \( x_1 = \frac{-b -\mathbf{sign}(b)\sqrt{b^2 - 4ac}}{2a}\) and the second root \( x_2\), using the fact that \(\frac ca\) is equal to \(x_1*x_2\).

This is the stable way for computing the roots that respects the relative errors and is ok even if \(b^2 - 4ac \approx b^2\).

The computation must returns two Nans if the roots are not real or all coefficients are zero and one Nan if the first parameter is zero. (For complex roots use kyosu library which is fit to complex numbers scalar or SIMD computations and more).

We'll solve 11 different quadratic equations simultaneously. (11 just to show that the number of equations need not be a multiple of the SIMD current harware vector size).

But first let us see the scalar case:

We suppose that the quadratic coefficients are given in variables a, b and c having a floating type float or double;

The scalar code (without EVE) can look like this:

#include <iostream>
#include <array>
#include <cmath>
int main()
{
auto print = [](std::string name, auto v){ //utility to print the results
std::cout << name << " = {";
for(std::size_t i=0; i < v.size(); ++i) std::cout << v[i] << ((i != v.size()-1) ? ", ":"}");
std::cout << std::endl;
};
auto quadratic = [](auto const& a, auto const& b, auto const& c){
if (a == 0) return std::pair(-c/b, a/a); // second root is nan
auto delta = b*b-4*a*c; // compute delta
auto rmax = (-b-(b > 0 ? 1 : 0)*std::sqrt(delta))/(2*a); // compute the root with maximal absolute value
auto rmin = (rmax == 0 ? 0 : c/(rmax*a)); // compute the other root
if (rmin > rmax) std::swap(rmin, rmax); // order by increasing values
return std::pair(rmin, rmax); // return properly typed values
};
using a_t = std::array<double, 11>;
a_t a{5.0, 12.0, 6.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0e-20, 0.0, 1.0};
a_t b{3.0, 1.0, 4.0, -2.0, 2.0, 1.0, 1.0, 1.0, 1.0e20, 1.0, 2.0};
a_t c{-1.0, -5.0, -6.0, -6.0, 5.0, 30.0, 35.0, -40.0, -1.0, -1.0, 1.0};
a_t rmin, rmax;
for(std::size_t i=0; i < 11; ++i){
auto p = quadratic(a[i], b[i], c[i]);
rmin[i] = p.first;
rmax[i] = p.second;
}
print("a ", a);
print("b ", b);
print("c ", c);
print("rmin", rmin);
print("rmax", rmax);
}

What have we to do for allowing treatment of multiple quadratic equations in an SIMD fashion?

Let us give the complete program and comment the steps later.

#include <eve/module/core.hpp>
#include <eve/module/algo.hpp>
#include <iostream>
#include <array>
int main()
{
auto print = [](std::string name, auto v){ //utility to print the results
std::cout << name << " = {";
for(std::size_t i=0; i < v.size(); ++i) std::cout << v[i] << ((i != v.size()-1) ? ", ":"}");
std::cout << std::endl;
};
auto quadratics = [](auto & result, auto const& aa, auto const& bb, auto const& cc){
// result will be a zip of two arrays
auto quad_it = [](auto e){ //this is the EVE version of the scalar resolution algorithm above
auto [aaa, bbb, ccc] = e; // get the coefficients
auto delta = eve::fnma(4*aaa, ccc, eve::sqr(bbb)); // compute delta (fnma is the fma version of -a*b+c
auto rmax = eve::fam(-bbb, -eve::sign(bbb), eve::sqrt(delta))/(2*aaa); // compute the root with maximal abslute value
auto rmin = eve::if_else(eve::is_eqz(rmax), eve::zero, ccc/(rmax*aaa));// compute the other root (is_eqz(rmax) test for zeroes. Could be `rmax == 0`)
eve::swap_if(rmin > rmax, rmin, rmax); // order by increasing values
auto iseqza = eve::is_eqz(aaa);
if (eve::any(iseqza)) // dont do if no element is zero
{
rmin = eve::if_else(eve::is_eqz(aaa), -ccc/bbb, rmin); // first degree aaa is zero first root is -ccc/bbb
rmax = eve::if_else(eve::is_eqz(aaa), eve::nan, rmax); // first degree aaa is zero second root is nan
}
return eve::zip(rmin, rmax); // return properly typed values (always use zip to return multiple values)
};
// the algo transform_to takes care of using simd chunks to use quad_it
// the input vector size need not be a multiple of (the automagically chosen) best SIMD vector size
eve::algo::transform_to(eve::algo::views::zip(aa, bb, cc), result, quad_it);
};
using a_t = std::array<double, 11>;
a_t a{5.0, 12.0, 6.0, 7.0, 1.0, 1.0, 1.0, 1.0, 1.0e-20, 0.0, 1.0};
a_t b{3.0, 1.0, 4.0, -2.0, 2.0, 1.0, 1.0, 1.0, 1.0e20, 1.0, 2.0};
a_t c{-1.0, -5.0, -6.0, -6.0, 5.0, 30.0, 35.0, -40.0, -1.0, -1.0, 1.0};
a_t rmin, rmax;
auto r = eve::views::zip(rmin, rmax); //prepare container view for the results
quadratics(r, a, b, c);
print("a ", a);
print("b ", b);
print("c ", c);
print("rmin", rmin);
print("rmax", rmax);
}
constexpr auto sign
elementwise_callable object computing the sign of the parameter.
Definition sign.hpp:76
constexpr auto sqrt
Computes the elementwise square root of the parameter.
Definition sqrt.hpp:81
constexpr auto sqr
Computes the square of the parameter.
Definition sqr.hpp:97
constexpr auto nan
Computes the IEEE quiet NaN constant.
Definition nan.hpp:67
constexpr auto zero
Computes the constant 0.
Definition zero.hpp:78
constexpr auto fnma
strict_elementwise_callable computing the fused multiply add of its three parameters.
Definition fnma.hpp:89
constexpr auto fam
strict_elementwise_callable computing the fused add multiply of its three parameters.
Definition fam.hpp:86
constexpr auto if_else
Select value based on conditional mask or values.
Definition if_else.hpp:197
void swap_if(Mask const &mask, Value &lhs, Value &rhs) noexcept
Conditional swap.
Definition swap_if.hpp:39
constexpr auto is_eqz
elementwise callable returning a logical true if and only if the element value is zero....
Definition is_eqz.hpp:74
constexpr auto any
Computes a bool value which is true if and only if one or more elements of x evaluates to true.
Definition any.hpp:96
constexpr auto zip
Callable for SoA value constructions.
Definition zip.hpp:85

If we suppose that the coefficients are still given in three std::array (any contiguous range type can do the job) a, b and c. We must read the three arrays by chunks in simd vectors, apply to the triple of simd vectors a resolution algorithm then store the results.

  • EVE core has the material to write the simd resolution.
  • EVE algo has the material to iterate on scalar datas in an SIMD way using the SIMD resolution.
  • The delicate pass is to transmit the input datas and recover the results in a proper way: this is mainly the eve::zip work, that allow to jump from aos to soa.
    • r is a view to the pair of arrays containing the results (zip is mandatory to define it)
    • We give as input the zip(aa, bb, cc) triple of datas arrays

Now the lambda quad_it do the same kind of work as the primary scalar version, but in an simd way. Differences are:

  • The lambda must have an unique paramater to fit the transform_to requirements. This parameter is bound to the three simd vectors by auto [aaa, bbb, ccc] = e;
  • The routine to solve the equation is also to be a bit changed :
    • we slightly optimize the computations by using fnma and fma (which turn to normal computations using *+- if SIMD intrinsics are not available), but this is not mandatory.
    • The use of eve::if_else is necessary because we have to compute the two branches has the zero equality can arise in some but not all lanes of rmax.
    • the swap_if line is only necessary if we want roots ordered by increasing order.
    • The case where some aaaare zero is deported at the end and treated differently.
    • To return the appropriate data to be used by eve::transform_to the call to eve::zip is required.

Conclusion

We have solved using SIMD parallelism a bunch of quadratic equation, without explicit knowledge of the current architecture.