This example demonstrates solving multiple quadratic equations in parallel using EVE.

For each quadratic equation \(ax^2 + bx + c = 0\), we compute the discriminant \(b^2 - 4ac\) and then calculate the solution using the maximal absolute value root by the usual quadratic formula: \( x_1 = \frac{-b -\mathbf{sign}(b)\sqrt{b^2 - 4ac}}{2a}\) and the second root \( x_2\), using the fact that \(\frac ca\) is equal to \(x_1*x_2\).

This is the stable way for computing the roots that respects the relative errors and is ok even if \(b^2 - 4ac \approx b^2\).

The computation must returns two Nans if the roots are not real or all coefficients are zero and one Nan if the first parameter is zero. (For complex roots use kyosu library which is fit to complex numbers scalar or SIMD computations and more).

We'll solve 11 different quadratic equations simultaneously. (11 just to show that the number of equations need not be a multiple of the SIMD current harware vector size).

But first let us see the scalar case:

We suppose that the quadratic coefficients are given in variables a, b and c having a floating type float or double;

The scalar code (without EVE) can look like this:

#include <iostream>
#include <array>
#include <cmath>
 
int main()
{
  auto print = [](std::string name, auto v){ //utility to print the results
    std::cout << name << " =  {";
    for(std::size_t i=0; i <  v.size(); ++i) std::cout << v[i] << ((i != v.size()-1) ? ", ":"}");
    std::cout << std::endl;
  };
 
  auto quadratic = [](auto const& a, auto const& b, auto const& c){
    if (a == 0) return std::pair(-c/b, a/a);                 // second root is nan
    auto delta = b*b-4*a*c;                                  // compute delta
    auto rmax = (-b-(b > 0 ? 1 : 0)*std::sqrt(delta))/(2*a); // compute the root with maximal absolute value
    auto rmin = (rmax == 0 ? 0 : c/(rmax*a));                // compute the other root
    if (rmin > rmax) std::swap(rmin, rmax);                  // order by increasing values
    return std::pair(rmin, rmax);                            // return properly typed values
  };
 
  using a_t = std::array<double, 11>;
 
  a_t a{5.0,  12.0,  6.0,  7.0, 1.0, 1.0,   1.0,   1.0, 1.0e-20, 0.0, 1.0};
  a_t b{3.0,   1.0,  4.0, -2.0, 2.0, 1.0,   1.0,   1.0, 1.0e20,  1.0, 2.0};
  a_t c{-1.0, -5.0, -6.0, -6.0, 5.0, 30.0, 35.0, -40.0, -1.0,   -1.0, 1.0};
  a_t rmin, rmax;
 
  for(std::size_t i=0; i < 11; ++i){
    auto p =  quadratic(a[i], b[i], c[i]);
    rmin[i] = p.first;
    rmax[i] = p.second;
  }
  print("a   ", a);
  print("b   ", b);
  print("c   ", c);
  print("rmin", rmin);
  print("rmax", rmax);
}

What have we to do for allowing treatment of multiple quadratic equations in an SIMD fashion?

Let us give the complete program and comment the steps later.

#include <eve/module/core.hpp>
#include <eve/module/algo.hpp>
#include <iostream>
#include <array>
 
int main()
{
  auto print = [](std::string name, auto v){ //utility to print the results
    std::cout << name << " =  {";
    for(std::size_t i=0; i <  v.size(); ++i) std::cout << v[i] << ((i != v.size()-1) ? ", ":"}");
    std::cout << std::endl;
  };
 
  auto quadratics = [](auto & result,  auto const& aa, auto const& bb, auto const& cc){
    // result will be a zip of two arrays
    auto quad_it = [](auto e){                                               //this is the EVE version of the scalar resolution algorithm above
      auto [aaa, bbb, ccc] = e;                                              // get the coefficients
      auto delta = eve::fnma(4*aaa, ccc, eve::sqr(bbb));                     // compute delta (fnma is the fma version of -a*b+c
      auto rmax = eve::fam(-bbb, -eve::sign(bbb), eve::sqrt(delta))/(2*aaa); // compute the root with maximal abslute value
      auto rmin = eve::if_else(eve::is_eqz(rmax), eve::zero, ccc/(rmax*aaa));// compute the other root (is_eqz(rmax) test for zeroes. Could be `rmax == 0`)
      eve::swap_if(rmin > rmax, rmin, rmax);                                 // order by increasing values
      auto iseqza = eve::is_eqz(aaa);
      if (eve::any(iseqza))                                                  // dont do if no element is zero
      {
        rmin = eve::if_else(eve::is_eqz(aaa), -ccc/bbb, rmin);               // first degree aaa is zero first root is -ccc/bbb
        rmax = eve::if_else(eve::is_eqz(aaa), eve::nan, rmax);               // first degree aaa is zero second root is nan
      }
      return eve::zip(rmin, rmax);                                           // return properly typed values (always use zip to return multiple values)
    };
    // the algo transform_to takes care of using simd chunks to use quad_it
    // the input vector size need not be a multiple of (the automagically chosen) best SIMD vector size
    eve::algo::transform_to(eve::algo::views::zip(aa, bb, cc), result, quad_it);
  };
 
  using a_t = std::array<double, 11>;
 
  a_t a{5.0,  12.0,  6.0,  7.0, 1.0, 1.0,   1.0,   1.0, 1.0e-20, 0.0, 1.0};
  a_t b{3.0,   1.0,  4.0, -2.0, 2.0, 1.0,   1.0,   1.0, 1.0e20,  1.0, 2.0};
  a_t c{-1.0, -5.0, -6.0, -6.0, 5.0, 30.0, 35.0, -40.0, -1.0,   -1.0, 1.0};
  a_t rmin, rmax;
  auto r = eve::views::zip(rmin, rmax); //prepare container view for the results
  quadratics(r, a, b, c);
  print("a   ", a);
  print("b   ", b);
  print("c   ", c);
  print("rmin", rmin);
  print("rmax", rmax);
}

If we suppose that the coefficients are still given in three std::array (any contiguous range type can do the job) a, b and c. We must read the three arrays by chunks in simd vectors, apply to the triple of simd vectors a resolution algorithm then store the results.

EVE core has the material to write the simd resolution.
EVE algo has the material to iterate on scalar datas in an SIMD way using the SIMD resolution.
The delicate pass is to transmit the input datas and recover the results in a proper way: this is mainly the eve::zip work, that allow to jump from aos to soa.
- r is a view to the pair of arrays containing the results (zip is mandatory to define it)
- We give as input the zip(aa, bb, cc) triple of datas arrays

Now the lambda quad_it do the same kind of work as the primary scalar version, but in an simd way. Differences are:

The lambda must have an unique paramater to fit the transform_to requirements. This parameter is bound to the three simd vectors by auto [aaa, bbb, ccc] = e;
The routine to solve the equation is also to be a bit changed :
- we slightly optimize the computations by using fnma and fma (which turn to normal computations using *+- if SIMD intrinsics are not available), but this is not mandatory.
- The use of eve::if_else is necessary because we have to compute the two branches has the zero equality can arise in some but not all lanes of rmax.
- the swap_if line is only necessary if we want roots ordered by increasing order.
- The case where some aaaare zero is deported at the end and treated differently.
- To return the appropriate data to be used by eve::transform_to the call to eve::zip is required.

Conclusion

We have solved using SIMD parallelism a bunch of quadratic equation, without explicit knowledge of the current architecture.

	E.V.E v2023.02.15