Initial problem

Let's say we want to convert 2D cartesian coordinates to 2D polar coordinates. This is a rather common exercise and will only require basic arithmetic and trigonometric functions.

Cartesian coordinates are usually represented by a pair of floating point values \((x,y)\). In a similar way, polar coordinate can be represented as a pair \((\rho,\theta)\) where \(\rho\) represents the length of the ray and \(\theta\) represents the angle with the X axis.

A bit of geometry in the unit circle leads to a relationship between the cartesian and the polar coordinates :

Therefore, we can derive the following C++ functions that perform this conversion:

#include <cmath>
 
float rho(float x, float y)
{
  return std::sqrt(x * x + y * y);
}
 
float theta(float x, float y)
{
  return std::atan2(y, x);
}

As expected, the code only requires arithmetic operations and some trigonometry. Fellow mathematicians in the audience may have some remarks on this code. For now, we will deal with the fact that it requires \(\theta\) to be in radians and that the accuracy of computing \(\rho\) this way is maybe sub-optimal. We will address those concerns later.

From scalar to SIMD using eve::wide

The next step is to work a SIMD version of those functions. When dealing with SIMD data types, one has to remember that a single operation has to be performed on multiple values. There, we will be working on multiple xand y to computes multiple \(\rho\) and \(\theta\).

SIMD instructions sets provides architecture-specific types for SIMD register along with ISA specific functions. Of course, handling such types and functions is highly non-portable. To overcome this issue, EVE provides a generic, architecture agnostic type for SIMD computation: eve::wide.

As EVE provides SIMD implementation of all basic operators and all the common math functions (among others), the conversion from scalar to SIMD is done by:

including the proper header files
replacing float with eve::wide<float>
calling function from namespace eve instead of std.

The SIMD version of our conversion functions are then given by:

#include <eve/wide.hpp>
#include <eve/module/core.hpp>
#include <eve/module/math.hpp>
 
eve::wide<float> rho(eve::wide<float> x, eve::wide<float> y)
{
  return  eve::sqrt(x * x + y * y);
}
 
eve::wide<float> theta(eve::wide<float> x, eve::wide<float> y)
{
  return eve::atan2(y, x);
}

eve::wide behaves like a regular type and can just be dropped as a replacement for any C++ arithmetic types.

Handling eve::wide

The remaining question is how to put data inside an instance of eve::wide so we can write tests for our SIMD cartesian to polar conversion function.

#include <iostream>
 
void check_polar()
{
  eve::wide<float> x1{4};
  eve::wide<float> y1{[](auto i, auto ) { return 1.5f*(i+1); }};
 
  std::cout << x1 << " " << y1 << " => " << rho(x1,y1) << "\n";
 
  float data[] = {1.5f, 3, 4.5f, 6, 7.5f, 9, 10.5f, 12, 13.5, 15, 16.5, 18, 19.5, 21, 22.5, 24};
  eve::wide<float> y2{&data[0]};
 
  std::cout << x1 << " " << y2 << " => " << theta(x1,y2) << "\n";
}

There, we construct three instances of eve::wide using three different constructors:

x1 is constructed from a single scalar value. The result is a register with a 4 in every lane.
y1 is constructed from a callable object taking two parameters. The first parameter is the current lane to compute, the second is the actual width of the register. The callable object is called once per lane with the corresponding lane index. Here, we use a lambda function that store 1.5 * (1 + i) in the \(i^{th}\) lane.
y2 is constructed from a pointer to float. This will perform a memory-to-register transfer and read eve::wide<float>::size() contiguous elements starting from &data[0].

An important detail is to note that neither the eve::wide type nor any of those constructors requires an a priori knowledge of the actual CPU architecture nor the actual number of lane in the register. This architecture-agnostic style is the main advantage of EVE as it guarantees the portability of the code.

Let's compile this small test using g++:

g++ intro-01.cpp -O3 -DNDEBUG -I/path/to/eve

eve

EVE Main Namespace.

Definition: abi.hpp:18

By default, if you're compiling on a Intel X86_64 machine, the SSE2 SIMD extension set will be used by the compiler. Under SSE2, SIMD register of float have four lanes. So the expected result is:

(4, 4, 4, 4) (1.5, 3, 4.5, 6) => (4.272, 5, 6.0208, 7.2111)

(4, 4, 4, 4) (1.5, 3, 4.5, 6) => (0.358771, 0.643501, 0.844154, 0.982794)

But what if we want to use a SIMD extension with larger registers, say AVX2 for example? In this case, you need to pass the appropriate option to your compiler. For g++, this means using -mavx2.

g++ intro-01.cpp -O3 -DNDEBUG -mavx2 -I/path/to/eve

As AVX2 registers are twice as big, the output should now contains 8 values:

(4, 4, 4, 4, 4, 4, 4, 4) (1.5, 3, 4.5, 6, 7.5, 9, 10.5, 12) => (4.272, 5, 6.0208, 7.2111, 8.5, 9.84886, 11.2361, 12.6491)

(4, 4, 4, 4, 4, 4, 4, 4) (1.5, 3, 4.5, 6, 7.5, 9, 10.5, 12) => (0.358771, 0.643501, 0.844154, 0.982794, 1.08084, 1.15257, 1.20682, 1.24905)

As expected, EVE code scales naturally with the selected architecture at compile time.

Mathematical Epilogue

As stated earlier, we are currently using a slightly brutal computation for \(\rho\). Indeed, if the magnitudes of x and y vary greatly, we may end up with overflow or cancellation, both classic IEEE 754 floating-point issues.

Computing the square root of the sum of the squares of x and y, without undue overflow or underflow at intermediate stages of the computation is the job of a very specific function: std::hypot. Quite handily, EVE also proposes a SIMD implementation via eve::hypot.

A better implementation is then given as:

#include <eve/wide.hpp>
#include <eve/module/math.hpp>
 
eve::wide<float> better_rho(eve::wide<float> x, eve::wide<float> y)
{
  return  eve::hypot(x, y);
}

Conclusion

In this first tutorial, we managed to:

get familiar with eve::wide, the main SIMD enabling type from EVE
take a simple scalar operation and turn it into a SIMD function using eve::wide
compile the code for various platform and checks the results.
be aware of the architecture agnostic style of coding with eve::wide

In the next tutorial, we will use this SIMD function as it should by applying it over multiple data.

	E.V.E v2023.02.15