E.V.E
v2023.02.15
 
Frequency Scaling.

In SIMD programming there is a known issue with processor frequency scaling: to avoid overheating, some processors lower their clock frequency when working with wider registers. This can happen in many situations, but it is mostly a noticeable problem for 64-byte registers on Intel AVX512 CPUs.

This is why, for example, libc uses at most 32-byte registers: sure, you might speed up strlen somewhat, but then all the code that runs after it will be slower.

For big datasets, the price of the lower frequency is often outweighed by processing more numbers in one operation, and speedups of 15% are not unheard of.

For us, this led to a dilemma in the API design: if the user is on an AVX512 system, they most likely expect the register to be 64 bytes. But we suspect this is not what they actually want. So we decided that eve::wide on AVX512 is 64 bytes by default, but algorithms use 32 bytes by default. If you want an algorithm to use 64 bytes, you can pass the eve::algo::allow_frequency_scaling trait. There are also the typedefs nofs_wide and nofs_logical, where nofs stands for "no frequency scaling".
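The policy described above can be modeled in a few lines of standalone C++. This is a minimal sketch of the rule only, not EVE's implementation: the names isa, full_register_bytes, algorithm_register_bytes and lanes are hypothetical stand-ins invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration; these are not EVE types.
enum class isa { sse2, avx2, avx512 };

// Full register width in bytes for each instruction set.
constexpr std::size_t full_register_bytes(isa target)
{
  return target == isa::avx512 ? 64
       : target == isa::avx2   ? 32
                               : 16;
}

// Width an algorithm would pick under the rule described above:
// cap at 32 bytes on AVX512 unless frequency scaling is explicitly allowed.
constexpr std::size_t algorithm_register_bytes(isa target, bool allow_frequency_scaling)
{
  return (target == isa::avx512 && !allow_frequency_scaling) ? 32
                                                             : full_register_bytes(target);
}

// Number of lanes (the cardinal) for element type T at a given register width.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bytes)
{
  return register_bytes / sizeof(T);
}

// The plain register type keeps the full 64 bytes on AVX512 ...
static_assert(lanes<std::int8_t>(full_register_bytes(isa::avx512)) == 64, "full width");
// ... while algorithms default to 32 bytes ...
static_assert(lanes<std::int8_t>(algorithm_register_bytes(isa::avx512, false)) == 32, "capped");
// ... unless the caller opts in to frequency scaling.
static_assert(lanes<std::int8_t>(algorithm_register_bytes(isa::avx512, true)) == 64, "opt-in");
// Other instruction sets are unaffected by the flag.
static_assert(lanes<std::int32_t>(algorithm_register_bytes(isa::avx2, false)) == 8, "avx2");
```

The point of the split is that the opt-in is local to a single algorithm call, so one hot loop over a large dataset can use 64-byte registers without changing the default for the rest of the program.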

If you would like the default register size to be 64 bytes on AVX512, for example because you are on AMD where this problem does not exist, or because you work in a domain where that makes sense, you can define EVE_AVX512_DEFAULT_64_BYTES in your compiler options.
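For readers unfamiliar with configuration macros, the sketch below shows the general mechanism of a compile-time switch set from the compiler options. It is an illustration only: DEMO_AVX512_DEFAULT_64_BYTES, demo_avx512_algo_bytes and demo_uses_full_width are hypothetical names that merely mimic the role of EVE_AVX512_DEFAULT_64_BYTES, and how EVE consumes its own macro internally is not shown here.

```cpp
#include <cstddef>

// DEMO_AVX512_DEFAULT_64_BYTES is a hypothetical macro used only in this sketch.
// With GCC or Clang it would be set on the command line as -DDEMO_AVX512_DEFAULT_64_BYTES,
// so that every translation unit sees the same choice.
#ifdef DEMO_AVX512_DEFAULT_64_BYTES
constexpr std::size_t demo_avx512_algo_bytes = 64;  // opt in to full-width registers everywhere
#else
constexpr std::size_t demo_avx512_algo_bytes = 32;  // conservative default: avoid frequency scaling
#endif

// Helper that reports which default was selected at compile time.
constexpr bool demo_uses_full_width()
{
  return demo_avx512_algo_bytes == 64;
}

static_assert(demo_avx512_algo_bytes == 32 || demo_avx512_algo_bytes == 64,
              "the switch selects one of the two supported widths");
```

Setting such a macro in the build configuration rather than in a source file matters: if only some translation units saw the definition, different parts of the program could disagree about the default width.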

Note
Other than for AVX512 on Intel, we always use the maximum register width, since we expect the compiler to do so anyway and this is usually accepted. If you want to set a specific cardinal for an algorithm, you can always use eve::algo::force_cardinal.
#include <eve/module/core.hpp>
#include <eve/module/algo.hpp>

#include <cstdint>
#include <span>

#ifndef EVE_AVX512_DEFAULT_64_BYTES
// On AVX512, the "no frequency scaling" cardinal is capped at 32 bytes,
// while the expected cardinal still covers the full 64-byte register.
static_assert(eve::current_api != eve::avx512 || eve::nofs_cardinal_v<std::int8_t> == 32);
static_assert(eve::current_api != eve::avx512 || eve::expected_cardinal_v<std::int8_t> == 64);
#endif // EVE_AVX512_DEFAULT_64_BYTES

// Default behaviour: the algorithm avoids frequency scaling.
auto find_small_array(std::span<const int> a)
  -> std::span<const int>::iterator
{
  return eve::algo::find_if(a, [](eve::nofs_wide<int> x) { return x < 0; });
}

// Opt in to full-width registers for a large input.
auto find_large_array(std::span<const int> a)
  -> std::span<const int>::iterator
{
  return eve::algo::find_if[eve::algo::allow_frequency_scaling](a,
    [](eve::wide<int> x) { return x < 0; });
}

// Force a specific cardinal: 4 ints per register.
auto find_use_4_ints(std::span<const int> a)
  -> std::span<const int>::iterator
{
  return eve::algo::find_if[eve::algo::force_cardinal<4>](a,
    [](eve::wide<int, eve::fixed<4>> x) { return x < 0; });
}