compress

callable_compress_ eve::compress = {}

inline constexpr

Defined in Header

#include <eve/module/core.hpp>

Note

This is a very low-level function; most likely you are looking for eve::compress_copy or eve::compress_store.

FIX-1647: eve::compress doesn't support wide<tuple> yet.
The mask can be any logical type with the same cardinal as the value being compressed.

Compression in SIMD means moving the selected elements to the front of a simd_value. Unfortunately, this cannot be done efficiently for every simd_value on every platform, so the operation splits the input into chunks for which it can.

The function performs the following steps:
1) Splits the simd_value and the mask into chunks that can each be processed in one go. The chunk size depends on which instructions are available.
2) Shuffles each chunk so that the selected elements (mask == true) move to the front. The tail of the resulting value is unspecified. [a, b, c, d], (false, true, false, true) -> [b, d, _, _]
3) Computes, for each chunk, how many elements are selected (2 in the example above).
4) Puts each shuffled chunk and its count into a kumi::tuple<simd_value, std::ptrdiff_t>. TODO: there is a bug where the count is sometimes an int and not std::ptrdiff_t.
5) Combines those per-chunk tuples into another tuple.
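The steps above can be pictured with a scalar reference model. This is only a sketch of the observable behaviour, not EVE code: compress_chunks_model, chunk_model and the explicit chunk_size parameter are hypothetical names introduced for illustration, and the real chunk size is chosen by the library for the target instruction set.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for kumi::tuple<simd_value, std::ptrdiff_t>.
template <typename T>
using chunk_model = std::pair<std::vector<T>, std::ptrdiff_t>;

// Scalar model of the five steps: split, shuffle selected elements to the
// front of each chunk, count them, pair chunk with count, collect the pairs.
template <typename T>
std::vector<chunk_model<T>>
compress_chunks_model(const std::vector<T>& values,
                      const std::vector<bool>& mask,
                      std::size_t chunk_size)
{
  assert(values.size() == mask.size());  // mask has the same cardinal
  assert(chunk_size != 0 && values.size() % chunk_size == 0);

  std::vector<chunk_model<T>> result;
  for (std::size_t first = 0; first != values.size(); first += chunk_size)
  {
    // The tail is value-initialised in this model only;
    // the real operation leaves it unspecified.
    std::vector<T> shuffled(chunk_size);
    std::size_t    count = 0;
    for (std::size_t i = first; i != first + chunk_size; ++i)
      if (mask[i]) shuffled[count++] = values[i];

    result.emplace_back(std::move(shuffled), static_cast<std::ptrdiff_t>(count));
  }
  return result;
}
```

For 16 elements and a chunk size of 8, the model returns two (chunk, count) pairs, which mirrors the shape of the nested tuple described in step 5.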

List of people whose work was instrumental for building this:

@aqrit user on Stack Overflow
Peter Cordes

Throughout the code of compress there are references to what was taken from where as well as explanations.

Callable Signatures

namespace eve
{
   template <simd_value T, logical_simd_value L>
   auto compress(T x, L m);                       // (1)
 
   template <relative_conditional_expr C,
             simd_value T,
             logical_simd_value L>
   auto compress[C ignore](T x, L m);             // (2)
}

Parameters

x - simd_value to compress
m - mask which marks selected elements as true
ignore - optional eve::relative_conditional_expr, passed in []. Ignored elements are treated as not selected.
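To illustrate "ignored elements are treated as not selected", here is a scalar sketch of what eve::ignore_first(n) does to the mask before compression. The helpers ignore_first_model and count_selected are hypothetical names for this illustration and are not part of EVE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model: ignoring the first n lanes is equivalent to forcing
// those lanes of the mask to false before compressing.
inline std::vector<bool> ignore_first_model(std::vector<bool> mask, std::size_t n)
{
  assert(n <= mask.size());
  std::fill_n(mask.begin(), n, false);
  return mask;
}

// Number of lanes that remain selected.
inline std::ptrdiff_t count_selected(const std::vector<bool>& mask)
{
  return std::count(mask.begin(), mask.end(), true);
}
```

With the mask (true, true, false, true, true, false, true, true), six lanes are selected; after ignoring the first lane, five remain.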

Return value

kumi::tuple<kumi::tuple<simd_value, std::ptrdiff_t>, ...> - tuple of compressed chunks, constructed as described earlier.
The operation is performed conditionally.
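A scalar sketch of how such a result is typically consumed: each chunk is written in full, and the output position then advances only by that chunk's count, so the next chunk overwrites the unspecified tail. The name store_chunks_model and the std::pair / std::vector stand-ins for kumi::tuple and wide are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Scalar model: `chunks` stands in for the returned tuple of
// (compressed chunk, count) pairs. Returns one past the last kept element.
// `out` must have room for a whole chunk at every write position.
template <typename T>
T* store_chunks_model(
    const std::vector<std::pair<std::vector<T>, std::ptrdiff_t>>& chunks, T* out)
{
  for (const auto& [shuffled, count] : chunks)
  {
    assert(count >= 0 && static_cast<std::size_t>(count) <= shuffled.size());
    std::copy(shuffled.begin(), shuffled.end(), out);  // whole chunk, tail included
    out += count;                                      // keep only the selected prefix
  }
  return out;
}
```

Because every count is at most the chunk size, an output buffer as large as the input is always enough for these full-chunk writes.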

Example

#include <eve/module/core.hpp>
#include <tts/tts.hpp>

#include <algorithm>
#include <array>
#include <concepts>
#include <cstddef>
#include <cstdint>
 
// A template, so that the `if constexpr` branch is only instantiated when its condition holds
template <typename T>
void show_return_type(T)
{
  if constexpr (std::same_as<T, std::int8_t> && eve::current_api == eve::sse4_2)
  {
    // On sse4_2 the default wide<std::int8_t>::size() == 16
 
    // We cannot compress 16 bytes in one step,
    // We need to split it into 2 chunks of 8.
    //
    // So the result will be 2 chunks of 8.
    // We will also return how many are in each chunk
 
    // clang-format off
    eve::wide<T> in {
       1,  2,  0,  4, //
       5,  0,  6,  7, //
       8,  9, 10, 11, //
      12,  0, 14, 15, //
    };
    // clang-format on
 
    using i8x8 = eve::wide<std::int8_t, eve::fixed<8>>;
 
    using chunk = kumi::tuple<i8x8, int>;  // int should be ptrdiff_t - this is a bug
 
    // ignore will be interpreted as false in the mask
    kumi::tuple<chunk, chunk> compressed = eve::compress[eve::ignore_first(1)](in, in != 0);
 
    auto [lo, hi] = compressed;
 
    auto [lo_compressed, lo_count] = lo;
    auto [hi_compressed, hi_count] = hi;
 
    TTS_EQUAL(5, lo_count); // 2 zeroes in the first 8 elements + ignore_first
    TTS_EQUAL(7, hi_count); // 1 zero in the second 8 elements
 
    // The 'tail' after removed elements is unspecified
    // so looking at them is not helpful.
    lo_compressed.set(5, -1);
    lo_compressed.set(6, -1);
    lo_compressed.set(7, -1);
    hi_compressed.set(7, -1);
 
    TTS_EXPECT(eve::all(lo_compressed == i8x8{2, 4, 5, 6, 7, -1, -1, -1}));
    TTS_EXPECT(eve::all(hi_compressed == i8x8{8, 9, 10, 11, 12, 14, 15, -1}));
  }
}
 
// Here is how one can use `eve::compress` directly.
// This is how `compress_copy_unsafe_dense` can be implemented on some platforms.
int* compress_copy_using_compress_directly(const int* in, int* out)
{
  auto loaded = eve::load(in);
 
  // A tuple of compressed wides.
  // Each part is not just a wide but a tuple<wide, count>,
  // so that you know how to compact the values afterwards.
  //
  // So: using chunk = kumi::tuple<wide<int, N1>, std::ptrdiff_t>;
  // and the result is kumi::tuple<chunk, ...>
  kumi::tuple compressed_whole = eve::compress(loaded, loaded != 0);
 
  kumi::for_each([&](auto compressed_length) {
    auto [compressed, length] = compressed_length;
    eve::store(compressed, out);
    out += length;
  }, compressed_whole);
 
  return out;
}
 
void validate_compress_copy(auto f)
{
  constexpr std::size_t N = eve::wide<int>::size();
  std::array<int, N> in = {};
 
  for (std::size_t i = 0; i != N; ++i) {
    if (i % 4 == 0) in[i] = 0;
    else
    {
      in[i] = static_cast<int>(i);
    }
  }
 
  std::array<int, N> expected = {};
  std::copy_if(in.begin(), in.end(), expected.begin(), [](int x) { return x != 0; });
 
  std::array<int, N> out = {};
  auto* o = f(in.data(), out.data());
  std::fill(o, out.data() + out.size(), 0);
 
  TTS_EQUAL(expected, out);
}
 
int main()
{
  show_return_type(std::int8_t{0});
  validate_compress_copy(compress_copy_using_compress_directly);
}

	E.V.E v2023.02.15