
Tightly packed float/double?

Started by
11 comments, last by Green_Baron 4 years, 10 months ago

Hi,

I am working on my template skills.

My example is vec4 and mat4, but there are also vec2/3 and square mat2/3.

The vec's data is


template <typename T>
struct vec4_t {
	union {
		struct{ T x, y, z, w; };
		struct{ T r, g, b, a; };
		struct{ T s, t, u, v; };
	};

....

and the corresponding mat's


template<typename T>
struct mat4_t {
	vec4_t<T> value[4];

....

Unfortunately, the 4 vectors are not laid out tightly in a row without padding: on my machine there are two garbage values between each vector, so if I obtain a pointer to m[0][0] I cannot use it directly, for example to feed a uniform. I have looked over glm (which I took as an example for my tinkering), but it does things depending on platform and compiler that are a little over my head.

My question is: is there an easy C++ way (alignas?) to achieve tight packing of the vecs in a mat?

Sure, there are different ways to solve it (like using an array of 16 values for a mat4), but then I'd have to rewrite a lot of things...

Cheers


I am a little bit confused about the extra padding you mentioned. Can you use sizeof(...) and alignof(...) on your vector class (with the type you used) and post the results? I don't see any reason why the compiler should add padding between the array elements for standard 32-bit or 64-bit data types, unless your vector type's size is bigger than expected or it has a special alignment requirement.
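A minimal sketch of that check, assuming C++11 or later. The types are simplified, hypothetical stand-ins for the thread's vec4_t/mat4_t (plain members instead of the union), and print_layouts() is an illustrative helper meant to be called from main(). The static_asserts turn the "no padding" expectation into a compile-time check, so a surprise breaks the build instead of the output.

```cpp
#include <iostream>

// Simplified stand-ins for the thread's types (hypothetical, plain members).
template <typename T>
struct vec4_t {
    T x, y, z, w;
};

template <typename T>
struct mat4_t {
    vec4_t<T> value[4];
};

// Print the size and alignment of type U under a readable label.
template <typename U>
void report(const char* name) {
    std::cout << name << ": sizeof = " << sizeof(U)
              << ", alignof = " << alignof(U) << '\n';
}

// Hypothetical helper: call it from main() and post the output.
inline void print_layouts() {
    report<float>("float");
    report<vec4_t<float>>("vec4_t<float>");
    report<mat4_t<float>>("mat4_t<float>");
}

// These hold on mainstream compilers but are not promised by the standard.
// If one fires, look for hidden members (e.g. a vtable pointer), not padding.
static_assert(sizeof(vec4_t<float>) == 4 * sizeof(float),
              "vec4_t has hidden data or padding");
static_assert(sizeof(mat4_t<float>) == 16 * sizeof(float),
              "mat4_t has hidden data or padding");
```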

Maybe I am too tired to see it...

Greetings

Or maybe I should be more specific?

For example, I create a new mat4_t<float> with this constructor:


explicit mat4_t<T>( T const& x ) :
	value{ vec4_t<T>{ x, 0, 0, 0 }, vec4_t<T>{ 0, x, 0, 0 },
		   vec4_t<T>{ 0, 0, x, 0 }, vec4_t<T>{ 0, 0, 0, x } } {}

that uses this vec ctor:


explicit vec4_t<T>( T const &s1, T const &s2, T const &s3, T const &s4 ) :
			x{s1}, y{s2}, z{s3}, w{s4} {}

Which should result in a float matrix with the diagonal set to x.

I tried this output; the vec4 operator<< is overloaded to print (x/y/z/w).


omath::mat4 x{ 1.0f };
omath::vec4_t<float> *f{ &(x[0]) };
for( int i{0}; i < 4; ++i )
  	std::cout << f[i] << ' ';
std::cout << std::endl;

which prints as expected:

(1/0/0/0) (0/1/0/0) (0/0/1/0) (0/0/0/1)

But obtaining a float pointer to the first element:


omath::mat4 x{ 1.0f };
float *f{ &(x[0][0]) };
for( int i{0}; i < 16; ++i )
  	std::cout << f[i] << ' ';
std::cout << std::endl;

gives:

1 0 0 0 1.4992e+13 3.06114e-41 0 1 0 0 1.4992e+13 3.06114e-41 0 0 1 0

... two garbage values after each vector. The last vector is missing here, but it shows up if I increase the loop count.

I have checked both ctors; the data type is float. For index 0, the overloaded operator[] returns value[0] for the mat and x for the vec. Maybe I am missing something very basic here, but what I'd like is a pointer to a contiguous array of 16 floats, if possible. Sure, I could write a function that stitches it together, no problem, but somehow it feels like there should be a more elegant solution.
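The "stitch it together" fallback could look like the sketch below, assuming C++17 (for constexpr std::array element access). vec4_t and mat4_t are reduced, hypothetical stand-ins whose operator[] behaves like the ones described above, and to_array is an illustrative name. Because it reads values only through operator[], it yields 16 contiguous values no matter what else the structs contain.

```cpp
#include <array>
#include <cstddef>

// Reduced, hypothetical stand-ins; operator[] mirrors the thread's behaviour
// (no bounds checking, out-of-range indices fall back to x).
template <typename T>
struct vec4_t {
    T x{}, y{}, z{}, w{};
    constexpr T& operator[](int i) {
        return i == 1 ? y : i == 2 ? z : i == 3 ? w : x;
    }
    constexpr const T& operator[](int i) const {
        return i == 1 ? y : i == 2 ? z : i == 3 ? w : x;
    }
};

template <typename T>
struct mat4_t {
    vec4_t<T> value[4];
    constexpr vec4_t<T>& operator[](int i) { return value[i]; }
    constexpr const vec4_t<T>& operator[](int i) const { return value[i]; }
};

// Copy element (i, j) to index i * 4 + j of one contiguous array.
// Nothing is reinterpreted, so hidden members (e.g. a vtable pointer)
// in vec4_t or mat4_t cannot leak into the result.
template <typename T>
constexpr std::array<T, 16> to_array(const mat4_t<T>& m) {
    std::array<T, 16> out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out[static_cast<std::size_t>(i * 4 + j)] = m[i][j];
    return out;
}
```

The array's data() pointer could then be handed to something like glUniformMatrix4fv(location, 1, GL_FALSE, flat.data()). The cost is one 64-byte copy per call.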


I'll echo DerTroll's suggestion of posting the results of (at least) sizeof for the types in question. It seems like that info could be useful (in addition to what you posted above).

7 hours ago, Green_Baron said:

struct{ T x, y, z, w; };


7 hours ago, Green_Baron said:

vec4_t<T> value[4];

Unless I missed something, this is not the same as having T value[4][4]. How would the compiler know whether you have an array of 4 elements, or 3, or 5... It cannot. If you access the pointer as a simple array, this should behave better, as long as you are using C++11 or later.
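One way to follow that suggestion, sketched under the assumption of C++17: store a flat T[16] (rather than T[4][4], so pointer arithmetic over all 16 values stays inside a single array) and let operator[] return a pointer to each group of four, so existing m[i][j] code keeps compiling. mat4_flat_t and data() are hypothetical names, not the thread's API.

```cpp
#include <cstddef>

// Hypothetical alternative layout: one flat array of 16 values.
template <typename T>
struct mat4_flat_t {
    T m[16]{};

    // Diagonal constructor, mirroring the thread's mat4_t(T const&).
    constexpr explicit mat4_flat_t(T const& d) {
        for (int i = 0; i < 4; ++i) m[i * 5] = d;  // indices 0, 5, 10, 15
    }

    // m[i] yields a pointer to the i-th group of four; [j] then indexes it.
    constexpr T* operator[](int i) { return m + 4 * i; }
    constexpr const T* operator[](int i) const { return m + 4 * i; }

    // Pointer to 16 contiguous values, e.g. for a uniform upload.
    constexpr T* data() { return m; }
    constexpr const T* data() const { return m; }
};

static_assert(sizeof(mat4_flat_t<float>) == 16 * sizeof(float),
              "a single array member leaves no room for padding between values");
```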

7 hours ago, Green_Baron said:

My question is, is there an easy C++ way (alignas ?) to achieve a tight packing of the vecs in a mat

alignas can only make alignment stricter, never looser, so it will never pack data more tightly. The only way to get a tight layout is to ensure it yourself.
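A small illustration, assuming a 4-byte float and made-up struct names: raising alignment with alignas (typically done for SIMD loads) rounds the size up, so it adds padding rather than removing it.

```cpp
// Hypothetical example types.
struct vec3f { float x, y, z; };                   // usually 12 bytes, alignment 4
struct alignas(16) vec3f_a16 { float x, y, z; };   // alignment raised to 16
struct alignas(16) vec4f_a16 { float x, y, z, w; };

// Raising the alignment rounds the size up to a multiple of 16:
// 4 bytes of padding are *added* to the three-component vector.
static_assert(alignof(vec3f_a16) == 16, "alignment was raised");
static_assert(sizeof(vec3f_a16) == 16, "size rounded up to the alignment");

// A four-float vector already fills 16 bytes, so an array of them stays tight.
static_assert(sizeof(vec4f_a16) == 16, "no padding needed");
static_assert(sizeof(vec4f_a16[4]) == 64, "4 x 16 bytes");

// Requesting *less* alignment than a member needs is ill-formed per the
// standard, so alignas cannot be used to squeeze padding out:
// struct alignas(1) too_loose { float f; };
```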

Good morning!

Sure, sorry, here it comes for floats, and it doesn't look as expected:

The size of a float is 4, alignment 4; a vec4<float>'s size is 24 (I expected 16), alignment 8; a mat4<float> has 104 bytes (I expected 64), alignment 8.

Even if we calculate the mat's size with 24 bytes per vec4, there are 8 extra bytes (104 - 4 * 24). The operators[] are overloaded to return x, y, z, w in case of a vec4, or the ith element of a mat4.

My clumsy tinkering is simply based on the assumption that an array is laid out in memory without any padding, but this is apparently not the case for the vec4[4]. The glm library (which I use as a guide) achieves packing with the use of alignment operators plus some magic I don't understand.

Sure, this is all just instinctive play; I can always revert to other methods of storing the data, or write methods that stitch it together into a contiguous array before returning a pointer. I just thought there might be a way to gain a deeper understanding...

I was not able to reproduce that. My code (copy of yours):

vec.h


#pragma once

template <typename T>
struct vec4_t {
    explicit vec4_t<T>( T const &s1, T const &s2, T const &s3, T const &s4 ) :
                x{s1}, y{s2}, z{s3}, w{s4}
    {}
    union {
            struct{ T x, y, z, w; };
            struct{ T r, g, b, a; };
            struct{ T s, t, u, v; };
        };
};

mat.h


#pragma once

#include "vec.h"

template<typename T>
struct mat4_t {
    explicit mat4_t<T>( T const& x ) :
        value{ vec4_t<T>{ x, 0, 0, 0 }, vec4_t<T>{ 0, x, 0, 0 },
               vec4_t<T>{ 0, 0, x, 0 }, vec4_t<T>{ 0, 0, 0, x } }
    {}
    vec4_t<T> value[4];
};

main.cpp


#include <iostream>

#include "mat.h"

int main()
{
    mat4_t<float> mat{1.0f};
    std::cout << sizeof (mat) << std::endl;
    auto* data = &(mat.value[0].x);
    for( int i{0}; i < 16; ++i )
        std::cout << data[i] << ' ';
    std::cout << std::endl;
}

Result:


64
1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1

Compiled with GCC 9.1 on 64-bit Linux.

I'm having trouble believing that a compiler would add extra padding to a 16-byte struct.  However, I can easily believe that there is extra (non-garbage) data in your struct, either a virtual function table pointer or some other data member that you haven't mentioned.  Can you verify that your struct has no virtual functions?  Can you post the entire struct?
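Hidden members like that can also be caught at compile time. A sketch assuming C++11 <type_traits>; vec4_t is a reduced stand-in, and vec4_virtual_t is a hypothetical copy carrying the kind of virtual destructor being asked about.

```cpp
#include <type_traits>

// Reduced stand-in for a plain math vector.
template <typename T>
struct vec4_t {
    T x, y, z, w;
};

// Hypothetical copy with a virtual destructor, i.e. a hidden vtable pointer.
template <typename T>
struct vec4_virtual_t {
    T x, y, z, w;
    virtual ~vec4_virtual_t() {}
};

// Guards a math type can carry right next to its definition.
static_assert(!std::is_polymorphic<vec4_t<float>>::value, "no vtable pointer");
static_assert(std::is_standard_layout<vec4_t<float>>::value, "predictable member layout");
static_assert(std::is_trivially_copyable<vec4_t<float>>::value, "safe to memcpy or upload");

// The same check flags the virtual version.
static_assert(std::is_polymorphic<vec4_virtual_t<float>>::value,
              "a virtual destructor makes the type polymorphic");
```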

I feel so stupid, that was it.

Both had virtual destructors.
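For anyone hitting the same thing, this is how the sizes posted earlier add up. A sketch with hypothetical reduced types; the byte counts in the comments assume a typical 64-bit ABI with an 8-byte vtable pointer at the start of each object, which is common practice rather than something the standard specifies.

```cpp
#include <cstdio>

// Hypothetical reduced versions of the thread's types.
template <typename T>
struct vec4_plain { T x, y, z, w; };

template <typename T>
struct vec4_virt {
    T x, y, z, w;
    virtual ~vec4_virt() {}   // adds a hidden vtable pointer to every element
};

template <typename T>
struct mat4_virt {
    vec4_virt<T> value[4];
    virtual ~mat4_virt() {}   // and one more for the matrix itself
};

// Typical 64-bit ABI (8-byte vtable pointer, 8-byte alignment):
//   vec4_plain<float>: 16 bytes
//   vec4_virt<float>:   8 + 16 = 24 bytes
//   mat4_virt<float>:   8 + 4 * 24 = 104 bytes
// Walking 16 floats from &value[0].x therefore also walks over the next
// element's vtable pointer, seen as two "garbage" floats between vectors.
inline void print_sizes() {
    std::printf("vec4_plain<float>: %zu\n", sizeof(vec4_plain<float>));
    std::printf("vec4_virt<float>:  %zu\n", sizeof(vec4_virt<float>));
    std::printf("mat4_virt<float>:  %zu\n", sizeof(mat4_virt<float>));
}
```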

For reference, the entire vec4 struct:


template<typename T>
struct vec4_t {
	union {
		struct{ T x, y, z, w; };
		struct{ T r, g, b, a; };
		struct{ T s, t, u, v; };
	};

	vec4_t<T>() : x{}, y{}, z{}, w{} {}

	vec4_t<T>( vec4_t<T> const& other ) :
			x{other.x}, y{other.y}, z{other.z}, w{other.w} {}

	vec4_t<T>( T const &scalar ) :
			x{scalar}, y{scalar}, z{scalar}, w{scalar} {}

	explicit vec4_t<T>( T const &s1, T const &s2, T const &s3, T const &s4 ) :
			x{s1}, y{s2}, z{s3}, w{s4} {}

	vec4_t<T>( const vec3_t<T> &v, T const &s4 ) :
			x{v.x}, y{v.y}, z{v.z}, w{s4} {}

	vec4_t<T>( const vec3_t<T> &v ) :
			x{v.x}, y{v.y}, z{v.z}, w{1} {}

	vec4_t<T>( T const &s1, const vec3_t<T> &v ) :
			x{s1}, y{v.x}, z{v.y}, w{v.z} {}

	template<typename U>
	vec4_t<T>( const vec4_t<U> &other ) :
		x{static_cast<T>(other.x)}, y{static_cast<T>(other.y)},
		z{static_cast<T>(other.z)}, w{static_cast<T>(other.w)} {}

	~vec4_t<T>() {}

	vec4_t<T> &operator=( vec4_t<T> const& other ) {
		x = other.x;
		y = other.y;
		z = other.z;
		w = other.w;
		return *this;
	}
	
	// @todo bounds checking !
	T &operator[]( const int i ) {
		switch(i) {
			default:
			case 0:
				return x;
			case 1:
				return y;
			case 2:
				return z;
			case 3:
				return w;
		}
	}

	T const& operator[]( const int i ) const {
		switch(i) {
			default:
			case 0:
				return x;
			case 1:
				return y;
			case 2:
				return z;
			case 3:
				return w;
		}
	}

	template<typename U>
	vec4_t<T> &operator=( vec4_t<U> const& v) {
		x = static_cast<T>(v.x);
		y = static_cast<T>(v.y);
		z = static_cast<T>(v.z);
		w = static_cast<T>(v.w);
		return *this;
	}

	template<typename U>
	vec4_t<T> & operator*=(  U const scalar) {
		x *= static_cast<T>(scalar);
		y *= static_cast<T>(scalar);
		z *= static_cast<T>(scalar);
		w *= static_cast<T>(scalar);
		return *this;
	}

	template<typename U>
	vec4_t<T> & operator*=( vec4_t<U> const& v) {
		x *= static_cast<T>(v.x);
		y *= static_cast<T>(v.y);
		z *= static_cast<T>(v.z);
		w *= static_cast<T>(v.w);
		return *this;
	}

	template<typename U>
	vec4_t<T> & operator/=( U const scalar) {
		x /= static_cast<T>(scalar);
		y /= static_cast<T>(scalar);
		z /= static_cast<T>(scalar);
		w /= static_cast<T>(scalar);
		return *this;
	}

	template<typename U>
	vec4_t<T> & operator/=( vec4_t<U> const& v) {
		x /= static_cast<T>(v.x);
		y /= static_cast<T>(v.y);
		z /= static_cast<T>(v.z);
		w /= static_cast<T>(v.w);
		return *this;
	}

};


Edit: the union{ struct; array } is now back to its original form like in the first post.

Edit: replaced the code with the preliminary final version.

2 hours ago, Green_Baron said:

My clumsy tinkering is simply based on the assumption that an array is laid out in memory without any padding, but this is apparently not the case for the vec4[4]

The language guarantees the elements of an array are laid out in consecutive memory without any padding.  You can rely on that.
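That guarantee can be checked directly. Even for the polymorphic vec4 from this thread (a hypothetical reduced copy below), an array of four is exactly four elements long, so the extra bytes live inside each element, not between them. array_is_tight is an illustrative helper name.

```cpp
#include <cstddef>

// Hypothetical reduced copy of the thread's original vec4, virtual destructor included.
struct vec4_virt {
    float x, y, z, w;
    virtual ~vec4_virt() {}
};

// Arrays never insert gaps: N elements occupy exactly N * sizeof(element).
template <typename Elem, std::size_t N>
constexpr bool array_is_tight() {
    return sizeof(Elem[N]) == N * sizeof(Elem);
}

static_assert(array_is_tight<float, 16>(), "plain floats");
static_assert(array_is_tight<vec4_virt, 4>(), "polymorphic elements too");

// What differs is the element itself: it is bigger than its four floats.
static_assert(sizeof(vec4_virt) > 4 * sizeof(float), "hidden vtable pointer");
```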

There is no such guarantee for class members. You can force tight packing with compiler-specific attributes, but all bets are off for non-POD types.
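For reference, a widely supported form of such an attribute is #pragma pack, which GCC, Clang and MSVC all accept. This sketch (hypothetical struct names, assuming a 4-byte float) shows what it does: it removes alignment padding between members. It would not have helped in this thread, since a float vec4 has no padding to remove and a vtable pointer is not padding.

```cpp
#include <cstdint>

// Hypothetical example types.
struct unpacked {
    std::uint8_t tag;
    float value;      // normally placed at offset 4, after 3 padding bytes
};

#pragma pack(push, 1)     // compiler-specific: byte-align the following members
struct packed {
    std::uint8_t tag;
    float value;      // now at offset 1 and possibly misaligned
};
#pragma pack(pop)         // restore the previous packing

static_assert(sizeof(packed) == 1 + sizeof(float), "padding removed");
static_assert(sizeof(unpacked) > sizeof(packed), "padding was there before");
```

Taking a pointer or reference to packed::value and using it can be slow or even fault on platforms without unaligned access support, which is one more reason to keep this kind of packing out of a math library.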

The destructor and non-templated copy-assignment operator on your vec4_t template class should be either =default or implicit. You definitely don't want to use non-POD types for this kind of math library (i.e., no virtual member functions).

You don't have to qualify the member function names or return values of inline member functions with the template parameter either; easier to read is always better, given the choice.  Also, the only constructor that needs to be marked explicit is the scalar one.
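A reduced sketch of vec4_t with that advice applied, assuming C++14 or later. It keeps plain x/y/z/w members and drops the union for brevity (anonymous structs inside a union are a widely supported compiler extension rather than standard C++); the remaining operators would follow the same pattern as operator*= here.

```cpp
#include <type_traits>

template <typename T>
struct vec4_t {
    T x{}, y{}, z{}, w{};

    // No <T> on the constructor names; only the scalar constructor is explicit.
    constexpr vec4_t() = default;
    constexpr explicit vec4_t(T const& s) : x{s}, y{s}, z{s}, w{s} {}
    constexpr vec4_t(T const& a, T const& b, T const& c, T const& d)
        : x{a}, y{b}, z{c}, w{d} {}

    // No user-written destructor, copy constructor or copy assignment:
    // the implicit ones are trivial, so the type stays trivially copyable.

    template <typename U>
    constexpr vec4_t& operator*=(U const scalar) {
        x *= static_cast<T>(scalar);
        y *= static_cast<T>(scalar);
        z *= static_cast<T>(scalar);
        w *= static_cast<T>(scalar);
        return *this;
    }
};

static_assert(std::is_trivially_copyable<vec4_t<float>>::value, "memcpy-safe");
static_assert(std::is_standard_layout<vec4_t<float>>::value, "predictable layout");
static_assert(sizeof(vec4_t<float>) == 4 * sizeof(float), "no hidden members");
```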

Stephen M. Webb
Professional Free Software Developer

This topic is closed to new replies.
