🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Making a Data-Oriented ECS More Extensible

Started by
14 comments, last by DrDeath3191 4 years, 8 months ago

Hello everyone.

I've been working on my game for a while now, and I am beginning to dislike my means of adding new component tables and such to my game. My components at the moment are hard-coded classes formatted as structures of arrays; you can basically think of it as a relational database. Each variable of the component has its own dedicated vector in the table, meaning I can loop through them all in a cache-friendly manner.

When it comes to the function in-game, it works nicely. Development, on the other hand, is a little annoying. As I am trying to remain data-oriented, I am avoiding inheritance as much as possible. So each component table is an entirely standalone class. However, the underlying idea of the component table is the same; some tables do have some other tricks (which I will cover later), but the general idea is similar. So I essentially have to copy a lot of my code, add it to my giant World class that has all the component tables, make sure its garbage collection gets called, etc. I feel like this should be made easier. Especially if I wanted to support modding in the future.

I do have an idea as to how to possibly improve things, but I figured I should bounce it off people who might have more experience in this field. Essentially my plan is to replace all these component table classes I've made with a single class that stores a contiguous array of bytes with some metadata. So if I have a component table that was modeled something like this:


//...

std::vector<Entity> entities;
std::vector<vec3> positions;
std::vector<quat> rotations;
std::vector<vec3> scales;

//...

I would instead model it something like this:


//...

std::vector<uint8_t> data;
size_t entitiesIndex;
size_t positionsIndex;
size_t rotationsIndex;
  
//...

Essentially, I would flatten my structure to a single array, with indexes into that array indicating where each variable can be accessed. Why would this confer any sort of advantage? Because I could then theoretically create a file format that creates these tables based on my specifications. All data fundamentally can be represented as a contiguous array of bytes; all I need to know is the size of the datatype and how many are present in order to get those offsets to point to the correct memory. It would not need any sort of inheritance based structure; its an array of bytes, and an array of offsets. I can just create them and store them in an array in the World class. If I want to get a specific table, I can just index into that array. If I want to use the data in that table, I can just do a memcpy or a reinterpret_cast and process as desired.

I've been mulling over this idea for a little while, but I still have some reservations. This has to do with the tricks I mentioned earlier. I don't use enums or bools in the component tables. I use the indices of the components to indicate what their condition is (if the index is less than x, the condition is false, for example). This allows contiguous processing with minimal branching. I would need to make the format aware of this sort of trick. Theoretically, not that hard; I just need to indicate a number subdivisions for the component. However, if I were to create an entity in this fashion, I would need to be aware of runtime information in order to put it in the right place. I suppose I could have some intermediary tables that I put those sorts of components into and then handle it as a follow-up step, but that just seems sloppy. The way I've handled that runtime problem in the past was to have a series of factory classes for each component type that would specifically be passed in that information. I am also hoping to get rid of these; if I am thinking of completely redoing my ECS, this would be the time to lose them.

I've yet to think much on how systems would be implemented better; Right now they are implemented as classes that have references to the component tables they are interested in, and a single method. I suppose I could just pass the World a series of function pointers and have them be executed in some given order. But seeing as I am going for something data-oriented, function pointers seem a bit off the path. But maybe I'm being silly.

Has anybody here implemented an ECS in a similar fashion? Is it worth my time to try to rewrite my entire ECS structure in this way? Is what I want even feasible or wise? How would you tackle this problem? I hope I expressed myself clearly enough; the thought of this has wracked my brain for a bit, so I might be a little caught in my own world here.

Thanks for the help.

Advertisement
5 hours ago, DrDeath3191 said:

As I am trying to remain data-oriented, I am avoiding inheritance as much as possible.

Inheritance also could achieve the so-called DOD, while it's more flexible to choose a composition approach. e.g.:


struct A
{
	int foo;
};

struct B
{
	A a;
	int bar;
};

struct C : public A
{
	int bar;
};

 

5 hours ago, DrDeath3191 said:

So I essentially have to copy a lot of my code, add it to my giant World class that has all the component tables, make sure its garbage collection gets called, etc.

Would you consider to use some sort of the Singleton Pattern for the aggregation of the component tables?

5 hours ago, DrDeath3191 said:

Essentially my plan is to replace all these component table classes I've made with a single class that stores a contiguous array of bytes with some metadata.

If you could implement any kind of mechanisms that ensures different std::vector<T> could be allocated adjacent to each other in heap memory, then they would achieve the same result. Don't build the wheel by yourself if the language and the standard library has already built it for you.

5 hours ago, DrDeath3191 said:

All data fundamentally can be represented as a contiguous array of bytes; all I need to know is the size of the datatype and how many are present in order to get those offsets to point to the correct memory. It would not need any sort of inheritance based structure; its an array of bytes, and an array of offsets.

What you are talking about is almost a custom heap memory management module. Take a look at how to implement your own std::allocator if you want to keep sticking with STL.

5 hours ago, DrDeath3191 said:

I use the indices of the components to indicate what their condition is (if the index is less than x, the condition is false, for example). This allows contiguous processing with minimal branching.

How do you know this approach actually minimizes branching? Is branching hurt your performance? Is there any practical profiling results that support your design? Are you sure this kind of design approach suitable enough for your target CPU architecture?

5 hours ago, DrDeath3191 said:

However, if I were to create an entity in this fashion, I would need to be aware of runtime information in order to put it in the right place. I suppose I could have some intermediary tables that I put those sorts of components into and then handle it as a follow-up step, but that just seems sloppy. The way I've handled that runtime problem in the past was to have a series of factory classes for each component type that would specifically be passed in that information. I am also hoping to get rid of these; if I am thinking of completely redoing my ECS, this would be the time to lose them.

There is no silver bullet for a one-in-one-out factory, at least hard to design. My personal idea about the object instantiation is tended to keep things simple, that I would parse the object creation information to the final stage when they could be put into the component's data directly, until then I won't care about the immediate data because they are temporary garbage, I'd use some RAII or other things to get rid of the footprint. Again, it's a situational-oriented solution for me, I don't always SoA so much if they are non-sense for certain business, just let the L2-cache hit-rate misses like hell until it really hurts the performance significantly.

5 hours ago, DrDeath3191 said:

But seeing as I am going for something data-oriented, function pointers seem a bit off the path.

Don't be a fundamentalist of DOD, function pointers, lambda, callable objects and whatever, they are also a kind of data, the code is data, the procedure is data, everything is data when they are consumed by a certain module. If you're targeting C++11 or later, why not take a look at std::function and std::packaged_task? Functional Programming is a naturally good friend of DOD!

Finally, ECS is just ECS, it's not a worthwhile solution if it is not worthwhile for a certain problem.

Hope my mumbling could help you a little bit, happy coding?‍??‍?

5 hours ago, zhangdoa said:

 

11 hours ago, DrDeath3191 said:

So I essentially have to copy a lot of my code, add it to my giant World class that has all the component tables, make sure its garbage collection gets called, etc.

Would you consider to use some sort of the Singleton Pattern for the aggregation of the component tables?

He should just write and use a few templates, if he always needs to copy and paste code, which only changes cause of different component types. Singletons woud only add more useless boilerplate code.

5 hours ago, zhangdoa said:

Inheritance also could achieve the so-called DOD, while it's more flexible to choose a composition approach. e.g.:

Ehh, I'm not too sure about that. I also have some other methods to the components like accessors, registry and removal. These may need to be slightly different based on the specific needs of the data, meaning I have virtual function pointers to worry about which are not so great performance wise, and definitely to be avoided in data-oriented stuff.

5 hours ago, zhangdoa said:

Would you consider to use some sort of the Singleton Pattern for the aggregation of the component tables?

No. There's no need for this to be a singleton.

5 hours ago, zhangdoa said:

If you could implement any kind of mechanisms that ensures different std::vector<T> could be allocated adjacent to each other in heap memory, then they would achieve the same result. Don't build the wheel by yourself if the language and the standard library has already built it for you.

Not sure I follow you here. What mechanisms are you talking about?

5 hours ago, zhangdoa said:

What you are talking about is almost a custom heap memory management module. Take a look at how to implement your own std::allocator if you want to keep sticking with STL.

Pretty much, I guess. I can look into it, although I could easily replace the functionality with pointers instead of vectors. Would probably increase my debug build's performance, too.

5 hours ago, zhangdoa said:

How do you know this approach actually minimizes branching? Is branching hurt your performance? Is there any practical profiling results that support your design? Are you sure this kind of design approach suitable enough for your target CPU architecture?

By its very design. Let's say for example I have an attack originating from Team 1 that can hit only teams 2, 3, and 4. I could store an enumeration in the hitbox table, loop through all of them and ask "what team are you?". This would result in a bunch of branches by necessity. However, if the table is sorted in such a way that teams are contiguously aligned, I don't actually need to ask; I know by the index that this is a valid target to check. No branches, and cache friendly to boot.

6 hours ago, zhangdoa said:

There is no silver bullet for a one-in-one-out factory, at least hard to design. My personal idea about the object instantiation is tended to keep things simple, that I would parse the object creation information to the final stage when they could be put into the component's data directly, until then I won't care about the immediate data because they are temporary garbage, I'd use some RAII or other things to get rid of the footprint. Again, it's a situational-oriented solution for me, I don't always SoA so much if they are non-sense for certain business, just let the L2-cache hit-rate misses like hell until it really hurts the performance significantly.

I suppose putting it off to a follow-up step is how I would have to do it, if I pursue this idea further. Ah well, I suppose it isn't too terrible.

6 hours ago, zhangdoa said:

Don't be a fundamentalist of DOD, function pointers, lambda, callable objects and whatever, they are also a kind of data, the code is data, the procedure is data, everything is data when they are consumed by a certain module. If you're targeting C++11 or later, why not take a look at std::function and std::packaged_task? Functional Programming is a naturally good friend of DOD!

Finally, ECS is just ECS, it's not a worthwhile solution if it is not worthwhile for a certain problem.

Hope my mumbling could help you a little bit, happy coding?‍??‍?

Well the idea behind DOD is to maximize the efficiency of data traversal. Just because stuff is 'data' doesn't make it DOD. However, I think you may have a point in regards to the systems; a function pointer is not that much more expensive that a direct function call, and it would be rather convenient to add systems that way.

If you might be wondering where I'm getting some of my ideas, I modeled my ECS after this article. I of course stopped following before doing the binary blob, but am currently rethinking that decision.

Thanks very much for your input!

1 hour ago, DrDeath3191 said:

Ehh, I'm not too sure about that. I also have some other methods to the components like accessors, registry and removal. These may need to be slightly different based on the specific needs of the data, meaning I have virtual function pointers to worry about which are not so great performance wise, and definitely to be avoided in data-oriented stuff.

As @wintertime has already mentioned, template is a good idea to solve the polymorphism problem in compile-time. Basically, if you could combine composition over inheritance, function overloading and template specialization in an elegant way then you would get a similar or even better performance than vptr and vtable old school games, DOD is not always against OOD ?.

1 hour ago, DrDeath3191 said:

Not sure I follow you here. What mechanisms are you talking about?

 

1 hour ago, DrDeath3191 said:

Pretty much, I guess. I can look into it, although I could easily replace the functionality with pointers instead of vectors. Would probably increase my debug build's performance, too.

If you want to use std::vector efficiently, then manage heap memory allocation "by yourself". It's not about how to implement some complex scary stuff, just design some allocation tracker or wrapper classes at least (the default std::allocator implementation of MSVC 19.xx on my computer is just a wrapper around the global new()), and use some factory to instantiate your component vectors, then you could fully ensure that they would be put into coherent heap memory ranges.

Another choice is using a pre-allocated std::array if your component's maximum count is finite (but it's already doesn't matter whether it's an std::array or std::vector). Or you could implement an object pool with pre-allocated raw heap memory. Again, all options are possible, just you need to find the best one for your situation. Main memory is typically (almost actually) a DRAM, while cache memories inside the CPU chip are typically (almost actually) SRAM, just be aware that whether how you represent the array structure in your C++ code, the physics of the hardware would always have the same behavior, all the overburdens come from the abstraction.

1 hour ago, DrDeath3191 said:

However, if the table is sorted in such a way that teams are contiguously aligned, I don't actually need to ask; I know by the index that this is a valid target to check. No branches, and cache friendly to boot.

Who paid the bill of your sort operation ?? Any std::sort stuff? Did you have any overloaded operator==() or operator<()? Do you sort without any comparison? If so, that's awesome! But if not, then you still have the cost of branching at a certain moment when comparison happens ?.

I'm fully agreed with DOD as far as Mike Acton broadcasted the idea louder in his cppcon talk, also I've read the Bitsquid's blog post long time ago while I was also ECS-ing at that moment (His post is awesome btw!). Just my experiences told me that nothing in reality is really as simple and elegant as the example codes they are. When you move your head to the products, you always need to complex things up or compromise things down. One major "con" of DOD, or of ECS, is it quite rely on a well designed Software Model (Or a brain blew up programmer ?), you must model the Domain Model into computer-friendly rather than programmer-friendly tasks, then you have to profile often in order to make sure that your design is really cache-friendly. If you have a further interest you could read chapter 6 of Computer Systems: A Programmer's Perspective, it discussed the memory and cache related topics thoroughly ?.

56 minutes ago, zhangdoa said:

Who paid the bill of your sort operation ?? Any std::sort stuff? Did you have any overloaded operator==() or operator<()? Do you sort without any comparison? If so, that's awesome! But if not, then you still have the cost of branching at a certain moment when comparison happens ?.

Not at all! Enums can essentially just be thought of as integers, so I keep track of the indices where each value of the enum's segment ends in an array. Cast the enum to an index into that array, insert the entity's data into that index of the table, and then increment all later indices by 1. Easy peasy, constant time, no comparisons. I do need to take care that order is maintained when removing entities or they change their enum value, but that just means rotating the vector (std::rotate for people who may be confused) and patching up the index values.

1 hour ago, zhangdoa said:

As @wintertime has already mentioned, template is a good idea to solve the polymorphism problem in compile-time. Basically, if you could combine composition over inheritance, function overloading and template specialization in an elegant way then you would get a similar or even better performance than vptr and vtable old school games, DOD is not always against OOD ?.

Yeah, its the elegant part that tends to be an issue. If I'm reading the two of you correctly (and I'm very likely not), are you perhaps suggesting using the curiously recurring template pattern to create tables with variadic templates, one template per component type? And providing accessors, registers and removers in similar fashion? I just want to be sure, as I'm a little rusty with template magic.

1 hour ago, zhangdoa said:

If you want to use std::vector efficiently, then manage heap memory allocation "by yourself". It's not about how to implement some complex scary stuff, just design some allocation tracker or wrapper classes at least (the default std::allocator implementation of MSVC 19.xx on my computer is just a wrapper around the global new()), and use some factory to instantiate your component vectors, then you could fully ensure that they would be put into coherent heap memory ranges.

Another choice is using a pre-allocated std::array if your component's maximum count is finite (but it's already doesn't matter whether it's an std::array or std::vector). Or you could implement an object pool with pre-allocated raw heap memory. Again, all options are possible, just you need to find the best one for your situation. Main memory is typically (almost actually) a DRAM, while cache memories inside the CPU chip are typically (almost actually) SRAM, just be aware that whether how you represent the array structure in your C++ code, the physics of the hardware would always have the same behavior, all the overburdens come from the abstraction.

Okay, but it's really not the actual form of the data I'm concerned with right now. The reason I was planning on changing the structure was to facilitate creating new tables, possibly without having to write any code. I don't care if the data is a series of separate vectors, a binary blob, a fixed-size array, etc. Hell, if I understood you two correctly, that could just be another templated thing as well. I just want to make adding new component tables easier, since the underlying functionality is mostly similar, without having to lose too many of the advantages of Data Oriented design.

2 hours ago, zhangdoa said:

Just my experiences told me that nothing in reality is really as simple and elegant as the example codes they are. When you move your head to the products, you always need to complex things up or compromise things down.

You're telling me. Thankfully I can rely on people who have gone before me to provide some useful direction. Thanks again, by the way.

On 9/12/2019 at 1:56 AM, DrDeath3191 said:

Because I could then theoretically create a file format that creates these tables based on my specifications. All data fundamentally can be represented as a contiguous array of bytes; all I need to know is the size of the datatype and how many are present in order to get those offsets to point to the correct memory. It would not need any sort of inheritance based structure; its an array of bytes, and an array of offsets.

  • You can "create tables" with well-designed templates (a "table of T" with assorted extra template parameters to represent metadata, behaviour variations, etc.) that are going to grow more sophisticated, and be upgraded for all components, as you develop your game.
  • In case of allergy, you can "create tables" with all sorts of code generation tools instead of C++ templates.
  • In any case, a std::vector<T> without any indexing and management is a very rudimentary data structure for the purpose of holding and accessing components, and handling multiple arrays "by hand", throwing away normal C++ facilities like allocators and names, is a complication for no benefit and not a simplification.

Omae Wa Mou Shindeiru

20 hours ago, LorenzoGatti said:
  • You can "create tables" with well-designed templates (a "table of T" with assorted extra template parameters to represent metadata, behaviour variations, etc.) that are going to grow more sophisticated, and be upgraded for all components, as you develop your game.

Yes, this was hinted at above. I think that will be the best strategy going forward, but I don't have much experience with template meta-programming. I will hack away at this and let you guys know how I'm progressing.

20 hours ago, LorenzoGatti said:

In case of allergy, you can "create tables" with all sorts of code generation tools instead of C++ templates.

Such as? Mostly for curiosity, since I think I'm on the template train with everyone else in the thread.

Alright, so after a bit of fiddling and an ill-advised attempt to implement my own SOA vector instead of just using a tuple of vectors like a rational person, I have a basic table implementation. It uses CRTP, so no virtual calls are made, and using templates I can create different collections of vectors, add and remove stuff, seems great. I'm going to handle the enum case later. Here's the source for anyone curious:


template<typename Derived, typename... ComponentArgs> class ComponentTable {

public:

	std::vector<Entity> entities;

	std::tuple<std::vector<ComponentArgs>...> componentData;

	size_t getEntityIndex(Entity e) {
		return derived().get_index_implementation(e);
	}

	void registerEntity(Entity e, ComponentArgs... args) {
		derived().register_implementation(e, args...);
	}

	void unregisterEntity(Entity e) {
		derived().unregister_implementation(e);
	}

	size_t getNumberOfComponents() {
		return numberOfComponents;
	}

protected:

	Derived& derived() {
		return *static_cast<Derived*>(this);
	}

	std::unordered_map<Entity, size_t> entityToIndexMap;

	size_t numberOfComponents = 0;

};

template<typename... ComponentArgs> class DefaultComponentTable : public ComponentTable<DefaultComponentTable<ComponentArgs...>, ComponentArgs...> {

public:

	size_t get_index_implementation(Entity e) {
		
		auto search = entityToIndexMap.find(e);

		if (search != entityToIndexMap.end()) {
			return search->second;
		}

		return numberOfComponents;

	}

	void register_implementation(Entity e, ComponentArgs... args) {

		size_t index = getEntityIndex(e);

		if (index >= entities.size()) {

			entities.push_back(e);
			//push_back into componentData
			std::apply([&](auto&... vecs) { (vecs.push_back(args), ...); }, componentData);

		} else {

			//set elements in componentData
			std::apply([&](auto&... vecs) { ((vecs[index] = args), ...); }, componentData);

		}

		if (index == numberOfComponents) {

			entityToIndexMap.insert(std::make_pair(e, numberOfComponents));
			numberOfComponents++;

		}


	}

	void unregister_implementation(Entity e) {

		size_t removedIndex = getEntityIndex(e);

		//Entity not present in this table
		if (removedIndex == numberOfComponents) {
			return;
		}

		size_t last = numberOfComponents - 1;
		Entity lastEntity = entities[last];

		std::swap(entities[removedIndex], entities[last]);

		//remove data from SOAVector
		std::apply([&](auto&... vecs) { (std::swap(vecs[removedIndex], vecs[last]), ...); }, componentData);

		entityToIndexMap[lastEntity] = removedIndex;
		entityToIndexMap.erase(e);
		numberOfComponents--;

	}

};

Now however comes a new logistical problem: how should I handle putting them into a World class or similar? As I am using CRTP, I can't just shove all of them into a vector, as they are all completely different classes. I could define some sort of wrapping interface to work around that, but it seems like a hack and would introduce pointers in order to access the tables which would seemingly defeat the purpose of this exercise. I'm not a fan of the World being a singleton, so I'm against using the registration trick usually done for Automatically Registering Factories which seem to depend on such a thing. Perhaps I could have the World class defined as a giant tuple of tables? Should I even worry about having a single repository for all my component tables?

TL;DR too much, but my 1cent to ECS, after trying out various layouts and code paths.

If you are designing your ECS for multi-core, always keep in mind, the most important thing: cache friendliness.
Albeit cores do have their own cache, if you pass around data from different sections of the component data, it will trash the cache like there is no tomorrow.
I have looked upon bitsquid's technique, with SOA, but unless you're a masochist, you don't want to continue coding that.
My last implementation of ECS is so:

  • the entities of the same archetype (same component set/setup) are kept in a simple flat memory buffer, one after another, together with the component data (which most of the time is simple types or in-line struct data). You keep some basic management of the buffer free holes when an entity/component is deleted. Entity sizes are the same in this buffer, due to having same entity archetype.
  • so there are N entity buffers for the N entity archetypes
  • when you iterate over the entities, you usually work on the component data which is already in the cache, you can even sort the entity chunks to be readily available closer when they are accessed together
  • there are no pointers, unless you want to access entities/components in other archetype buffers (you can keep offsets or some LUT)
  • usually you will process entities of same archetype at once, for example the camera entity archetype, which has transform and camera component, you will iterate over all cameras in the CameraUpdateSystem and process them.
  • this ECS is simple, straightforward, no template magic, easy to debug and extend. Avoid dynamic allocation at runtime as much as possible.
  • the buffers can be almost serialized as is (if no pointers to custom data are in the components) in one write to file, like a snapshot and deserialized in one go into a fresh empty buffer, ready to be simulated.
  • you can slice up these buffers and give them to the cores for processing the ranges, hence its pretty scalable
  • for multi-core to be feasible, you might need to do data duplication, and after the update step, gather and merge the updates to the components. So if you want speed and lock-free, you will need to duplicate the data that each core is working on and merge the dirty component data to the final original buffer data.
  • even for a single core update, this continuous entity data buffer technique is probably better, but it the end, are you sure this is your worst frame time eater ? :) Unless you want to make some massive tens of thousands units roam the visible frustum, you will probably be better off with a simple update() render() single thread loop.

There are more implementation details, but you can get an idea anyway. Just keep it simple.

This topic is closed to new replies.

Advertisement