Stuff

posted in /* Why you crying? */

Published April 20, 2014

My schedule lately is one that can be summed up by two simple words: crushing exhaustion. A change of career (unfortunate, but necessary) has initiated a period of long shifts and a somewhat higher level of physical labor than I have been accustomed to lately, so that the attendant emotional and physical exhaustion while my body re-adapts to a less sedentary lifestyle has sapped me of any will to work on anything even remotely complicated. Thus, GC has suffered, and suffered greatly.

I have been talking a bit with @evolutional lately, though, as he's been porting ANL to C#. I haven't really dipped into the ANL much lately. It's just a thing that I use, and I've learned to work around any issues that occur. But talking with Oli has kind of gotten me back to thinking about it, and especially about the aspects of it that bug the shit out of me.

ANL was initially deeply inspired by libnoise. I followed the model of chained objects that libnoise uses to construct complex noise expressions from simple building blocks. The idea of it is still sound, I think, but the execution is a little bit flawed. I'm not even remotely any kind of optimization master, not by a damned sight, but even I understand that arbitrarily-allocated objects linked together by pointers and invoked by virtual functions is probably not the most cache friendly and performant scheme you could think of. Not to mention that complex noise expressions can often be optimized by smartly caching intermediate values, but implementing a cache as a module inserted into the chain (as ANL currently does), or by building cache functionality directly into the modules themselves (as ANL has also done, and might still be doing, I don't remember) is not the best way to do it if you want to provide for the possibility of multi-threaded execution. In order to provide for multi-threading, the caching needs to be performed external to the modules and local to the thread so that threads don't step on each other's toes.

Some time back, I had started thinking about an alternative scheme. Instead of building a noise description as a pointer-linked network of objects, I instead visualized it as a basic vector of instructions. The initial idea was to condense the noise expression down to a linear vector of in-order operations that could be fed to a vm to generate the final value, but such a scheme would result in lots of redundant calculations, since it wouldn't take advantage of the early-out provided by, ie, the Select module which selects between one or another source depending on the value of a control source.

So lately, given the fact that I can't really make myself concentrate on anything too technically challenging, I started on an experimental branch of the ANL that encapsulates a noise expression as a vector of opcode or instructions as I had visualized. Rather than evaluating the vector sequentially, however, I use vector-index cross-referencing (similar to how the objects in the existing model would cross-reference each other via pointer) to evaluate source branches to a module before evaluating the module itself, taking advantage of Select's early-out branching if available.

A noise expression is a simple vector of opcodes, with optional data in the form of constant data (float or rgba) and source indexes referencing other opcodes in the vector as sources. The opcode vector is passed to an instance of a sort of virtual machine for evaluation. The VM instance maintains an internal cache vector for locally caching the values of opcodes as they are evaluated.

The switch has necessitated a change to how noise expressions are constructed. This has actually simplified the interface quite significantly. Rather than a whole slew of separate classes for each function type, I now have a single factory/builder class that provides an interface with methods for constructing opcodes or opcode chains--somewhat similar to the existing TreeBuilder interface. By calling methods on this factory class, opcodes are appended to an internal vector. Once the expression is built, then the final vector of opcodes can be passed to a VM for evaluation.

Additionally, I have eliminated the interface of get(x,y), get(x,y,z), get(x,y,z,w), get(x,y,z,w,u,v) variants depending on the dimensionality. In lieu of method overloads, instead a noise expression is now evaluated using a special Coordinate class that specifies the dimensionality of the coordinate. This simplifies the interface, and precludes having to provide different VMs for each coordinate dimensionality.

I have not finished fully implementing all of the functions in the base library yet, but I have done enough to sort of test the concept out, and it feels right to me. Here is a simple example (C++):

int main(){    CTestFactory factory;    anl::TArray2D img(256,256);    unsigned int zero=factory.constant(0);    unsigned int one=factory.constant(1);    unsigned int negone=factory.constant(-1);    unsigned int freq=factory.constant(2);    unsigned int basis=factory.cellularBasis(negone,one,zero,zero,zero,zero,zero,zero,0,123845);    unsigned int cell=factory.scaleX(factory.scaleY(basis,freq),freq);    unsigned int fbm1=factory.simplefBm(anl::OP_GradientBasis, 3, 6, 2, 127345);    unsigned int fbm2=factory.simplefBm(anl::OP_ValueBasis, 3, 6, 3, 546321);    unsigned int p=factory.translateX(cell, fbm1);    factory.translateZ(p, fbm2);        anl::CTestNoiseVM vm(factory.getKernel());    // Map the kernel function to an image and save PNG    for(int x=0; xTo start with, a factory is created. The factory encapsulates a std::vector noise expression kernel, and provides some methods for adding opcodes to the kernel. An image is created for visualizing the output. Then various factory methods are called to build the expression vector, in this case setting up some constants, creating a cellular noise module and a couple of simple fBm fractals, then using the fractals to perturb the cellular function in X and Z. After the kernel is built, a VM is instantiated using the instruction string, then the image is iterated and each pixel colored based on the output of the VM at the specified coordinate.

Doing it this way seems to me like it will better facilitate cache coherency. All of the expression is encapsulated in a sequential vector of instructions that should easily fit entirely in the cache, rather than as widely-spread disparate object instances scattered throughout the heap and linked by pointers. All caching is done locally to the VM, so you could instance any number of VMs in separate threads from the same expression vector. (Ostensibly, anyway; I am not really very skilled or knowledgeable about multi-threading, so it's entirely possible I'm still making mistakes here.)

In the process of this experiment, I am also re-evaluating how certain tasks are accomplished. One area I have simplified things is the Gradient function. A gradient function lets you specify a line segment as two N-dimensional points, and it maps any input coordinate to somewhere on the gradient described by the line passing through both points. Although it's a simple linear equation, the Gradient function has been the single largest source of PM, Facebook and email questions since the ANL became publicly available, especially since my articles on Minecraft- and Terraria-type world generation which use the gradient function as the basis for ground/air delineation.

In the new branch, rather than providing a Gradient function and forcing the user to specify endpoints, instead I provide the much simpler methods named x(), y(), etc... in the factory class, mapping to the opcodes OP_X, OP_Y, etc... These opcodes quite simply operate by returning the corresponding component of the input coordinate. ie, if the VM is called to evaluate the point (0.5, 0.23, 0.76), then the opcode OP_X will return the value of 0.5, or the x coordinate. This provides an easy way to obtain a gradient aligned with one of the 6 axes. If the gradient is required to lie along a vector other than an axis, then it can be rotated using a rotation opcode (only 2D and 3D rotations are planned, as with vanilla ANL, since 4D and 6D rotation is a bitch, and I honestly see no use cases for them.) This hopefully simplifies the gradient functionality enough that I won't be constantly answering questions about it.

In a similar fashion, I have broken out the ScaleDomain and TranslateDomain functions into their component pieces (while still keeping the basic forms as well for now, though they might be going away to remove API clutter). That is, rather than having to specify all of your domain transformations similar to scaleDomain(src, xscale, yscale, zscale, wscale, uscale, vscale), which is horribly redundant if you only want to use, say, xscale and uscale, now you can specify them component-wise: scaleX(src, scale); scaleU(src, uscale);.

There is still a lot to do and test on this, and I'm not sure how long it's going to take given my work schedule. For now, though, I'm going to take this kids over to grandma's for Easter dinner. Happy Easter, everyone.

Previous Entry Demo and Explanation of Third Person Camera

Next Entry Experimental virtual-machine-based branch of ANL

7 likes 5 comments

Comments

riuthamus

Screenshots! I always love your screenshots! :P Nice post otherwise! ( but would be better with screenshots! )

April 21, 2014 04:45 AM

studentTeacher

This was a very interesting read! I like the way your previous posts on anl and your library (I'm familiar enough with both to know how your library works) is still in this type of setup, but it's a much cleaner look both from the outside and (what I can gather) from the inside. I look forward to see and hear more!

April 21, 2014 05:35 AM

Servant of the Lord

Very interesting!

Question: Is vm.evaluate(coord); actually executing the operations, or is anl::CTestNoiseVM's constructor doing that?

Are you generating a coord's/position's value upon request, or are you generating the data and vm.evaluate(coord) is just retrieving the (possibly interpolated) data at that position?

I ask this because I don't see where you are giving either the factory or the VM the size of the array you want to generate - and you're retrieving data using 0.0 to 1.0 coordinates... so how does the VM know what level of detail/resolution to generate?

I'm ignorant of procedural generation (but love reading your posts about it) - are algorithms like perlin noise generated per-pixel with each pixel generated in isolation from the rest of the image?

Can you generate, say, the value at (0.5,0.5), without generating the entire image?

April 21, 2014 05:37 AM

studentTeacher

Very interesting!

Question: Is vm.evaluate(coord); actually executing the operations, or is anl::CTestNoiseVM's constructor doing that?

Are you generating a coord's/position's value upon request, or are you generating the data and vm.evaluate(coord) is just retrieving the (possibly interpolated) data at that position?

I ask this because I don't see where you are giving either the factory or the VM the size of the array you want to generate - and you're retrieving data using 0.0 to 1.0 coordinates... so how does the VM know what level of detail/resolution to generate?

I'm ignorant of procedural generation (but love reading your posts about it) - are algorithms like perlin noise generated per-pixel with each pixel generated in isolation from the rest of the image?

Can you generate, say, the value at (0.5,0.5), without generating the entire image?

Noise is per-point -- there's no need to generate a whole array or anything like that, you just evaluate at each coord. The op-codes look like they are strung together as they are called by the factory, creating the vm using factory.getKernel() just packages the op-codes into an executable format (or something like that), and calling vm.evaluate(coord) just evaluates the op-code at the given coord and generates the value. Correct me if I'm wrong :)

April 21, 2014 03:24 PM

JTippetts

Very interesting!

Question: Is vm.evaluate(coord); actually executing the operations, or is anl::CTestNoiseVM's constructor doing that?

evaluate() actually executes the operations. The constructor merely stashes a handle to the kernel internally, and sizes its internal cache vector to match the length of the kernel.

Are you generating a coord's/position's value upon request, or are you generating the data and vm.evaluate(coord) is just retrieving the (possibly interpolated) data at that position?

I ask this because I don't see where you are giving either the factory or the VM the size of the array you want to generate - and you're retrieving data using 0.0 to 1.0 coordinates... so how does the VM know what level of detail/resolution to generate?

I'm ignorant of procedural generation (but love reading your posts about it) - are algorithms like perlin noise generated per-pixel with each pixel generated in isolation from the rest of the image?

Can you generate, say, the value at (0.5,0.5), without generating the entire image?

I generate the value on request. There is no need for the VM to know how large the final array is going to be, or even if the values are going to be stashed in an array. With Perlin noise, you evaluate each point in isolation just as you asked. You can, indeed, evaluate the value for (0.5, 0.5), or any other value, without the entire image.

The way Perlin noise (and cousins) work is that values are generated on a grid or lattice, and interpolated through the spaces between grid points. When layering noise into a fractal, each successive layer has (typically) twice the frequency and half the amplitude as the previous layer, so that it generates smaller and finer details that are added in. But the first layer always tends to dictate the scale of the major features, since it is the layer with the greatest amplitude. Thus, the greatest features will lie upon the integral boundaries of the domain; ie, you'll get a peak/valley at anywhere that x and y are integers. Roughly speaking, anyway.

This leads to a common problem you see when people just start out with Perlin noise. They'll take an image, say 256x256, and iterate it, calculating a Perlin value for each pixel coordinate, and the result ends up being strikingly similar to white noise since the pixel coordinates are all integral and thus catch all of the major features in the domain. Dividing the coordinate by some value is usually optimal in that situation.

Your questions have revealed a weakness of this experimental scheme, however. When filling an array, there is a maximum number of levels, or octaves, in a fractal that can possibly make a difference in the detail of the final array. Higher octaves counts not only waste processing time, but they can actually result in aliasing artifacts due to violating the requirements of the Nyquist-Shannon sampling theorem.

Unfortunately, this can cause a problem. If you use the 'simple' standard variants of Perlin noise (ie, layering multiple layers of the same basis function using a hard-coded amplitude calculation for each layer), and if I provide an opcode for these variants, it's no biggie; I could control the number of octaves 'remotely', ie from the outer loop, and get the correct number of octaves for a given sampling interval and array size. However, a large part of why I am doing what I am doing is to allow a user to build fractal systems from more complicated basis functions using customized amplitude calculation to create different effects. In the experimental approach I am trying here, that means essentially unrolling the fractal loops into blocks of discrete operations in the opcode vector, and that means that I can't control the octave count from outside, but rather only at startup when the vector is created.

This means that as currently stands, if you want 'complex' fractals that do not exhibit aliasing due to sampling, you have to build the instruction kernel while keeping in mind the sampling frequency it will be sampled at. Ugh. I might need to think about this some more.

April 21, 2014 03:31 PM

You must log in to join the conversation.

Don't have a GameDev.net account? Sign up!

JTippetts

Author

🎉 Celebrating 25 Years of GameDev.net! 🎉

Stuff

Comments

JTippetts

Latest Entries

Mechanics of Flow in Gold Rush.

Gold Rush

DungeonBot3000: Inventory and Items Display

DungeonBot3000: Postmortem

DungeonBot3000: Boss Fight Scripting

DungeonBot3000: Friggin Laser Beams

DungeonBot3000: Video

DungeonBot3000: Stats, Combat, and Items

DungeonBot3000

Diablo:Immortal

🎉 Celebrating 25 Years of GameDev.net! 🎉

Stuff

Comments

JTippetts

Latest Entries

Mechanics of Flow in Gold Rush.

Gold Rush

DungeonBot3000: Inventory and Items Display

DungeonBot3000: Postmortem

DungeonBot3000: Boss Fight Scripting

DungeonBot3000: Friggin Laser Beams

DungeonBot3000: Video

DungeonBot3000: Stats, Combat, and Items

DungeonBot3000

Diablo:Immortal

Reticulating splines