
Are these ancient optimizations still relevant?


I'm in the process of writing a software-rendered raycaster. I've actually done this before, a very long time ago, on a 386. Back in the day, in order to get a good frame rate, I had to incorporate several programming tricks, and I'm wondering if they are still relevant. Even if they are, do modern compilers simply do them for me? And what about inline assembly, does that also get reordered?

  1. Alternating 32-bit and 16-bit instructions. The 386 introduced 32-bit registers, but maintained backwards compatibility by having the lower half of each act as a 16-bit register. For multi-clock-cycle instructions, the CPU could overlap them, so you'd write code that went something like ADD EAX, EBX then SUB CX, DX…
  2. Alternating int and float math. Most machines at the time did floating-point math in a coprocessor, which ran in parallel with the main CPU. By alternating 2 int, 1 float, 2 int, 1 float, you'd get essentially parallel processing.
  3. Jumping on the less likely branch. The compiler I used would turn if(…)then(x)else(y) into cmp, jne, x…, so you'd always arrange things so the most likely code fell through and the less likely code was pushed away into the else (see the sketch after this list).
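For concreteness, here is a minimal C++ sketch of the kind of branch I mean in trick 3 (the raycaster helper names are just placeholders), together with the C++20 [[likely]]/[[unlikely]] hints that seem to be the modern way to express the same intent:

```cpp
// Minimal sketch (hypothetical helpers, placeholder names): letting the
// compiler lay out the likely path as the fall-through, instead of
// hand-ordering cmp/jne like on the 386.
#include <cstdint>
#include <cstdio>

static void render_column(int x)         { std::printf("column %d\n", x); }
static void handle_degenerate_ray(int x) { std::printf("skip %d\n", x);   }

void trace(int x, std::uint32_t ray_flags)
{
    // C++20: the compiler keeps the [[likely]] body on the straight-line
    // (fall-through) path and moves the [[unlikely]] body out of the hot code.
    if (ray_flags == 0) [[likely]] {
        render_column(x);
    } else [[unlikely]] {
        handle_degenerate_ray(x);
    }
}

int main()
{
    for (int x = 0; x < 4; ++x)
        trace(x, x == 3 ? 1u : 0u);
}
```

Pre-C++20, GCC and Clang expose the same hint as __builtin_expect(cond, 1).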

With the improvements in branch prediction and pipelining over the last several generations of chips, are any of these still relevant?


Wallace121215@gmail.com said:
are any of these still relevant?

None of those are specifically still relevant, but there are other considerations.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Those specific optimizations are no longer relevant, but the hardware effects and potential optimizations “in the same spirit” are still there:

  1. and 2. CPUs are now much more pipelined and out-of-order than they were back then. Agner Fog maintains excellent hardware documents that describe the pipeline architectures of the different generations; see https://www.agner.org/optimize/microarchitecture.pdf. E.g. page 217 states that the Zen generation has 4 integer and 4 floating-point pipes. However, whether your code actually runs into a pipelining bottleneck is nowhere near as obvious, or as likely, as it was back in the Pentium U and V pipe days. Rather, data cache line access patterns, maximizing ALU throughput via SIMD, and dynamic branch prediction failures are the main effects you're likely to see in relatively tight code (see the first sketch after this list).
  3. That kind of scheming was needed back when CPUs only had "static branch prediction": they might, e.g., assume that backward jumps are always taken and forward jumps are never taken, predicting the outcome purely from program structure. Modern CPUs employ dynamic branch predictors, i.e. they maintain an internal table mapping "code address" → "% of times taken" and predict according to the majority vote of that history. This means that a branch that goes the same way most of the time will be very cheap. However, a branch that goes 50%-50% either way will be very costly, and it is useful to avoid such branches in hot loops whenever you can (see the second sketch after this list). This kind of optimization of course implies that your code is already tight (and hot) enough for the effect to be observable. Intel's VTune and AMD's uProf can both record branch misprediction rates, which highlight the branches that were poorly predicted.
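To make the pipelining point concrete, here is a minimal sketch (my own toy loop, nothing from the raycaster) of the modern analogue of tricks 1 and 2: instead of interleaving 16/32-bit or int/float instructions by hand, you break the dependency chain so the out-of-order core can keep several pipes busy, and let the compiler vectorize:

```cpp
// Minimal sketch (toy example): breaking a dependency chain so the
// out-of-order core can overlap work on its own.
#include <cstddef>
#include <cstdio>
#include <vector>

// One long dependency chain: every add waits on the previous one.
float sum_serial(const std::vector<float>& v)
{
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Four independent chains: the CPU can keep several FP pipes busy, and the
// compiler is also free to turn this into SIMD.
float sum_unrolled(const std::vector<float>& v)
{
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i + 0];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    for (; i < v.size(); ++i) s += v[i];  // handle the tail
    return s;
}

int main()
{
    std::vector<float> v(1 << 20, 1.0f);
    std::printf("%f %f\n", sum_serial(v), sum_unrolled(v));
}
```

Note that the two functions can give slightly different results for general data, because float addition is not associative; that is exactly why the compiler won't make this transformation for you without something like -ffast-math.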
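And for the 50%-50% case, a second minimal sketch (again a toy example) of trading an unpredictable branch in a hot loop for branchless arithmetic that compilers typically lower to a conditional move or SIMD select:

```cpp
// Minimal sketch (toy example): an unpredictable branch in a hot loop
// versus a branchless formulation.
#include <cstdint>
#include <cstdio>
#include <vector>

// Branchy: if the sign of the data is effectively random, this branch
// mispredicts roughly half the time.
std::int64_t sum_positive_branchy(const std::vector<int>& v)
{
    std::int64_t s = 0;
    for (int x : v)
        if (x > 0)
            s += x;
    return s;
}

// Branchless: the condition becomes a value, so there is nothing for the
// branch predictor to guess at; compilers usually emit cmov or a vector
// select here.
std::int64_t sum_positive_branchless(const std::vector<int>& v)
{
    std::int64_t s = 0;
    for (int x : v)
        s += (x > 0) ? x : 0;
    return s;
}

int main()
{
    std::vector<int> v = {3, -1, 4, -1, 5, -9, 2, 6};
    std::printf("%lld %lld\n",
                (long long)sum_positive_branchy(v),
                (long long)sum_positive_branchless(v));
}
```

Whether the branchless form actually wins depends on how predictable your data is, so this is exactly the kind of thing to verify in VTune or uProf rather than assume.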

Wallace121215@gmail.com said:
…does that also get reordered?

short answer is yes;
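to be a bit more precise: with GCC/Clang extended asm, the optimizer only knows what your constraints tell it, so it may move, merge or even delete the statement unless you mark it volatile and declare what it touches; a minimal x86-64 sketch (my own toy example, not from the thread):

```cpp
// Minimal sketch, assuming GCC/Clang extended asm on x86-64.
#include <cstdint>
#include <cstdio>

static inline std::uint64_t read_tsc()
{
    std::uint32_t lo, hi;
    // "volatile" stops the optimizer from deleting this asm when the result
    // is unused or merging repeated calls; the "=a"/"=d" output constraints
    // are the only thing the compiler knows about what it does.
    __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

int main()
{
    std::uint64_t t0 = read_tsc();

    // Empty asm with a "memory" clobber: the classic compiler barrier that
    // keeps loads and stores from being reordered across this point.
    __asm__ volatile("" ::: "memory");

    std::uint64_t t1 = read_tsc();
    std::printf("%llu ticks between reads\n",
                static_cast<unsigned long long>(t1 - t0));
}
```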

u didn't mention what language u will be coding in… but seeing your short ASM snippet, i assume C/C++;

in which case grab the latest book on C/C++ and code away;

if u hit some slow code section, then profile that section and improve it (if u can, u can even improve it with simd instructions);

anyway, it's a long haul, so be patient with yourself;

all the best!

This topic is closed to new replies.
