Using a variable in multithreaded environment - volatile, mutex or what?


I have two threads writing to the same bool variable, as well as reading from it. I enclosed writing to that variable in a mutex lock/unlock. Everything works fine under Debug. Now I switch to Release and, due to compiler optimizations, my program doesn't work anymore. It quickly occurred to me that the volatile keyword would prevent any caching of the variable, and indeed that solves the problem. However, I can't tell where or how, but I have the recollection that volatile shouldn't be used to solve multithreading race conditions.

So I thought I would just check what happens if I also enclosed reading from the variable in the same mutex lock. That indeed also solves the problem (without relying on volatile). However, it seems to me that this lock/unlock upon reading serves only to prevent the compiler from doing unwanted optimizations. It doesn't change anything at runtime but costs me time for a (redundant?) lock/unlock on every read of the variable. What would you recommend as a solution?


What are you trying to achieve? Having multiple readers and multiple writers on just a single variable is suspicious.

I'm curious... what if you set the debug version of the multithreaded runtime library switch in the project properties for the Release configuration? Does it still 'stop working'? That might narrow down what's going on here.

What does 'not working' mean in your context?

If you have a mutex around writing the variable, but not around reading it, the only synchronization you gain is that no two writers can write to the variable at the same time. However, it is still possible for a reader to read the variable halfway through a write operation. If that variable is not an atomic type (e.g. a struct or other compound type), your read may very well observe a half-written value.

Surrounding your read with a lock/unlock would solve that problem, since you now guarantee that no two threads may be in the parts surrounded by lock/unlock simultaneously.

Protect the member behind accessor functions.  Lock/Unlock on both get and set.  Sleep/Retry with a threshold if locked.
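For illustration, here is a minimal sketch of that accessor idea, assuming a std::mutex and made-up names (the original post doesn't specify any of this, and it omits the sleep/retry part):

#include <mutex>

class SharedFlag
{
public:
    void Set(bool value)
    {
        std::lock_guard<std::mutex> guard(m_mutex); // lock on set
        m_value = value;
    }

    bool Get()
    {
        std::lock_guard<std::mutex> guard(m_mutex); // lock on get as well
        return m_value;
    }

private:
    std::mutex m_mutex;
    bool m_value = false;
};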

Volatile in threaded code is wrong (it hinders the optimiser, yes, but does not guarantee correctness). Memory ordering is the big issue. Typically you have a pattern where you do some work, then set a flag to indicate to another thread that the work is done. For this to be correct, the "work" needs to arrive in RAM before the "flag" does. Volatile in no way ensures that this condition is met (except by luck half the time). 

The simplest method is to just wrap all uses of the flag (read and write) in a lock (mutex, etc) as, assuming the lock is written by someone qualified, the lock will use the appropriate CPU instructions to ensure that memory access occurs in the appropriate order. 

If this is actually a performance problem (locks are typically really fast!!) the next lower step down is std::atomic, not volatile. But again, it can be a bit dangerous if you're not familiar with the underlying memory model. The next step down from that, your compiler will have intrinsic functions that can be used to emit the necessary CPU memory barrier instructions directly, and intrinsics to disable the specific compiler optimisations that could interfere with your ordering requirements. The next step down from that is writing assembly for a specific CPU architecture :)
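To make the "work, then flag" pattern concrete, here is a minimal std::atomic sketch (names and values are my own, purely for illustration): the release store guarantees the work is in memory before the flag, and the acquire load guarantees the reader sees that work.

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<bool> workDone{false};
int result = 0; // the "work"

void Producer()
{
    result = 42;                                      // do the work first
    workDone.store(true, std::memory_order_release);  // then publish the flag
}

void Consumer()
{
    while (!workDone.load(std::memory_order_acquire)) {} // wait until the flag is set
    std::printf("%d\n", result);                          // guaranteed to see result == 42
}

int main()
{
    std::thread t1(Producer), t2(Consumer);
    t1.join();
    t2.join();
}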

If you see volatile in your code, you have a bug. 

10 hours ago, Hodgman said:

If you see volatile in your code, you have a bug. 

While I understand what you're aiming at, stated that absolutely it's not right. There are circumstances where volatile is the right thing to use. It is unlikely for a game developer on modern hardware to ever be in such circumstances, but there are areas of software engineering where it is the right thing to do (embedded/low-level software, retro hardware games). Basically, whenever you are dealing with memory that is not under the control of your program (hardware registers, memory-mapped IO).

If you see a volatile in high-level code for modern hardware, you most likely have a bug.

volatile is a must-have keyword when you are working not with std::atomic but with raw hardware atomic intrinsics like InterlockedExchange on MSVC or __sync_lock_test_and_set in GCC/Clang, so it depends.

When there is a need for read/write protection and the operation is just setting a variable, I use an atomic SpinLock, or a ReadWriteSpinLock if performance really matters. In the case of atomic-sized types like unsigned int you could also consider using atomic operations directly instead of locking.
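As a small illustration of that last point, a counter can be updated with an atomic operation directly instead of taking a lock (a minimal sketch using std::atomic; the names are my own, not from the post above):

#include <atomic>

std::atomic<unsigned int> counter{0};

void Worker()
{
    // safe to call from many threads at once, no explicit lock needed
    counter.fetch_add(1, std::memory_order_relaxed);
}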

If the operation runs for longer, then using a mutex is absolutely OK, but keep in mind that mutexes on Windows are protected by the OS against self-locking (the owning thread can re-acquire them without blocking), so if you actually want a thread to block on itself you need to use a semaphore instead. I use this, for example, in my task system to let idle threads go to sleep on their own.
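For the "let idle threads go to sleep" part, one possible sketch uses a counting semaphore. This is not the original task system; it assumes C++20 std::counting_semaphore and made-up names:

#include <semaphore>

// idle workers block here instead of spinning; each posted task releases one worker
std::counting_semaphore<> workAvailable{0};

void WorkerLoop()
{
    for (;;)
    {
        workAvailable.acquire(); // sleeps until a task is posted
        // ... pop a task from the queue and run it ...
    }
}

void PostTask(/* task */)
{
    // ... push the task onto the queue ...
    workAvailable.release();     // wakes one sleeping worker
}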

You should always make sure to use the same lock for both read and write, or the locking will be pointless ;)

1 hour ago, Shaarigan said:

volatile is a must-have keyword when you are working not with std::atomic but with raw hardware atomic intrinsics like InterlockedExchange on MSVC or __sync_lock_test_and_set in GCC/Clang, so it depends.

Yeah these are the intrinsics I was referencing :) on some platforms they want you to use pointers to volatile, but that's out of your control. On other platforms you use non-volatile pointers. This is a level below std::atomic though - if you're writing code at this level, you're targeting a particular compiler and probably a particular CPU architecture. You have to be aware of the memory model for that architecture, and you're likely aware of the ASM that you're using the intrinsics to generate. At this level, you should basically just implement something like std::atomic and then use that. If you ever see it in gamedev outside of somewhere where an API has forced it upon you, be very suspicious.

2 hours ago, rnlf_in_space said:

While I understand what you're aiming at, stated that absolutely it's not right. There are circumstances where volatile is the right thing to use. It is unlikely for a game developer on modern hardware to ever be in such circumstances, but there are areas of software engineering where it is the right thing to do (embedded/low-level software, retro hardware games). Basically, whenever you are dealing with memory that is not under the control of your program (hardware registers, memory-mapped IO).

Yeah talking to hardware is probably the only exception. My hyperbole ("volatile is a bug") is a famous quote from the Linux kernel memo "volatile considered harmful", where they ban it from use in the kernel - even something that low level doesn't need it. Hardware device drivers are beyond the scope of the kernel though, and likely will have valid uses. 

However, if you're talking to hardware that can also access RAM and you communicate via MMIO and shared memory, you still need to be aware of the underlying memory model and also emit the appropriate fence instructions to ensure that your memory visibility and IO writes occur in the intended order. 

40 minutes ago, Hodgman said:

on some platforms they want you to use pointers to volatile, but that's out of your control. On other platforms you use non-volatile pointers. This is a level below std::atomic though - if you're writing code at this level, you're targeting a particular compiler and probably a particular CPU architecture. You have to be aware of the memory model for that architecture, and you're likely aware of the ASM that you're using the intrinsics to generate. At this level, you should basically just implement something like std::atomic and then use that. If you ever see it in gamedev outside of somewhere where an API has forced it upon you, be very suspicious.

I use that behind the scenes as a platform- and architecture-independent std::atomic replacement for something like spin locking in our professional engine:


#if defined(__GNUC__)
// test-and-set until we own the lock; spin on a plain read while it is held
#define SpinLock(__lock) { while (__sync_lock_test_and_set(&(__lock), 1)) while (__lock) {} }
#define SpinUnlock(__lock) { __sync_lock_release(&(__lock)); }
#elif defined(WINDOWS)
#define SpinLock(__lock) { while (InterlockedExchange(&(__lock), 1)) while (__lock) {} }
#define SpinUnlock(__lock) { InterlockedExchange(&(__lock), 0); }
#endif
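
Usage would then look something like this (my own example, assuming the lock variable is declared as a volatile long so both the Interlocked* functions and the GCC intrinsics accept its address):

volatile long lock = 0;

SpinLock(lock);
// ... critical section ...
SpinUnlock(lock);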

But everybody should stay aware of exotic platforms that implement their own interlocked functions, like the PSSDK or the Switch SDK do.

