lordnacho 3 months ago

This thing is great. The deeper lesson seems to be not to try any of these things in any kind of complicated program; you will fuck it up.

Imagine writing something with dozens of threads, each running different programs with various threading primitives sprinkled around. You would never be able to debug it, because just conjuring each case of "if this thread gets to here and that thread gets to there" in a test would be impossible.

If you're going to use these, keep it simple.

What I really prefer is not to share the state at all, just have something like a ring buffer between threads that pass messages to each other. Each thread just sits there and takes things from its input buffers and puts things onto the message queues of other threads. The ring buffer itself uses atomics, or is well tested to ensure none of these weird locking scenarios happen. This way all your thinking is just single threaded, and if there's a sync issue you know where to find it.
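
A minimal sketch of that shape in Python (illustrative only; `queue.Queue` is lock-based internally rather than a true lock-free ring buffer, but the discipline is the same: each thread owns its own state, and only messages cross the thread boundary):

```python
import queue
import threading

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Owns all of its state; only messages cross the thread boundary."""
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel: shut down
            outbox.put(None)
            return
        outbox.put(msg * 2)      # the actual work is single-threaded code

inbox, outbox = queue.Queue(maxsize=64), queue.Queue(maxsize=64)
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for i in range(3):
    inbox.put(i)
inbox.put(None)                  # tell the worker to finish

results = []
while True:
    item = outbox.get()
    if item is None:
        break
    results.append(item)
t.join()
print(results)                   # [0, 2, 4]
```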

  • mrkeen 3 months ago

    In the mainstream languages, synchronisation seems to be a means (locking code) to an end (locking data).

    The programmer usually does not care that only one thread can enter a method at a time - that's just the 'how'. The 'why' is that the programmer wants to reason about reads and writes in a multithreaded environment as easily as one would do in a single-threaded environment.

    Compare lock management to memory management. Method synchronisation is analogous to using stack variables - you don't have to do the work manually, but you can't work across method boundaries.

    Manually locking/unlocking a resource is analogous to malloc and free. It's prone to error, unless you adopt draconian rules preventing flexibility.

    What's the GC of lock management?

    • zbentley 3 months ago

      Channel-based programming, perhaps? From Go style to Erlang shared-nothing, those approaches offer some of the properties you’re looking for.

      Or maybe the holy grail hasn’t been created yet in this category: a compiler that can transform arbitrary computations or expressions of computational goals into their maximally parallel form, where locks etc. are compiler output artifacts a la assembly instructions rather than things programmers regularly interact with. Some academic languages and frameworks have made strides in this direction, but I don’t know of any that have caught on.

      • tomcam 3 months ago

        > Or maybe the holy grail hasn’t been created yet in this category: a compiler that can transform arbitrary computations or expressions of computational goals into their maximally parallel form…

        I am not an expert at concurrency, so forgive my ignorance. If such a compiler existed, wouldn’t its purpose be defeated by external code? As in, someone provides a library whose concurrency properties are unknown.

    • neonsunset 3 months ago

      In the context of C#, most code does not actively share mutable state even if it's vastly concurrent and/or parallel via Tasks or explicit threading.

      In case the state is still shared, most common scenarios are usually addressed by applying concurrent data structures (ConcurrentDictionary/Stack/Queue) or by using a Barrier or SemaphoreSlim.

      In cases where more complex mutable state must be shared in a non-thread-safe way, it is nowadays easily addressed with Channel<T>: a single reader owns the instance of a particular class and handles concurrently submitted requests in a serialized way.

      Otherwise, I agree that correctly writing non-blocking code, or code with a granular locking strategy, is extremely non-trivial; the state space that must be handled for even the most trivial cases is enormous and makes my head hurt.

      Some scenarios can be addressed by https://github.com/microsoft/coyote which simplifies the task, but it is still challenging.

      Beyond the above, there exists an F# implementation of Concurrent ML that solves the problem in a CSP-style fashion, similar to Channel<T> above: https://github.com/Hopac/Hopac/blob/master/Docs/Programming....

  • jiggawatts 3 months ago

    I don't have many iron rules, but I have one that I refuse to bend on: You've either proved your bespoke multi-threaded code mathematically, or it is guaranteed to be Wrong with a capital W.

    If you've proved it correct, you now have just a 50% chance of it being merely wrong when deployed on actual hardware.

throwup238 3 months ago

My main takeaway from this excellent guide is that I suck at multithreaded reasoning and should stick to single threaded languages.

There go my dreams of being a C-slinging kernel ninja master.

  • YZF 3 months ago

    I'll give you one tip that covers a large portion of deadlocks I've seen in my career.

    A lock should be used to lock a data structure; that's it. Corollary: never hold a lock while calling a function.

    The most common deadlocking scenario is someone taking a lock and then calling a function that, unbeknownst to them, takes a different lock. If you can't get by with just taking a lock, touching a data structure (lightly), and releasing the lock, then you need to look at your data structures.
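
    A sketch of the rule in Python (SharedList is an illustrative name; the point is that the lock wraps only the data touch and is never held across a call into other code):

```python
import threading

class SharedList:
    """The lock guards exactly one data structure and nothing else."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def insert(self, item):
        with self._lock:            # take lock, touch data, release
            self._items.append(item)
        # no function calls happen while the lock is held

    def snapshot(self):
        with self._lock:
            return list(self._items)

s = SharedList()
threads = [threading.Thread(target=s.insert, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(s.snapshot()))         # [0, 1, 2, 3, 4, 5, 6, 7]
```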

    • jstimpfle 3 months ago

      That sounds good in principle but is it practical? As long as you have something to do while holding the lock, chances are that implementing that something requires calling a function. That or code duplication.

      • YZF 3 months ago

        In my experience it is practical. Let's say you have a shared linked list, you take the lock in your insert function, you insert an item, you give the lock back. No function calls.

        Code that looks like what you describe, "implementing something that requires calling a function", tends to deadlock or be wrong. A really smart guy I worked with wrote some database driver that looked like that, it worked, except when it deadlocked, and finding that deadlock was a nightmare. I'm sure there are exceptions but this rule will get you out of a lot of trouble. If you need to violate the rule, try to find a different synchronization/concurrency mechanism or a different data structure.

        Even if the code is initially correct, inevitably someone will refactor it without realizing a lock is taken and break it.

        • jstimpfle 3 months ago

          > you take the lock in your insert function, you insert an item, you give the lock back

          yeah but what if "you insert an item" is literally hundreds of lines, and there are 3 layers of API functions below you? What if you need to take other locks, for example to apply backpressure / flush out data on the layers below?

          > Code that looks like what you describe, "implementing something that requires calling a function", tends to deadlock or be wrong

          It happens. What you do is you work hard until it's fixed.

          I've dug into the filesystem layer of Linux for a while. Going through all the locking rules and making sure your fs is in line with them is not a lot of fun. Maybe you should tell the Linux filesystem people how to do it instead?

          https://docs.kernel.org/filesystems/locking.html

          > Even if the code is initially correct, inevitably someone will refactor it without realizing a lock is taken and break it.

          Yup, and if there's a practical means to improve the situation with static type systems that is a net benefit, I'm all for it.

          • YZF 3 months ago

            > yeah but what if "you insert an item" is literally hundreds of lines, and there are 3 layers of api functions below you? What if you need to to take other locks for example to apply backpressure / flush out data on the layers below?

            Well, that's what software engineering is about. If inserting an item into a shared data structure takes hundreds of lines of code, I'd say there's something very wrong. You shouldn't need to take another lock to create backpressure; e.g. look at Go's concurrency model.

            I think it's a bad pattern as a rule. There are always situations where you break rules. My tip was for most of the situations where you don't, and most of the people who shouldn't. If you know what you're doing, and you understand concurrency and synchronization very well, then you probably don't need this tip. You can be a very smart and experienced developer and still easily create stuff with rare deadlocks that's almost impossible to debug if you're not careful. I've fixed these sorts of issues in multiple code bases.

            I've never worked on the Linux filesystem so I'm not going to tell them what to do. We'll have to assume the people working on that know what they're doing, otherwise it'd be a bit scary. Given that we don't see the Linux filesystem deadlocking - probably ok.

            EDIT: I've given this rule to many junior/intermediate engineers and I've used it myself, so I would say it is applicable to almost any situation where you need to use locking. It results in code that is thread safe and simply can't deadlock. This other deadlocking code base I worked on would have been much cleaner if this rule had been applied, and it could have been applied, and then it wouldn't deadlock once a year at one random customer site and take their system down. Again, like anything in software, sometimes you do things differently in different situations, but maybe the generalization of the rule is that you don't just sprinkle locks willy-nilly all over the place; you need to somehow rationalize/codify how the locks and structures work together in a way that guarantees no corner cases will lead to issues. And sure, at the "expert" level there are many more patterns for certain situations.

  • Animats 3 months ago

    It's not that bad. It's helpful to understand what's happening down at the instruction, memory access, and fence level, but most code just uses some kind of scoped mutex. You don't often have to think all this through.

    The big win in Rust is that the mutexes are tied to the data they protect. The compiler won't let you access data until it's locked. You can still deadlock, though.
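
    For comparison, a rough Python approximation of that coupling (Synchronized is a made-up name here; unlike Rust there is no compiler enforcement, so leaking a reference out of the block still compiles):

```python
import threading
from contextlib import contextmanager

class Synchronized:
    """Couples a value to the mutex that protects it; access only via lock().
    Unlike Rust's Mutex<T>, nothing stops you from leaking a reference out of
    the with-block, so this is a convention, not a compiler guarantee."""

    def __init__(self, value):
        self._value = value
        self._mutex = threading.Lock()

    @contextmanager
    def lock(self):
        with self._mutex:
            yield self._value

counter = Synchronized([0])          # state is reachable only through lock()

def bump():
    for _ in range(1000):
        with counter.lock() as c:
            c[0] += 1

threads = [threading.Thread(target=bump) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with counter.lock() as c:
    print(c[0])                      # 4000
```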

    • loeg 3 months ago

      > The big win in Rust is that the mutexes are tied to the data they protect. The compiler won't let you access data until it's locked.

      You can get this kind of behavior in C++ with, e.g., folly::Synchronized<T>.

      • 4gotunameagain 3 months ago

        I will play the devil's advocate and say: you can get any behaviour in C++, but there are so many options that almost nobody is using them, or using them the right way.

        And this is coming from someone who has written a lot of C++ and no Rust at all..

      • tialaramex 3 months ago

        Rust's borrow checker will of course check that if you borrowed the protected item you've given it back before unlocking, whereas all Folly and similar C++ libraries can do here is caution you that this footgun exists and is loaded and pointed at your foot so please don't.

        • jstimpfle 3 months ago

          That's just the usual resource ownership management problem that Rust is supposed to solve.

          But a simple templated type like the GP proposed does indeed fix the issue discussed here. To access the thing in the first place you need to lock the correct mutex. Looking at folly::Synchronized, locking doesn't even return the protected item itself directly. In most cases -- unless the bare pointer is needed -- you will access the item through the returned "locked view" object, which does the unlocking in its destructor.

          • tialaramex 3 months ago

            Sure, Rust "just" enforces type safety. But without type safety a type can't help us much more than the textual advice did, so I think that's a really big difference, especially at scale.

            In a small problem the textual advice is enough, I've written gnarly C with locks that "belong" to an object and so you need to make sure you take the right lock before calling certain functions which touch the object. The textual advice (locks are associated with objects) was good enough for that code to be correct -- which is good because C has no idea how to reflect this nicely in the language itself nor does it have adequate type safety enforcement.

            But in a large problem enforcement makes all the difference. I had maybe two kinds of lock, a total of a dozen functions which need locking, it was all in my head as the sole programmer on that part of the system. But if we'd scaled up to a handful of people working on that code, ten kinds of lock, a hundred functions needing locking I'd be astonished if it didn't begin to have hard to debug issues or run into scaling challenges as everybody tries to "keep it simple" when that's no longer enough.

          • loeg 3 months ago

            GP isn't totally wrong. With folly::Synchronized, you can lock, take a reference, then unlock, and continue to (incorrectly/unsafely) use the reference. The compiler doesn't catch that.

              folly::Synchronized<int> lockedObj;
              auto lockHandle = lockedObj.wlock();
              auto& myReference = *lockHandle;
              lockHandle.unlock();
              myReference = 5; // bad
            
            Still, it is harder to misuse than bare locks unattached to data.
            • jstimpfle 3 months ago

              Yes, but you can also take a reference (copy a pointer), delete, and continue to use the reference etc. I was pointing out that this is simply a lifetime/ownership issue, not an issue specific to locking (and yes, Rust solves that issue, at least for the easy cases). And as far as the problem is protecting access to a locked resource, a class like folly::Synchronized does indeed solve the problem.

        • loeg 3 months ago

          Yes.

  • jcranmer 3 months ago

    There's essentially three levels of reasoning about multithreaded code.

    The lowest level is understanding things on the hardware memory model level--knowing how atomic operations work, and being able to build primitives based on these atomics. And quite frankly, there be real dragons here (weak memory models are mind-bending), but--like assembly programming--it's not really necessary to resort to this level unless you really need to care about the performance, and the first rule of performance engineering applies [1].

    The second level is the level of the standard multithreaded primitives: mutexes, condition variables, semaphores, barriers, etc. This level is important to learn because it's generally the level of principles of how things work that matters most. But it's kind of like a BASIC programming language: it's certainly possible to write in it, but at scale, you get fatigue trying to keep all of the details straight, and it's rapidly apparent that there are lots of common patterns that could be handled more simply.

    The highest level of understanding is recognizing that there are a few basic common patterns of multithreading, and if you can just resort to appropriate libraries for those patterns, you generally don't have to worry about the details at all. Chief among these is embarrassingly-parallel code, stuff that generally works with a fork-join model, or generally any sort of pattern that's "for each element in a large set", where the work to be done involves no shared mutable state.
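
    At that highest level the code can stop mentioning locks at all. A small Python sketch of the fork-join / parallel-map pattern (the worker is a pure function, so there is no shared mutable state to protect):

```python
from concurrent.futures import ThreadPoolExecutor

def work(x):
    # pure function: no shared mutable state, so nothing to lock
    return x * x

# Fork-join: the pool forks the work, map() joins the results in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(10)))
print(results)                       # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```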

    Speaking of which, the most important thing to take away from any multithreaded programming is this: shared mutable state is inherently problematic. If you can avoid having any shared mutable state, that is ideal. The next-best option is to move all of the shared mutable state into a library that someone more competent than you has written (e.g., a database).

    [1] Don't optimize until you've measured to be sure you know what you need to optimize.

  • rramadass 3 months ago

    You might find Curt Schimmel's UNIX Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers useful.

  • g15jv2dp 3 months ago

    The problem with almost every example, as far as I can tell, stems from variables having multiple writers without being properly locked.

  • Emjayen 3 months ago

    If it's any consolation: the vast, vast majority of programmers should stick to single-threaded design, as is evident from all the atrocious multithreaded software out there, 99% of which would be faster without multiple threads (and surely far less complex).

    Unless you know what you're doing, just pick the real low-hanging fruit, like throwing some threads at file block decompression.

    • tialaramex 3 months ago

      It so happens that, even though I don't like C++, I entirely agree with the rationale behind the C++26 introduction of structured concurrency to the language.

      About sixty years ago structured control flow was rare or non-existent in most languages. Want to do X fifteen times? Well, try setting a counter N to zero, do X, increment the counter N, and if N is less than fifteen jump back to the start of your "do-X" routine. Simple.

      Oh wait, sorry, somebody else who wanted to do X is also jumping to the start of that routine, and they didn't set N so now everything is on fire.

      Today your programming language has structured control flow. You write a loop, the loop does X fifteen times, it's not possible for some unrelated code to just wander into part of your loop unsuspecting and cause mayhem since it wasn't prepared to count up to fifteen. New keywords appeared like "for" and "while" and "return". It's not that somehow a new tool magically fixed the problem, but instead programmers learned to write software in a more structured way using the better tools.

      Structured concurrency is the same idea again, but for concurrency.

      It took some time between Dijkstra writing a letter and languages delivering the nice for-each loop we're familiar with today, and so likewise what we have today as the state of the art for structured concurrency is likely nowhere close to what will be developed over the coming decades, but it's already nicer to work with than the chaos of unstructured concurrent programming.

Animats 3 months ago

Oh, that's just precious.

It stops before the problems of lock congestion, infinite overtaking, and priority inversion, though.

  • gpuhacker 3 months ago

    It also assumes all reads and writes are volatile. On real hardware, a thread can observe another thread's reads and writes happening out of order.

voidUpdate 3 months ago

I was somewhat hoping to find some details about pin and tumbler, disk detainer, tubular, wafer etc :P

wing-_-nuts 3 months ago

Piggybacking on this post to ask, does anyone know of good multithreaded testing libraries akin to what Thread Weaver used to be before it got abandoned?

https://github.com/google/thread-weaver

Happy to discuss frameworks for other languages too as I love using it as a tool to explore different possible ways that concurrency can go wrong even if you might never encounter them in standard testing.

cjfd 3 months ago

Hmmmm.... I think the general lesson here is to use concurrency classes the way they are intended, as opposed to the way they are not intended. I mean, more or less every code example here is clearly wrong and completely counter to how these classes should be used. Also, don't make things complicated. If your use of concurrency is complicated, the chances of being incorrect are about 1000%.

gs17 3 months ago

It would be pretty cool to see the reverse of this too, where you get handed code and the evil scheduler is some AI actively trying to break your code, so you need to fix it (with limited tools, likely something block-based). Obviously that's a lot harder to develop, but it could be a very interesting challenge.

xianshou 3 months ago

What this really explains is why you should use Python. I use ThreadPoolExecutor + as_completed on every I/O-bound or async workflow reflexively and have literally never encountered a deadlock. (With this method, I mean. I have encountered, and caused, many deadlocks in less pristine environments.)
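
For reference, that idiom looks like this (io_task here is a stand-in for a real I/O-bound call):

```python
import concurrent.futures as cf

def io_task(n):
    # stand-in for an I/O-bound call (network request, file read, ...)
    return n + 1

with cf.ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(io_task, n) for n in range(5)]
    # as_completed yields futures in completion order, not submission order
    results = sorted(f.result() for f in cf.as_completed(futures))

print(results)                       # [1, 2, 3, 4, 5]
```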

speed_spread 3 months ago

It's missing my favourite optimistic lock, the Delusional Lock.

  • lock_enthusiast 3 months ago

    I'm assuming it's the "I'm sure it'll be fine to read/write this unsynchronized, what's the worst that could happen?" locking strategy.

    • SeriousM 3 months ago

      That just sounds like the "read uncommitted" isolation level. Works as long as you know what to do :)

      • whaleofatw2022 3 months ago

        IIRC some IBM DBs have a level called 'chaos', it may be read uncommitted but with other side effects...

      • sitkack 3 months ago

        Is that like accidentally pure functions?

    • speed_spread 3 months ago

      "what's the worst that could happen?" implies some forethought. Delusional Locking just goes "It's gonna be fine! LALALALALA I CaN'T HeAr YoU"