Translator | Lu Xinwang
Review | Yun Zhao

If you call Rust the younger sibling of C++, I believe few would object. Rust borrows many design ideas from C++, and concurrency is no exception.

The concurrency features of the Rust standard library are very similar to those in C++11: threads, atomic operations, locks and mutexes, condition variables, and so on. However, over the past few years, with the release of C++17 and C++20, C++ has gained quite a few new concurrency-related features, and future releases will bring more that is worth learning from.

Let's take a moment to review C++'s concurrency features, discuss what their equivalents could look like in Rust, and what it would take to get there.


  PART   01   _

 atomic_ref 

P0019R8 introduced std::atomic_ref into C++. It is a type that allows you to use a non-atomic object as an atomic one. For example, you can create an atomic_ref<int> that refers to a regular int variable, and it then offers the same functionality as atomic<int>, as if the variable were an atomic<int>.
In C++, this requires a whole new type that replicates most of the atomic interface, while the equivalent Rust feature is a one-line function: Atomic*::from_mut. This function allows you, for example, to convert a &mut u32 into a &AtomicU32, a form of aliasing that is perfectly sound in Rust.

The C++ atomic_ref type comes with safety requirements that must be upheld manually: as long as you use an atomic_ref to access an object, all access to that object must go through that atomic_ref. Accessing the object directly while an atomic_ref to it still exists results in undefined behavior.

In Rust, however, this is already fully handled by the borrow checker. The compiler understands that while a u32 is mutably borrowed, nothing else is allowed to access that u32 until the borrow ends. The lifetime of the &mut u32 that goes into the from_mut function is carried over into the &AtomicU32 you get out of it. You can make as many copies of that &AtomicU32 as you want, but the original borrow does not end until all copies of the reference are gone.
The from_mut function is currently not stable, but maybe it's time to stabilize it.

  PART   02   _

 Generic atomic type 

In C++, std::atomic is generic: you can have an atomic<int> as well as an atomic<MyOwnStruct>. In Rust, on the other hand, we only have specific atomic types: AtomicU32, AtomicBool, AtomicUsize, and so on.
C++'s atomic types support objects of any size, regardless of what the platform supports. For sizes not supported by the platform's native atomic operations, it automatically falls back to a lock-based implementation. Rust only provides the types natively supported by the platform: if you're compiling for a platform without 64-bit atomics, AtomicU64 simply doesn't exist.
This has advantages and disadvantages. It means that Rust code using AtomicU64 may fail to compile on some platforms, but it also means there are no performance surprises from a type silently falling back to a very different implementation. It also means we can assume an AtomicU64 has exactly the same representation as a u64 in memory, which allows functions like AtomicU64::from_mut.

A generic atomic type, Atomic<T>, in Rust would have to handle types of any size. Without specialization, we cannot make Atomic<LargeThing> contain a mutex while leaving it out of Atomic<SmallThing>. What we can do, however, is store the mutexes in a global HashMap, indexed by memory address. An Atomic<T> can then be the same size as T, using a mutex from this global HashMap when necessary.

That's exactly what the popular atomic crate does.

A proposal to add such a generic Atomic<T> type to the Rust standard library would need to discuss whether it should work in no_std programs. A regular HashMap requires allocation, which is not possible in no_std programs. A fixed-size table might work for no_std programs, but might be undesirable for various reasons.

  PART   03   _

 Compare-exchange and padding 

P0528R3 changed how compare-exchange handles padding. A compare-exchange operation on an atomic<TypeWithPadding> used to compare the padding bits as well, but that turned out to be a bad idea. Today, the padding bits are no longer included in the comparison.
Since Rust currently only provides atomic types for integers, which have no padding, this change is not relevant to Rust today.

However, any proposal for a generic Atomic<T> with a compare_exchange method will need a discussion of how to handle padding, and may want to take a hint from that C++ change.

  PART   04   _

 Compare-exchange memory ordering 

In C++11, the compare_exchange functions require the success memory ordering to be at least as strong as the failure ordering. A compare_exchange(…, …, memory_order_release, memory_order_acquire) is not accepted. This requirement was copied verbatim into Rust's compare_exchange function.

P0418R2 argued that this restriction should be lifted, and the change became part of C++17.

The same restriction was lifted in Rust as part of Rust 1.64, via rust-lang/rust#98383.
  PART   05   _

 Constexpr mutex constructor 

C++'s std::mutex has a constexpr constructor, which means it can be constructed at compile time as part of constant evaluation. However, not all implementations actually provide this. For example, Microsoft's implementation of std::mutex does not include a constexpr constructor. So relying on this is a bad idea for portable code.
Also, interestingly, C++'s std::condition_variable and std::shared_mutex don't provide constexpr constructors at all.

When Rust 1.0 was released, Rust's Mutex did not have a const fn new. Combined with Rust's strict requirements for the initialization of statics, this made using mutexes in static variables very annoying.

This was resolved in Rust 1.63.0 as part of rust-lang/rust#93740:
  • Mutex::new
  • RwLock::new
  • Condvar::new

All of these are const functions now.
  PART   06   _

 Latches and barriers 

P1135R6 introduced std::latch and std::barrier in C++20, both of which allow multiple threads to wait for a certain point to be reached. A latch is basically just a counter that each thread decrements, and that you can wait on until it reaches zero. It can only be used once. A barrier is a more advanced version of this idea that can be reused, and that accepts a "completion function" which executes automatically when the counter reaches zero.

Rust has had a similar Barrier type since 1.0. It was inspired by pthread (pthread_barrier_t), not by C++.

Rust's (and pthread's) barrier is not as flexible as the one C++ now has. It has only a single "decrement and wait" operation (called wait), and lacks the "wait only", "decrement only", and "decrement and drop" operations that C++'s std::barrier provides.

On the other hand, unlike C++, Rust's (and pthread's) "decrement and wait" operation designates one thread as the group leader. This is a (possibly more flexible) alternative to a completion function.
The operations missing from the Rust version could easily be added at any time. All we need is a good proposal for the names of these new methods.
  PART   07   _

 Semaphores 

Likewise, P1135R6 also adds semaphores to C++20:

  • std::counting_semaphore
  • std::binary_semaphore
Rust doesn't have any general semaphore type, although it does provide an efficient binary semaphore per thread through thread::park and unpark.
A semaphore can easily be built manually from a Mutex<u32> and a Condvar, but most operating systems allow a more efficient and smaller implementation based on a single AtomicU32, for example via futex() on Linux and WaitOnAddress() on Windows. Which atomic sizes can be used for these operations depends on the operating system and its version.

C++'s counting_semaphore is a template that takes an integer parameter indicating how far we want to be able to count. For example, a counting_semaphore<1000> can count to at least 1000, and will therefore be 16 bits or larger. The binary_semaphore type is just an alias for counting_semaphore<1>, which can be a single byte on some platforms.

In Rust, we may not be ready for such a generic type anytime soon. Rust's const generics still impose some restrictions on what we can do with constants as generic parameters.
We could have separate Semaphore32, Semaphore64, and so on, but that seems like overkill. Having a Semaphore<u32> and a Semaphore<u64>, or even a Semaphore<bool>, is possible, but it's not something we've done before in the standard library: our atomic types are simply AtomicU32, AtomicU64, etc.

As mentioned above, for our atomic types we only provide what is natively supported by the platform you are compiling for. If we apply the same philosophy to semaphores, they won't exist on platforms without futex or WaitOnAddress functionality, such as macOS. And if we have separate semaphore types of different sizes, some sizes won't exist on (some versions of) Linux and various BSDs.

If we want standard semaphore types in Rust, we first need some input on whether we really need semaphores of different sizes, and what forms of flexibility and portability would make them useful. Maybe we should just have a single, always-available 32-bit semaphore type (with a lock-based fallback), but any such proposal would have to include a detailed discussion of the use cases and limitations.
  PART   08   _

 Atomic waits and notifications 

The remaining new features added to C++20 by P1135R6 are atomic wait and notify functions.
These functions effectively expose Linux's futex() and Windows' WaitOnAddress() through a standard interface.
However, they are available for atomics of every size, on every platform, regardless of what the operating system supports. A Linux futex (before FUTEX2) is always 32-bit, but C++ also allows atomic<uint64_t>::wait.
One way to implement that is with something like a "parking lot": a global hash map that effectively maps memory addresses to locks and queues. This means that 32-bit wait operations on Linux can use a very fast futex-based implementation, while operations of other sizes use a very different one.

If we follow the philosophy of only providing natively supported types and functions (as we did with the atomic types), we would not provide such a fallback implementation. That would mean we'd only have AtomicU32::wait (and AtomicI32::wait) on Linux, while on Windows all atomic types would have this wait method.

Adding Atomic*::wait and Atomic*::notify to Rust requires a discussion about whether falling back to a global table is acceptable in Rust.
  PART   09   _

 jthread and stop_token 

P0660R10 added std::jthread and std::stop_token to C++20.
If we ignore stop_token for a moment, a jthread is basically just a regular std::thread that automatically gets join()ed on destruction. This avoids accidentally detaching a thread and letting it run longer than expected, which can happen with a regular thread. However, it also introduces a potential new pitfall: destroying a jthread object right away joins the thread immediately, effectively eliminating any potential parallelism.

As of Rust 1.63.0, Rust provides scoped threads (rust-lang/rust#93203). Like jthreads, scoped threads are automatically joined. However, the point at which they are joined is well defined and guaranteed. The borrow checker even understands this guarantee, allowing you to safely borrow local variables in scoped threads, as long as those variables are defined outside the scope.

Aside from auto-joining, a major feature of jthread is its stop_token and the corresponding stop_source. You can call request_stop() on a stop_source to make the stop_requested() method on the corresponding stop_token return true. This is a nice way to ask a thread to stop, and it happens automatically in jthread's destructor before joining. It's up to the thread's code to actually check the token and stop when it is set.

So far it looks almost like a regular AtomicBool.

The difference is the stop_callback type. This type allows a callback function, a "stop function", to be registered with a stop token. Requesting a stop on the corresponding stop source will execute this function. A thread can use it to let other threads know how to stop or cancel its work.
In Rust, we could easily add AtomicBool-like functionality to the Scope object of thread::scope. A simple is_finished(&self) -> bool or stop_requested(&self) -> bool indicating whether the main scope function has finished might be sufficient, combined with a request_stop(&self) method that can be called from anywhere.
The stop_callback feature is more complex, and any Rust equivalent may require a detailed proposal discussing its interface, use cases, and limitations.

  PART   10   _

 Atomic floating point number 

P0020R6 added support for atomic floating-point addition and subtraction to C++20.

It would also be easy to add an AtomicF32 or AtomicF64 to Rust, but interestingly, the platforms that natively support atomic floating-point operations mostly seem to be GPUs, which Rust does not currently target.

Before adding these types to Rust, it would be good to see some concrete use cases.
  PART   11   _

 Per-byte atomic memcpy 

Currently, it is not possible to implement a sequence lock efficiently while following all the rules of the memory model, in either Rust or C++.
P1478R7 proposes adding atomic_load_per_byte_memcpy and atomic_store_per_byte_memcpy to solve this problem.
For Rust, there is an idea to expose this functionality through an AtomicPerByte<T> type: RFC 3301.
  PART   12   _

 atomic shared_ptr 

P0718R2 adds specializations for atomic<shared_ptr> and atomic<weak_ptr> for C++20.
Reference-counted pointers (shared_ptr in C++, Arc in Rust) are often used in concurrent lock-free data structures. The atomic<shared_ptr> specialization makes it easier to do this correctly by handling the reference count properly.
In Rust, we can add the equivalent AtomicArc<T> and AtomicWeak<T> types. (Although AtomicArc may sound odd, consider that Arc's A already stands for "atomic".)
However, C++'s shared_ptr<T> is nullable, while in Rust you'd need an Option<Arc<T>> for that. It's unclear whether AtomicArc<T> should be nullable, or whether we should also have an AtomicOptionArc<T>.
The popular arc-swap crate already provides all of these variants in Rust, but as far as I know, nothing like it has been proposed for the standard library yet.
  PART   13   _

 synchronized_value 

P0290R2 proposed a type called synchronized_value<T>, which combines a mutex with a piece of data of type T. Although it wasn't accepted into C++ at the time, it is an interesting proposal, since synchronized_value<T> is almost identical to Mutex<T> in Rust.
In C++, std::mutex does not contain the data it protects, and doesn't even know what it is protecting. This means it is up to the user to remember which data is protected by which mutex, and to make sure the right mutex is locked on every access to the "protected" data.

Rust's Mutex<T> design includes a MutexGuard that behaves like a (mutable) reference to T, which makes it much safer to use, while still allowing a plain Mutex<()>. The synchronized_value proposal tried to bring this pattern to C++, but used closures instead of a guard, because C++ doesn't track lifetimes.
  PART   14   _

 Epilogue 

In my opinion, C++ can continue to be a source of inspiration for Rust. Directly copying and pasting designs is not the way to go, but good ideas are still worth learning from and building on. As we've seen with Mutex, scoped threads, Atomic*::from_mut, and so on, things often end up looking quite different in Rust while providing the same functionality.
Of course, providing exactly the same functionality as C++ shouldn't be the main goal. The goal should be to provide exactly what the Rust ecosystem needs from the language and standard library, which may not be the same as what C++ users need from their language.

If you have a concurrency need that the Rust standard library does not currently meet, feel free to leave a comment, regardless of whether it has already been solved in another language.

Original link:

https://blog.m-ou.se/rust-cpp-concurrency/



Translator introduction

Lu Xinwang, 51CTO community editor and programming language enthusiast, with a strong interest in databases, architecture, and cloud native. He currently works in back-end development at a cross-border e-commerce marketing company.


