Shepherd's Oasis, LLC

Shepherd's Oasis, LLC

A software consulting, contracting, training, teaching and services company here to help you deliver the best quality for your business, organization, customers and constituents!

Catching a Use Case: Fallible Allocators

A class of allocator from C that C++ pushed away from, but was embraced by Rust.

ThePhD

19-Minute Read

A sunset horizon and a person standing fairly close by, all reflected through a bubble up-close.

It turns out there are two kinds of allocators in the world, and out of the Systems Programming languages we deal with here, C++ has the most special one!

Unfortunately for us, it’s the one that actually leaves some performance and use cases out in the cold πŸ₯Ά!

The Allocator Models

There are two allocation models that exist in various forms in C, C++, and Rust. All three languages support both allocation models, but how the language itself centers and defaults certain models is important to downstream development and interoperation between codebases. C and Rust have one allocation model, and it can be embodied by the Rust documentation here for its GlobalAlloc trait, with my own emphasis added:

unsafe fn alloc(&self, layout: Layout) -> *mut u8

Allocate memory as described by the given layout.

Returns a pointer to newly-allocated memory, or null to indicate allocation failure.

This embodies the important trait of the first and most fundamental allocation model…

Model 0: Fallible Allocation

Fallible Allocation means that failure is built into how the desired allocator behaves. It is the if result != null style of handling. That is, the error is returned as part of normal control flow, and is central to the code itself. The Fallible model is how both low-level Rust (GlobalAlloc and its unstable/nightly friend, AllocRef) and C code (with malloc or its many, many replacements and variants) behave. C has no standard way to handle errors, so it is on the end-user to detect that allocation has failed and do something. Rust provides a bit more infrastructure for its Fallible model, thankfully. Folks working with unsafe code are encouraged to attempt allocation, and call the default bad allocation handler built into the standard for failure. Rust’s also comes with the ability to set a hook for allocation failure to further customize the behavior.

In both C and Rust, handling these errors is a part of using the allocator, and in both cases it is caught by checking for null/NULL. The null pointer value then becomes the sentinel and anchor that downstream users of C and Rust Allocators can switch on. Or, at least, this was the case with the global allocator: Rust’s AllocRef returns a more robust Result<NonNull<[u8]>, AllocErr>. A more informative AllocErr is returned for Rust’s enhanced interface, and old code using Rust’s the global allocator wrapped around this simply transmutes a null value to a basic allocator error. For C++ folk, AllocRef’s return type is the equivalent of std::expected<std::span<unsigned char>, bad_alloc>. (Not that std::expected is real yet; it’s in a holding pattern as we spin around other topics instead.) Strangely enough, however, C++ chose a different model than Rust settled on or its predecessor, C, did. C++’s std::allocator and all conforming allocator objects, as well as the default new expressions, went for the second model…

Model 1: Infallible Allocation

In C++, every allocation in normal control flow is a perfect one. The reason for this perfection is because C++ developers, all coming from C, were frustrated by literal decades of writing this out in all sorts of ways:

0int* data_elements;
1/* ...*/
2data_elements = malloc(sizeof(int) * data_elements_count);
3if (data_elements == NULL) {
4	fprintf(stderr, "well this is a big shame?");
5	exit(1);
6}
7// actual real stuff to do

C++ decided that it wasn’t going to deal with this “error prone” code anymore, or let people just get away with calling malloc and not checking the return value properly at all (which, unfortunately, is done more often than I’d like to properly acknowledge in even the most robust software and firmware). So, when C++ created their new, they made it quite a bit more aggressive:

0int* data_elements;
1data_elements = new int[data_elements_count]();
2// actual real stuff to do

This code is shorter, more type-safe (the size is packed in the new expression and the multiplication is done implicitly for you in a lower-level language function), and no longer needs to check for the error. But that’s not because it can’t fail: it’s because the language has externalized the cost. It throws a bad_alloc exception now, rather than let the user handle it. This makes the call site clean and even provides a performance benefit to “happy path” code. So useful was this practice of turning errors (precondition violations, allocation failures, and more) into exceptions that when we created the std::allocator type to be paired with std::vector and everything it went into thereafter, we baked this into the Cpp17Allocator named requirements:

Memory is allocated for an array of n T and such an object is created but array elements are not constructed.

[Note 3: Additionally, the member function allocate for that type can fail by throwing an object of type bad_Β­alloc. β€” end note]

Nowhere are you allowed to return NULL or nullptr in C++ for its typical Cpp17Allocator. Once the .allocate(n, ...) function returns, it needs to be a valid pointer 0 to the first object for an array of at least n said objects. This makes the allocator Infallible and removes the need to check, which is definitely an improvement over its C predecessor. Ignoring the cost of polymorphic, inheritance-based exception throwing and catching, this seems like strictly a win. But even in a purely theoretical world with perfect, zero-cost exception handling, there is a design issue with Cpp17Allocator. There is no way to signal failure in the normal control flow of code. This creates an interesting problem, where you run into an interesting use case…

Allocation Failure Is Not An Error

Or, AFINAE.

Allocation failure is not always an error in many cases. I do not mean this in the “well, we can free more memory and return a better pointer later” sense, I mean that literally the work can continue without using the allocated space asked for. This is most on display when it comes to certain algorithms: you can often trade increased space for better speed. The C++ standard recognizes this, too, and has wording built in to 3 algorithms for this. One such algorithm is std::inplace_merge (emphasis mine):

Complexity: Let N = last - first:
β€” For the overloads with no ExecutionPolicy, and if enough additional memory is available, exactly N βˆ’1 comparisons.
β€” Otherwise, Θ(N log N) comparisons.

But, if you look at the signature for std::inplace_merge:

0template<class BidirectionalIterator>
1void inplace_merge(BidirectionalIterator first,
2                   BidirectionalIterator middle,
3                   BidirectionalIterator last);

There’s no memory source.

What gives?

Well, it turns out that these algorithms just… take some memory. Usually, its taken from some unspecified global pool. Algorithms like this in the Standard Library are std::inplace_merge, std::stable_sort, and std::stable_partition. For example, in the Microsoft C++ Standard Library, they just call their own internal version of std::get_temporary_buffer() (now deprecated). get_temporary_buffer was implemented poorly in all STL implementations, but the API’s design was exactly what was needed. It gave a chunk of memory suitable for an array of that object type, or signaled failure in normal control flow without throwing an exception. This is essential to fulfilling the promise of these algorithms, because they can work both with and without memory. From this design need, it can be immediately observed:

Allocators in C++ are fundamentally incapable of solving this problem in the Standard Library!

There is no extra parameter to give a memory source to inplace_merge or stable_sort because in C++ there is no type that satisfies these requirements. Passing a std::allocator here will simply crash and burn, and it’s not like std::memory_resource is any better since it maintains the same “returns a valid pointer, tosses an exception, or dies” mentality. By centering an Infallible Allocator model in C++, we traded simplicity for the lack of ability to handle complex cases. This also doesn’t stop here, either: Parallel Algorithm and, soon, Executor-based implementations of asynchronous/parallel algorithms are going to need memory sources they can strongly control as well.

Not having any kind of Fallible Allocator in the Standard thus becomes a damaging problem that hampers our ability to control allocations. It means forcing developers to outlaw the use of those functions in shared code if they are targeting specific environments. It means not being able to just opt-in to always using the fall back option directly. And, ultimately, it means a standard library that is less useful and less inclusive of diverse, important use cases in the wild. Parallel algorithms are not going anywhere, and the need for them is only going to increase as heterogenous computing continues to thrive.

So, we are going to need to have a Fallible Allocator, at some point.

How?

Ultimately, it seems like the best way to solve this problem is by creating a new FallibleAllocator named requirement that has the requisite API changes. Particularly, rather than my_alloc.allocate(...) returning a pointer, it will return a std::optional<non_null<T*>>. The reason I say this over something like std::expected<non_null<T*>, bad_alloc> like Rust is because bad_alloc is a lot more heavyweight than I would ever want it, and no user-facing constructor allows anyone to override what information is kept in the string. Thusly optional is strictly better here. A minimal class would likely take this form:

 0#include <optional>
 1#include <cstdlib>
 2#include <cstddef>
 3#include <type_traits>
 4
 5// does literally nothing
 6// except for static analyzers
 7template <typename T>
 8using non_null = T;
 9
10template <typename Value>
11class fallocator {
12public:
13	using fallible = std::true_type;
14	using value_type = Value;
15	using pointer = value_type*;
16
17	constexpr fallocator() = default;
18	constexpr fallocator(const fallocator&) = default;
19	constexpr fallocator(fallocator&&) = default;
20	constexpr fallocator& operator=(const fallocator&) = default;
21	constexpr fallocator& operator=(fallocator&&) = default;
22	template <typename X>
23	constexpr fallocator(const fallocator<X>&) noexcept {}
24
25	std::optional<non_null<pointer>> allocate(std::size_t n) noexcept {
26		pointer p = static_cast<pointer>(std::malloc(sizeof(value_type) * n));
27		if (p == nullptr) {
28			return std::nullopt;
29		}
30		return p;
31	}
32
33	void deallocate(pointer p) noexcept {
34		std::free(p);
35	}
36};
37
38template <typename Left, typename Right>
39constexpr bool
40operator==(const fallocator<Left>& left, const fallocator<Right>& right) noexcept {
41	return true;
42}

And that’s all we would need for the most basic implementation. This is a “minimal allocator”, so we are not showing off all the bells and whistles. It follows the same Allocator Requirements as the Cpp17Allocators already present in the C++ Standard Library, just with the allocate function serving a different purpose.

There’s an additional using fallible = std::true_type, because infrastructure in the Standard Library (or a different library) should recognize that this is a fallible allocator and thusly can return an invalid result. This is different from the infallible .allocate() functions of old.

Note that this is templated because, internally, an algorithm or similar could have the need to allocate things that do not exactly match for the allocator that was given. In this case, the library writer using the fallocator would need to use the allocator_traits::rebind_alloc functionality to get a new fallocator so they can allocate objects of any specific internal / node type. This may be particularly useful if there are complicated data structures involved in the parallel algorithms that can provide speedups, rather than just making gigantic arrays to fill in concurrently.

“Why not just catch()?

Setting up a try { ... } catch () { ... } around a .allocate(...) call from an old-style allocator — rather than having to write a new one — seems infinitely more appealing as the low-energy solution to this problem. But that’s exactly the problem: the normal std::allocator does not have to signal an error by calling std::bad_alloc. That is just one way to leave this code:

 0std::allocator<int> alloc{};
 1
 2//...
 3
 4using alloc_traits = std::allocator_traits<decltype(alloc)>;
 5int* ptr = nullptr;
 6try {
 7	ptr = alloc_traits::allocate(alloc, num_elements);
 8}
 9catch (...) {
10	// something went wrong, panic!
11}
12// good to go, use "ptr"

The other way to get out of this code is by just calling exit(1) or abort() inside the .allocate(...) function. Yes, conceivably, someone could do this in both the Fallible and the Infallible Allocators. But the contract of an Infallible Allocator makes it much more likely: if you have exceptions turned off, there’s no graceful way to leave the .allocate(...) function and meet the post conditions. Therefore, the only recourse is invoking a special termination procedure.

Fallible allocators provide an out for all code and all models, since the failure mode – an empty std::optional – conveys the error in normal code flow!

Using the Fallible Allocator

The usage is largely the same, just that the return value is a bit bulkier from being an std::optional rather than just the regular pointer.

 0fallocator<int> alloc{};
 1
 2//...
 3
 4using alloc_traits = std::allocator_traits<decltype(alloc)>;
 5
 6auto maybe_allocate = alloc_traits::allocate(alloc, num_elements);
 7if (!maybe_allocate) {
 8	// panic!!
 9}
10// good to go, get and use pointer
11int* ptr = *maybe_allocator;
12// ...

A more practical example is getting an opportunistic buffer to compute a better e.g. std::stable_sort. Modifying a standard library to do this would look something like the below. Comments are added to make it clear what all those underscore wildness in the code is doing. (That’s the required way to write code to avoid macro clashes in the Standard Library.)

 0// from: https://github.com/microsoft/STL/blob/master/stl/inc/algorithm#L7929
 1
 2template <class _BidIt, class _Pr, class _Al>
 3void stable_sort(const _BidIt _First, const _BidIt _Last, _Pr _Pred, _Al _Alloc) {
 4    /* range checking */
 5    // sort preserving order of equivalents
 6    _Adl_verify_range(_First, _Last);
 7    const auto _UFirst = _Get_unwrapped(_First);
 8    const auto _ULast  = _Get_unwrapped(_Last);
 9    // _STD is a macro for std:: to avoid ADL shenanigans
10    const auto _Count  = _STD distance(_UFirst, _ULast);
11    // optimize: if we have a small enough range,
12    // just use a quick in-place insertion sort
13    if (_Count <= _ISORT_MAX) {
14        _Insertion_sort_unchecked(_UFirst, _ULast, _Pass_fn(_Pred));
15        return;
16    }
17
18    // otherwise, get allocator-y with it...
19    using _Alloc_type = decltype(_Get_unwrapped(_Alloc));
20    using _Alloc_traits_type = _STD allocator_traits<_Alloc_type>;
21    static_assert(_Alloc_traits_type::is_fallible_v, "the allocator must be fallible");
22    static_assert(_STD is_same_v<_Iter_value_t<_BidIt>, alloc_type::value_type>,
23                  "a fallible allocator must have the same type as the iterator");
24
25    // pre-compute size
26    const auto _Capacity = _Count - _Count / 2;
27    // allocate buffer
28    const auto _Maybe_temp_buf = _Alloc_traits_type::allocate(_Get_unwrapped(_Alloc), _Capacity);
29    if (_Maybe_tmp_buf) {
30        // make sure to delete the memory, if successful
31        // when we are done
32        auto _Temp_buf_ptr = *_Maybe_temp_buf;
33        using _Alloc_deleter = _STD _Alloc_ref_delete<_Alloc_type>;
34        using _Alloc_guard = _STD unique_ptr<decltype(*_Temp_buf_ptr), _STD _Alloc_ref_delete>;
35        _Alloc_guard _Mem_guard(_Temp_buf_ptr, _Get_unwrapped(_Alloc));
36        _Stable_sort_unchecked(_UFirst, _ULast, _Count, _Temp_buf_ptr, _Capacity, _Pass_fn(_Pred));
37    }
38    else {
39        _Stable_sort_unchecked(_UFirst, _ULast, _Count, nullptr, 0, _Pass_fn(_Pred));
40    }
41}

And there it is! A stable sort, but one that allows the end-user to exert control over the final memory profile of the implementation. Alloc_ref_delete is a unique_ptr-compatible Deleter that holds an allocator by reference, so it can call allocator_traits<Allocator>::deallocate(alloc, ptr).

This optimization can be applied to other algorithms (std::inplace_merge, std::stable_merge), as well as some common implementation strategies for parallel algorithms. The fallocator demonstrated here and its usage follow the existing allocator API very strictly. There are additional improvements that can be made in its design and its choice of API, but those changes are much more fundamental and have to do with a few upcoming write ups here and elsewhere! It’s going to be exciting, since there is a lot to talk about with respect to allocators and performance!

There is one more thing to potentially talk about, but honestly it’s not the most important thing! It’s alright if you stop here. In fact, it’s probably encouraged!

So, we’ll see you later, buhbyyeee!

πŸ’š



























Okay.

They’re gone. It’s just you and me, dear reader, and as usual there’s some non-standard, not-blessed-by-C++ shenanigans we can get up to using this new API. You will notice that, in particular, the fallocator API’s allocate is written like so:

 0	// ...
 1	
 2	std::optional<non_null<pointer>> allocate(std::size_t n) noexcept {
 3		pointer p = static_cast<pointer>(std::malloc(sizeof(value_type) * n));
 4		if (p == nullptr) {
 5			return std::nullopt;
 6		}
 7		return p;
 8	}
 9
10	// ...

It returns an std::optional. Remember, the non_null is just a template that does literally nothing, except maybe help some folks who do shenanigans with static analyzers. This means that, ultimately, the API is working with std::optional<T*>. This can enable a particularly fun, non-standard use case that has no real right to be considered! That is,

you can return nullptr from the allocator and it could be considered valid.

… What. W-Why?!

Okay, so. Hear me out here, before you slam dunk on me (you can dunk on me later, but your slam dunk should be filled with the righteous fullness of a complete rejection, rather than a visceral negative reaction). There are cases where the value of the nullptr — in the case where it has an actual representation value of 0 — actually ends up pointing to valid memory.

This typically happens in “unprotected” development or low-level embedded development: whether it’s mapped registers or other shenanigans, sometimes the literal address 0 is, indeed, not nothing, but something! Normally, the fix to this is easy: the compiler can make the nullptr representation something like 0x55555555. But sometimes, the compiler doesn’t exactly do that, as Myrrlyn 1 explains…

I worked on a system that had a mapped zero page but the compiler still used bit pattern zero as null.

Yes. Sometimes, systems and vendors are a little loose and wild. And, sometimes, you have to get creative. Which is why they came up with this…

The solution was that the HAL provided an extern symbol that was set to zero by the linker.

Nice! But, that begs the question: is an extern symbol really a way around this? How effective is the compiler at removing nullptr, even if it is hidden inside of an optional instead? As Nika Layzell pointed out2 if you try out some code that exposes a nullptr – even if you don’t check explicitly for if (p == nullptr), it’s easy to see the compiler makes assumptions about the “nullptr-ness” (using Clang 11):

 0#include <optional>
 1
 2int x[100];
 3
 4inline std::optional<int*> get_thing(int i) {
 5  if (i < 100) {
 6    if (i == 5) { return nullptr; }
 7    return x + i;
 8  }
 9  return std::nullopt;
10}
11
12int my_thing(int i) {
13  auto x = get_thing(i);
14  if (x) {
15    return **x;
16  }
17  return 0;
18}

The above code generates assembly that looks like this:

0my_thing(int):                           # @my_thing(int)
1        xor     eax, eax
2        cmp     edi, 99
3        jg      .LBB0_2
4        movsxd  rax, edi
5        mov     eax, dword ptr [4*rax + x]
6.LBB0_2:
7        ret
8x:
9        .zero   400

You can see from this code that it never checks for i == 5, because the double-dereference in my_thing invalidates that check inside of the for loop. The compiler – after inlining that method – proves that, because we dereference both the std::optional and the int*, at no point can i be 5 since our code dereferences the pointer stored inside the std::optional. If we never do the 5 branch, then there’s only 2 actual branches in the combined code. The first is being less than 100, in which case there is always a value. The second is if it is not, in which case the return value is always 0 (which is what xor eax, eax computes).

Thusly, the compiler removes it.

But! There’s a way to cheat, using what Myrrlyn told us:

 0#include <optional>
 1
 2int x[100];
 3
 4extern int* the_real_nullptr;
 5
 6inline std::optional<int*> get_thing(int i) {
 7  if (i < 100) {
 8    if (i == 5) { return the_real_nullptr; }
 9    return x + i;
10  }
11  return std::nullopt;
12}
13
14int my_thing(int i) {
15  auto x = get_thing(i);
16  if (x) {
17    return **x;
18  }
19  return 0;
20}

And suddenly, the compiler can’t figure it out:

 0my_thing(int):                           # @my_thing(int)
 1        xor     eax, eax
 2        cmp     edi, 99
 3        jg      .LBB0_5
 4        cmp     edi, 5
 5        jne     .LBB0_2
 6        mov     rax, qword ptr [rip + the_real_nullptr]
 7        jmp     .LBB0_4
 8.LBB0_2:
 9        movsxd  rax, edi
10        lea     rax, [4*rax + x]
11.LBB0_4:
12        mov     eax, dword ptr [rax]
13.LBB0_5:
14        ret
15x:
16        .zero   400

Ah, wonderful! For a moment, we can celebrate! And yet… it’s actually still not correct. See, even if we:

  • never compare against nullptr;
  • squirrel the value of the the_real_nullptr type in a different translation unit;
  • and, go as far as to bury it in a Linker Script;

we can still run afoul of the compiler. As Mara Bos3 explained alongside Nika, the compiler can do more than just optimizations and comparisons against the nullptr value:

rustc isn’t the only compiler that’s allowed to make use of unused/invalid bit patterns to store other information. If you always dereference that pointer, the compiler might even use nullptr as the nullopt representation.

Yes; the compiler can repurpose the bits of a pointer containing what it believes is the nullptr value and do shenanigans on them. For example, it could insert a checking operation that the pointer value representation is, indeed, bit-equivalent to 0x0 and insert an exploding instruction if that’s the case! After all, Trap Pointer Representations are a thing in both C and C++, too, and any integral value converted into a pointer could trigger a hard trap. This means that even just returning the_real_nullptr could take a live trap representation, load it into the register, and let it destroy the runtime. But, beyond the theoretical unsoundness found in C17 and C2x, there’s also the practical implications that Mara also pointed out:

The very few people who are going to be using address zero are probably doing that because there’s some magic at that address. Like the first page containing some memory mapped registers or something. It’s extremely unlikely that addresses into a space like that would be returned by an allocator.

And she’s right. What kind of allocator really needs to return the actual 0 pointer value? Is it worth it to return a std::optional<non_null<T*>> when just returning a T* would work just fine? There might be an embedded use case out there, but an embedded use case that needs it in an allocator…? That’s pretty suspicious! So, maybe we can make allocate just return a T* again. But… you know. That’s something we’ll have to figure out, another day! For now, the control and improvement for following Chandler Carruth’s Principle of Development — that you can open up the hood and get better performance, but have good performance by default — is more than enough of a reward without getting into the tall grass about conceptual nullptr.

Thanks for sticking with the article for the extra shenanigans!

β€” ThePhD πŸ’š


What do you think of the article? Get in touch with us or leave us a comment on our social media to discuss!


November 16th, 2020 Edit: a few typos and a fix up of the original allocator was done!

0 β€” Technically, the return value is unspecified if .allocate(...) is handed a size of 0, but we’re not counting the degenerate case here. ‴

1 β€” Myrrlyn is a software developer. He works on bitvec and other amazing things that help make the world go ‘round. ‴

2 β€” Nika Layzell is a software developer from Canada. She works on Fission site isolation at Mozilla, and regularly gets into fun with Rust and C++. ‴

3 β€” Mara Bos is a software developer and electrical engineer from the Netherlands. She works on the Rust Standard Library and many other projects. She also maintains her own DNS, which is a massive flex and we hope to one day be as cool as she is. ‴

Edited Title Photo by Christina from Pexels.

Recent Posts

Categories

About

A contracting and consulting company ready to serve your business needs, locally or globally!