typeof(...) for ISO C
Standardizing one of the biggest existing practices in decades... with some issues.
ThePhD here, and I’m going to tell you about a fun little extension to C called typeof
! It’s is a vendor extension that yields the type of an expression.
It’s a wildly useful vendor extension, since it allows someone to write a type-generic “print” (see here for a really cool example). It is one of the most widely spread extensions in the world, since – even for MSVC – it exists in all compilers, including the ones that don’t support it.
Seriously!
It’s a bit of a trip, but it can be undeniably proven that every single conforming C compiler since C89 actually supports this feature out of the box. A good handful of compilers don’t support the interface to do it like GCC with a __typeof__
or __typeof
keyword. But, every single compiler has to support, for example, this:
0int main (int, char*[]) {
1 return sizeof(2);
2}
n.b.: The ability to leave parameter names out of arguments is a C2x improvement I helped merge as the C Project Editor!
That’s it. That’s all we need to prove that typeof
exists. From the standard (bolded emphasis mine):
The
sizeof
operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. — N2573, Programming Languages C - Working Draft, §6.5.3.4 The sizeof and _Alignof operators, Semantics
That means that – whether or not you have access to the vendor extension – every compiler has to do the following equivalent internally in the compiler:
0int main (int, char*[]) {
1 return sizeof(typeof(2)); // equivalent to sizeof(int)
2}
This is great for advocating for a typeof
operation, because it means every single compiler since the C89 standard has the ability to compute typeof
. This means that proposing this to the C Standards Committee – WG14 – is less of a question of compiler capability and instead is just standardizing existing practice. It’s just a matter of convincing the folks there that we should expose the functionality. Given that it is tied to sizeof
, we already have remarkably standard and consistent behavior for the feature… except, in one place.
Qualifiers
Qualifiers refer to things such as const
and volatile
attached to type specifiers for various purpose. While something like volatile
has sort of an ambiguous meaning that has been made to loosely translate to “cannot be optimized away, especially if writing to hardware”, const
has stronger semantics in meaning “this declaration cannot be modified”. This turns out to begin to matter in some places:
0int main(int, char*[]) {
1 _Atomic int value = 5;
2 __typeof(value) other_value;
3 other_value = 6;
4 return value + other_value;
5}
What is the type of other_value
here?
For Clang, TCC, and more, other_value
is an atomic integer. But, on GCC, it is just a plain int
. It’s unclear from GCC’s documentation and from the implementation why this is the case. Given GCC is currently grappling with the choices of whether to include certain qualifiers or not (see: this bug report), there’s some room to say the _Atomic
– and other types – should be stripped out.
This can cause problems for some folks. The lack of consistency can become its own type of vendor lock-in, as folks rely on something that does not have standardized or de-facto standard behavior. Still, it’s not all bugs and confusion! A way around this, is by using the cast trick:
0int main(int, char*[]) {
1 const int value = 5;
2 __typeof((__typeof(value))value) other_value; // okay
3 other_value = 6; // works
4 return value + other_value;
5}
The cast here (the gnarly (__typeof(value))value
expression) is meant to strip qualifiers. This is because the C Standard explicitly states cast expressions strip all qualifiers:
Preceding an expression by a parenthesized type name converts the value of the expression to the unqualified version of the named type. — N2573 §6.5.4 “Cast operators”, paragraph 5
There’s just one problem with this: it only works for “atomic, qualified, or unqualified scalar type” (§6.5.4 “Cast operators”, paragraph 1). Or, in simpler terms: numbers and pointers. Aggregate types – such as arrays and structs – are illegal as all get out in cast operations such as this.
This presents a big problem. While it looks silly just sitting in a simple program with a single main
function, this comes up a lot when doing Macro-generic programming, as well as _Generic
-generic programming. This ends up being a sticking point for real world code, all the way down to the Linux kernel:
Unfortunately, gcc does not provide a way to remove segment qualifiers, which is needed to use typeof() to create local instances of the per-cpu variable. For this reason, do not use the segment qualifier for per-cpu variables, and do casting using the segment qualifier instead. — GCC Mailing List, “typeof and operands in named address spaces”
Preserving qualifiers means we are letting this use case go unsolved. There are a few different ways to tackle the problem. At the present moment, the proposal decides to strip all top-level qualifiers. This removes all qualifiers (including Named Address Space Operators, which come from an ISO Technical Report), which allows for an end-user to pick them. Of course, this can be problematic: what if a user forgets to add qualifiers that they intended? Forgetfulness is a thing, and it could insert small, insidious footguns that do the wrong thing or – worse – just have poor behavior characteristics.
Triggering “lvalue conversions” by doing things like __typeof((0, value))
also isn’t guaranteed to get rid of const
-ness (or other qualifiers). It also has the additional drawback for ruining things like arrays by forcing array-to-pointer decay. For macros where we want to preserve arrays so we can do things like introspect on size, it’s helpful to prevent decay using various expressions that trigger lvalue conversion. So, it’s really not a particularly appealing solution.
Thinking on It
Perhaps, the more mature solution is to give users a way to strip qualifiers or transfer said qualifiers. This would give the maximum amount of flexibility and not propagate the current issues we have with current __typeof
-style vendor extensions. typeof_unqualied( type-name or expression )
may be the right way forward, if we left a new, standard typeof
to keep all of the qualifiers from the original expression:
0int main(int, char*[]) {
1 const int value = 5;
2 typeof_unqualified(value) other_value; // okay
3 other_value = 6; // nice
4 return value + other_value;
5}
Of course, typeof_unqualified
is a HUGE amount of key presses, especially in a world where functions are called strcat
😺. Maybe just a simple typeof_unqual
is more palatable to the C world?
Nevertheless, the current proposal can be found here. There’s a lot of tweaking to do before it gets seen, and given the depth of the details of qualifiers this paper may take a whole lot of effort to standardize properly.
Truly, I seem to have a knack for setting myself up for the hard stuff…
Catch you next time.
— ThePhD 💚
P.S.: Yes, decltype(...)
does exists in C++. But in C++ it returns a reference, based on things like “having an extra set of parentheses”. This means that code in C using decltype
could return a reference in C++, but a normal non-reference type in C, in the same header. That’s ripe for problems, so typeof
needs to be different!
What do you think of it? Get in touch with us or leave us a comment on our social media to discuss. We look forward to engaging with you about improving the C Standard!