Just double-checking the part of the presentation where they cite Plan 9's C compiler as "predictable" because it doesn't optimize away a useless loop... that's because the compiler is missing a bunch of useful optimizations, isn't it?
Specifically they say GCC requires this form for the busy loop to be emitted:
for (int i = 0; i < 1000000; i++)
    asm volatile ("" ::: "memory");
Where 9c will output a bunch of useless code when you tell it this:

for (int i = 0; i < 1000000; i++);
>Plan 9 C implements C by attempting to follow the programmer’s instructions, which is surprisingly useful in systems programming.
It's like coding with -fno-strict-aliasing or -fwrapv in GCC: it's perfectly fine and justifiable, but that doesn't mean it makes sense for a compiler to default to it, IMO, because you're basically lulling your devs into writing a specific dialect of C instead of the "real" language. It means that your code is effectively no longer portable, which is probably less of an issue for low-level kernel code but could still easily cause problems as code is shared between projects. Again, there are situations where it makes sense, but I strongly believe it should be an explicit choice by the programmer, not a compiler default.
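To illustrate the dialect point with -fwrapv: under the standard's rules GCC at -O2 can fold this whole function to "return 0", because signed overflow is undefined; with -fwrapv it has to emit a real comparison. Code written assuming the wrapping dialect quietly breaks when built without the flag:

    int will_overflow(int x) {
        /* Wrapping dialect: true exactly when x == INT_MAX.
           Standard C: signed overflow is UB, so the compiler may assume
           it never happens and fold this to 0. */
        return x + 1 < x;
    }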
Now I would argue that the for loop example is even worse than the aliasing- or wrapping-related issues, because I very rarely write busy timing loops, but I very often write for loops that I expect the compiler to optimize correctly (drop useless code, unroll, etc.). So yeah, that really seems like a way to spin a limitation of the compiler into a "feature", and it makes very little sense.
Also, I just checked, and gcc 8.2 does output the loop code when building with -O0. I guess they could alias that to --plan9-mode.
> but I do very often write for loops that I expect the compiler to optimize (drop useless code, unroll etc...) correctly
I feel like the "Plan 9 C" author would argue that optimizations like that should be explicitly enabled using inline pragmas: something carrying an optimization pragma requires the compiler to optimize it (so if it can't be optimized, the compiler should generate an error), and anything without the pragma requires the compiler to not optimize it. (And then you could have an "optimize if you can" pragma too, but it would be used far less often than explicitly requiring or forbidding optimization.)
Whereas with regular C compilers (unlike compilers for most other systems languages), optimizations get turned on by a compiler switch entirely outside of the code, and then what gets optimized and what doesn't is invisible: there are no guarantees that anything will be optimized, and no guarantees that anything won't be (unless you "trick" the compiler using things like the asm volatile() above).
I'm not sure if I personally agree with the PoV I just stated, but I think that's what they're thinking.
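To make that position concrete, here's a purely hypothetical sketch of what such annotations might look like in C. No compiler implements these pragmas; the names are made up:

    #pragma optimize(require)   /* hypothetical: compile error if this loop
                                   cannot be optimized away                 */
    for (int i = 0; i < n; i++)
        sum += a[i];

    #pragma optimize(forbid)    /* hypothetical: emit the loop exactly as
                                   written, busy-wait and all               */
    for (int i = 0; i < 1000000; i++)
        ;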
Compilers, including their optimizations, are implemented using abstractions. The component that removes a chunk of code might ask some other component, "Are any objects within this subtree used by anything outside this subtree?" If the answer is "no", the subtree gets removed.
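For example (my sketch, not any particular compiler's internals):

    int f(void) {
        int acc = 0;
        for (int i = 0; i < 1000000; i++)
            acc += i;   /* nothing in this subtree is used outside it... */
        return 42;      /* ...so the query answers "no" and the whole
                           loop is removed; f() becomes "return 42" */
    }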
Recognizing and preserving special syntax patterns requires additional work and can add substantial complexity. This is a common dilemma in software engineering, especially in high-quality software that applies sophisticated algorithms. The smarter a compiler is, in terms of applying state-of-the-art algorithms, the more these rigorous (but sometimes annoying) optimizations naturally happen. On the other hand, anything that breaks abstraction boundaries adds complexity that can make comprehension and maintenance quite burdensome.
If you've ever written code to build and transform an AST, it should be obvious how difficult it can be to add ad hoc logic that leads to inconsistent treatment of nodes. Even adding pragma opt-outs can add substantial complexity. The Plan 9 compiler sidesteps this by doing basically no optimizations. In that sense it behaves much like GCC in preferring simplicity over ad hoc semantics; both recognize that trying to have your cake and eat it too is too costly.
Fortunately, C does make it relatively easy to compile different source units independently. So all you really need is a single mode that disables all optimizations, and to put your special code in its own source file. But the trend is to remove this separate compilation step (Go and Rust both compile and link statically across the whole application), and even C compilers are moving toward so-called LTO, which effectively recompiles the application at link time and deliberately violates the previous semantics regarding cross-unit transformations and optimizations. That's something of a shame.
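A minimal sketch of that arrangement (the file name and contents are mine): keep the timing loop in its own translation unit and build just that unit with optimizations off, say cc -O0 -c delay.c, while everything else gets -O2. Without LTO, the optimized units can't see into it:

    /* delay.c — compiled alone with -O0 so the busy loop survives */
    void delay_busy(void) {
        for (int i = 0; i < 1000000; i++)
            ;
    }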
GCC does permit all manner of function-level attributes, but that adds substantial complexity, which is why clang and most other compilers don't support such flexibility to the same degree, and why GCC is often reluctant to add yet another option.
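For example, GCC's per-function optimize attribute can do this without a separate file, though GCC's own documentation warns it's intended for debugging rather than production use:

    /* GCC-specific: force -O0 for just this function */
    __attribute__((optimize("O0")))
    void delay_busy(void) {
        for (int i = 0; i < 1000000; i++)
            ;
    }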
> Plan 9 C implements C by attempting to follow the programmer’s instructions
Which, I might add, is a very silly thing to say. A programmer's intent and their written code are two very different things, and how one maps to the other is defined only by the C standard, which says nothing about emitting specific assembly instructions, only about the program's observable behavior.
The Plan 9 compiler deciding to pessimize your code because it assumes you actually meant for the code to be interpreted as portable assembly rather than a high-level description of a computation is kind of presumptuous. At that point it's just a different language with different (albeit compatible) semantics.
Not really. C99 adopted most of their extensions, including compound literals, long long, and designated initializers, and C11 later added anonymous union and structure members.
Interestingly, with the exception of long long, these are the features that effectively forked C and C++.
Compiler optimizations are one of the primary culprits in making it difficult to reason about lock-free programs. Semantics-preserving optimizations in a single-threaded context are not necessarily semantics-preserving in a multi-threaded, lock-free context.
For example, if you're writing a spin-lock, the compiler may lift a read of the lock value out of a loop because, assuming a single thread, the value will never change. This can result in a non-terminating spin-lock. For more see Linux's ACCESS_ONCE.
The example you gave is unfortunate, but the consequences of optimizing loops carelessly can be serious.
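A minimal sketch of the hazard (variable names mine; typeof is a GCC extension, and the macro is essentially how Linux defined ACCESS_ONCE):

    int locked = 1;                 /* cleared by another thread */

    void spin(void) {
        while (locked)              /* the compiler may hoist the load: if
                                       locked was nonzero once, loop forever */
            ;
    }

    /* Going through a volatile-qualified lvalue forces a fresh load
       on every iteration. */
    #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

    void spin_fixed(void) {
        while (ACCESS_ONCE(locked))
            ;
    }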
Isn't this the purpose of well-defined atomic primitives?
After all, it's not just the compiler: the processor can also reorder operations. So you have to annotate synchronizing memory operations regardless of whether the compiler is optimizing. E.g., a lock-free algorithm implemented using only volatile (which is what ACCESS_ONCE does), even at -O0, is almost certainly wrong.
The alternative to explicit annotation is for the compiler to generate full memory barriers around every memory access. That would indeed preserve semantics in a multithreaded context, at a ridiculous performance cost.
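For comparison, a minimal C11 sketch of the same spin-wait, where the annotation lives in the source and constrains both the compiler and the CPU:

    #include <stdatomic.h>

    atomic_int locked = 1;          /* cleared by another thread */

    void spin(void) {
        /* The load must be performed on every iteration, and acquire
           ordering also restricts hardware reordering — no volatile tricks. */
        while (atomic_load_explicit(&locked, memory_order_acquire))
            ;
    }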
The example I gave is simple and relates to the parent's example, but there are more complex cases where defining a semantics that still admits compiler optimizations is a matter of ongoing research.
For example the "well-defined" semantics of (C|C++)11's atomics admits executions where values can materialize out of thin air [1].
The broader point I was hoping to make is that optimizations are great, but they are not free in a multi-threaded context with data races (even benign ones). As a consequence, the choice to simply drop many of them is one supported by many people in the weak-memory community, and it even appears in newer memory models [2], for example forbidding read-write reorderings to prevent causal cycles.
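For the curious, the out-of-thin-air problem is usually shown with a load-buffering litmus test, roughly like this (my transcription; x and y both start at 0, relaxed atomics, so no data race in the formal sense):

    #include <stdatomic.h>

    atomic_int x = 0, y = 0;

    void thread1(void) {            /* r1 = x; y = r1; */
        int r1 = atomic_load_explicit(&x, memory_order_relaxed);
        atomic_store_explicit(&y, r1, memory_order_relaxed);
    }

    void thread2(void) {            /* r2 = y; x = r2; */
        int r2 = atomic_load_explicit(&y, memory_order_relaxed);
        atomic_store_explicit(&x, r2, memory_order_relaxed);
    }

    /* Read literally, the C11 model does not forbid r1 == r2 == 42, a value
       "out of thin air". Forbidding read-write reordering breaks the causal
       cycle that allows it. */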
> If the loop is so useless, why is it in the code?
Because perhaps it contains a body that optimizes away based on conditions outside the programmer's control? This happens all the time with macros/templates and with platform-agnostic code. Only the compiler can resolve what's in the body, and I want to trust the compiler to remove the loop if it is useless.
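A trivial sketch of what I mean (HAVE_TRACE and trace_event are made-up names):

    #ifdef HAVE_TRACE
    #define LOG(i) trace_event(i)
    #else
    #define LOG(i) ((void)0)       /* tracing compiled out on this platform */
    #endif

    void dump(int n) {
        for (int i = 0; i < n; i++)
            LOG(i);                /* with LOG a no-op, this loop is dead code
                                      and I want the compiler to delete it */
    }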
Those kinds of empty loops are actually used for delays, waiting for interrupts to kick in, etc., in embedded systems, where you typically fight against the compiler using the volatile keyword.
Example from https://www.coranac.com/tonc/text/video.htm:
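(Reconstructed from memory, so treat it as a sketch: vid_vsync busy-waits on the GBA's memory-mapped scanline counter, and the volatile qualifier is the only thing keeping the loops alive.)

    #define REG_VCOUNT (*(volatile unsigned short *)0x04000006)

    void vid_vsync(void) {
        while (REG_VCOUNT >= 160);  /* wait till VDraw */
        while (REG_VCOUNT < 160);   /* wait till VBlank */
    }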
I'm curious whether there's a tool that can map the sections of code that get optimized away by the compiler and feed that back to the developer, so that code like this:
for (int a = 0; a < 10000; a++);
would emit a message at compile time, allowing the human to take another look at the code and determine its usefulness. Ultimately the code would be removed or refactored, if only to stop the nagging.
> Specifically they say GCC requires this form for the busy loop to be emitted:
> for (int i = 0; i < 1000000; i++) asm volatile ("" ::: "memory");
> Where 9c will output a bunch of useless code when you tell it this:
> for (int i = 0; i < 1000000; i++);
And this is... a good thing?