> But C never had been portable assembly. The ANSI C standards committee disagre...

dathinab · on Aug 3, 2024

The full quote is:

> Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program (§4).

This doesn't say that C is a high-level assembly.

It just says that the committee doesn't (at that point in time) wants to force the usage of "portable" C as a mean to prevent the usage of C as high-level assembler. But just because some people use something as high level assembler doesn't mean it is high level assembly (like I did use a spoon as a fork once, it's still a spoon).

Furthermore the fact that they explicitly mention forcing portable C with the terms "to preclude" and not "to break compatibility" or similar I think says a lot about weather or not the committee thought of C as high level assembly.

Most importantly the quote is about the process of making the first C standard which had to make sure to ease the transition from various non standardized C dialects to "standard C" and I'm pretty sure that through the history there had been C dialects/compiler implementations which approached C as high level assembly, but C as in "standard C" is not that.

mpweiher · on Aug 4, 2024

It specifically says that the use of C as a "portable assembler" is a use that the standards committee does not want to preclude.

Not sure how much clearer this can be.

HeroicKatora · on Aug 4, 2024

That statement means the comittee does not want to stop it from being developed. The question is, has it? They mean a specific implementation could work as portable assembler, mirroring djb's request for an 'unsurprising' C compiler. Another interpretation would be in the context of CompCert, which has been developed to achieve semantic preservation between assembly and its source. Interestingly this of course hints at verifying an assembled snippet coming from some other source as well. Then that alternate source for the critical functions frees the rest of compiler internals from the problems of preserving constant-timeness and leakfreedom through their passes.

mpweiher · on Aug 5, 2024

No.

C already existed prior to the ANSI standardization process, so there was nothing "to be developed", though a few changes were made to the language, in particular function prototypes.

C was being used in this fashion, and the ANSI standards committee made it clear that it wanted the standard to maintain that use-case.

HeroicKatora · on Aug 4, 2024

These are aspiration statements, not a factual judgment of what that standard or its existing implementations actually are. At least they do not cover all implementations nor define precisely what they cover. Note the immediate next statement: "C code can be non-portable."

In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against (by, e.g. varying the size of integers in such a way some promotion is changed to something leading to signed overflow; or bounds checking is ineffective).

The paragraph further down about explicitly and swiftly rejecting a validation test suite should also read as a warning. Not only would the proposal of modern software development without a test suite get you swiftly fired today, but they're explicitly acknowledging the insurmountable difficulties in producing any code with consistent cross-implementation behavior. But in the time since then, other languages have demonstrated you can reap many of the advantages of close-to-the-metal without compromising on behavior consistency in cross-target behavior, at least for many relevant real-word cases.

They really knew what they were building, a compromise. But that gets cherry-picked into absurdity such as stating C is portable in present-tense or that any inherent properties make it assembly-like. It's neither.

mpweiher · on Aug 5, 2024

These are statements of intent. And the intent is both stated explicitly and also very clear in the standard document that the use as a "portable assembler" is one of the use cases that is intended and that the language should not prohibit.

That does not mean that C is a portable assembly language to the exclusion of everything and anything else, but it also means the claim that it is definitely in no way a portable assembly language at all is also clearly false. Being a portable assembly (and "high level" for the time) is one of the intended use-cases.

> In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

Yes. The original intent for which it was designed and in which role it works well.

> The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against

Yes, that's the "other" direction that deviates from the original intent. In this role, it does not work well, because, as you rightly point out, all that UB/IB becomes a bug, not a feature.

For that role: pick another language. Because trying to retrofit C to not be the language it is just doesn't work. People have tried. And failed.

Of course what we have now is the worst of both worlds: instead of either (a) UB serving its original purpose of letting C be a fairly thin and mostly portable shell above the machine, or (b) eliminating UB in order to have stable semantics, compiler writers have chosen (c): exploiting UB for optimization.

Now these optimizations alter program behavior, sometimes drastically and even impacting safety (for example by eliminating bounds checks that the programmer explicitly put in!), despite the fact that the one cardinal rule of program optimization is that it must not alter program behavior (except for execution speed).

The completely schizophrenic "reasoning" for this altering of program behavior being somehow OK is that, at the same time that we are using UB to optimize all over the place, we are also free to assume that UB cannot and never does happen. This despite the fact that it is demonstrably untrue. After all UB is all over the C standard, and all over real world code. And used for optimization purposes, while not existing.

> They really knew what they were building, a compromise.

Exactly. And for the last 3 decades or so people have been trying unsuccessfully to unpick that compromise. And the result is awful.

The interests driving this are also pretty clear. On the one hand a few mega-corps for whom the tradeoff of making code inscrutable and unmanageable for The Rest of Us™ is completely worth it as long as it shaves off 0.02% running time in the code they run on tens or hundreds of data centers and I don't know how many machines. On the other hand, compiler researchers and/or open-source compiler engineers who are mostly financed by those few megacorps (the joy of open-source!) and for whom there is little else in terms of PhD-worthy or paid work to do outside of that constellation.

I used to pay for my C compiler, thus there was a vendor and I was their customer and they had a strong interest in not pissing me off, because they depended on me and my ilk for their livelihood. This even pre-dated the first ANSI-C standard, so all the compiler's behavior was UB. They still didn't pull any of the shenanigans that current C compilers do.

pjmlp · on Aug 3, 2024

Back in 1989, when C abstract machine semantics were closer to being a portable macro processor, and stuff like the register keyword was actually something compilers cared about.

SAI_Peregrinus · on Aug 3, 2024

And even then there was no notion of constant-time being observable behavior to the compiler. You cannot write reliably constant-time code in C because execution time is not a property the C language includes in its model of computation.

mpweiher · on Aug 3, 2024

But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

And that is actually not just compatible with the C "model of computation" being otherwise quite incomplete, these two properties are really just two sides of the same coin.

The whole idea of an "abstract C machine" that unambiguously and completely specifies behavior is a fiction.

trealira · on Aug 4, 2024

> But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

While you can often guess what the assembly will be from looking at C code given that you're familiar with the compiler, exactly how C is to be translated into assembly isn't well-specified.

For example, you can't expect that all uses of the multiplication operator "*" results in an actual x86 mul instruction. Many users expect constant propagation, so you can write something like "2 * SOME_CONSTANT" without computing that value at runtime; there is no guarantee of this behavior, though. Also, for unsigned integers, when optimizations are turned on, many expect compilers to emit left shift instructions when multiplying by a constant power of two, but again, there's no guarantee of this. That's not to say this behavior couldn't be part of a specification, but it's just an informal expectation right now.

What I think people might want is some readable, well-defined set of attribute grammars[0] for translation of C into assembly for varying optimization levels - then, you really would be able to know exactly how some piece of C code under some context would be translated into assembly. They've already been used for writing code generator generators in compilers, but what I'm thinking is something more abstract, not as concrete as a code generation tool.

[0]: https://en.wikipedia.org/wiki/Attribute_grammar

mpweiher · on Aug 5, 2024

> exactly how C is to be translated into assembly isn't well-specified.

Exactly! It's not well-specified so the implementation is not prevented from doing a straightforward mapping to the machine by some part of the spec that doesn't map well to the actual machine.

dathinab · on Aug 4, 2024

> But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

not rally, or at least not in a way which would count as "high level assembler". If it would the majority of optimizations compilers do today would not be standard conform.

Like there is a mapping to behavior but not a mapping to assembly.

Which is where the abstract C machine as a hypothetical machine formed from the rules of the standard comes in. Kinda as a mind model which runs the behavior mappings instead of running any specific assembly. But then it not being ambiguous and complete doesn't change anything about C not being high level assembly, actually it makes C even less high level assembly.

mpweiher · on Aug 4, 2024

> If it would the majority of optimizations compilers do today would not be standard conform.

They aren't.

pjmlp · on Aug 4, 2024

So you can easily tell, just by looking to the C source code, if plain Assembly instructions are being used from four books of ISA manual, if the compiler is able to automatically vectorize a code region including which flavour of vector instructions, or completely replace specific math code patterns for a single opcode.