Why not? If you're being so paranoid about the origin of a binary, you have to at least acknowledge that you're trusting the compiler when making this comparison.
So let me throw out an idea that might help justify this trust too. Compile the same TrueCrypt sources with a totally different compiler, then use both binaries in a deterministic way and compare the raw encrypted results. (I'm assuming here that the same encryption key and data will give the same result, but I don't know for sure whether that's true.)
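A minimal sketch of the comparison step. The binary names and CLI flags in the comments are hypothetical; the only real machinery is hashing the two outputs:

```python
import hashlib

def sha256_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: encrypt the same plaintext with the same key using
# builds from two independent compilers, then compare the raw outputs:
#   tc_gcc   encrypt --key key.bin plain.img out_gcc.img
#   tc_clang encrypt --key key.bin plain.img out_clang.img
# assert sha256_file("out_gcc.img") == sha256_file("out_clang.img")
```

If the cipher mode is deterministic (no random IV or salt), matching digests mean both builds computed the same ciphertext bit-for-bit.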
Create a piss-poor barebones compiler (compiler A) by hand (literally, if possible: punch cards would let you hand-verify the contents of the program in a way that is not susceptible to Thompson's attack, since there is no risk that your eyes were built with a compromised compiler). It only needs to be good enough to compile the compiler you want to verify (compiler B). Compiler A should run on hardware that you can trust (need not be x86).
(= ((a-binary b-source) b-source) b-binary)
If you trust a-binary, and you trust b-source, then if that returns true you should be able to trust anything created with b-binary. (a-binary b-source) will not equal b-binary, but ((a-binary b-source) b-source) should.
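That fixpoint can be sketched with a toy model in Python. Everything here is made up for illustration: a "binary" is an object whose generated bits depend on which compiler produced it, which is exactly why the single-stage build differs from the vendor's binary but the double-stage build converges:

```python
import hashlib

def sha(s):
    return hashlib.sha256(s.encode()).hexdigest()

class Binary:
    """Toy model of a compiler binary: it has the semantics of the source
    it was compiled from, plus concrete bits that depend on which compiler
    produced it (different compilers emit different code for the same source)."""
    def __init__(self, semantics, bits):
        self.semantics = semantics
        self.bits = bits
    def __call__(self, source):
        # Code generation depends on THIS compiler's semantics and the input.
        return Binary(semantics=source, bits=sha(self.semantics + source))

A_SRC = "hand-verified source of trivial compiler A"
B_SRC = "source of full compiler B"

a_binary = Binary(A_SRC, bits="hand-assembled")
b_vendor = Binary(B_SRC, bits=sha(B_SRC + B_SRC))  # vendor's self-hosted build

stage1 = a_binary(B_SRC)   # (a-binary b-source): right semantics, different bits
stage2 = stage1(B_SRC)     # ((a-binary b-source) b-source): bits match the vendor's

assert stage1.bits != b_vendor.bits
assert stage2.bits == b_vendor.bits
```

This is essentially the "diverse double-compiling" argument: stage1 behaves like B even though its bits differ, so using it to rebuild B reproduces the vendor's bits exactly.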
If the hardware is not executing binaries as described, then all bets are off. Compiler A could, in theory, be built and run on a homebrew CPU (http://www.homebrewcpu.com/), but if you are putting that much effort into this, it had better be your hobby...
Let's not forget you need to verify by hand the electrical characteristics and functionality of every logic chip you use to build that homebrew CPU, because the supply chain of basic logic chips is already known to be infected (primarily a QA issue, but clearly a potential vector).
Don't forget the power company. They can cause "spurious" bit flips by momentarily dropping the voltage at just the right instant. Best to use a battery made from potatoes you've grown yourself.
You also need to ensure either the device, or the location you operate the device is rad-hardened. We cannot rule out the possibility that They can control the phases of the sun and introduce errors to your circuit at-will with solar flares.
True, though I suspect you are going to be doing that already since you will undoubtedly fuck up your homebrew CPU in novel ways many times while building it, and spend many hours debugging each piece of it. ;)
If all of those compilers share an ancestor compiler (in other words, if we think GCC is compromised and Clang was originally bootstrapped with a compromised GCC), then I don't think that would be effective. (Although code that 'infects' not just future compilers of the same family, like a GCC that infects future GCCs, but any future compiler would be incredibly clever, to put it mildly.)
Even if that were not the case, if the hypothesis is that GCC was compromised at some point in the past by a shadowy organization, then you have to consider the possibility that this shadowy organization also got to the other compilers. I think that is where probability steps in though; how confident are you that at least some of the compilers are still safe (or perhaps, at least compromised in conflicting ways)?
The TCC binary is small enough that it is eminently tractable to inspect it all by hand (or with IDA Pro if you are the rich kind of hacker). Binaries aren't black boxes; they're just code, written as if by a demented cowboy coder with really bad taste in variable names.
The problem is that hypothetically any tools you use on a computer could be compromised (by their compiler, or otherwise) to not show you truthful results on your screen. IDA Pro (and other tools at your disposal) may recognize certain patterns in binaries and know to show you a transformation of those patterns instead. This transformation would essentially be the reverse of the transformation that the compiler performs.
If you are able to inspect the actual contents of the program, not the output of a program that itself inspects the actual contents of the program, then this problem disappears. You have to examine the machine code without an intermediary program that could lie to you.
(Of course, it is very unlikely that IDA Pro, objdump, or even 'od' is compromised in this way; then again, this whole class of attack is largely hypothetical and implausible to begin with...)
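For what it's worth, a hex dumper is small enough to write yourself in a few lines, so its output can be cross-checked against od or objdump (with the obvious caveat that the interpreter running it is yet another program you have to trust; the point is only that a lie would now have to be coordinated across several independent tools):

```python
def hexdump(data: bytes, width=16):
    """Return hexdump lines: offset, hex bytes, and printable ASCII."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{off:08x}  {hexpart:<{width * 3}} {text}")
    return lines

# e.g. dump the first bytes of a binary you want to eyeball:
for line in hexdump(b"\x7fELF\x02\x01\x01\x00"):
    print(line)
```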
Edit:
From Wikipedia: "What's worse, in Thompson's proof of concept implementation, the subverted compiler also subverted the analysis program (the disassembler), so that anyone who examined the binaries in the usual way would not actually see the real code that was running, but something else instead."
> If you're being so paranoid about the origin of a binary, you have to at least acknowledge the fact that you're trusting the compiler in making this comparison.
Which the author does, in fact:
"Of course, we need to trust the compiler, but in this case, it is independent of TrueCrypt."
You wouldn't just have to verify that it produces the same encrypted output, but also that all the steps along the way are carried out in precisely the same manner. A compromised version of TC may correctly encrypt the volume as expected, but also leak the key or the encryption password on the sly.
If I wanted to compromise TrueCrypt via a secret compiler-injected vulnerability, I'd replace the key generation logic with something that used maybe 64 actually random bytes as the input to an unpublished high-quality PRNG (the NSA almost certainly has a few of those hanging around). I don't think you could detect that by your method.
Because TrueCrypt is several orders of magnitude more high-profile of a target (being actual cryptography instead of a compiler) and probably also several orders of magnitude easier to compromise in a useful fashion & spread.
Paranoia is without bound. Who knows, you yourself might be a sleeper agent and you just don't know it yet! Maybe your eyes really were compromised in development! So you approach the problem from the point of view of what is likely, and what is not. You cannot guard against every single paranoia, but you can guard against ones you deem more likely.
You know that at a minimum, some manager at the NSA/CIA read that seminal paper and salivated at the prospect of a compromised compiler. Whether they are out there or not, I'm certain millions have been spent attempting it.
I seem to remember retrieving from a BBS, way back when, an MS-DOS shareware Pascal or C compiler of some kind that would leave a serial-number footprint in executables, which the author said he could use to prove that an unregistered version of his product had been used. I wish I could remember its name now, though.
Dropping a 4-16 byte identifier string into an unused portion of a binary is worlds away from reading source code, solving the halting problem to determine exactly which part you want to backdoor, and then outputting a backdoored binary. Shit, GCC embeds its version string in every object file; that doesn't mean it's a trap.
Don't need to solve the halting problem. There are plenty of well-known fixed points in any program you can attack: patch the main entry point, patch the exit point, patch the memory-allocation routine, patch any function entry, etc. All you need is a few bytes of jump instruction to jump to the embedded compromised code, which can further download code tailored to the specific program, given its signature.
Since the compiler is in charge of generating the layout of the executable, it's in the perfect position to alter it ever so slightly to patch in a backdoor.
In order for your compiler to propagate the backdoor into my compiler and my compiler's output, it needs to recognize that it's compiling a compiler and insert the appropriate backdoor. It needs to identify the parts of my compiler that output binary code as opposed to an XML dump of the AST. That's hard.
Let me say it again, you can patch the WELL KNOWN points of any program.
I don't care where your compiler's AST or code generation is. For any compromised program (including a compiler), all I need to do is monitor the files it generates (patch file_open); for any executable output file, patch its main entry point and add in a payload.
When a compromised compiler generates your compiler, it will patch your compiler's entry point and add in an extra payload. When your compiler compiles another compiler, it will do the same thing, and so on for any other program it generates.
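To illustrate how "well known" these fixed points are: the entry point of a 64-bit little-endian ELF executable sits at a fixed, documented offset in the file header, so any program that writes executables can find (and choose) it trivially. A read-only sketch, using a synthetic header rather than a real binary:

```python
import struct

def elf_entry_point(header: bytes) -> int:
    """Read the entry-point address from a 64-bit little-endian ELF header.
    e_entry lives at a fixed offset (24 bytes in), right after e_ident,
    e_type, e_machine, and e_version."""
    assert header[:4] == b"\x7fELF", "not an ELF file"
    assert header[4] == 2, "expected 64-bit ELF (EI_CLASS == ELFCLASS64)"
    (entry,) = struct.unpack_from("<Q", header, 24)
    return entry

# Synthetic 32-byte header prefix: magic, 64-bit class, little-endian,
# version 1, padding, then e_type/e_machine/e_version and e_entry.
hdr = (b"\x7fELF" + bytes([2, 1, 1]) + b"\x00" * 9
       + struct.pack("<HHI", 2, 0x3E, 1)      # ET_EXEC, EM_X86_64, EV_CURRENT
       + struct.pack("<Q", 0x401000))         # e_entry
print(hex(elf_entry_point(hdr)))
```

Reading this field is all an analyst does; what the comment above describes is a compiler writing a different value there and stashing a payload at the old address. Either way, no halting-problem reasoning is involved: the offset is in the spec.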
In the wise words of capitalists: show me the money. No one is going to develop software to prove a point for an argument in an internet forum. You put up the money to commission a project at the going rate and I'll show you the code.
I'm not wasting money trying to prove an unprovable point.
It's very easy to play "specialist" and come up with theoretical scenarios, like the idiots who think it's possible to attack git using SHA-1 collisions.
In the purely theoretical sense, RSA is also broken, since you "only" need to gather a lot of computers to factor a key.
It's also very easy to make an empty one-liner, especially when borrowing from some authority to make it appear important.
If you are not willing to waste money on proving a point, why would you expect me to waste substantial effort to write code to prove my point to you?
And if you are not willing to put money behind your statement, your one-liner talking point is exactly what it says, "talk is cheap."
I at least put in the effort to build a detailed case rebutting the previous poster's point and showed how it can be done. If you think my point was wrong, build a detailed case to rebut it. Then we can have a meaningful discussion; otherwise, it's just cheap, empty talk.
BTW, what I talked about was not theoretical. That's how viruses are written. You don't have to believe me, but again it's not my job to convince everyone.
It makes secure use of git a pain in the ass. You can't even fetch objects from a source that isn't fully trusted, because they could override objects from a trusted repo.
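For context: git addresses every object by a SHA-1 over a short type/size header plus the raw content, which is why a colliding object from an untrusted remote is the worry here; an object with the same id as one you already trust would be indistinguishable by name. A sketch of the blob case:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob: SHA-1 over the
    header 'blob <size>\\0' followed by the raw content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Cross-check against git itself:
#   $ echo 'hello' | git hash-object --stdin
print(git_blob_id(b"hello\n"))
```

In practice git keeps the first copy of an object it already has, which limits (but does not eliminate) what a colliding object fetched later can do.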
I'm pretty sure you're wrong. Sometimes an argument is stronger motivation than money. Also sometimes better than money: knowledge, friendship, one-upping random comments, passion, aspiration, etc. Linus wrote Linux basically for reasons you say wouldn't motivate anyone to write code.
Look. I wasn't making a universal statement. My reply was specifically aimed at your GP, whose smartass statement appealing to authority added nothing to the discussion. His statement embodies exactly what he's saying: "talk is cheap." And he wanted me to put in substantial effort to answer his one-liner? I wanted him to put some skin of his own in the game: put up the money to show that his statement is not just cheap talk.
... as long as you also trust the compiler not to introduce any backdoor... (cf. Reflections on Trusting Trust)