Talking with one of my professors, the subject of trust in computing came up. The professor related a story about a paper discussing a theoretical vulnerability in the entire computing infrastructure. (Unfortunately, I can’t find the paper. I think it was somebody’s PhD dissertation, but I’m not positive.) The issue, in essence is that compilers are themselves compiled.
You see, all compilers in use today are written in high level languages themselves, and as such must have been themselves compiled. This paper theorized (or possibly proved, again I’m a bit hazy on the full details) that all modern compilers can trace their genealogies back to one single compiler hand coded in machine language by some poor soul. Most likely on a punch card.
Suppose now that this individual then inserted some malicious code into his compiler. The effect of this could be to insert a back door of some kind in everything it compiled. Additionally, suppose it detected when it was compiling a new compiler and modified this new compiler to do much the same thing. Finally, to make it’s modifications undetectable, all memory and disk reads were modified to make it impossible to see the back door.
Such an attack would be undetectable, completely and utterly. Of course, it would also complicate this original compiler so much that peer review would immediately catch that something strange was going on. Unless they were all in on it… (-;
Additionally, after talking to this professor about it, I tried to figure out how, exactly, I would do this, and I’m stumped. To insert a back door in all compiler output would be simple enough, however to detect that a new compiler was being built, and then modify it appropriately… I don’t that this is possible. In a fairly abstract sense, all programs are compilers, that is they take some input in one language and translate it to another language. True, there are certain ways of writing compilers that are used fairly often (e.g. finite state machines to extract lexemes, recursive descent parsing, etc.) however there’s no reason one must build a compiler in these ways. And then there’s the whole question of how to modify the code to be just as malicious as it’s predecessor. Essentially, this first generation compiler would have to understand how it’s input would function, and such semantic analysis would be impossible, I think.
So, while it’s a fairly nifty idea, the notion of a fundamentally untrustworthy computer infrastructure, due to a single bad compiler, is quite silly.
You’re thinking of Ken Thompson’s classic paper Reflections on Trusting Trust.
The problem would be to adapt the original malign
code for compilers compiled on new (and different) hardware.
It’s a scary thought, but it’s only just that–a thought.
Would make for an interesting novel ;)
You’re right about the complexity of semantic analysis, this sounds impossible, especially considering there has been no real improvemnts in “semantic intelligence” since the 70′ies, where everybody thought the computer would someday soon outsmart us.
I must agree with James, that if you can solve the platform/hardware adaptation problem, this threat would be much more plausible.
Remember however, that past problems, which seemed impossible to solve, have been solved, like the Poincare Conjecture.
Btw, the paper: http://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf