While I don't know whether Watson actually does this -- the paper doesn't mention parallelism -- I suspect that exploiting structured parallelism is another handy benefit of running Prolog on as many machines as Watson was using.
There is so much ambiguity in language that parallelization is often possible at a very coarse-grained level. This usually doesn't require more cleverness than multiprocessing or multithreading.
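To make the coarse-grained idea concrete, here's a minimal sketch -- my own toy illustration, not anything from Watson or the paper. Each candidate interpretation is scored independently, so the work fans out across a thread pool with no coordination beyond collecting the results; `score_candidate` and its scoring formula are made up for the example.

```python
# Toy sketch of coarse-grained parallelism: score each candidate
# interpretation independently, in parallel, then keep the best.
from concurrent.futures import ThreadPoolExecutor

def score_candidate(candidate):
    # Stand-in for an expensive, independent scoring pass
    # (e.g., running a rule engine over one candidate parse).
    return candidate, sum(ord(c) for c in candidate) % 100

def best_candidate(candidates, workers=4):
    # Each task is independent, so a plain pool map suffices.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_candidate, candidates))
    return max(scored, key=lambda pair: pair[1])
```

The point is only that when candidates don't interact, the parallel structure is embarrassingly simple -- swap in processes for threads if the scoring is CPU-bound.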
Given that they switched from their own pattern matching language to an optimized WAM, I suspect they use one of the common Prolog implementations.
The Wired article (linked within under "can control the universe") seemed to suggest that he solved the problem by "channeling Jesus" -- which misrepresents him: he was talking about his childhood training, not about how he actually solved the conjecture.
And the Pravda article it links to, although it does give the correct version of what he said about Jesus walking on water, has some major craziness of its own in its last paragraph: "According to the newspaper, both Russian and foreign special services are showing interest in Perelman's discoveries. The scientist has learned some super-knowledge which helps realize creation. Special services need to know whether Perelman and his knowledge may pose a threat to humanity. With his knowledge he can fold the Universe into a spot and then unfold it again. Will mankind survive after this fantastic process?" Oy.
"...couldn't really put his finger on what it was that he really wanted but couldn't just do."
As an amusing literal interpretation of your metaphor, IIRC Leonardo actually painted the shadows around the Mona Lisa's mouth using his finger, which enabled his style of sfumato shading. So I suppose he was putting his finger on it...
I speak for myself here, but I don't think something needs to be the "Future of the World" for it to be an immensely fulfilling way to lead one's life, degree or not.
The post above is deleted because I wasn't satisfied with it (the book sounds pretty great and I didn't feel like my comment had that much relevance). I'll bring it back since it got a reply:
The title is misleading; the book seems to be mostly about "information" in the colloquial sense, and only peripherally about information theory. For anyone seriously interested in the latter, David MacKay's book Information Theory, Inference, and Learning Algorithms is available online. It's great.
While I agree that it can be unstable (inference can get stuck in local maxima), latent variable models like LDA can be used to rigorously evaluate textual categories (e.g. journal articles). We take for granted that the categories we set are "useful", in some sense, so it's interesting to see that assumption questioned quantitatively.
(For example: http://lambda-the-ultimate.org/node/1867.)
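As a toy illustration of the local-maxima point -- not LDA itself, just a pure-Python stand-in, since simpler latent-variable fits like k-means share the same failure mode -- the usual hedge is to run inference from several random initializations and keep the best. All names and data here are my own invention:

```python
# 1-D 2-means as a stand-in for latent-variable inference: each run
# can converge to a different local optimum depending on its random
# initialization, so we restart several times and keep the best fit.
import random

def two_means(points, seed, iters=50):
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # random initialization
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            # Assign each point to its nearer center.
            groups[abs(p - centers[0]) > abs(p - centers[1])].append(p)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    cost = sum(min(abs(p - c) ** 2 for c in centers) for p in points)
    return cost, sorted(centers)

def best_fit(points, restarts=10):
    # Restarts: the standard mitigation for getting stuck locally.
    return min(two_means(points, seed) for seed in range(restarts))
```

The same restart-and-keep-best pattern is what people typically do with LDA's variational or Gibbs inference, just at much greater expense.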