Yes.

Given the string so far, ask how probable it is that the generator would produce the next character, p(x_n | x_{<n}). Running through the whole string and summing gives the log probability of the whole string: log p(x) = \sum_n log p(x_n | x_{<n}). Comparing these log probabilities under different models (one per language) gives you a language classifier: pick the model that assigns the string the highest log probability. For a first stab at the one-class problem, compare the string's log probability with what the model typically assigns to strings it generates itself. Do the comparison at the same length, or per character, because log probability falls as strings get longer.
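Here is a minimal Python sketch of that recipe. The comment doesn't say what the generator is, so this assumes a toy order-k character model: the last k characters stand in for x_{<n}, with add-alpha smoothing and a unigram fallback for unseen contexts. CharModel, classify, is_typical, the smoothing constant and the 5% cutoff are all hypothetical choices for illustration. The one-class check samples strings of the same length as the input and compares per-character log probabilities.

```python
import math
import random
from collections import Counter, defaultdict

PAD = "\x00"  # hypothetical start-of-string padding symbol; must not occur in the text


class CharModel:
    """Hypothetical stand-in for "the generator": an order-k character model.

    p(x_n | x_<n) is approximated using only the last k characters of x_<n
    (a Markov assumption), with add-alpha smoothing over a fixed alphabet and
    a unigram fallback for contexts never seen in training.
    """

    def __init__(self, text, alphabet, k=2, alpha=0.01):
        self.k, self.alpha = k, alpha
        self.alphabet = sorted(alphabet)  # must contain every character of text
        self.unigram = Counter(text)
        self.counts = defaultdict(Counter)  # context -> counts of the next char
        padded = PAD * k + text
        for i in range(k, len(padded)):
            self.counts[padded[i - k:i]][padded[i]] += 1

    def prob(self, prefix, ch):
        """p(ch | prefix), conditioning on the last k characters of prefix."""
        padded = PAD * self.k + prefix
        ctx = padded[len(padded) - self.k:]
        c = self.counts.get(ctx) or self.unigram  # unseen context: unigram fallback
        total = sum(c.values())
        return (c[ch] + self.alpha) / (total + self.alpha * len(self.alphabet))

    def log_prob(self, x):
        """log p(x) = sum_n log p(x_n | x_<n)."""
        return sum(math.log(self.prob(x[:n], x[n])) for n in range(len(x)))

    def sample(self, length, rng):
        """Generate a string by drawing each x_n from p(. | x_<n)."""
        out = ""
        for _ in range(length):
            weights = [self.prob(out, ch) for ch in self.alphabet]
            out += rng.choices(self.alphabet, weights=weights)[0]
        return out


def classify(x, models):
    """Language classifier: the model giving x the highest log probability wins."""
    return max(models, key=lambda name: models[name].log_prob(x))


def is_typical(x, model, rng, n_samples=200, quantile=0.05):
    """One-class test: accept x unless its per-character log probability falls
    below the given quantile of same-length strings sampled from the model."""
    score = model.log_prob(x) / len(x)
    ref = sorted(model.log_prob(s) / len(s)
                 for s in (model.sample(len(x), rng) for _ in range(n_samples)))
    return score >= ref[int(quantile * n_samples)]


# Toy usage: two made-up "languages" over a shared alphabet.
english = "the cat sat on the mat and the dog ate the hat " * 5
other = "xyz zyx xzy yxz zxy yzx " * 5
alphabet = set(english + other)
models = {"A": CharModel(english, alphabet), "B": CharModel(other, alphabet)}

print(classify("the cat ate the mat", models))  # A
print(classify("xyz zyx", models))  # B
print(is_typical("the cat sat on the mat", models["A"], random.Random(0)))  # True
print(is_typical("z" * 22, models["A"], random.Random(0)))  # False
```

Swapping in a better model only changes prob; the chain-rule scoring, the argmax classifier and the sampling-based threshold stay the same.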

For more on information theory, modelling and inference you might like: http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
