
Looks great. Can you tell us how you built it? I'm most interested in automatic categorization of submitted articles.


Hey, thanks! It's a bit of deep learning magic combined with a few days of manual labor tagging the training set :) The classifier itself is an LSTM and runs on TensorFlow. As said, this is a proof of concept so we'll try to improve it over time.


I have a similar classification project. I used word2vec to build embeddings from 1GB+ of text, and then I just compute vector similarity between the article and each topic.

The vector of an article can be obtained by summing the vectors of its words (minus stop words). For a topic, you just sum the vectors of 5-10 topic keywords. You don't need to list all the topic keywords exhaustively, because word2vec tends to place related words close together in the vector space.

This system has the advantage that you don't need a training dataset. It's unsupervised learning coupled with a small amount of supervised topic pointers.
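
Roughly, the idea looks like the sketch below. The tiny 3-dimensional vectors, the stop-word list, and the topic keywords are all made up for illustration (real word2vec vectors would come from a trained model), and cosine similarity is just one reasonable assumption for the "vector similarity" step.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# Real word2vec vectors are learned from a large corpus and are much wider.
EMBEDDINGS = {
    "python": [0.9, 0.1, 0.0],
    "code": [0.8, 0.2, 0.1],
    "compiler": [0.7, 0.3, 0.0],
    "election": [0.0, 0.9, 0.2],
    "senate": [0.1, 0.8, 0.1],
    "vote": [0.1, 0.9, 0.0],
}

# Hypothetical, very short stop-word list.
STOP_WORDS = {"the", "a", "in", "of", "and", "is", "to"}

# A handful of keywords per topic (hypothetical topics).
TOPICS = {
    "programming": ["python", "code", "compiler"],
    "politics": ["election", "senate", "vote"],
}


def sum_vectors(words):
    """Sum the vectors of all known, non-stop words."""
    total = [0.0, 0.0, 0.0]
    for word in words:
        word = word.lower()
        if word in STOP_WORDS or word not in EMBEDDINGS:
            continue
        total = [t + x for t, x in zip(total, EMBEDDINGS[word])]
    return total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text):
    """Return the best-matching topic and the score for every topic."""
    article_vec = sum_vectors(text.split())
    scores = {
        topic: cosine(article_vec, sum_vectors(keywords))
        for topic, keywords in TOPICS.items()
    }
    return max(scores, key=scores.get), scores


best, scores = classify("The senate is to vote in the election")
print(best, scores)
```

With a real model you would swap the `EMBEDDINGS` dict for a lookup into the trained vectors; the rest stays about the same.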



