
Doesn't this article seem to say that the size of the training set determines the size of the resulting network? The network's size should be proportional to the number of nodes/layers it is configured with, not to the number of training instances. Am I missing something?


The network is sized to be able to learn the training data reasonably well (e.g. via hyper-parameter optimization). If the data contain a lot of variation that never shows up in the real application (like the rotated letters mentioned in the article), an appropriately sized network will still learn it, but it would be overkill for the application at hand.
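To make that sizing step concrete, here is a minimal sketch (my own illustration, not from the article) of one simple form of hyper-parameter search: try increasingly wide hidden layers and keep the smallest one that fits the training data to a target accuracy. The dataset, layer widths, and accuracy threshold are all hypothetical.

    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    # Toy training set; in practice this is the real training data.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Assumed threshold for "learns the training data reasonably well".
    TARGET_TRAIN_ACC = 0.95

    # Hypothetical search over hidden-layer widths, smallest first.
    for width in (4, 8, 16, 32, 64, 128):
        net = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000,
                            random_state=0)
        net.fit(X, y)
        if net.score(X, y) >= TARGET_TRAIN_ACC:
            print(f"smallest adequate hidden width: {width}")
            break

The point of the sketch: if the training set includes variation the deployed application never sees (e.g. rotated letters), this search will settle on a wider network than deployment actually requires, because the network is sized to the training data, not to the application.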



