Hacker News

Newton's method?


I don’t know much about machine learning, but I’ve heard that folks in some fields prefer a gradient descent algorithm, even though it doesn’t have nice convergence speed, because the Hessian (or even a Hessian approximation) would be too huge or too difficult to calculate.


There are always tricks to keep the problem size small for quasi-Newton methods (such as L-BFGS). In ML, stochastic gradient descent (a slightly misleading name for doing gradient descent on batches of the data at a time) seems to have a bit of magic in its ability to go uphill and escape local minima, as well as letting the problem scale to much larger sets of data.
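To make the "batches of the data at a time" point concrete, here is a minimal sketch of mini-batch SGD on a least-squares problem; the problem setup and all names (`X`, `y`, `w`, `lr`, `batch_size`) are invented for illustration:

```python
# Hypothetical illustration of mini-batch SGD: each step uses the gradient
# computed on a small random batch, not the full dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # 1000 samples, 5 features
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.05, 32
for epoch in range(50):
    idx = rng.permutation(len(X))         # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        # gradient of the squared error on this batch only
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad

print(np.round(w, 1))  # should end up close to true_w
```

The batch gradient is a noisy estimate of the full gradient, which is both what lets it scale (you never touch all the data at once) and what provides the "uphill" jitter that can shake the iterate out of shallow local minima.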


Newton's method finds zeros of functions, which is different from optimisation (finding maxima or minima). If you use Newton's method on the derivative of the function you're trying to optimise, it will find critical points, which may be the optima you're looking for (or maxima, saddle points, etc.).
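A tiny sketch of that distinction, using a made-up example function: running Newton's root-finder on f'(x) for f(x) = x³ − 3x converges to whichever critical point is nearest, minimum or maximum alike:

```python
# Newton's method finds a zero of g; applied to g = f', it finds a
# critical point of f, with no preference for minima over maxima.
def newton(g, dg, x, steps=20):
    for _ in range(steps):
        x = x - g(x) / dg(x)
    return x

fprime  = lambda x: 3 * x**2 - 3    # f'(x) for f(x) = x**3 - 3*x
fsecond = lambda x: 6 * x           # f''(x)

print(newton(fprime, fsecond, x=0.5))   # converges to the minimum at x = 1
print(newton(fprime, fsecond, x=-0.5))  # converges to the MAXIMUM at x = -1
```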


This is why, in practice, you do something like Levenberg-Marquardt, which combines gradient descent with a quadratic (Gauss-Newton) step via a damping parameter that is varied depending on whether the cost function is decreasing or not. And of course it is iterative, so a single bad step doesn't destroy everything. The advantage is faster convergence in practice.
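A hedged sketch of that damping idea for nonlinear least squares: solve (JᵀJ + λI) dx = −Jᵀr, shrink λ when the cost drops (trusting the quadratic model more) and grow it when the cost rises (retreating toward small gradient-descent steps). The test problem and all names here are invented for illustration, not taken from any particular library:

```python
# Minimal Levenberg-Marquardt sketch (not production code).
import numpy as np

def lm(residual, jacobian, x, iters=50, lam=1e-3):
    cost = 0.5 * np.sum(residual(x) ** 2)
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        dx = np.linalg.solve(J.T @ J + lam * np.eye(len(x)), -J.T @ r)
        new_cost = 0.5 * np.sum(residual(x + dx) ** 2)
        if new_cost < cost:      # cost decreased: accept, act more Gauss-Newton-like
            x, cost, lam = x + dx, new_cost, lam * 0.5
        else:                    # cost increased: reject, act more gradient-descent-like
            lam *= 10.0
    return x

# Toy problem: fit y = a * exp(b * t) with true (a, b) = (2, -1).
t = np.linspace(0.0, 2.0, 20)
y = 2.0 * np.exp(-1.0 * t)
residual = lambda p: p[0] * np.exp(p[1] * t) - y
jacobian = lambda p: np.stack([np.exp(p[1] * t),
                               p[0] * t * np.exp(p[1] * t)], axis=1)
print(lm(residual, jacobian, np.array([1.0, 0.0])))  # approaches (2, -1)
```

Large λ makes each step short and gradient-like; small λ recovers the fast quadratic convergence near the solution, which is the trade-off described above.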


Do you know of any good literature on LM or trust region? I'm interested in the details.


I strongly recommend reading the documentation for Ceres Solver [1]. It is intuitive and in-depth, and references everything you will need.

1. http://ceres-solver.org/nnls_solving.html#trust-region-metho...


Newton's method requires calculating the second derivative (the full Hessian, in the multivariate case), which is quite a bit more expensive than SGD, especially with many parameters.


By Newton’s method, do you mean simple gradient descent?


no.



