Hacker News

Looks neat, but I'm not sure how useful this will be in practice. If your problem domain is linear, Eureqa won't offer any advantages over linear regression. If your problem domain is non-linear, Eureqa won't be able to answer the question of where the heck the sine and cosine terms are coming from, and might induce people to make up fudgy theories to explain their origin. I'm reminded of Arthur Eddington, who explained why the reciprocal of the fine-structure constant "must" be 136, then, when better measurements were taken, explained why it "must" be 137:

http://en.wikipedia.org/wiki/A._S._Eddington#Fundamental_the...

I could see Eureqa being useful 100 years ago, when everyone was scratching their heads over the blackbody radiation data and nobody realized there was a big fat e^(hν/kT) term in the denominator keeping us all from giving off infinite quantities of gamma rays. (Thanks, Planck.)

Anyway, not to be a hater, but in disciplines where statistical significance is valued, Eureqa is useless, because by cherry-picking models it completely invalidates significance tests.
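The cherry-picking point is easy to demonstrate with a toy simulation (hypothetical, nothing to do with Eureqa's actual internals): screen enough candidate models against pure noise and the best one will look "significant" even though there is nothing to find.

```python
# Toy illustration of the cherry-picking problem: test many candidate
# predictors against pure-noise data, then report only the best one.
# Its nominal p-value looks "significant" even though every null
# hypothesis here is true by construction.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
y = rng.normal(size=50)                  # pure noise, nothing to find
candidates = rng.normal(size=(300, 50))  # 300 junk "models"

p_values = [pearsonr(x, y)[1] for x in candidates]
best_p = min(p_values)
# best_p is almost certainly below 0.05 by chance alone, so a naive
# significance test on the selected model would be misleading.
print(f"best nominal p-value out of 300: {best_p:.4f}")
```

The nominal p-value of the winning model says nothing unless you correct for the hundreds of models that were silently tried and discarded.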



In some disciplines I could see it being useful. (Astronomy comes to mind, but only because I happen to be an astronomer.) Oftentimes we need an empirical fit to some data and we don't really care why exactly the fit has the form it does. For instance, you might want to know what the density of a galaxy cluster is as a function of radius. Perhaps you just have an obsession with density profiles, but more likely you need to know what the density profile is for some other purpose (maybe you're looking at the evolution of radio jets in the cluster). In this case you don't really care if your density profile has the correct theoretical functional form that a density profile should have; you just care that the empirical fit you use is a close match to the data.
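As a sketch of that kind of purely empirical fitting, here is a beta-model profile fit with scipy's curve_fit; the functional form and the synthetic data are assumptions chosen just for illustration, not real measurements.

```python
# Sketch: empirical fit of a cluster density profile.  The beta-model
# form and the synthetic data are assumptions for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def beta_model(r, rho0, rc, beta):
    # rho(r) = rho0 * (1 + (r/rc)^2) ** (-3*beta/2)
    return rho0 * (1.0 + (r / rc) ** 2) ** (-1.5 * beta)

r = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])  # radius, arbitrary units
rho = beta_model(r, 10.0, 1.0, 0.7)           # synthetic "measurements"

# Fit from a deliberately wrong starting guess; the optimizer
# recovers the parameters that generated the data.
popt, _ = curve_fit(beta_model, r, rho, p0=[8.0, 2.0, 0.5])
print(popt)  # roughly [10.0, 1.0, 0.7]
```

Whether beta_model is the "right" theoretical form doesn't matter for this use; only the quality of the fit does.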


Who's to say what humans do is significantly different from what Eureqa does? We constantly attack our models with new data, effectively using data to generate and test new models the same way Eureqa does. It's certainly a simplified process of equation generation, and model comparison is still a largely open field of statistics, but the rudimentary procedure isn't especially strange.


Eureqa doesn't help in this regard. As the OP said, you will eventually find a giant model that is a brilliant fit for the data. What you need is some notion of regularization. To answer your comment, it doesn't deal with overfitting.


Except that it does. I'm not too familiar with the application itself, but I know the original research worked by finding Pareto optima that trade off predictive power against the simplicity of the equation.

So for the double pendulum it came back with something like 8 equations, one of which was conservation of momentum (not actually accurate, but as close as you can get with that number of terms), and one of which was conservation of energy (actually accurate, but a bit more complicated).
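A minimal sketch of that accuracy-vs-simplicity front (the candidate equations and their scores are made up for illustration, not Eureqa output):

```python
# Sketch of the Pareto-front idea: keep every candidate model that no
# other candidate beats on BOTH fit error and complexity.  The
# candidates and their scores below are invented for illustration.
def pareto_front(candidates):
    """candidates: list of (name, error, complexity); lower is better."""
    front = []
    for name, err, comp in candidates:
        dominated = any(
            e <= err and c <= comp and (e < err or c < comp)
            for _, e, c in candidates
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("x",            0.90, 1),
    ("a*x + b",      0.40, 3),
    ("a*x^2 + b*x",  0.40, 5),   # same error as a*x + b, more complex
    ("a*sin(x) + b", 0.15, 4),
    ("9-term poly",  0.14, 9),   # slightly better fit, far more complex
]
print(pareto_front(candidates))
# -> ['x', 'a*x + b', 'a*sin(x) + b', '9-term poly']
```

Note the dominated quadratic is dropped, but the complicated 9-term model survives because nothing beats its error; the tool reports the whole front rather than one winner, which is why the double pendulum run returned several equations at different complexity levels.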


I'm pretty sure that it does have regularization. The optimization process penalizes model complexity in addition to maximizing the fit to the data.


But isn't the purpose of cross-validation to avoid over-fitting and cherry-picking? Asked another way: how does your argument specifically single out Eureqa, rather than all of machine learning?
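The held-out-data idea behind that question can be made concrete with a toy sketch (made-up data, numpy's polyfit standing in for any model-fitting procedure):

```python
# Toy held-out-data check: fit a simple and a flexible model on the
# training points, then score both on points the fit never saw.
# The data are made up for illustration.
import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])  # roughly y = 2x
x_test = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
y_test = 2.0 * x_test

line = np.polyfit(x_train, y_train, deg=1)    # 2 parameters
wiggle = np.polyfit(x_train, y_train, deg=5)  # interpolates all 6 points

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# The degree-5 fit drives training error to ~zero, but it typically
# does worse on the held-out points; the straight line generalizes.
print("train:", mse(line, x_train, y_train), mse(wiggle, x_train, y_train))
print("test: ", mse(line, x_test, y_test), mse(wiggle, x_test, y_test))
```

Held-out error is what exposes the over-fitted model; training error alone would always pick the degree-5 polynomial.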


Eureqa does model selection as part of the optimization, so that would be an advantage over vanilla linear regression. Of course, there are a lot of specialized model selection techniques for linear regression that would probably be better. Using Eureqa for a linear model doesn't really make sense anyway. I think it would be mainly useful for exploratory data analysis.



