Hacker News

Looks neat, but I'm not sure how useful this will be in practice. If your problem domain is linear, Eureqa won't offer any advantages over linear regression. If your problem domain is non-linear, Eureqa won't be able to answer the question of where the heck the sine and cosine terms are coming from, and might induce people to make up fudgy theories to explain their origin. I'm reminded of Arthur Eddington, who explained why the reciprocal of the fine-structure constant "must" be 136, then, when better measurements were taken, explained why it "must" be 137:

http://en.wikipedia.org/wiki/A._S._Eddington#Fundamental_the...

I could see Eureqa being useful 100 years ago, when everyone was scratching their heads over the blackbody radiation data and nobody realized there was a big fat e^(hν/kT) term in the denominator keeping us all from giving off infinite quantities of gamma rays. (Thanks, Planck.)

Anyway, not to be a hater, but in disciplines where statistical significance is valued, Eureqa is useless, because by cherry-picking models it completely invalidates significance tests.
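The cherry-picking point is easy to demonstrate with a toy simulation (hypothetical, nothing to do with Eureqa's actual internals): screen enough candidate models against pure noise and the best one will look "significant" even though there is nothing to find.

```python
# Toy illustration of the cherry-picking problem: test many candidate
# predictors against pure-noise data, then report only the best one.
# Its nominal p-value looks "significant" even though every null
# hypothesis here is true by construction.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
y = rng.normal(size=50)                  # pure noise, nothing to find
candidates = rng.normal(size=(300, 50))  # 300 junk "models"

p_values = [pearsonr(x, y)[1] for x in candidates]
best_p = min(p_values)
# best_p is almost certainly below 0.05 by chance alone, so a naive
# significance test on the selected model would be misleading.
print(f"best nominal p-value out of 300: {best_p:.4f}")
```

The nominal p-value of the winning model says nothing unless you correct for the hundreds of models that were silently tried and discarded.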



In some disciplines I could see it being useful. (Astronomy comes to mind, but only because I happen to be an astronomer.) Oftentimes we need an empirical fit to some data and we don't really care why exactly the fit has the form it does. For instance, you might want to know what the density of a galaxy cluster is as a function of radius. Perhaps you just have an obsession with density profiles, but more likely you need to know what the density profile is for some other purpose (maybe you're looking at the evolution of radio jets in the cluster). In this case you don't really care if your density profile has the correct theoretical functional form that a density profile should have; you just care that the empirical fit you use is a close match to the data.
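As a sketch of that kind of purely empirical fitting, here is a beta-model profile fit with scipy's curve_fit; the functional form and the synthetic data are assumptions chosen just for illustration, not real measurements.

```python
# Sketch: empirical fit of a cluster density profile.  The beta-model
# form and the synthetic data are assumptions for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def beta_model(r, rho0, rc, beta):
    # rho(r) = rho0 * (1 + (r/rc)^2) ** (-3*beta/2)
    return rho0 * (1.0 + (r / rc) ** 2) ** (-1.5 * beta)

r = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])  # radius, arbitrary units
rho = beta_model(r, 10.0, 1.0, 0.7)           # synthetic "measurements"

# Fit from a deliberately wrong starting guess; the optimizer
# recovers the parameters that generated the data.
popt, _ = curve_fit(beta_model, r, rho, p0=[8.0, 2.0, 0.5])
print(popt)  # roughly [10.0, 1.0, 0.7]
```

Whether beta_model is the "right" theoretical form doesn't matter for this use; only the quality of the fit does.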


Who's to say what humans do is significantly different from what Eureqa does? We constantly attack our models with new data, effectively using data to generate and test new models the same way Eureqa does. It's certainly a simplified process of equation generation, and model comparison is still a largely open field of statistics, but the rudimentary procedure isn't especially strange.


Eureqa doesn't help in this regard. As the OP said, you will eventually find a giant model that is a brilliant fit for the data. What you need is some notion of regularization. To answer your comment, it doesn't deal with overfitting.


Except that it does. I'm not too familiar with the application itself, but I know the original research worked by finding Pareto optima that trade off predictive power against the simplicity of the equation.

So for the double pendulum it came back with something like 8 equations, one of which was conservation of momentum (not actually accurate, but as close as you can get with that number of terms), and one of which was conservation of energy (actually accurate, but a bit more complicated).
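A minimal sketch of that accuracy-vs-simplicity front (the candidate equations and their scores are made up for illustration, not Eureqa output):

```python
# Sketch of the Pareto-front idea: keep every candidate model that no
# other candidate beats on BOTH fit error and complexity.  The
# candidates and their scores below are invented for illustration.
def pareto_front(candidates):
    """candidates: list of (name, error, complexity); lower is better."""
    front = []
    for name, err, comp in candidates:
        dominated = any(
            e <= err and c <= comp and (e < err or c < comp)
            for _, e, c in candidates
        )
        if not dominated:
            front.append(name)
    return front

candidates = [
    ("x",            0.90, 1),
    ("a*x + b",      0.40, 3),
    ("a*x^2 + b*x",  0.40, 5),   # same error as a*x + b, more complex
    ("a*sin(x) + b", 0.15, 4),
    ("9-term poly",  0.14, 9),   # slightly better fit, far more complex
]
print(pareto_front(candidates))
# -> ['x', 'a*x + b', 'a*sin(x) + b', '9-term poly']
```

Note the dominated quadratic is dropped, but the complicated 9-term model survives because nothing beats its error; the tool reports the whole front rather than one winner, which is why the double pendulum run returned several equations at different complexity levels.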


I'm pretty sure that it does have regularization. The optimization process penalizes model complexity in addition to maximizing the fit to the data.


But isn't the purpose of cross-validation to avoid over-fitting and cherry-picking? Asked another way: how does your argument specifically single out Eureqa, rather than all of machine learning?
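The held-out-data idea behind that question can be made concrete with a toy sketch (made-up data, numpy's polyfit standing in for any model-fitting procedure):

```python
# Toy held-out-data check: fit a simple and a flexible model on the
# training points, then score both on points the fit never saw.
# The data are made up for illustration.
import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])  # roughly y = 2x
x_test = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
y_test = 2.0 * x_test

line = np.polyfit(x_train, y_train, deg=1)    # 2 parameters
wiggle = np.polyfit(x_train, y_train, deg=5)  # interpolates all 6 points

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# The degree-5 fit drives training error to ~zero, but it typically
# does worse on the held-out points; the straight line generalizes.
print("train:", mse(line, x_train, y_train), mse(wiggle, x_train, y_train))
print("test: ", mse(line, x_test, y_test), mse(wiggle, x_test, y_test))
```

Held-out error is what exposes the over-fitted model; training error alone would always pick the degree-5 polynomial.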


Eureqa does model selection as part of the optimization, so that would be an advantage over vanilla linear regression. Of course, there are a lot of specialized model selection techniques for linear regression that would probably be better. Using Eureqa for a linear model doesn't really make sense anyway. I think it would be mainly useful for exploratory data analysis.



