A professor of mine stated it very well. If you can imagine that there is a true model somewhere out in an infinitely large model space, then ML is just the search for that model.
In order to make it tractable, you pick a finite model space, train it on finite data, and use a finite algorithm to find the best choice inside of that space. That means you can fail in three ways: you can over-constrain your model space so that the true model cannot be found, you can underpower your search so that you have less ability to discern the best model within your chosen model space, and you can terminate your search early and fail to reach that best model entirely.
Almost all error in ML can be seen nicely in this framework. In particular here, those who do not remember to optimize validation accuracy are often making their model space so large (overfitting) that they have too little data to power the search within it.
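To make those three failure modes concrete, here's a quick numpy sketch (the toy "true model," the degrees, and the sample sizes are all my own invented example, not anything from the comment above). It fits polynomials of different degrees to noisy samples of a cubic: a space that's too small can never contain the truth, and a space that's too large can't be searched well on little data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true model": a cubic polynomial.
def true_f(x):
    return 1.0 - 2.0 * x + 0.5 * x**3

def fit_and_test(degree, n_train):
    """Fit a degree-`degree` polynomial to n_train noisy samples,
    return mean squared error against the true model on a test grid."""
    x_tr = rng.uniform(-2, 2, n_train)
    y_tr = true_f(x_tr) + rng.normal(0, 0.3, n_train)
    coefs = np.polyfit(x_tr, y_tr, degree)
    x_te = np.linspace(-2, 2, 200)
    return np.mean((np.polyval(coefs, x_te) - true_f(x_te)) ** 2)

err_constrained  = fit_and_test(degree=1,  n_train=1000)  # space excludes the truth
err_right        = fit_and_test(degree=3,  n_train=1000)  # truth in space, ample data
err_underpowered = fit_and_test(degree=12, n_train=15)    # big space, too little data

# No amount of data rescues the over-constrained space, and the
# big space loses on small data despite containing the truth.
assert err_right < err_constrained
assert err_right < err_underpowered
```

The third failure (terminating the search early) would show up the same way if you swapped the closed-form `polyfit` for an iterative optimizer and stopped it after too few steps.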
> In order to make it tractable, you pick a finite model space, train it on finite data, and use a finite algorithm to find the best choice inside of that space. That means you can fail in three ways: you can over-constrain your model space so that the true model cannot be found, you can underpower your search so that you have less ability to discern the best model within your chosen model space, and you can terminate your search early and fail to reach that best model entirely.
It seems like you can "mis-power" your model also.
For example, the Ptolemaic system could approximate the movement of the planets to any degree if you added enough "wheels within wheels" but since these were "the wrong wheels", the necessary wheels grew without bounds to achieve reasonable approximation over time.
Also, to add: when DGL bring this kind of mental model up, they do so to motivate a kind of semi-parametric modeling where the design space changes progressively, moving closer to the true model without growing so quickly as to make inference unstable. The problem, of course, is that this blows your algorithm's runtime out to something cubic, I think, and so you have a beautiful model that loses out on search error.
Totally agree, though I'd maybe call that "picking the wrong shape" for your model space. You can pay a whole lot, but if your space cannot admit a shape that gets close to the truth, then you're spending your data in vain.
> For example, the Ptolemaic system could approximate the movement of the planets to any degree if you added enough "wheels within wheels" but since these were "the wrong wheels", the necessary wheels grew without bounds to achieve reasonable approximation over time.
That would be an example of over-constraining your model (i.e. imposing the arbitrary constraint of a stationary Earth).
I don't think this is a useful way to phrase the situation.
A system of Ptolemaic circles can approximate the paths taken by any system. So the system really isn't absolutely constrained to follow or not follow any given path.
You could claim you have constrained your model not to be some other, better model, but that, again, seems like a poor way to phrase things, since a more accurate model is also constrained not to be a poor model.
More specifically, the Newtonian/Keplerian system has the constraint of the sun being stationary as much as the Ptolemaic system has the constraint of the earth being stationary.
Edit: As Eru points out, the Ptolemaic system basically uses a Fourier series to represent paths. Thus the approximation is actually completely unconstrained in the space of paths; that is, it can approximate anything. But by that token, the fact that it can approximate a given path explains nothing, and the choices that are simple in this system are not necessarily the best choices for the given case, estimating planetary motion.
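The epicycles-are-Fourier-terms point is easy to demonstrate in a few lines of numpy (the sample path and variable names are my own toy example): each Fourier coefficient is one circle, with the radius as the magnitude and the frequency as how many times it turns per period. A simple offset ellipse is exactly three circles.

```python
import numpy as np

N = 256
t = np.arange(N) / N

# Toy "true" path: an ellipse offset from the origin, as a complex signal.
z = 0.3 + 1.0 * np.cos(2 * np.pi * t) + 0.6j * np.sin(2 * np.pi * t)

# FFT coefficients ARE the epicycles: frequency k is a circle of radius
# |c[k]| traversed k times per period; c[0] is the fixed center (deferent).
c = np.fft.fft(z) / N

def epicycle_approx(n_terms):
    """Rebuild the path keeping only the n_terms largest circles."""
    keep = np.argsort(-np.abs(c))[:n_terms]
    mask = np.zeros(N, dtype=bool)
    mask[keep] = True
    return np.fft.ifft(np.where(mask, c * N, 0))

# This path is exactly a center plus two counter-rotating circles,
# so three terms reconstruct it to machine precision.
err3 = np.max(np.abs(epicycle_approx(3) - z))
assert err3 < 1e-9
```

Any sampled path whatsoever can be rebuilt this way given enough circles, which is exactly why reproducing the data with epicycles explained nothing; a genuinely eccentric Kepler orbit just needs ever more terms for a given accuracy.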
That's a good point, but after re-reading tel's original comment, I think my statement is still correct. Notice that tel's statement was that "you can over-constrain your model space so that the true model cannot be found". This doesn't necessarily mean constraining your model so that the true model is excluded from your parameter space. If your constraints technically encompass the true solution but only admit an overly complex parametrization of it, then they will still reduce (perhaps drastically) your power to find the true model. In this case, "overly complex" means unnecessarily many nonzero (or not-almost-zero) coefficients in the Fourier series.
My argument is that there are two kinds of situations:
* The model could encompass the behavior of the input in a smooth fashion if its basic parameters are relaxed.
* The model would tend to start finding models that are wildly different from the main model at the edges (in space and time) if its parameters are relaxed, even if it would eventually find the real model with enough input and training.
One has to handle these two conditions differently, right?
> (i.e. imposing the arbitrary constraint of a stationary Earth).
It's not really arbitrary: given the understanding at the time, there was no way to measure the motion of the earth. In particular, stellar parallax, whose absence was understood as a contra-indication, was still too small to measure. So a non-stationary Earth went rather strongly against what they knew at the time.
That said, relativity comes back and makes choosing a frame of reference arbitrary in the end, though some are easier to do physics in than others.
I wouldn't use the term "error" because some people might take that to mean there was a way to avoid these problems.
Over-constraining your model space means having too few parameters in your model. But for a fixed data size, the "power" of your search goes down as you increase the number of parameters.
So it is not so much an issue of avoiding errors as of choosing the right number of parameters for your model.
I called them errors because that's usually the technical term for them, but the real point, as always, is tradeoffs. The less "finite" an ML setup you want to buy, the less "error".
> In order to make it tractable, you pick a finite model space, train it on finite data, and use a finite algorithm to find the best choice inside of that space. That means you can fail in three ways: you can over-constrain your model space so that the true model cannot be found, you can underpower your search so that you have less ability to discern the best model within your chosen model space, and you can terminate your search early and fail to reach that best model entirely.
> Almost all error in ML can be seen nicely in this framework. In particular here, those who do not remember to optimize validation accuracy are often making their model space so large (overfitting) that they have too little data to power the search within it.
Devroye, Gyorfi, and Lugosi (http://www.amazon.com/Probabilistic-Recognition-Stochastic-M...) have a really great picture of this in their book.