Great explanation. I would like to add that held-out data is often used in Bayesian learning too, for example when you intentionally over-specify the model (adding more parameters than might be needed) because you don't know in advance what the best model is. Inference then continues as long as the likelihood on held-out data keeps increasing, and stops once it plateaus. One example is gesture recognition in the Kinect. If anyone finds this info useful, I also recommend the Coursera course on Probabilistic Graphical Models.
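To make that stopping rule concrete, here's a minimal sketch of the idea in Python. The `model.update` and `model.log_likelihood` methods are hypothetical placeholders for whatever inference step (an EM iteration, a variational update, a Gibbs sweep) and likelihood evaluation your model provides:

```python
import numpy as np

def fit_with_held_out_stopping(model, train_data, held_out_data,
                               max_iters=1000, tol=1e-6):
    """Run inference until held-out log-likelihood stops improving."""
    prev_ll = -np.inf
    for _ in range(max_iters):
        model.update(train_data)                  # one inference step on training data
        ll = model.log_likelihood(held_out_data)  # evaluate on held-out data
        if ll - prev_ll < tol:                    # held-out likelihood has plateaued
            break
        prev_ll = ll
    return model
```

Because the held-out set never drives the parameter updates themselves, a drop (or plateau) in its likelihood signals that the extra capacity of the over-specified model is starting to fit noise, which is exactly when you want to stop.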