If you look at how they actually "translate" the fancy model to the simple one, it requires fully fitting the original model (and keeping a record of how the gradients evolve over training). So it wouldn't make training more efficient, but it could be useful for inference, or for probing the characteristics of the original model.
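A minimal sketch of what that ordering implies, in NumPy. Everything here is illustrative, not the paper's actual construction: a tiny "fancy" model is fully trained while its per-step gradients are recorded, and only afterward is a simple surrogate fit to it (here a plain least-squares linear fit to the trained net's outputs, as a stand-in for the gradient-based translation). The point is that the gradient record, and thus the translation, only exists after training has already been paid for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two classes with a nonlinear (XOR-like) boundary.
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# "Fancy" model: a one-hidden-layer net trained by plain gradient descent.
W1 = rng.normal(scale=0.5, size=(2, 8))
W2 = rng.normal(scale=0.5, size=(8, 1))
grad_history = []  # the per-step gradient record the translation would need

def forward(X, W1, W2):
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))  # sigmoid output
    return p, h

for step in range(500):
    p, h = forward(X, W1, W2)
    err = p - y[:, None]                            # d(log-loss)/d(logit)
    gW2 = h.T @ err / len(X)
    gW1 = X.T @ ((err @ W2.T) * (1 - h**2)) / len(X)
    grad_history.append((gW1.copy(), gW2.copy()))   # track gradient evolution
    W1 -= 0.5 * gW1
    W2 -= 0.5 * gW2

# Only now, with training finished, can the "simple" model be built.
p_final, _ = forward(X, W1, W2)
Xb = np.hstack([X, np.ones((len(X), 1))])           # add a bias column
w_simple, *_ = np.linalg.lstsq(Xb, p_final, rcond=None)
surrogate_pred = Xb @ w_simple                       # cheap linear inference
```

The surrogate is then cheap to evaluate and to inspect, which is where the inference/probing benefit would come from; none of the training cost is avoided.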
This has always anecdotally appeared to be the case when investigating the predictions of neural nets, particularly when it comes time to answer the question “what does this model not handle?”