I think I should also clarify: I work in the training of encoder-decoder transformer models. Before the ChatGPT era I worked on encoder-only transformer models. I'm not unfamiliar with the literature and general discourse. I just do not use LLMs for programming.
The poster provided numbers and thresholds they used to evaluate the utility of a business product.
With infinite time anything is possible, but since we live within constraints, discussing practical, real-world thresholds or evaluation methods is a worthwhile use of our time.
You can't learn how to use _anything_ by experimenting for 4 hours a month.