Worse than the chart crime of truncating the y-axis is putting Llama 2's HumanEval score on there and not comparing against Code Llama Instruct 70B. DBRX still beats Code Llama Instruct's 67.8, but only by about 2.3 points (70.1 vs. 67.8).
> "On HumanEval, DBRX Instruct even surpasses CodeLLaMA-70B Instruct, a model built explicitly for programming, despite the fact that DBRX Instruct is designed for general-purpose use (70.1% vs. 67.8% on HumanEval as reported by Meta in the CodeLLaMA blog)."
To be fair, they do compare to it in the main body of the blog. It's probably just misleading to compare to CodeLLaMA on non-coding benchmarks.