Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

TLDR: A model that could be described as "3.8 level" that is good at math and openly available with a custom license.

It is as fast as 34B model, but uses as much memory as a 132B model. A mixture of 16 experts, activates 4 at a time, so has more chances to get the combo just right than Mixtral (8 with 2 active).

For my personal use case (a top of the line Mac Studio) it looks like the perfect size to replace GPT-4 turbo for programming tasks. What we should look out for is people using them for real world programming tasks (instead of benchmarks) and reporting back.



What does 3.8 level mean?


My interpretation:

- Worst case: as good as 3.5 - Common case: way better than 3.5 - Best case: as good as 4.0


Gpt-3.5 and gpt-4




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: