zaat · 2026-06-18T22:34:58 1781822098

sandeepkd is an idea thief. Also, he is a short guy but walks in high shoes so people will think he is tall.

zaat · 2026-06-18T22:29:08 1781821748

In my experience, you reach the point where what you do is the main question and payment is barely a minor concern well before that. You don't need to be in the top AI research tier for that.

zaat · 2026-06-18T14:25:46 1781792746

What wonderful times we live in.

Back in my day, Joe Blow wouldn't try anything as risky as a Twitter prompt; simply clicking an image link posted in a message on some random forum would scorch his pure soul with a goatse. You don't want to google it, but I'm pretty sure you can discuss it safely with ChatGPT.

zaat · 2026-06-17T20:48:20 1781729300

You can type in whatever sequence of keys you want, but Amazon pretty much created and shaped that Cloud thing, if you've ever heard of it. And that's just one of their side projects.

There are better ways to stand up for my man Elon.

zaat · 2026-06-15T17:51:39 1781545899

If you live in most cities in Italy, you take a huge hit to your ability to get places (in a reasonable timeframe, or at all) if you must do it by car.

zaat · 2026-06-15T17:47:57 1781545677

That's a strange comparison. Wheels are incredibly limited in the type of surfaces they can be used on.

overfeed · 2026-06-15T18:03:08 1781546588

> That's a strange comparison

It's not strange at all; I was responding to a specific, incorrect claim. I even quoted the wrong claim in my earlier comment, and I'll repeat it again, with added emphasis:

>>> humans are incredibly efficient, from an energy perspective, in anything we do, compared to machines

I simply provided contrary evidence to a well-defined, falsifiable claim. How is that strange?

zaat · 2026-06-15T18:11:47 1781547107

Yes, but walking and moving on wheels are apples and oranges. It would be a relevant comparison if a robot with a movement mechanism based on two feet were more efficient than a human.

antasvara · 2026-06-15T23:42:49 1781566969

The parent comment is quoted as:

> in one assignment I remember comparing the energy outputs between the human and robot equivalents of different tasks, whether or not the robot was humanoid in how it was designed

So I think the point in this context is relevant, even if it's apples to oranges.

zaat · 2026-06-16T18:14:26 1781633666

The point isn't that a humanoid robot walking is less efficient than a human walking; it's that moving on wheels is not the same thing as walking. For example, wheels are not just less efficient but barely usable for climbing rocks, going up stairs, and on many other surfaces, which makes the comparison irrelevant.

You could say that a robotic gun is much more efficient than a human at killing. That's another easy comparison of different tasks where robots win, but it totally misses the point.

jason_oster · 2026-06-15T20:39:55 1781555995

I’ll admit, at first, I thought the human vs machine comparison was about humanoid machines. But that’s too narrowly defined to be a useful comparison. Most machines in use today are not humanoid.

Then to boldly claim that humans are more efficient at anything compared to a machine just does not follow.

jimbokun · 2026-06-15T19:47:06 1781552826

You're not wrong.

But annoyingly pedantic.

MichaelDickens · 2026-06-15T22:12:03 1781561523

Doesn't seem pedantic to me. It's responding to the central thesis of the parent comment.

zaat · 2026-05-24T22:24:50 1779661490

How is that any different from the pre-LLM days, when Jim was using Stack Overflow to build the largest crypto exchange in the world? Where's Stack Overflow's accountability?

zaat · 2026-05-24T22:14:24 1779660864

At least for me, the answer is that despite the mistakes and the sheer annoyance the prose causes me, they are unbelievably useful. In the last two years I accomplished multiple major things that most probably wouldn't have been possible at all otherwise, and surely not within that timeframe.

zaat · 2026-05-24T22:05:52 1779660352

The idea is that by the time you have time and remember, the clothes might be smelly and wrinkled. The issue is with the genius product manager who decided the washing machine should have the most annoying beep possible, repeating every minute whether you like it or not, until turned off. Luckily, some manufacturers do employ better product managers.

zaat · 2026-04-02T17:41:12 1775151672

Thank you for your work.

You have an answer on your page regarding "Should I pick 26B-A4B or 31B?", but can you please clarify whether, assuming 24GB VRAM, I should pick a full-precision smaller model or a 4-bit larger model?

petu · 2026-04-02T20:06:51 1775160411

Try 26B first. 31B seems to have very heavy KV cache (maybe bugged in llama.cpp at the moment; 16K takes up 4.9GB).

edit: the 31B cache is not bugged; there's a static SWA cost of 3.6GB. So IQ4_XS at 15.2GB seems like a reasonable pairing, but even then it's barely enough for 64K with 24GB VRAM. Maybe 8-bit KV quantization is fine now that https://github.com/ggml-org/llama.cpp/pull/21038 got merged, so 100K+ is possible.

> I should pick a full precision smaller model or 4 bit larger model?

4 bit larger model. You have to use quant either way -- even if by full precision you mean 8 bit, it's gonna be 26GB + overhead + chat context.

Try UD-Q4_K_XL.
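
As a rough sanity check on the numbers above, here is a minimal Python sketch of the VRAM arithmetic. It assumes the 4.9GB measured at 16K already includes the 3.6GB static SWA cost and that the rest of the KV cache grows linearly with context. The function names are made up for illustration and are not part of llama.cpp.

```python
# Back-of-the-envelope VRAM estimate using only the figures quoted in this thread.
# Assumptions (not confirmed by llama.cpp itself):
#   - the 4.9 GB measured at 16K context includes the 3.6 GB static SWA cost
#   - the remaining KV cache grows linearly with the number of context tokens

STATIC_SWA_GB = 3.6          # static cost quoted above
MEASURED_GB = 4.9            # total cache quoted above at 16K context
MEASURED_TOKENS = 16 * 1024  # 16K


def kv_cache_gb(context_tokens: int) -> float:
    """Estimated KV cache size in GB for a given context length."""
    per_token_gb = (MEASURED_GB - STATIC_SWA_GB) / MEASURED_TOKENS
    return STATIC_SWA_GB + per_token_gb * context_tokens


def total_vram_gb(weights_gb: float, context_tokens: int) -> float:
    """Weights plus estimated KV cache; ignores compute buffers and other overhead."""
    return weights_gb + kv_cache_gb(context_tokens)


if __name__ == "__main__":
    # IQ4_XS weights are quoted above as 15.2 GB.
    for ctx_k in (16, 32, 64):
        total = total_vram_gb(15.2, ctx_k * 1024)
        print(f"{ctx_k}K context: ~{total:.1f} GB")
```

Under those assumptions the 64K case lands right at the 24GB mark before any other overhead, which matches the "barely enough" estimate.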

danielhanchen · 2026-04-02T20:12:31 1775160751

Yes UD-Q4_K_XL works well! :)

mixtureoftakes · 2026-04-02T20:25:01 1775161501

what is the main difference between "normal" quants and the UD ones?

car · 2026-04-02T20:58:11 1775163491

They explain it here:

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

For the best quality reply, I used the Gemma-4 31B UD-Q8_K_XL quant with Unsloth Studio to summarize the URL with web search. It produced 4.9 tok/s (including web search) on a MacBook Pro M1 Max with 64GB.

Here is an excerpt in its own words:

Unsloth Dynamic 2.0 Quantization

Dynamic 2.0 is not just a "bit-reduction" but an intelligent, per-layer optimization strategy.

- Selective Layer Quantization: Instead of making every layer 4-bit, Dynamic 2.0 analyzes every single layer and selectively adjusts the quantization type. Some critical layers may be kept at higher precision, while less critical layers are compressed more.

- Model-Specific Tailoring: The quantization scheme is custom-built for each model. For example, the layers selected for quantization in Gemma 3 are completely different from those in Llama 4.

- High-Quality Calibration: They use a hand-curated calibration dataset of >1.5M tokens specifically designed to enhance conversational chat performance, rather than just optimizing for Wikipedia-style text.

- Architecture Agnostic: While previous versions were mostly effective for MoE (Mixture of Experts) models, Dynamic 2.0 works for all architectures (both MoE and non-MoE).

danielhanchen · 2026-04-02T18:00:18 1775152818

Thank you!

I presume 26B is somewhat faster since it only has 4B activated. 31B is quite a large dense model, so more accurate!

ryandrake · 2026-04-02T19:44:50 1775159090

This is one of the more confusing aspects of experimenting with local models as a noob. Given my GPU, which model should I use, which quantization of that model should I pick (unsloth tends to offer over a dozen!) and what context size should I use? Overestimate any of these, and the model just won't load and you have to trial-and-error your way to finding a good combination. The red/yellow/green indicators on huggingface.co are kind of nice, but you only know for sure when you try to load the model and allocate context.
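
For a first guess before downloading anything, a rough rule of thumb is parameters times bits per weight. The Python sketch below is only that: it ignores quantization metadata and mixed-precision layers, so real GGUF files will differ, and `reserve_gb` is a hypothetical allowance for KV cache and overhead that you would have to pick yourself.

```python
# Rough weight-size estimate: parameter count times bits per weight.
# This is a rule of thumb, not how any particular tool computes it; real
# quantized files carry extra metadata and often keep some layers at
# higher precision.


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, reserve_gb: float) -> bool:
    """True if the weights plus a user-chosen reserve fit in VRAM.

    reserve_gb is a hypothetical allowance for KV cache and overhead.
    """
    return weights_gb(params_billion, bits_per_weight) + reserve_gb <= vram_gb


if __name__ == "__main__":
    # Model sizes taken from this thread; the 4 GB reserve is an arbitrary example.
    for params in (26, 31):
        for bits in (4, 8, 16):
            size = weights_gb(params, bits)
            ok = fits(params, bits, vram_gb=24, reserve_gb=4)
            print(f"{params}B at {bits}-bit: ~{size:.1f} GB, fits in 24 GB: {ok}")
```

Note that a mixture-of-experts model still has to hold all of its weights in memory, so the total parameter count is what matters here, not the activated count.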

danielhanchen · 2026-04-02T19:57:12 1775159832

Definitely Unsloth Studio can help - we recommend specific quants (like Gemma-4) and also auto calculate the context length etc!

ryandrake · 2026-04-02T20:05:37 1775160337

Will have to try it out. I always thought that was more for fine-tuning and less for inference.

danielhanchen · 2026-04-02T20:12:19 1775160739

Oh yes, sadly we partially miscommunicated haha - it does both, plus synthetic data generation and exporting!