mzl · 2026-06-09T11:18:23 1781003903

Skills for creating good and repeatable benchmarking scripts.

A knowledge base for my research area, with tools for paper ingestion and search.

An md-to-HTML presentation tool; there are several, but this one helps me.

A review tool that splits a PR or branch intelligently into modules, runs per-module reviews and global reviews for different aspects, and then summarizes the results into a report. It can be used with multiple different harnesses. It is written as a Python project, but assembled at build time into a single-file Python script with a uv run --script shebang line.
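For readers unfamiliar with the single-file approach, here is a minimal sketch of what such a script could look like. It is not the actual review tool: the function names and the grouping rule (by top-level directory) are hypothetical stand-ins, since the comment does not describe how the split works. The shebang and the `# /// script` block use uv's support for inline script metadata.

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""Hypothetical single-file layout; not the actual review tool."""

import sys


def split_into_modules(paths: list[str]) -> dict[str, list[str]]:
    """Group changed file paths by top-level directory.

    This is an assumed placeholder for the "intelligent" module split.
    Files at the repository root are grouped under ".".
    """
    modules: dict[str, list[str]] = {}
    for path in paths:
        top = path.split("/", 1)[0] if "/" in path else "."
        modules.setdefault(top, []).append(path)
    return modules


def main(argv: list[str]) -> int:
    """Print one line per module with the number of changed files."""
    for module, files in sorted(split_into_modules(argv).items()):
        print(f"{module}: {len(files)} file(s)")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Made executable, a file like this can be run directly with a list of changed paths as arguments, and uv resolves anything listed under `dependencies` before running it.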

mzl · 2026-04-28T14:13:40 1777385620

If you modify the work, that creates a derived work subject to whatever copyright the original work has, not a new work that is fully copyrightable.

As the article says in the TL;DR at the top, the code may be contaminated by open source licenses:

> Agentic coding tools like Claude Code, Cursor, and Codex generate code that may be uncopyrightable, owned by your employer, or contaminated by open source licenses you cannot see

mzl · 2026-04-24T07:38:03 1777016283

Kimi K2.5 and K2.6 are both >1T parameters.

mzl · 2026-04-24T07:21:10 1777015270

It is tricky to build good infrastructure for prompt caching.

jatora · 2026-04-24T12:48:15 1777034895

It's as simple as telling your Claude Code to implement prompt caching!

mzl · 2026-04-22T09:25:58 1776849958

Which version of Kimi and served from where?

mzl · 2026-04-22T09:25:34 1776849934

Composer-2 is based on Kimi K2.5, but with extensive RL. Cursor estimated 3x more compute on their RL than the original K2.5 training run (some details in https://cursor.com/blog/composer-2-technical-report).

Composer-2 seems very useful in Cursor, while K2.6 according to AA seems to be a really useful general model: https://artificialanalysis.ai/articles/kimi-k2-6-the-new-lea...

dmix · 2026-04-22T13:32:50 1776864770

I used to hate on Composer 2 but I'm coming around to it. Opus for the big stuff and multi-file operations, Composer for all the small day-to-day IDE tasks works pretty good for me.

mzl · 2026-04-22T07:26:20 1776842780

After an operation I was prescribed slightly more than 5 g per day for pain (2 x 650 mg tablets every 6 hours, i.e. 5.2 g), alongside ibuprofen. That dose is scarily close to the limits.

TheOtherHobbes · 2026-04-22T09:03:53 1776848633

I managed to overdose by accident with severe dental pain. Wasn't thinking straight, took about 8g - which is even more scarily close to the limits.

I'm fairly sure that caused some liver damage. I wasn't aware of anything apart from feeling a bit weird.

At the time, I had no idea it was potentially deadly.

mzl · 2026-04-20T12:56:13 1776689773

I've heard people say the study is bad, but whenever I've asked why, the answers have been pretty bad. Do you have a good source for why we should disregard it?

mzl · 2026-04-20T12:53:57 1776689637

Dan Luu had some interesting analysis about car safety, comparing how different auto-makers fared on newly introduced crash tests: https://danluu.com/car-safety/

The main takeaway for me from that page is that very few manufacturers seem to design for actual safety (only Volvo had good results). Tesla was angry that a new test had been introduced, which feels indicative of a bad safety culture.

mzl · 2026-04-01T07:54:08 1775030048

There was an interesting scandal in Sweden where Oracle managed to sell the Millennium system to a region's hospitals even though it did not fulfill the requirements. When it inevitably crashed and burned, the region had to do an emergency rollback to the previous system after just a few days.

Here is an article in English: https://www.heise.de/en/news/Scrapping-the-millennium-introd...