
Tried this with Claude, ChatGPT, and Gemini models. Haiku and Sonnet failed almost every time, as did the ChatGPT models. Gemini succeeded with reasoning enabled, but without reasoning it fell back on Google Maps tool calls (lol). Still only a 50% success rate.

The only model that consistently answers it correctly is Opus 4.6.


