Benchmark

LLM Benchmarks: Is It Worth ($$) Mixing 2 Models? (Planner + Executor)

April 25, 2026

LLM Coding Benchmark (April 2026): GPT 5.5, DeepSeek v4, Kimi v2.6, MiMo, and the State of the Art

April 24, 2026

LLM Benchmarks Part 2: Is It Worth Combining Multiple Models in the Same Project? Claude + GLM??

April 18, 2026

Testing Open Source and Commercial LLMs - Can Anyone Beat Claude Opus?

April 5, 2026

Rant - Will LLMs Evolve Forever? Demystifying LLMs in Programming

May 1, 2025