Error Analysis Is the Eval Work. Here's How to Actually Do It.
Everyone agrees you should "look at your data." Then they open a hundred traces, scroll for ten minutes, feel vaguely worried, and reach for a tool. The looking has a method — open coding, then axial coding into a ranked taxonomy — and the taxonomy, not your assumptions, is what decides which evals to write.
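As a rough illustration of where that method ends up, here is a minimal sketch of the last step: turning coded traces into a ranked taxonomy. It assumes the open coding (one free-text note per failing trace) and the axial coding (a human grouping those notes into categories) are already done. The trace IDs, notes, category names, and the `rank_taxonomy` helper are all hypothetical, invented for this example.

```python
from collections import Counter

# Hypothetical open-coding output: one free-text note per failing trace.
open_codes = {
    "trace-01": "ignored the user's date constraint",
    "trace-02": "made up an order number",
    "trace-03": "ignored the budget constraint",
    "trace-04": "invented a refund policy",
    "trace-05": "ignored the user's date constraint",
    "trace-06": "reply cut off mid-sentence",
}

# Hypothetical axial coding: a human groups the notes into failure categories.
axial_map = {
    "ignored the user's date constraint": "constraint not followed",
    "ignored the budget constraint": "constraint not followed",
    "made up an order number": "fabricated detail",
    "invented a refund policy": "fabricated detail",
    "reply cut off mid-sentence": "truncated output",
}


def rank_taxonomy(open_codes, axial_map):
    """Count traces per axial category and return them most frequent first."""
    counts = Counter(axial_map[note] for note in open_codes.values())
    return counts.most_common()


ranked = rank_taxonomy(open_codes, axial_map)
for category, n in ranked:
    print(f"{category}: {n}/{len(open_codes)} traces")
```

The counting is trivial on purpose. The judgment lives in the notes and the grouping, which a person writes by reading traces; the ranking only makes it visible which failure category deserves an eval first.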