How to tell your agent its design is wrong

Jun 08, 2026

Last Monday night, I read the first draft of a new report format that Peter, my investing agent, had written. The format was supposed to do two things: summarize my portfolio every day and teach me one thing about investing along with the summary. Sample emails, a teaching protocol, a proposal for how it wanted to communicate with me going forward. The form was clean. The headers were where I’d put them. The voice was close to mine.

I simply responded that the whole design was wrong.

Not “make it better.” Not “tighten it up.”

I named the structural failures, pointed out a section where I wanted a 60% cut, and then I wrote three sample emails myself, longhand, to show it the shape I actually wanted. By the next morning the agent shipped v2, and the format it produced was different enough that it felt like a different tool.

Most people give AI tools feedback the way they’d compliment a coworker. “This is great, maybe a bit shorter?” “Can you make it more concise?” “Less robotic, please.” Language models treat that input as a request to optimize at the margin. They adjust word count, soften a sentence, swap one verb for another. The output gets slightly different and exactly as wrong as before, because folks don’t say enough to the model that the wrongness was in the design, not in the wording.

The trick is to be specific about the layer that’s broken.

When I read Peter’s draft, the things I wrote down were not stylistic. They were structural.

**No portfolio-level view** existed anywhere in the output. I could read three name-level entries and never see the picture of what I owned.

**The watchlist was treated as a long catalog** where every name got 395 words of equal treatment, when what I needed was a ranked queue with the top three names getting real attention and the bottom twenty getting one line.

**The teaching was camouflaged as reasoning.** There was a Q1-through-Q5 chain that was supposed to teach me something general, but by the third name the chain was repeating itself almost word-for-word, and I’d started reading it as form and skipping past the lesson.

The form was sound. The substance was buried.

Without explicitly naming the structural failures, the feedback would have been “this is too long and reads the same after a while,” and the agent would have come back with a 15% word count reduction and exactly the same problems.

Then I did something that took longer than the critique itself: I wrote actual samples.

Three full sample reports, written by me, in plain prose.

**The daily version** was 60 seconds long and taught one specific thing about the market.

**The weekend version** was 250 words to explain one specific framework.

**The deep-dive version** only existed when there was a real decision on the table.

I wrote them by hand because the shape of the output was harder for me to describe in prose than to show in 200 words of working example.

I had a thought while writing them that I want to say out loud, because it changed the whole project. I’d been trying to fix the per-name template, asking how each individual entry should be structured. Halfway through writing the second sample I realized that wasn’t the problem. The per-name template was fine. What was broken was the layering above it: the portfolio view that should sit above the names, the queue logic that should rank the names, the teaching dose that should sit at exactly the level of attention I was bringing to that cadence.

Daily attention is cheap, so daily teaching has to be cheap and short. Weekend attention is more expensive, so weekend teaching can be longer. Deep dives only get written when a decision is at stake, so their teaching is decision-shaped. The lesson lives inside the cadence that already has the reader’s attention. It doesn’t sit in a separate file. It doesn’t get its own header. It’s the highest-leverage section inside the document that was already getting opened.

That principle wasn’t in my critique. It surfaced while I was writing the samples. The critique gave me the wrong diagnosis; the samples gave me the right one. This is part of why writing the examples by hand is worth the time. You think you know what you want until you try to produce it, and then the gap shows up.

By morning the agent had shipped v2. Portfolio view at the top. Watchlist ranked, not catalogued. Teaching dosed by cadence. The new format is the one I read now, every day.

So the feedback technique, named:

Don’t tell the agent the output is bad. Tell it which layer is broken. Name the structural failures. Give a concrete budget on the dimensions that matter, words, count, length, frequency. Then write enough of the output yourself, by hand, that the agent can see the shape. The samples are not for the agent’s training data. They’re for your own clarity. You will discover what you actually want by writing it.

The format the agent ships when it understands the structural layer is a lot better than the format it ships when you tell it to clean things up.

Compounding Decisions for the Age of AI

Discussion about this post

Ready for more?