My AI agent is confident about everything. That's not the same as right
Your AI agent sounds sure of itself. Here's what I do about it.
My investing agent had an opinion about how to find a cheap stock, and it was sure of itself. It usually is. That’s worth saying plainly, because it’s the part people miss about these tools: an agent will pick a position and defend it in clean, sober paragraphs whether or not the position is right. Confidence is its default setting, not a signal that it’s correct.
So I didn’t take its word and I didn’t overrule it. Both of those are just me guessing. Instead I gave it a clear question, two ways to answer it, and two weeks of real market days, and I let the data tell me what each one was actually worth.
Here’s the disagreement. The agent had two genuinely different ideas about what “cheap” even means.
The first one watches price. A name fires when it’s been beaten down hard off its recent highs and the rest of its sector is selling off too, the theory being that good companies get marked down on macro noise and that’s your entry.
The second one ignores price completely. It runs a discounted-cash-flow estimate, asks what the business is actually worth, and fires when the price is well below that number and the company’s competitive position is still intact.
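To make the difference concrete, here is a minimal sketch of the two rules as code. Every name and threshold in it is my own illustration (the 20% drawdown, the 5% sector drop, the 30% margin of safety, the 10% discount rate, the `Snapshot` fields); none of them are the values the agent actually used, and the "competitive position intact" check is reduced to a flag that something outside the code has to supply.

```python
from dataclasses import dataclass

# Illustrative thresholds only; these are assumptions, not the trial's settings.
DRAWDOWN_MIN = 0.20      # price at least 20% below its recent high
SECTOR_DROP_MIN = 0.05   # sector down at least 5% over the same stretch
MARGIN_OF_SAFETY = 0.30  # price at least 30% below estimated value


@dataclass
class Snapshot:
    ticker: str
    price: float
    recent_high: float
    sector_return: float      # e.g. -0.08 for a sector down 8%
    cash_flows: list[float]   # projected per-share free cash flow, one per year
    terminal_value: float     # assumed per-share value at the end of the projection
    moat_intact: bool         # a judgment call made elsewhere


def price_signal(s: Snapshot) -> bool:
    """'What's falling': beaten down off its highs while the sector sells off."""
    drawdown = 1 - s.price / s.recent_high
    return drawdown >= DRAWDOWN_MIN and s.sector_return <= -SECTOR_DROP_MIN


def dcf_value(cash_flows: list[float], terminal_value: float,
              discount_rate: float = 0.10) -> float:
    """Present value of the projected cash flows plus a discounted terminal value."""
    n = len(cash_flows)
    pv = sum(cf / (1 + discount_rate) ** t
             for t, cf in enumerate(cash_flows, start=1))
    return pv + terminal_value / (1 + discount_rate) ** n


def value_signal(s: Snapshot) -> bool:
    """'What's underpriced': price well below estimated value, moat still intact."""
    intrinsic = dcf_value(s.cash_flows, s.terminal_value)
    return s.moat_intact and s.price <= intrinsic * (1 - MARGIN_OF_SAFETY)
```

Notice that `price_signal` never reads the cash flows and `value_signal` never reads the recent high or the sector. They share a ticker and a price and nothing else.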
One is “what’s falling.” The other is “what’s underpriced.” Those aren’t two settings on the same tool. They’re two different beliefs about the world, and the agent couldn’t tell me which one was right any better than I could.
The lazy move would have been to pick the one that sounded smarter in the moment and write it into the rules. I’ve done that before and paid for it. The idea that sounds best when you’re talking it over is often not the one that holds up once you watch it run. So instead both ideas ran side by side, one scan a day, every trading day of the two-week window. I kept the second one silent the whole time, logging to a file but never posting, so it couldn’t quietly influence what I thought about the first. Each day wrote its own dated record, with no memory of yesterday’s argument, just what each scan actually flagged. The agent’s job wasn’t to be right. It was to run the test and hand me the data.
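The setup described above can be sketched in a few lines. This is an assumed shape, not the harness I actually ran: `run_daily`, the JSON layout, and the idea of passing each scan as a function from ticker to yes/no are all hypothetical choices made for the example.

```python
import json
from datetime import date
from pathlib import Path


def run_daily(day, universe, scans, log_dir, silent=frozenset()):
    """Run every scan over the same tickers and write one dated record.

    scans maps a scan name to a function that takes a ticker and returns
    True when that scan flags it. Names listed in silent are logged but
    left out of the return value, so they never get posted.
    """
    record = {"date": day.isoformat(), "flags": {}}
    for name, scan in scans.items():
        record["flags"][name] = sorted(t for t in universe if scan(t))

    # One file per day; nothing from earlier days is read back in.
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    (log_dir / f"{day.isoformat()}.json").write_text(json.dumps(record, indent=2))

    # Only the non-silent scans come back for posting.
    return {n: flagged for n, flagged in record["flags"].items() if n not in silent}
```

The design choice that matters is that the silent scan writes to the same record as the visible one. It is hidden from me during the trial, not from the data.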
What came back was more useful than either idea winning cleanly, because both of them embarrassed themselves first.
The price-based one found nothing for the first four days. The market was sitting at all-time highs, nothing was beaten down, so it just sat there. Honest behavior, and also a real weakness: a whole stretch where it’s structurally blind. The value-based one had the opposite problem. It found the same four names every single day and never moved off them. A static list pretending to be a daily scan. If I’d shipped either one alone, I’d have shipped its blind spot with it and not known until it cost me something.
Then the thing happened that made the whole exercise worth it. Near the end of the window, on three consecutive days, two names appeared in both scans at once. The price-based idea and the value-based idea, which look at completely different things and agree on almost nothing, pointed at the same two companies. That convergence was the real finding. Not “method A beat method B.”
The best signal the whole setup produced was both methods landing on the same names, and that happened only because both were running. Shut one off to keep things simple and you lose the one thing the pair was uniquely able to tell you.
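Checking for that kind of convergence is simple once each day has its own record. A rough sketch, assuming each record is a dict with a `date` string and a `flags` mapping from scan name to a list of tickers (a hypothetical layout, and the function names are mine):

```python
def overlaps(records, a, b):
    """Map each date to the names both scans flagged that day.

    Days with no shared names are left out.
    """
    out = {}
    for rec in records:
        both = set(rec["flags"][a]) & set(rec["flags"][b])
        if both:
            out[rec["date"]] = sorted(both)
    return out


def longest_streak(records, a, b, ticker):
    """Longest run of consecutive records in which both scans flagged ticker.

    Assumes one record per trading day, so consecutive records mean
    consecutive trading days.
    """
    best = run = 0
    for rec in sorted(records, key=lambda r: r["date"]):
        hit = ticker in rec["flags"][a] and ticker in rec["flags"][b]
        run = run + 1 if hit else 0
        best = max(best, run)
    return best
```

Neither function can return anything unless both scans were logged, which is the whole point.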
So the decision wasn’t the one I set out to make. I went in expecting to crown a winner and retire a loser. What the trial actually taught me was that “which one is right” was the wrong question. Each method was blind exactly where the other one worked, and the rare day they both fired on the same name was worth more than either one running alone. So I kept both. I just understood, finally, what each was for, and I understood it from what happened instead of from whichever pitch sounded better on a Sunday afternoon.
Here’s the part that changed how I work with these tools. The agent was never going to talk me to the right answer, and I was never going to argue my way there either. The answer came from giving it a clear goal, enough context to run a real test, and two weeks of actual data to run it against.
Goal, context, data. That’s the secret sauce to building a better AI agent. Get those three right and the agent turns into something that can find things you didn’t know to look for. Get them wrong, or skip them and just trust the confident summary, and you’ve got a very articulate way of being wrong.
So when an agent hands you a sure-sounding opinion, that’s not the end of the work. It’s the start of it. The opinion tells you what to test. The data tells you what’s true. The two weeks of results showed me one method that goes blind when the market is sitting at its highs and another that gets stuck on the same four names day after day. They also showed me two of those names lighting up in both scans at once, which is the kind of thing no amount of confident writing would ever have surfaced.
The agent didn’t give me the decision. The data did, because I gave the agent what it needed to go get it.
