What if your investment agent tracked your judgment, not just your returns?
How I build my AI Agent so my judgment and skills can compound over time
Most investors track one number. How much did I make?
That feels like the right question. But investing isn’t one decision. It’s dozens of small calls made over months. Do I add to this position today or wait? Do I hold through a bad quarter because the thesis still works, or has something broken? Do I trust what I see or what I feel?
Those decisions compound. Get them right consistently and the returns follow. Get them wrong and the returns eventually catch up with you, even if luck carried you for a while.
The problem is that most investment tools only show you the outcome. The judgment behind each decision is rarely noticed or reviewed.
I was talking about this with Peter, my investment agent. I wasn’t looking for a solution. I was simply exploring, as I do regularly: reviewing his goals and asking whether they still reflected what we were actually trying to do together.
I have designed Peter to have a set of specific goals. It’s the same as reviewing a doc with a colleague except the colleague runs on my machine every morning. Most people don’t build agents this way. I maintain a goals file, a memory layer, a set of defined outcomes for each of my Agents. Peter loads all of it every morning. It’s less like running a chatbot and more like managing a colleague who actually remembers what we decided last week.
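A minimal sketch of what that morning load could look like. Everything here is an assumption for illustration: the file layout (a plain-text goals file with one goal per line, a JSON list for memory), the names AgentContext and load_context, and the prompt wording. It is not Peter’s actual setup.

```python
from dataclasses import dataclass, field
from pathlib import Path
import json


@dataclass
class AgentContext:
    """Everything the agent reads before doing any work (hypothetical structure)."""
    name: str
    goals: list[str]
    memory: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render goals and remembered decisions into one block of prompt text."""
        lines = [f"You are {self.name}.", "Goals:"]
        lines += [f"- {goal}" for goal in self.goals]
        lines.append("Decisions we have already made:")
        lines += [f"- {item}" for item in self.memory]
        return "\n".join(lines)


def load_context(name: str, goals_path: Path, memory_path: Path) -> AgentContext:
    """Read the goals file (one goal per line) and the memory file (a JSON list).

    A missing memory file is treated as an empty memory, so a brand-new
    agent can still start.
    """
    goals = [ln.strip() for ln in goals_path.read_text().splitlines() if ln.strip()]
    memory = json.loads(memory_path.read_text()) if memory_path.exists() else []
    return AgentContext(name=name, goals=goals, memory=memory)
```

The design choice that matters is that the agent starts each session from files I can read and edit, so "what we decided last week" lives in the memory file rather than in a chat history.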
I am happy to say that I now have an Agent-based Personal Operating System that has memory, regularly reviews the outcomes achieved against the defined goals, and then suggests ways to improve our day-to-day activity so that we keep moving towards those goals.
Peter came back and said: hey, we do have two scoreboards. We just hadn’t named the second one.
We went back and forth. He proposed a framing. I challenged it. We kept refining until we landed on what needs to be captured in his System Prompt and Workspace Memory to better achieve the goal we had established. Peter loads both every time he wakes up.
When you build AI Agents, they only operate on the specific set of rules that is captured in writing. What isn’t written down typically doesn’t fire. So you have to be specific.
Here’s what we landed on.
The first scoreboard measures the financial goal. Net worth target, growth rate floor, beat the benchmark annually. Standard.
The second scoreboard measures whether I’m becoming a better investor. Not just wealthier, but building judgment I can use next time without starting over. Each week, Peter will help me identify and label one specific investment framework and show how it applies to something real in my portfolio, in terms simple enough that I could explain it to someone else without notes.
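To make the two scoreboards concrete, here is one way they could be written down as data. This is a sketch under my own assumptions: the class names, the fields, and the idea of scoring judgment as the share of weekly lessons I can explain without notes are all illustrative. The net worth target is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class FinancialScoreboard:
    """Scoreboard one: the money (annual figures as fractions, 0.09 = 9%)."""
    portfolio_return: float
    benchmark_return: float
    growth_floor: float  # minimum acceptable annual growth rate

    def on_track(self) -> bool:
        """True when the year clears the growth floor and beats the benchmark."""
        return (self.portfolio_return >= self.growth_floor
                and self.portfolio_return > self.benchmark_return)


@dataclass
class FrameworkLesson:
    """Scoreboard two: one labeled framework per week, tied to a real position."""
    week: int
    framework: str
    applied_to: str
    explained_without_notes: bool


def judgment_score(lessons: list[FrameworkLesson]) -> float:
    """Share of weekly lessons I could explain to someone else without notes."""
    if not lessons:
        return 0.0
    passed = sum(lesson.explained_without_notes for lesson in lessons)
    return passed / len(lessons)
```

The second scoreboard is deliberately as checkable as the first: a lesson either passed the "explain it without notes" test or it didn’t.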
Peter’s point was that these two scoreboards feed the same loop. Evaluate a position, build a thesis, observe what happens, update your framework, evaluate better next time. They’re not competing. They’re the same process running in parallel.
What changes is the agent’s job.
An agent chasing returns alone can hand you a number and move on. An agent accountable for your judgment has to teach. Every recommendation has to explain the framework behind it, not just deliver the answer.
When Peter showed me the first version of the new format, I pushed back. The reasoning was buried inside the recommendation. The teaching was camouflaged as analysis. I told him to rebuild it around the second scoreboard.
Peter suggested another restructure. Again, we went back and forth and landed on a prompt that is specific and measurable enough.
Then he built the mechanism to enforce it. A new report format. A trial period. A review cron (scheduled reminder) that fires automatically on day 15. That last part matters.
I didn’t just tell Peter to change his format and assume it would work. I gave it a 15-day window with a pre-committed review. The point is to collect evidence before locking anything in as doctrine within the System Prompts. For us, the scheduled job isn’t just a reminder. It’s a commitment device. The review happens whether I feel like having it or not.
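The trial itself is simple enough to sketch. The class below is hypothetical and only computes when the review falls due. It assumes the start date counts as day 0, so "day 15" means fifteen days after the start. The scheduling (the cron job that actually fires) is not shown.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Trial:
    """A change on probation, with a review date fixed up front."""
    change: str
    start: date
    length_days: int = 15

    @property
    def review_date(self) -> date:
        """The pre-committed review day (assumes the start date is day 0)."""
        return self.start + timedelta(days=self.length_days)

    def review_due(self, today: date) -> bool:
        """True from the review date onward, whether or not anyone feels like it."""
        return today >= self.review_date
```

Because the review date is derived from the start date when the trial is created, there is nothing to renegotiate later. That is what makes it a commitment device rather than a reminder.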
What I keep coming back to is how this started. I didn’t design a teaching protocol. I simply probed whether my Agent operating model was aligned to the goals and whether our everyday activity still matched what we were trying to do.
Peter named the gap, proposed the frame, built the system to enforce it, and built in a review date so neither of us could quietly walk away from the question.
Most investment tools give you answers. The Agent that I am building is trying to make me better at finding them myself.
A robo-advisor pays out returns. A teacher pays out the ability to repeat them.
