Why I Gave My AI Agents the Power to Draft but Not to Send

The Build Log — entry three

There is a moment, when you're building an automated system, where everything works. The agent reads the inbound message. It understands the context. It writes a response that is coherent, relevant, and grammatically correct. It even matches the tone you trained it on. You sit there and watch a machine do something that would have taken you ten minutes, and it does it in ten seconds.

The temptation at that moment is to close the loop. Let it send. Why not? It's good enough. It's better than good enough. You've seen it handle five, ten, fifty test messages correctly. The failure rate feels low. The time saved feels real. The next logical step is to remove the human from the path entirely and let the machine run.

Riker and I built that exact system. The reply monitor was fully capable of reading a customer email, drafting a response, and sending it without me ever seeing it. The code worked. The integration worked. I had Riker turn it on, watched it handle a few real messages, and then I told him to pull the auto-send feature out before it could do any real damage.

This entry is about why I walked back a working automation — and why "it works most of the time" is the wrong standard for anything that reaches a customer.

The situation, honestly

The reply monitor does what it sounds like. It watches an inbox for messages from local businesses who have received one of my audits. When someone replies — asking a question, requesting more detail, sometimes just saying "not interested" — the system reads the message, classifies it, and decides what to do next.

For straightforward replies, the system can draft a response. It has context on the business, it knows what audit was sent, it can answer common questions about the process, pricing, or timing. It writes in plain English, no chatbot stiffness, no "As an AI language model..." disclaimers. It sounds like me, because it was trained on how I actually write.

The original design was a full pipeline: receive, draft, send, log. The agent would handle the entire conversation loop for tier-one replies, escalating only the weird or angry or complex messages to me. I would be freed from the inbox entirely for 80% of the volume. That's the automation dream.

What I tried

I had Riker build the draft stage first, because that's where the hard work lives. Understanding intent, retrieving the right context, composing a coherent response — that's the AI part. The send stage is trivial by comparison. It's one API call. Add the SMTP relay, attach the draft, fire it off. Most email libraries do it in three lines of code.

The draft stage worked well. I tested it against fifty real inbound messages, side by side with how I would have replied. It was right more often than I expected. It caught nuances. It referenced the correct audit details. It declined politely when someone asked for something I don't offer. The quality was genuinely good.

So I had Riker wire up the send stage. He added a confidence threshold: if the model's confidence score was above a cutoff, the message would send automatically. Below the cutoff, it would queue for my review. This felt like responsible engineering. A safety gate. A dimmer switch instead of an on-off switch.
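
For the technically curious, here is a minimal sketch of what a gate like that looks like in Python. This is not Riker's actual code. The names (Draft, route, CONFIDENCE_CUTOFF) and the 0.90 cutoff are made up for illustration, and I'm assuming the score is a number between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real value is not given in this entry.
CONFIDENCE_CUTOFF = 0.90


@dataclass
class Draft:
    recipient: str
    body: str
    confidence: float  # the model's own score for its own output, 0.0 to 1.0


def route(draft: Draft, cutoff: float = CONFIDENCE_CUTOFF) -> str:
    """Return 'send' for drafts at or above the cutoff, 'review' for the rest."""
    if draft.confidence >= cutoff:
        return "send"
    return "review"
```

Notice that the only input to the decision is a number the model produced about itself. Nothing in route() looks at the inbound message or checks the draft against anything outside the model.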

The first real auto-send went to a business owner who had replied asking for a phone call. The system correctly understood the request, declined the call because I don't do phone consultations, offered instead to answer questions over email, and explained the audit process. It was a solid response. It sent itself while I was asleep. I woke up to a calm, resolved conversation that I had zero memory of.

It felt like magic. It also felt like I had just handed a loaded gun to a very polite, very confident marksman who couldn't tell the difference between a target and a bystander.

What broke, and why it's dangerous

The problem with the confidence threshold is that the AI's confidence score measures how certain the model is about its own output. It does not measure whether the output is correct, appropriate, or safe. Those are different things, and the model is not equipped to tell them apart.

A language model is certain when its output is fluent and internally consistent. It is not uncertain when it hallucinates a pricing tier I don't offer. It is not uncertain when it misreads a customer's frustration as neutral curiosity. It is not uncertain when it replies to a message that was actually meant for someone else, that contains private information, or that requires a human judgment call about whether to engage at all.

I caught a few of these manually during testing. A message where the system quoted an outdated price. A reply to someone who had written "please stop emailing me" where the system interpreted "stop" as a request to "pause the conversation" and sent a cheerful "No problem, I'll check back in a few weeks!" A response to a message that included a forwarded thread, where the system replied to the forwarded content instead of the actual question.

None of these were gibberish. All of them were confident. The confidence scores were high. The model thought it was doing great.

Here's the critical distinction: an AI doing the work cannot be trusted to grade its own homework. It will give itself an A every time. The same mechanism that produces the response also produces the confidence rating, and both are generated by a system optimized for plausibility, not accuracy. You cannot build a reliable safety check out of the same material you're trying to check.

And the cost of a wrong outbound email is not symmetric. A wrong draft costs you a minute of reading time. A wrong sent email costs you a customer, a reputation, or a legal headache. There is no unsend button that travels faster than human memory.

The fix

I told Riker to remove the auto-send path entirely. Not reduce it. Not gate it behind a higher confidence cutoff. Remove it.

The system still drafts. It drafts beautifully. It reads the inbound message, retrieves the business context, writes the response, and deposits it in a review queue. Then it stops. It does not send. It cannot send. The send permission does not exist in its environment anymore.
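
If you want to see what "the send permission does not exist" can mean in code, here is a rough sketch in Python. It is an illustration under my own assumptions, not the real system: every class name here (ReviewQueue, DraftingAgent, HumanSender) is hypothetical, and the drafting step is a stand-in string rather than a model call. The point is the shape. The agent is handed a queue and nothing else, and the only object that holds a send function lives on the human side.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Draft:
    business: str
    body: str


@dataclass
class ReviewQueue:
    """Holds drafts until a human looks at them."""
    pending: List[Draft] = field(default_factory=list)

    def put(self, draft: Draft) -> None:
        self.pending.append(draft)


class DraftingAgent:
    """Knows how to write and where to leave drafts. Holds no mail client."""

    def __init__(self, queue: ReviewQueue) -> None:
        self._queue = queue

    def handle(self, business: str, inbound: str) -> None:
        # Stand-in for the real drafting step (model call, context lookup).
        body = f"Thanks for your reply about the {business} audit."
        self._queue.put(Draft(business, body))


class HumanSender:
    """The only object that holds a send function. It lives outside the agent."""

    def __init__(self, send_fn: Callable[[Draft], None]) -> None:
        self._send = send_fn

    def approve_and_send(
        self, queue: ReviewQueue, index: int, edited_body: Optional[str] = None
    ) -> Draft:
        draft = queue.pending.pop(index)
        if edited_body is not None:
            draft = Draft(draft.business, edited_body)
        self._send(draft)
        return draft
```

In use, the agent's handle() call leaves a draft in queue.pending and returns. Nothing goes out until someone calls approve_and_send(), and there is no code path from DraftingAgent to the send function.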

I get a notification: "New draft reply for [Business Name]." I read it. Usually it's good. Sometimes it's perfect. Occasionally it's wrong in a way I wouldn't have caught without reading: a slightly wrong tone, a missing detail, a misreading of the question. I edit it, or I discard it and write my own, or I approve it and hit send. The time from draft to send is under a minute for most messages. The number of emails that can go out by mistake without my reading them is zero, because that path no longer exists.

The embarrassing part is that I knew all of this before Riker built the auto-send feature. I had read the same warnings everyone reads. Riker had written the verifier in the first entry for exactly this reason. The temptation to close the loop was stronger than my caution, because the system looked so capable. Capability is not reliability. Reliability is what happens on the bad day, not the average day.

The lesson, even if you'll never write a line of code

The lesson is about irreversibility, not about email.

Any automated system should have a hard stop between the work and the consequence. The AI can draft, sort, scan, calculate, summarize, and recommend all day long. But the moment the output becomes an action that you cannot take back — a sent message, a spent dollar, a published post, a fired employee — a human needs to be in the path.

The people selling full automation are showing you the sunny-day scenario. The system works, it's fast, it's cheap, it's off your plate. What they don't show you is the 2% of the time when it doesn't work, and how much that 2% costs. One wrong email to an angry customer. One auto-published social post with a hallucinated fact. One automated refund for the wrong amount. The 2% doesn't average out. It compounds.
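
One way to put a number on "it compounds," if you like arithmetic: the chance that at least one message goes wrong grows quickly with volume. The sketch below uses the 2% figure from above purely as an illustration, and it assumes each send fails independently of the others, which is a simplification.

```python
def chance_of_at_least_one_bad_send(failure_rate: float, messages: int) -> float:
    """Probability that at least one of `messages` independent sends goes wrong."""
    return 1.0 - (1.0 - failure_rate) ** messages


for n in (10, 50, 100):
    print(n, round(chance_of_at_least_one_bad_send(0.02, n), 3))
# prints:
# 10 0.183
# 50 0.636
# 100 0.867
```

At a 2% failure rate, a hundred unsupervised sends carry roughly an 87% chance of at least one bad email going out.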

For your business, this means: let the AI write the email, but you send it. Let the AI schedule the posts, but you approve the queue. Let the AI draft the contract language, but you sign it. Let the AI flag the invoices that look suspicious, but you review the flags before paying or withholding payment. Automate the preparation. Keep the execution.

The right move early on isn't to remove yourself from the process. It's to remove yourself from the repetitive part, and stay exactly where you are in the consequential part. The agent can pull the oars. You keep your hand on the tiller.

Your one thing this week

Look at one automated or semi-automated system in your business — email, social media, invoicing, scheduling, customer support, anything — and ask: if this produces a bad output, does it stop before it reaches someone, or does it go all the way?

If it goes all the way, add one gate. Not a confidence score. Not a "usually it works." A real gate. A human reads it. A separate check runs. A rule says "when in doubt, hold." The gate doesn't have to be you forever. But it needs to be someone, or something that is not the same mechanism doing the work, until you've seen enough bad outputs to trust the good ones.
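
Here is what "a separate check" and "when in doubt, hold" could look like as a few plain rules in Python. This is a sketch, not something from my system: the function name, the patterns, and the example prices are all hypothetical, loosely based on the three failures I described earlier (the opt-out, the forwarded thread, the outdated price). What matters is that these rules read the inbound message and the draft directly and never ask the model how it feels about its own work.

```python
import re
from typing import List, Set

# Hypothetical patterns, based on the failure cases described earlier in this entry.
OPT_OUT = re.compile(
    r"\b(stop emailing|unsubscribe|remove me|do not contact)\b", re.IGNORECASE
)
FORWARDED = re.compile(
    r"^(-+\s*forwarded message\s*-+|fwd?:)", re.IGNORECASE | re.MULTILINE
)
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")


def hold_reasons(inbound: str, draft: str, current_prices: Set[str]) -> List[str]:
    """Return every reason to hold the draft for a human. Empty list means no rule fired."""
    reasons = []
    if OPT_OUT.search(inbound):
        reasons.append("inbound looks like an opt-out")
    if FORWARDED.search(inbound):
        reasons.append("inbound contains a forwarded thread")
    for price in PRICE.findall(draft):
        # Any dollar amount not on the current list is treated as suspect.
        if price not in current_prices:
            reasons.append(f"draft quotes unlisted price {price}")
    return reasons
```

For example, hold_reasons("Please stop emailing me.", "No problem!", {"$149"}) returns ["inbound looks like an opt-out"], where "$149" is a made-up price. An empty list only means none of these particular rules fired. It is not proof the draft is safe, which is why a human still reads it.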

Build the automation. Just don't let it pull the trigger.

Next entry: the time I let an agent run for three days without checking on it, and came back to a conversation log that looked like a slow-motion car crash between two very polite chatbots.

Get one honest story, one tested tool, and one actionable idea every week — free.

Subscribe to The Lookout →