AIQuarterdeck
Build Log · Entry Two

My AI Reported a 9.9% Open Rate. The Real Number Was Half That.

A 9.9% open rate doesn't sound like much to celebrate if you've run email campaigns before. In the local service world, where inboxes are crammed with quotes and reminders and the occasional roof-repair flyer, it's actually decent. Not spectacular. But decent enough to mean the message is getting through.

I saw that number one morning in my dashboard. 9.9%. I remember thinking, okay, this thing has legs. The outreach system I'd spent weeks building — the one that scans local businesses, finds gaps in their online presence, and sends a plain-English audit — was getting opened. Real people were opening real emails. That's the first hard checkpoint in any outreach machine: does anyone even look?

I almost tweeted about it. I almost told a friend the system was "working." Then I did what I should have done first: I checked the math.

The real open rate was roughly 5%. My system had been double-counting every open from Gmail users, which, in my small local market, is most of them. The 9.9% was a lie my own dashboard told me, and I was two minutes away from repeating it to someone else.

The situation, honestly

The outreach flow is simple. A local business gets an email with a subject line like "A quick note about [Business Name]'s online presence." Inside is a short, specific observation — something my system actually found — and an invitation to see the full audit. No spammy promises, no cold-calling scripts. Just a real observation from a real scan.

To know if anyone cares, the email contains a tracking pixel. That's a one-pixel image hosted on my server. When the recipient opens the email, their mail client loads that image, and my server logs a hit. Count the hits, divide by emails sent, and you have an open rate. It's not perfect — some clients block images, some people read in preview panes without triggering it — but it's the standard rough measure everyone's been using for twenty years.
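
For readers who want to see the moving parts, here is a minimal sketch of the server side in Python. It assumes the pixel URL carries a per-email token in a query parameter named `t`, which is a hypothetical name because the post doesn't show the real URL format. The function appends every request to a log and returns a standard 1x1 transparent GIF. Nothing in it decides what counts as an open, and that deciding step is the one that went wrong later.

```python
import base64
import time
from urllib.parse import parse_qs, urlparse

# A standard 1x1 transparent GIF, returned for every pixel request.
PIXEL_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def record_pixel_hit(path: str, user_agent: str, log: list) -> bytes:
    """Append one pixel request to `log` and return the image bytes to send back.

    `path` is the requested path plus query string, e.g. "/open.gif?t=abc123"
    (a hypothetical URL format). Every request is logged as-is; no decision
    about what counts as a real open is made here.
    """
    query = parse_qs(urlparse(path).query)
    token = query.get("t", [""])[0]  # empty string if the token is missing
    log.append({"token": token, "timestamp": time.time(), "user_agent": user_agent})
    return PIXEL_GIF
```

In a real server this function would sit inside a request handler, for example a `do_GET` method on a subclass of `http.server.BaseHTTPRequestHandler`; that wiring is left out to keep the sketch short.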

The pixel worked. The dashboard displayed a number. The number was wrong by a factor of two.

What I tried

Version one of the tracking was about as basic as it gets. The outbound email includes an <img> tag pointing to a URL on my server. The URL is unique per email, so when the server sees a request for that specific image, it records one open for that specific recipient. Simple. Reliable-sounding. The kind of thing you build in an afternoon and forget about.
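
A sketch of what "unique per email" can look like, assuming a random token stored in a lookup table so the recipient's address never appears in the URL. The domain `track.example.com`, the path `/open.gif`, the parameter `t`, and the function name are placeholders, not the system's real endpoint or code.

```python
import secrets
from urllib.parse import urlencode

# Placeholder endpoint; the post doesn't give the real tracking URL.
TRACKING_BASE = "https://track.example.com/open.gif"


def make_pixel_tag(recipient: str, registry: dict) -> str:
    """Build an <img> tag whose URL is unique to one email sent to one recipient.

    A random token goes in the URL, and `registry` maps it back to the
    recipient, so a request for that URL identifies who opened the email.
    """
    token = secrets.token_urlsafe(16)
    registry[token] = recipient
    url = f"{TRACKING_BASE}?{urlencode({'t': token})}"
    return f'<img src="{url}" width="1" height="1" alt="">'


registry = {}
print(make_pixel_tag("owner@example.com", registry))
```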

The server logged every request. No filtering, no deduplication, no questions asked. If something hit that URL, that was an open. The dashboard pulled the raw count, divided by sends, and displayed the percentage to one decimal place. 9.9%.

I had even shown the raw numbers underneath the percentage, so it felt verified: 142 opens out of 1,436 emails sent. The math checked out. 142 divided by 1,436 is 9.9%. The problem wasn't the division. It was the 142.
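
The version-one arithmetic, sketched in Python with a hypothetical function name. The division is honest. The numerator is a count of requests, not people, so a recipient whose pixel gets fetched three times contributes three "opens."

```python
def naive_open_rate(requested_tokens: list, emails_sent: int) -> float:
    """Version-one logic: every logged request counts as an open.

    `requested_tokens` has one entry per request the server logged, holding
    the token from the pixel URL. Nothing is filtered or deduplicated.
    """
    if emails_sent <= 0:
        raise ValueError("emails_sent must be positive")
    return len(requested_tokens) / emails_sent


# Three requests for recipient "a1" and one for "b2": four "opens" from two people.
requests = ["a1", "a1", "a1", "b2"]
print(len(requests), len(set(requests)))  # 4 2

# The dashboard's figures: 142 logged requests (stand-in tokens) out of 1,436 sent.
logged = ["token"] * 142
print(f"{naive_open_rate(logged, 1436):.1%}")  # 9.9%
```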

What broke, and why it's dangerous

Gmail, like many other major email providers, doesn't let the recipient's mail client load images directly from a stranger's server. Instead, it routes the request through Google's own image proxy. When your recipient opens the email, Gmail fetches the pixel on their behalf, caches it, and serves its own copy to the user. This is a privacy and security feature. It also means the request hitting my server says it's from Google, not from the person reading the email.

That part I knew. What I hadn't accounted for was the caching behavior.

In practice, a single human opening an email once could generate multiple requests to my tracking URL. Gmail's proxy might pre-fetch. It might refresh its cache. The same recipient checking the email on their phone an hour after reading it on their laptop might trigger a second request, possibly because the two devices are treated as separate cache contexts. My server saw two, three, sometimes more hits for what was actually one person glancing at an email over coffee.

And here's the part that should worry you if you're measuring anything in your own business: the numbers weren't obviously wrong. 9.9% is plausible. It wasn't 99%. It wasn't zero. It sat right in the band where you nod and think, "Okay, that's about right." A wrong number that looks right is far more dangerous than a number that's obviously broken, because you'll act on it. You'll scale up the campaign. You'll tell your partner it's working. You'll make decisions.

The fix

I didn't stop using a tracking pixel. The pixel is fine; the counting was naive. The fix was to stop treating every server request as an honest open.

First, Riker added a deduplication window. The same recipient — identified by the unique URL they were sent — can only log one open per 24-hour period. If Gmail's proxy hits that URL five times, it's one open. If the person reads it on their laptop and then their phone six hours later, it's still one open. I'm measuring "did this human see the email?" not "how many times did a server request this image?"
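
One way to express that rule, as a minimal sketch rather than the code Riker actually wrote: treat each logged hit as a `(token, timestamp)` pair and count it only if it's the first hit for that token, or if it arrives at least 24 hours after the last hit that was counted.

```python
from collections.abc import Iterable

WINDOW_SECONDS = 24 * 60 * 60  # one open per recipient per 24 hours


def count_unique_opens(
    hits: Iterable[tuple[str, float]], window: float = WINDOW_SECONDS
) -> int:
    """Count opens from (token, unix_timestamp) pairs, at most one per token per window.

    A hit counts only if it is the token's first hit, or if at least `window`
    seconds have passed since the last hit that was counted for that token.
    """
    last_counted: dict = {}
    opens = 0
    for token, ts in sorted(hits, key=lambda hit: hit[1]):
        previous = last_counted.get(token)
        if previous is None or ts - previous >= window:
            last_counted[token] = ts
            opens += 1
    return opens


hits = [
    ("a1", 0), ("a1", 2), ("a1", 5),  # proxy requests seconds apart
    ("a1", 6 * 3600),                 # same person on their phone six hours later
    ("b2", 100),                      # a different recipient
]
print(count_unique_opens(hits))  # 2
```

Anchoring the window at the last counted open, rather than at midnight, means a hit at 11:59 p.m. and another at 12:01 a.m. still collapse into one open.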

Second, I started logging the User-Agent strings that came with the requests. This let me separate obvious bot and proxy traffic from real mail-client opens. It's not foolproof, since proxies don't always announce themselves clearly, but combined with the time window, it caught most of the inflation.
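
A rough version of that split, assuming the User-Agent string is the only signal. Gmail's image proxy is commonly seen with `GoogleImageProxy` in its User-Agent; the bot patterns below are illustrative guesses, not a vetted list. A proxy hit is often a real person opening the email through Gmail, so this sketch only labels requests. Whether a label means "drop" or "deduplicate" is a separate decision the post doesn't spell out.

```python
import re

# Assumption: Gmail's proxy includes this marker in its User-Agent string.
PROXY_MARKERS = ("GoogleImageProxy",)
# Illustrative patterns only, not a vetted bot list.
BOT_PATTERN = re.compile(r"bot|crawler|spider|curl|python-requests", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> str:
    """Label a pixel request 'proxy', 'bot', or 'client' from its User-Agent."""
    ua = user_agent or ""
    if any(marker in ua for marker in PROXY_MARKERS):
        return "proxy"
    if not ua or BOT_PATTERN.search(ua):  # a missing User-Agent is treated as a bot
        return "bot"
    return "client"


print(classify_user_agent("curl/8.4.0"))  # bot
```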

The corrected count for that same batch of 1,436 emails was 71 unique opens. Not 142. 4.9%, not 9.9%.

The embarrassing part: I could have spotted this earlier if I'd looked at the raw request log instead of the pretty dashboard number. The log was full of sequential hits from the same IP ranges, seconds apart. It looked like a machine doing machine things, not like 142 separate humans waking up and deciding to read my email. I didn't look because the dashboard was right there, and the dashboard said 9.9%, and 9.9% felt good.
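
The raw-log check that would have caught this can be a few lines. This sketch assumes a CSV export with `token` and `timestamp` (Unix seconds) columns, a hypothetical format, and counts how many hits per token arrived within ten seconds of the previous one. It looks only at timing; grouping by IP range, which is what the real log showed, is left out to keep it short.

```python
import csv
import io
from collections import defaultdict

BURST_SECONDS = 10  # hypothetical threshold: repeat hits this close together look automated


def find_bursts(log_csv: str, burst_seconds: float = BURST_SECONDS) -> dict:
    """Per token, count hits that arrived within `burst_seconds` of the previous hit.

    Expects CSV text with `token` and `timestamp` (Unix seconds) columns.
    Tokens with no close-together hits are left out of the result.
    """
    times = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_csv)):
        times[row["token"]].append(float(row["timestamp"]))

    bursts = {}
    for token, stamps in times.items():
        stamps.sort()
        close = sum(1 for a, b in zip(stamps, stamps[1:]) if b - a <= burst_seconds)
        if close:
            bursts[token] = close
    return bursts


sample = "token,timestamp\na1,1000\na1,1003\na1,1007\nb2,5000\n"
print(find_bursts(sample))  # {'a1': 2}
```

If most tokens that have any hits at all show up in this result, the counts are probably inflated by automated requests rather than by people rereading the email.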

The lesson, even if you'll never write a line of code

The lesson isn't about email pixels. It's about any number you look at to decide if something is working.

The easiest person to fool with a metric is the person who built the system that produces it. You want the number to be good. You built the thing, and you want it to work. That desire is a filter, and it colors what you see. A dashboard showing a green number you like is not a substitute for looking at the raw inputs and asking whether they make sense.

For your business, this might mean: the AI tool says it saved you twelve hours this week. How is it measuring that? The analytics platform says your ad drove fifty conversions. Are those fifty people who bought, or fifty people who clicked a button and bounced? The scheduling tool says your response time is under an hour. Is that because you're fast, or because it auto-sends a "we got your message" reply that counts as a response?

Before you scale anything — before you spend more money, before you tell a client it's working, before you even congratulate yourself — find the raw input that feeds the number you like. Ask if it could be counting the same event twice. Ask if a machine is generating the signal instead of a human. Ask if the metric measures the thing you actually care about, or just the thing that's easy to count.

Your one thing this week

Pick one number in your business that currently makes you feel good — open rate, conversion rate, hours saved, response time, whatever — and spend fifteen minutes finding the raw data underneath it. Not the dashboard. The log, the CSV export, the raw event list. Ask one question: could this be counting something twice? Could a machine be inflating it? Could it be measuring activity instead of outcome?

You don't need to fix it this week. Just look. The gap between the number you see and the truth underneath is usually visible in fifteen minutes, if you're willing to look.


Next entry: my reply monitor could answer customer emails on its own. Riker and I built it that way. Then I made him deliberately break that feature before it ever sent a single message without me.