
How we built the resolver to handle 1M scans/day on $50/month.

A frank breakdown of our infrastructure: Cloudflare Workers for routing, Postgres for analytics, no Redis, no Kafka. Sometimes boring is the right answer — and sometimes you regret it.

QRBliss · Team · Apr 23, 2026 · 12 min read

Every dynamic QR code points to qrbliss.com/r/[code]. When someone scans it, our resolver looks up the actual destination URL and redirects. Last month we crossed 1 million scans per day for the first time, and we're still running on roughly $50/month of infrastructure.

Here's the architecture, the trade-offs we made, and the parts we'd do differently.

[Image: Close-up of server racks in a data center. Photo: Manuel Geissinger on Pexels]

The architecture

It's almost embarrassingly simple:

Cloudflare Workers handle the redirect. The Worker reads the [code] from the URL, checks the cache, and either redirects immediately or queries the origin.
Postgres (single instance) stores the source-of-truth mapping from code → destination URL, plus all analytics events.
No Redis. Cache lives in Cloudflare KV (free tier covers our cache needs).
No Kafka. Analytics writes go directly to Postgres via a connection pool.
No queue. Cloudflare Workers complete the redirect in <50ms; the Postgres write happens asynchronously and is allowed to fail (we lose at most 0.01% of scan events).
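
A rough sketch of the redirect path described in the list above, written as a plain function so it runs outside a Worker. This is not the production code: the interface names, the 302 status, and the allowed characters in a code are all assumptions for illustration. In a real Worker, `cache` would be a KV namespace binding and `defer` would be `ctx.waitUntil`.

```typescript
// Minimal stand-ins for the platform pieces. These interfaces are assumptions
// for illustration, not Cloudflare's actual type definitions.
interface CacheStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

interface ResolverDeps {
  cache: CacheStore;                                   // e.g. a KV namespace
  lookupOrigin(code: string): Promise<string | null>;  // Postgres-backed lookup
  recordScan(code: string): Promise<void>;             // analytics INSERT
  defer(task: Promise<unknown>): void;                 // e.g. ctx.waitUntil
}

type ScanResult = { status: 302; location: string } | { status: 404 };

// Extracts "abc123" from "/r/abc123"; returns null for anything else.
// The allowed character set is a guess.
function parseCode(pathname: string): string | null {
  const match = /^\/r\/([A-Za-z0-9_-]+)\/?$/.exec(pathname);
  return match?.[1] ?? null;
}

async function resolveScan(pathname: string, deps: ResolverDeps): Promise<ScanResult> {
  const code = parseCode(pathname);
  if (code === null) return { status: 404 };

  // 1. Cache first.
  let destination = await deps.cache.get(code);

  // 2. On a miss, ask the origin, then populate the cache in the background.
  if (destination === null) {
    destination = await deps.lookupOrigin(code);
    if (destination === null) return { status: 404 };
    deps.defer(deps.cache.put(code, destination).catch(() => undefined));
  }

  // 3. The analytics write is fire-and-forget: a failure is swallowed
  //    and never blocks or breaks the redirect.
  deps.defer(deps.recordScan(code).catch(() => undefined));

  return { status: 302, location: destination };
}
```

The point of the shape is that nothing after step 2 is awaited: the redirect returns as soon as the destination is known, and a failed analytics write costs one event rather than one scan.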

The total cost breakdown:

Service	Cost	Notes
Cloudflare Workers	$0	Free tier covers it
Postgres on Hetzner	$25/mo	4GB single instance
Backups, monitoring, etc.	$25/mo	Wasabi for archives, BetterStack for uptime

That's it. About $50/month for what most companies would consider a "tier 0" service.

Why this works

A few things.

Reads dominate, and most never reach the database. Of those 1M scans/day, ~95% hit the Cloudflare KV cache. Only about 50,000/day actually hit Postgres for resolution. That's roughly 0.6 reads/second. Postgres yawns at that.
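
The arithmetic behind that read rate, as a small sketch. The function name is made up for illustration; the inputs are the figures quoted in the paragraph above.

```typescript
const SECONDS_PER_DAY = 86_400;

// Reads that miss the cache and fall through to the origin database.
function originReads(scansPerDay: number, cacheHitRatio: number) {
  const perDay = Math.round(scansPerDay * (1 - cacheHitRatio));
  return { perDay, perSecond: perDay / SECONDS_PER_DAY };
}

const load = originReads(1_000_000, 0.95);
console.log(load.perDay);                // 50000
console.log(load.perSecond.toFixed(2));  // "0.58"
```

Because the origin rate scales with the miss ratio, a drop from 95% to 90% cache hits would double the Postgres read load at the same scan volume.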

Writes are cheap. Each scan generates one Postgres INSERT (the analytics event) of about 80 bytes. 1M scans × 80 bytes = 80MB/day, or roughly 12 INSERTs/second on average. We rotate to a new partition every month and archive after 90 days. Total active disk: ~15GB.
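
A sketch of what monthly rotation with a 90-day archive window could look like in code. The partition naming scheme (`scan_events_YYYY_MM`) and the rule for when a partition counts as archivable are assumptions for illustration, not the actual schema.

```typescript
const MS_PER_DAY = 86_400_000;

// Hypothetical naming scheme: scan_events_YYYY_MM, keyed on the UTC month.
function partitionName(date: Date): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `scan_events_${year}_${month}`;
}

// Assumed rule: a monthly partition is archivable once its newest possible
// row is at least `retentionDays` old. `month` is 1-based (1 = January).
function isArchivable(year: number, month: number, now: Date, retentionDays = 90): boolean {
  // Date.UTC takes a 0-based month, so passing the 1-based month yields the
  // first instant of the FOLLOWING month, i.e. the end of this partition.
  const partitionEnd = Date.UTC(year, month, 1);
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return partitionEnd <= cutoff;
}

// Raw event volume from the figures above (payload only).
const rawBytesPerDay = 80 * 1_000_000;        // 80,000,000 bytes = 80MB
const rawBytesIn90Days = rawBytesPerDay * 90; // 7,200,000,000 bytes = 7.2GB
```

The raw payload for 90 days comes to about 7.2GB. The ~15GB figure above is larger, presumably because it also counts indexes, per-row overhead, and partitions that have not fully aged out yet; that is a guess, not something the numbers state.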

No heavy lifting on the redirect path. Cloudflare Workers are V8 isolates, but we use them in the most boring possible way: read from KV, decide, redirect. No CPU-heavy work. No third-party API calls during the redirect path; the only hop off the Worker is the origin lookup on a cache miss.

Postgres is more capable than people think. A single 4GB Postgres instance with proper indexes handles the entire QRBliss workload, with headroom to spare. The instance averages 2% CPU usage during peak hours. We've load-tested it at 10× our current load (10M scans/day) and it's still under 30% CPU.

Why this works (sociologically)

The harder part to admit: this infrastructure is simple because we said no to a lot of features that would have made it complex.

No real-time analytics dashboard. Scan counts update every 60 seconds, not in real time.
No custom domain provisioning before Pro tier. Mass DNS provisioning is hard. We let it stay hard.
No deep geo analytics. We use Cloudflare's country-level geo, not city-level. City-level requires either a paid IP-geo database (which we don't want to pay for) or lat/lon storage (which would compromise privacy).
No A/B testing of redirects. We could build it. We'd need a serving layer that does percentage-based routing and tracks variants. Complex, not yet shipped.

Each of these is a feature we could build, and each would push the infrastructure budget up. We've stayed disciplined because every dollar we'd add would also buy a new dependency that we'd then have to run and debug.

The dead ends we hit

Three years of building this thing, and we've made some mistakes. The biggest:

1. We tried Redis early. It was wrong.

In our first six months, we used Redis as the primary cache. Faster than Cloudflare KV at the millisecond level (1ms vs 4ms). Made the redirect feel "snappy" in our test environment.

But Redis required:

A managed instance ($40/mo)
A persistence strategy (we picked AOF; lots of small writes)
A backup plan
Connection management from Workers

None of that complexity bought us anything users could feel. The 4ms Cloudflare KV latency is invisible inside an HTTP request. We deleted Redis. Saved $40/mo. Saved a half-day a month of maintenance.

2. We considered DynamoDB. Glad we didn't.

DynamoDB would've solved the "single point of failure" concern about a single Postgres instance. We seriously considered it for about two weeks.

Then we did the math. DynamoDB at our scale would cost ~$200/mo. Postgres on Hetzner: $25/mo. The "single point of failure" worry turned out to be addressable with replication + automated failover, which is well under $50/mo of additional cost.

"We're not Twitter. We don't need a system that can survive an asteroid. We need one that survives Tuesday."

— Riley Tanaka, Infrastructure · QRBliss

We stuck with Postgres and put replication on the roadmap.

3. We tried writing analytics to S3 + Athena. Hated it.

For about three months we routed all analytics events through Kinesis → S3 → Athena. The idea was infinite scalability and cheap storage.

In practice:

Kinesis added complexity to the write path
S3 required JSON serialization (more CPU on the worker)
Athena queries took 10-20 seconds for "how many scans in the last hour" — which is not OK for an analytics dashboard
Total cost: $80/mo, despite the "cheap" S3 promise

We moved analytics back to Postgres. Queries dropped to 100ms. Cost dropped to ~$0/mo additional.
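
For a sense of what the "scans in the last hour" question looks like once the events live in Postgres, here is a hedged sketch of a parameterized query builder. The table and column names (`scan_events`, `code`, `scanned_at`) are hypothetical, and the fast response presumably depends on an index covering those two columns.

```typescript
interface SqlQuery {
  text: string;
  values: [string, string];
}

// Builds a parameterized count query for one QR code over the past hour.
// Table and column names are hypothetical placeholders.
function scansInLastHourQuery(code: string, now: Date): SqlQuery {
  const since = new Date(now.getTime() - 60 * 60 * 1000);
  return {
    text: "SELECT count(*) FROM scan_events WHERE code = $1 AND scanned_at >= $2",
    values: [code, since.toISOString()],
  };
}

const query = scansInLastHourQuery("abc123", new Date("2026-04-23T12:00:00.000Z"));
console.log(query.values[1]); // "2026-04-23T11:00:00.000Z"
```

The `text` and `values` pair is the shape most Postgres clients accept for parameterized queries, so the code value never gets spliced into the SQL string.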

What we'd do differently

Two things.

Paginate the analytics dashboard from the beginning. Our analytics dashboard initially loaded the full event stream for a QR code in a single response. At 100k scans, this was 12MB of JSON. We rebuilt it with pagination eventually, but it was painful.
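
One common way to paginate an event stream is keyset (cursor) pagination, sketched below against an in-memory array. Which pagination style the dashboard actually uses isn't stated, so treat this as an illustration; the `ScanEvent` fields are hypothetical.

```typescript
interface ScanEvent {
  id: number;
  scannedAt: string;
  country: string;
}

interface Page {
  events: ScanEvent[];
  nextCursor: number | null; // pass back to fetch the next page; null when done
}

// Returns up to `limit` events older than `cursor`, newest first.
// Assumes `events` is sorted by id descending. This stands in for a query like
// "WHERE id < $cursor ORDER BY id DESC LIMIT $limit".
function pageOfEvents(events: ScanEvent[], cursor: number | null, limit: number): Page {
  const remaining = cursor === null ? events : events.filter((e) => e.id < cursor);
  const slice = remaining.slice(0, limit);
  const last = slice[slice.length - 1];
  const hasMore = remaining.length > slice.length;
  return { events: slice, nextCursor: hasMore && last !== undefined ? last.id : null };
}
```

Each response stays bounded by `limit` no matter how many scans a code has, which is the property the full-stream version lacked.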

Write a load test in the first month. We didn't do this. We launched, then watched our P95 response time during the first organic traffic spike. Lucky we didn't have any Reddit-front-page moments before we wrote the load tests.
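
A first-month load test doesn't need much. The sketch below is one minimal shape for it, not the test suite we ended up with: a bounded-concurrency runner plus a nearest-rank percentile. The function names are made up, and `probe` is whatever request you want to time.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] as number;
}

// Calls `probe` `total` times with at most `concurrency` calls in flight,
// and returns each call's latency in milliseconds.
async function measure(
  probe: () => Promise<void>,
  total: number,
  concurrency: number,
): Promise<number[]> {
  const latencies: number[] = [];
  let started = 0;

  async function worker(): Promise<void> {
    while (started < total) {
      started++;
      const t0 = Date.now();
      await probe();
      latencies.push(Date.now() - t0);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, worker));
  return latencies;
}
```

A probe could be as simple as a `fetch` against a staging resolver URL with `redirect: "manual"`, so the measurement covers the redirect itself and not the destination site. Feeding the result into `percentile(latencies, 95)` gives the P95 figure mentioned above.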

What's coming

We're at 1M scans/day. The next 10× will require more careful work:

Postgres replication for HA. We'll set up a hot standby in a different region.
Sharding Cloudflare KV across namespaces. We can spread the cache across multiple Workers KV namespaces if a single namespace becomes a hotspot.
Postgres logical replication for analytics. The analytics queries will eventually start affecting the main DB. We'll move them to a read replica.
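
For the KV sharding item, the core of it is a stable mapping from code to namespace. A minimal sketch, assuming the namespaces are available as an indexed list of bindings (a hypothetical setup) and using FNV-1a as the hash; any stable, well-distributed hash would do.

```typescript
// FNV-1a, 32-bit. Hashes UTF-16 code units, which matches the byte-wise
// algorithm for ASCII-only codes.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Maps a QR code to a shard index in [0, shardCount).
function shardFor(code: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error("shardCount must be a positive integer");
  }
  return fnv1a(code) % shardCount;
}
```

The same code always lands on the same shard, so reads and writes for a code agree without any lookup table. The catch with plain modulo is that changing `shardCount` remaps most codes, which for a cache means a temporary drop in hit ratio while the new shards warm up.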

These are all deferred until we cross 5M scans/day. We'll get there sometime in late 2026.

The lesson: simple architecture isn't lazy. Simple architecture is what makes 1M scans/day cost $50/month, and what makes it possible for a 4-person team to maintain.
