What is the Scale Cube?

A model from The Art of Scalability describing three independent ways to scale a system. The X-axis is horizontal duplication (run more identical copies behind a load balancer). The Y-axis is functional decomposition (split by capability — this is microservices). The Z-axis is data partitioning (shard the data by some attribute like customer ID). They're orthogonal, and most real systems combine all three.

Are microservices required to scale?

No. Microservices are one axis of scaling — functional decomposition. Plenty of systems scale far on X-axis duplication and Z-axis sharding while staying a monolith. Reach for microservices when independent deployment, team autonomy, or isolating a load-heavy capability is the actual problem — not because 'scale' is a vague goal.

What is the dual-write problem?

When a service must update its database and notify other services, the naive approach is two separate writes: commit to the DB, then publish an event to a broker like Kafka. These two actions aren't atomic. If the process crashes between them, you either lose the event (others never learn the order was placed) or publish an event for a transaction that rolled back. There's no retry that makes two independent systems atomic.

How does the transactional outbox pattern fix dual writes?

Instead of publishing to the broker directly, the service inserts the event as a row into an 'outbox' table in the same local database transaction as the business change. Either both commit or neither does. A separate process — usually Change Data Capture reading the database log — then reads the outbox and publishes to the broker. Atomicity is restored; the cost is an extra moving part and a small amount of latency.

Choreography or orchestration for sagas?

Choreography has each service react to events and emit the next one — decentralized, no coordinator, but the overall flow is implicit and hard to trace. Orchestration uses a central coordinator that tells each service what to do and handles compensation — easier to reason about and observe, at the cost of a component that knows the whole workflow. Small flows lean choreography; complex, auditable flows lean orchestration.

Microservices Scale Your Code. Your Data Is What Breaks.

Splitting a service is a Tuesday-afternoon refactor. Splitting its data is the part that pages you at 3 a.m.

A team decides to scale. The monolith is groaning, deploys are scary, and one capability — say, order processing — is eating most of the load. So they do the textbook thing: carve order processing into its own service. The split looks clean on the architecture diagram. Two boxes instead of one, a tidy arrow between them.

Then the orders service needs a customer’s address, which lives in the user service. And the payments service needs to know an order exists, which lives in the orders service. And finance needs to join orders against payments to close the books — except those two tables now live in two different databases that can’t be joined. The clean diagram was real. The clean data was a fantasy.

This is the thing nobody tells you up front: microservices are a scaling pattern for code. The data is what actually breaks.

What “scaling” actually means

Strip the word back. Scaling means handling more — more load, more data, more teams — without the system falling over or the org grinding to a halt. There isn’t one way to do it; there are three, and they’re independent. The cleanest map of them is the Scale Cube from The Art of Scalability.

X and Z scale a correct monolith without touching its architecture; Y is for complexity, not load.

X-axis — horizontal duplication. Run N identical copies of the whole thing behind a load balancer. Cheapest and most common. Doubles your throughput by doubling your instances. Costs you almost no design complexity — until the database behind all those copies becomes the bottleneck.
Y-axis — functional decomposition. Split the system by capability: orders, payments, inventory, each its own service. This is microservices. It scales your codebase and your teams more than it scales raw throughput.
Z-axis — data partitioning. Split the data by some attribute — customer ID, region, tenant — so each shard owns a slice. This is how you scale past what one database can hold.

Here’s the insight most “let’s go microservices” decisions miss: these axes solve different problems. Microservices (Y) are what you reach for when independent deployment and team autonomy are the pain. But a lot of teams adopt Y when their actual problem was X (just need more replicas) or Z (the database is too big for one box). Choosing the wrong axis is expensive, because each one commits you to a different set of costs — and Y commits you to the hardest one.

The instinct: reason it out as a monolith first

Before splitting anything, solve the problem as if it were a single application. Not as a deployment decision — as a thinking tool. In one process, with one database, you can enumerate every use case and every transaction boundary without the fog of network calls and partial failure. Get the what completely correct first. Only then ask where it should physically live.

A modular monolith decomposes into services only when complexity — not load — demands it.

The reason this matters: when you split too early, you bake your half-understood domain boundaries into network boundaries. A bad function call is a quick refactor. A bad service boundary is a migration, a data backfill, and a quarter of regret.

The real commitment behind a microservice: its database

A microservice isn’t really “a small service.” Its defining property is database-per-service: it owns its data, and no other service touches that database directly. The moment you accept that — and you must, because a shared database is just a monolith with extra network hops — two things you took for granted disappear.

Joins disappear. Finance wants orders joined with payments. Those tables are now in two databases. There is no JOIN. You either call two services and stitch the result in application code, or you maintain a separate read store that has both. (Hold that thought — it’s exactly the warehousing instinct of storing the same data more than once, shaped for the query pattern that needs it.)

Cross-service transactions disappear. Placing an order should reserve inventory and charge payment. In a monolith that’s one ACID transaction — all of it commits or none of it does. Across three services with three databases, there is no single transaction that spans them. This is the heart of distributed data, and everything below is a response to it.

Failure mode #1: the dual-write trap

Here’s the most common way teams lose data without knowing it. The orders service must do two things when an order is placed: persist the order, and tell the world (so payments and inventory can react). The naive code:

// DON'T DO THIS
orderRepository.save(order);              // 1. write to Postgres
kafkaProducer.send("orders", orderEvent); // 2. publish to Kafka

Two writes, two systems, no atomicity between them. Walk the failure chain:

The process crashes after line 1, before line 2 → the order exists, but payments and inventory never hear about it. A paid-for order that ships nothing, or an order that never gets charged.
Line 1 commits, line 2 is attempted, but the broker is briefly down → same outcome, or an inconsistent retry.
You reorder the lines to publish first → now a broker hiccup can emit an event for an order that never committed. A phantom order downstream.

There is no ordering of two independent writes that’s safe. Retries don’t save you — a retry of a two-step process can double-apply. This is the dual-write problem, and it’s the single most common source of silent data loss in event-driven microservices. It’s a bad-data problem wearing an architecture costume.

The fix: transactional outbox + Change Data Capture

The trick is to stop treating “save the order” and “record that an order happened” as two systems. Make them one local transaction.

You add an outbox table in the same database as your orders. When an order is placed, you write the order and an outbox row in a single transaction:

BEGIN;
  INSERT INTO orders (id, customer_id, total, status)
  VALUES ('ord_123', 'cust_42', 4999, 'PLACED');

  INSERT INTO outbox (id, aggregate, event_type, payload)
  VALUES ('evt_987', 'order', 'OrderPlaced',
          '{"orderId":"ord_123","customerId":"cust_42","total":4999}');
COMMIT;

Either both rows commit or neither does. The event can no longer disappear, because it lives in the same ACID boundary as the thing it describes.

Now a separate process publishes the outbox rows to Kafka. You could poll the table, but the clean way is Change Data Capture — read the database’s write-ahead log (the WAL in Postgres, the binlog in MySQL) and stream every committed change out. A tool in this class (Debezium is the common example) tails the log and emits the new outbox rows downstream:

# Debezium connector (sketch): stream the outbox table's inserts to Kafka
connector.class: io.debezium.connector.postgresql.PostgresConnector
database.hostname: orders-db
plugin.name: pgoutput
table.include.list: public.outbox
transforms: outbox            # unwrap the row into a clean event
transforms.outbox.type: io.debezium.transforms.outbox.EventRouter

What this costs you — and it does cost. You’ve added a moving part: a CDC pipeline that has to be deployed, monitored, and upgraded. At small scale it’s easy. At large scale, running the streaming layer (Kafka plus the connect/CDC tier) is a job of its own — security patches mean testing version upgrades, watching for connector lag, and handling the log-retention edge cases when a consumer falls behind. You also accept a little latency: the event is now eventually published, not synchronously. In exchange you get atomicity and replayability — the log is your replayable source of truth. For most systems that’s a trade worth making. Name it anyway; don’t pretend CDC is free.

 Order Service                      Broker            Consumers
 ┌────────────────────┐                               ┌───────────┐
 │  BEGIN              │                               │ Payments  │
 │   INSERT order      │     ┌──────────┐  OrderPlaced │           │
 │   INSERT outbox ────┼──►  │   CDC    ├────────────► │ Inventory │
 │  COMMIT  (atomic)   │ WAL │ (reads   │              │           │
 └────────────────────┘     │  the log)│              │ Read model│
                            └──────────┘              └───────────┘

Failure mode #2: the order spans three services — sagas

Atomic publishing solves “did the event escape.” It doesn’t solve “this business action touches three services and any of them can fail.” Placing an order must: create the order, charge the payment, and reserve inventory. Three services, three databases, no shared transaction.

The answer is a saga: a sequence of local transactions, each emitting an event that triggers the next, with an explicit compensating action for every step that already succeeded. If inventory reservation fails after payment succeeded, you don’t roll back — you can’t — you run a compensation: refund the payment.

  OrderCreated ──► PaymentCharged ──► InventoryReserved ──► OrderConfirmed
                        │                    │
                        │  (reserve fails)   │
                        ▼                    │
                  RefundPayment  ◄───────────┘   (compensation)
                        │
                        ▼
                  OrderCancelled

Two ways to wire it:

Choreography — each service listens for events and emits the next. No central brain. Simple for short flows, but the overall workflow is implicit: nobody owns the picture of “what happens when an order is placed,” which makes it brutal to debug and trace as it grows.
Orchestration — a coordinator service explicitly drives each step and triggers compensations. Easier to observe, audit, and change, at the cost of a component that knows the whole workflow.

What sagas cost: you give up atomic rollback and accept eventual consistency. For a window of time the system is in an intermediate state — payment taken, inventory not yet reserved. Your domain logic and your UI have to tolerate that window. And remember the correctness bar from refunds: every compensating action must be idempotent, because it will get retried, and a compensation that double-refunds is a direct financial loss.

Scaling reads: CQRS and deliberate redundancy

Splitting services solves write ownership. It often makes reads worse — the data you want is scattered across services. The response is CQRS (Command Query Responsibility Segregation): keep the normalized write models per service, and build separate read models optimized for how the data is actually queried.

That dashboard joining orders and payments? Don’t fan out to two services on every page load. Consume the OrderPlaced and PaymentCharged events (the ones your outbox + CDC already emit) and project them into a single denormalized read store. This is the warehousing conviction stated in distributed-systems language: store the same data more than once, each copy shaped for a query pattern. One write model keyed for correctness; one read model keyed for the query. Redundancy is a design choice, not a smell.

The cost, predictably, is consistency: the read model lags the write model by however long the event takes to propagate. You’re trading freshness for read scalability and query simplicity. Sometimes that’s fine (an analytics dashboard). Sometimes it isn’t (showing a user their account balance right after a transfer). Decide per read, not globally.

Scaling the data itself: the Z-axis

Eventually a single service’s database is too big or too hot for one box. Now you’re on the Z-axis: shard the data. Partition by a key — customer_id, tenant_id, region — so each shard owns a slice and no single node holds everything.

Sharding buys near-linear capacity, but the bill is specific:

Cross-shard queries get expensive. “Top 10 customers globally” now has to fan out to every shard and merge. Choose a shard key that matches your dominant access pattern, so the common query hits one shard.
Hot shards. If one tenant is 100× the others, its shard melts while the rest idle. Shard keys need to spread load, not just data.
Rebalancing is the hard part. Adding shards means moving live data, and naive modulo sharding (hash(id) % N) reshuffles almost everything when N changes. Consistent hashing or a lookup-table approach exists precisely to make this survivable — and it’s the operation you must design for before you need it, not during the incident.

What teams actually get wrong

The patterns above are well understood. The failures are almost always in the gaps between them:

The distributed monolith. Services split, but every request still triggers a synchronous chain of calls across all of them. You bought the operational cost of microservices and kept the coupling of a monolith — the worst of both. The tell: you can’t deploy one service without the others.
The shared database. Two services quietly reading the same tables “just for now.” This re-couples them at the data layer, and schema changes become cross-team negotiations again. Database-per-service exists specifically to prevent this.
Synchronous everything. Using blocking request/response where an event would do. Now a slow downstream service makes the upstream one slow too, and a single failure cascades. Asynchrony via events isolates failures; synchronous chains propagate them.
Treating schema change as an afterthought. A producer renames a field; every downstream consumer breaks. Across service boundaries, schema is a contract — additions are cheap, renames and removals must route through a deprecation period with both versions alive, exactly as you’d evolve a warehouse table feeding live dashboards.

These aren’t exotic. They’re the default outcome if you split services without respecting what happens to the data.

When to use this — and when not to

Microservices and the data patterns around them are powerful and expensive. Use them when the problem actually calls for the Y-axis: independent deployment, team autonomy at scale, or isolating one capability whose load or failure profile is genuinely different from the rest. In a fast-scaling org going from mid-size to large in a hurry, that pressure is real, and these patterns are how you absorb it.

Avoid them when your real problem is a different axis. If you just need more throughput, X-axis replicas behind a load balancer are simpler and cheaper. If your database is the bottleneck, Z-axis sharding may solve it without splitting a single service. And if your boundaries aren’t clear yet, stay a monolith and get the domain right first — a well-factored monolith outperforms a confused mesh of services every time.

The core lesson holds across all of it: the service split is the easy, visible part. The data — its consistency, its movement, its redundancy, its evolution — is where scaling actually lives. Get that right, and the architecture diagram takes care of itself.