Building Scaling Intuition in the Age of AI

Why correctness, consistency, and architectural judgment still belong to you.

AI has collapsed the cost of producing code. A working system (services, a database schema, a scaling strategy, a clean distributed design) can now be generated in minutes. That changes what is scarce. It is no longer the typing, the boilerplate, or even the first draft of an architecture. What is scarce is the intuition to know what the system must actually do, what will hold up under load, and what will quietly break. That intuition is the one thing AI cannot hand you, and this article is about how to build it.

Here is the trap, and it is sharper now than ever. The most visible, most discussed, and most rewarding-feeling part of building software is the architecture: the service boundaries, the message brokers, the scaling strategy, the elegant distributed design. It pulls attention toward itself, and AI, happy to generate any of it on demand, makes it even easier to over-invest there. As a result, the other half of the job, getting the domain logic right (the rules that actually decide what the system does), quietly takes a back seat.

That trade is backwards. Architecture and domain correctness are different concerns, they fail in very different ways, and scaling intuition begins with telling them apart. Confusing the two, or letting one crowd out the other, is one of the most common and most expensive mistakes there is. So before we talk about how to scale, let’s build the instinct from the ground up.

The foundational instinct: correctness first, scale second

After about ten years of building systems, the rule I keep coming back to is simple: solve for reliability and consistency first, and think about scale second.

Your business logic should run correctly, every time. If a system buckles under load — slow responses, a queue backing up, an instance falling over during a traffic spike — that is unfortunate, but it is recoverable and usually visible. You add capacity, you tune, you shed load. A functional bug is a different animal. It is the biggest red flag there is, because it produces wrong outcomes silently and those outcomes compound. Scaling problems cost you availability; correctness problems cost you trust, money, and sometimes a mess that takes weeks to untangle.

Consider a concrete example. In an e-commerce system, imagine a service that refunds money for returned orders. That logic has two non-negotiable properties:

Reliability — the refund must actually happen, so the customer is made whole before the situation escalates into a support ticket or a complaint.
Consistency — the refund must happen exactly once. Refunding twice is a direct financial loss; never refunding is a customer you have lost.

Getting this right is not primarily an architecture problem; it is a correctness problem, and it demands guardrails. Exactly-once is rarely something a single call can guarantee, so you assemble it from two halves. First, you design the automated path for at most one refund, backed by idempotency so a retry can never double-pay. Second, you add a reconciliation mechanism that compares what should have been refunded against what actually was, and flags the gaps. And you make a deliberate choice about which way to fail: if a refund does not go through automatically, it can be completed manually, so the customer is never harmed and a human closes the loop. The failure you accept is a late refund, never a double one. That is the posture correctness-critical logic deserves. Notice that none of it depends on whether the service is a microservice or part of a monolith. It is about the rules being right.

Figure: Correctness guardrails for a refund (idempotency, reconciliation, and a safe failure mode).
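The refund guardrails described above can be sketched in a few lines. This is a minimal illustration, not a production design: `RefundLedger`, `refund_once`, and `reconcile` are hypothetical names, the ledger is an in-memory dictionary standing in for a table with a unique key on the order ID, and the sketch is single-threaded. In a real system the "already refunded?" check would most likely be enforced by the database rather than by application code.

```python
from dataclasses import dataclass, field


@dataclass
class RefundLedger:
    """In-memory stand-in for a refunds table keyed uniquely by order_id."""
    paid: dict[str, int] = field(default_factory=dict)  # order_id -> cents refunded

    def refund_once(self, order_id: str, amount_cents: int) -> bool:
        """Record the refund unless one already exists for this order.

        Returns True if this call paid out, False if it was a duplicate.
        """
        if order_id in self.paid:
            return False  # retry or duplicate message: do nothing
        self.paid[order_id] = amount_cents
        return True


def reconcile(expected: dict[str, int], ledger: RefundLedger) -> dict[str, list[str]]:
    """Compare refunds owed against refunds made and report the gaps."""
    missing = sorted(o for o in expected if o not in ledger.paid)
    mismatched = sorted(
        o for o, cents in expected.items()
        if o in ledger.paid and ledger.paid[o] != cents
    )
    unexpected = sorted(o for o in ledger.paid if o not in expected)
    return {"missing": missing, "mismatched": mismatched, "unexpected": unexpected}


ledger = RefundLedger()
first = ledger.refund_once("order-1", 2500)
retry = ledger.refund_once("order-1", 2500)   # same request delivered again
owed = {"order-1": 2500, "order-2": 1200}     # order-2 was never refunded
report = reconcile(owed, ledger)
```

Here the retry is a no-op, and `report["missing"]` lists `order-2`, which is the entry a human would pick up and complete manually. The idempotency key covers the "never twice" half; the reconciliation pass covers the "never zero" half.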

Training the instinct: solve it as if it were a monolith

This is where scaling intuition is actually built: not by reaching for distributed patterns, but by reasoning the problem all the way through first. When you are designing a solution, work it out on paper as though it were a single monolithic application. This is not a statement about deployment; it is a thinking tool. Reasoning about the problem in one place forces you to enumerate every use case, every edge condition, and every functional requirement, without the distraction of network boundaries, partial failure, and inter-service contracts. Get the “what” completely clear before you worry about the “where”.

The progression that develops the intuition is to solve the same problem at increasing levels of difficulty:

Solve it for one user. This pins down the functional requirements. What are the rules? What are the states? What must always be true? At this stage you are doing pure domain modeling.
Solve it for a hundred concurrent users. Now correctness meets concurrency. The same user might be hitting the feature from two devices at the same time — you genuinely cannot assume otherwise. This is where ACID guarantees, locking, idempotency, and race conditions enter the picture. These are still largely correctness concerns, not scaling concerns.
Solve it for the number of users you actually expect. Design for the real, anticipated load — not an imaginary internet-scale number, and not a single user. Sizing the solution to realistic demand is what keeps the design honest.

Figure: The progression that builds the intuition (one user, then concurrency, then realistic load).
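The jump from one user to a hundred concurrent ones is easiest to feel with a check-then-act rule. The sketch below assumes a hypothetical single-use coupon that many requests try to redeem at once, for example the same user tapping the button on two devices. The `Coupon` class and its names are illustrative only, and the in-process lock is a stand-in for whatever makes the step atomic in a real system, typically a database transaction or a conditional update.

```python
import threading
from typing import Optional


class Coupon:
    """A single-use coupon that many requests may try to redeem at once."""

    def __init__(self) -> None:
        self.redeemed_by: Optional[str] = None
        self._lock = threading.Lock()

    def redeem(self, request_id: str) -> bool:
        # The check and the write must happen as one atomic step; without the
        # lock, two threads can both see "not redeemed" and both succeed.
        with self._lock:
            if self.redeemed_by is not None:
                return False
            self.redeemed_by = request_id
            return True


coupon = Coupon()
results: list[bool] = []
results_lock = threading.Lock()


def attempt(request_id: str) -> None:
    ok = coupon.redeem(request_id)
    with results_lock:
        results.append(ok)


threads = [threading.Thread(target=attempt, args=(f"req-{i}",)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

successes = sum(results)
```

With the lock in place exactly one of the hundred attempts succeeds. The rule itself ("a coupon is redeemed at most once") was already fixed at the one-user stage; concurrency only decides whether the code still honors it.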

Intuition about timing: start with a monolith

A practical consequence of all this: when scale is modest, a single host can serve multiple modules that would, at larger scale, become separate microservices. You keep them cleanly separated by module — clear internal boundaries, distinct responsibilities — but you run them in one process. During prototyping, or while you are solving for a thousand users, this is not a compromise; it is the efficient choice.
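One way to picture "separated by module, run in one process" is a module that depends on an interface rather than on its neighbor's internals. The following is a rough sketch under that assumption; `Payments`, `InProcessPayments`, and `Orders` are made-up names, and the payments logic is deliberately trivial.

```python
from typing import Protocol


class Payments(Protocol):
    """The only surface the orders module is allowed to see."""

    def charge(self, order_id: str, amount_cents: int) -> bool: ...


class InProcessPayments:
    """Payments module running in the same process as its callers."""

    def __init__(self) -> None:
        self.charges: dict[str, int] = {}

    def charge(self, order_id: str, amount_cents: int) -> bool:
        self.charges[order_id] = amount_cents
        return True


class Orders:
    """Orders module; it depends on the Payments interface, not the class."""

    def __init__(self, payments: Payments) -> None:
        self._payments = payments
        self.status: dict[str, str] = {}

    def place(self, order_id: str, amount_cents: int) -> str:
        paid = self._payments.charge(order_id, amount_cents)
        self.status[order_id] = "confirmed" if paid else "payment_failed"
        return self.status[order_id]


# One process, two modules, one explicit seam between them.
payments = InProcessPayments()
orders = Orders(payments)
outcome = orders.place("order-1", 2500)
```

Today `charge` is an ordinary method call. If payments ever has to become its own service, the seam is already there: a client that satisfies the same `Payments` interface could replace `InProcessPayments` without the orders module changing, although the caller would then have to start handling timeouts and partial failure.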

This is especially true for startups in their early phase. A monolithic design lets you move fast, iterate on the product, and get the domain logic right without paying the tax of a distributed system before you have to.

Chris Richardson makes exactly this argument in Microservices Patterns. He is blunt about it: “When developing the first version of an application, you often don’t have the problems that this architecture solves. Moreover, using an elaborate, distributed architecture will slow down development. That can be a major dilemma for startups… A startup should almost certainly begin with a monolithic application.” Distributed systems introduce real cost — interprocess communication that is far harder than a method call, services that must handle partial failure, transactions and queries that now span service boundaries and require sagas and CQRS to hold consistency together. You take that cost on when it buys you something. Early on, it does not.
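To make the cost concrete, consider what replaces a single database transaction once the steps live in different services. The sketch below is an illustrative, heavily simplified saga, not code from the book: `run_saga` and the step names are hypothetical, every step is a local function standing in for a remote call, and compensations are assumed never to fail, which real systems cannot assume.

```python
from typing import Callable

# Each step is (name, action, compensation).
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run each step in order; on failure, undo completed steps in reverse."""
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for undo_name, _, compensate in reversed(done):
                compensate()
                log.append(f"undone:{undo_name}")
            return False
    return True


def noop() -> None:
    pass


def fail_shipping() -> None:
    raise RuntimeError("shipping service unavailable")


log: list[str] = []
ok = run_saga(
    [
        ("reserve_stock", noop, noop),
        ("charge_card", noop, noop),
        ("book_shipping", fail_shipping, noop),
    ],
    log,
)
```

In a monolith with one database, these three steps would sit inside one transaction and a failure would simply roll back. Across services, every step needs an explicit undo, and the system is briefly inconsistent while the undo runs. That is the kind of machinery you are signing up for.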

When microservices actually come into the picture

So when is it time? Microservices earn their place when the application has outgrown the monolith — what Richardson calls monolithic hell. The symptoms are worth recognizing, because they are the real trigger, not fashion:

The codebase is too complex for any one developer to fully understand, so changes are slow and bug-prone.
Development slows down — the IDE lags, builds drag, startup is slow, and the edit-build-test loop becomes painful.
The path from commit to production is long and arduous, blocking the frequent, reliable deployment that continuous delivery depends on.
Scaling is difficult because different modules have conflicting resource needs (one is memory-hungry, another is CPU-bound) yet must be deployed together.
Reliability suffers — there is no fault isolation, so a single misbehaving module, like a memory leak, can take down every instance of the application.
You are locked into an aging technology stack, because rewriting the whole monolith to adopt a new framework or language is too risky.

When these are your problems, functional decomposition — splitting the application into services with focused, cohesive responsibilities — is the cure. The benefits line up against the pain points: independent deployability, independent scalability, fault isolation, autonomous teams, and freedom to adopt new technologies per service.

Figure: A modular monolith decomposes into services only when complexity, not load, demands it.

The mental model: the scale cube

Intuition needs vocabulary, and “scaling” is too vague a word to reason with. The scale cube, from The Art of Scalability by Martin Abbott and Michael Fisher (which Richardson uses to frame his book), supplies that vocabulary. It describes three independent axes:

X-axis — horizontal duplication. Run multiple identical instances behind a load balancer. This is the classic, cheap way to scale a monolith, and it improves capacity and availability.
Z-axis — data partitioning. Run multiple instances where each handles a subset of the data, routing requests by an attribute such as userId. This scales transaction and data volume.
Y-axis — functional decomposition. Split the application into services by function. This is microservices.

Figure: X and Z scale a correct monolith without touching its architecture; Y is for complexity, not load.
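The difference between the X and Z axes comes down to how a request picks its instance. Here is a small sketch of both routing rules; the instance and shard names are invented for illustration, and hash-modulo partitioning is only one possible Z-axis scheme, chosen here because it is the shortest to show.

```python
import hashlib
from itertools import cycle

INSTANCES = ["app-1", "app-2", "app-3"]  # hypothetical identical clones
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]  # hypothetical partitions

_round_robin = cycle(INSTANCES)


def route_x() -> str:
    """X-axis: any clone can serve any request, so just take turns."""
    return next(_round_robin)


def route_z(user_id: str) -> str:
    """Z-axis: the same userId always lands on the same partition."""
    # A stable hash is used because Python's built-in hash() of a str
    # varies between processes.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


x_targets = [route_x() for _ in range(4)]
z_first = route_z("user-42")
z_again = route_z("user-42")
```

In both cases the application code behind the router is the same monolith; only the routing rule differs. One caveat with the modulo approach: changing the number of shards remaps most keys, which is why partitioned systems often use consistent hashing or a lookup table instead.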

The point that ties back to the whole argument: X-axis and Z-axis scaling improve capacity and availability without changing how the application itself is structured. Every instance still runs the same code; X puts a load balancer in front of the clones, and Z puts a router in front that picks the instance by key. You can take a correct monolith a very long way with cloning and partitioning alone. Y-axis decomposition is the one you reach for when the problem is no longer raw load but development and application complexity, which is exactly the set of monolithic-hell symptoms above.

So the sequence is: get the domain logic provably correct, run it as a well-modularized monolith, scale it on the X and Z axes as demand grows, and decompose along the Y axis only when complexity — not load — forces your hand. Architecture is a tool you apply to a correct system. It is never a substitute for one.

Your job now: thinker and director

Which brings us back to where we started. It would be easy to assume that once AI can generate all of this — the correct logic, the monolith, the scale-cube decision — the intuition no longer matters. The opposite is true. AI handles the production; the intuition is exactly what it cannot supply, and believing otherwise is the most dangerous version of the mistake this whole article is about.

In the age of AI, you are responsible for both the business logic and the architecture. AI is an extraordinary tool to help you write code, research options, prototype, implement, and ship faster. But it is you who must guide it. The model does not know your domain, your customers, your edge cases, or the financial consequence of refunding a returned order twice. It will confidently produce something plausible; whether that something is correct is your judgment to make.

You cannot do your organization a greater disservice than to delegate the thinking itself. Delegating the typing is fine — that is exactly what AI is for. Delegating the understanding — the decision about what the system must do, where consistency is non-negotiable, which failure mode is acceptable — hands away the one thing you are actually accountable for. A wrong refund rule generated in seconds is still a wrong refund rule, and it is your name on it.

This is why building a strong foundation matters more than ever, not less. The depth to reason about reliability, consistency, concurrency, and the trade-offs of an architecture is precisely what lets you direct AI well — to spot when its answer is subtly wrong, to ask it the right question, to reject the plausible-but-incorrect. Without that foundation you cannot tell good output from bad, and you become a conduit for the model’s mistakes rather than a check on them.

The job has shifted, but it has not shrunk. Your role is now that of the thinker and the director: the one who holds the requirements clearly, decides the correct design, and steers the tool toward it. AI delivers faster; you make sure what gets delivered is right.