Chapter 2: Scalability


👋 Hi, I'm Sourav Bandyopadhyay, a passionate software developer with a love for coding and a hunger for learning. 🚀 I thrive on solving complex problems and delivering efficient software solutions. 💡 Check out my newsletter and blog for more insights: corecraft.substack.com

Every thriving digital product eventually faces the same challenge: what worked perfectly for your first thousand users begins to struggle at ten thousand, and collapses completely at a million. Your application that smoothly handled 100 requests per second now falters at 10,000. This universal growth barrier is where scalability becomes your most critical architectural consideration.

What Is Scalability?

Scalability is a system's capability to accommodate increased load by efficiently adding resources. The operative word is "capability"—a truly scalable system can expand to meet rising demand without requiring complete redesign or disruptive overhauls.

Think of scalability as the foundation of robust system design. The principles you'll learn here directly enable the availability and reliability we'll explore in subsequent chapters, forming the bedrock of production-ready applications.

Measuring Scalability: From General Goals to Clear Metrics

You cannot improve what you cannot measure. Vague aspirations like "we need to handle more traffic" are useless without specific, measurable targets. Scalability is evaluated through several key dimensions:

Critical Load Metrics

| Metric | Description | Real-World Example |
|---|---|---|
| Requests Per Second (RPS) | API calls processed | 15,000 RPS |
| Concurrent Users | Simultaneously active users | 75,000 concurrent |
| Data Volume | Storage and processing requirements | 25 TB dataset |
| Throughput | Data transfer rate | 2.5 GB/second |
| Query Rate | Database operations per second | 80,000 QPS |
| Message Rate | Queue processing capacity | 250,000 messages/second |

Performance Under Increasing Load

A system scales effectively when it maintains acceptable performance as load increases. Here's what scaling success and failure look like:

| Load Increase | Response Time | System Behaviour | Interpretation |
|---|---|---|---|
| 1× (Baseline) | 50ms | Normal operation | Expected baseline performance |
| 2× | 55ms | Excellent scaling | Sublinear growth suggests headroom or effective caching |
| 5× | 75ms | Good performance | System handling increased load well |
| 10× | 160ms | Acceptable degradation | Predictable decline, still growing more slowly than load |
| 10× | 650ms | Concerning bottleneck | Superlinear degradation signals architectural limits |
| 10× | Timeout | Critical failure | System has reached breaking point |

The ideal scenario is sublinear degradation: doubling load should not double response time. When response times grow faster than load, or requests begin timing out, you've encountered a scalability barrier that requires architectural intervention.
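One way to make this comparison concrete is to divide the latency growth factor by the load growth factor. The sketch below does that for the sample figures in the table above. The function name `classify_scaling` and the 10% tolerance band are assumptions chosen for illustration, not a standard definition.

```python
def classify_scaling(base_ms: float, load_factor: float, observed_ms: float,
                     tolerance: float = 0.1) -> str:
    """Compare how much latency grew with how much load grew.

    ratio < 1 means latency grew more slowly than load (sublinear),
    ratio > 1 means it grew faster than load (superlinear).
    """
    latency_factor = observed_ms / base_ms
    ratio = latency_factor / load_factor
    if ratio < 1 - tolerance:
        return "sublinear"
    if ratio > 1 + tolerance:
        return "superlinear"
    return "linear"


BASELINE_MS = 50  # response time at 1x load, taken from the table
observations = [(2, 55), (5, 75), (10, 160), (10, 650)]

for load_factor, observed_ms in observations:
    verdict = classify_scaling(BASELINE_MS, load_factor, observed_ms)
    print(f"{load_factor}x load, {observed_ms}ms: {verdict}")
```

A timeout has no meaningful latency figure, so it is best tracked separately as an error rate rather than fed into a ratio like this.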

Scaling Strategies: Vertical vs. Horizontal Approaches

Vertical Scaling (Scaling Up)

Vertical scaling enhances capacity by adding resources to existing machines—upgrading to more powerful hardware rather than increasing machine count.

This approach is often the initial response to performance issues since it requires minimal architectural changes.

Common Vertical Scaling Actions:

  • Adding CPU cores for computation-intensive tasks

  • Increasing RAM for enhanced in-memory caching

  • Upgrading to faster SSDs to reduce I/O bottlenecks

  • Implementing higher-bandwidth network interfaces

Advantages:

  • Simplicity: Typically requires no code changes

  • Reduced Latency: All components reside locally without network hops

  • No Distributed Complexity: Avoids synchronization and partitioning challenges

Limitations:

  • Hardware Ceilings: Maximum capacity limited by largest available machine

  • Single Point of Failure: Entire system depends on one machine

  • Superlinear Cost Curve: Doubling capacity often more than doubles cost

  • Service Disruption: Upgrades typically require downtime

When Vertical Scaling Excels:

  • Databases requiring strong data locality

  • Applications with strict consistency requirements

  • Early-stage startups prioritizing simplicity

  • Workloads with predictable, moderate growth patterns

Important Note: Never dismiss vertical scaling as "not truly scalable." Many production systems operate successfully on vertically scaled databases for years. The key is recognizing when horizontal scaling becomes necessary.

Horizontal Scaling (Scaling Out)

When vertical scaling reaches its limits—either through hardware constraints or fault tolerance requirements—horizontal scaling becomes essential. This approach adds more machines rather than upgrading existing ones, distributing load across multiple commodity servers.

This is the foundational architecture behind tech giants like Google, Netflix, and Amazon, enabling them to handle billions of daily requests.

Advantages:

  • Theoretically Unlimited: Continue adding servers as needed

  • Built-in Fault Tolerance: Failure of one server doesn't collapse the system

  • Cost Efficiency: Multiple smaller machines often cost less than equivalent single large machines

  • Geographic Distribution: Place servers closer to users for reduced latency

Challenges:

  • Architectural Complexity: Distributed systems are harder to design, debug, and maintain

  • Data Consistency: Synchronising state across servers introduces complexity

  • Network Overhead: Inter-server communication adds latency

  • Stateless Requirement: Servers typically must not maintain local session state

Stateless vs. Stateful Architectures

Horizontal scaling works best with stateless services: servers that don't store session data locally, so any server can handle any request.

Stateless Architecture (Easily Scalable)

Stateful Architecture (Challenging to Scale)

In stateful models, once a user's session resides on Server 1, all subsequent requests must route to that same server, creating hotspots and complicating server management.

Achieving Statelessness:

  • Store session data in shared caches (Redis, Memcached)

  • Implement token-based authentication (JWT)

  • Utilize object storage (S3, Cloud Storage) for uploaded files
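To illustrate the first technique, here is a minimal sketch of two application servers sharing one session store. `SharedSessionStore` is a hypothetical in-memory stand-in for Redis or Memcached, and `AppServer` is an illustrative class rather than any framework's API. In a real deployment the store would be a separate networked service.

```python
import secrets


class SharedSessionStore:
    """Hypothetical stand-in for a shared cache such as Redis or Memcached."""

    def __init__(self):
        self._sessions = {}

    def put(self, session_id, session):
        self._sessions[session_id] = session

    def get(self, session_id):
        return self._sessions.get(session_id)


class AppServer:
    """A stateless server: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def login(self, user):
        session_id = secrets.token_hex(16)  # unguessable session identifier
        self._store.put(session_id, {"user": user})
        return session_id

    def handle(self, session_id):
        session = self._store.get(session_id)
        if session is None:
            return f"{self.name}: 401 not logged in"
        return f"{self.name}: hello {session['user']}"


store = SharedSessionStore()
server_1 = AppServer("server-1", store)
server_2 = AppServer("server-2", store)

sid = server_1.login("alice")   # session created via server 1
print(server_2.handle(sid))     # served by server 2 without sticky routing
```

Because neither server owns the session, a load balancer can send each request to whichever instance is free.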

Component-Specific Scaling Strategies

Modern applications comprise multiple layers, each with distinct scaling characteristics and requirements.

Application Tier Scaling

Application servers are typically the easiest to scale horizontally when stateless:

Key Strategies:

  • Implement stateless service design

  • Deploy load balancers for intelligent traffic distribution

  • Configure auto-scaling based on CPU, memory, or custom metrics

  • Distribute across multiple availability zones for resilience
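As a rough illustration of traffic distribution, the sketch below implements round-robin selection that skips servers marked unhealthy. The class and method names are hypothetical. Production load balancers add active health checks, weighting, and connection tracking that are omitted here.

```python
class RoundRobinBalancer:
    """Cycle through servers in order, skipping any marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.pick() for _ in range(4)])

balancer.mark_down("app-2")  # e.g. a failed health check
print([balancer.pick() for _ in range(3)])
```

Round-robin only spreads load evenly when requests cost roughly the same and servers are stateless; otherwise a least-connections or weighted policy is usually a better fit.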

Database Tier Scaling

Databases present the greatest scaling challenge due to state management. Unlike application servers, you cannot simply add database instances behind a load balancer without considering consistency, durability, and transaction isolation.

Identify Your Bottleneck Pattern:

1. Read Replicas

For predominantly read workloads (common in most applications), create database copies that handle read operations.

When to Use: Read-to-write ratio exceeds 10:1 and writes aren't the bottleneck.

Advantages:

  • Simple implementation (especially with managed services)

  • Offloads read traffic from primary database

  • Provides read availability during primary failures

  • Minimal application changes required

Considerations:

  • Doesn't address write-heavy workloads

  • Introduces replication lag (eventually consistent reads)

  • Replicas require full data copies

  • Failover may cause temporary inconsistency
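The routing decision behind read replicas can be sketched in a few lines. The example below sends writes to the primary and spreads reads across replicas, with an opt-out for reads that cannot tolerate replication lag. The class name, the `require_fresh` flag, and the "starts with SELECT" check are simplifying assumptions. Real drivers and proxies classify statements more carefully.

```python
import random


class ReplicatedDatabaseRouter:
    """Pick a database node for a statement: primary for writes, replica for reads."""

    def __init__(self, primary, replicas, rng=None):
        self.primary = primary
        self.replicas = list(replicas)
        self._rng = rng or random.Random()

    def route(self, sql, require_fresh=False):
        # Naive classification: treat anything starting with SELECT as a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.replicas and not require_fresh:
            return self._rng.choice(self.replicas)
        # Writes, and reads that must see the latest data, go to the primary.
        return self.primary


router = ReplicatedDatabaseRouter("primary", ["replica-1", "replica-2"])
print(router.route("INSERT INTO posts (body) VALUES ('hi')"))
print(router.route("SELECT * FROM posts"))
print(router.route("SELECT * FROM posts WHERE author = 'me'", require_fresh=True))
```

The `require_fresh` path matters right after a user writes something: reading it back from a lagging replica could make the write appear to be lost.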

2. Sharding (Horizontal Partitioning)

When write volume exceeds single-database capacity or datasets grow unwieldy, partition data across multiple databases using a shard key:

Sharding Strategies:

  • Range-based: Partition by value ranges (A-H, I-P, Q-Z)

  • Hash-based: Apply hash function to key, mod by shard count

  • Directory-based: Maintain lookup table mapping keys to shards
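The three strategies differ mainly in how a key is turned into a shard. The sketch below shows one possible function for each. Shard names, function names, and the fallback shard are hypothetical, and keys are assumed to be non-empty strings. A stable hash from `hashlib` is used because Python's built-in `hash()` is randomised per process for strings.

```python
import hashlib


def range_shard(name: str) -> str:
    """Range-based: route by the first letter, using the ranges A-H, I-P, Q-Z."""
    first = name[0].upper()
    if "A" <= first <= "H":
        return "shard-A-H"
    if "I" <= first <= "P":
        return "shard-I-P"
    return "shard-Q-Z"  # also catches non-letter keys in this simple sketch


def hash_shard(key: str, shard_count: int) -> int:
    """Hash-based: stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class DirectorySharder:
    """Directory-based: an explicit lookup table from key to shard."""

    def __init__(self, default_shard: str):
        self._directory = {}
        self._default = default_shard

    def assign(self, key: str, shard: str) -> None:
        self._directory[key] = shard

    def lookup(self, key: str) -> str:
        return self._directory.get(key, self._default)


print(range_shard("alice"), range_shard("mallory"), range_shard("zed"))
print(hash_shard("user-42", 4))

directory = DirectorySharder(default_shard="shard-general")
directory.assign("big-tenant", "shard-dedicated")
print(directory.lookup("big-tenant"), directory.lookup("small-tenant"))
```

Range-based sharding keeps neighbouring keys together but can create hotspots, hash-based spreads keys evenly but scatters ranges, and a directory allows arbitrary placement at the cost of an extra lookup.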

Advantages:

  • Distributes both read and write operations

  • Enables near-linear horizontal scaling

  • Individual shards remain smaller and faster

  • Supports geographic distribution

Complexities:

  • Implementation requires careful design

  • Cross-shard queries become expensive or impossible

  • Shard rebalancing is operationally intensive

  • Multi-shard transactions are challenging
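To see why rebalancing is so costly with simple hash-mod-N sharding, the sketch below counts how many keys land on a different shard when the shard count changes. Key names and helper names are made up for illustration. With a uniform hash, a key stays put when going from 4 to 5 shards only if its hash gives the same remainder under both divisors, which holds for about one key in five.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


keys = [f"user-{i}" for i in range(10_000)]  # hypothetical key population
fraction = moved_fraction(keys, old_count=4, new_count=5)
print(f"{fraction:.0%} of keys change shard when growing from 4 to 5 shards")
```

Moving roughly four out of five rows to add a single shard is what motivates alternatives such as consistent hashing or a directory that relocates only selected keys.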

3. NoSQL Databases

NoSQL systems like Cassandra, MongoDB, and DynamoDB incorporate horizontal scaling as a foundational design principle:

Key Characteristics:

  • Built-in automatic sharding

  • Emphasis on availability over strong consistency

  • Schema flexibility without joins

  • Optimised for specific access patterns

Considerations:

  • Requires different query approaches than SQL

  • Often necessitates data denormalization

  • Often defaults to eventual consistency, with stronger guarantees available as an option in some systems

  • May have less mature tooling ecosystems

Caching Tier Scaling

Caching dramatically reduces database load while improving response times. An in-memory cache such as Redis can serve 100,000+ simple operations per second, typically far more than a disk-backed database sustains for equivalent queries.

Effective Caching Strategies:

  • Redis Cluster: Automatic data partitioning across nodes

  • Consistent Hashing: Even key distribution with minimal redistribution during scaling

  • Cache-Aside Pattern: Application checks cache first, falls back to database on misses, then repopulates cache
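The cache-aside flow described in the last bullet can be sketched as follows. A plain dictionary stands in for the cache and `fake_database_lookup` is a hypothetical loader. Expiry (TTL) and eviction, which a real cache needs, are left out to keep the control flow visible.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the database and repopulate."""

    def __init__(self, load_from_db):
        self._cache = {}              # stand-in for Redis or Memcached
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load_from_db(key)   # fall back to the database
        self._cache[key] = value          # repopulate for later readers
        return value

    def invalidate(self, key):
        """Call after a write so the next read fetches fresh data."""
        self._cache.pop(key, None)


def fake_database_lookup(user_id):
    # Hypothetical slow query; a real one would hit the database.
    return {"id": user_id, "name": f"user-{user_id}"}


profiles = CacheAside(fake_database_lookup)
profiles.get(7)   # miss: loaded from the database
profiles.get(7)   # hit: served from memory
print(f"hits={profiles.hits} misses={profiles.misses}")
```

The application owns the invalidation step, so every write path has to remember to call it or readers will see stale data until the entry is replaced.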

Message Queue Tier Scaling

Message queues enable scalable asynchronous processing by decoupling producers from consumers:

How Queues Enable Scalability:

  • Decouple components for independent scaling

  • Buffer traffic spikes to prevent consumer overload

  • Enable parallel processing through topic partitioning
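The first two points can be shown with Python's standard `queue` and `threading` modules: a bounded queue buffers work between a producer and a pool of consumers. The function name, worker count, and the squaring "job" are placeholders. A real system would use a broker such as Kafka or RabbitMQ rather than an in-process queue.

```python
import queue
import threading


def process_all(jobs, worker_count=3, buffer_size=10):
    """Feed jobs through a bounded queue to a pool of worker threads."""
    work_queue = queue.Queue(maxsize=buffer_size)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = work_queue.get()
            if job is None:           # sentinel: no more work for this worker
                return
            outcome = job * job       # placeholder for real processing
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    for job in jobs:
        work_queue.put(job)           # blocks while the buffer is full
    for _ in threads:
        work_queue.put(None)          # one sentinel per worker
    for thread in threads:
        thread.join()

    return sorted(results)


print(process_all(range(20)))
```

The producer never needs to know how many consumers exist, so `worker_count` can be raised independently when the queue starts to back up.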

Scaling in Practice: A Social Media Platform's Evolution

Theory informs practice, but real-world scaling journeys provide invaluable insights. Let's trace a social media platform's evolution from launch to millions of users.

Stage 1: Single Server (0-10,000 Users)

Characteristics: Simple, cost-effective, adequate for initial traction.
Emerging Bottleneck: Application and database compete for shared resources.

Stage 2: Separated Database (10,000-100,000 Users)

Improvement: Independent resource allocation for each component.
Emerging Bottleneck: Database becomes the primary constraint under increased query load.

Stage 3: Caching Layer (100,000-500,000 Users)

Improvement: 80-90% of reads served from memory, dramatically reducing database load.
Emerging Bottleneck: Single application server cannot handle request volume.
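The effect of the hit rate on database load is simple arithmetic, sketched below. The 10,000 reads per second figure is a made-up example, not a measurement from this platform.

```python
def database_reads_per_second(total_reads_per_second: int, hit_rate_percent: int) -> float:
    """Reads that miss the cache and therefore reach the database."""
    if not 0 <= hit_rate_percent <= 100:
        raise ValueError("hit_rate_percent must be between 0 and 100")
    return total_reads_per_second * (100 - hit_rate_percent) / 100


TOTAL_READS = 10_000  # hypothetical read traffic

for hit_rate in (80, 90):
    db_reads = database_reads_per_second(TOTAL_READS, hit_rate)
    print(f"{hit_rate}% hit rate: {db_reads:.0f} reads/s reach the database")
```

Note that moving from an 80% to a 90% hit rate halves the reads the database sees, which is why small improvements in hit rate pay off disproportionately.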

Stage 4: Multiple Application Servers (500,000-2 Million Users)

Improvement: Horizontal scaling at application tier with stateless servers.
Emerging Bottleneck: Database read capacity becomes saturated on the single instance.

Stage 5: Read Replicas (2-10 Million Users)

Improvement: Read capacity multiplied through replication.
Consideration: Replicas lag slightly behind the primary, which is acceptable for most use cases.
Emerging Bottleneck: Write throughput limitations on single primary.

Stage 6: Database Sharding (10+ Million Users)

Improvement: Both read and write capacity scale horizontally.
Complexities: Cross-shard operations, rebalancing challenges. Many teams at this stage consider distributed SQL solutions like CockroachDB or Vitess.

Key Scalability Principles

  1. Vertical scaling provides simplicity but has inherent limits. It's ideal for early-stage systems and components where simplicity outweighs the need for unbounded growth.

  2. Horizontal scaling enables limitless growth but introduces complexity. It demands stateless services and sophisticated data management strategies.

  3. Different system components scale differently. Application servers scale relatively easily; databases require careful planning due to state management.

  4. Always identify bottlenecks before scaling. Adding application servers won't help if the database is the constraint.

  5. Standard patterns emerge across scalable systems: load balancing, caching, asynchronous processing, and database optimisation.
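Principle 4 can be illustrated with a toy model in which every request passes through each tier once, so end-to-end throughput is capped by the slowest tier. The tier names and capacity figures below are invented for the example.

```python
def find_bottleneck(capacities_rps: dict[str, float]) -> tuple[str, float]:
    """Return the tier with the lowest capacity and that capacity."""
    name = min(capacities_rps, key=capacities_rps.get)
    return name, capacities_rps[name]


# Hypothetical per-tier capacities in requests per second.
tiers = {"load_balancer": 50_000, "app_servers": 12_000, "database": 3_000}
print(find_bottleneck(tiers))

# Doubling the application tier leaves the system limit unchanged.
tiers["app_servers"] *= 2
print(find_bottleneck(tiers))
```

In this model the limit stays at the database's 3,000 requests per second no matter how many application servers are added, which is why measurement should come before spending.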

Beyond Scalability: The Availability Imperative

A system that scales to millions but fails weekly still disappoints users. Scalability addresses capacity, but what happens when components fail? How do you maintain service during outages, network partitions, or data center failures?

This brings us to our next critical dimension: Availability—ensuring your system remains operational despite inevitable failures, providing the reliability users expect from modern applications.



Fundamentals of System Design

Part 2 of 2

A practical, beginner-to-intermediate series covering the core principles of system design. This series explains how to design scalable, reliable, and maintainable systems using real-world examples.
