Chapter 2: Scalability

👋 Hi, I'm Sourav Bandyopadhyay, a passionate software developer with a love for coding and a hunger for learning. 🚀 I thrive on solving complex problems and delivering efficient software solutions. 💡 Check out my newsletter and blog for more insights: corecraft.substack.com
Every thriving digital product eventually faces the same challenge: what worked perfectly for your first thousand users begins to struggle at ten thousand, and collapses completely at a million. Your application that smoothly handled 100 requests per second now falters at 10,000. This universal growth barrier is where scalability becomes your most critical architectural consideration.
What Is Scalability?
Scalability is a system's capability to accommodate increased load by efficiently adding resources. The operative word is "capability"—a truly scalable system can expand to meet rising demand without requiring complete redesign or disruptive overhauls.
Think of scalability as the foundation of robust system design. The principles you'll learn here directly enable the availability and reliability we'll explore in subsequent chapters, forming the bedrock of production-ready applications.
Measuring Scalability: From General Goals to Clear Metrics
You cannot improve what you cannot measure. Vague aspirations like "we need to handle more traffic" are useless without specific, measurable targets. Scalability is evaluated through several key dimensions:
Critical Load Metrics
| Metric | Description | Real-World Example |
| Requests Per Second (RPS) | API calls processed | 15,000 RPS |
| Concurrent Users | Simultaneously active users | 75,000 concurrent |
| Data Volume | Storage and processing requirements | 25 TB dataset |
| Throughput | Data transfer rate | 2.5 GB/second |
| Query Rate | Database operations per second | 80,000 QPS |
| Message Rate | Queue processing capacity | 250,000 messages/second |
Performance Under Increasing Load
A system scales effectively when it maintains acceptable performance as load increases. Here's what scaling success and failure look like:
| Load Increase | Response Time | System Behaviour | Interpretation |
| 1Ă— (Baseline) | 50ms | Normal operation | Expected baseline performance |
| 2Ă— | 55ms | Excellent scaling | Sublinear growth indicates efficient caching |
| 5Ă— | 75ms | Good performance | System handling increased load well |
| 10Ă— | 160ms | Acceptable degradation | Linear, predictable performance decline |
| 10Ă— | 650ms | Concerning bottleneck | Superlinear degradation signals architectural limits |
| 10Ă— | Timeout | Critical failure | System has reached breaking point |
The ideal scenario maintains near-linear performance degradation—doubling load shouldn't double response time. When response times spike exponentially or systems begin timing out, you've encountered a scalability barrier that requires architectural intervention.
Scaling Strategies: Vertical vs. Horizontal Approaches
Vertical Scaling (Scaling Up)
Vertical scaling enhances capacity by adding resources to existing machines—upgrading to more powerful hardware rather than increasing machine count.
This approach is often the initial response to performance issues since it requires minimal architectural changes.
Common Vertical Scaling Actions:
Adding CPU cores for computation-intensive tasks
Increasing RAM for enhanced in-memory caching
Upgrading to faster SSDs to reduce I/O bottlenecks
Implementing higher-bandwidth network interfaces
Advantages:
Simplicity: Typically requires no code changes
Reduced Latency: All components reside locally without network hops
No Distributed Complexity: Avoids synchronization and partitioning challenges
Limitations:
Hardware Ceilings: Maximum capacity limited by largest available machine
Single Point of Failure: Entire system depends on one machine
Exponential Cost Curve: Doubling capacity often more than doubles cost
Service Disruption: Upgrades typically require downtime
When Vertical Scaling Excels:
Databases requiring strong data locality
Applications with strict consistency requirements
Early-stage startups prioritizing simplicity
Workloads with predictable, moderate growth patterns
Important Note: Never dismiss vertical scaling as "not truly scalable." Many production systems operate successfully on vertically scaled databases for years. The key is recognizing when horizontal scaling becomes necessary.
Horizontal Scaling (Scaling Out)
When vertical scaling reaches its limits—either through hardware constraints or fault tolerance requirements—horizontal scaling becomes essential. This approach adds more machines rather than upgrading existing ones, distributing load across multiple commodity servers.
This is the foundational architecture behind tech giants like Google, Netflix, and Amazon, enabling them to handle billions of daily requests.
Advantages:
Theoretically Unlimited: Continue adding servers as needed
Built-in Fault Tolerance: Failure of one server doesn't collapse the system
Cost Efficiency: Multiple smaller machines often cost less than equivalent single large machines
Geographic Distribution: Place servers closer to users for reduced latency
Challenges:
Architectural Complexity: Distributed systems are harder to design, debug, and maintain
Data Consistency: Synchronising state across servers introduces complexity
Network Overhead: Inter-server communication adds latency
Stateless Requirement: Servers typically must not maintain local session state
Stateless vs. Stateful Architectures
Horizontal scaling demands stateless services—servers that don't store session data locally, allowing any server to handle any request.
Stateless Architecture (Easily Scalable)

Stateful Architecture (Challenging to Scale)

In stateful models, once a user's session resides on Server 1, all subsequent requests must route to that same server, creating hotspots and complicating server management.
Achieving Statelessness:
Store session data in shared caches (Redis, Memcached)
Implement token-based authentication (JWT)
Utilize object storage (S3, Cloud Storage) for uploaded files
Component-Specific Scaling Strategies
Modern applications comprise multiple layers, each with distinct scaling characteristics and requirements.
Application Tier Scaling
Application servers are typically the easiest to scale horizontally when stateless:
Key Strategies:
Implement stateless service design
Deploy load balancers for intelligent traffic distribution
Configure auto-scaling based on CPU, memory, or custom metrics
Distribute across multiple availability zones for resilience
Database Tier Scaling
Databases present the greatest scaling challenge due to state management. Unlike application servers, you cannot simply add database instances behind a load balancer without considering consistency, durability, and transaction isolation.
Identify Your Bottleneck Pattern:

1. Read Replicas For predominantly read workloads (common in most applications), create database copies that handle read operations:

When to Use: Read-to-write ratio exceeds 10:1 and writes aren't the bottleneck.
Advantages:
Simple implementation (especially with managed services)
Offloads read traffic from primary database
Provides read availability during primary failures
Minimal application changes required
Considerations:
Doesn't address write-heavy workloads
Introduces replication lag (eventually consistent reads)
Replicas require full data copies
Failover may cause temporary inconsistency
2. Sharding (Horizontal Partitioning) When write volume exceeds single-database capacity or datasets grow unwieldy, partition data across multiple databases using a shard key:

Sharding Strategies:
Range-based: Partition by value ranges (A-H, I-P, Q-Z)
Hash-based: Apply hash function to key, mod by shard count
Directory-based: Maintain lookup table mapping keys to shards
Advantages:
Distributes both read and write operations
Enables near-linear horizontal scaling
Individual shards remain smaller and faster
Supports geographic distribution
Complexities:
Implementation requires careful design
Cross-shard queries become expensive or impossible
Shard rebalancing is operationally intensive
Multi-shard transactions are challenging
3. NoSQL Databases NoSQL systems like Cassandra, MongoDB, and DynamoDB incorporate horizontal scaling as a foundational design principle:
Key Characteristics:
Built-in automatic sharding
Emphasis on availability over strong consistency
Schema flexibility without joins
Optimised for specific access patterns
Considerations:
Requires different query approaches than SQL
Often necessitates data denormalization
Typically employs eventual consistency models
May have less mature tooling ecosystems
Caching Tier Scaling
Caching dramatically reduces database load while improving response times. Modern cache systems like Redis can handle 100,000+ operations per second—orders of magnitude beyond typical databases.

Effective Caching Strategies:
Redis Cluster: Automatic data partitioning across nodes
Consistent Hashing: Even key distribution with minimal redistribution during scaling
Cache-Aside Pattern: Application checks cache first, falls back to database on misses, then repopulates cache
Message Queue Tier Scaling
Message queues enable scalable asynchronous processing by decoupling producers from consumers:

How Queues Enable Scalability:
Decouple components for independent scaling
Buffer traffic spikes to prevent consumer overload
Enable parallel processing through topic partitioning
Scaling in Practice: A Social Media Platform's Evolution
Theory informs practice, but real-world scaling journeys provide invaluable insights. Let's trace a social media platform's evolution from launch to millions of users.
Stage 1: Single Server (0-10,000 Users)
Characteristics: Simple, cost-effective, adequate for initial traction.
Emerging Bottleneck: Application and database compete for shared resources.

Stage 2: Separated Database (10,000-100,000 Users)

Improvement: Independent resource allocation for each component.
Emerging Bottleneck: Database becomes the primary constraint under increased query load.
Stage 3: Caching Layer (100,000-500,000 Users)

Improvement: 80-90% of reads served from memory, dramatically reducing database load.
Emerging Bottleneck: Single application server cannot handle request volume.
Stage 4: Multiple Application Servers (500,000-2 Million Users)

Improvement: Horizontal scaling at application tier with stateless servers.
Emerging Bottleneck: Database write capacity becomes saturated.
Stage 5: Read Replicas (2-10 Million Users)

Improvement: Read capacity multiplied through replication.
Consideration: Acceptable replication lag for most use cases.
Emerging Bottleneck: Write throughput limitations on single primary.
Stage 6: Database Sharding (10+ Million Users)

Improvement: Both read and write capacity scale horizontally.
Complexities: Cross-shard operations, rebalancing challenges. Many teams at this stage consider distributed SQL solutions like CockroachDB or Vitess.
Key Scalability Principles
Vertical scaling provides simplicity but has inherent limits. It's ideal for early-stage systems and components where simplicity outweighs infinite growth requirements.
Horizontal scaling enables limitless growth but introduces complexity. It demands stateless services and sophisticated data management strategies.
Different system components scale differently. Application servers scale relatively easily; databases require careful planning due to state management.
Always identify bottlenecks before scaling. Adding application servers won't help if the database is the constraint.
Standard patterns emerge across scalable systems: load balancing, caching, asynchronous processing, and database optimisation.
Beyond Scalability: The Availability Imperative
A system that scales to millions but fails weekly still disappoints users. Scalability addresses capacity, but what happens when components fail? How do you maintain service during outages, network partitions, or data center failures?
This brings us to our next critical dimension: Availability—ensuring your system remains operational despite inevitable failures, providing the reliability users expect from modern applications.
Help Us Shape Core Craft Better
TL; DR: Got 2 minutes? Take this quick survey to tell us who you are, what you care about, and how we can make Core Craft even better for you.
Thank You For Reading!
Loved this? Hit ❤️ to share the love and help others find it!
Have ideas or questions? Drop a comment—I reply to all! For collabs or newsletter sponsorships, email me at souravb.1998@gmail.com
Stay connected: Follow me on X | LinkedIn | YouTube




