System Design & Scalability

Overview

Types of Scaling & Load Balancing
Data Storage Design
Message-oriented Middleware (MOM)
Fault Handling
Performance Metrics
MapReduce

Types of Scaling & Load Balancing

Horizontal vs. Vertical Scaling

Load Balancing

Round Robin
Source IP Hash - a given client IP address will always go to the same server
Request Hash - a given request type will always go to the same server/cache; avoids cache duplication
Least Connections
Least Traffic
Least Latency

Reference

Data Storage Design

Database - Structure

Relational - general purpose for tabular/table-based data
Specialized - for data structures that don't easily fit the tabular format (e.g. multi-level nesting & hierarchies)
- NoSQL
- Others

Database - Read vs. Write Performance

Normalization - the process of organizing data to minimize redundancy (removing duplicate columns in different tables). In relational DBs, normalization involves dividing a DB into several tables and defining relationships between those tables using foreign keys.
Normalize vs. Denormalize
- Normalize - ↓duplicate data ⇨ ↑write perf but ↓read perf
- Denormalize - ↑duplicate data ⇨ ↑read perf but ↓write perf
Have your cake and… - Use an append-only structure for writes; then asynchronously restructure data into a read-optimized format[*]

Cache

DB reads are expensive; i.e. hold as much of it in memory as possible
Cache Hit - data were found in cache; Cache Miss - data not found, so retrieve it from DB[*]
Local vs. Distributed Rule of Thumb - use local cache for small data sets, with predictable number of immutable records[*]
Cache Warming - anticipate queries and "prime" the cache not only on startup but also in real-time (e.g. load surrounding tiles of a recently-requested map)

Cache - Replacement Policy

Replacement Policy - algorithm used to maximize cache performance by choosing which data to eject & which data to add in its place[*]

LRU - ejects the most Least Recently Used data
advanced - considers access frequency, size of items, latency & throughput

Data Store Sharding

Sharding - partition data across multiple nodes

Message-oriented Middleware (MOM)

MOM Considerations

Used by distributed systems to communicate amongst nodes[*]
Abstracts OS & network intricacies (e.g. endian format, sockets, etc.)

MOM Types

Fault Handling

Fault Handling Types

High Availability (HA) - delayed recovery to secondary
Fault Tolerant - immediate recovery
- Active/Passive - primary fails over to secondary
- Active/Active - no primary vs. secondary; when 1 fails, the other(s) takes the additional load
Great YouTube video on the subject!

Avoid SPOF

A Single Point of Failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working.[*]

Performance Metrics

Performance Metrics - Terminology

Bandwidth - The maximum amount of data that can be transferred in a unit of time (e.g. 100Mbps)[*]
Throughput - The actual amount of data that is transferred in a unit of time (e.g. 88MBps)
Latency - The time it takes to send & receive (round-trip) a packet of data (e.g. 20ms)[*]

Performance Metrics - Analogy

Given a water pipe, its diameter determines its throughput, and its length determines its latency. Therefore, to improve:

Throughput - Get a fatter pipe
Latency - Colocate to reduce distance or reduce network hops (point-to-point circuit), which also reduces distance that data have to travel

Performance Metrics - Tail Latencies

A tail latency, like P99 or P95, is used to represent the worst-case scenario.[*]
e.g. P99 = 0.75ms: For the 99th percentile, the request latency was 0.75ms. Therefore, 99% of the requests were faster than 0.75ms.

MapReduce

MapReduce - Explanation

Uses parallel & distributed systems to process large data sets[*]

Implementations - Spark, Hadoop, etc.[*]
YouTube presentation

MapReduce - Steps

Fundamentally, consists of two steps, Map & Reduce, but Shuffle step is also prevalent:

Map - Organizes/filters/sorts. Think of putting elements into a typical Map Interface with key-value pairs (e.g. <key, value>)
Shuffle - Redistributes data so that all data pertaining to a given key reside on the same node
Reduce - Summary/aggregation (e.g. sum all values for a given key)

John DiFini's Blog

Categories

Authors

May 19 System Design & Scalability

Overview

Types of Scaling & Load Balancing

Horizontal vs. Vertical Scaling

Load Balancing

Data Storage Design

Database - Structure

Database - Read vs. Write Performance

Cache

Cache - Replacement Policy

Data Store Sharding

Message-oriented Middleware (MOM)

MOM Considerations

MOM Types

Fault Handling

Fault Handling Types

Avoid SPOF

Performance Metrics

Performance Metrics - Terminology

Performance Metrics - Analogy

Performance Metrics - Tail Latencies

MapReduce

MapReduce - Steps

Latest Article

Follow Us

@tudorinsidestyle

Ads can be placed here

My Photography on Flickr

Terms

Privacy

John DiFini's Blog

Categories

Authors

May 19 System Design & Scalability

Overview

Types of Scaling & Load Balancing

Horizontal vs. Vertical Scaling

Load Balancing

Data Storage Design

Database - Structure

Database - Read vs. Write Performance

Cache

Cache - Replacement Policy

Data Store Sharding

Message-oriented Middleware (MOM)

MOM Considerations

MOM Types

Fault Handling

Fault Handling Types

Avoid SPOF

Performance Metrics

Performance Metrics - Terminology

Performance Metrics - Analogy

Performance Metrics - Tail Latencies

MapReduce

MapReduce - Steps

Jun 4 Slack to the Rescue

May 9 3 Founding Principles of Big Tech

Related Posts

Terms

Privacy