System Design & Scalability
Overview
Types of Scaling & Load Balancing
Data Storage Design
Message-oriented Middleware (MOM)
Fault Handling
Performance Metrics
MapReduce
Types of Scaling & Load Balancing
Horizontal vs. Vertical Scaling
Load Balancing
Round Robin
Source IP Hash - a given client IP address will always go to the same server
Request Hash - a given request type will always go to the same server/cache; avoids cache duplication
Least Connections
Least Traffic
Least Latency
Data Storage Design
Database - Structure
Relational - general purpose for tabular/table-based data
Specialized - for data structures that don't easily fit the tabular format (e.g. multi-level nesting & hierarchies)
NoSQL
Others
Database - Read vs. Write Performance
Normalization - the process of organizing data to minimize redundancy (removing duplicate columns in different tables). In relational DBs, normalization involves dividing a DB into several tables and defining relationships between those tables using foreign keys.
Normalize vs. Denormalize
Normalize - ↓duplicate data ⇨ ↑write perf but ↓read perf
Denormalize - ↑duplicate data ⇨ ↑read perf but ↓write perf
Have your cake and… - Use an append-only structure for writes; then asynchronously restructure data into a read-optimized format[*]
Cache
DB reads are expensive; i.e. hold as much of it in memory as possible
Cache Hit - data were found in cache; Cache Miss - data not found, so retrieve it from DB[*]
Local vs. Distributed Rule of Thumb - use local cache for small data sets, with predictable number of immutable records[*]
Cache Warming - anticipate queries and "prime" the cache not only on startup but also in real-time (e.g. load surrounding tiles of a recently-requested map)
Cache - Replacement Policy
Replacement Policy - algorithm used to maximize cache performance by choosing which data to eject & which data to add in its place[*]
LRU - ejects the most Least Recently Used data
advanced - considers access frequency, size of items, latency & throughput
Data Store Sharding
Sharding - partition data across multiple nodes
Message-oriented Middleware (MOM)
MOM Considerations
Used by distributed systems to communicate amongst nodes[*]
Abstracts OS & network intricacies (e.g. endian format, sockets, etc.)
MOM Types
Fault Handling
Fault Handling Types
High Availability (HA) - delayed recovery to secondary
Fault Tolerant - immediate recovery
Active/Passive - primary fails over to secondary
Active/Active - no primary vs. secondary; when 1 fails, the other(s) takes the additional load
Great YouTube video on the subject!
Avoid SPOF
A Single Point of Failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working.[*]
Performance Metrics
Performance Metrics - Terminology
Bandwidth - The maximum amount of data that can be transferred in a unit of time (e.g. 100Mbps)[*]
Throughput - The actual amount of data that is transferred in a unit of time (e.g. 88MBps)
Latency - The time it takes to send & receive (round-trip) a packet of data (e.g. 20ms)[*]
Performance Metrics - Analogy
Given a water pipe, its diameter determines its throughput, and its length determines its latency. Therefore, to improve:
Throughput - Get a fatter pipe
Latency - Colocate to reduce distance or reduce network hops (point-to-point circuit), which also reduces distance that data have to travel
Performance Metrics - Tail Latencies
A tail latency, like P99 or P95, is used to represent the worst-case scenario.[*]
e.g. P99 = 0.75ms: For the 99th percentile, the request latency was 0.75ms. Therefore, 99% of the requests were faster than 0.75ms.
MapReduce
MapReduce - Explanation
Uses parallel & distributed systems to process large data sets[*]
Implementations - Spark, Hadoop, etc.[*]
MapReduce - Steps
Fundamentally, consists of two steps, Map & Reduce, but Shuffle step is also prevalent:
Map - Organizes/filters/sorts. Think of putting elements into a typical Map Interface with key-value pairs (e.g. <key, value>)
Shuffle - Redistributes data so that all data pertaining to a given key reside on the same node
Reduce - Summary/aggregation (e.g. sum all values for a given key)