John DiFini's Blog

May 12 Cassandra Overview

Technology, Data Architecture

What is Cassandra?

Database Management System (DBMS)
Open-source (Apache)
Distributed - runs across many commodity servers[*]
Designed to handle large amounts of data
Fault-tolerant
- No Single Point of Failure (SPOF)
- Runs across multiple data centers
- Asynchronous, masterless replication

Fully Replicated[*]

Replication Factor

Example Write[*]

Flush Process[*]

Compaction[*]

With each Flush, SSTables accumulate
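Compaction periodically merges several SSTables into a single new SSTable; because each SSTable is already sorted by key, this is a merge-sort-style pass
Duplicate keys are resolved with a last-write-wins policy: the value with the newest write timestamp is kept
Records marked for deletion (tombstones) are removed once their grace period (gc_grace_seconds) has passed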
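
The compaction steps above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Cell` type and `compact` function are hypothetical names, an SSTable is modeled as a list already sorted by key, a `None` value stands in for a tombstone, and tombstones are dropped immediately rather than after a grace period.

```python
import heapq
from typing import List, NamedTuple, Optional


class Cell(NamedTuple):
    """One record in a toy SSTable (hypothetical type for illustration)."""
    key: str
    timestamp: int
    value: Optional[str]  # None stands in for a tombstone (deleted record)


def compact(sstables: List[List[Cell]]) -> List[Cell]:
    """Merge key-sorted SSTables into one, keeping the newest cell per key."""
    # heapq.merge streams the inputs in key order without re-sorting them,
    # which is the merge step of merge sort.
    merged = heapq.merge(*sstables, key=lambda cell: cell.key)

    result: List[Cell] = []
    newest: Optional[Cell] = None
    for cell in merged:
        if newest is not None and cell.key != newest.key:
            # Finished a run of equal keys: emit the winner unless deleted.
            if newest.value is not None:
                result.append(newest)
            newest = None
        # Last-write-wins: keep the cell with the highest timestamp.
        if newest is None or cell.timestamp > newest.timestamp:
            newest = cell
    if newest is not None and newest.value is not None:
        result.append(newest)
    return result


older = [Cell("a", 1, "1"), Cell("b", 1, "2"), Cell("c", 1, "3")]
newer = [Cell("a", 2, "10"), Cell("b", 3, None), Cell("d", 2, "4")]
compacted = compact([older, newer])
```

Here key `a` keeps its newer value, key `b` disappears because its newest cell is a tombstone, and `c` and `d` pass through unchanged.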

Disk IO[*]

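Cassandra writes use sequential rather than random IO, which matters far more on HDDs than on SSDs because HDDs pay a seek penalty for random access
Relational DBMSs front-load IO by updating data in place at write time, whereas Cassandra back-loads it: writes are appended and the reorganization work is deferred until Compaction
The file system needs to keep up with Compaction and its sequential IO; therefore, use local storage, NOT shared storage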
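
To make the append-then-flush idea concrete, here is a minimal in-memory sketch. The class name `TinyStore` and its `flush_threshold` parameter are made up for illustration; plain Python lists and dicts stand in for the on-disk commit log, the memtable, and the SSTables, so no real IO happens.

```python
from typing import Dict, List, Tuple


class TinyStore:
    """Toy write path: append to a log, buffer in memory, flush sorted runs."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []      # append-only
        self.memtable: Dict[str, str] = {}               # in-memory buffer
        self.sstables: List[List[Tuple[str, str]]] = []  # immutable sorted runs
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        # A write only appends to the log and updates memory; nothing that
        # was already flushed is modified in place.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.memtable:
            return
        # The memtable is written out once, in key order, as a new SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log.clear()  # the flushed data no longer needs the log


store = TinyStore(flush_threshold=2)
store.write("b", "2")
store.write("a", "1")  # reaches the threshold and triggers a flush
store.write("c", "3")  # stays in the memtable until the next flush
```

Each flush adds another sorted run, which is why SSTables accumulate and Compaction is needed later.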

Partitioning/Sharding[*]

Data are partitioned by hashing the partition key (the first part of the primary key) into a token
Each node is responsible for a range of tokens
With the default Murmur3Partitioner, tokens range from -2^63 to 2^63 - 1, but in this example, we use 1 to 100
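
The token lookup can be sketched as follows, using the simplified 1 to 100 range. Everything here is an assumption for illustration: the four node names, the evenly split ranges, and the use of MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3, not this function).

```python
import bisect
import hashlib

# Hypothetical four-node ring: each entry is the inclusive upper bound of
# the token range a node owns (1-25, 26-50, 51-75, 76-100).
RING = [(25, "node-1"), (50, "node-2"), (75, "node-3"), (100, "node-4")]
UPPER_BOUNDS = [bound for bound, _ in RING]


def token_for(partition_key: str) -> int:
    """Hash a partition key into the toy token range 1..100."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 + 1


def node_for_token(token: int) -> str:
    """Find the node whose range contains the token."""
    if not 1 <= token <= 100:
        raise ValueError("token must be between 1 and 100")
    # bisect_left returns the first range whose upper bound is >= token.
    return RING[bisect.bisect_left(UPPER_BOUNDS, token)][1]


def node_for_key(partition_key: str) -> str:
    return node_for_token(token_for(partition_key))
```

Because the token depends only on the partition key, every node can work out which node owns a row without consulting a master.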

Partitioning Example

Running Cassandra on Mesos[*]

Mesos

A kernel for the data center; it is the foundation of DC/OS (the Datacenter Operating System)
Pools machine resources (CPU, memory, storage) across the data center and manages them holistically

How Uber Does It[*]

Components

ZooKeeper - a coordination service widely used for leader election. Leader election is the process of designating a single instance of a service as the leader; all backups recognize the leader[*]
Aurora - Apache Aurora, a Mesos scheduler for cron jobs and long-running services
Persistent Volume - a storage volume that exists outside the task’s sandbox and will persist on the node even after the task dies or completes[*]

Deeper Dives
