EP4: 11 Data Structures Powering Your Database
In this week’s episode of System Design Weekly, we discuss a popular geo-encoding algorithm called Geohashing! It offers a powerful and flexible way to convert geographic coordinates into compact 1-dimensional strings.
Next, we explore 11 key data structures that power modern databases. From B-Trees and WAL to Skip Lists, Suffix Trees, MemTables, and LSM Trees, understanding these underlying structures is crucial for choosing the right database to solve your engineering challenges!
Towards the end, in Tech Spotlight, we will learn about Airbnb’s Mussel key-value store for handling Petabytes of data with low-latency.
🌏 The Algorithm Powering ‘Find Restaurants Near Me’!
Geocoding is the process of converting location coordinates on Earth into a format that can be easily used for Geographic Information Systems (or GIS). Geohashing is one of the most popular method of Geocoding. It translates any location coordinate to a 1-dimensional string of alpha-numeric characters.
There are some interesting properties of Geohash that makes it suitable for GIS ~
The longer the Geohash is, the more precise it becomes
Nearby locations often have similar Geohash prefixes
It is computationally cheaper to convert between Geohashes and coordinate-location
The entire Earth is divided into four parts, with each part assigned a length 2 binary string. We start by choosing the part containing the location to be encoded. Next, we split the chosen part into four and repeat the process. The resultant binary string is the Geohash of the location.
≫ Checkout my Medium blog for a deep dive
🖥️ 11 Data Structures That Power Your Database
Databases use a number of data structures to enhance read/write performance. Here are such 11 key data structures that every developer should know ~
Hash Indexes - Used for mapping keys to value using hash functions, enabling fast lookups.
Skip Lists - A probabilistic data structure supporting fast search, insertion, and deletion, used for MemTable implementations.
Inverted Indexes - Store a mapping from content keywords to their locations in a database, widely used in full-text search engines.
B-Trees - A self-balancing tree structure that maintains sorted data on disk, enabling efficient range queries and ordered data retrieval.
Bloom Filters - A space-efficient probabilistic filter used to test whether an element is a member of a set, often used for quick membership checks.
MemTables - In-memory data structures used to buffer writes before flushing them to SSTables, improving write performance in databases like RocksDB.
BitMap Indexes - Use bitmaps to represent the presence of values in a database, providing fast querying for columns with low cardinality.
SSTables - Sorted files used for persistent storage in key-value stores, where data is written in sorted order for efficient retrieval.
Prefix Trees - Organize data by common prefixes, enabling fast search and retrieval of strings or keys, often used in autocomplete and dictionary searches.
Write Ahead Logs - A logging mechanism the ensures data consistency and durability by recording all changes in an immutable append-only log.
Suffix Trees - Store suffixes of strings in a tree structure, used for string matching and pattern searching in applications like DNA sequence analysis.
≫ Checkout my Medium blog for a deep dive
🔦 Tech Spotlight!
In this week’s Tech Spotlight, I bring to you an amazing article by Airbnb’s engineering team discussing their home-grown key-value datastore called Mussel.
➥ Mussel - Airbnb’s Key-Value Store for Derived Data
Mussel is used for serving petabytes of derived and real-time data to hundreds of other services inside Airbnb. At its core, it makes use of a number of open source technologies like Apache Helix, Apache Spark, Apache Kafka, and HRegion.
At System Design Weekly, we've just released a concise summary of Mussel’s design and evolution. If you’re looking to quickly grasp the key concepts, check out the case study here.
And that is it for this week! If you enjoyed what you read, consider clicking the ❤️ button below and leaving a comment.
See you next week in yet another release of System Design Weekly! Till then, happy learning!