PC#17 - 4 Types of NoSQL Databases

Airbnb's Internationalization System and More...

Hello, this is Saurabh…👋

Welcome to the 131 new subscribers who have joined us since last week.

If you aren’t subscribed yet, join 1500+ curious Software Engineers looking to expand their knowledge by subscribing to this newsletter.

In this edition, I cover the following topics:

🖥 System Design Concept → 4 Types of NoSQL Databases

🧰 Case Study → Dissecting Airbnb’s Internationalization System

🍔 Food For Thought → The Importance of Side Projects

So, let’s dive in.

🖥

System Design Concept

4 Types of NoSQL Databases

NoSQL databases have skyrocketed in popularity over the last few years.

Even though the lines have blurred between NoSQL and SQL databases over the years in terms of features, NoSQL databases have seen significant industry adoption.

Two major reasons for this are scalability and flexibility.

👉 NoSQL databases scale horizontally, allowing them to handle large amounts of data distributed across multiple servers. Most NoSQL databases support techniques like sharding and replication out of the box.

👉 Also, NoSQL databases are more flexible because of their schema-less structure. This makes them great for storing unstructured as well as semi-structured data such as text, images, and videos.
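To make the sharding idea concrete, here is a minimal sketch of hash-based sharding using plain Python dictionaries as stand-in servers. The names `shard_for`, `put`, and `get` are made up for this illustration, and real databases use more elaborate schemes (consistent hashing, rebalancing, replication), so treat it as a toy model only.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Three in-memory dicts standing in for three servers (an assumption for the demo).
shards = [dict() for _ in range(3)]

def put(key, value):
    """Write the value to whichever shard owns the key."""
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    """Read from the same shard the key was written to."""
    return shards[shard_for(key, len(shards))].get(key)

put("user:1", {"name": "Asha"})
put("user:2", {"name": "Ben"})
print(get("user:1"))
```

Because the shard is derived from the key alone, any node can work out where a record lives without a central lookup.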

By the way, none of these features is exclusive to NoSQL databases anymore. Many relational databases support them too.

But of course, NoSQL databases have been doing it from the beginning.

If the application is new and the requirements are not mapped out clearly, the pull of NoSQL databases becomes quite strong.

However, NoSQL databases also come in multiple flavors and it’s important to make a suitable choice for the type of application you are building.

Document Databases

This is probably the most popular category of NoSQL databases.

Examples include names like MongoDB, Couchbase and RavenDB.

Chances are that you might have already come across document databases as many new projects tend to start with them.

MongoDB is particularly popular in big as well as small companies.

In a document database, the data is stored in the form of documents (JSON, BSON, or XML).

These documents can be designed to be much closer to the domain-level data objects within the application.
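Here is a small sketch of what such a document might look like, using a plain Python dictionary and the standard `json` module rather than any particular database driver. The order structure and field names are invented for illustration.

```python
import json

# One self-contained document: the customer and line items are nested inside
# the order instead of being split across separate tables.
order = {
    "_id": "order-1001",
    "customer": {"name": "Asha", "email": "asha@example.com"},
    "items": [
        {"sku": "BOOK-1", "qty": 2, "price": 12.5},
        {"sku": "PEN-7", "qty": 1, "price": 3.0},
    ],
    "status": "shipped",
}

def order_total(doc):
    """Compute the order total directly from the nested items."""
    return sum(item["qty"] * item["price"] for item in doc["items"])

serialized = json.dumps(order)     # what would be sent to or stored by the database
restored = json.loads(serialized)  # what the application reads back
print(order_total(restored))
```

The application object and the stored document have the same shape, so there is no mapping layer between rows and objects.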

Check out the below illustration:

Key-Value Store

This is probably the simplest type of NoSQL database out there.

Examples include Redis, etcd, and DynamoDB.

Every data element is stored as a key-value pair.

The key is a unique identifier (often an attribute name), and the value is the actual data or object.

At a fundamental level, a key-value store looks a lot like a relational table with just two columns: key and value.

There are many use cases where a key-value store is an ideal choice:

  • caching frequently used data

  • shopping carts

  • user profile information
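The use cases above can be sketched with a tiny in-memory key-value store that supports an optional expiry time, which is the feature that makes such stores handy for caching. The `KeyValueStore` class is a made-up illustration, not the API of Redis, etcd, or DynamoDB.

```python
import time

class KeyValueStore:
    """A toy in-memory key-value store with optional time-to-live per key."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        # Lazily evict entries whose time-to-live has passed.
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]
            return default
        return value

store = KeyValueStore()
store.set("cart:42", ["BOOK-1", "PEN-7"])           # shopping cart, no expiry
store.set("session:abc", "user-42", ttl_seconds=0)  # expires immediately
print(store.get("cart:42"))
```

Every operation is a lookup by key, which is why these stores are fast and easy to scale, but also why they are a poor fit for queries that filter on the value.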

Here’s an illustration that describes a key-value store.

Column-Oriented Database

In a column-oriented database, the data is stored as a set of columns.

This is unlike a relational database where data is stored in rows and read row by row.

Examples of column-oriented databases include Apache Cassandra & Apache HBase.

The advantage of this comes when you have to run analytics on a small number of columns for aggregation or calculation.

The columns can be read directly without consuming memory in fetching unwanted data. An example is calculating the total salary paid out in a year.
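As a rough sketch of the difference, the same employee data can be laid out row by row or column by column in plain Python. The sample records are invented, and a real engine works with on-disk blocks rather than lists, but the access pattern is the point.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Asha", "dept": "eng", "salary": 120000},
    {"id": 2, "name": "Ben", "dept": "ops", "salary": 95000},
    {"id": 3, "name": "Chen", "dept": "eng", "salary": 110000},
]

# Column-oriented layout: one list per column, built from the same data.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def total_salary_rows(rows):
    """Touches every record (and all of its fields) to read one value each."""
    return sum(row["salary"] for row in rows)

def total_salary_columns(columns):
    """Reads only the salary column and ignores the rest."""
    return sum(columns["salary"])

print(total_salary_rows(rows), total_salary_columns(columns))
```

Both functions return the same total, but the columnar version never loads names or departments to compute it.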

Of course, column-oriented databases have downsides.

Many of them favor eventual or tunable consistency over strong consistency, and writing a single record can require multiple write events on disk because each column is stored separately.

See the below illustration that compares the data storage approach between a row-oriented and column-oriented database.

Graph Databases

A graph database deals with relationships between data elements.

Each element is a node and is connected to other elements.

The connections between these nodes are called links.

Think of social media as a collection of people (nodes) that are connected with each other.

Examples of graph databases are Neo4j, Amazon Neptune, etc.

Unlike a relational database where links are implied, a graph database stores connections as first-class elements. This helps avoid the overhead of joining multiple tables in a typical SQL database.
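To see why stored connections help, here is a minimal sketch that keeps each person's links in an adjacency list and walks them with a breadth-first search. The `follows` data and the `within_hops` helper are invented for illustration; a real graph database would expose this through a query language instead.

```python
from collections import deque

# Each node maps directly to the set of nodes it links to.
follows = {
    "asha": {"ben", "chen"},
    "ben": {"dina"},
    "chen": {"dina", "eli"},
    "dina": set(),
    "eli": set(),
}

def within_hops(graph, start, max_hops):
    """Return every node reachable from start in at most max_hops links."""
    seen = {start}
    reached = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                queue.append((neighbor, depth + 1))
    return reached

print(sorted(within_hops(follows, "asha", 2)))
```

Each extra hop is just another step along stored links, whereas the SQL equivalent would need one more self-join per hop.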

Despite their advantages, graph databases are not as widely used as the other types. A few places where they really shine are knowledge graphs and other map-like applications.

Check the below illustration that shows the concept of Graph databases.

Takeaway

Having looked at the various types of NoSQL databases, the obvious question is - when to use what type of NoSQL database?

Here’s a quick reference that can help make the decision.

Document DB: A general-purpose choice that suits most applications that would otherwise rely on a SQL database.

Key-Value: Shopping carts, user profiles and caching.

Column-Oriented: Analytics-based requirements.

Graph: Maps, knowledge graphs and so on.

🧰

Case Study

Dissecting Airbnb’s Internationalization Platform

Airbnb operates globally with a vision to allow people to “Belong Anywhere”. 

This means it needs to support many languages, depending on the location of the property and the guests.

To support this objective, Airbnb built a comprehensive Internationalization (I18N) Platform.

This platform helps them serve translated content across multiple product lines all over the world.

Here’s a high-level illustration of the entire platform.

There are 3 main parts to the Internationalization Platform.

Content Management System

The Content Management System allows developers and content managers to create and modify content.

This is where content is submitted for translation along with relevant metadata.

The metadata consists of supporting details such as descriptions and screenshots that can potentially improve the quality of the translation.

The overall content is divided into phrases.

Their storage is managed by a Content Service that assigns a unique string key to each piece along with a last updated timestamp.

Translation Pipeline

Once the content phrases are added or modified, they are marked as ready to translate.

Such phrases are sent at regular intervals to an external Translation Vendor. The target locales are also provided to the vendor.

Once the vendor performs the translations, it notifies a callback service with the batch of translations. These translations are sent as translation events to Airbnb’s Event Bus.

An event consumer listens to these events and writes them to a Translation Service.

The job of this service is to persist the translations and send them to client applications. The translations are keyed by [phrase key, locale, timestamp] and are immutable so that a historical audit trail can be maintained.
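The keying scheme is easy to sketch. Below is a hypothetical append-only store keyed by (phrase key, locale, timestamp). The `Translation` and `TranslationStore` names are my own and say nothing about how Airbnb actually implements its Translation Service.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a record cannot be modified after creation
class Translation:
    phrase_key: str
    locale: str
    timestamp: int
    text: str

class TranslationStore:
    """Append-only store: new versions are added, old ones are never changed."""

    def __init__(self):
        self._records = {}  # (phrase_key, locale, timestamp) -> Translation

    def append(self, t):
        key = (t.phrase_key, t.locale, t.timestamp)
        if key in self._records:
            raise ValueError("translations are immutable; write a new timestamp instead")
        self._records[key] = t

    def history(self, phrase_key, locale):
        """All versions of a phrase for a locale, oldest first (the audit trail)."""
        matches = [t for (k, loc, _), t in self._records.items()
                   if k == phrase_key and loc == locale]
        return sorted(matches, key=lambda t: t.timestamp)

    def latest(self, phrase_key, locale):
        versions = self.history(phrase_key, locale)
        return versions[-1] if versions else None

store = TranslationStore()
store.append(Translation("greeting", "fr", 100, "Salut"))
store.append(Translation("greeting", "fr", 150, "Bonjour"))
print(store.latest("greeting", "fr").text)
```

Since a correction is written as a new record with a later timestamp, the full history of a phrase stays available.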

Translation Delivery

In the Delivery phase, a Snapshotter component loads the latest translations for each locale and creates a JSON blob snapshot.

The snapshot is stored, along with its timestamp, in an object store (also known as the snapshot store). The role of the snapshot store is to keep a copy of the phrase translations at a specific point in time.
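Here is a rough sketch of what a snapshotter could do, assuming translations arrive as (phrase key, locale, timestamp, text) tuples. The `build_snapshot` function and the shape of the JSON blob are assumptions for illustration, since the original article does not spell out the format.

```python
import json

def build_snapshot(translations, locale, snapshot_time):
    """Pick the newest translation per phrase as of snapshot_time; return JSON."""
    latest = {}  # phrase_key -> (timestamp, text)
    for phrase_key, loc, ts, text in translations:
        if loc != locale or ts > snapshot_time:
            continue  # wrong locale, or written after the snapshot point
        if phrase_key not in latest or ts > latest[phrase_key][0]:
            latest[phrase_key] = (ts, text)
    payload = {
        "locale": locale,
        "snapshot_time": snapshot_time,
        "phrases": {key: text for key, (_, text) in sorted(latest.items())},
    }
    return json.dumps(payload, ensure_ascii=False)

records = [
    ("greeting", "fr", 100, "Salut"),
    ("greeting", "fr", 150, "Bonjour"),
    ("greeting", "fr", 300, "Coucou"),   # newer than the snapshot point below
    ("farewell", "fr", 120, "Au revoir"),
    ("greeting", "de", 110, "Hallo"),
]
print(build_snapshot(records, "fr", snapshot_time=200))
```

Because the underlying records never change, rebuilding the snapshot for the same timestamp always gives the same blob.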

On each client application instance (microservice or web server), the translation data is downloaded and stored in a Local Store by the I18n agent.

The I18n agent runs as a separate process that:

  • fetches the latest translations from the snapshot during app initialization

  • performs pre/post-processing

  • manages on-disk storage

  • continuously pulls in new incoming translations and updates the Local Store.

The Local Store helps resolve translation requests locally without network calls to the server. This is great for reducing request latency and also provides loose coupling between the Translation Service and its clients.
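A minimal sketch of such a local store is shown below. It loads a JSON snapshot into memory, applies incremental updates, and answers lookups without any network call. The `LocalStore` class, the snapshot shape, and the fallback to a default locale are all assumptions for illustration, not details from Airbnb's article.

```python
import json

class LocalStore:
    """In-process cache of translations, filled from snapshots and updates."""

    def __init__(self, default_locale="en"):
        self._by_locale = {}  # locale -> {phrase_key: text}
        self._default_locale = default_locale

    def load_snapshot(self, snapshot_json):
        """Replace a locale's phrases with the contents of a snapshot blob."""
        snapshot = json.loads(snapshot_json)
        self._by_locale[snapshot["locale"]] = dict(snapshot["phrases"])

    def apply_update(self, locale, phrase_key, text):
        """Apply one incoming translation on top of the loaded snapshot."""
        self._by_locale.setdefault(locale, {})[phrase_key] = text

    def translate(self, phrase_key, locale):
        # Assumed fallback order: requested locale, default locale, then the raw key.
        for candidate in (locale, self._default_locale):
            text = self._by_locale.get(candidate, {}).get(phrase_key)
            if text is not None:
                return text
        return phrase_key

local = LocalStore()
local.load_snapshot('{"locale": "en", "phrases": {"greeting": "Hello", "farewell": "Goodbye"}}')
local.load_snapshot('{"locale": "fr", "phrases": {"greeting": "Bonjour"}}')
local.apply_update("fr", "farewell", "Au revoir")
print(local.translate("greeting", "fr"))
```

Lookups are plain dictionary reads, so a slow or unavailable Translation Service only delays new translations and does not break requests.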

P.S. This post is inspired by the explanation provided on the Airbnb Engineering Blog. However, the diagrams have been drawn or re-drawn based on the information shared to make things clearer. You can find the original article over here.

🍔

Food For Thought

👉 Side Projects for Developers

I’m a strong believer in the power of side projects.

While you gain a lot of experience from your day-to-day work, you can also learn a great deal from your side projects.

But I never considered that side projects could help you combat burnout at your workplace.

Not until I saw this post.

What do you think about it? Can side projects help you deal with burnout?

👉 Don’t Fix Code That’s Not Broken

I asked whether people followed this rule on X (Twitter) and got a bunch of amazing answers.

Do check it out👇

That’s it for today! ☀️

Enjoyed this issue of the newsletter?

Share with your friends and colleagues

Also, send them over here to subscribe.

In case you want me to cover any specific topic in future editions, please don’t hesitate to fill out the below idea-suggestion form.

See you later with another value-packed edition — Saurabh.
