forbestheatreartsoxford.com

Total Order Broadcast in System Design Interviews Explained

Written on

Introduction

Total order broadcast, often referred to as atomic broadcast, is a protocol designed for the reliable exchange of messages among nodes in a distributed system. It's important to clarify that "atomic" in this context does not relate to atomicity as understood in transactional operations. This broadcasting method adheres to two fundamental principles:

  1. Guaranteed Delivery: If one node receives a message, every other node in the system will also eventually receive that same message.
  2. Consistent Order: All nodes must receive messages in the identical sequence. No node can retroactively add a message into the sequence of previously received messages. This can be visualized as an append-only log, where new messages can only be added to the end of an existing list. Consequently, all nodes read this log and find the messages arranged in the same order.

These guarantees are particularly beneficial when duplicating a database. If every node processes each write directed at the database in the same order, they will eventually converge to the same database state, ensuring consistency among them. Temporary delays may occur due to network issues, which is also known as state machine replication.

The algorithm implementing total order broadcast must uphold the aforementioned conditions. It should be robust, capable of retrying message delivery in the face of network disruptions until successful. Zookeeper has established a protocol named ZAB (Zookeeper Atomic Broadcast) to ensure orderly replication across all nodes and to manage leader election.

There are various types of broadcast protocols, though this discussion will focus primarily on total order broadcast:

  1. Best-effort broadcast
  2. Reliable broadcast
  3. FIFO broadcast
  4. Causal broadcast
  5. Total order broadcast
  6. FIFO-total order broadcast

The broadcast protocol is executed through a broadcast algorithm. A node engaged in this protocol transmits a message to all other nodes, thereby constituting a broadcast. The following image illustrates nodes participating in a broadcast:

When a node broadcasts a message, it also receives it back through the broadcast algorithm. However, the node is unaware of the message's position in the total order and cannot simply add it to its maintained broadcast sequence.

Let's consider an example involving three nodes: A, B, and C.

Assume node A broadcasts two messages, m1 and m3, while node B sends out message m2. Let's say the established total order is m1, m3, and m2, but the chronological sending sequence is m1, m2, and m3.

  1. Node A sends out m1, which is received by itself and other nodes.
  2. Node B broadcasts m2, which reaches all nodes on the network. However, since m2 is later in the total order, the broadcast algorithm delays its delivery.
  3. Node A then broadcasts m3, which is successfully delivered across the network.
  4. Finally, m2 is delivered to all nodes, thus fulfilling the total order broadcast requirement.

Equivalence to Consensus

Achieving total order or atomic broadcast is synonymous with reaching consensus among the participants of the broadcast protocol. For atomic broadcast conditions to be met, participants must effectively "agree" on the sequence of message receipt. Participants recovering from failures, after others have "agreed" on the order and begun receiving messages, must be able to understand and comply with the established order.

We can construct linearizable storage using total broadcast and vice versa. In fact, both building a linearizable compare-and-set operation and total order broadcast address the consensus problem. However, while total order broadcast and linearizability are related concepts, they are not identical. Total order broadcast ensures reliable message delivery in a predetermined sequence but does not guarantee the timing of delivery. In contrast, linearizability provides a recency guarantee, indicating that a read will always reflect the most recent committed write.

To illustrate, consider a scenario where two users attempt to register the same username concurrently. A solution using total order broadcast involves a variable (or register) initialized to false, indicating whether a username is taken. If a linearizable compare-and-set operation exists, it can be used to claim a username and set the register to true, marking it as taken. Once set to true, any subsequent reads will return true, signaling to nodes that the username is already claimed. The challenge lies in implementing the compare-and-set operation. Interestingly, with total order broadcast established, we can create such a linearizable compare-and-set operation as follows:

  1. The node wishing to register a username broadcasts a message with the intended username. This can also be viewed as the node appending a message to an append-only log, with the understanding that delivery may be delayed to honor the broadcast order.
  2. The node waits for its own message to be delivered back.
  3. Once its message is received, the node reviews the messages to determine if it is the first instance of that username. If so, the node claims it; if a prior message exists from another node, the registration request is aborted.

Since all messages are delivered in the same sequence to every node, the first write in the broadcast order wins in the event of concurrent writes, while others are aborted. This mechanism allows for linearizable writes, although linearizable reads require different approaches:

  1. Similar to the linearizable write method, a linearizable read can be achieved by broadcasting a message and waiting for its delivery back. The read occurs once the message is received.
  2. Reads can be directed to the leader or a replica that synchronously updates on writes, ensuring the latest value is always returned.
  3. If it's feasible to ascertain the sequence number of the latest broadcasted message in a linearizable manner, a node can await receipt of all messages up to that sequence number before executing a read.

If we start with linearizable storage, we can construct total order on top of it. Imagine a simple counter variable/register that can be incremented using a compare-and-set operation. Nodes involved in the broadcast protocol access this register to obtain the next sequence number. This number is assigned to the broadcasted message, which is then delivered in order based on sequence. If a node receives messages numbered up to 10 but then receives a message numbered 15, it knows it must wait for messages numbered 11 through 14 before processing message 15 due to potential network delays.

Your Comprehensive Interview Kit for Big Tech Jobs

  1. Grokking the Machine Learning Interview

    This course equips you with essential skills and covers frequently asked interview problems at major tech firms.

  2. Grokking the System Design Interview

    Learn preparation strategies and practice common system design interview questions.

  3. Grokking Dynamic Programming Patterns for Coding Interviews

    Streamlined preparation for coding interviews.

  4. Grokking the Advanced System Design Interview

    Explore system design through real-world architectural reviews.

  5. Grokking the Coding Interview: Patterns for Coding Questions

    Accelerate your coding interview preparation.

  6. Grokking the Object Oriented Design Interview

    Prepare for object-oriented design interviews and practice related questions.

  7. Machine Learning System Design

  8. System Design Course Bundle

  9. Coding Interviews Bundle

  10. Tech Design Bundle

  11. All Courses Bundle

Share the page:

Twitter Facebook Reddit LinkIn

-----------------------

Recent Post:

New Era of AI Text Generation Tools on the Horizon

Explore the upcoming innovations in AI text generation tools, powered by OpenAI’s API, and what they mean for users and developers.

The Remarkable Geometry of Ancient Hominid Tools

A study reveals that early hominids may have intentionally shaped tools, suggesting advanced cognitive skills in geometry 1.4 million years ago.

Unlocking Developer Success: Strategies for Growth and Progress

Explore key strategies for developers to enhance their careers, focusing on hard work, smart work, and leveraging new technologies.

Mastering Input Validation: Positive vs. Negative Testing

Explore the differences between positive and negative testing in input validation using Python, focusing on best practices and edge cases.

Building Wealth: An Effective Strategy for Financial Success

Discover a comprehensive approach to growing your wealth from scratch, emphasizing mindset, discipline, and strategic investment.

Giving Space for Life-Changing Decisions: An Essential Guide

Understanding the importance of giving people space while making significant life choices is crucial for their growth and independence.

The Surprising Truth Behind America's Obesity Crisis

Despite healthier habits, obesity rates soar. Discover the hidden factors affecting weight gain and learn how to embrace self-love.

Healthy Habits: A Practical Formula for Success

Explore how to cultivate healthy habits using effective strategies that blend personal desires with essential tasks.