Guides

Polkadot’s Solution to the Byzantine Generals Problem: Grandpa Protocol

Gossamer is a Golang implementation of the Polkadot host, and recently, we've been working on implementing the GRANDPA protocol in Go.

Jimmy Johnson

Apr 10, 2024 — 7 min read

Gossamer is ChainSafe's Golang implementation of the Polkadot host. Recently, we (the Gossamer team) have been working on implementing the GRANDPA Protocol in Go.

We've implemented it as a standalone package and are working on integrating it into our codebase to enable Gossamer to participate in Polkadot consensus. Below, I'll break down Polkadot's solution to the Byzantine General's Problem, GRANDPA.

A general of the Byzantine army

Sits in his tent, impatient with the bureaucracy required to form a plan of invasion. The city to be sieged lies but miles away, yet here he sits, sending message after message to countless other generals.

The city is surrounded by each general stationing his division, so escape is impossible. Coordinating an attack is not an easy business, especially when he is aware that there are traitorous generals in his army. Our general lies awake, unable to sleep, pondering how he can reach an agreement with the other loyal generals on a plan of attack without letting the traitors disrupt him.

There is a name for this problem: the Byzantine Generals Problem. It is a classic game-theoretic problem. It describes a situation in which multiple divisions of the Byzantine army station outside of an enemy city, where they can only communicate via messenger and need to coordinate a plan of attack.

The issue is that there may be traitors among them who will try to disrupt the planning and stop them from reaching an agreement. The generals then need to devise a way to do two things:

1) All agree on a plan of attack, and 2) make sure that a small number of traitorous generals cannot trick the loyal generals into a bad plan.

In 1978, Robert Shostak, through a NASA-funded project, conceived and formalized the problem of obtaining Byzantine consensus. Back then, he dubbed it the interactive consistency problem. By 1982, through collaboration with Leslie Lamport and Marshall Pease, it became the Byzantine Generals Problem. This breakthrough led to the development of Byzantine fault-tolerant (BFT) systems in various industries.

Skip ahead four decades, and it turns out this problem also applies to blockchain and distributed systems where nodes need to reach an agreement in the presence of arbitrary node failures (byzantine faults).

In an open network, such as a blockchain, where any node can participate in consensus on the state of the network, the network as a whole needs to handle a small amount of misbehavior. Hence, the solution to this problem is very applicable to blockchain consensus to ensure network integrity and valid state transitions.

Consensus, sybil, block...

Before diving into the protocol, I want to clarify a few terms that are often used loosely: consensus, sybil resistance mechanisms, and block author selection algorithms.

Proof of Stake (PoS) and Proof of Work (PoW) are not consensus algorithms. In the context of blockchain systems, consensus involves the network deciding on the chain's current state. PoS and PoW, however, are sybil resistance and block author selection mechanisms.

Sybil resistance mechanisms protect networks from sybil attacks.

In a sybil attack, one node operates many sybil identities to gain control of a disproportionately large portion of the system. In Bitcoin, requiring nodes to perform actual work in PoW is a sybil resistance mechanism because a node must complete the necessary work to propose a block. In Ethereum, instead of requiring nodes to do the work, the system requires nodes to stake money to propose a block, thereby making a node’s control in the network proportional to its stake.

Block author selection algorithms tend to be used in conjunction with sybil resistance mechanisms, and often, sybil resistance is a property of block author selection algorithms.

In Polkadot, BABE acts as a block author selection algorithm in combination with proof of stake. In BABE, validator nodes are selected pseudorandomly via a Verifiable Random Function (VRF) to propose a block at a given height. In this case, the block author selection mechanism, BABE, is used with the sybil resistance mechanism (PoS) to secure the network.

GRANDPA..Wat Do?

GRANDPA stands for GHOST-based Recursive Ancestor Deriving Prefix Agreement and is Polkadot's consensus protocol, which runs alongside BABE, its block production mechanism.

GRANDPA is a classic BFT-style algorithm with only a few notable differences from other BFT algorithms. For one, GRANDPA finalizes chains, not blocks. This difference means it doesn't operate one block at a time but can batch process and finalize multiple blocks simultaneously.

Every node in the network running GRANDPA runs GRANDPA rounds, each containing multiple phases and resulting in a new finalized chain. Like all BFT-style algorithms, the network assumes that (2/3)n + 1 nodes (where n is the number of nodes) are honest. There is a fixed number of nodes per GRANDPA round because measuring the threshold of (2/3)n + 1 n must be defined. The protocol also supports weighted voting, although all nodes have equal weight in Polkadot.

First, in every round, the network selects a "primary" node through a round-robin approach, where all the authorities in the authority list take turns being the primary. The primary node broadcasts the highest block it thinks can be final from the previous round.

In Substrate’s GRANDPA implementation, nodes will move on from round r when r is completable, meaning that nodes need to be able to process multiple rounds simultaneously.

For a round to be considered completable, the number of observed votes must be greater than the threshold of (2/3)n + 1. However, just because a round is considered completable does not mean it is ready to be finalized. The node must consider potential votes, which are votes that the node may still receive for that round and consider if these votes can change the outcome of the round. Once the potential votes cannot change the outcome of the round, then the round is considered finalizable.

Before round r gets finalized, if round r+1 begins, the primary for round r+1 will send their best estimate of what will get finalized in round r. Of course, if round r gets finalized, the primary will broadcast it.

A point here is that, unlike BFT algorithms that rely on a leader, GRANDPA can still function if the primary crashes, avoiding the need for responses like a view change.

Next, after a network delay (due to the network being partially synchronous), the network enters the pre-vote phase of the GRANDPA round. In this phase, each validator casts a "prevote" for the highest block it thinks should get finalized. This step is because every node tells the network what chain they believe should be finalized. If the primary broadcasts its chain at the beginning of the round, then the pre-vote chains should extend the chain proposed by the primary.

Afterward, the network moves onto the pre-commit phase of GRANDPA, where, based on the set of pre-votes the nodes in the network have received, each validator will compute the highest block that should get finalized.

This block is known as the GRANDPA ghost. The GRANDPA ghost is the block with the highest number and a supermajority ((2/3)n + 1) of pre-votes. Every node calculates this for themselves and broadcasts the result to the network in a "pre-commit" message.

Lastly, the validators gather pre-commit messages until they have a chain with a supermajority of votes. Once they reach the threshold, a "commit" message gets sent to the network representing the newly finalized chain. Then the round ends, and the next GRANDPA round begins!

Accountable safety

Now that you understand what a GRANDPA round looks like and how the protocol works, another exciting feature worth touching on is that GRANDPA provides accountable safety.

Accountable safety means that if the honesty assumption gets violated and equivocating nodes (nodes that vote for conflicting chains at the same block height) have been able to partition the network, the protocol can determine the nodes who equivocated and, assumedly, punish them. At a high level, this happens by asking all validators a series of questions.

Imagine if blocks A and B get finalized at height h. First, the protocol will ask, "Why did you, a node who voted for B, not think block A was final when you voted for B?"

Honest validators should respond with their set of pre-votes (or pre-commits) from the round where B was finalized, which contains a supermajority for B.

Second, the protocol asks, "Which pre-votes for the round that finalized A did you see?" After answering both questions, the union of these answer sets can reveal the nodes that voted maliciously.

Nodes essentially rat on their misbehaving peers by revealing the votes they received. In this way, GRANDPA can identify malicious nodes and pass that data to the protocol to handle as it will, though, unfortunately, a hard fork is still required.

Answering the Byzantine General's Problem

That night, in his sleep, our general has a vision where Leslie Lamport appears and gives him the knowledge of distributed consensus. He awakes in the morning, knowing that victory will be his. FIN

More Gossamer

Chat directly with the team → Discord
Learn more by checking out the → Docs
Keep a tab on the progress → Twitter

You can also check out Axays Sagathiya's recent talk on Gossamer at sub0.

About ChainSafe

ChainSafe is a leading blockchain research and development firm specializing in protocol engineering, cross-chain interoperability, and web3 gaming. Alongside its contributions to major ecosystems such as Ethereum, Polkadot, and Filecoin, ChainSafe creates solutions for developers across the web3 space utilizing expertise in gaming, interoperability, and decentralized storage. As part of its mission to build innovative products for users and improved tooling for developers, ChainSafe embodies an open-source and community-oriented ethos to advance the future of the internet.