Building an Alternative Polkadot Client: Gossamer 2023 Year-in-Review
A year-end update on Gossamer, ChainSafe's Go implementation of the Polkadot Host.
Built by ChainSafe with support from the Web3 Foundation, Gossamer is a Polkadot Host written in Go: a framework for building and running nodes for blockchain protocols compatible with the Polkadot ecosystem. As an alternative client, Gossamer is intended to improve network security and open the ecosystem to more developers.
Over the last twelve months, the Gossamer team has been hard at work! As the year winds down, we wanted to recap some of our achievements. Let’s go!
Migrating to PebbleDB
Gossamer switched from BadgerDB to PebbleDB due to recurring panics and poor performance. PebbleDB, developed by the CockroachDB organization, is also used in go-ethereum.
The migration process involved the following steps:
- Implementing PebbleDB: A new implementation of our storage interface using PebbleDB was created and placed under the internal/database directory.
- Ensuring compatibility: The PebbleDB implementation fulfills our storage interface, allowing for seamless integration with existing applications.
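The interface-based approach described above can be sketched roughly as follows. This is a minimal illustration, not Gossamer's actual code: the `Database` interface, `memDB`, and `storeHeader` are hypothetical names, and an in-memory map stands in for the Pebble-backed type so the example runs without external dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when a key is absent (hypothetical name).
var ErrNotFound = errors.New("key not found")

// Database is a hypothetical, minimal storage interface. Callers depend only
// on this interface, so the backend can be swapped without touching them.
type Database interface {
	Get(key []byte) ([]byte, error)
	Put(key, value []byte) error
	Del(key []byte) error
}

// memDB is an in-memory stand-in for a real backend. A Pebble-backed type
// would expose the same three methods and delegate to the Pebble store.
type memDB struct {
	data map[string][]byte
}

func newMemDB() *memDB { return &memDB{data: make(map[string][]byte)} }

func (m *memDB) Get(key []byte) ([]byte, error) {
	v, ok := m.data[string(key)]
	if !ok {
		return nil, ErrNotFound
	}
	// Return a copy so callers cannot mutate stored data.
	return append([]byte(nil), v...), nil
}

func (m *memDB) Put(key, value []byte) error {
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

func (m *memDB) Del(key []byte) error {
	delete(m.data, string(key))
	return nil
}

// Compile-time check that memDB satisfies the interface.
var _ Database = (*memDB)(nil)

// storeHeader is example caller code: it knows only the interface.
func storeHeader(db Database, hash, header []byte) error {
	return db.Put(hash, header)
}

func main() {
	var db Database = newMemDB()
	if err := storeHeader(db, []byte("hash"), []byte("header")); err != nil {
		panic(err)
	}
	v, err := db.Get([]byte("hash"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(v))
}
```

Because existing code is written against the interface, replacing one backend with another only requires a new type that satisfies it.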
Our migration to PebbleDB proved to be a success! We have not experienced any of the recurring panics or memory issues that we experienced with BadgerDB.
Syncing and block finalization
Another significant milestone was surpassing the 9.4 million block sync mark. This impressive feat is the result of a major refactor of the sync package, leading to a multitude of improvements that enhance stability, speed, and overall user experience.
A few key improvements:
- Faster synchronization & enhanced error handling: The new sync approach is more stable and consistent, eliminating bugs and discrepancies that previously bogged down the process. Blocks can now be retrieved more efficiently, resulting in significantly faster synchronization times. This is achieved through the use of worker pools and an optimized peer selection system, translating to a quicker and more efficient experience for users syncing their nodes. The updated sync package can also identify and punish peers who send bad blocks, ensuring data integrity and preventing the spread of corrupted information.
- Seamless syncing and improved logging & transparency: You can now seamlessly resume syncing after a node restart, saving time and effort. Enhanced logging provides users with better visibility into the sync process, allowing for faster identification and resolution of any issues.
- Optimized state trie loading: Loading the state trie from the latest block is now available, preventing delays and ensuring a smooth synchronization process. Thanks to several runtime layer fixes, a few critical issues have been resolved, correcting wrong values and missing implementations in the child-trie package.
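To illustrate the worker-pool idea mentioned in the list above, here is a minimal sketch in which one worker per peer pulls block-range requests from a shared queue. It is an assumption-laden illustration rather than the sync package itself: `blockRequest`, `blockResponse`, `fetchFunc`, `runWorkerPool`, and `fakeFetch` are hypothetical names, and the network call is replaced by a local function.

```go
package main

import (
	"fmt"
	"sync"
)

// blockRequest is a hypothetical unit of work: an inclusive range of block numbers.
type blockRequest struct {
	start, end uint
}

// blockResponse carries the outcome of one request.
type blockResponse struct {
	req    blockRequest
	blocks []uint
	err    error
}

// fetchFunc abstracts "ask this peer for these blocks" (hypothetical signature).
type fetchFunc func(peer string, req blockRequest) ([]uint, error)

// runWorkerPool starts one worker per peer. Workers pull requests from a
// shared queue, so a fast peer naturally serves more requests than a slow one.
func runWorkerPool(peers []string, reqs []blockRequest, fetch fetchFunc) []blockResponse {
	queue := make(chan blockRequest, len(reqs))
	results := make(chan blockResponse, len(reqs)) // buffered: workers never block on send
	for _, r := range reqs {
		queue <- r
	}
	close(queue)

	var wg sync.WaitGroup
	for _, p := range peers {
		wg.Add(1)
		go func(peer string) {
			defer wg.Done()
			for req := range queue {
				blocks, err := fetch(peer, req)
				results <- blockResponse{req: req, blocks: blocks, err: err}
			}
		}(p)
	}
	wg.Wait()
	close(results)

	out := make([]blockResponse, 0, len(reqs))
	for r := range results {
		out = append(out, r)
	}
	return out
}

// fakeFetch stands in for a network request and returns the block numbers asked for.
func fakeFetch(peer string, req blockRequest) ([]uint, error) {
	var blocks []uint
	for n := req.start; n <= req.end; n++ {
		blocks = append(blocks, n)
	}
	return blocks, nil
}

func main() {
	reqs := []blockRequest{{1, 128}, {129, 256}, {257, 384}}
	res := runWorkerPool([]string{"peerA", "peerB"}, reqs, fakeFetch)
	total := 0
	for _, r := range res {
		total += len(r.blocks)
	}
	fmt.Println("blocks fetched:", total)
}
```

A fuller version would inspect `err` on each response to retry the request with another peer and to penalize peers that return bad blocks.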
Wazero
While syncing a Gossamer staging node to Westend, our team encountered several significant memory management problems, which we isolated to Go-Wasmer. Go-Wasmer is a package of CGO bindings to the Rust Wasmer package. This led us to discover Wazero, a newer WASM interpreter written natively in Go, which didn't exhibit the issues we had with Go-Wasmer.
Our team is working on a fork of Wazero that restores the ability to export memory, which was available in older versions of Wazero but was removed in a later version.
Without going too far into details, a FRAME-based runtime calls out to the Polkadot host to allocate memory, which requires the host to export memory that the WASM blob then imports. With further developments in WebAssembly, memory allocation is now handled internally, which is why the Wazero team removed the ability to export memory in later versions. To ensure compatibility with FRAME-based runtimes, we needed to restore this feature so that Gossamer can run them using Wazero.
New heap allocator
Investigation & resolution:
- A bug appeared as an "out of bounds memory access" error related to memory block #9412261. It was not caused by memory growth directly, but instead by a mismatch between the type and value of the memory page size. This caused Wazero, the runtime instance runner in charge of memory management, to return an incorrect page size.
- Tests were performed using a specific Wazero runtime configuration. When the memory size was increased, a different issue related to the Wazero runtime instance runner occurred. Wazero returned an incorrect page size because of a type mismatch (uint32 overflow).
- This problem has been reported to the Wazero team, and a workaround is available through the ongoing heap allocator refactor.
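The class of bug described above, a size computed in 32-bit arithmetic that wraps around, can be shown in a few lines. This is a generic illustration and not the Wazero code path; the function names are hypothetical, and the only external fact relied on is that a WebAssembly page is 64 KiB.

```go
package main

import "fmt"

// pageSize is the WebAssembly page size in bytes (64 KiB).
const pageSize = 65536

// memorySize32 computes the byte size in 32-bit arithmetic.
// With 65536 pages the product is 2^32, which wraps around to 0.
func memorySize32(pages uint32) uint32 {
	return pages * pageSize
}

// memorySize64 widens the operand before multiplying, so the result is exact.
func memorySize64(pages uint32) uint64 {
	return uint64(pages) * pageSize
}

func main() {
	const maxPages = 65536 // 4 GiB of linear memory

	fmt.Println(memorySize32(maxPages)) // prints 0
	fmt.Println(memorySize64(maxPages)) // prints 4294967296
}
```

Any bounds check that uses the wrapped value will reject valid accesses, which is one way an "out of bounds memory access" error can surface without memory growth itself being at fault.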
Testing at high speed
We noticed in prior tests that testStreamHandler had a time.Sleep(time.Second). The test slept for a full second so that messages sent to the stream handler could arrive before the assertions ran, which contributed to slow test execution.
However, adding a "messageArrived" channel to testStreamHandler offered us a more efficient approach. Whenever a message arrives, it triggers an alert through the channel, instantly resuming the test and eliminating the unnecessary wait.
How we improved wait times:
- Eliminating time.Sleep by utilizing a dedicated channel for message notifications.
- Creating individual servers and channels for each test to improve performance.
- Using dedicated loggers for each test to simplify management and avoid conflicts.
- Leveraging parallel table-driven testing to run multiple tests concurrently and speed up execution.
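The channel-based notification described above might look roughly like this. The names `testStreamHandler` and `messageArrived` come from the text; everything else (the fields, `handleMessage`, `waitForMessage`, `last`, and the buffer size) is an assumption made for the sake of a self-contained sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// testStreamHandler is a simplified stand-in for the test helper named above.
type testStreamHandler struct {
	mu             sync.Mutex
	messages       [][]byte
	messageArrived chan struct{}
}

func newTestStreamHandler() *testStreamHandler {
	// The buffer size is arbitrary; it lets the handler signal without
	// waiting for the test goroutine to be ready to receive.
	return &testStreamHandler{messageArrived: make(chan struct{}, 16)}
}

// handleMessage records the message, then notifies any waiting test.
func (h *testStreamHandler) handleMessage(msg []byte) {
	h.mu.Lock()
	h.messages = append(h.messages, msg)
	h.mu.Unlock()
	h.messageArrived <- struct{}{}
}

// waitForMessage returns as soon as a message arrives. The timeout is only
// an upper bound, unlike a fixed sleep, which always costs its full duration.
func (h *testStreamHandler) waitForMessage(timeout time.Duration) error {
	select {
	case <-h.messageArrived:
		return nil
	case <-time.After(timeout):
		return errors.New("timed out waiting for message")
	}
}

// last returns the most recently received message, or nil if there is none.
func (h *testStreamHandler) last() []byte {
	h.mu.Lock()
	defer h.mu.Unlock()
	if len(h.messages) == 0 {
		return nil
	}
	return h.messages[len(h.messages)-1]
}

func main() {
	h := newTestStreamHandler()
	go h.handleMessage([]byte("hello"))

	if err := h.waitForMessage(time.Second); err != nil {
		panic(err)
	}
	fmt.Println(string(h.last()))
}
```

In a real test, the assertions would follow `waitForMessage`, and each table-driven case would create its own handler so that cases can run in parallel without sharing state.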
We also looked into our top six packages that were the slowest to run and made some modifications to our testing methods. Here are the results:
Package | Before | Now |
---|---|---|
github.com/ChainSafe/gossamer/dot | ~350.508s | ~3.989s |
github.com/ChainSafe/gossamer/dot/core | ~311.674s | ~0.693s |
github.com/ChainSafe/gossamer/lib/babe | ~571.752s | ~2.647s |
github.com/ChainSafe/gossamer/dot/sync | ~297.813s | ~9.034s |
github.com/ChainSafe/gossamer/lib/runtime/wasmer | ~225.374s | ~29.667s |
github.com/ChainSafe/gossamer/dot/rpc/modules | ~614.548s | ~0.938s |
GRANDPA
As we were debugging block authoring in a cross-client devnet, we encountered a temporary roadblock. The finality kept stalling on the Gossamer node. The root cause of the issue was that the Gossamer implementation of GRANDPA did not support multiple concurrent voting rounds. When later rounds are estimable, the previous rounds should be pruned. We also noticed that the Gossamer implementation of GRANDPA didn't support the Primary propose message.
It was not clear how to modify the existing implementation to match the intended behavior of the GRANDPA protocol. When examining the Substrate code, we found that it references a standalone GRANDPA package that contains all the voter logic and round handling.
We successfully translated the standalone package to Go, making use of Go generics along the way. We are currently incorporating it into Gossamer.
V1 Trie
We modified the trie encoding to implement the V1 trie, since this upgrade has been included as a runtime upgrade for both Kusama and Polkadot. The encoding is used to calculate the trie root and to generate and verify Merkle proofs.
Results achieved with V1:
- Network compatibility that allows us to sync to the chain's tip
- Reducing the size of values transferred in Merkle proof nodes to a maximum of 32 bytes
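One way to picture the 32-byte limit, assuming it works by replacing any larger value with a fixed-size hash, is the sketch below. It is not Gossamer's encoder: `nodeValue` and `encodeValueV1` are hypothetical names, and SHA-256 from the standard library stands in for the chain's actual hash function purely to keep the example dependency-free.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// maxInlineValueSize is the largest value stored directly in a node.
const maxInlineValueSize = 32

// nodeValue is a hypothetical representation of what a trie node carries.
type nodeValue struct {
	data   []byte
	hashed bool // true when data is a hash of the real value
}

// encodeValueV1 keeps small values inline and replaces larger ones with a
// 32-byte digest, so a node never carries more than 32 bytes of value data.
func encodeValueV1(value []byte) nodeValue {
	if len(value) <= maxInlineValueSize {
		return nodeValue{data: value}
	}
	// ASSUMPTION: SHA-256 is a stand-in hash used only for illustration.
	digest := sha256.Sum256(value)
	return nodeValue{data: digest[:], hashed: true}
}

func main() {
	small := encodeValueV1([]byte("balance"))
	large := encodeValueV1(make([]byte, 4096))

	fmt.Println(len(small.data), small.hashed)
	fmt.Println(len(large.data), large.hashed)
}
```

Under this scheme, a proof for a key near a large value only needs to include the digest rather than the value itself.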
Furthermore, we have implemented snapshots in our staging environment. These snapshots serve as "checkpoints" during our sync process, allowing us to resume the process from the point where the last version failed.
Our team has utilized this feature to continue working while addressing any remaining syncing issues. This is very helpful to us as every time an issue is resolved, we can easily resume the sync process using the last snapshot available.
ZombieNet
ZombieNet is a tool created and used by paritytech for testing Substrate-based blockchains. It has two main functions: spawning and testing ephemeral networks. It can work with Polkadot nodes, but integrating Gossamer nodes comes with its challenges.
ZombieNet usually generates a chain spec for Polkadot nodes using the build-spec command. However, Gossamer's build-spec command requires different arguments. To bypass this, we can use the chain_spec_path attribute to specify a pre-built chain spec file.
Another problem we faced was with the default_command and command attributes, as they include unrecognized arguments for Gossamer, causing startup failures. To solve this, we can use the command_with_args attribute to specify only the desired arguments and remove the unrecognized ones.
We implemented modifications so that Gossamer nodes can be started successfully using chain_spec_path and command_with_args.
Integrating Gossamer nodes with ZombieNet requires careful attention to attribute configurations and potential differences in logging. By utilizing modifications like chain_spec_path and command_with_args in ZombieNet, and potentially implementing logging of the startup terms in Gossamer, we can achieve better integration and enhance our testing capabilities.
Support for parachain consensus
An essential component of the Polkadot protocol is to author and finalize parachain blocks. We are actively developing on this front and wanted to share some of our key highlights:
- Gossamer now supports all ParachainHost runtime API calls
- Our team has implemented an overseer for inter-subsystem communication. This is inspired by polkadot-sdk's overseer. Although modified, it was built with all the benefits of Golang in mind.
- Currently, Gossamer is able to validate parachain candidates, including testing a candidate's Proof-of-Validity against a parachain's own runtime.
- Gossamer now has a Parachain service (dot/parachain) among its other services running in parallel. This service runs the Validation and Collation Protocols, and our team is hard at work to get this fully operational in the coming year.
- In addition, as the Polkadot codebase for parachain consensus is divided into multiple subsystems, our team is diligently working on subsystem components such as collator protocol, candidate backing, and availability.
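As a rough picture of what an overseer for inter-subsystem communication does, the sketch below routes messages to per-subsystem inboxes over Go channels. It is a deliberately small illustration under stated assumptions, not the dot/parachain implementation: the `overseer` and `message` types, their methods, and the subsystem names used as keys are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// message is a hypothetical envelope addressed to one subsystem.
type message struct {
	to      string
	payload string
}

// overseer owns one inbox per subsystem and routes messages between them,
// so subsystems never need direct references to each other.
type overseer struct {
	inboxes map[string]chan message
}

func newOverseer() *overseer {
	return &overseer{inboxes: make(map[string]chan message)}
}

// register creates the inbox for a subsystem. In this sketch all
// registration happens before any message is sent.
func (o *overseer) register(name string) <-chan message {
	ch := make(chan message, 8)
	o.inboxes[name] = ch
	return ch
}

// send delivers msg to its target inbox and reports whether the target exists.
func (o *overseer) send(msg message) bool {
	inbox, ok := o.inboxes[msg.to]
	if !ok {
		return false
	}
	inbox <- msg
	return true
}

// stop closes every inbox so that subsystem loops can exit.
func (o *overseer) stop() {
	for _, ch := range o.inboxes {
		close(ch)
	}
}

func main() {
	o := newOverseer()
	backing := o.register("candidate-backing")

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		// A subsystem is just a loop over its inbox.
		for msg := range backing {
			fmt.Println("candidate-backing received:", msg.payload)
		}
	}()

	o.send(message{to: "candidate-backing", payload: "new candidate"})
	o.stop()
	wg.Wait()
}
```

Goroutines and channels map naturally onto this pattern, which is one reason a Go overseer can differ in shape from its Rust counterpart while playing the same role.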
Get involved
To stay up to date on Gossamer in 2024, follow ChainSafe on Twitter.
You can also dig into the Gossamer code here or read the docs. Have a question or comment for us? Hop into our Discord #Gossamer-general channel👋
Thanks to Timothy Wu, Kirill Pisariev, Jimmy Johnson, Ed Mack, Kanishka Nagaraj, Kishan Sagathiya, Eclésio Melo, Diego Romero, and Axay Sagathiya for their contributions to this article.
About ChainSafe
ChainSafe is a multichain research and development firm that supports the decentralized web through high-impact contributions to leading protocols. Our work comprises node implementations, interoperability infrastructure, gaming solutions, distributed systems research, blockchain applications, tools, audits, and much more. Everything we work on is open source and community-oriented.