Forest: Unlocking Data Accessibility on the Filecoin Network

Forest is making waves in the Filecoin ecosystem with efforts to enhance data accessibility on the blockchain.

Forest is a Filecoin client written in Rust, built by ChainSafe with support from Protocol Labs. Some of Forest’s new features are a case study in how diversity of thought and implementation can push the ecosystem forward. We think these ideas are especially relevant to users doing historical blockchain analysis. Let's explore the updates!

Publishing the entire history of Filecoin 

The Filecoin mainnet was launched in August 2020. Since then, there have been three million rounds of blockchain messages (epochs), including transactions, storage deals, smart contract executions, and storage proofs. This is a non-trivial amount of data.

In building Forest, it’s critical for us to ensure compatibility with the current network state. To do this, we wanted to go from genesis, run through the entire blockchain history, and see if we generated matching results.

However, locating the data turned out to take more work than expected. Apart from the difficulty of providing a fully public, accessible, eternal, and verifiable blockchain, Filecoin faces two challenges:

  • Public nodes typically store messages from only the last two or three months - if you want anything further back, you won't find it there.
  • Protocol Labs' Sentinel project does have this data, but it's an unwieldy 40 terabytes, and downloading it also comes with hefty egress fees.

So, we decided to collect the data ourselves, deduplicate it, and make it freely accessible without egress fees using Cloudflare R2. Starting now, you can browse the mainnet and calibnet histories here.

We also have a Forest archival node running internally - a single Filecoin node that can serve historical data to other nodes. Importantly, this runs on commodity hardware, or in other words, computers or components that are readily available, inexpensive, and easily interchangeable.

Introducing diff snapshots

Snapshots are copies of (some of) the chain data at specific points in time, and they're often exchanged for offline analysis or when a new Filecoin node is starting up.

We mentioned that the Sentinel project holds 40TB of snapshot data. While looking for a way to reduce this size, we spotted a lot of duplicate information in the way lightweight chain snapshots were used and came up with the concept of diff snapshots.

You can read the documentation to learn more, but the upshot is that we were able to reduce Forest's copy of the blockchain to less than 20TB, cutting the archive size by more than half.
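The idea can be illustrated with a toy model. The sketch below is not Forest's implementation or file format: it assumes a snapshot can be modelled as a plain dictionary from content ID to bytes, and `cid_of` is a hypothetical stand-in that uses a SHA-256 digest. It only shows why a diff snapshot needs to carry nothing but the entries that an earlier snapshot lacks.

```python
import hashlib


def cid_of(data: bytes) -> str:
    """Hypothetical stand-in for a content ID: a SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


def make_snapshot(blocks):
    """Model a snapshot as a mapping from content ID to block bytes."""
    return {cid_of(b): b for b in blocks}


def make_diff(base, newer):
    """Keep only the entries of `newer` that `base` does not already hold."""
    return {cid: data for cid, data in newer.items() if cid not in base}


def apply_diff(base, diff):
    """Combine a base snapshot with a diff into one lookup table."""
    merged = dict(base)
    merged.update(diff)
    return merged


# Two overlapping toy snapshots: most entries are shared between them.
snap_a = make_snapshot([b"state-1", b"state-2", b"state-3"])
snap_b = make_snapshot([b"state-2", b"state-3", b"state-4"])

# The diff carries only what is new in snap_b relative to snap_a.
diff_b = make_diff(snap_a, snap_b)
rebuilt = apply_diff(snap_a, diff_b)
```

In this toy case the diff holds a single entry, yet every block of the newer snapshot can still be found in the base plus the diff. The more two snapshots overlap, the more space the diff saves.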

Changing CARs 

Snapshots are distributed as Content Addressable aRchives, or CAR files. These are simple mappings from Content ID to Filecoin data, which makes them great for data exchange, but Filecoin nodes typically need to also use a database, meaning:

  • The data is duplicated. That is, 100GB of snapshot data (the typical mainnet size) is stored in the CAR, and an additional 100GB (typically more) is stored in the database.
  • Loading data at, e.g., node startup is incredibly slow.

To save disk space, snapshots are usually compressed with Zstandard: snapshot.car.zst. This roughly halves the disk usage.

To speed up loading data, CARv2 is being worked on - these snapshots contain an in-file index. With such an index, the data doesn't have to be loaded into the database and can be used in place.
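To see why an index lets data be used in place, consider the minimal sketch below. It assumes a made-up record layout (a 4-byte length, a 32-byte key, then the data) that is deliberately not the CAR or CARv2 format, and `cid_of` is again a hypothetical stand-in for a content ID. The index maps each key to the offset and length of its data, so a lookup is a direct read from the archive rather than a database query.

```python
import hashlib
import io
import struct


def cid_of(data: bytes) -> bytes:
    """Hypothetical stand-in for a content ID: a raw 32-byte SHA-256 digest."""
    return hashlib.sha256(data).digest()


def write_archive(blocks) -> bytes:
    """Toy layout per block: 4-byte big-endian length, 32-byte key, data."""
    out = io.BytesIO()
    for data in blocks:
        out.write(struct.pack(">I", len(data)))
        out.write(cid_of(data))
        out.write(data)
    return out.getvalue()


def build_index(archive: bytes) -> dict:
    """Scan the archive once, recording (offset, length) of each block's data."""
    index = {}
    pos = 0
    while pos < len(archive):
        (length,) = struct.unpack_from(">I", archive, pos)
        key = archive[pos + 4 : pos + 36]
        index[key] = (pos + 36, length)
        pos += 36 + length
    return index


def read_block(archive: bytes, index: dict, key: bytes) -> bytes:
    """Fetch one block directly from the archive bytes using the index."""
    offset, length = index[key]
    return archive[offset : offset + length]


archive = write_archive([b"alpha", b"beta", b"gamma"])
index = build_index(archive)
beta = read_block(archive, index, cid_of(b"beta"))
```

Here the index is rebuilt by scanning the archive. Storing the index inside the file, as described above, avoids repeating that scan every time the file is opened.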

But the two approaches above are mutually exclusive: you can't have both low disk usage and fast loading times. That is why we developed snapshot.forest.car.zst files for Forest, which offer the best of both: fast snapshot loading and low disk usage.

Importantly, these files are backward-compatible with other implementations like Lotus, and the team’s ultimate goal is to merge them into a wider format. Forest can also use multiple snapshot files at once - this is, in fact, how the internal archival node works.
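Using several snapshot files at once can be pictured as a layered lookup. The sketch below is a toy model under stated assumptions, not a description of Forest's internals: each file is represented by a plain dictionary, the content IDs are invented strings, and Python's standard `collections.ChainMap` searches the layers in order until one of them has the requested key.

```python
from collections import ChainMap

# Each dictionary stands in for one snapshot file (hypothetical contents).
base = {"cid-a": b"old block", "cid-b": b"shared block"}
diff_1 = {"cid-c": b"newer block"}
diff_2 = {"cid-d": b"newest block"}

# Layers are searched left to right, so the newest file is consulted first.
store = ChainMap(diff_2, diff_1, base)


def get_block(cid):
    """Return the block for `cid` from whichever layer holds it, else None."""
    return store.get(cid)


oldest = get_block("cid-a")
newest = get_block("cid-d")
```

No data is copied between layers here: the combined view simply delegates each lookup to the underlying files.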

Bridging the gap with Web3Mine

While Forest is upgrading the way we access and store data on the Filecoin network, it's essential to highlight other innovative teams working alongside us.

One example is Web3Mine, a group focusing on pioneering deal orchestration via smart contracts, enabling a more decentralized storage network and empowering individual storage providers to specialize in their domain of expertise.

Together with Forest, they're exploring the Filecoin community's archival data requirements. Currently, Web3Mine is helping Forest with its beta testing. A heartfelt thanks to them for their efforts!

Wrapping up

Forest has a number of key features that together help our client satisfy a particular niche in the Filecoin ecosystem. As our implementation evolves, we hope to offer more efficient solutions and new ideas, ensuring that blockchain data remains accessible and manageable for all!

Recent Forest presentation:

FIL Dev Summit 2023

Want to get involved?

Join the open Forest 👉 Slack channel if you're interested in becoming a beta user, or message us at forest@chainsafe.io

Also, be sure to check out the latest version of Forest on 👉GitHub.


About ChainSafe

ChainSafe is a multichain research and development firm that supports the decentralized web through high-impact contributions to leading protocols. Our work comprises node implementations, interoperability infrastructure, gaming solutions, distributed systems research, blockchain applications, tools, audits, and much more. Everything we work on is open source and community-oriented.

Website | Twitter | Linkedin | GitHub | Discord | YouTube | Newsletter