Authored by Willem Olding
An exploration of our methods with loose-typed deserialization
Mina-rs is ChainSafe's Rust implementation of the Mina Protocol, originally written in OCaml. Mina-rs, a tentatively named project, was awarded to ChainSafe through a community grant from the Mina Foundation in May 2021. You can learn more from our official announcement [here,](https://medium.com/chainsafe-systems/realizing-the-mina-vision-in-rust-453f6f522205) and get up to speed from our community updates here.
Very early on in the development of Mina-rs, we realized we would have to write our own implementation of the binary serialization protocol in Rust. The reference implementation of Mina is written in OCaml and uses a serialization protocol called bin-prot. Bin-prot comes from the high-frequency trading world of Jane Street and is efficient and relatively simple. That said it is far from popular. Outside of OCaml it is rarely seen.
Coming from the Rust ecosystem we are used to having a serde implementation for any serialization protocol we might want to use and not having one came as a bit of a shock. Fortunately the similarity between the OCaml and Rust type systems and the simplicity of the protocol allowed us to write our own serde implementation in around a week. We were even able to port the existing Jane Street test suite to ensure our implementation is totally compliant. You can read about it in a previous article. All in all a great serde success story. Job done right? Oh yeah but there is the whole issue that...
The Data Structures in Mina are Really Complex!
This crazy looking tangle is the type graph for a block in the Mina protocol. Each node represents an OCaml type and a directed edge encodes the fact that a type contains another type. You can browse the graph yourself using GraphViz by following the link here.
In order to achieve our milestone goal of deserializing an entire block, we would need to implement a data structure in Rust that more-or-less matches the OCaml one. To make matters trickier, each type in Mina is versioned - a single version field per type - so the structure of our types needs to match pretty much exactly with those found in serde to handle these correctly.
Conventional wisdom would suggest we tackle the task with a bottom-up approach. Starting with the data structures that wrap primitive types, testing those against binary examples, and working our way up the graph until we have a fully implemented and tested data structure.
This was unappealing for a few reasons:
- It is difficult to obtain serialized examples of all the subtypes
- We would be unable to work on algorithms using higher level types at the same time
- We would be unable to deserialize blocks and thus interact with the existing Mina nodes until we had implemented every single sub-type. Even for the fields we didn't immediately need.
Our Alternative Approach - Loose Typed Deserialization
Time for some background.
The Serde library is sufficiently general that you can deserialize supported data formats into any Rust type that you like. By defining a recursive enum that captures the type system of the protocol you are deserializing from, it is possible to define a loose type that can represent any valid data from the given protocol. The best example of this is the
Value type from
[serde-json](https://docs.serde.rs/serde_json/enum.Value.html). Lets take a look at it now:
Value enum can represent any data structure that is valid JSON. Now there is one further consideration. JSON is an example of a self-describing protocol. That is to say, given some JSON, it is possible to know exactly how this maps to its loose Value type.
Bin_prot is an example of a non-self-describing binary protocol. It needs the type hints provided by the deserializer to know if it should, for example, interpret
Some(()) or perhaps the length of a string encoded in the following bytes. There is some missing information which we usually assume is provided by the destination type in Rust, but could in fact come from elsewhere as well.
Through some compile-time trickery, the Mina team was able to capture the structure of the data types in the OCaml code and write it to a file in JSON format. An example layout file looks as follows:
As you can probably see this describes a type that is a record with three fields: the first is an integer, the second is another record containing single boolean field, the third is a sum type (equivalent to a Rust enum) with two variants. This is a simple layout, but they can can be extremely complex. The layout for a Mina block is over 300,000 lines!
A layout file provides the missing type information that allows us to deserialize bin_prot encoded binary data into a loose
Value type. The
Value type for bin_prot is a little more complex than JSON as it also includes Option and Sum types. The general idea is the same though. Take a look:
Implementing the Deserializer
As we have established, the deserializer needs to accept both a layout and a binary reader, and use both together to interpret the binary data into the Value type above. We might define and construct such a deserializer as follows:
BinProtRule type is a recursive enum that can effectively capture the rules described above in JSON form. A
BinProtRuleIterator is a derived type that allows traversing the type tree defined by the layout and conforms to the Rust iterator trait.
At first glance, it might seem like a simple matter of traversing the type tree defined by the layout file and reading the given types from the binary reader. In actuality, it is slightly more complex due to the existence of Sum/Enum types, and variable length lists.
For Sum/Enum types we do not need to traverse each variant, only the one that exists in the serialized data. How do we know which one to traverse? We need to read the variant index from the binary reader (the first byte of serialized sum types is the index of the variant) and then pass this information back to the iterator so it knows how to proceed. A similar process applies for lists - we need to read the length of the list from the binary and then pass it back to the iterator so it knows to loop over the field element layout the correct number of times.
With some omissions for clarity, the implementation of the serde
Deserializer trait for our deserializer looks as follows:
There is a lot to take in here, but be sure to note:
deserialize_anyis implemented. This is the method called when no type hints from the destination type are provided.
For primitive types (char, float, int, bool, unit), the correct type can be read from the reader and then passed to the visitor. The custom
Valueensures that the nested Value data structure is created correctly on calls to
Sum, the values read from the reader are used to mutate the
layout_iterby repeating elements or selecting which rule to use for a variant.
You can view the full implementation here.
Putting it to use
Having implemented the above, we can use the standard serde API for deserialization. Check it out:
As long as we set the type of
result in line 17 to
Value, serde will correctly use the layout and deserialize to our loose type. Following the example of serde JSON we implemented
Value allowing us to do things like:
assert_eq!( result['second']['inner], false, ) // passes
Furthermore, some modifications were made to the deserializer to allow it to support a combination of strong and loose types. For example this allows for structs to be defined with fields of type
Value. The deserializer can correctly switch between and ensures that the layout is correctly followed:
Use in Mina-rs
The ability to write these partially defined types has added a powerful new tool to our repertoire. It allows us to iteratively implement new types in a top-down as well as a bottom-up approach.
Testing against a known layout generated from the OCaml code allows us to be confident that any changes made to the block type are consistent with the reference implementation. Furthermore, we can use these higher level types to make progress on other milestones (e.g. chain selection, block validation) while working on the nitty-gritty aspects of deserializing the low level types that are not yet required.
As we progress further in implementing the required types, the loose deserialization feature of serde-bin-prot will eventually no longer be required for Mina-rs. Once this milestone has been reached, we will be able to benefit from the low-cost deserialization provided by serde. In the meantime, it is an invaluable tool that has unblocked our progress on deserialization and allowed us to continue development with confidence.
What does the future look like? Nobody knows, but it starts with infrastructure. Join us at ChainSafe to build the brazen future. Check out the careers page of our website and our open positions, and get in touch with us at firstname.lastname@example.org if you are interested!
Also, be sure to check out and follow ChainSafe's Twitter and YouTube Channel. If you would like to get in contact with one of the Mina implementation team members from ChainSafe, feel free to drop by on our Discord.