(Yet another) Circle STARK tutorial

Dive into the intricacies of Circle STARKs, their efficiency over Mersenne 31 prime, and the challenges of FFT/FRI compatibility.

Recently, there has been a lot of excitement and interest around circle STARKs. This didn't pass me by either, and my interest was piqued. Not only by the loud "breakthrough" announcements and the prospects of unlocking verifiable computation on Bitcoin, but also by the subtle elegance of applied algebraic geometry to address limitations and simplify previous solutions.

At this point, we have several excellent introductory and explainer articles by zkSecurity, Vitalik, and LambdaClass. I have tried my best not to repeat their words and instead focus on the unexplored aspects and elements that I personally found very interesting. I wrote this as I was learning, so I cannot rule out inaccuracies. If you spot any mistakes, please comment away in this mirror on HackMD.

Advantages of circle STARKs over the Mersenne 31 field

Circle STARKs enable the prover to work over the finite field modulo Mersenne 31 prime (M31), which has very efficient arithmetic. Namely, it is 1.3 times faster than BabyBear, the previous most efficient field used in SP1, Valida, and RISC Zero. For more insight on why working over M31 is so desirable, refer to this article.

However, simply plugging M31 into existing univariate STARKs won't work. The reason is that Mersenne primes are not FFT/FRI friendly. Let's unpack why.

The efficiency of the FFT "divide-and-conquer" algorithm relies on the ability to recursively divide the problem size by powers of 2. When working over a finite field, this requirement necessitates an evaluation domain to have the "structure" of a multiplicative group whose order's factorization contains a large power of two. We refer to this as high 2-adicity. The reason we want this is that at each step of the FFT, we reduce the size of the domain by a factor of two by squaring each number in it.

For example, consider the domain \([1,85,148,111,336,252,189,226]\) in the finite field of integers modulo \(337\). This domain forms a multiplicative group with \(8=2^3\) elements. The elements of this group are powers of \(ω=85\), which is a generator of the group. If you square every number in the domain, you reduce the size of the domain by a factor of two: \([1,148,336,189]\). If you take the same kind of domain but with, say, 6 elements, you don't get this property anymore.
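To make this concrete, here is a minimal Python sketch of the example above (the modulus, generator, and domain are the ones just listed):

```python
# Subgroup of order 8 generated by w = 85 in the field of integers mod 337,
# and the effect of squaring every element: the domain halves to a subgroup of order 4.
p, w = 337, 85

domain = [pow(w, i, p) for i in range(8)]
print(domain)  # [1, 85, 148, 111, 336, 252, 189, 226]

squared = {x * x % p for x in domain}
print(sorted(squared), len(squared))  # [1, 148, 189, 336] 4
```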

Another relevant way of framing the 2-adicity requirement is to search for a group whose order is a product of small primes. We call such numbers smooth numbers. Since the multiplicative group of a field includes every element except \(0\), the order of the group is exactly one less than the number of elements in the field. Fields \(F_p\) for which \(p−1\) is smooth are referred to as \(p−1\) smooth.

An exemplary prime is the BabyBear prime \( p = 2^{31} - 2^{27} + 1 \). The largest multiplicative group in the BabyBear field has \( p - 1 = 2^{27} \cdot 3 \cdot 5 \) elements. You can clearly see that it is both smooth and highly 2-adic. It's perfect.

What about the Mersenne-31 prime \(p=2^{31}−1\)? Unfortunately, \(2^{31}-2\) can be divided by 2 only once. Thus, the conventional Cooley-Tukey FFT algorithm would be very inefficient for this field group. The authors solve this by devising a group from a unit circle.

Circle as a group

A circle group is generated from a point \((x,y)\) such that \( x^2 + y^2 = 1 \) (i.e., it lies on a unit circle) by applying a different multiplicative law:

\[ (x_0, y_0) \cdot (x_1, y_1) = (x_0 x_1 - y_0 y_1, x_0 \cdot y_1 + y_0 \cdot x_1) \]

Instead of generating subgroup elements as simply powers of a generator \(g\), we move from a point \((x_i,y_i)\) on the circle to the point:

\[ (x_{i+1}, y_{i+1}) = (g_x \cdot x_i - g_y \cdot y_i, g_x \cdot y_i + g_y \cdot x_i) \]

It turns out that the number of points lying on the circle, defined over the Mersenne prime \(2^{31}−1\), is quite large \((2^{31})\). One can generate all \(2^{31}\) group elements by starting with \((x_0,y_0)=(1,0)\) and applying the generator—

\[(g_x,g_y)=(2,1268011823)\]

—using the law above.
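Here is a small Python sketch of this group law, using the generator quoted above (the helper name `circle_mul` is mine):

```python
# Circle group over M31: points (x, y) with x^2 + y^2 = 1 (mod p), composed with
# the law (x0,y0)·(x1,y1) = (x0*x1 - y0*y1, x0*y1 + y0*x1).
P = 2**31 - 1  # Mersenne-31

def circle_mul(a, b, p=P):
    (x0, y0), (x1, y1) = a, b
    return ((x0 * x1 - y0 * y1) % p, (x0 * y1 + y0 * x1) % p)

g = (2, 1268011823)                      # the generator quoted above
assert (g[0] ** 2 + g[1] ** 2) % P == 1  # it lies on the unit circle

# Walk the first few elements starting from the identity (1, 0)
pt = (1, 0)
for _ in range(4):
    print(pt)
    pt = circle_mul(pt, g)
```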

For the circle FFT/FRI, we need two more group operations: the group squaring map \(π\) and the inversion map \(J\).

Squaring is the quadratic map defined as:

\[\pi(x,y) := (x,y) \cdot (x,y) = (x^2 - y^2, 2 \cdot x \cdot y) = (2 \cdot x^2 - 1, 2 \cdot x \cdot y)\]

This transformation reduces the set size by half.

The inverse is given by the degree-one map:

\[ J(x,y) := (x, -y) \]

This operation maps each point \((x,y)\) to its reflection \((x,−y)\).

Map \(J\) is an involution, i.e., \( J(J(P)) = P \).

Maps \(J\) and \(π\) commute, i.e.:
\[ \pi(J(P)) = J(\pi(P)) \qquad \text{for every } P \in C(\mathbb{F}_p) \]
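In Python, the two maps and their commuting property look like this (a sketch, reusing the M31 generator from before):

```python
# The squaring map pi(x, y) = (2x^2 - 1, 2xy) and the involution J(x, y) = (x, -y).
P = 2**31 - 1

def pi(pt, p=P):
    x, y = pt
    return ((2 * x * x - 1) % p, (2 * x * y) % p)

def J(pt, p=P):
    x, y = pt
    return (x, (-y) % p)

g = (2, 1268011823)
assert J(J(g)) == g          # J is an involution
assert pi(J(g)) == J(pi(g))  # pi and J commute
```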

FFT over the circle domain

Same as in the classical STARK, the circle FFT is used to evaluate some polynomial on a special domain. In regular FFT, the domain consists of \(n\)-th roots of unity, i.e. \(\{1, \omega, \omega^2, \ldots, \omega^{n-1}\}\). In circle FFT, the domain is a set of points on a circle curve, generated as follows.

First, we take a circle group generator \(g\) and repeatedly square it to obtain a generator of the subgroup \(G_{n−1}\) of size \(2^{n-1}\). Then, we create two twin cosets by taking an element \(Q\) that is not in \(G_n\) and forming two disjoint sets: \(Q \cdot G_{n−1}\) and \(Q^{-1} \cdot G_{n-1}\). The union of these sets forms the circle FFT domain containing \(2^n\) elements. A self-descriptive implementation can be found here in Plonky3.
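Below is a hedged Python sketch of this construction over M31. The helper names are mine, and I take \(Q\) to be a generator of \(G_{n+1}\) (one common choice, which guarantees \(Q \notin G_n\)); Plonky3's actual code differs in details.

```python
# Build a twin-coset circle FFT domain of size 2^n over M31:
#   Q * G_{n-1}  union  Q^{-1} * G_{n-1},  with Q not in G_n.
P = 2**31 - 1
LOG_ORDER = 31  # the circle group over M31 has p + 1 = 2^31 elements

def circle_mul(a, b, p=P):
    (x0, y0), (x1, y1) = a, b
    return ((x0 * x1 - y0 * y1) % p, (x0 * y1 + y0 * x1) % p)

def subgroup_gen(log_size, g=(2, 1268011823)):
    """Square the full-group generator until it generates a subgroup of size 2^log_size."""
    for _ in range(LOG_ORDER - log_size):
        g = circle_mul(g, g)
    return g

def twin_coset_domain(n):
    h = subgroup_gen(n - 1)               # generator of G_{n-1}
    G, pt = [], (1, 0)
    for _ in range(2 ** (n - 1)):
        G.append(pt)
        pt = circle_mul(pt, h)
    Q = subgroup_gen(n + 1)               # order 2^{n+1}, hence not in G_n
    Q_inv = (Q[0], (-Q[1]) % P)           # the group inverse is the reflection J(Q)
    return [circle_mul(Q, x) for x in G] + [circle_mul(Q_inv, x) for x in G]

D = twin_coset_domain(4)
print(len(D), len(set(D)))  # 16 16 -> the two cosets are disjoint
```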

Circle domain plotter (GitHub Gist)

This is an interactive plotter you can run yourself to demonstrate the distribution of domain points; it uses a simple TS implementation of the circle group over the Mersenne-17 field. Even though everything is modulo a prime p, you can still see a regular circular pattern with symmetry about the line p/2. This phenomenon also exists in elliptic curves (genus 1), but it is much more apparent for circle curves (genus 0) and complex roots of unity.

The FFT works by recursively splitting the larger problem into two smaller ones. In the context of polynomial evaluation, this means decomposing the polynomial into "even" and "odd" parts. In regular FFT, these sub-polynomials are simply formed from even and odd coefficients of the original polynomial. In circle FFT, this is a bit more involved, as we'll see, but the underlying mechanics are the same — the original polynomial \(f\) is a linear combination \(f(x)=f_{even}(x^2)+x \cdot f_{odd}(x^2)\). At each split, we also reduce the domain by a factor of two.

In the first step, we "halve" the domain by simply taking an \(x\) projection of each point. This is justified by the use of the inverse map when performing a decomposition \(f(x,y)=f_0(x)+y \cdot f_1(x)\) such that:

\[\vec{f}_0\left[\text{index}_{D_1}(x)\right] = \frac{\vec{f}\left[\text{index}_{D_0}(x,y)\right] + \vec{f}\left[\text{index}_{D_0}(J(x,y))\right]}{2}\]

\[\vec{f}_1\left[\text{index}_{D_1}(x)\right] = \frac{\vec{f}\left[\text{index}_{D_0}(x,y)\right] - \vec{f}\left[\text{index}_{D_0}(J(x,y))\right]}{2\cdot y}\]

I have modified the notation to demonstrate how in practice we treat polynomials \(f,f_0,f_1\) as vectors of their coefficients. Compare the original notation here with Vitalik's implementation here.

You can think of the inverse map \(J(x,y)=(x,−y)\) as a way to identify vertically mirrored pairs of points so that they can be treated as a single entity. This allows us to proceed with only the \(x\) coordinate.

In later steps, we continue to halve the domain using the univariate squaring map \(π(x)=2 \cdot x^2−1\). This is analogous to the classical FFT, where squaring maps the \(2k\)-th roots of unity onto the \(k\)-th roots of unity, i.e., \(\omega_{2k}^2=\omega_k\). The even-odd decomposition, \(f(x) = f_0(\pi(x)) + x \cdot f_1(\pi(x))\) in the original notation, now looks like this:

\[f_0(\pi(x))=\frac{f(x)+f(-x)}{2}\]

\[f_1(\pi(x))=\frac{f(x)-f(-x)}{2 \cdot x}\]
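As a sketch (my own toy code, not Plonky3's), one such later-step split on evaluation vectors over M31 could look like this:

```python
# One "later step" of the circle FFT: split evaluations of f over an x-domain closed
# under negation (and not containing 0) into even/odd parts over the halved domain
# pi(x) = 2x^2 - 1, then check the identity f(x) = f0(pi(x)) + x * f1(pi(x)).
P = 2**31 - 1

def inv(a, p=P):
    return pow(a, p - 2, p)

def split_layer(evals, p=P):
    """evals: dict x -> f(x), with -x present whenever x is."""
    f0, f1 = {}, {}
    for x, fx in evals.items():
        if (p - x) % p < x:
            continue                       # handle each {x, -x} pair once
        fmx = evals[(p - x) % p]           # f(-x)
        px = (2 * x * x - 1) % p           # pi(x) = pi(-x)
        f0[px] = (fx + fmx) * inv(2, p) % p
        f1[px] = (fx - fmx) * inv(2 * x % p, p) % p
    return f0, f1

# Toy check with a small polynomial on a tiny +/- symmetric domain
f = lambda x: (3 + 4 * x + 5 * x * x) % P
xs = [5, P - 5, 7, P - 7]
f0, f1 = split_layer({x: f(x) for x in xs})
for x in xs:
    px = (2 * x * x - 1) % P
    assert (f0[px] + x * f1[px]) % P == f(x)
```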

Although speeding up the FFT is not the main point of the paper, as the author points out here, it serves as an effective demonstration of the core principles of working over the circle group. When studying the paper, I made the mistake of skipping it and rushing to circle FRI only to hit a wall of confusion. So, I encourage you to take some time to appreciate this mechanic. If you want to play around with it, I made this Python notebook (trigger warning: shitty code).

How Regular FRI Reduces Polynomial Degree

The structure of the FRI algorithm is a lot like that of the FFT. But instead of recursively dividing the polynomial into many smaller ones, in FRI the prover iteratively reduces the degree of a polynomial until it reaches some small constant-degree one. It does so via random linear combinations, i.e., by combining the "even" and "odd" sub-polynomials against a random weight.

In STARK, we use FRI for something called "low-degree testing"—by knowing the final degree and the number of reduction steps, the verifier can work backwards and check that the degree bound of the original polynomial is as claimed. More formally, FRI enables an untrusted prover to convince a verifier that a committed vector is close to a Reed-Solomon codeword of degree \(d\) over the domain \(D\).

Here, being "close" is defined by the relative Hamming distance \(δ\) such that the number of points where the committed vector and the codeword disagree is at most \(δ⋅∣D∣\). The distance \(δ\) is such that \(δ∈(0,1−√ρ)\) where \(0<ρ<1\) is the rate of the Reed-Solomon code. In turn, the rate \(ρ\) is defined by the blow-up factor \(2^B,B≥1\) so that \(ρ=2^{−B}\). Finally, the domain size is \(∣D∣=2^{n+B}\) where \(2^n\) is the size of the original vector.

To make this definition more sensible, let's assign standard values: A commonly used blow-up factor is \(2^B=4\), so the rate is \(ρ=0.25\). The worst possible distance is \(δ=0.5\), so for a vector with 1024 elements, the codeword over a \(2^{B+n}=4096\) sized domain can disagree on at most half of the points. In practice, \(δ\) is much smaller to give better soundness guarantees.

Another interesting property is the decoding regime. In the unique decoding regime, the goal is to identify a single low-degree polynomial that is close to the committed vector. The unique decoding radius is typically defined as: \(\theta \in \left[0, \frac{1-\rho}{2}\right]\).

STARKs are shown to be sound outside this regime as the goal is to demonstrate the existence of such polynomials, even if multiple polynomials fit the given points. We refer to this simplified requirement as the list decoding regime. The list decoding radius is typically defined as: \(\theta \in \left[ \frac{1-\rho}{2}, 1-\sqrt{\rho} \right]\) [HAB23].

Low-Degree Testing Using Circle FRI Method

The general mechanics behind low-degree testing over the circle domain remain unchanged from the regular FRI. As a reminder, see this blog post for an in-depth theoretical discussion or this one if you only care about the implementation.

Circle FRI operates on codewords in the circle code \(C\), which is a generalization of the Reed-Solomon code defined over elements of the circle group and special polynomials in the Riemann-Roch space. Oversimplified, they are bivariate polynomials modulo the unit circle equation \((x^2+y^2=1)\), so whenever a polynomial has \(y^2\), it's replaced by \(1−x^2\).

For a given proximity parameter \(\theta \in \left( 0, 1-\sqrt{\rho} \right)\), the interactive oracle proof that a function \(f \in F^D\) (the committed vector) is \(θ\)-close to the circle code \(C\) consists of \(r\) rounds of a commit phase and a subsequent query phase, which are as follows:

Commit phase

  • \(P\) decomposes \(f\) into \(f=g+λ⋅v_n\) and sends \(λ\) to \(V\)
  • \(V\) picks a random weight \(λ_j\) for each layer \(j\)
  • For each \(j=1,…,r\), \(P\) decomposes \(g_{j−1}\) into "even" and "odd" parts and sends a commitment \([g_j]\) to \(V\)
  • In the last round, \(P\) sends \(g_{r+1}\) in plain.

1. \(P\) decomposes \(f\) into \(f=g+λ⋅v_n\) and sends \(λ\) to \(V\)

We start with the extended domain coset \(D_{n+B}\). First, we want to find the component of \(f\) that is aligned with \(v_n\). This can be done using vector projection: given two functions (or vectors) \(f\) and \(v_n\), the projection of \(f\) onto \(v_n\) is given by:

\[\text{proj}_{v_n}(f) = \frac{\langle f, v_n \rangle _D} {\langle v_n, v_n \rangle _D} \cdot v_n\]

Note: angle brackets denote the inner product \(\langle v_n, f \rangle_D = \sum_{x \in D} v_n(x) \cdot f(x)\).

Taking \(λ\) as the magnitude of this projection will ensure that \(λ⋅v_n\) is the part of \(f\) that lies along \(v_n\).

\[\lambda = \frac{\langle f, v_n \rangle_D} {\langle v_n, v_n \rangle_D}\]

The vanishing polynomial \(v_n\) has an alternating behavior over the domain \(D\): e.g., if \(D\) has size \(N=2^n\), then \(v_n\) alternates as \((1,−1,1,−1,…)\). This significantly simplifies the inner product calculations: since \(v_n(x)∈\{1,−1\}\), each term \(v_n(x)^2=1\), so:

\[\langle v_n, v_n \rangle_D = \sum_{x \in D} 1 = |D| = N\]

Knowing \(λ\), we can now find \(g\), which represents the component of \(f\) that is orthogonal to \(v_n\).

\[g=f-\lambda \cdot v_n\]

This ensures that \(g\) lies entirely in the FFT space \(L'_N(F)\), orthogonal to \(v_n\). This is because the inner product \(⟨g,v_n⟩_{D}=0\), making \(g\) and \(v_n\) orthogonal by construction.
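A hedged Python sketch of this decomposition, with \(v_n\) taken as the alternating \(\pm 1\) vector described above (the field and data layout are simplified for illustration):

```python
# Decompose f = g + lambda * v_n over the evaluation domain D, where v_n alternates
# between 1 and -1, so <v_n, v_n> = |D| = N.
P = 2**31 - 1

def inv(a, p=P):
    return pow(a, p - 2, p)

def decompose(f_evals, p=P):
    N = len(f_evals)
    v_n = [1 if i % 2 == 0 else p - 1 for i in range(N)]
    lam = sum(fx * vx for fx, vx in zip(f_evals, v_n)) * inv(N, p) % p   # <f, v_n> / N
    g = [(fx - lam * vx) % p for fx, vx in zip(f_evals, v_n)]
    assert sum(gx * vx for gx, vx in zip(g, v_n)) % p == 0               # <g, v_n> = 0
    return g, lam
```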

2. \(V\) picks random weight \(λ_j\) for layer \(j\)

In practice, though, \(P\) can compute \(λ_j\) as a random linear accumulator starting with \(λ\) and using a single random weight \(α∈E\) picked by \(V\):

\[λ_j=λ_{j-1}\cdotα+λ\]

See this done in Stwo.
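A tiny sketch of that accumulator (illustrative only; Stwo organizes this differently):

```python
# lambda_j = lambda_{j-1} * alpha + lambda, starting from lambda_0 = lambda.
def layer_weights(lam, alpha, rounds, p):
    weights, cur = [], lam
    for _ in range(rounds):
        cur = (cur * alpha + lam) % p
        weights.append(cur)
    return weights
```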

3. For each \(j=1,…,r\), \(P\) decomposes \(g_{j−1}\) into "even" and "odd" parts

The "even-odd" decomposition follows the same progression as in FFT. In the first round, we work with 2D points \((x,y)\) and use the full squaring \(π(x,y)\) and inverse \(J(x,y)\) maps. The inverse map transformation allows us to identify points \((x,y)\) and their reflection \((x,−y)\) so we can treat them as a single point \(x\) on the \(x\)-axis in subsequent rounds. The squaring map \(π\) transforms the domain \(D_{j−1}\) into \(D_j\) by effectively halving the space of points:

\[g_{j-1,0}\bigl(\pi_j(x)\bigr) = \frac{g_{j-1}(x) + g_{j-1}\bigl(J(x)\bigr)}{2}\]

\[g_{j-1,1}\bigl(\pi_j(x)\bigr) = \frac{g_{j-1}(x) - g_{j-1}\bigl(J(x)\bigr)}{2 \cdot t_j}\]

—where \(π(x)=2⋅x^2−1\) and \(t_j\) is a twiddle factor (the \(y\)-coordinate in the first round and \(x\) in later rounds). Then, fold into a random linear combination against \(λ_j\):

\[g_j = g_{j-1,0} + \lambda_j \cdot g_{j-1,1}\]

Commitment \([g_j]\) is a simple Merkle root. Under the Fiat-Shamir transformation, \(P\) also sends openings, i.e., Merkle branches for FRI queries.
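Here is a hedged sketch of one folding round on evaluation vectors; the pairing of a point with its reflection via array halves, and the name `fold`, are my simplifications rather than Stwo's or Plonky3's layout:

```python
# Fold g_{j-1} into g_j: pair each point with its reflection T_j, split into even/odd
# parts, and combine them with the layer weight lambda_j.
P = 2**31 - 1

def inv(a, p=P):
    return pow(a, p - 2, p)

def fold(prev_layer, twiddles, lam, p=P):
    half = len(prev_layer) // 2
    nxt = []
    for i in range(half):
        a, b = prev_layer[i], prev_layer[i + half]        # g_{j-1} at x and at T_j(x)
        even = (a + b) * inv(2, p) % p                     # g_{j-1,0}(pi(x))
        odd = (a - b) * inv(2 * twiddles[i] % p, p) % p    # g_{j-1,1}(pi(x))
        nxt.append((even + lam * odd) % p)                 # g_j = g_{j-1,0} + lambda_j * g_{j-1,1}
    return nxt
```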

Query phase

1. \(V\) samples \(s≥1\) queries uniformly from the domain \(D\)

For each query \(Q\), \(V\) considers its "trace" as the sequence of points obtained by repeatedly applying the squaring map and the transformations defined by the protocol. The initial query point \(Q_0=Q\) is transformed through several rounds, resulting in a series of points \(Q_j\) in different domains \(D_j\).

2. For each \(j=1,…,r\), \(V\) asks for the values of the function \(g_j\) at a query point \(Q_j\) and its reflection \(T_j(Q_j)\)

Given the commitment \([g_j]\), as in oracle access, \(V\) can ask the oracle for the values (openings) at query points. In other words, verify Merkle proofs at the points:

\[g_j(Q_j) \text{ and } g_j(T_j(Q_j))\]

—where at \(j=1\), \(T_j=J\) and for \(j>1\), \(T_j(x)=−x\).

3. \(V\) checks that the returned values match the expected values according to the folding rules

This involves checking that the even and odd decompositions are correct and that the random linear combinations used to form \(g_j\) from \(g_{j−1}\) are consistent.

Understanding DEEP Quotienting in STARK and PLONK Systems

Quotienting is the fundamental polynomial identity check used in STARK and PLONKish systems. Leveraging the polynomial remainder theorem, it allows one to prove the value of a polynomial at a given point. Vitalik's overview of quotienting is well on point, so I will focus on the DEEP-FRI over the circle group.

DEEP (Domain Extension for Eliminating Pretenders) is a method for checking consistency between two polynomials by sampling a random point from a large domain. In layman's terms, it lets FRI be secure with fewer Merkle branches.

In STARK, we use the DEEP method on a relation between the constraint composition polynomial (one that combines all the constraints) and the trace column polynomials, all evaluated on a random point \(z\). For more context, see this post and the ethSTARK paper.

The first part is DEEP algebraic linking: It allows us to reduce the STARK circuit satisfiability checks (for many columns and many constraints) to a low-degree test (FRI) on single-point quotients. By itself, DEEP-ALI is insecure because the prover can cheat and send inconsistent evaluations on \(z\). We fix this with another instance of quotienting—DEEP quotienting.

We construct DEEP quotients with a single-point vanishing function \(v_z\) in the denominator to show that a certain polynomial, say a trace column \(t\), evaluates to the claimed value at \(z\), i.e., \(\frac{t-t(z)}{v_z}\).

In classical STARKs, \(v_z\) is simply a line function \(X−z\). In circle STARKs, in addition to not having line functions, we also run into the problem that single-point (or any odd degree) vanishing polynomials don't exist. To get around this, we move to the complex extension, i.e., decomposing into real and imaginary parts:

\[\frac{t-t(z)}{v_z}=Re\left(\frac{t-t(z)}{v_z}\right)+i \cdot Im\left(\frac{t-t(z)}{v_z}\right)\]

With this quirk handled, we follow the standard procedure of constructing a composite DEEP quotient polynomial as a random linear combination. We use different random weights for the real and imaginary parts:

\[Q = \sum_{i=0}^{L-1} \gamma^i \cdot Re_z(t_i) + \sum_{i=0}^{L-1} \gamma^{L+i} \cdot Im_z(t_i) = Re_z\Bigg(\sum_{i=0}^{L-1} \gamma^i \cdot t_i \Bigg) + \gamma^L \cdot Im_z\Bigg(\sum_{i=0}^{L-1} \gamma^i \cdot t_i \Bigg)\]

💡
In the ethSTARK paper, the prover does batching using \(L\) random weights \(γ_1,…,γ_L\) provided by the verifier (affine batching). Here, the prover uses powers of a single random weight \(γ^1,…,γ^L\) (parametric batching).

Computing \(Q\) naïvely would incur overhead due to the complex extension, essentially doubling the work because of the real and imaginary decompositions. The authors solve this by exploiting the linearity in the above equation. The prover now computes:

\[Q = \big( Re(\tfrac{1}{v_z}) + \gamma^L \cdot Im(\tfrac{1}{v_z}) \big) \cdot (\bar{g} - \bar{v}_z)=\frac{ Re(v_z)-\gamma^L \cdot Im(v_z)}{ Re(v_z)^2 +Im(v_z)^2} \cdot (\bar{g}- \bar{v}_z)\]

—where \(\bar{g} = \sum_{k=0}^{L-1} \gamma^k \cdot t_k\) and \(\bar{v}_z = \sum_{k=0}^{L-1} \gamma^k \cdot t_k(z)\).
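To see where the right-hand side comes from, write \(v_z\) in terms of its real and imaginary parts and expand \(1/v_z\):

\[\frac{1}{v_z} = \frac{\overline{v_z}}{|v_z|^2} = \frac{Re(v_z) - i \cdot Im(v_z)}{Re(v_z)^2 + Im(v_z)^2} \quad\Rightarrow\quad Re\left(\frac{1}{v_z}\right) + \gamma^L \cdot Im\left(\frac{1}{v_z}\right) = \frac{Re(v_z) - \gamma^L \cdot Im(v_z)}{Re(v_z)^2 + Im(v_z)^2}\]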

Interestingly, Stwo and Plonky3 actually use different quotienting approaches: Plonky3 implements the method described in the paper and here, but Stwo chooses instead to use a 2-point vanishing polynomial, as described in Vitalik's note.

Finite Field and Extension Work in Circle STARK

In circle STARK, work is done over the finite field \(F_p\) and its extension \(E=F_{p^k}\), where \(k\) is the extension degree. For \(F_p\), we choose M31, but any other \(p+1\)-smooth prime will suffice. The extension field \(E\) used in Stwo and Plonky3 is QM31, a degree-4 extension of M31, a.k.a. the secure field.

In rare cases, it will be useful to work with the complex extension \(F(i)\), the field that results from \(F\) by extending it with the imaginary unit \(i\). Note that \(i\) may trivially be contained in some fields, e.g., since \(−1≡4 \pmod 5\), \(i\) in \(F_5\) is both \(2\) and \(3\) (because \(2^2 ≡ 3^2 ≡ 4 \pmod 5\)). For those fields where this isn't the case, such as \(F_3\), we can devise a quadratic extension, \(F_9=\left\{a+bi \mid a,b∈F_3\right\}\), which extends the original field in the same way complex numbers extend the reals.

💡
A more common notation is \(F[X]/(X^2+1)\), which means forming a field extension where \(X^2+1=0\). This results in a new field with elements of the form \(a+bi\) where \(a,b∈F\) and \(i\) is a root of \(X^2+1\), such that \(i^2=−1\).
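As a toy illustration (over M31, where \(−1\) is a non-residue since \(p ≡ 3 \pmod 4\), so \(X^2+1\) is irreducible), such an extension can be coded as follows. This is a sketch of my own, not the CM31/QM31 types used in Stwo or Plonky3:

```python
# F_p[X]/(X^2 + 1): elements a + b*i with i^2 = -1, here over the M31 base field.
P = 2**31 - 1

class Cplx:
    def __init__(self, a, b, p=P):
        self.a, self.b, self.p = a % p, b % p, p
    def __add__(self, o):
        return Cplx(self.a + o.a, self.b + o.b, self.p)
    def __mul__(self, o):
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return Cplx(self.a * o.a - self.b * o.b,
                    self.a * o.b + self.b * o.a, self.p)
    def __repr__(self):
        return f"{self.a} + {self.b}i"

i = Cplx(0, 1)
print(i * i)  # 2147483646 + 0i, i.e. -1 in F_p
```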
  • Trace values are over the base field, and the trace domain consists of points over the base field \(C(F)\).
  • The evaluation domain consists of points over the base field \(C(F)\).
  • All random challenges and weights for linear combinations are drawn from the secure field.
  • When computing DEEP quotients, we move from the base field to the secure field, while briefly using the complex extension for vanishing denominators.
  • Circle FRI works mainly on the secure field. However, query values are base field elements since we Merkle commit to values in the base field.

The results

That's cool and all, but let's see how all this translates into actual performance differences. Below are profiling charts comparing Plonky3's regular STARK over BabyBear with its circle STARK over M31.

The first thing to note is that the difference in prover time is indeed about 1.3x. You can see that the circle STARK spends less time committing to the trace, as it does so over a more efficient base field. Consequently, since computing quotients and FRI involves extension field work, these now account for a larger percentage of the total work.

Thanks to Shahar Papini for feedback and discussion.

References

  • [HLP24] Haböck et al. (2024) "Circle STARKs"
  • [HAB23] Ulrich Haböck (2023) "A summary on the FRI low degree test"
  • [ethSTARK] StarkWare Team (2023) "ethSTARK Documentation — Version 1.2"