[Proposal] Prover Coordination: Sidecar

Sidecar Proving Proposal

Summary

I propose that we facilitate a very basic commitment and slashing scheme to delegate and coordinate proving rights for a given Aztec block. The sequencer is expected to choose among a variety of 3rd party proving quotes, facilitated via an out-of-protocol marketplace. The chosen prover is signaled to the community via a staked proverDeposit on L1, paid for by the current sequencer. Proving then runs as a “sidecar” to the protocol, with the proof eventually published to Ethereum for verification and finalization. In the event that the block doesn’t get proven, the collateralized proverDeposit gets slashed.

Design goals

  1. Enable the community to iterate on proving marketplace designs outside of the scope of Aztec governance or upgradability.
  2. Enable the network to dynamically adjust its proving capacity, e.g. if a large portion of the proving network stops working at block N, or many wish to join, any number of new provers can start working at block N+1.

Details

Expanding Fernet’s proposal phase

In order for this proposal to distribute clear incentives to participants, it suggests that the sequencer add a tip and a percentage of block rewards that they will share with the prover of this particular block. The tip could be $ETH or any ERC-20 token. These would be defined and deposited, e.g. as a proverTip and a proverRewardShare, via the same L1 transaction in which sequencers put up block proposals. In the event that a sequencer with a higher VRF output reveals their data and their block proposal ends up becoming canonical, the losing proposal’s proverTip gets refunded.

Estimated duration: 3-5 Ethereum blocks
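As a rough sketch of what such a proposal-phase commitment might carry (TypeScript; beyond proverTip and proverRewardShare, all names are hypothetical, not a real Aztec interface):

```ts
// Hypothetical shape of a proposal-phase commitment. proverTip and
// proverRewardShare follow the post; the rest is illustrative.
interface BlockProposal {
  proposer: string;          // sequencer's Ethereum address
  vrfOutput: bigint;         // Fernet ranking value
  proverTip: bigint;         // escrowed tip ($ETH or an ERC-20)
  proverRewardShare: number; // fraction of block rewards shared, in [0, 1]
}

// A proposal's escrowed tip is refunded when a higher-ranked proposal
// is revealed and becomes canonical instead.
function shouldRefundTip(mine: BlockProposal, canonical: BlockProposal): boolean {
  return canonical.vrfOutput > mine.vrfOutput;
}
```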

Prover Commitment Phase

This proposal suggests adding another phase, after Fernet’s proposal and reveal phases, called the Prover Commitment Phase. During this phase, it’s expected that the sequencer participates in an out-of-protocol decision making process, provided via open source software that can be run on top of sequencer software (sort of like mev-boost).

By the end of the Prover Commitment Phase, any sequencer that believes they have a chance to win block proposal rights must signal via an L1 transaction a particular Ethereum address (EOA or otherwise), called the proverAddress, and specify a proverDeposit in a particular ERC-20 token for the right to prove. The prover deposit is necessarily large, e.g. $1,000 - $10,000+ of an ERC-20 token, however the exact amount is tbd. This is because the proverDeposit practically becomes the cost to reorg or stall the network for the duration of the block (e.g. 10 minutes), via the act of withholding or not producing proofs. For more information please refer to items #1 & #2 in the known issues and mitigations section.
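To make the commitment concrete, here is a minimal sketch of the record and slashing condition described above (TypeScript; beyond proverAddress and proverDeposit, the field names are assumptions):

```ts
// Illustrative commitment record; proverAddress and proverDeposit follow
// the post, everything else is assumed for the sketch.
interface ProverCommitment {
  sequencer: string;     // who posted the commitment on L1
  proverAddress: string; // EOA or contract delegated the right to prove
  depositToken: string;  // ERC-20 used for the deposit
  proverDeposit: bigint; // staked amount, slashable on failure
}

// The deposit is slashed in full if no valid proof lands on L1 before
// the proving window (e.g. ~10 minutes) elapses.
function slashAmount(c: ProverCommitment, proofVerifiedOnL1: boolean, windowElapsed: boolean): bigint {
  return windowElapsed && !proofVerifiedOnL1 ? c.proverDeposit : 0n;
}
```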

Who pays the proverDeposit?

TL;DR: the sequencer, given it’s their decision which marketplace or service provider to use.

The out-of-protocol decision making process would be up to the sequencers and opt-in. So, in a naive case, sequencers can build blocks of a size that they can self-prove during the proving window, and put down their own proverDeposits. It is likely that all clients would implement this as a fallback option, similar to geth and other proposer client implementations. It’s possible this is just a block of a handful of transactions, given proving costs. In a more sophisticated case, sequencers can participate in an out-of-protocol first-price auction, and have the participants of the auction put up the proverDeposits for the sequencers to publish to L1. It is expected that 3rd party proving marketplaces or services, such as Nil or Gevulot, compete directly in this process either via native integration or a relayer.
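As a sketch of that out-of-protocol selection step (TypeScript; the quote format and the lowest-price selection rule are assumptions, one of many possible marketplace designs):

```ts
// A sequencer collects quotes (e.g. from marketplaces such as Nil or
// Gevulot, possibly via relayers) and picks the cheapest one whose
// backer will fund the required proverDeposit.
interface ProverQuote {
  proverAddress: string;
  price: bigint;          // fee the prover asks for proving the block
  depositCovered: bigint; // portion of the proverDeposit the prover funds
}

function pickQuote(quotes: ProverQuote[], requiredDeposit: bigint): ProverQuote | undefined {
  return quotes
    .filter(q => q.depositCovered >= requiredDeposit)
    .sort((a, b) => (a.price < b.price ? -1 : a.price > b.price ? 1 : 0))[0];
}
```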

To restate: as defined, the current sequencer has discretion to decide who gets to prove the current block. This means that they have to put up the collateralized proverDeposit, which could get slashed if the sequencer makes a poor decision in delegating proving rights. If they do not put up a proverDeposit by the end of the Prover Commitment Phase, they lose their block production rights to the next highest ranking sequencer who does.

In practice, I imagine a few categories or personas of sequencer operators, who would potentially handle this prover deposit differently. For a few examples: a larger entity would likely vertically integrate and self-prove the current block, and therefore put up the proverDeposit themselves. An average consumer sequencer may not be interested in taking further slashing risk, and therefore may choose from a reasonably trusted list of proving services, relayers, etc. and require them to send the funds required to fulfill the proverDeposit out of protocol. A sophisticated sequencer may participate in the RFQ and, due to their confidence in proof fulfillment, take the slashing risk themselves and not worry about a proactive payment to cover the proverDeposit. There are likely other scenarios, but ultimately this should help paint the spectrum of options for paying the proverDeposit.

Estimated duration: 3-5 Ethereum blocks

Proving Phase

In this proposal, the actual “work of proving” happens outside of the Aztec network & protocol. Therefore there’s nothing else that needs to be done at the moment. The protocol simply waits a prespecified amount of time for the proof submission phase to begin.

Estimated duration: 40 Ethereum blocks (8 min)

Enforcing a maximum upper bound on proving times

One design consideration is whether to explicitly define the duration of the proving phase, or alternatively only enforce a maximum upper bound (e.g. 10 minutes). This would lead to variable or dynamic block sizes, proving durations, and block times, which means the number of sequencers per day is no longer well known in advance, but rather can only be roughly estimated or approximated. Notably this may be aligned with other proposals and/or expectations already.

Proof Submission Phase

After the proofs have been completed, they are assumed to be shared via the L2 p2p network. Anyone can submit these proofs to L1 during the submission phase. It is expected that this will be done by either the sequencer &/or the proving entity/network; however, it could be anyone. There is a token reward given to the first address to do so, to offset the L1 gas costs of submission. The specific quantity of the reward is tbd, but should reflect the cost of submission and the trouble of operating the relevant infrastructure.

Estimated duration: 3-5 Ethereum blocks

Notably, the duration here is generally dependent on the ability to submit 4844 blobs. It is possible that allowing a prover to upload proofs as they go, i.e. extending the proof submission phase or overlapping it with others, or some kind of proof batching, may be necessary.
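A sketch of the first-valid-submitter reward described above (TypeScript; the reward constant is a placeholder, since the actual quantity is tbd per the post):

```ts
// Whoever lands a valid proof on L1 first collects a fixed reward meant
// to offset gas costs; all names and values here are illustrative.
const SUBMISSION_REWARD = 10n ** 17n; // placeholder amount; actual value tbd

let firstSubmitter: string | undefined;

function onProofSubmitted(submitter: string, proofValid: boolean): bigint {
  if (!proofValid || firstSubmitter !== undefined) return 0n; // only the first valid submission pays out
  firstSubmitter = submitter;
  return SUBMISSION_REWARD;
}
```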

Rewards & Incentives

At a high level, rewards would work as follows: once the block has been validated on L1, the ERC-20 tokens (proverTip + proverRewardShare) that were “committed” during the proposal phase will be distributed via the Ethereum address defined earlier. The sequencer will get a majority of the newly minted block rewards, minus those that they’re sharing with the address that did the proving (which could be themselves, if vertically integrated). Lastly, there are additional incentives for the Ethereum address that published the completed proofs to L1, to offset submission costs. Again, this could be the randomly elected sequencer themselves, and in a fully vertically integrated case, they could earn 100% of rewards and MEV for a block.

Estimated duration: 1 Ethereum block.
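For illustration, a back-of-the-envelope sketch of the split (TypeScript; treating proverRewardShare as a fraction in [0, 1] is an assumption, as is the fixed-point handling):

```ts
// Once the block is verified on L1: the prover receives the escrowed tip
// plus their share of the newly minted block rewards; the sequencer keeps
// the remainder. Purely illustrative arithmetic.
function splitRewards(blockReward: bigint, proverTip: bigint, proverRewardShare: number) {
  const shareBps = BigInt(Math.round(proverRewardShare * 10_000)); // basis points
  const proverShare = (blockReward * shareBps) / 10_000n;
  return {
    prover: proverTip + proverShare,
    sequencer: blockReward - proverShare,
  };
}

// Vertically integrated case: the sequencer proves its own block, so both
// outputs accrue to the same address and it earns 100% of the rewards.
```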

Total estimated block times (for finality): 48 - 72 Ethereum blocks

Another fancy diagram

Confirmation or finality rules

Users could potentially get different levels of “transaction status updates” via user interfaces, or broadly “finality guarantees” at various phases throughout this lifecycle. I think it can be generally understood as:

Executing

  1. Transaction is signed &/or executed locally

Sent &/or Processing

  1. Transaction is in the p2p mempool
  2. Transaction is included in a proposed block
  3. Transaction is in a revealed block
  4. Transaction is in a revealed block with a valid proving commitment
  • (reasonably long delay for proving…)
  5. Transaction is included in a completed proof on the L2 p2p network

Published &/or Verifying

  1. Transaction is published to L1
  2. Transaction is verified on L1

Finalized

  1. Transaction is finalized on L1

Users in this case could get application-specific notices (depending on trust assumptions & user experience needs) of their transactions making progress within 30 seconds to 1 minute, then further confirmation alerts as the probability of their transaction’s successful finalization increases throughout the other points in the block production lifecycle that may or may not be relevant. In extreme cases, users and application developers may want to wait until the rollup’s block has been verified “deep enough” in the Ethereum blockchain to accommodate potential L1 reorgs as well, e.g. 1-2 further epochs.
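One way a wallet or application might encode these progressive statuses (TypeScript; the names are illustrative, not any Aztec API):

```ts
enum TxStatus {
  Executing,        // signed and/or executed locally
  InMempool,        // in the p2p mempool
  InProposedBlock,  // included in a proposed block
  InRevealedBlock,  // in a revealed block
  ProvingCommitted, // revealed block with a valid proving commitment
  Proven,           // included in a completed proof on the L2 p2p network
  PublishedToL1,    // proof published to L1
  VerifiedOnL1,     // proof verified on L1
  Finalized,        // "deep enough" in Ethereum, e.g. 1-2 further epochs
}
```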

Comparisons

To existing Aztec proposals

Unlike Palla’s suggestion of a cooperative prover network, this design specifically attempts to create a competitive & dynamic economic marketplace for proving rights. The same is true relative to anon’s proposal, which implements an in-protocol “bonded” auction. Generally, the design goal of Sidecar is to ensure that the latest proving marketplace dynamics can be iterated on outside of the protocol, similar to MEV auctions, and can continue at the rapid pace of innovation within the industry.

To other projects

This design is similar to Taiko.

Some high level differences (among many others) between this & Taiko’s model include:

  • Sidecar does not (currently) attempt to define the specific interfaces that must be implemented by “Prover Pools” (or those participating in the prover commitment phase/out of protocol auction). This could be a nice addition, however, and is worth further consideration!
  • Sidecar suggests a generally different economic and incentive model. In Taiko, in the event that proofs are not completed, 3/4 of their “bond” or insurance collateral is permanently burned. In this model, a specific staked proverDeposit is slashed and distributed to other network participants (the dynamics of which are slightly tbd, but generally redistributed to the sequencers affected by the likely reorg that was caused).
  • Their documentation is worth reading.

Outstanding questions + known issues & mitigations

  1. Should the right to prove be all or nothing?
  • Currently the RFQ specifically delegates the entirety of proving to a single Ethereum address (i.e. a prover or proving marketplace). It seems potentially desirable to be able to dynamically configure the portions of the proof tree (or subtree) that individual “provers” work on. But again, this could be done out of the protocol…
    • If going down this path, it becomes significantly closer to a cooperative proving network or the original B52 design of providing a block transcript. This leads to more coordination complexity and the potential feasibility of a withholding attack.
      • I would consider this a viable candidate for an alternative proposal.
  2. High price of proverDeposits
  • This is related to the above issue as well. Currently this proposal delegates 100% of a block’s proving rights to a single Ethereum address. It is then up to that entity to further divide and conquer the necessary work out of protocol, to have a very large set of machines that can produce the necessary proofs, or alternatively, to build very small blocks. One option to reduce the cost is to standardize the size of a block and the number of subtrees that need to be proven, e.g. N subtrees would require N commitments by the end of the prover commitment phase, reducing the cost by the same factor. However, as I currently understand it, this increases coordination complexity and the feasibility of a (malicious or unintentional) withholding attack that could cause a reorg. Therefore I currently think a single out-of-protocol delegation as currently defined may be sufficient to begin with, and hope others iterate on the initial idea.
  3. Explicitly defining what happens in the event of a reorg.
  • The short answer is that the network would invalidate all blocks that were proposed during the reorgWindow and start a new proposal phase. The sequencers affected would be proportionately distributed the slashed stake, from the proverDeposit provided by the sequencer who caused the reorg (a sketch of this redistribution appears after this list).
  • TO DO: Need to spend some time on diagrams and explaining the possible scenarios.
  4. Impact on upgrade mechanisms
  • This design intends to ensure that the proving systems and marketplaces can continue iterating outside of the scope of the Aztec protocol, and therefore the Aztec upgrade mechanism, which is great and should satisfy that specific requirement. However, in the event that Aztec chooses to implement an upgrade mechanism like The Republic or The Return of the Jedi, I’d highlight that the “prover RFQ &/or commitment phase” should remain somewhat upgradable (as much as possible, likely via a separate contract referenced in the version registry). This is in the event that out-of-protocol proving RFQs have unintended or unknown consequences, cryptography innovation slows down, or it becomes desirable to further enshrine for whatever hypothetical reason (e.g. network and incentive alignment).
  5. Is it worth normalizing &/or predefining block times?
  • As discussed briefly above, “the protocol could not define the duration of the proof submission phase but rather a maximum upper bound. This would lead to variable or dynamic block sizes, proving durations, and therefore variable block times.” There are some tradeoffs here worth further consideration. Generally, dynamic blocks could lead to smaller blocks that could be produced quicker, and therefore potentially provide better user experience. It additionally may better reflect the ability to dynamically adjust and scale compute resources. Enforcing via an upper bound seems like a reasonable solution, and application developers can rely on that as a fallback/guarantee of when to query the network and update their state. This is something that should likely be considered in network simulations or modeling.
  • Another valid design consideration would be to remove preconfirmations, and have new proposal phases begin immediately after the last block is confirmed.
  6. The obvious question… enshrine the marketplace?
  • This design is predicated on some kind of out-of-protocol marketplace &/or decision making process for the sequencer. This proposal specifically does not attempt to design a marketplace! But many will likely ask: why not? And how do you know an out-of-protocol marketplace will be sufficient? My answer to this, generally, is… I don’t currently know what a good marketplace design would be, and if we start working towards implementing Sidecar, we can continue working on marketplace designs all the way up to mainnet (or beyond). We follow the spirit of Vitalik’s “minimum viable enshrinement”.
  7. Factoring proverDeposits into sequencer scores
  • The current sequencer is ranked based on the VRF output they generated in Fernet. It could make sense to consider the proverDeposit as a ranking/scoring factor as well, to enable a more competitive marketplace for the rights to block production, therefore potentially increasing the cost of stalling or reorging the network.
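A sketch of the redistribution mentioned in issue 3 above (TypeScript; the equal per-sequencer weighting is an assumption, since the exact dynamics are tbd):

```ts
// The slashed proverDeposit is split among the sequencers whose blocks
// were invalidated by the reorg the failed prover caused.
function redistributeSlash(slashedDeposit: bigint, affectedSequencers: string[]): Map<string, bigint> {
  const perSequencer = slashedDeposit / BigInt(affectedSequencers.length); // assumes a non-empty list
  return new Map(affectedSequencers.map((s): [string, bigint] => [s, perSequencer]));
}
```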

References

  1. https://boost.flashbots.net/
  2. https://nil.foundation/
  3. https://www.gevulot.com/
  4. [Proposal] Cooperative proving network for Fernet
  5. [Proposal] Provers: Bonded Prover Auction
  6. Proving Taiko blocks – Taiko
  7. Request for Proposals: Upgrade Mechanisms
  8. The Republic: A Flexible, Optional Governance Proposal with Self-Governed Portals
  9. The Return Of The Jedi

I like the proposal in that it keeps the protocol itself super simple. Two questions/concerns from me:

  • How does the integration with a marketplace work? Given there’s custom logic required for the marketplace to put down the deposit, feels like the marketplace itself should be handling the integration. Or do we expect third parties to act as intermediaries, acting as the glue between Aztec and the marketplace, and putting down the deposit themselves?

  • Wouldn’t this lead to prover centralization? Even if there are multiple integrations implemented, the proving marketplace with the lowest price will be consistently chosen by sequencers, which means any other marketplaces won’t have incentives to keep these integrations running and up to date. And if the single winning marketplace is not 100% decentralised and it suddenly decides to censor Aztec, we’ve got a problem on our hands.


Having sequencers set their prover prices ahead of time requires them to have sophisticated market pricing information. This causes centralization pressure.

Having sequencers pay the proverDeposit means they have to run their own auction. This is a sophisticated activity, and causes centralization pressure.

Requiring two L1 txns from sequencers will exacerbate the Fernet L1 gazumping problem.

Because the prover has nothing at stake (in-protocol), the cost to bribe them not to produce a proof is only proverTip + proverReward, rather than the desired proverDeposit. Edit: This will be handled by the sequencer during their auction as a sophisticated activity.

You have to burn the slashed proverDeposit rather than redistribute it, otherwise colluding sequencers can reorg for less than the proverDeposit.

Factoring proverDeposits into sequencer scores would create centralization pressure on sequencers.


Sidecar Prover Coordination & Gevulot

We on the Gevulot team wholeheartedly agree with Cooper’s assessment: it is impossible to control where proving actually takes place. In any prover network, nodes could simply be using the same centralized API in the backend. Simultaneously, reliance on any single solution, enshrined or external, is risky. Given this dynamic, we support Cooper’s proposal to not attempt to dictate how proof generation happens, but rather to create economic incentives and disincentives for participants to generate proofs as they see fit.

We would also encourage providing reasonable default options to ensure minimal cognitive overhead for sequencers and propose Gevulot as one such option (but it should not be the only one).

In the rest of this post, we’ll dive into what Gevulot is, how it can provide cheap and performant decentralized proving for Aztec, and our timelines going forward.

What is Gevulot?

Gevulot is a generic decentralized prover network. More specifically it is a new L1, which allows users to permissionlessly deploy arbitrary provers and verifiers as on-chain programs and run them to complete proving workloads, while still being cost-effective and performant enough to genuinely compete with centralized options.

In Gevulot, prover programs are only executed by a small, configurable subset of prover nodes and later verified by a subset of validators (primarily for the purpose of distributing rewards). The programs themselves are ELF binaries and run in a unikernel. More specifically, Gevulot uses the Nanos unikernel, which supports 8+ languages (incl. Rust & C++), multithreading, and CPU & GPU execution (support for new hardware such as FPGAs or ASICs can be added later), all in a highly performant sandboxed environment.

You can read more about the network design and find instructions for compiling and running a prover in Nanos here. We provide example provers for Marlin, Ed25519 and Filecoin SNARKs.

Gevulot Roadmap

Q1 2024 - Permissioned Devnet

Coinciding with Aztec’s plans to have decentralized sequencing on testnet by Q1, we will be launching a permissioned devnet for Gevulot. The devnet will operate with 4-5 nodes run by prominent ZK acceleration teams, validators, and the Gevulot core team.

The purpose of the devnet is to:

a) Facilitate testing of the core functionality, including proving, verifying, networking, orchestration, etc.
b) Allow external teams and projects to begin integration and testing.
c) Benchmark and optimize performance.

The devnet will allow users to:

  1. Deploy arbitrary provers and verifiers.
  2. Run proving workloads to generate proofs.
  3. Track workload status, cancel and retrieve outputs via an API.
  4. Store proofs.

Q3-Q4 2024 - Permissionless Testnet

We are aiming to launch a permissionless testnet during the second half of 2024, which will include all core functionality for validation, proving, and the incentives therein, in permissionless form. You can refer to the gevulot-overview.md document on our github regarding the planned scope for testnet.

TBD 2025 - Mainnet

We are targeting a mainnet launch in 2025.

Gevulot & Aztec

The first step to integrating into the Aztec network will be to deploy the finalized Aztec prover on Gevulot’s devnet, followed by granting permissions to sequencers.

Generating proofs for Aztec requires both network data and a financial stake. Gevulot simply generates proofs with the delivered inputs, so outsourcing proof generation to Gevulot can manifest in two primary ways:

  1. The sequencers themselves stake, outsource the proof generation to Gevulot and then post the completed proof to L1.
  2. An intermediary entity operates a proxy, which offers the outsourcing of proof generation to Gevulot and then posts the completed proof to L1.

We recommend the first option, where outsourcing to Gevulot could eventually be an option implemented directly in the sequencer node along with other options. However, the inherent permissionless structure of our system means that the community can also choose to embrace the second option and deploy proxy nodes for this purpose. If this is the direction the community wants to go, we are happy to help in implementing such a proxy node.

Incentives and Slashing

Gevulot does not affect Aztec incentives and those can be configured independently by the Aztec Network.

From the perspective of Aztec, the party taking responsibility for proving would be the sequencer or the proxy node (who can be slashed and rewarded), who in turn is relying on Gevulot’s incentives, which are described below.

Gevulot prover nodes will be staked and we employ a number of incentive mechanisms to ensure nodes perform their tasks in a timely manner:

  1. Slashing for liveness failures: In Gevulot, users define the maximum running time of a program. If the program does not produce a proof during that running time, the provers get no reward and fees are burned, but they do not get slashed. They only get slashed if they do not respond at all.

  2. Rewards for completing proofs: In Gevulot, provers are always rewarded if they complete a proof in the running time specified by the user, both via network rewards as well as user fees. In executing a proving workload, the user defines the desired amount of redundancy (how many provers will run the program) and the running time. The desired number of provers is then chosen at random from the active prover set, who compete to complete the workload. The fastest prover gets the largest reward, the second fastest a slightly smaller reward, and so on (see the sketch after this list). Distributing the workload randomly and rewarding all provers that manage to contribute a proof in the specified running time is extremely important to curb the natural tendency towards centralization.

  3. Active prover set dynamics: In addition to the reward structures described above, we will likely be implementing certain rules for provers who want to remain in the active prover set. We will be using a kind of proof-of-work that provers will need to complete periodically in order to join or remain in the prover set. This proof-of-work would test whether the provers are running the hardware they claim to be running and are consistently available. For example, a prover that wants to join the prover set and says they have a GPU that meets the requirements would need to complete a proof which can only be achieved in the time constraints if they actually have a GPU. Here again, if a prover fails the proof-of-work, they cannot join or will be dropped from the prover set and face a cooldown period before they can unlock their stake.
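To illustrate the rank-ordered payout described in point 2 (the sketch referenced above), here is one possible decay (TypeScript); the halving schedule is an assumption, not Gevulot’s actual reward curve:

```ts
// Fastest prover earns the most, each subsequent finisher slightly less.
// Rounding means shares may not sum exactly to totalReward; fine for a sketch.
function rewardByRank(totalReward: bigint, finishersFastestFirst: string[]): Map<string, bigint> {
  const weights = finishersFastestFirst.map((_, i) => 1 / 2 ** i); // 1, 1/2, 1/4, ...
  const totalWeight = weights.reduce((a, b) => a + b, 0);
  return new Map(finishersFastestFirst.map((p, i): [string, bigint] =>
    [p, (totalReward * BigInt(Math.round((weights[i] / totalWeight) * 1_000_000))) / 1_000_000n]));
}
```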

Resilience and Recovery

From Aztec’s perspective, the best way to ensure resilience and recovery is to allow for a wide range of proving options, but slash parties who are unable to complete proofs. This will naturally drive participants to find the right amount of redundancy for resilience. Participants can then choose to implement this redundancy at the Aztec level, by choosing multiple parties/services to generate a proof, or on Gevulot, by increasing redundancy.

In terms of Gevulot’s recovery ability, we are designing the system to always generate a proof if it is possible within the time constraints provided by the user (running time of the program). If a prover declines a workload, that user’s spot in the workload allocation becomes a free-for-all where any prover node can then submit a proof in that prover’s place. This ensures that the end-user always gets the redundancy they pay for or more.


Your post’s concept kinda reminds me of Nakamoto’s winner-take-all model in Bitcoin, where participants compete to solve cryptographic puzzles.

The main advantage, as @spalladino highlighted, is simplicity imo. Competitive proving is straightforward to implement: provers just need to submit the proof, and the validating bridge simply waits for the proof to be computed, accepts it, and then goes to the next round.

However, this approach presents some challenges imo:

Centralization risks: without incorporating randomness, there’s a tendency toward centralization. In Bitcoin, due to the puzzle’s randomness, your odds of solving it roughly correlate with the hash power you control in the network. In contrast, in a proof system where the most efficient prover always wins, “if you ain’t first you are last”. Long term, if the same prover consistently wins, it could disincentivize others (why would they keep burning energy), leading to a very limited number of provers in the network.

Limitations with sequential proving: sequential proving has inherent challenges. If the proving time exceeds the block time, then time to finality can increase superlinearly with block height. With the number of blocks growing, you might end up with a system that is slower and slower in terms of finalization (imagine becoming slower than optimistic rollups to finalize :smiling_face_with_tear:). Furthermore, there’s no straightforward method to boost execution capacity during high-demand periods.

One potential solution is, rather than sequentially assigning a single prover and waiting for the proof submission before moving to the next, you could simultaneously assign multiple provers to different blocks. They can then compute their proofs in parallel and, once done, aggregate these proofs to put them back on chain/Ethereum collectively.
Note that implementing a fallback mechanism in this approach could be relevant.
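A sketch of that parallel-then-aggregate flow (TypeScript; prove and aggregate are placeholders, not real APIs):

```ts
type Proof = { blockHeight: number; data: Uint8Array };

// Assign provers to several blocks at once, prove concurrently, then
// aggregate into a single proof to post to Ethereum.
async function proveBlocksInParallel(
  heights: number[],
  prove: (height: number) => Promise<Proof>,
  aggregate: (proofs: Proof[]) => Proof,
): Promise<Proof> {
  const proofs = await Promise.all(heights.map(prove)); // provers work in parallel
  return aggregate(proofs);                             // one aggregated proof lands on chain
}
```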

Lastly, while cooperative proving can enhance parallelization, the feasibility of doing so with competitive proving seems more complex/uncertain.
