BLS Validating Training Wheels

Training Wheels

Software is known to have bugs. A decentralised network must have mechanisms to mitigate, respond to, and fix bugs – or it will struggle to gain adoption.

For L2s with bridged funds, this presents an especially difficult challenge – the source of truth for assets is L1. If a bug is exploited and the funds leave the L1 portal contract, the rollup has no claim on them.

This problem is compounded when upgrades occur via public social consensus – the bug must be disclosed publicly before it can be patched.

This document proposes adding validating training wheels to the rollup, via a signature scheme, to give greater confidence in the security of the network during its first 12 months, especially while the ZK proving system is not yet deemed secure.

BLS Validating Nodes

Aztec is a ZK rollup: it uses a zero-knowledge proof to verify a state transition.

Bugs in the circuit logic or proving system are exploitable to produce a mismatch between the correct state transition and the proof being made – an attacker can supply a valid proof for an invalid state transition.

To mitigate the effects of bugs in the cryptography, the block transition function on L1 can be modified to also perform a BLS signature aggregation check from an honest majority of stakers. This effectively adds a type of consensus that can only agree or disagree with the ZK proving system.

BLS attestations will catch bugs in the proving system, the rollup circuits, and the public kernel, but will not catch bugs in the private kernel circuit, which runs on a user’s device.

Stakers should compute the state transition for a block by processing each transaction in order, according to the protocol specification (later on, by processing the ACIR code of the rollup / public transaction circuits). A diverse set of execution clients should exist to increase security here. NB: these are not proving clients, just execution clients.

Stakers attest to their computed state transition via a signature. This signature uses a homomorphically additive signature scheme like BLS for aggregation. The sequencer will collect attestations and aggregate them for transmission with a block.

Additional checks are performed by the L1 rollup contract to validate a block:

  1. A signature aggregation check using BLS. Stakers sign over the block transition.
  2. A check that the aggregated signature comes from signers holding more than 50% of the stake
  3. Validation of the zk proof as normal
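
As a rough illustration of check 2, the stake-weighted majority test might look like the sketch below. It is not the L1 contract logic: the `Staker` type, the `signed_stake` and `has_majority` functions, and the idea of identifying signers by index are all assumptions made for this example. The BLS verification itself (check 1) and the proof validation (check 3) are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Staker:
    """Hypothetical record for one staker; field names are illustrative."""
    name: str
    stake: int  # stake in arbitrary units


def signed_stake(stakers: list[Staker], signer_indices: set[int]) -> int:
    """Sum the stake of the stakers whose indices appear in the signer set."""
    return sum(s.stake for i, s in enumerate(stakers) if i in signer_indices)


def has_majority(stakers: list[Staker], signer_indices: set[int]) -> bool:
    """Return True when the signers hold strictly more than 50% of total stake."""
    total = sum(s.stake for s in stakers)
    # Integer comparison avoids floating-point rounding at the boundary.
    return 2 * signed_stake(stakers, signer_indices) > total


stakers = [Staker("a", 32), Staker("b", 32), Staker("c", 32), Staker("d", 32)]
print(has_majority(stakers, {0, 1}))     # exactly 50% of stake: not a majority
print(has_majority(stakers, {0, 1, 2}))  # 75% of stake: a majority
```

Comparing `2 * signed > total` with integers keeps the "more than 50%" boundary exact, which matters when stake is split evenly.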

Estimated Gas Costs

Geometry described a trick that reduces the gas cost of BLS verification to a fixed cost of ~16k gas plus ~150 gas per signer. This is achieved by pre-computing parts of the aggregated signing key to save computation at verification time. The aggregated key is pre-computed when stakers enter or exit the system.

Allowing for additional EVM overhead on top of the quoted numbers, a conservative gas cost for an aggregated signature from 1000 signers is ~1,000,000 gas.
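
To make the arithmetic concrete, the quoted figures can be plugged into a small calculation. This is only a back-of-the-envelope sketch: the constant and function names are made up for the example, and the numbers are the approximate ones quoted above, not measurements.

```python
BASE_GAS = 16_000      # quoted fixed verification cost (~16k gas)
GAS_PER_SIGNER = 150   # quoted marginal cost per signer
CONSERVATIVE_ESTIMATE = 1_000_000  # conservative figure used in this post


def quoted_bls_gas(num_signers: int) -> int:
    """Gas implied by the quoted figures, before any extra EVM overhead."""
    return BASE_GAS + GAS_PER_SIGNER * num_signers


quoted = quoted_bls_gas(1000)  # 16_000 + 150 * 1000 = 166_000
print(f"quoted: {quoted:,} gas")
print(f"headroom: {CONSERVATIVE_ESTIMATE / quoted:.1f}x")
```

For 1000 signers the quoted figures give 166,000 gas, so the ~1,000,000 estimate leaves roughly 6x headroom for overhead.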

This cost can be subsidised by higher block rewards while the validating training wheels are active.

Prior to network launch, the foundation will declare the minimum staking thresholds required to activate the network for launch.

Example Staking Launch Criteria

Stake required

Main-net = 32 ETH equivalent
Canary network = 5 ETH equivalent

Validators required

Main-net = 1000 validators
Canary network = 100 validators

For main-net, this means an attack would cost roughly $32M, assuming an honest majority.
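
The stake behind these example criteria can be worked out as follows. This is a sketch under stated assumptions: the function names are illustrative, and the ETH price is a placeholder chosen for the example, not a figure from this post, so the dollar amounts scale with whatever price is assumed.

```python
HYPOTHETICAL_ETH_PRICE_USD = 2_000.0  # placeholder price, not from the post


def total_stake_eth(validators: int, stake_per_validator_eth: float) -> float:
    """Total stake if every validator posts exactly the minimum."""
    return validators * stake_per_validator_eth


def majority_threshold_eth(total_eth: float) -> float:
    """An attacker must control strictly more than this much stake."""
    return total_eth / 2


mainnet = total_stake_eth(1000, 32)  # 1000 * 32 = 32_000 ETH
canary = total_stake_eth(100, 5)     # 100 * 5 = 500 ETH

for name, total in (("main-net", mainnet), ("canary", canary)):
    half = majority_threshold_eth(total)
    usd = half * HYPOTHETICAL_ETH_PRICE_USD
    print(f"{name}: total {total:,.0f} ETH, majority above {half:,.0f} ETH "
          f"(about ${usd:,.0f} at the placeholder price)")
```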

The additional checks imposed by BLS validating training wheels should only be active for a period of 1 year. Normal upgrades can optionally increase this time limit at the community’s discretion.

Processing Valid Blocks

The following scenarios are possible:

  1. If the staking network signature is invalid, the block is invalid.
  2. If the staking network signature is valid, but from less than X% of the stake, the block is invalid.
  3. If the staking network signature is valid and from more than X% of the stake, but the zk proof is invalid, the block is invalid.
  4. If the staking network signature is valid and from more than X% of the stake, and the zk proof is valid, the block is valid.
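
The four scenarios can be read as a single decision procedure. The sketch below is one possible encoding, not the contract's logic: the `Verdict` enum, the `classify_block` function, and its parameters are hypothetical, and the threshold X is left as a parameter because the post does not fix it. A signed stake of exactly X% is treated as insufficient here, which is an assumption.

```python
from enum import Enum


class Verdict(Enum):
    INVALID_SIGNATURE = "invalid signature"
    INSUFFICIENT_STAKE = "insufficient stake"
    INVALID_PROOF = "invalid proof"
    VALID = "valid"


def classify_block(
    signature_valid: bool,
    signed_stake_fraction: float,
    proof_valid: bool,
    threshold: float = 0.5,  # the "X%" from the list, as a fraction
) -> Verdict:
    """Map the three check results onto the four scenarios, in order."""
    if not signature_valid:
        return Verdict.INVALID_SIGNATURE   # scenario 1
    if signed_stake_fraction <= threshold:
        return Verdict.INSUFFICIENT_STAKE  # scenario 2 (exactly X% rejected)
    if not proof_valid:
        return Verdict.INVALID_PROOF       # scenario 3
    return Verdict.VALID                   # scenario 4


print(classify_block(True, 0.8, True))   # signature, stake and proof all pass
print(classify_block(True, 0.8, False))  # stakers agree but the proof fails
```

Only `Verdict.VALID` lets a block through; the other three values exist so a caller can tell which check failed.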

If no valid block is produced for X hours, the network will enter emergency mode.

Emergency Mode

Whilst the rollup is in emergency mode the following measures are activated:

  1. The validation threshold for signatures for a valid block is increased, e.g. from 50% to 75%.
  2. An emergency upgrade path is enabled, detailed below.
  3. Any active upgrade proposals cannot proceed until the rollup exits emergency mode.

If block production resumes, the rollup will exit emergency mode, but the emergency upgrade path will remain active.
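
One way to picture these rules is as a small state tracker. The sketch below is illustrative only: the `RollupState` class, its field and method names, and the use of plain timestamps in seconds are assumptions for the example, the stall limit stands in for the unspecified "X hours", and the 50% and 75% thresholds are the example values from the list above. Any later expiry of the emergency upgrade path is not modelled.

```python
from dataclasses import dataclass


@dataclass
class RollupState:
    last_valid_block_time: float
    stall_limit_seconds: float          # stands in for the "X hours"
    emergency_mode: bool = False
    emergency_upgrade_path: bool = False

    def signature_threshold(self) -> float:
        """Measure 1: a higher signing threshold while in emergency mode."""
        return 0.75 if self.emergency_mode else 0.5

    def normal_upgrades_allowed(self) -> bool:
        """Measure 3: active upgrade proposals wait out emergency mode."""
        return not self.emergency_mode

    def tick(self, now: float) -> None:
        """Enter emergency mode once no valid block has landed for too long."""
        if now - self.last_valid_block_time >= self.stall_limit_seconds:
            self.emergency_mode = True
            self.emergency_upgrade_path = True  # measure 2

    def on_valid_block(self, now: float) -> None:
        """Block production resumed: exit emergency mode."""
        self.last_valid_block_time = now
        self.emergency_mode = False
        # The emergency upgrade path deliberately stays active.


state = RollupState(last_valid_block_time=0.0, stall_limit_seconds=3 * 3600)
state.tick(now=4 * 3600)           # four hours with no valid block
print(state.emergency_mode, state.signature_threshold())
state.on_valid_block(now=5 * 3600)
print(state.emergency_mode, state.emergency_upgrade_path)
```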

Emergency Upgrade Path Details

Emergency Upgrade Path

The fast upgrade path is baked into the L1 contracts and allows for:

  1. Bypassing the normal upgrade path for critical bugs while emergency mode is active. Upgrades can change:
    • Rollup Logic
    • Verifier Address
    • Token Address
  2. Automatic deactivation after a hard-coded 7 days
  3. Pausing the rollup

Triggering a fast upgrade is governed by a faster consensus group than the slow upgrade path, e.g.:

  • a multisig
  • a security council with soulbound tokens
  • a permissioned validator set
  • tbc

Tradeoffs

Negatives

Forcing a majority of stakers to re-compute the state transition for each block will harm network throughput and create redundant work.

Further research must be done on the effect this will have on throughput. It should not be a permanent fixture.

Positives

In this model, a bug in the rollup circuit or proving system cannot be exploited without acquiring a majority of the stake.

Emergency mode can only be triggered by a bug, not at the will of something like a multisig, making it more resilient to:

  • wrench attacks
  • regulatory changes

Mitigation for bug types

| Type | Code | Impact | Exploit Timescale | BLS Training Wheels Mitigation |
| --- | --- | --- | --- | --- |
| Message Boxes | Solidity | Loss of L1 Funds | Hours | :x: |
| Rollup Solidity Bug | Solidity | Total Loss of Funds + Liveness Failure | Hours | :x: |
| Kernel Circuit Logic Bug | Noir / Barretenberg | Total Loss of Funds + Liveness Failure | 1 - 30 days | :thinking: |
| Verifier Bug | Solidity | Total Loss of Funds + Liveness Failure | 1 - 30 days | :white_check_mark: |
| Rollup Circuit Bug | Barretenberg / Noir | Total Loss of Funds + Liveness Failure | 1 - 30 days | :white_check_mark: |
| Public Kernel Circuit Bug | Barretenberg / Noir | Loss of Funds + Liveness Failure | 1 - 30 days | :white_check_mark: |

Love the idea of having an extra layer of security and I like the proposal. Still, for the sake of it, should we explore an approach where the stakers need to submit a BLS signature only in the event of an error?

What I’m thinking is setting up a finalization delay, like any optimistic network has (possibly a bit lower though), in which stakers can submit a vote for blocking what they recognize as an invalid transition (instead of a fraud proof, which is more difficult to implement). This vote would trigger the emergency mode in which an update with a patch can be rolled out.

Would it be acceptable to introduce this delay during the first months of the network? I reckon that it’ll hurt integrations with L1, but it does not incur any on-chain protocol overhead on the happy path.


After writing the above, I’m still not convinced about the delay, and I think your proposal is definitely better. I’m posting this anyway in case it spurs any other ideas.

I was thinking about adding a delay when I wrote this, but with an optimistic-style delay you lose the nice property we currently have of portals being able to execute messages from L2 quickly. This is a dealbreaker IMO, as it makes the UX way worse.

The one benefit of the delay approach though is it would be able to operate at a higher throughput. With the current validating nodes, you will be capped at ETH L1 style throughput while the training wheels are on as everyone has to execute everything…

How could we reduce the likelihood of having to engage emergency mode?

Like an intermediary mechanism between “no valid blocks are produced” and emergency mode activation. For instance, if the staking network rejects the zk proof from the elected prover, couldn’t a different prover step in to present another proof that the staking network might accept? (What could incentivize this prover to intervene – is altruism enough ser?)
The network could then progress with the new state update.

This could serve as a preliminary step before the activation of emergency mode, which, while beneficial, should be reserved for extremely critical situations (imho) such as a scenario where multiple provers persistently submit proofs that represent an invalid state transition.

To deter stakers from engaging in malicious activities (for example, signing without processing each transaction), what negative incentives are imposed on stakers, e.g. slashing conditions?

That’s pretty much what I had in mind so far.

And yea, just wanted to mention that the fact that the activation of emergency mode can only be triggered by a bug and cannot be intentionally influenced by something like a multisig is remarkable.

Thanks for reading!

Emergency mode would only engage if there had been several hours of missed slots, not on the first failure. Both of the leading sequencer selection proposals (Fernet and B52) define slot lengths of 60 - 600 seconds. A valid block must be presented in this time or the slot starts again.

In the case where the BLS validating training wheels and the ZK proof disagree, the network would miss a slot, resulting in a slight delay to user transaction confirmation times, but the network should resume processing blocks. If the network still can’t process blocks for several hours, emergency mode would be enabled, as the situation is more serious.

I can update the post to make this clearer, as I think it’s a great call out. Thanks!

Slashing conditions are more nuanced and need further research. Initial thoughts are:

Forcing stakers to attest that the supplied ZK proof is for the state transition they calculated, not just what the state transition is. Stakers could then be slashed if the L1 contract does not process a block as their attestation should have caught this.
(This would introduce additional latency and coordination issues as the builder would have to socialise the block proof to all attesters for signing and collect the signatures after block production.)

Could you combine BLS with non-governance so that non-governance social consensus could remove BLS training wheels at any time? (subject to 30 day time-delay)

Sorry for the delay here. Yes, you could combine this with any other governance system so that it can be removed.