How to do Contract Upgrades, Feb 2024 edition

We want to allow developers to make their Aztec contracts upgradeable. This means they’ll be able to change the code and behavior of their contracts while keeping the same contract address and state. For example, a token contract might add a new mechanism for transfers while retaining all prior balances.

There are two alternatives under consideration to achieve this:

  • delegatecall and proxies, which is how Ethereum upgrades work. These come with a number of known issues, and are discussed in detail here.
  • enshrining upgrades themselves, which unlike Ethereum we can do as we have not finalized the protocol. The overall approach and open questions are discussed in this other document.

This document provides an overview of both approaches to allow us to come to a decision. You can think of it as a “TL;DR: how do we decide”.

delegatecall and Proxies

Pros

  • A shared mutable storage solution (e.g. SlowJoe) for storing class ids is not enshrined, so we could migrate to a better solution in the future.
  • Contracts that do not require “concurrent” access (e.g. account contracts) can use vanilla private storage for storing their class ids, at the expense of an extra commitment and nullifier per call.
  • It’s a similar model to Ethereum, though said model is not well-liked or easy to understand.

Cons

  • Need to develop and enshrine fallback functions (3 flavors of it), just for upgrades.
  • Need to develop and enshrine delegatecall (3 flavors of it), just for upgrades.
  • Proxies will remain obscure, an existing problem in Ethereum.
  • Extra function call in all upgradeable contracts (longer proofs, more expensive txs)
  • We still need some kind of shared mutable storage, same as with enshrined upgrades, and inherit all of its undesired side-effects (e.g. ‘downtime’ close to an upgrade).
  • Tooling will be complex in order to deal with obscurity (e.g. in block explorers) and detect upgradeability.
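To make the proxy pattern concrete, here is a toy Python model of it. The names (`Proxy`, `fallback`, `REGISTRY`, `token_v1`) are hypothetical and this is not how Aztec or the EVM implement delegatecall; it only shows the idea that every call first hits the proxy’s fallback, which then runs the implementation’s code against the proxy’s own storage, so state survives an upgrade.

```python
# Toy model of an upgradeable proxy; all names are hypothetical, not Aztec APIs.
from dataclasses import dataclass, field
from typing import Callable, Dict

# A "contract class" is modelled as a table of functions acting on a storage dict.
ContractClass = Dict[str, Callable[[dict, int], int]]


def mint(storage: dict, amount: int) -> int:
    storage["supply"] = storage.get("supply", 0) + amount
    return storage["supply"]


def burn(storage: dict, amount: int) -> int:
    if storage.get("supply", 0) < amount:
        raise ValueError("insufficient supply")
    storage["supply"] -= amount
    return storage["supply"]


# Class registry: v2 adds a new function while keeping v1's behaviour.
REGISTRY: Dict[str, ContractClass] = {
    "token_v1": {"mint": mint},
    "token_v2": {"mint": mint, "burn": burn},
}


@dataclass
class Proxy:
    class_id: str  # implementation pointer, kept in the proxy's own storage
    storage: dict = field(default_factory=dict)

    def fallback(self, fn: str, arg: int) -> int:
        # Every user call lands here first (the extra call listed in the Cons),
        # then "delegatecalls": the implementation's code runs on the proxy's storage.
        implementation = REGISTRY[self.class_id]
        if fn not in implementation:
            raise AttributeError(f"{self.class_id} has no function {fn!r}")
        return implementation[fn](self.storage, arg)

    def upgrade(self, new_class_id: str) -> None:
        # A real proxy would access-control (and likely delay) this.
        self.class_id = new_class_id
```

For example, minting 10 through a `token_v1` proxy, upgrading to `token_v2` and then burning 3 leaves a supply of 7: the balance is kept, only the code changed.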

Open Questions

  • Can we get rid of top-level unconstrained contract functions? If so, we’d only have two execution modes (private and public).

Enshrined Upgrades

Pros

  • Easy upgrade setup - just call the upgrade opcode.
  • Simple upgradeability detection.
  • Minimal runtime overhead.

Cons

  • Account contracts leak the fact that they are account contracts on an upgrade (if they reuse contract classes).
  • Requires a new upgrade opcode (could be pushed for later if we are fine with no detection and broken SlowJoe invariants).
  • Enshrined SlowJoe means all calls to upgradeable contracts have a max_block_number, restricting long proof use cases.
  • We need to choose a global delay for upgrades, or make it customisable per contract which makes SlowJoe more complex.
  • Slightly more complexity on the Kernel circuit for dealing with upgradeable contract classes.
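The max_block_number con follows from how a SlowJoe-style delayed value works. The Python sketch below assumes a single global delay D of 10 blocks and a three-field layout (pre, post, block_of_change); both are illustrative assumptions, and `DelayedClassId` is a hypothetical name. A private read against a historical block is only valid until the next possible class change, which is what forces every call to carry an expiry.

```python
# Minimal SlowJoe-style sketch; the delay value and all names are assumptions.
from dataclasses import dataclass
from typing import Tuple

UPGRADE_DELAY = 10  # hypothetical global delay D, in blocks


@dataclass
class DelayedClassId:
    pre: int              # class id in force before block_of_change
    post: int             # class id in force from block_of_change onwards
    block_of_change: int

    def value_at(self, block: int) -> int:
        return self.post if block >= self.block_of_change else self.pre

    def schedule_upgrade(self, new_class_id: int, current_block: int) -> None:
        # Whatever is in force now becomes "pre"; the new id applies after D blocks.
        self.pre = self.value_at(current_block)
        self.post = new_class_id
        self.block_of_change = current_block + UPGRADE_DELAY

    def read_in_private(self, historical_block: int) -> Tuple[int, int]:
        """Return (class_id, max_block_number) for a proof built against this
        storage as of historical_block (self is assumed to be that snapshot)."""
        class_id = self.value_at(historical_block)
        if self.block_of_change > historical_block:
            # A change is pending: the read is valid only until just before it lands.
            max_block_number = self.block_of_change - 1
        else:
            # Nothing pending: an upgrade scheduled later lands at least D blocks out.
            max_block_number = historical_block + UPGRADE_DELAY
        return class_id, max_block_number
```

With D = 10, a read at block 5 with no pending change is valid until block 15; after scheduling an upgrade at block 20, a read at block 25 still sees the old class but expires at block 29.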

Open Questions

  • Can we provide better privacy for Account Contracts? e.g. hashing with a secret salt
  • How much do we enforce network-wide upgrade time coordination?
  • Can we get away with not adding a new opcode?
  • How do we mark non-upgradeable contracts, so that these don’t require the max_block_number check? e.g. a metadata flag in the preimage

Our Suggestion

Enshrined upgrades feel like the cleaner solution. The added complexity to the kernel circuit is not significant, and the AVM team has said that adding an upgrade opcode would not have a big impact. The opposite is true for proxies, where both the fallback function and delegatecall require changes to multiple components, and the code of the proxy itself is relatively tricky.

I agree that enshrined upgrades feel cleaner.

For marking (non-)upgradeable contracts, I’d vote for an upgradeable flag in the contract instance preimage. Do we see a downside to that?
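A rough Python sketch of what such a flag could buy the kernel. The preimage fields, the use of sha256 as the address hash, and the names `ContractInstance` and `kernel_check_class` are all assumptions for illustration. Because the flag is committed to by the address, a non-upgradeable contract can be proven to run its original class with no max_block_number at all.

```python
# Sketch of an upgradeable flag in the instance preimage.
# The field list and hash are assumptions, not the protocol's actual definition.
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContractInstance:
    salt: int
    deployer: int
    original_class_id: int
    initialization_hash: int
    upgradeable: bool  # hypothetical flag committed to by the address

    def address(self) -> str:
        # sha256 stands in for whatever hash the protocol actually uses.
        fields = (self.salt, self.deployer, self.original_class_id,
                  self.initialization_hash, int(self.upgradeable))
        return hashlib.sha256(":".join(map(str, fields)).encode()).hexdigest()


def kernel_check_class(instance: ContractInstance, address: str, executed_class_id: int,
                       delayed_read: Optional[Tuple[int, int]] = None) -> Optional[int]:
    """Return the max_block_number the tx must carry, or None if it needs none.

    delayed_read is the (class_id, max_block_number) pair read from the delayed
    upgrade storage; it is only needed for upgradeable contracts."""
    if instance.address() != address:
        raise ValueError("instance preimage does not match address")
    if not instance.upgradeable:
        # The class can never change, so no expiry is needed.
        if executed_class_id != instance.original_class_id:
            raise ValueError("wrong class for non-upgradeable contract")
        return None
    if delayed_read is None or delayed_read[0] != executed_class_id:
        raise ValueError("executed class does not match current class id")
    return delayed_read[1]
```

Note that flipping the flag changes the address, so a contract cannot silently switch between the two modes after deployment.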

I also like the suggestion for an enshrined minimum D.

Do we have any thoughts about emergency updates that will just invalidate outstanding transactions?

We haven’t carefully considered all scenarios, but I don’t think so. You’d prove in private that this flag is set, which lets you interact with the contract without setting max_block_number. Public would likely require this data to be available somewhere, but the public use case is much simpler (because there’s no max_block_number, you can just read the current class id, etc.).

My personal thoughts are that ideally you would not have emergency updates, and instead have some other less impactful mechanism (e.g. a pause) that you’d use in an emergency scenario. An upgrade requires lots of careful analysis and testing, and is ill-suited for these purposes. I expand a bit more on this topic in this talk at the DeFi Security Summit.

That said, we don’t have a great design for immediate pausing either. Most designs seem to result in leaking which contract one interacts with, and tackling this with an anonymity set of ‘pausable contracts’ has some other issues.

Another great write-up, thanks!

Re one of the Enshrined Upgrades cons:

  • Slightly more complexity on the Kernel circuit for dealing with upgradeable contract classes.

Something I haven’t gotten comfortable with yet is the suggestion that the kernel would be able to directly read a smart contract’s state. It still feels ugly to my brain, and seems to violate the neat encapsulation that only a contract should be able to read its own state.

Perhaps we’re already violating this encapsulation in limited ways:

  • e.g. enabling contracts to read historic state of any contract via the archive tree (tangential note to self: we need the PXE to prevent access to other contracts’ private states via the archive tree);
  • e.g. enabling the kernel to read the nullifier preimages that are created by the enshrined “contract deployment contracts”. Although, in this latter case, it’s the kernel reading the state of an enshrined contract, which feels tidier and acceptable.

Are there any soothing counterarguments that @nventuro or @spalladino can provide, that assuage the ugliness?

Well, to begin with the kernel won’t have access to all public state, only the three slots that we allocate for this. We could (should?) further constrain this by siloing them into a separate ‘system’ storage that would only be accessible by the kernel and not the contract (except via an upgrade opcode).
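One way the siloing into ‘system’ storage could look, sketched in Python under loud assumptions: the domain tags, the sha256-based slot derivation and the `PublicState` methods are hypothetical, not the actual public state tree. The point is that kernel-only slots are derived under a different domain than any slot contract code can compute, so a contract can neither read nor overwrite them.

```python
# Sketch of domain-separated "system" storage; names and hashing are assumptions.
import hashlib
from typing import Dict

APP_DOMAIN = b"app"
SYSTEM_DOMAIN = b"system"  # hypothetical separator for kernel-only slots


def silo_slot(domain: bytes, address: int, slot: int) -> str:
    # Different domains never yield the same leaf key (barring hash collisions).
    data = domain + b"|" + address.to_bytes(32, "big") + slot.to_bytes(32, "big")
    return hashlib.sha256(data).hexdigest()


class PublicState:
    def __init__(self) -> None:
        self.tree: Dict[str, int] = {}

    # Reachable from contract code (ordinary storage reads and writes).
    def contract_write(self, address: int, slot: int, value: int) -> None:
        self.tree[silo_slot(APP_DOMAIN, address, slot)] = value

    def contract_read(self, address: int, slot: int) -> int:
        return self.tree.get(silo_slot(APP_DOMAIN, address, slot), 0)

    # Reachable only from the kernel or the upgrade opcode.
    def system_write(self, address: int, slot: int, value: int) -> None:
        self.tree[silo_slot(SYSTEM_DOMAIN, address, slot)] = value

    def system_read(self, address: int, slot: int) -> int:
        return self.tree.get(silo_slot(SYSTEM_DOMAIN, address, slot), 0)
```

Writing slot 0 from both the contract and the system side leaves two independent values, so the kernel’s class id slots never collide with app storage.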

And as you point out, contracts can already access historical state, and everyone can access all public state: the only limitation is that contracts cannot access current state.


As a side note, I’m personally not a fan of the limitation of not being able to read external state: it prevents creating specialized getters after the fact, and makes each contract responsible for providing getters for all foreseeable use cases at deployment time. Inevitably some are left out, resulting in hacks, excessive transaction costs, etc.

To provide a real life example, the Balancer Vault does not provide a getter for its reentrancy guard. Reading this value was considered unnecessary and made bytecode larger, and we were already at the bytecode size limit. Two years later, a vulnerability was found for which the only mitigation was to read the reentrancy guard. The fix is quite ugly: contracts basically try to trip the reentrancy guard, and if they detect a revert they conclude the guard was indeed set.
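The probing workaround described above can be modelled in a few lines of Python. This is a toy: `Vault`, `swap`, `guarded_noop` and `is_guard_set` are hypothetical names and do not mirror the Balancer Vault’s real interface. The checker calls any function protected by the same guard and treats a revert as proof that the guard is currently set.

```python
# Toy model of detecting a private reentrancy guard by trying to trip it.
from typing import Callable


class ReentrancyError(Exception):
    pass


class Vault:
    def __init__(self) -> None:
        self._entered = False  # no public getter for this flag

    def _enter(self) -> None:
        if self._entered:
            raise ReentrancyError("reentrant call")
        self._entered = True

    def swap(self, callback: Callable[[], None]) -> None:
        self._enter()
        try:
            callback()  # external code runs while the guard is set
        finally:
            self._entered = False

    def guarded_noop(self) -> None:
        # Any cheap function protected by the same guard works as a probe.
        self._enter()
        self._entered = False


def is_guard_set(vault: Vault) -> bool:
    # Trip the guard on purpose: a revert means we are inside a guarded call.
    try:
        vault.guarded_noop()
    except ReentrancyError:
        return True
    return False
```

Outside a swap the probe succeeds and reports the guard as unset; from inside the swap callback it reverts and reports it as set. A plain getter would have made all of this unnecessary.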

The only use case I know of that is enabled by inaccessible state is paid access to price oracles, in which only certain approved contracts are allowed to call getters that compute token prices based on internal contract state.

As a thought experiment, what if we used an enshrined contract for managing upgrades? Instead of a contract “storing” its contract class in a specialized storage slot and having to implement custom upgrade logic, we could have a ContractUpgradeManager (ideally with a less Java-y name) that accepts an enqueue_upgrade call, and enforces proper writes to its slowjoe storage. Kernel would then only have to check storage in this enshrined contract, and the storage of each app contract remains just for app usage, without the need for “system storage”.
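A Python sketch of this thought experiment, with everything hypothetical: the `ContractUpgradeManager` fields, a 10-block delay, and passing the original class id in explicitly (in practice it would come from the instance preimage). Upgrades are keyed by msg_sender, so a contract can only upgrade itself, and the kernel’s check touches only the manager’s storage.

```python
# Sketch of an enshrined upgrade manager; every name and the delay are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, Tuple

UPGRADE_DELAY = 10  # hypothetical delay D, in blocks


@dataclass
class ContractUpgradeManager:
    # contract address -> (pre_class_id, post_class_id, block_of_change)
    upgrades: Dict[int, Tuple[int, int, int]] = field(default_factory=dict)

    def current_class_id(self, address: int, original_class_id: int,
                         block: int) -> Tuple[int, int]:
        """(class_id, max_block_number) for a read at `block`."""
        if address not in self.upgrades:
            # Never upgraded: the deployed class holds for at least D more blocks.
            return original_class_id, block + UPGRADE_DELAY
        pre, post, block_of_change = self.upgrades[address]
        if block >= block_of_change:
            return post, block + UPGRADE_DELAY
        return pre, block_of_change - 1

    def enqueue_upgrade(self, msg_sender: int, original_class_id: int,
                        new_class_id: int, current_block: int) -> None:
        # Keyed by msg_sender, so a contract can only ever upgrade itself.
        current, _ = self.current_class_id(msg_sender, original_class_id, current_block)
        self.upgrades[msg_sender] = (current, new_class_id, current_block + UPGRADE_DELAY)


def kernel_check(manager: ContractUpgradeManager, address: int, original_class_id: int,
                 executed_class_id: int, block: int) -> int:
    # The kernel reads only the manager's storage, never the app contract's.
    class_id, max_block_number = manager.current_class_id(address, original_class_id, block)
    if class_id != executed_class_id:
        raise ValueError("executed class is not the current class")
    return max_block_number
```

The app contract’s own storage never holds a class id here, which is what removes the need for a separate ‘system storage’.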

The main downside I can think of with this is upgradeability detection. Any contract that has code to issue a call to an arbitrary address (like an account contract or the SubscriptionManager) is potentially upgradeable. We could mitigate this at the aztec-nr library level: call_function could only be used to call non-enshrined, non-precompile contracts, and a different method would be needed for calls to special contracts, to prevent accidents.

IMO this only holds for public-land. Inaccessible state makes much more sense when you add privacy into the mix. The classic example is a token contract: you want to only grant access to its state via the getters that the developer designed, so you can only query the total balance of a user, but not see the individual notes that comprise that balance (which would allow a malicious dapp to reconstruct the transfer graph).

This would allow for false negatives, though a contract being somewhat sneakily upgradeable is not a huge issue. The cooperative case would work just fine.

Definitely, private state should remain private. It’s current public state not being accessible that seems odd to me.

Could we have a macro that users apply to mark the function that will call the enshrined manager? The kernels could then explicitly check, whenever the enshrined manager is called, that the calling function was flagged. Might be expensive to implement.
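Roughly, the kernel-side check could look like the Python sketch below. The `calls_upgrade_manager` flag, the manager address and the `FunctionData`/`CallRequest` shapes are all hypothetical, and the sketch assumes the flag is committed to as part of the contract class, so detection only needs to inspect function metadata.

```python
# Sketch of the flag check; the flag, address and types are all hypothetical.
from dataclasses import dataclass
from typing import Iterable

UPGRADE_MANAGER_ADDRESS = 0x2  # hypothetical address of the enshrined manager


@dataclass(frozen=True)
class FunctionData:
    selector: int
    calls_upgrade_manager: bool  # would be set by the proposed macro


@dataclass(frozen=True)
class CallRequest:
    caller_function: FunctionData
    target_address: int


def kernel_validate_call(call: CallRequest) -> None:
    # Reject any call into the manager from a function that was not flagged.
    if (call.target_address == UPGRADE_MANAGER_ADDRESS
            and not call.caller_function.calls_upgrade_manager):
        raise PermissionError("unflagged function called the upgrade manager")


def is_upgradeable(class_functions: Iterable[FunctionData]) -> bool:
    # Detection only needs to inspect the class's function metadata.
    return any(f.calls_upgrade_manager for f in class_functions)
```

The extra branch in `kernel_validate_call` runs on every call, which is the special-case kernel logic the reply below would prefer to avoid.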

Side note: another example here will likely be asserting that fees were paid.

This would work I believe, but I’d like to keep the same kernel logic for all calls for simplicity’s sake if we can.