Issues with logs

Unencrypted log issues

Currently, unencrypted logs are vulnerable to a “source contract impersonation attack”: the kernels do not verify that the contract address included in a log is the address of the contract which actually emitted it, so one contract can emit a log on behalf of another.

Proposed solution

A solution to this problem is simple, but it will require us to refactor how the logs are hashed. We currently hash the logs in the app circuit and feed only that hash to the kernel. Then in each kernel iteration we accumulate the hash by computing sha256(previous_logs_hash, current_logs_hash) (relevant piece of code here). The logs then get published along with an L2 block on L1, where the Decoder library recovers the final logs hash of all the txs by re-performing both the app-circuit hashing and the kernel accumulation.
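For concreteness, a minimal Noir-style sketch of that kernel-side accumulation step might look as follows, assuming the running hash and the current logs hash are both held as 32-byte arrays (the function name and byte layout are illustrative, not the actual kernel code):

fn accumulate_logs_hash(previous_logs_hash: [u8; 32], current_logs_hash: [u8; 32]) -> [u8; 32] {
    // Concatenate previous || current into a 64-byte preimage.
    let mut preimage: [u8; 64] = [0; 64];
    for i in 0..32 {
        preimage[i] = previous_logs_hash[i];
        preimage[32 + i] = current_logs_hash[i];
    }
    // Accumulate with sha256, which the L1 Decoder can recompute cheaply.
    std::hash::sha256(preimage)
}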

With this scheme the sequencer is forced to make the logs available on-chain, because doing so is now one of the conditions of the block’s validity.

Now that we need to access the logs inside the kernel circuits to verify the contract address, we will need to move the initial hashing out of the app circuit and into the kernel circuit. This will require us to modify PrivateCircuitPublicInputs and treat the logs similarly to commitments and nullifiers.

The following:

struct PrivateCircuitPublicInputs {
	...
	unencrypted_logs_hash: [Field; NUM_FIELDS_PER_SHA256],
	unencrypted_log_preimages_length: Field,
	...
}

will become:

struct PrivateCircuitPublicInputs {
	...
	unencrypted_logs: [Field; MAX_NEW_LOGS_PER_CALL],
	...
}

Once this change is done we can check the address in a log against private_call_public_inputs.call_context.storage_contract_address.
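As a rough sketch of what that check could look like, assuming (purely for illustration) a fixed two-field layout per log, where the first field of each (address, payload) pair is the claimed emitter address and unused slots are zeroed — the constants, layout, and names below are assumptions, not the actual spec:

global MAX_NEW_LOGS_PER_CALL: u32 = 4; // placeholder value
global LOG_FIELDS: u32 = 8; // 2 fields (address, payload) per log

fn check_log_emitters(unencrypted_logs: [Field; LOG_FIELDS], storage_contract_address: Field) {
    for i in 0..MAX_NEW_LOGS_PER_CALL {
        let addr = unencrypted_logs[i * 2]; // claimed emitter address
        // Either the slot is empty (addr == 0) or the claimed emitter matches
        // the contract that the kernel knows actually made this call.
        assert(addr * (addr - storage_contract_address) == 0);
    }
}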

Encrypted log issues

If we decide to generalize encrypted logs to contain non-note logs, then the issue above applies to non-note encrypted logs as well. Since the logs are encrypted, we can’t easily check the included contract address. One way to address this is to prefix each log with a hash of the contract address and some randomness. This randomness would then be fed as a private input to the kernel circuit, and the private kernel would check that hash(private_call_public_inputs.call_context.storage_contract_address, randomness) matches the log prefix. Once the pixie obtains the log it would decrypt it and perform the same check. Since the pixie needs to get hold of the randomness, the randomness would have to be part of the encrypted log.
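A minimal Noir-style sketch of that kernel-side prefix check, assuming the prefix is a Pedersen hash over (address, randomness); the hash choice and names are assumptions, not the actual spec:

fn check_encrypted_log_prefix(
    log_prefix: Field,               // plaintext prefix attached to the log
    storage_contract_address: Field, // known to the kernel from the call context
    randomness: Field,               // private input; also included in the encrypted payload for the pixie
) {
    let expected = std::hash::pedersen_hash([storage_contract_address, randomness]);
    assert(log_prefix == expected);
}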

We would need to inform the kernel circuit and the pixie whether the log currently being handled is a note log or not (this verification would not happen for notes). We could tackle this by prefixing the log with a boolean value.

A downside of this solution is that it would significantly increase the size of the encrypted log. On Ethereum, logs are very cheap, and I think our non-note logs will not get used much unless we use a very cheap data availability solution.

Note that it is desirable to support non-note encrypted logs not only for feature parity between private and public contexts, but also because it would allow us to use event macros for notes, which would be very elegant (see Rahul’s offsite summary) because it would allow us to robustly generate the compute_note_hash_and_nullifier function.


Then in each kernel iteration we accumulate the hash by computing sha256(previous_logs_hash, current_logs_hash) (relevant piece of code here).

Just curious: are we happy with this solution? Or is it too costly, in terms of L1 gas and proving time? Does it make sense to look for alternatives?

If we keep it, maybe we could keep the current scheme where app circuits just emit the hash of the log, but then that hash gets mixed in with the contract address. So instead of sha256(previous_logs_hash, current_logs_hash), we could do sha256(previous_logs_hash, current_logs_emitter_address, current_logs_hash). This would require fewer changes on the app circuit, and not enforce a protocol limit on the number of logs emitted.
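In Noir-style pseudocode, the suggested mixing-in could look something like this (byte layout and names are assumptions):

fn accumulate_logs_hash_with_emitter(
    previous_logs_hash: [u8; 32],
    emitter_address: [u8; 32], // injected by the kernel, not taken from the app
    current_logs_hash: [u8; 32],
) -> [u8; 32] {
    // Concatenate previous || emitter || current into a 96-byte preimage.
    let mut preimage: [u8; 96] = [0; 96];
    for i in 0..32 {
        preimage[i] = previous_logs_hash[i];
        preimage[32 + i] = emitter_address[i];
        preimage[64 + i] = current_logs_hash[i];
    }
    std::hash::sha256(preimage)
}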

Should we wait until we make a decision on keys? If a user ends up having a different encryption key for each contract, then that would provide the info of what contract emitted the encrypted log, assuming it’s enforced by the kernel.

Why is the randomness needed, if the entire contents of the note are encrypted?

Heads up we may need to include more “metadata” in a log, see this issue.


I think that this largely depends on having a cheap data availability solution. I would expect EIP-4844 and calldata to be too expensive for non-financial use of logs.

How would the pixie then verify that the contract address stored in a log is legit? Would you somehow try to reproduce the logs hash there?

Will defer to Mike here as I don’t have enough context on the keys.

I didn’t explain this well enough. The idea was that the prefixed hash would not be encrypted; it would simply be a hash of the randomness together with the contract address, in unencrypted form. The randomness removes the need for encryption: without it, an observer could recover the contract address by simply hashing candidate addresses and comparing.


Thanks for this clear explanation Jan!

Preventing impersonation of unencrypted logs

What we do today:

I’m just repeating what you’ve already said, but it helped my brain

Suppose the app wants to output a log of 4 fields (or strings which are convertible to fields):

[ "hello0", "hello2", "hello3", "hello4" ]

The app sha256-hashes these fields:

unencrypted_logs_hash = sha256([ "hello0", "hello1", "hello2", "hello3" ]);
unencrypted_log_preimage_length = 4;
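A runnable Noir-style sketch of this app-side hashing, assuming each field is serialized to 32 big-endian bytes before hashing, and a recent Noir where Field::to_be_bytes returns a fixed-size array (the serialization and names are assumptions, not the actual encoding):

fn hash_log_fields(log_fields: [Field; 4]) -> [u8; 32] {
    // Serialize each field to 32 big-endian bytes and concatenate.
    let mut preimage: [u8; 128] = [0; 128];
    for i in 0..4 {
        let bytes: [u8; 32] = log_fields[i].to_be_bytes();
        for j in 0..32 {
            preimage[i * 32 + j] = bytes[j];
        }
    }
    std::hash::sha256(preimage)
}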

The app passes these two values (the hash and the preimage length) to the kernel (via the app circuit’s public inputs), and the kernel computes:

end_unencrypted_logs_hash = sha256(
    previous_unencrypted_logs_hash,
    current_unencrypted_logs_hash
);

Danger: it’s a separate topic from this thread, but the way the preimage lengths are being accumulated, by simply summing them, might be dangerous. Information is lost about the length of each individual log, so I have a feeling this could be subject to something akin to a length extension attack. We should probably encode each individual length in the end_unencrypted_logs_hash computation, somehow.
This might be my fault from the initial logs forum post many months ago.
@Jan how does our L1 rollup contract know the individual log preimage lengths, in order to know how to hash together all of the log data on L1? Is an encoding of each individual log preimage length submitted to L1? I suspect this encoding might not be constrained sufficiently.

So that’s what we do today.


This was the original journey that led to the current design of lots of sha256 hashing:

The final zk-snark that’s submitted to L1 must have a single public input (because this greatly reduces the cost of on-chain verification). The only way to achieve this is to hash the data that would have been part of the final snark’s public inputs, had we been able to include them all as public inputs. Keccak and sha256 are the only hashes that are cheap on L1. We could use keccak, but sha256 takes slightly fewer constraints in a snark.

In particular, let’s focus on log data from each tx (and ignore state data, etc). Why do we also need to hash the log data within a snark? Well, each layer of the rollup uses the same merge rollup circuit. It’s fixed in the number of constraints, so all data going in and out must be the same size at every layer. At the very bottom of the “rollup tree” (the base rollup circuit layer), there are an unknown number of txs (because our rollups can be flexible in how big they are); let’s say t txs. Each tx might wish to emit l log fields. That’s t * l log fields that need to get squished into a single public input by the time the final snark is computed.

Now, perhaps the sequencer should do all of this log hashing (somehow) instead of the user, because the sequencer (or rather the prover) has more compute power. That’s probably possible, if we created many different-sized ‘log hashing’ gadget circuits that solely hash logs of varying lengths, for each tx.

But how would we get all of that (variable-length) log preimage data to the sequencer, and how would we constrain the sequencer to use the correct log preimage data in their hash computation? The private kernel proof coming from the user would need to somehow constrain the sequencer to use the correct log preimage data.

But the only way I can think to do that is to have the user prove the hashing of the log preimage data on their own device, to prove what the resulting hash should be. And then there’s no point in the sequencer doing the hashing in a sequencer circuit at all. The user’s hashing of the logs could happen either in the kernel circuit or the app circuit. It’s more flexible if it’s done by app circuits, so that the logs may vary in size.

Now, what we could do, to reduce sha256 hashing costs for the user, is allow logs to be pushed to the data bus. The kernel circuit could read these logs from the bus, and somehow pass an ever-growing ‘array’ of logs, unhashed, to each iteration of the kernel circuit, eventually passing this large array of logs, unhashed, to the sequencer via the data bus. I.e. the sequencer would receive a load of log data, and a polynomial commitment to that data (because that’s how the data bus works). Then the sequencer would need to run an ‘equivalence’ circuit which proves that the polynomial commitment is equal to some ethereum-friendly commitment (such as a sha256 hash, or a 4844 bls12 blob).

But of course the data bus has an upper bound, and so it wouldn’t be able to cope with arbitrarily-sized logs, or arbitrary recursion depths. We’d need a way of occasionally compressing the logs, if further recursion is needed. There are ideas to introduce a ‘reset’ circuit for this purpose (it wasn’t thought of for logs, but perhaps it could be used for logs). But this all sounds a bit complicated!!! Aah!


I don’t think we need to [move the initial hashing into the kernel]. And actually, if we want logs to be arbitrarily-sized (which greatly helps devs design custom notes), we can’t do this, because the kernel circuit can only deal with static sizes; that’s one of the main reasons it currently only handles hashes of logs (the hashes are constant-sized).

Something like Palla’s suggestion might work:

Perhaps:

end_unencrypted_logs_hash = sha256(
    previous_end_unencrypted_logs_hash,
    current_unencrypted_logs_hash,
    current_unencrypted_log_preimage_length, // to possibly protect against the length extension attacks
    current_unencrypted_log_emitter_address, // as palla suggests
); 

The kernel injects the correct current_unencrypted_log_emitter_address into the above hash, and L1 also recomputes this hash. So as long as the log data is encoded in a clear, unambiguous way when it’s broadcast to L1, the PXE should be able to read the log and the contract address, and be convinced that the contract address is correct, without having to do any hashing. (I think?)

Encrypted logs for notes:

I drew a pic. Logs for notes don’t need extra protection against impersonation; they already have the contract address baked into the siloed note hash. Nice.

Encrypted logs for non-notes (i.e. for arbitrary encrypted messages):

I drew a pic. It’s as Jan suggests above.


Yes, we prefix each log with its length. The data is serialized here, and we don’t currently constrain the lengths in the circuits (here we accumulate the value).

Makes sense. Yes, if the Decoder extracts the addresses from the logs and feeds them into the hash, then we are good: otherwise we would get a different final calldata hash (the logs hash is part of its preimage) and the verification would fail.
