Thanks for this clear explanation Jan!
Preventing impersonation of unencrypted logs
What we do today:
I’m just repeating what you’ve already said, but it helped my brain
Suppose the app wants to output a log of 4 fields (or strings which are convertible to fields):
[ "hello0", "hello2", "hello3", "hello4" ]
The app sha256-hashes these fields:
```
unencrypted_logs_hash = sha256([ "hello0", "hello2", "hello3", "hello4" ]);
unencrypted_log_preimage_length = 4;
```
It passes these fields to the kernel (via the app circuit’s public inputs), and the kernel computes:
```
end_unencrypted_logs_hash = sha256(
    previous_unencrypted_logs_hash,
    current_unencrypted_logs_hash
);
```
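To make the flow above concrete, here's a minimal Python sketch of both steps: the app hashes its log fields, and the kernel folds that hash into a running accumulator. (This assumes a simple byte encoding of the log fields for illustration; the real circuits hash field elements, and the zero-valued "previous hash" for the first iteration is my assumption about the initial accumulator value.)

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# App side: hash the tx's log fields (utf-8 bytes here, purely for
# illustration; the real encoding is field elements).
log_fields = [b"hello0", b"hello2", b"hello3", b"hello4"]
current_unencrypted_logs_hash = sha256(b"".join(log_fields))
unencrypted_log_preimage_length = len(log_fields)

# Kernel side: fold the app's hash into the running accumulator.
# Assumed: the accumulator starts as an all-zero 32-byte hash.
previous_unencrypted_logs_hash = bytes(32)
end_unencrypted_logs_hash = sha256(
    previous_unencrypted_logs_hash + current_unencrypted_logs_hash
)
```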
Danger: it’s a separate topic from this thread, but the way the preimage lengths are being accumulated, by simply summing them, might be dangerous. Information is lost about the length of each individual log, so I have a feeling this could be subject to something akin to a length extension attack. We should probably encode each individual length in the `end_unencrypted_logs_hash` computation, somehow.
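A small Python sketch of why summing lengths loses information: two different sequences of logs can flatten to the same byte string with the same total length, so a verifier given only the concatenated data and the summed length can't recover the log boundaries. Length-prefixing each log (one possible fix; the exact encoding here is my invention) removes the ambiguity.

```python
# Two *different* sequences of logs...
logs_a = [b"hello", b"world"]
logs_b = [b"hellow", b"orld"]

# ...flatten to identical byte strings, and their summed lengths agree,
# so (concatenation, total_length) cannot distinguish them.
flat_a, flat_b = b"".join(logs_a), b"".join(logs_b)
assert flat_a == flat_b
assert sum(len(l) for l in logs_a) == sum(len(l) for l in logs_b)

# Prefixing each log with its own length disambiguates the two
# sequences (4-byte big-endian prefix chosen arbitrarily here).
def encode(logs):
    return b"".join(len(l).to_bytes(4, "big") + l for l in logs)

assert encode(logs_a) != encode(logs_b)
```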
This might be my fault from the initial logs forum post many months ago.
@Jan how does our L1 rollup contract know the individual log preimage lengths, in order to know how to hash-together all of the log data on L1? Is an encoding of each individual log preimage length submitted to L1? I suspect this encoding might not be constrained sufficiently.
So that’s what we do today.
This was the original journey that led to the current design of lots of sha256 hashing:
The final zk-snark that’s submitted to L1 must have a single public input (because this greatly reduces the cost of on-chain verification). The only way to achieve this is to hash the data that would otherwise have been the final snark’s public inputs, had we been able to include them all as public inputs. Keccak and sha256 are the only cheap hashes on L1. We could use keccak, but sha256 is slightly fewer constraints in a snark.
In particular, let’s focus on log data from each tx (and ignore state data, etc.). Why do we also need to hash the log data within a snark? Well, each layer of the rollup uses the same merge rollup circuit. It’s fixed in its number of constraints, so the data going in and out must be the same size at every layer. At the very bottom of the “rollup tree” (the base rollup circuit layer), there are an unknown number of txs (because our rollups can be flexible in how big they are); let’s say t txs. Each tx might wish to emit l log fields. That’s t * l log fields that need to get squished into a single public input by the time the final snark is computed.
Now, perhaps the sequencer should do all of this log hashing (somehow) instead of the user doing it, because the sequencer (or rather the prover) has more compute power. That’s probably possible, if we created many different-sized ‘log hashing’ gadget circuits, that solely hash logs of varying length, for each tx.
But how would we get all of that (variable-length) log preimage data to the sequencer, and how would we constrain the sequencer to use the correct log preimage data in their hash computation? The private kernel proof coming from the user would need to somehow constrain the sequencer to use the correct log preimage data.
But the only way I can think to do that is to have the user prove the hashing of the log preimage data on their own device, to prove what the resulting hash should be. And so there’s no point in the sequencer doing the hashing in a sequencer circuit at all. The user’s hashing of the logs could either be in the kernel circuit or the app circuit. It’s more flexible if it’s done by app circuits, so that the logs may vary in size.
Now, what we could do, to reduce sha256 hashing costs for the user, is allow logs to be pushed to the data bus. The kernel circuit could read these logs from the bus, and somehow pass an ever-growing ‘array’ of logs, unhashed, to each iteration of the kernel circuit, eventually passing this large array of logs, unhashed, to the sequencer via the data bus. I.e. the sequencer would receive a load of log data, and a polynomial commitment to that data (because that’s how the data bus works). Then the sequencer would need to run an ‘equivalence’ circuit which proves that the polynomial commitment is equal to some ethereum-friendly commitment (such as a sha256 hash, or a 4844 bls12 blob). But of course the data bus has an upper bound, and so it wouldn’t be able to cope with arbitrarily-sized logs, or arbitrary recursion depths. We’d need a way of occasionally compressing the logs, if further recursion is needed. There are ideas to introduce a ‘reset’ circuit for this purpose (it wasn’t thought of for logs, but perhaps it could be used for logs). But this all sounds a bit complicated!!! Aah!
I don’t think we need to. And actually, if we want logs to be arbitrarily-sized (which greatly helps devs design custom notes), we can’t do this, because the kernel circuit can only deal with static sizes; that’s one of the main reasons it currently only handles hashes of logs (because the hashes are constant-sized).
Something like Palla’s suggestion might work:
Perhaps:
```
end_unencrypted_logs_hash = sha256(
    previous_end_unencrypted_logs_hash,
    current_unencrypted_logs_hash,
    current_unencrypted_log_preimage_length, // to possibly protect against length-extension-style attacks
    current_unencrypted_log_emitter_address, // as palla suggests
);
```
The kernel is injecting the correct `current_unencrypted_log_emitter_address` into the above hash, and L1 also recomputes this hash. So as long as the log data is encoded in a clear, unambiguous way when it’s broadcast to L1, the PXE should be able to read the log and the contract address, and be convinced that the contract address is correct, without having to do any hashing. (I think?)
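Here’s a Python sketch of the revised accumulation, binding each log’s own length and emitter address into the chain. The byte widths and serialisation are my assumptions for illustration (the real kernel works over field elements); the point is just that changing the emitter address changes the final hash, so an impersonated address can’t reproduce the chain that L1 recomputes.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Hypothetical encoding: 32-byte previous hash, 32-byte current log hash,
# 4-byte preimage length, 32-byte emitter address.
def accumulate(prev_hash: bytes, log_preimage: bytes,
               emitter_address: bytes) -> bytes:
    current_unencrypted_logs_hash = sha256(log_preimage)
    return sha256(
        prev_hash
        + current_unencrypted_logs_hash
        + len(log_preimage).to_bytes(4, "big")  # per-log length, not a sum
        + emitter_address                        # injected by the kernel
    )

# Same log data, two different emitter addresses: the chains diverge,
# so L1's recomputation catches an impersonated address.
h0 = bytes(32)
h1 = accumulate(h0, b"some log data", b"\x01" * 32)
h2 = accumulate(h0, b"some log data", b"\x02" * 32)
assert h1 != h2
```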
Encrypted logs for notes:
I drew a pic. Logs for notes don’t need extra protection against impersonation; they already have the contract address baked into the siloed note hash. Nice.
Encrypted logs for non-notes (i.e. for arbitrary encrypted messages):
I drew a pic. It’s as Jan suggests above.

