diff options
author | Rasmus Dahlberg <rasmus@mullvad.net> | 2021-12-18 16:59:50 +0100 |
---|---|---|
committer | Rasmus Dahlberg <rasmus@mullvad.net> | 2021-12-18 16:59:50 +0100 |
commit | 774a7656d25977cdb3d3b7be37c7ca0ac0ec1f50 (patch) | |
tree | ca8e630a1c4c24e682581f2e8da056e47f13bbb6 /content/post/trillian-log-sequencing-demystified.md | |
parent | 704f51516cd905835956028dfd53d67add53d396 (diff) |
posts: Import stories from medium
Diffstat (limited to 'content/post/trillian-log-sequencing-demystified.md')
-rw-r--r-- | content/post/trillian-log-sequencing-demystified.md | 111 |
1 files changed, 111 insertions, 0 deletions
diff --git a/content/post/trillian-log-sequencing-demystified.md b/content/post/trillian-log-sequencing-demystified.md new file mode 100644 index 0000000..bf1da6d --- /dev/null +++ b/content/post/trillian-log-sequencing-demystified.md @@ -0,0 +1,111 @@ +--- +title: "Trillian log sequencing: demystified?" +date: 2021-02-09 +--- +One way to view + [Trillian](https://transparency.dev/) +is as + [a database with an append-only Merkle tree](https://www.rgdd.se/post/observations-from-a-trillian-play-date/). +That Merkle tree is managed by a separate component called a log signer. It runs +in a dedicated process that basically merges pending leaves that have yet to be +incorporated into the Merkle tree. This part of the log signer is structured as +a sequencing job that runs periodically. I spent a few hours learning more about +the details and thought shared knowledge is better. + +## Problem definition and overview +Trillian’s + [log signer](https://github.com/google/trillian/blob/3ae67195ffd778d37275c6972445f4e7f9e21410/cmd/trillian_log_signer/main.go) +comes with a whole bunch of configuration options that are spread across several +different files. Some of these options are more difficult to grasp than others, +such as `num_sequencers`, `sequencer_interval`, and `batch_size`. I don't mean +difficult as in understanding that there may be several sequencers that run +periodically, but rather what that actually means in terms of concurrency and +how much can be sequenced per time unit. + +The short answer is as follows: +1. Regardless of how many sequencers you configure there will be no concurrent +sequencing of a particular Merkle tree. The number of sequencers is only +relevant if there are multiple Merkle trees. +2. The sequencer interval tells you how often the log signer wakes up to do a +sequencing job. It sequences no more than the configured batch size before going +back to sleep. If the interval elapsed already there will be no sleep. + +In other words, to avoid building up a large backlog you need to consider how +often the sequencer runs and how much work it is willing to do every time. The +longer answer is detailed below, and it includes a little bit of additional +context regarding what you may (not) do with these three parameters. For +example, the sequencer job is most reliable if it runs often over small batches. + +## Log signer +The most central part of the log signer is probably its forever loop. For +reference it is implemented by Trillian’s + [operation manager](https://github.com/google/trillian/blob/3ae67195ffd778d37275c6972445f4e7f9e21410/log/operation_manager.go#L110), +see the `OperationLoop` function. In a nutshell, the log signer wakes up, +performs some jobs, and goes back to sleep. Pretty much a busy working day! + +### Coordination +Before proceeding you need to know that a log signer may manage multiple +different Merkle trees. It might sound odd at first, but hopefully less so if +you think about the existence of transparency applications that use + [temporal sharding](https://www.digicert.com/dc/blog/scaling-certificate-transparency-logs-temporal-sharding/): +the split of one transparency log into multiple smaller ones so that the leaves +are grouped by, say, yearly relevance like 2020 and 2021. This allows parts of +the log to be retired as soon as they are no longer needed. You can also run +multiple different log signers to avoid single points of failure. At most one +log signer is responsible for a particular Merkle tree at a time though. This +requires coordination, which is one part of the forever loop. + +### Sequencing +The other part is log sequencing. After waking up once per interval as defined +by the configured `sequencer_interval`, the log signer takes a pass over all +Merkle trees that it manages. A sequencing job is scheduled for each Merkle tree +that the log signer is currently the master for. It is determined by an election +protocol which is used for coordination in the case of multiple log signers, +otherwise master is assumed as there is nothing to coordinate. Next, these jobs +are distributed to a number of concurrent sequencers that work on _different +Merkle trees_. This means that there is no concurrency whatsoever when _a +particular Merkle tree_ is being sequenced. Moreover, + [what is selected for sequencing](https://github.com/google/trillian/blob/3ae67195ffd778d37275c6972445f4e7f9e21410/storage/mysql/queue.go#L33) +is deterministic based on Trillian’s internal timestamp order and the number of +leaves that the sequencer is willing to process per job. + +In other words, there is no point setting `num_sequencers` higher than the +number of Merkle trees that you manage. You may set it lower, in which case at +least one sequencer will move on to a different sequencing job (and thus a +different Merkle tree) once it is done. Now it is also evident that the +configured `sequencer_interval` and `batch_size` determines an upper bound for +the number of writes per time unit. It is an upper bound because your hardware +might not be capable of handling, say, 10k writes per second. + +## Concluding remarks +When I noticed the `sequencer_interval` parameter I wondered if it could be used +to configure a tree head frequency, such that the Trillian front-end personality +would only see a fixed number of updates per time interval. For example, you +might want that because the + [second version of Certificate Transparency requires control over it](https://datatracker.ietf.org/doc/html/draft-ietf-trans-rfc6962-bis-34#section-4.1) +and some gossip-audit models + [assume it](https://tools.ietf.org/html/draft-ietf-trans-gossip-05). +If not supported by the Trillian back-end, the front-end personality has to take +on the role. While it is trivial in the case of a single personality instance +(e.g., pick a tree head every hour), it might require some additional +coordination if there are multiple concurrent instances running. So, it would be +convenient if the _already coordinated_ log signer could enforce a tree head +frequency. + +In theory it is of course possible to set the sequencer interval and batch size +to accommodate both a frequency and an upper bound for writes per second. In +practice it is apparently recommended to use short intervals and batch sizes of +up to 1000 leaves. This recommendation involves quite a bit of nuance, and +relates to things like + [which back-end is used for the underlying database](https://github.com/google/trillian/issues/1845). +If tree head frequencies are generally useful it might land as a back-end +feature at some point. For now, it is better enforced by the front-end +personality. + +## Acknowledgments +This story is sponsored by my + [System Transparency](https://system-transparency.org/) +employment at Mullvad VPN. Martin Hutchinson and Pavel Kalinnikov provided +valuable insights as part of a conversation in the + [Trillian slack](https://gtrillian.slack.com). +Mistakes, if any, are my own. |