From 774a7656d25977cdb3d3b7be37c7ca0ac0ec1f50 Mon Sep 17 00:00:00 2001 From: Rasmus Dahlberg Date: Sat, 18 Dec 2021 16:59:50 +0100 Subject: posts: Import stories from medium --- .../post/observations-from-a-trillian-play-date.md | 130 +++++++++++++++++++++ 1 file changed, 130 insertions(+) create mode 100644 content/post/observations-from-a-trillian-play-date.md (limited to 'content/post/observations-from-a-trillian-play-date.md') diff --git a/content/post/observations-from-a-trillian-play-date.md b/content/post/observations-from-a-trillian-play-date.md new file mode 100644 index 0000000..47b4f1d --- /dev/null +++ b/content/post/observations-from-a-trillian-play-date.md @@ -0,0 +1,130 @@ +--- +title: "Observations from a Trillian play-date" +date: 2020-11-23 +--- +Have you ever heard about + [Trillian](https://transparency.dev/) +in the context of transparency logging? Perhaps you view it as an integral part +of + [Certificate Transparency](https://certificate.transparency.dev/), +a solution for arbitrary transparency applications, or both. Even if you know +Certificate Transparency quite well the Trillian details might be a bit blurry +until you sit down and get some hands-on experience: at least that was the case +for me. Therefore, Trillian and I had a little play-date. I thought I would +share a few observations that in hindsight are obvious but at the same time +helpful. + +## Problem statement and overview +I agree with Daz Wilkin that + [it is somewhat daunting to get started with Trillian](https://medium.com/google-cloud/google-trillian-for-noobs-9b81547e9c4a). +Putting it all together involves many different components and configurations, +especially if you need the high reliability and scale that Trillian supports. It +does not have to be that complicated though. Trillian is pretty much a database +which includes an append-only Merkle tree: + +1. 
**Trillian log server:** exposes a gRPC API that is used by an +application-dependent front-end or so-called _Trillian personality_. Requests and +responses trigger operations on the underlying database, such as queuing new +data requests and assembling cryptographic Merkle tree proofs. +2. **Trillian log signer:** checks the database periodically and sequences it +into a Merkle tree. The term _log signer_ was confusing for me initially because +it is usually the front-end personality that adds externally visible signatures. +Therefore, I found it helpful to think of this component as a _log sequencer_. + +I will not talk much about the front-end personality. It is the part of Trillian +that you or your ecosystem has to implement. It will include definitions of +public endpoints, the data to be logged, who is allowed to log it, etc. + +## Trillian as a database abstraction +The simplest description of Trillian is probably as a regular database. You can +insert any item of your choice after serializing it as zeroes and ones, and come +back later on and retrieve it. In reality it is more accurate to say that +Trillian is hooked up to a database, such as MariaDB using the schema over + [here](https://github.com/google/trillian/blob/master/storage/mysql/schema/storage.sql). +This means that before getting started a database must be configured such that +there is a record in the Trees table that identifies a particular Trillian +instance. + +``` +CREATE TABLE IF NOT EXISTS Trees( + TreeId BIGINT NOT NULL, + HashAlgorithm ENUM('SHA256') NOT NULL, + SignatureAlgorithm ENUM('ECDSA', 'RSA', 'ED25519') NOT NULL, + PrivateKey MEDIUMBLOB NOT NULL, + PublicKey MEDIUMBLOB NOT NULL, + ... +); +``` + +Initially I was confused by the public-key cryptography that is part of the +database schema: is it not the front-end personality that attaches signatures, +for, say, Signed Certificate Timestamps (SCTs) in Certificate Transparency? +Well, yes. 
But the scenario in mind here is that there might be a front-end +personality that runs in a different trust domain, such that the Trillian +back-end needs to sign some data to prove its origin. The front-end determines +what becomes externally visible regardless of whether these signatures are used. + +New add-data requests are queued by the Trillian log server in an unordered +table of pending leaves. Each such leaf also has an optional appendix, which +allows extra data to be stored but without merging it into the Merkle tree. For +example, it might be reasonable to hold on to an associated signature if the +front-end personality requires that the data is signed as an admission criterion. + +## Trillian as a Merkle tree abstraction +The log signer sequences the pending leaf data periodically. By sequencing I +mean taking the unordered leaves that one or more log servers queued, and then +appending them to the current Merkle tree at specific indices. In other words, +not even the log servers know the index of an added leaf until it is merged. This +is important to keep in mind because several proposals in the past assumed that +Trillian logs are timestamp ordered, but strictly speaking there is no such +guarantee unless the front-end takes responsibility for sequencing (in which case +there is a specific pre-ordered Trillian API that can be used). + +The Merkle tree itself is viewed as many smaller sub-trees in the database, +where only the bottom layer of each sub-tree is stored physically. Any interior +node can be computed on the fly, which apparently + [saves up to 50% of space](https://github.com/google/trillian/blob/master/docs/storage/storage.md). +The log server accesses the database to interact with the sequenced Merkle tree, +e.g., to retrieve tree heads and build audit paths (hashes in the tree). As +such, there is no explicit communication between the log server and signer. 
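
To make tree heads and audit paths a bit more concrete, here is a minimal Python sketch. It assumes the RFC 6962 hashing rules that Certificate Transparency uses (a `0x00` prefix for leaves and a `0x01` prefix for interior nodes). It is not Trillian's storage layout or API: the function names are hypothetical, and every interior node is simply recomputed from the sequenced leaves on each call.

```python
import hashlib
from typing import List


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962: a leaf hash is SHA-256 over 0x00 || leaf data.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962: an interior node is SHA-256 over 0x01 || left || right.
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(size: int) -> int:
    # Largest power of two strictly smaller than size (requires size >= 2).
    return 1 << ((size - 1).bit_length() - 1)


def tree_head(leaves: List[bytes]) -> bytes:
    # Recompute the root hash from the sequenced leaves only.
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = split_point(len(leaves))
    return node_hash(tree_head(leaves[:k]), tree_head(leaves[k:]))


def audit_path(index: int, leaves: List[bytes]) -> List[bytes]:
    # Sibling hashes from the leaf at `index` up to the root, bottom first.
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    if len(leaves) == 1:
        return []
    k = split_point(len(leaves))
    if index < k:
        return audit_path(index, leaves[:k]) + [tree_head(leaves[k:])]
    return audit_path(index - k, leaves[k:]) + [tree_head(leaves[:k])]


def root_from_path(index: int, size: int, leaf: bytes, path: List[bytes]) -> bytes:
    # Rebuild the root from a leaf hash and its audit path (verifier side).
    if size == 1:
        if path:
            raise ValueError("audit path is too long")
        return leaf
    if not path:
        raise ValueError("audit path is too short")
    k = split_point(size)
    sibling = path[-1]  # the last element is the sibling closest to the root
    if index < k:
        return node_hash(root_from_path(index, k, leaf, path[:-1]), sibling)
    return node_hash(sibling, root_from_path(index - k, size - k, leaf, path[:-1]))


# Pending leaves have no index; the position is fixed only once they are sequenced.
sequenced = [b"leaf-b", b"leaf-a", b"leaf-c"]
head = tree_head(sequenced)
proof = audit_path(1, sequenced)
print(root_from_path(1, len(sequenced), leaf_hash(sequenced[1]), proof) == head)
```

A verifier only needs the leaf, its index, the tree size, and the audit path to recompute the tree head, which is why the index assigned during sequencing matters.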
+ +## Trillian as an API +The final part of the puzzle is the interface that the front-end personality can +use while talking to Trillian. Fortunately, it is relatively straightforward. +You will only send requests to and receive responses from the log server that +exposes a gRPC API. Possible requests and responses are documented + [here](https://github.com/google/trillian/blob/master/docs/api.md). +This is really the place to look if you want to know what will "just work". + +For example, you will notice that there is a `QueueLeafRequest` that takes as +input some data that goes into the Merkle tree and the leaf’s Appendix, as well +as an identity hash that tells Trillian what should be counted as a duplicate. +You may also take advantage of the built-in Trillian rate limiting by specifying +a custom `charge_to` string. You can think of this as saying "Dear Trillian, +this IP address requested to add a leaf and it is signed using a certificate +that ends in the following trust anchor". In response a resource exhaustion +error might be returned if too many requests were observed for a given quota +string. + +Other requests I would suggest you look into include retrieving a leaf, a signed +tree head, an inclusion proof, and a consistency proof. Studying them goes a long +way toward understanding which details are (not) in the front-end personality. + +## Concluding remarks +The view that Trillian is a database with an append-only Merkle tree is by no +means wrong, but it is also not a complete description. For example, there is +also a map mode that associates keys with values without being append-only. If +you look further into Trillian you will also realize that there are many details +that matter for deployment but not so much if we just want to get the hang of +things. 
For example, there is built-in functionality for running several log +server and signing instances, coordinating them, exporting health metrics, +choosing database back-ends, configuring rate limiting, and more. If that sounds +interesting you can get an enhanced intuition by reading the + [manual deployment scenario](https://github.com/google/certificate-transparency-go/blob/master/trillian/docs/ManualDeployment.md) +documentation for Certificate Transparency. + +## Acknowledgments +Fredrik Strömberg provided valuable feedback on this story, which is sponsored +by my + [System Transparency](https://system-transparency.org/) +employment at Mullvad VPN. -- cgit v1.2.3