author | Rasmus Dahlberg <rasmus@mullvad.net> | 2021-12-18 16:59:50 +0100
---|---|---
committer | Rasmus Dahlberg <rasmus@mullvad.net> | 2021-12-18 16:59:50 +0100
commit | 774a7656d25977cdb3d3b7be37c7ca0ac0ec1f50 (patch) |
tree | ca8e630a1c4c24e682581f2e8da056e47f13bbb6 |
parent | 704f51516cd905835956028dfd53d67add53d396 (diff) |
posts: Import stories from medium
-rw-r--r-- | config.toml | 8
-rw-r--r-- | content/post/hold-on-to-your-hat-and-learn-system-transparency-in-five-minutes.md | 146
-rw-r--r-- | content/post/observations-from-a-trillian-play-date.md | 130
-rw-r--r-- | content/post/trillian-log-sequencing-demystified.md | 111
-rw-r--r-- | content/post/what-happened-at-ct-days-2020.md | 160
-rw-r--r-- | static/img/ctdev.png | bin 0 -> 91722 bytes
6 files changed, 551 insertions, 4 deletions
diff --git a/config.toml b/config.toml
index ac4186e..3326d20 100644
--- a/config.toml
+++ b/config.toml
@@ -63,10 +63,10 @@ metaDataFormat= "toml"
     name = "About"
     url = "about"

-  #[[menu.main]]
-  #  identifier = "posts"
-  #  name = "Posts"
-  #  url = "post/"
+  [[menu.main]]
+    identifier = "post"
+    name = "Blog"
+    url = "post/"

 [markup]
   defaultMarkdownHandler = "goldmark"

diff --git a/content/post/hold-on-to-your-hat-and-learn-system-transparency-in-five-minutes.md b/content/post/hold-on-to-your-hat-and-learn-system-transparency-in-five-minutes.md
new file mode 100644
index 0000000..978791d
--- /dev/null
+++ b/content/post/hold-on-to-your-hat-and-learn-system-transparency-in-five-minutes.md
@@ -0,0 +1,146 @@
---
title: "Hold on to your hat and learn System Transparency in five minutes!"
date: 2020-10-12
---
What do we really know about the systems that run our critical applications?
Not enough is probably a fair summary: much can go wrong between device reset
and execution of a user-land application. System Transparency helps you verify
that what you think is running remotely actually runs, and not, say, a modified
operating system that contains a secret backdoor. I will break it down
top-to-bottom after first briefly motivating the rationale and objective.

## Rationale and objective
Anyone in a position of power should probably be subject to a proportional
amount of transparency. It is an important safeguard that deters malicious
activities, while at the same time making it possible to fix honest mistakes.
Such a principle can of course be applied in real life, but I mainly refer to
the different components that compose a digital system: hardware, firmware,
operating systems, applications, and so forth. Generally I would say that _power
is decreased by transparency because most abuse can be detected_. For example,
it would be proportional for Intel to open up their proprietary management
engine because it is powerful enough to
[hijack your system](https://www.wired.com/2017/05/hack-brief-intel-fixes-critical-bug-lingered-7-dang-years/).

The scenario to keep in mind is accordingly as follows. A remote server is
running a service somewhere that processes your data based on a policy. You
might have reason to believe that said policy is followed now, but will it be
in the future when
[intruders](https://www.eff.org/deeplinks/2020/07/after-weeks-hack-it-past-time-twitter-end-end-encrypt-direct-messages)
and
[law enforcement](https://www.eff.org/cases/apple-challenges-fbi-all-writs-act-order)
knock down the door? I, for one, would prefer if we could _verify_ that the
system in question works as intended (and not just trust that to be the case
blindly). The other benefit of such remote system verification is more subtle:
the service provider could use it to determine whether intention matches
deployment. Of course there might be unknown bugs, but by making every part of
the system as transparent as possible it becomes easier to find vulnerabilities
and assess trustworthiness.

## Breaking it down, top-to-bottom
The idea is to first make transparent what is allowed to run on a given system.
You can view this as the top-most layer that represents an operating system
package with installed programs, configurations, and so forth. Thereafter we
need a bottom layer that enforces that nothing other than the transparent
operating system package is allowed to run. Such enforcement is based on
hardware features that should be transparent as well.
### Reproducible and publicly auditable operating system packages
Suppose that we have an operating system package that we would like to deploy.
As a first step we need to
[build it reproducibly](https://reproducible-builds.org/), such that anyone can
inspect the source code and determine if the resulting package lives up to the
claimed promises. A possible issue that one might find, for example, is that
interactive system access is installed: pretty much anything could run after a
reconfiguration. Therefore, a transparent system should restrict arbitrary
access and provision updates as new operating system packages that, again,
build reproducibly. For those who are familiar with functional programming, it
is essentially
[immutable infrastructure](https://web.archive.org/web/20200518230417/http://chadfowler.com/2013/06/23/immutable-deployments.html).
An independent benefit of such maintenance is that
[malware persistence](https://github.com/Karneades/malware-persistence#overview-of-often-and-less-often-used-persistence-mechanisms)
becomes trickier.

A reproducible operating system package serves a limited purpose unless it is
publicly available. Therefore, we should insert it into a
[transparency log](https://transparency.dev/).
This means that anyone can verify whether a package builds reproducibly, and
whether it contains, say, a secret backdoor that would be detected by source
inspection.

### Measured and remotely attested boot
Now we need to enforce that the publicly disclosed operating system packages,
and nothing else, run on our servers. At a first glance it might sound daunting,
but today's hardware platforms ship some pretty useful security features. For
example, there is usually a separate hardware domain for key management,
cryptographic hashing, Platform Configuration Registers (PCRs), and digital
signatures. It is possible to measure code, data structures, and configurations
into a PCR before execution to form a hash chain, such that all initial system
states can be aggregated into a single value. The system's boot process can be
aborted if a measurement diverges from the expected value, e.g., because the
boot loader did not enforce transparency logging as required by the top layer.
It is also possible to sign PCR values and attest them remotely. In other
words, if these features work we can prove to a third party how the system
booted.
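To make the hash-chain part concrete, here is a minimal sketch of what a PCR
extend boils down to: the register never takes a new value directly, it always
absorbs the hash of its old value concatenated with the next measurement, so
the final value commits to the whole ordered boot sequence. This is an
illustration only; a real TPM does this in hardware, and the component names
below are made up.

```
package main

import (
	"crypto/sha256"
	"fmt"
)

// extend models a TPM-style PCR extend: the register becomes
// SHA-256(old value || measurement digest), so the final value commits to the
// whole ordered sequence of measurements since reset.
func extend(pcr [sha256.Size]byte, digest []byte) [sha256.Size]byte {
	h := sha256.New()
	h.Write(pcr[:])
	h.Write(digest)
	var out [sha256.Size]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	var pcr [sha256.Size]byte // PCRs start at a known value (all zeroes) after reset
	for _, component := range []string{
		"firmware image",       // hypothetical measurement targets; a real boot
		"boot loader (stboot)", // chain measures the actual binaries and configs
		"OS package",
	} {
		digest := sha256.Sum256([]byte(component))
		pcr = extend(pcr, digest[:])
	}
	fmt.Printf("aggregated boot state: %x\n", pcr)
}
```

Remote attestation is then a matter of having the platform sign the final
register values, so that a third party can compare them against the expected
aggregate for the transparency-logged operating system package.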
### Open source firmware and LinuxBoot
An immediate concern is that much trust is placed in the underlying hardware
platform. Naturally, it begs the question whether such trust is misplaced. A
[talk by Ron Minnich](https://osseu17.sched.com/event/ByYt/replace-your-exploit-ridden-firmware-with-linux-ronald-minnich-google)
brings you up to speed on why the answer is probably "yes". Let us focus on
solutions instead: open hardware, firmware, and boot loaders. It is paramount
that these components are vetted thoroughly in the open because they may
compromise the system
[while running or before it is even started](https://securelist.com/mosaicregressor/98849/).

So, System Transparency implements a flavor of
[LinuxBoot](https://www.linuxboot.org/)
called
[stboot](https://github.com/system-transparency/system-transparency/blob/master/README.md#bootloader-stboot).
It can replace much of the later-stage UEFI components with a Linux kernel and a
user-land environment in Go, such that a subset of proprietary firmware is
removed in favor of an open source option that is safer and customizable. For
example, one possible customization is to enforce transparency logging as a
criterion for booting into the host operating system. It is possible to
eliminate UEFI altogether by re-flashing the firmware with
[coreboot](https://doc.coreboot.org/)
and specifying stboot as a payload. The TL;DR is that coreboot is (mostly) open
source firmware that does the bare minimum of hardware initialization. It was
recently
[ported to a modern server platform](https://mullvad.net/en/blog/2019/8/7/open-source-firmware-future/).

### Set-up ceremony and tamper-evident hardware
Assuming an open platform that enforces transparency logging as described
above, you can be somewhat sure that said operating system packages run. The
problem is that you cannot easily know if that assumption is true. I am not
claiming that there is a slam-dunk solution here, but measures can be taken to
reduce the risk of a broken setup. For example, assemble and install the
platform while witnessed live by several independent parties that write down
and publish a log book of events that occurred:
"[neutralized the management engine](https://github.com/corna/me_cleaner)",
"added open firmware with checksum XYZ", etc. We can also define physical
security boundaries that, if breached after setup, automatically activate
defensive mechanisms that preserve the system's overall integrity.

## Concluding remarks
The described System Transparency design shows how a service provider can
facilitate trust by engineering a system that is more trustworthy. I would like
to emphasize _more trustworthy_: all of the applied techniques have merit on
their own, and if one part does not fit the use-case or current practice it
might be reasonable to cut it. For example, if you lease cloud servers that
only allow starting stboot from UEFI: so be it. Simply assume that there will
be no firmware and physical attacks for the time being. It is still a
significant improvement when compared to obscure operating system packages
since the attack surface and overall trust domain are reduced.
[The growing problem of malicious Tor relays in the cloud](https://medium.com/@nusenu/how-malicious-tor-relays-are-exploiting-users-in-2020-part-i-1097575c0cac)
could benefit from such a solution because a class of real-world attackers would
not see any traffic (if enforced by Tor). As another example: suppose your
interest is mainly to harden your own internal infrastructure, and not so much
to make it transparent for everyone. It is not a strict requirement to publish
the operating system package; a hash is enough to convince yourself that
nothing else was allowed to run.

## Acknowledgments
Fredrik Strömberg provided valuable feedback on this story, which is sponsored
by my
[System Transparency](https://system-transparency.org/)
employment at Mullvad VPN.
diff --git a/content/post/observations-from-a-trillian-play-date.md b/content/post/observations-from-a-trillian-play-date.md
new file mode 100644
index 0000000..47b4f1d
--- /dev/null
+++ b/content/post/observations-from-a-trillian-play-date.md
@@ -0,0 +1,130 @@
---
title: "Observations from a Trillian play-date"
date: 2020-11-23
---
Have you ever heard about
[Trillian](https://transparency.dev/)
in the context of transparency logging? Perhaps you view it as an integral part
of
[Certificate Transparency](https://certificate.transparency.dev/),
a solution for arbitrary transparency applications, or both. Even if you know
Certificate Transparency quite well, the Trillian details might be a bit blurry
until you sit down and get some hands-on experience: at least that was the case
for me. Therefore, Trillian and I had a little play-date. I thought I would
share a few observations that in hindsight are obvious but at the same time
helpful.

## Problem statement and overview
I agree with Daz Wilkin that
[it is somewhat daunting to get started with Trillian](https://medium.com/google-cloud/google-trillian-for-noobs-9b81547e9c4a).
Putting it all together involves many different components and configurations,
especially if you need the high reliability and scale that Trillian supports. It
does not have to be that complicated though. Trillian is pretty much a database
that includes an append-only Merkle tree:

1. **Trillian log server:** exposes a gRPC API that is used by an
application-dependent front-end or so-called _Trillian personality_. Requests and
responses trigger operations on the underlying database, such as queuing new
data requests and assembling cryptographic Merkle tree proofs.
2. **Trillian log signer:** checks the database periodically and sequences it
into a Merkle tree. The term _log signer_ was confusing for me initially because
it is usually the front-end personality that adds externally visible signatures.
Therefore, I found it helpful to think of this component as a _log sequencer_.

I will not talk much about the front-end personality. It is the part of Trillian
that you or your ecosystem has to implement. It includes definitions of public
endpoints, the data to be logged, who is allowed to log it, etc.

## Trillian as a database abstraction
The simplest description of Trillian is probably as a regular database. You can
insert any item of your choice after serializing it as zeroes and ones, and come
back later on and retrieve it. In reality it is more accurate to say that
Trillian is hooked up to a database, such as MariaDB using the schema over
[here](https://github.com/google/trillian/blob/master/storage/mysql/schema/storage.sql).
This means that before getting started a database must be configured such that
there is a record in the Trees table that identifies a particular Trillian
instance.

```
CREATE TABLE IF NOT EXISTS Trees(
  TreeId BIGINT NOT NULL,
  HashAlgorithm ENUM('SHA256') NOT NULL,
  SignatureAlgorithm ENUM('ECDSA', 'RSA', 'ED25519') NOT NULL,
  PrivateKey MEDIUMBLOB NOT NULL,
  PublicKey MEDIUMBLOB NOT NULL,
  ...
);
```
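In practice that record is rarely written by hand. Trillian ships an admin gRPC
service (wrapped by a small `createtree` command) that creates the row and
returns the tree ID which the personality later refers to. The following is a
rough Go sketch against the admin API as I understand it; the endpoint and
display name are placeholders, and the exact request fields vary between
Trillian versions (older releases also expected key material here, matching the
PrivateKey and PublicKey columns above).

```
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/google/trillian"
	"google.golang.org/grpc"
)

func main() {
	// Assumes a Trillian log server with its admin service on this address;
	// both the address and the dial options are illustrative only.
	conn, err := grpc.Dial("localhost:8090", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	admin := trillian.NewTrillianAdminClient(conn)
	tree, err := admin.CreateTree(context.Background(), &trillian.CreateTreeRequest{
		Tree: &trillian.Tree{
			TreeState:   trillian.TreeState_ACTIVE,
			TreeType:    trillian.TreeType_LOG, // an append-only log, not a map
			DisplayName: "play-date",           // hypothetical name
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	// Depending on the version you may also need to initialize the new log
	// (InitLog on the log RPC service) before it accepts leaves.
	fmt.Println("created tree with id", tree.TreeId)
}
```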
Initially I was confused by the public-key cryptography that is part of the
database schema: is it not the front-end personality that attaches signatures,
for, say, Signed Certificate Timestamps (SCTs) in Certificate Transparency?
Well, yes. But the scenario in mind here is that there might be a front-end
personality that runs in a different trust domain, such that the Trillian
back-end needs to sign some data to prove its origin. The front-end determines
what becomes externally visible regardless of whether these signatures are used.

New add-data requests are queued by the Trillian log server in an unordered
table of pending leaves. Each such leaf also has an optional appendix, which
allows extra data to be stored without merging it into the Merkle tree. For
example, it might be reasonable to hold on to an associated signature if the
front-end personality requires signed data as an admission criterion.

## Trillian as a Merkle tree abstraction
The log signer sequences the pending leaf data periodically. By sequencing I
mean taking the unordered leaves that one or more log servers queued, and then
appending them to the current Merkle tree at specific indices. In other words,
not even the log servers know the index of an added leaf until it is merged.
This is important to keep in mind because several proposals in the past assumed
that Trillian logs are timestamp ordered, but strictly speaking there is no such
guarantee unless the front-end takes responsibility for sequencing (in which
case there is a specific pre-ordered Trillian API that can be used).

The Merkle tree itself is viewed as many smaller sub-trees in the database,
where only the bottom layer of each sub-tree is stored physically. Any interior
node can be computed on the fly, which apparently
[saves up to 50% of space](https://github.com/google/trillian/blob/master/docs/storage/storage.md).
The log server accesses the database to interact with the sequenced Merkle tree,
e.g., to retrieve tree heads and build audit paths (hashes in the tree). As
such, there is no explicit communication between the log server and signer.

## Trillian as an API
The final part of the puzzle is the interface that the front-end personality can
use while talking to Trillian. Fortunately, it is relatively straightforward.
You only send requests to, and receive responses from, the log server that
exposes a gRPC API. Possible requests and responses are documented
[here](https://github.com/google/trillian/blob/master/docs/api.md).
This is really the place to look if you want to know what will "just work".

For example, you will notice that there is a `QueueLeafRequest` that takes as
input some data that goes into the Merkle tree and the leaf's appendix, as well
as an identity hash that tells Trillian what should be counted as a duplicate.
You may also take advantage of the built-in Trillian rate limiting by specifying
a custom `charge_to` string. You can think of this as saying "Dear Trillian,
this IP address requested to add a leaf and it is signed using a certificate
that ends in the following trust anchor". In response, a resource exhaustion
error might be returned if too many requests were observed for a given quota
string.

Other requests I would suggest you look into include retrieving a leaf, a signed
tree head, an inclusion proof, and a consistency proof. That goes a pretty long
way if you want to understand which details are (and are not) handled by the
front-end personality.
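To give a feel for the write path, here is a rough sketch of queuing a single
leaf through that gRPC API from Go. The endpoint, log ID, and quota users are
placeholders, and field names may differ slightly between Trillian versions.

```
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/google/trillian"
	"google.golang.org/grpc"
)

func main() {
	// Connect to a Trillian log server; address and log ID are placeholders.
	conn, err := grpc.Dial("localhost:8090", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := trillian.NewTrillianLogClient(conn)

	resp, err := client.QueueLeaf(context.Background(), &trillian.QueueLeafRequest{
		LogId: 4242424242, // the tree ID that identifies this log
		Leaf: &trillian.LogLeaf{
			LeafValue: []byte("data that goes into the Merkle tree"),
			ExtraData: []byte("the appendix, e.g., a submitter signature"),
			// Omitting LeafIdentityHash makes Trillian derive it from
			// LeafValue, i.e., identical leaf values count as duplicates.
		},
		// Built-in rate limiting: charge the request to quota users of our choice.
		ChargeTo: &trillian.ChargeTo{User: []string{"203.0.113.7", "some-trust-anchor"}},
	})
	if err != nil {
		log.Fatal(err) // e.g., a resource exhaustion error when a quota is empty
	}
	fmt.Println("queued leaf with status", resp.QueuedLeaf.Status)
}
```

Note that a queued leaf is not yet part of any signed tree head; it only gets
there once the log signer has sequenced it.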
## Concluding remarks
The view that Trillian is a database with an append-only Merkle tree is by no
means wrong, but it is also not a complete description. For example, there is
also a map mode that associates keys with values without being append-only. If
you look further into Trillian you will also realize that there are many details
that matter for deployment but not so much if we just want to get the hang of
things. For example, there is built-in functionality for running several log
server and signing instances, coordinating them, exporting health metrics,
choosing database back-ends, configuring rate limiting, and more. If that sounds
interesting you can get an enhanced intuition by reading the
[manual deployment scenario](https://github.com/google/certificate-transparency-go/blob/master/trillian/docs/ManualDeployment.md)
documentation for Certificate Transparency.

## Acknowledgments
Fredrik Strömberg provided valuable feedback on this story, which is sponsored
by my
[System Transparency](https://system-transparency.org/)
employment at Mullvad VPN.

diff --git a/content/post/trillian-log-sequencing-demystified.md b/content/post/trillian-log-sequencing-demystified.md
new file mode 100644
index 0000000..bf1da6d
--- /dev/null
+++ b/content/post/trillian-log-sequencing-demystified.md
@@ -0,0 +1,111 @@
---
title: "Trillian log sequencing: demystified?"
date: 2021-02-09
---
One way to view
[Trillian](https://transparency.dev/)
is as
[a database with an append-only Merkle tree](https://www.rgdd.se/post/observations-from-a-trillian-play-date/).
That Merkle tree is managed by a separate component called a log signer. It runs
in a dedicated process that basically merges pending leaves that have yet to be
incorporated into the Merkle tree. This part of the log signer is structured as
a sequencing job that runs periodically. I spent a few hours learning more about
the details and figured that knowledge is better shared.

## Problem definition and overview
Trillian's
[log signer](https://github.com/google/trillian/blob/3ae67195ffd778d37275c6972445f4e7f9e21410/cmd/trillian_log_signer/main.go)
comes with a whole bunch of configuration options that are spread across several
different files. Some of these options are more difficult to grasp than others,
such as `num_sequencers`, `sequencer_interval`, and `batch_size`. I don't mean
difficult as in understanding that there may be several sequencers that run
periodically, but rather what that actually means in terms of concurrency and
how much can be sequenced per time unit.

The short answer is as follows:
1. Regardless of how many sequencers you configure there will be no concurrent
sequencing of a particular Merkle tree. The number of sequencers is only
relevant if there are multiple Merkle trees.
2. The sequencer interval tells you how often the log signer wakes up to do a
sequencing job. It sequences no more than the configured batch size before going
back to sleep. If the interval has already elapsed there will be no sleep.

In other words, to avoid building up a large backlog you need to consider how
often the sequencer runs and how much work it is willing to do every time. The
longer answer is detailed below, and it includes a little bit of additional
context regarding what you may (not) do with these three parameters. For
example, the sequencer job is most reliable if it runs often over small batches.

## Log signer
The most central part of the log signer is probably its forever loop. For
reference it is implemented by Trillian's
[operation manager](https://github.com/google/trillian/blob/3ae67195ffd778d37275c6972445f4e7f9e21410/log/operation_manager.go#L110),
see the `OperationLoop` function.
In a nutshell, the log signer wakes up,
performs some jobs, and goes back to sleep. Pretty much a busy working day!

### Coordination
Before proceeding you need to know that a log signer may manage multiple
different Merkle trees. It might sound odd at first, but hopefully less so if
you think about the existence of transparency applications that use
[temporal sharding](https://www.digicert.com/dc/blog/scaling-certificate-transparency-logs-temporal-sharding/):
the split of one transparency log into multiple smaller ones so that the leaves
are grouped by, say, yearly relevance like 2020 and 2021. This allows parts of
the log to be retired as soon as they are no longer needed. You can also run
multiple different log signers to avoid single points of failure. At most one
log signer is responsible for a particular Merkle tree at a time though. This
requires coordination, which is one part of the forever loop.

### Sequencing
The other part is log sequencing. After waking up once per interval as defined
by the configured `sequencer_interval`, the log signer takes a pass over all
Merkle trees that it manages. A sequencing job is scheduled for each Merkle tree
that the log signer is currently the master for. Mastership is determined by an
election protocol that coordinates multiple log signers; with a single log
signer, mastership is simply assumed because there is nothing to coordinate.
Next, these jobs are distributed to a number of concurrent sequencers that work
on _different Merkle trees_. This means that there is no concurrency whatsoever
when _a particular Merkle tree_ is being sequenced. Moreover,
[what is selected for sequencing](https://github.com/google/trillian/blob/3ae67195ffd778d37275c6972445f4e7f9e21410/storage/mysql/queue.go#L33)
is deterministic based on Trillian's internal timestamp order and the number of
leaves that the sequencer is willing to process per job.

In other words, there is no point in setting `num_sequencers` higher than the
number of Merkle trees that you manage. You may set it lower, in which case at
least one sequencer will move on to a different sequencing job (and thus a
different Merkle tree) once it is done. Now it is also evident that the
configured `sequencer_interval` and `batch_size` determine an upper bound for
the number of writes per time unit. It is an upper bound because your hardware
might not be capable of handling, say, 10k writes per second.
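To put rough numbers on that bound, here is a tiny back-of-the-envelope
calculation in Go using the parameters discussed above. The values are made up
for illustration, not recommendations.

```
package main

import (
	"fmt"
	"time"
)

func main() {
	// Hypothetical settings for a single Merkle tree.
	batchSize := 1000                  // leaves sequenced per job (batch_size)
	interval := 100 * time.Millisecond // pause between jobs (sequencer_interval)

	// One job per interval and no concurrency within a single tree, so the
	// sequenced throughput of that tree is bounded by batch_size / interval.
	bound := float64(batchSize) / interval.Seconds()
	fmt.Printf("at most %.0f sequenced leaves/s for this tree\n", bound) // 10000
}
```

The same arithmetic explains why a backlog builds up as soon as the queue rate
exceeds this bound for long enough.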
## Concluding remarks
When I noticed the `sequencer_interval` parameter I wondered if it could be used
to configure a tree head frequency, such that the Trillian front-end personality
would only see a fixed number of updates per time interval. For example, you
might want that because the
[second version of Certificate Transparency requires control over it](https://datatracker.ietf.org/doc/html/draft-ietf-trans-rfc6962-bis-34#section-4.1)
and some gossip-audit models
[assume it](https://tools.ietf.org/html/draft-ietf-trans-gossip-05).
If it is not supported by the Trillian back-end, the front-end personality has
to take on that role. While it is trivial in the case of a single personality
instance (e.g., pick a tree head every hour), it might require some additional
coordination if there are multiple concurrent instances running. So, it would be
convenient if the _already coordinated_ log signer could enforce a tree head
frequency.

In theory it is of course possible to set the sequencer interval and batch size
to accommodate both a frequency and an upper bound for writes per second. In
practice it is apparently recommended to use short intervals and batch sizes of
up to 1000 leaves. This recommendation involves quite a bit of nuance, and
relates to things like
[which back-end is used for the underlying database](https://github.com/google/trillian/issues/1845).
If tree head frequencies are generally useful it might land as a back-end
feature at some point. For now, it is better enforced by the front-end
personality.

## Acknowledgments
This story is sponsored by my
[System Transparency](https://system-transparency.org/)
employment at Mullvad VPN. Martin Hutchinson and Pavel Kalinnikov provided
valuable insights as part of a conversation in the
[Trillian slack](https://gtrillian.slack.com).
Mistakes, if any, are my own.

diff --git a/content/post/what-happened-at-ct-days-2020.md b/content/post/what-happened-at-ct-days-2020.md
new file mode 100644
index 0000000..486bb5e
--- /dev/null
+++ b/content/post/what-happened-at-ct-days-2020.md
@@ -0,0 +1,160 @@
---
title: "What happened at CT days 2020?"
date: 2020-09-14
---
This year's
[CT days](https://groups.google.com/a/chromium.org/forum/#!topic/ct-policy/JWVVhZTL5RM)
were hosted remotely on September 8–9. The agenda covered a wide range of
topics, such as making CT more newcomer friendly, updating user-agent policies,
and what it takes to operate a log at scale. I do not intend to write about all
of it, and especially not every little detail. You will be brought up to speed
on some highlights and get further reading. All credit obviously goes to the
people who presented sessions on this material.

## New community website
You might be familiar with the
[current CT website](https://web.archive.org/web/20200829193007/https://www.certificate-transparency.org/).
It is a little bit like an explosion of technical details and links from several
years back that, if you digest it all, tell you how Google's CT project works.
This is not particularly welcoming for newcomers who need to grasp what CT is
today and how it fits into the broader picture of the web's public-key
infrastructure. For example, CT is no longer Google's own logging project, but
rather an ecosystem of different people and organizations that come together
with one mission: to detect maliciously or mistakenly issued certificates. This
happens to be the first thing you will find when browsing the
[new community website](https://certificate.transparency.dev/).
It is nifty looking, and I encourage you to browse it yourself. You will notice
that the origin story of CT and its broader context is described, which helps
the reader pick up the fundamentals from a combination of text and
visualizations.

![Working together to detect maliciously issued certificates](/img/ctdev.png)
**source**: https://certificate.transparency.dev/

[Ryan Hurst](https://unmitigatedrisk.com/) explained that we can think of the
new community website as a place where newcomers can be directed to understand
the value of CT and how it works. Moreover, it is our collective responsibility
to add anything that is missing and keep it up-to-date. Anyone can
[submit pull requests](https://github.com/google/certificate-transparency-community-site)
on GitHub.

## Policy updates
I can second that it is not always easy to understand every nuance of CT
enforcement by different user agents.
For example, I remember
[filing a bug](https://bugs.chromium.org/p/chromium/issues/detail?id=1053971#c1)
not too long ago when noticing that Chromium (not to be confused with Google
Chrome) disabled CT by default. In the future there might be a separate and
lighter-weight Chromium CT policy that embedders could use as a starting point,
but for now Google's policy will be shaped solely for Chrome. This is reflected
by the new CT policy website that is being drafted: the so-called
[Chrome Certificate Transparency Policy](https://googlechrome.github.io/CertificateTransparency/ct_policy.html).
[Devon O'Brien](https://twitter.com/modyoloN) appropriately described it as a
complete overhaul of what the current policy states and how these requirements
are framed specifically for enforcement in Chrome.

The CT community website will link the updated CT policy once it goes live in
the near future. In the meantime, you can enjoy the draft and provide feedback
to the
[Chrome CT team](mailto:chrome-certificate-transparency@google.com)
or the
[CT policy group](https://groups.google.com/a/chromium.org/g/ct-policy).
The recommended starting point is the different states that a CT log can be in.
I am not going to detail it, but it was mentioned as a pro-tip in multiple
sessions. You will also notice that informative reference material was added,
with more to come as the broader community identifies components that are
unclear or missing.

There are several policy changes in the midst of the above updates. While not
particularly huge, they are there and worth pointing out. All new CT logs must
be temporally sharded. A log that is temporally sharded accepts logging of
certificates that fall into a range of expiry dates. For example, a 2020-shard
would not accept certificates that expire in 2021. The status of sharding moves
from allowed to required, solving the issue of logs that grow forever and ever.
A log must additionally not return multiple different SCTs for a single log
entry, which solves undefined behavior in
[RFC 6962](https://tools.ietf.org/html/rfc6962)
that could lead to unverifiable SCTs in quirky corner cases. Note that a
certificate that passes the Google Chrome CT policy is now said to be CT
compliant. This is analogous to the current wording of a certificate being CT
qualified, and avoids confusion with CT logs that can also be qualified in the
browser.

[Clint Wilson](https://twitter.com/clintw_) announced
[Apple's intended policy updates](https://groups.google.com/a/chromium.org/forum/?oldui=1#!topic/ct-policy/JWVVhZTL5RM).
They are also working towards definitions in undefined scenarios, such as
expressing their policy in terms of days as opposed to months. The most notable
policy update (in my opinion) is that the SCT diversity and quantity assumption
for a CT compliant certificate is about to change. Currently, Apple considers a
certificate CT compliant if it is accompanied by two SCTs from any pair of CT
logs. Longer-lived certificates will require an additional SCT in the future,
and log operator diversity will be added as well. This reduces the risk that a
once CT compliant certificate becomes non-compliant during its lifetime. The
other benefit is that no single log operator can issue all of a certificate's
SCTs, which raises the bar for unnoticed certificate mis-issuance.
What it means for two log operators to be diverse (also known as independent) is
somewhat non-trivial: it can span many different dimensions, such as
organization, country, and infrastructural providers. Google's
[updated policy for log operators](https://googlechrome.github.io/CertificateTransparency/log_policy.html)
will require a self-assertion that you are independent of all other log
operators. My best guess is that Apple will rely on something similar to define
log diversity.

## Removing the one-Google log requirement?
Google Chrome currently considers a certificate CT compliant if it is
accompanied by two SCTs. One of these SCTs must additionally be issued by a log
that Google runs. If Google's CT logs are operated in good faith, we can be sure
that there are no mis-issued certificates that go unnoticed.

To state the obvious, the point of CT is not to trust that Google keeps the
wider web safe from certificate mis-issuance. The current set-up is in fact
quite error-prone: sub-optimal trust assumptions aside, a Google outage would
essentially disable issuance of new CT compliant certificates on the web. Devon
O'Brien expressed a desire to remove Google from the critical path of
certificate issuance, and outlined the considerations that need to go into such
a decision. The major part is that Google must accept that they lose their
privilege of observing all issued certificates up-front, which provides
significant proactive security for Chrome users but might impede broader
user-agent adoption. The result of such a change is that SCTs must be audited
reactively in the background from many diverse vantage points, such that
mis-issued certificates get noticed without the one-Google log policy.

[Chris Thompson](https://notyetsecure.com/) presented Google's plans on
[opt-in SCT auditing](https://docs.google.com/document/d/1G1Jy8LJgSqJ-B673GnTYIG4b7XRw2ZLtvvSlrqFcl4A/edit).
The basic idea is to submit a random subset of SCTs to a Google-operated CT
auditor. If an SCT is encountered that Google does not know about, that can be
investigated further by challenging the issuing CT logs to prove certificate
inclusion. The reason why this requires opt-in stems from the fact that the user
is essentially sharing a random subset of their browsing history with Google.
You might wonder who would opt in for that, but it actually fits pretty well
into the existing Safe Browsing Extended Reporting (SBER) programme. My largest
concern is that opted-in users might be identifiable, and in that case the rest
of us could still be attacked without high likelihood of detection.

[Sarah Meiklejohn](https://smeiklej.com/) presented a follow-up session on
privacy-preserving SCT auditing. It could be described as partial highlights of
a systematic literature study that considered 15 different proposals against the
following criteria:

1. **Functionality**. Does it work and in what threat model?
2. **Privacy**. What information do which parties learn?
3. **Client-side performance**. Bandwidth, computation, and storage?
4. **Latency**. How much is added, if any?
5. **Server-side infrastructure costs**. What needs to be changed or added?
6. **Threat model**. Mainly in terms of which parties need to trust each other.
7. **Non-Google deployability**. Can it be deployed without Google-scale?
8. **Near-term deployability**. Can we roll it out sooner rather than later?

A subtle message is that a proposal without a third-party CT auditor is
incomplete.
I share this view because we cannot expect an end-user to take any reasonable
action if log misbehavior is suspected. Therefore, it is not just proof fetching
that needs to be private, but also the process of reporting issues.

## Acknowledgments
Thanks to everyone who contributed to CT days 2020, both in terms of
organization and putting in the actual work that the different sessions
presented. A detailed summary and follow-up discussion might appear on the
[CT policy list](https://groups.google.com/a/chromium.org/forum/#!forum/ct-policy).
Fredrik Strömberg provided valuable feedback on this story, which is sponsored
by my
[System Transparency](https://www.system-transparency.org/)
employment at Mullvad VPN.

diff --git a/static/img/ctdev.png b/static/img/ctdev.png
new file mode 100644
index 0000000..6153b40
--- /dev/null
+++ b/static/img/ctdev.png
Binary files differ