From e4d01585d9802a256d754072bdce2b855ae7d354 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Fri, 7 Apr 2023 18:00:35 +0200
Subject: Add operations timeline

---
 README.md | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 830c024..1017c02 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,9 @@
 A tool that downloads certificates from [CT logs][] [recognized by Google Chrome][],
 storing the encountered [Subject Alternative Names (SANs)][] to disk.
 
-The final data set `sans.lst` is de-duplicated and contains one SAN per line.
+The dataset can be assembled so that it is de-duplicated with one SAN per line.
+
+**Availability:** [2023-04-03-ct-sans dataset](./docs/operations.md)
 
 [CT logs]: https://certificate.transparency.dev/
 [recognized by Google Chrome]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto/
@@ -12,15 +14,23 @@ The final data set `sans.lst` is de-duplicated and contains one SAN per line.
 ## Quick start
 
-You will need a Go compiler and GNU sort on the local system:
-    $ which go || echo "Go compiler is not in $PATH"
-    $ which sort || echo "GNU sort is not in $PATH"
+### Install
+
+You will need a [Go compiler][] and [GNU sort][] on the local system:
+
+    $ which go >/dev/null || echo "Go compiler not PATH"
+    $ which sort >/dev/null || echo "GNU sort not PATH"
+
+[Go compiler]: https://go.dev/doc/install
+[GNU sort]: https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
 
 Install `ct-sans`:
 
     $ go install git.cs.kau.se/rasmoste/ct-sans@latest
-    $ which ct-sans || echo "ct-sans is not in $PATH"
+    $ which ct-sans >/dev/null || echo "ct-sans not in PATH"
+
+### Snapshot
 
 Download and verify the signature of Google's list of known logs,
 then download and verify the signatures of the logs' tree heads:
@@ -49,6 +59,8 @@ then download and verify the signatures of the logs' tree heads:
 Subsequent uses of the `snapshot` command will update the signed list of known
 logs, then update the logs' signed tree heads after verifying consistency.
 
+### Collect
+
 Download and verify the logs' Merkle trees up until the current snapshot:
 
     $ ct-sans collect -d $HOME/ct-sans-demo
@@ -75,13 +87,10 @@ Download and verify the logs' Merkle trees up until the current snapshot:
 This will take a while depending on the local system, configuration of the
 optional `collect` flags, as well as how heavily the logs apply rate-limits.
 
-For good performance while respecting rate-limits, you may want to try
-`--workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m`. This allowed
-us to download the logs (March 2023) in approximately 10 days. Our machine was
-located in EU with 2TiB SSD, 64GiB memory, 16 CPU cores, and 1Gbps line-speed.
+For reference, we [downloaded the logs](./docs/operations.md) from scratch in
+less than 11 days using a single-IP machine that respects the logs' rate-limits.
 
-Of note is that it is safe to ctrl+C while collecting. Just wait for the
-`collect` command to exit on its own so that things are persisted to disk.
+### Assemble
 
 Once the collect phase is done, assemble the data set:
 
@@ -117,11 +126,13 @@ Once the collect phase is done, assemble the data set:
 
 The SANs data set is sorted and de-duplicated, one SAN per line.
 
-**Note:** the different `ct-sans` commands must not run at the same time.
-
-## Updating the data set
+### Good to know
 
-Simply run the same `snapshot`, `collect`, and `assemble` commands again.
+ - It is safe to ctrl+C while collecting. Just wait for the `collect` command
+   to exit on its own so that things are persisted to disk.
+ - The different `ct-sans` commands must not run at the same time.
+ - The dataset can be updated by running the same `snapshot`, `collect` and
+   `assemble` commands again.
 
 ## Contact
 
-- 
cgit v1.2.3