Diffstat (limited to 'README.md')
-rw-r--r-- README.md 41
1 files changed, 26 insertions, 15 deletions
diff --git a/README.md b/README.md
index 830c024..1017c02 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,9 @@
A tool that downloads certificates from [CT logs][] [recognized by Google
Chrome][], storing the encountered [Subject Alternative Names (SANs)][] to disk.
-The final data set `sans.lst` is de-duplicated and contains one SAN per line.
+The dataset can be assembled so that it is de-duplicated with one SAN per line.
+
+**Availability:** [2023-04-03-ct-sans dataset](./docs/operations.md)
[CT logs]: https://certificate.transparency.dev/
[recognized by Google Chrome]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto/
@@ -12,15 +14,23 @@ The final data set `sans.lst` is de-duplicated and contains one SAN per line.
## Quick start
-You will need a Go compiler and GNU sort on the local system:
- $ which go || echo "Go compiler is not in $PATH"
- $ which sort || echo "GNU sort is not in $PATH"
+### Install
+
+You will need a [Go compiler][] and [GNU sort][] on the local system:
+
+ $ which go >/dev/null || echo "Go compiler not in PATH"
+ $ which sort >/dev/null || echo "GNU sort not in PATH"
+
+[Go compiler]: https://go.dev/doc/install
+[GNU sort]: https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
Install `ct-sans`:
$ go install git.cs.kau.se/rasmoste/ct-sans@latest
- $ which ct-sans || echo "ct-sans is not in $PATH"
+ $ which ct-sans >/dev/null || echo "ct-sans not in PATH"
+
+### Snapshot
Download and verify the signature of Google's list of known logs,
then download and verify the signatures of the logs' tree heads:
@@ -49,6 +59,8 @@ then download and verify the signatures of the logs' tree heads:
Subsequent uses of the `snapshot` command will update the signed list of known
logs, then update the logs' signed tree heads after verifying consistency.
+### Collect
+
Download and verify the logs' Merkle trees up until the current snapshot:
$ ct-sans collect -d $HOME/ct-sans-demo
@@ -75,13 +87,10 @@ Download and verify the logs' Merkle trees up until the current snapshot:
This will take a while depending on the local system, configuration of the
optional `collect` flags, as well as how heavily the logs apply rate-limits.
-For good performance while respecting rate-limits, you may want to try
-`--workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m`. This allowed
-us to download the logs (March 2023) in approximately 10 days. Our machine was
-located in EU with 2TiB SSD, 64GiB memory, 16 CPU cores, and 1Gbps line-speed.
+For reference, we [downloaded the logs](./docs/operations.md) from scratch in
+less than 11 days using a single-IP machine that respects the logs' rate-limits.
-Of note is that it is safe to ctrl+C while collecting. Just wait for the
-`collect` command to exit on its own so that things are persisted to disk.
+### Assemble
Once the collect phase is done, assemble the data set:
@@ -117,11 +126,13 @@ Once the collect phase is done, assemble the data set:
The SANs data set is sorted and de-duplicated, one SAN per line.
-**Note:** the different `ct-sans` commands must not run at the same time.
-
-## Updating the data set
+### Good to know
-Simply run the same `snapshot`, `collect`, and `assemble` commands again.
+ - It is safe to ctrl+C while collecting. Just wait for the `collect` command
+ to exit on its own so that things are persisted to disk.
+ - The different `ct-sans` commands must not run at the same time.
+ - The dataset can be updated by running the same `snapshot`, `collect` and
+ `assemble` commands again.
## Contact