# ct-sans

A tool that downloads certificates from [CT logs][] [recognized by Google
Chrome][], storing the encountered [Subject Alternative Names (SANs)][] to disk.
The dataset can be assembled so that it is de-duplicated with one SAN per line.

[CT logs]: https://certificate.transparency.dev/
[recognized by Google Chrome]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto/
[Subject Alternative Names (SANs)]: https://www.rfc-editor.org/rfc/rfc5280#section-4.2.1.6

**Warning:** research prototype.  The source code may also be moved.

## Quick start

### Install

You will need a [Go compiler][] and [GNU sort][] on the local system:

    $ which go >/dev/null || echo "Go compiler not in PATH"
    $ which sort >/dev/null || echo "GNU sort not in PATH"

[Go compiler]: https://go.dev/doc/install
[GNU sort]: https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html

Install `ct-sans`:

    $ go install git.cs.kau.se/rasmoste/ct-sans@latest
    $ which ct-sans >/dev/null || echo "ct-sans not in PATH"

### Snapshot

Download and verify the signature of Google's list of known logs, then download
and verify the signatures of the logs' tree heads:

    $ ct-sans snapshot -d $HOME/ct-sans-demo
    2023/03/23 12:43:49 cmd_snapshot.go:30: INFO: updating metadata file
    2023/03/23 12:43:49 cmd_snapshot.go:47: INFO: updating signed tree heads
    2023/03/23 12:43:49 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2023' log at tree size 862104911
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2024' log at tree size 55767940
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2023' log at tree size 990277299
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2024' log at tree size 66655425
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2023' Log at tree size 527018586
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2024' Log at tree size 34050592
    2023/03/23 12:43:51 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2024 Log at tree size 38426463
    2023/03/23 12:43:53 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2025 Log at tree size 697
    2023/03/23 12:43:54 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2023 Log at tree size 200387219
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2024 Log at tree size 40017666
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2025 Log at tree size 704
    2023/03/23 12:43:56 cmd_snapshot.go:82: INFO: bootstrapped Sectigo 'Sabre' CT log at tree size 229064032
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2023' log at tree size 467618545
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H1' log at tree size 34451205
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H2' log at tree size 14680
    2023/03/23 12:43:59 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2023 at tree size 388349
    2023/03/23 12:44:01 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2024-2 at tree size 112771

Subsequent uses of the `snapshot` command will update the signed list of known
logs, then update the logs' signed tree heads after verifying consistency.

### Collect

Download and verify the logs' Merkle trees up until the current snapshot:

    $ ct-sans collect -d $HOME/ct-sans-demo
    ...
    INFO: status update before shutdown

      Google 'Argon2023' log | 162.5 entries/s | Estimated done in 1474.01 hours | Working on [11776, 862104911)
      Google 'Argon2024' log | 157.5 entries/s | Estimated done in 98.31 hours | Working on [11584, 55767940)
      Google 'Xenon2023' log | 472.6 entries/s | Estimated done in 582.01 hours | Working on [33888, 990277299)
      Google 'Xenon2024' log | 458.5 entries/s | Estimated done in 40.37 hours | Working on [32896, 66655425)
      Cloudflare 'Nimbus2023' Log | 276.1 entries/s | Estimated done in 530.24 hours | Working on [19328, 527018586)
      Cloudflare 'Nimbus2024' Log | 301.2 entries/s | Estimated done in 31.39 hours | Working on [20736, 34050592)
      DigiCert Yeti2024 Log | 379.1 entries/s | Estimated done in 28.14 hours | Working on [27520, 38426463)
      DigiCert Yeti2025 Log | 0.0 entries/s | Estimated done in 0.00 hours | Working on [697, 697)
      DigiCert Nessie2023 Log | 331.3 entries/s | Estimated done in 168.00 hours | Working on [23040, 200387219)
      DigiCert Nessie2024 Log | 329.8 entries/s | Estimated done in 33.68 hours | Working on [21120, 40017666)
      DigiCert Nessie2025 Log | 0.0 entries/s | Estimated done in 0.00 hours | Working on [704, 704)
      Sectigo 'Sabre' CT log | 275.7 entries/s | Estimated done in 230.78 hours | Working on [19456, 229064032)
      Let's Encrypt 'Oak2023' log | 462.8 entries/s | Estimated done in 280.67 hours | Working on [33664, 467618545)
      Let's Encrypt 'Oak2024H1' log | 121.4 entries/s | Estimated done in 78.79 hours | Working on [5248, 34451205)
      Let's Encrypt 'Oak2024H2' log | 0.0 entries/s | Estimated done in 0.00 hours | Working on [14680, 14680)
      Trust Asia Log2023 | 215.8 entries/s | Estimated done in 0.48 hours | Working on [15872, 388349)
      Trust Asia Log2024-2 | 246.2 entries/s | Estimated done in 0.11 hours | Working on [17664, 112771)

How long this takes depends on the local system, the configuration of the
optional `collect` flags, and how heavily the logs apply rate limits.  For
reference, we [downloaded the logs](./docs/operations.md) from scratch in less
than 11 days using a single-IP machine that respects the logs' rate limits.

### Assemble

Once the collect phase is done, assemble the dataset:

    $ echo "for demo-purposes, only Nessie2025 and Oak2024H2 are shown below"^C
    $ ct-sans assemble -d $HOME/ct-sans-demo
    2023/03/23 13:05:12 cmd_assemble.go:54: INFO: merging and de-duplicating 2 input files with GNU sort
    2023/03/23 13:05:12 cmd_assemble.go:67: INFO: created /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans/sans.lst (0.3 MiB)
    2023/03/23 13:05:12 cmd_assemble.go:69: INFO: adding notice file
    2023/03/23 13:05:12 cmd_assemble.go:87: INFO: adding README
    2023/03/23 13:05:12 cmd_assemble.go:96: INFO: adding signed metadata file
    2023/03/23 13:05:12 cmd_assemble.go:108: INFO: adding signed tree heads
    2023/03/23 13:05:12 cmd_assemble.go:117: INFO: uncompressed dataset available in /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans
    $ cat $HOME/ct-sans-demo/archive/2023-03-23-ct-sans/README.md
    # ct-sans dataset

    Dataset assembled at Thu Mar 23 13:05:12 CET 2023.  Contents:

    - README.md
    - metadata.json
    - metadata.sig
    - sths.json
    - notice.txt
    - sans.lst

    The signed [metadata file][] and tree heads were downloaded at
    Thu Mar 23 12:43:49 CET 2023.

    [metadata file]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto

    In total, 15377 certificates were downloaded from 2 CT logs;
    0 certificates contained SANs that could not be parsed.
    For more information about these errors, see notice.txt.

    The SANs data set is sorted and de-duplicated, one SAN per line.

### Good to know

- It is safe to press Ctrl+C while collecting.  Just wait for the `collect`
  command to exit on its own so that things are persisted to disk.
- The different `ct-sans` commands must not run at the same time.
- The dataset can be updated by running the same `snapshot`, `collect` and
  `assemble` commands again.
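
The last two points can be combined in a small wrapper script. The following is a minimal sketch, not something shipped with `ct-sans`: the function name `update_dataset`, the `CT_SANS_BIN` variable and the sibling `.lock` directory are all hypothetical. It only relies on the `snapshot`, `collect` and `assemble` commands and the `-d` flag shown above.

```shell
#!/bin/sh
# Sketch only: refresh a ct-sans dataset, one phase at a time, while refusing
# to start if another run of this wrapper holds the lock.
#
# Assumptions (not part of ct-sans): the update_dataset name, the CT_SANS_BIN
# override, and the "<directory>.lock" directory used as a lock.

update_dataset() {
    dir=$1
    bin=${CT_SANS_BIN:-ct-sans}
    lock="${dir%/}.lock"

    # mkdir is atomic: it fails if the lock directory already exists.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "could not acquire $lock (is another ct-sans run active?)" >&2
        return 1
    fi

    status=0
    # Run the three phases strictly in order and stop at the first failure.
    for phase in snapshot collect assemble; do
        if ! "$bin" "$phase" -d "$dir"; then
            status=1
            break
        fi
    done

    rmdir "$lock"
    return "$status"
}
```

After sourcing the file, `update_dataset "$HOME/ct-sans-demo"` would run the three phases in sequence. The lock only guards against concurrent runs of this wrapper, not against someone invoking `ct-sans` by hand. If the shell is killed before the function returns, the lock directory may be left behind and would need to be removed manually.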
## Running a measurement

See how we collected the 2023-04-03-ct-sans dataset in
[docs/operations.md](./docs/operations.md).

## Contact

- IRC: room #certificate-transparency at [OFTC.net][]
- Matrix: room [#certificate-transparency][] (bridged with IRC)
- Email: rasmus (at) rgdd (dot) se

[OFTC.net]: https://www.oftc.net/
[#certificate-transparency]: https://app.element.io/#/room/#sauteed-onions:matrix.org/

## Licence

BSD 2-Clause License