# ct-sans

A tool that downloads certificates from [CT logs][] [recognized by Google Chrome][], storing the encountered [Subject Alternative Names (SANs)][] to disk. The final data set `sans.lst` is de-duplicated and contains one SAN per line.

[CT logs]: https://certificate.transparency.dev/
[recognized by Google Chrome]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto/
[Subject Alternative Names (SANs)]: https://www.rfc-editor.org/rfc/rfc5280#section-4.2.1.6/

**Warning:** research prototype. The source code may also be moved.

## Quick start

You will need a Go compiler and GNU sort on the local system:

    $ which go || echo "Go compiler is not in $PATH"
    $ which sort || echo "GNU sort is not in $PATH"

Install `ct-sans`:

    $ go install git.cs.kau.se/rasmoste/ct-sans@latest
    $ which ct-sans || echo "ct-sans is not in $PATH"

Download and verify the signature of Google's list of known logs, then download and verify the signatures of the logs' tree heads:

    $ ct-sans snapshot -d $HOME/ct-sans-demo
    2023/03/23 12:43:49 cmd_snapshot.go:30: INFO: updating metadata file
    2023/03/23 12:43:49 cmd_snapshot.go:47: INFO: updating signed tree heads
    2023/03/23 12:43:49 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2023' log at tree size 862104911
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2024' log at tree size 55767940
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2023' log at tree size 990277299
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2024' log at tree size 66655425
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2023' Log at tree size 527018586
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2024' Log at tree size 34050592
    2023/03/23 12:43:51 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2024 Log at tree size 38426463
    2023/03/23 12:43:53 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2025 Log at tree size 697
    2023/03/23 12:43:54 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2023 Log at tree size 200387219
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2024 Log at tree size 40017666
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2025 Log at tree size 704
    2023/03/23 12:43:56 cmd_snapshot.go:82: INFO: bootstrapped Sectigo 'Sabre' CT log at tree size 229064032
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2023' log at tree size 467618545
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H1' log at tree size 34451205
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H2' log at tree size 14680
    2023/03/23 12:43:59 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2023 at tree size 388349
    2023/03/23 12:44:01 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2024-2 at tree size 112771

Subsequent uses of the `snapshot` command will update the signed list of known logs, then update the logs' signed tree heads after verifying consistency.

Download and verify the logs' Merkle trees up until the current snapshot:

    $ ct-sans collect -d $HOME/ct-sans-demo
    ...
    INFO: status update before shutdown
    Google 'Argon2023' log | 162.5 entries/s | Estimated done in 1474.01 hours | Working on [11776, 862104911)
    Google 'Argon2024' log | 157.5 entries/s | Estimated done in 98.31 hours | Working on [11584, 55767940)
    Google 'Xenon2023' log | 472.6 entries/s | Estimated done in 582.01 hours | Working on [33888, 990277299)
    Google 'Xenon2024' log | 458.5 entries/s | Estimated done in 40.37 hours | Working on [32896, 66655425)
    Cloudflare 'Nimbus2023' Log | 276.1 entries/s | Estimated done in 530.24 hours | Working on [19328, 527018586)
    Cloudflare 'Nimbus2024' Log | 301.2 entries/s | Estimated done in 31.39 hours | Working on [20736, 34050592)
    DigiCert Yeti2024 Log | 379.1 entries/s | Estimated done in 28.14 hours | Working on [27520, 38426463)
    DigiCert Yeti2025 Log | 0.0 entries/s | Estimated done in 0.00 hours | Working on [697, 697)
    DigiCert Nessie2023 Log | 331.3 entries/s | Estimated done in 168.00 hours | Working on [23040, 200387219)
    DigiCert Nessie2024 Log | 329.8 entries/s | Estimated done in 33.68 hours | Working on [21120, 40017666)
    DigiCert Nessie2025 Log | 0.0 entries/s | Estimated done in 0.00 hours | Working on [704, 704)
    Sectigo 'Sabre' CT log | 275.7 entries/s | Estimated done in 230.78 hours | Working on [19456, 229064032)
    Let's Encrypt 'Oak2023' log | 462.8 entries/s | Estimated done in 280.67 hours | Working on [33664, 467618545)
    Let's Encrypt 'Oak2024H1' log | 121.4 entries/s | Estimated done in 78.79 hours | Working on [5248, 34451205)
    Let's Encrypt 'Oak2024H2' log | 0.0 entries/s | Estimated done in 0.00 hours | Working on [14680, 14680)
    Trust Asia Log2023 | 215.8 entries/s | Estimated done in 0.48 hours | Working on [15872, 388349)
    Trust Asia Log2024-2 | 246.2 entries/s | Estimated done in 0.11 hours | Working on [17664, 112771)

This will take a while depending on the local system, configuration of the optional `collect` flags, as well as how heavily the logs apply rate-limits.
For good performance while respecting rate-limits, you may want to try `--workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m`. This allowed us to download the logs (March 2023) in approximately 10 days. Our machine was located in the EU with 2TiB SSD, 64GiB memory, 16 CPU cores, and 1Gbps line-speed.

Note that it is safe to press Ctrl+C while collecting. Just wait for the `collect` command to exit on its own so that things are persisted to disk.

Once the collect phase is done, assemble the data set:

    $ echo "for demo-purposes, only Nessie2025 and Oak2024H2 are shown below"^C
    $ ct-sans assemble -d $HOME/ct-sans-demo
    2023/03/23 13:05:12 cmd_assemble.go:54: INFO: merging and de-duplicating 2 input files with GNU sort
    2023/03/23 13:05:12 cmd_assemble.go:67: INFO: created /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans/sans.lst (0.3 MiB)
    2023/03/23 13:05:12 cmd_assemble.go:69: INFO: adding notice file
    2023/03/23 13:05:12 cmd_assemble.go:87: INFO: adding README
    2023/03/23 13:05:12 cmd_assemble.go:96: INFO: adding signed metadata file
    2023/03/23 13:05:12 cmd_assemble.go:108: INFO: adding signed tree heads
    2023/03/23 13:05:12 cmd_assemble.go:117: INFO: uncompressed dataset available in /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans
    $ cat $HOME/ct-sans-demo/archive/2023-03-23-ct-sans/README.md
    # ct-sans dataset

    Dataset assembled at Thu Mar 23 13:05:12 CET 2023. Contents:

    - README.md
    - metadata.json
    - metadata.sig
    - sths.json
    - notice.txt
    - sans.lst

    The signed [metadata file][] and tree heads were downloaded at
    Thu Mar 23 12:43:49 CET 2023.

    [metadata file]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto

    In total, 15377 certificates were downloaded from 2 CT logs;
    0 certificates contained SANs that could not be parsed.
    For more information about these errors, see notice.txt.

    The SANs data set is sorted and de-duplicated, one SAN per line.

**Note:** the different `ct-sans` commands must not run at the same time.
## Updating the data set

Simply run the same `snapshot`, `collect`, and `assemble` commands again.

## Contact

- IRC: room #certificate-transparency at [OFTC.net][]
- Matrix: room [#certificate-transparency][] (bridged with IRC)
- Email: rasmus (at) rgdd (dot) se

[OFTC.net]: https://www.oftc.net/
[#certificate-transparency]: https://app.element.io/#/room/#sauteed-onions:matrix.org/

## Licence

BSD 2-Clause License