# ct-sans

A tool that downloads certificates from [CT logs][] [recognized by Google
Chrome][], storing the encountered [Subject Alternative Names (SANs)][] to disk.
The dataset can be assembled so that it is de-duplicated with one SAN per line.

[CT logs]: https://certificate.transparency.dev/
[recognized by Google Chrome]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto/
[Subject Alternative Names (SANs)]: https://www.rfc-editor.org/rfc/rfc5280#section-4.2.1.6

**Warning:** research prototype.

## Quickstart

### Install

You will need a [Go compiler][] and [GNU sort][] on the local system:

    $ which go >/dev/null || echo "Go compiler not in PATH"
    $ which sort >/dev/null || echo "GNU sort not in PATH"

[Go compiler]: https://go.dev/doc/install
[GNU sort]: https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html

Install `ct-sans`:

    $ go install rgdd.se/ct-sans@latest
    $ which ct-sans >/dev/null || echo "ct-sans not in PATH"

List all options:

    $ ct-sans -h

### Snapshot

Download and verify the signature of Google's list of known logs,
then download and verify the signatures of the logs' tree heads:

    $ ct-sans snapshot -d $HOME/ct-sans-demo
    2023/03/23 12:43:49 cmd_snapshot.go:30: INFO: updating metadata file
    2023/03/23 12:43:49 cmd_snapshot.go:47: INFO: updating signed tree heads
    2023/03/23 12:43:49 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2023' log at tree size 862104911
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2024' log at tree size 55767940
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2023' log at tree size 990277299
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2024' log at tree size 66655425
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2023' Log at tree size 527018586
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2024' Log at tree size 34050592
    2023/03/23 12:43:51 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2024 Log at tree size 38426463
    2023/03/23 12:43:53 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2025 Log at tree size 697
    2023/03/23 12:43:54 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2023 Log at tree size 200387219
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2024 Log at tree size 40017666
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2025 Log at tree size 704
    2023/03/23 12:43:56 cmd_snapshot.go:82: INFO: bootstrapped Sectigo 'Sabre' CT log at tree size 229064032
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2023' log at tree size 467618545
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H1' log at tree size 34451205
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H2' log at tree size 14680
    2023/03/23 12:43:59 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2023 at tree size 388349
    2023/03/23 12:44:01 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2024-2 at tree size 112771

Subsequent uses of the `snapshot` command will update the signed list of known
logs, then update the logs' signed tree heads after verifying consistency.

### Collect

Download and verify the logs' Merkle trees up until the current snapshot:

    $ ct-sans collect -d $HOME/ct-sans-demo
    ...
    INFO: status update before shutdown
    
                Google 'Argon2023' log  |   162.5 entries/s  |  Estimated done in 1474.01 hours  |  Working on [11776, 862104911)
                Google 'Argon2024' log  |   157.5 entries/s  |  Estimated done in  98.31 hours  |  Working on [11584, 55767940)
                Google 'Xenon2023' log  |   472.6 entries/s  |  Estimated done in 582.01 hours  |  Working on [33888, 990277299)
                Google 'Xenon2024' log  |   458.5 entries/s  |  Estimated done in  40.37 hours  |  Working on [32896, 66655425)
           Cloudflare 'Nimbus2023' Log  |   276.1 entries/s  |  Estimated done in 530.24 hours  |  Working on [19328, 527018586)
           Cloudflare 'Nimbus2024' Log  |   301.2 entries/s  |  Estimated done in  31.39 hours  |  Working on [20736, 34050592)
                 DigiCert Yeti2024 Log  |   379.1 entries/s  |  Estimated done in  28.14 hours  |  Working on [27520, 38426463)
                 DigiCert Yeti2025 Log  |     0.0 entries/s  |  Estimated done in   0.00 hours  |  Working on [697, 697)
               DigiCert Nessie2023 Log  |   331.3 entries/s  |  Estimated done in 168.00 hours  |  Working on [23040, 200387219)
               DigiCert Nessie2024 Log  |   329.8 entries/s  |  Estimated done in  33.68 hours  |  Working on [21120, 40017666)
               DigiCert Nessie2025 Log  |     0.0 entries/s  |  Estimated done in   0.00 hours  |  Working on [704, 704)
                Sectigo 'Sabre' CT log  |   275.7 entries/s  |  Estimated done in 230.78 hours  |  Working on [19456, 229064032)
           Let's Encrypt 'Oak2023' log  |   462.8 entries/s  |  Estimated done in 280.67 hours  |  Working on [33664, 467618545)
         Let's Encrypt 'Oak2024H1' log  |   121.4 entries/s  |  Estimated done in  78.79 hours  |  Working on [5248, 34451205)
         Let's Encrypt 'Oak2024H2' log  |     0.0 entries/s  |  Estimated done in   0.00 hours  |  Working on [14680, 14680)
                    Trust Asia Log2023  |   215.8 entries/s  |  Estimated done in   0.48 hours  |  Working on [15872, 388349)
                  Trust Asia Log2024-2  |   246.2 entries/s  |  Estimated done in   0.11 hours  |  Working on [17664, 112771)

This will take a while.  How long depends on the local system, the optional
`collect` flags, and how heavily the logs rate-limit requests.  If you are not
bandwidth-constrained, consider increasing the worker count (`-w`).
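
The "Estimated done in" column can be sanity-checked by hand.  A minimal
sketch, assuming the estimate is simply the remaining entries divided by the
current rate (the helper `eta_hours` is hypothetical and not part of
`ct-sans`; the displayed rate is rounded, so results for large logs may differ
slightly from the status output):

```shell
# Hypothetical helper: rough ETA in hours for the range [start, end)
# at a given rate in entries per second.
eta_hours() {
    awk -v start="$1" -v end="$2" -v rate="$3" 'BEGIN {
        if (end <= start) { print "0.00"; exit }    # nothing left to fetch
        if (rate <= 0)    { print "unknown"; exit } # avoid division by zero
        printf "%.2f\n", (end - start) / rate / 3600
    }'
}

# Trust Asia Log2023 from the status output above.
eta_hours 15872 388349 215.8
```

For the Trust Asia Log2023 line this prints `0.48`, which matches the status
output.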

### Assemble

Once the collect phase is done, assemble the data set:

    $ echo "for demo-purposes, only Nessie2025 and Oak2024H2 are shown below"^C
    $ ct-sans assemble -d $HOME/ct-sans-demo
    2023/03/23 13:05:12 cmd_assemble.go:54: INFO: merging and de-duplicating 2 input files with GNU sort
    2023/03/23 13:05:12 cmd_assemble.go:67: INFO: created /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans/sans.lst (0.3 MiB)
    2023/03/23 13:05:12 cmd_assemble.go:69: INFO: adding notice file
    2023/03/23 13:05:12 cmd_assemble.go:87: INFO: adding README
    2023/03/23 13:05:12 cmd_assemble.go:96: INFO: adding signed metadata file
    2023/03/23 13:05:12 cmd_assemble.go:108: INFO: adding signed tree heads
    2023/03/23 13:05:12 cmd_assemble.go:117: INFO: uncompressed dataset available in /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans
    $ cat $HOME/ct-sans-demo/archive/2023-03-23-ct-sans/README.md
    # ct-sans dataset
    
    Dataset assembled at Thu Mar 23 13:05:12 CET 2023.  Contents:
    
      - README.md
      - metadata.json
      - metadata.sig
      - sths.json
      - notice.txt
      - sans.lst
    
    The signed [metadata file][] and tree heads were downloaded at
    Thu Mar 23 12:43:49 CET 2023.
    
    [metadata file]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto
    
    In total, 15377 certificates were downloaded from 2 CT logs;
    0 certificates contained SANs that could not be parsed.
    For more information about these errors, see notice.txt.
    
    The SANs data set is sorted and de-duplicated, one SAN per line.
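
The merge and de-duplication step reported in the log output above can be
reproduced by hand with GNU sort.  A minimal sketch, assuming two hypothetical
per-log input files `log1.lst` and `log2.lst`; the exact flags and file layout
that `ct-sans` uses are not shown here and may differ:

```shell
# Hypothetical per-log input files with one SAN per line.
tmp=$(mktemp -d)
printf 'b.example.org\na.example.org\n' > "$tmp/log1.lst"
printf 'c.example.org\nb.example.org\n' > "$tmp/log2.lst"

# -u keeps one copy of each line; LC_ALL=C sorts by byte value so the
# result does not depend on the local locale.
LC_ALL=C sort -u "$tmp/log1.lst" "$tmp/log2.lst" > "$tmp/sans.lst"
cat "$tmp/sans.lst"
```

The resulting file lists `a.example.org`, `b.example.org` and `c.example.org`
once each, in sorted order.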

### Good to know

  - It is safe to press Ctrl+C while collecting.  Just wait for the `collect`
    command to exit on its own so that progress is persisted to disk.
  - The different `ct-sans` commands must not run at the same time.
  - The dataset can be updated by running the same `snapshot`, `collect` and
    `assemble` commands again.
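
The points above can be combined into a small wrapper that runs the three
phases in order and refuses to start while another run is active.  This is
only a sketch: `run_pipeline`, the `CT_SANS` variable and the lock directory
path are hypothetical and not part of `ct-sans`.

```shell
# Hypothetical wrapper, not part of ct-sans.
CT_SANS="${CT_SANS:-ct-sans}"   # binary to run; can be overridden for testing

run_pipeline() {
    dir="$1"
    lock="${dir%/}.lock"        # assumed lock path, chosen for this sketch
    # mkdir is atomic: it fails if another run already holds the lock.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "another run seems to be in progress: $lock" >&2
        return 1
    fi
    status=0
    # Run the phases one after another; stop at the first failure.
    for phase in snapshot collect assemble; do
        "$CT_SANS" "$phase" -d "$dir" || { status=1; break; }
    done
    rmdir "$lock"
    return "$status"
}
```

For example, `run_pipeline "$HOME/ct-sans-demo"` would update the demo dataset
from the quickstart.  If a run is killed before it can remove the lock
directory, delete it by hand before starting again.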

## Running a measurement

See how we collected the 2023-04-03-ct-sans dataset in
[docs/operations.md](./docs/operations.md).

## Contact

  - IRC: room `#certificate-transparency` at [OFTC.net][]
  - Matrix: room `#certificate-transparency` at [matrix.org][]
  - Email: rasmus (at) rgdd (dot) se

[OFTC.net]: https://www.oftc.net/
[matrix.org]: https://matrix.to/#/#certificate-transparency:matrix.org

## Licence

BSD 2-Clause License