# ct-sans

A tool that downloads certificates from [CT logs][] [recognized by Google
Chrome][], storing the encountered [Subject Alternative Names (SANs)][] to disk.
The final data set `sans.lst` is de-duplicated and contains one SAN per line.

[CT logs]: https://certificate.transparency.dev/
[recognized by Google Chrome]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto/
[Subject Alternative Names (SANs)]: https://www.rfc-editor.org/rfc/rfc5280#section-4.2.1.6

**Warning:** this is a research prototype, and the source code may move to a
different location.

## Quick start

You will need a Go compiler and GNU sort on the local system:

    $ which go || echo "Go compiler is not in $PATH"
    $ which sort || echo "GNU sort is not in $PATH"

Install `ct-sans`:

    $ go install git.cs.kau.se/rasmoste/ct-sans@latest
    $ which ct-sans || echo "ct-sans is not in $PATH"

Download and verify the signature of Google's list of known logs,
then download and verify the signatures of the logs' tree heads:

    $ ct-sans snapshot -d $HOME/ct-sans-demo
    2023/03/23 12:43:49 cmd_snapshot.go:30: INFO: updating metadata file
    2023/03/23 12:43:49 cmd_snapshot.go:47: INFO: updating signed tree heads
    2023/03/23 12:43:49 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2023' log at tree size 862104911
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Argon2024' log at tree size 55767940
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2023' log at tree size 990277299
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Google 'Xenon2024' log at tree size 66655425
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2023' Log at tree size 527018586
    2023/03/23 12:43:50 cmd_snapshot.go:82: INFO: bootstrapped Cloudflare 'Nimbus2024' Log at tree size 34050592
    2023/03/23 12:43:51 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2024 Log at tree size 38426463
    2023/03/23 12:43:53 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Yeti2025 Log at tree size 697
    2023/03/23 12:43:54 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2023 Log at tree size 200387219
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2024 Log at tree size 40017666
    2023/03/23 12:43:55 cmd_snapshot.go:82: INFO: bootstrapped DigiCert Nessie2025 Log at tree size 704
    2023/03/23 12:43:56 cmd_snapshot.go:82: INFO: bootstrapped Sectigo 'Sabre' CT log at tree size 229064032
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2023' log at tree size 467618545
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H1' log at tree size 34451205
    2023/03/23 12:43:57 cmd_snapshot.go:82: INFO: bootstrapped Let's Encrypt 'Oak2024H2' log at tree size 14680
    2023/03/23 12:43:59 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2023 at tree size 388349
    2023/03/23 12:44:01 cmd_snapshot.go:82: INFO: bootstrapped Trust Asia Log2024-2 at tree size 112771

Subsequent uses of the `snapshot` command will update the signed list of known
logs, then update the logs' signed tree heads after verifying consistency.

Download and verify the logs' Merkle trees up until the current snapshot:

    $ ct-sans collect -d $HOME/ct-sans-demo
    ...
    INFO: status update before shutdown
    
                Google 'Argon2023' log  |   162.5 entries/s  |  Estimated done in 1474.01 hours  |  Working on [11776, 862104911)
                Google 'Argon2024' log  |   157.5 entries/s  |  Estimated done in  98.31 hours  |  Working on [11584, 55767940)
                Google 'Xenon2023' log  |   472.6 entries/s  |  Estimated done in 582.01 hours  |  Working on [33888, 990277299)
                Google 'Xenon2024' log  |   458.5 entries/s  |  Estimated done in  40.37 hours  |  Working on [32896, 66655425)
           Cloudflare 'Nimbus2023' Log  |   276.1 entries/s  |  Estimated done in 530.24 hours  |  Working on [19328, 527018586)
           Cloudflare 'Nimbus2024' Log  |   301.2 entries/s  |  Estimated done in  31.39 hours  |  Working on [20736, 34050592)
                 DigiCert Yeti2024 Log  |   379.1 entries/s  |  Estimated done in  28.14 hours  |  Working on [27520, 38426463)
                 DigiCert Yeti2025 Log  |     0.0 entries/s  |  Estimated done in   0.00 hours  |  Working on [697, 697)
               DigiCert Nessie2023 Log  |   331.3 entries/s  |  Estimated done in 168.00 hours  |  Working on [23040, 200387219)
               DigiCert Nessie2024 Log  |   329.8 entries/s  |  Estimated done in  33.68 hours  |  Working on [21120, 40017666)
               DigiCert Nessie2025 Log  |     0.0 entries/s  |  Estimated done in   0.00 hours  |  Working on [704, 704)
                Sectigo 'Sabre' CT log  |   275.7 entries/s  |  Estimated done in 230.78 hours  |  Working on [19456, 229064032)
           Let's Encrypt 'Oak2023' log  |   462.8 entries/s  |  Estimated done in 280.67 hours  |  Working on [33664, 467618545)
         Let's Encrypt 'Oak2024H1' log  |   121.4 entries/s  |  Estimated done in  78.79 hours  |  Working on [5248, 34451205)
         Let's Encrypt 'Oak2024H2' log  |     0.0 entries/s  |  Estimated done in   0.00 hours  |  Working on [14680, 14680)
                    Trust Asia Log2023  |   215.8 entries/s  |  Estimated done in   0.48 hours  |  Working on [15872, 388349)
                  Trust Asia Log2024-2  |   246.2 entries/s  |  Estimated done in   0.11 hours  |  Working on [17664, 112771)

How long this takes depends on the local system, the optional `collect` flags,
and how heavily the logs apply rate limits.  For good performance while
respecting rate limits, you may want to try
`--workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m`.  With these
flags, we downloaded the logs (March 2023) in approximately 10 days.  Our
machine was located in the EU and had a 2 TiB SSD, 64 GiB of memory, 16 CPU
cores, and a 1 Gbps connection.

It is safe to interrupt `collect` with Ctrl+C.  Just wait for the command to
exit on its own so that progress is persisted to disk.

Once the collect phase is done, assemble the data set:

    $ echo "for demo-purposes, only Nessie2025 and Oak2024H2 are shown below"^C
    $ ct-sans assemble -d $HOME/ct-sans-demo
    2023/03/23 13:05:12 cmd_assemble.go:54: INFO: merging and de-duplicating 2 input files with GNU sort
    2023/03/23 13:05:12 cmd_assemble.go:67: INFO: created /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans/sans.lst (0.3 MiB)
    2023/03/23 13:05:12 cmd_assemble.go:69: INFO: adding notice file
    2023/03/23 13:05:12 cmd_assemble.go:87: INFO: adding README
    2023/03/23 13:05:12 cmd_assemble.go:96: INFO: adding signed metadata file
    2023/03/23 13:05:12 cmd_assemble.go:108: INFO: adding signed tree heads
    2023/03/23 13:05:12 cmd_assemble.go:117: INFO: uncompressed dataset available in /home/rgdd/ct-sans-demo/archive/2023-03-23-ct-sans
    $ cat $HOME/ct-sans-demo/archive/2023-03-23-ct-sans/README.md
    # ct-sans dataset
    
    Dataset assembled at Thu Mar 23 13:05:12 CET 2023.  Contents:
    
      - README.md
      - metadata.json
      - metadata.sig
      - sths.json
      - notice.txt
      - sans.lst
    
    The signed [metadata file][] and tree heads were downloaded at
    Thu Mar 23 12:43:49 CET 2023.
    
    [metadata file]: https://groups.google.com/a/chromium.org/g/ct-policy/c/IdbrdAcDQto
    
    In total, 15377 certificates were downloaded from 2 CT logs;
    0 certificates contained SANs that could not be parsed.
    For more information about these errors, see notice.txt.
    
    The SANs data set is sorted and de-duplicated, one SAN per line.

**Note:** the different `ct-sans` commands must not run at the same time.

## Updating the data set

Simply run the same `snapshot`, `collect`, and `assemble` commands again.

## Contact

  - IRC: room #certificate-transparency at [OFTC.net][]
  - Matrix: room [#certificate-transparency][] (bridged with IRC)
  - Email: rasmus (at) rgdd (dot) se

[OFTC.net]: https://www.oftc.net/
[#certificate-transparency]: https://app.element.io/#/room/#sauteed-onions:matrix.org/

## Licence

BSD 2-Clause License