Operations
This document describes our ct-sans data collection, including information about the local system and a timeline leading up to assembling the 2023-04-03 dataset.
Summary
The initial download of the current CT logs took 11 days (March 2023). Assembling the final dataset of 0.91B unique SANs (25.2GiB) took just under 7 hours.
The assembled data set is available here:
- https://dart.cse.kau.se/ct-sans/2023-04-03-ct-sans.zip
Local system
We're running Ubuntu in a VM:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
Our VM is configured with 62.9GiB RAM, one CPU core with 32 CPU threads, and a ~2TiB SSD:
$ grep MemTotal /proc/meminfo
MemTotal: 65948412 kB
$ grep -c processor /proc/cpuinfo
32
$ grep 'cpu cores' /proc/cpuinfo | uniq
cpu cores : 1
$ df -BG /home
Filesystem 1G-blocks Used Available Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 2077G 220G 1772G 12% /
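The 62.9GiB figure follows from the MemTotal value, which /proc/meminfo reports in kB (units of 1024 bytes). A quick check, in the same spirit as the python3 one-liners used in the notes:

```python
# MemTotal as reported by /proc/meminfo, in kB (1 kB = 1024 bytes here).
mem_total_kb = 65948412

# Convert kB -> bytes -> GiB.
mem_total_gib = mem_total_kb * 1024 / 1024**3

print(f"{mem_total_gib:.1f}GiB")
```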
This VM shares a 1x10Gbps link with other VMs on the same network that we have no control over. We installed vnstat to track our own bandwidth usage over time:
# apt install vnstat
# systemctl enable vnstat.service
# systemctl start vnstat.service
We also installed Go version 1.20, see install instructions:
$ go version
go version go1.20.2 linux/amd64
The versions of git.cs.kau.se/rasmoste/ct-sans@VERSION are listed below.
Timeline
date | time (UTC) | event | notes |
---|---|---|---|
2023/03/18 | 20:05:30 | snapshot and start collect | running v0.0.1, see command notes [1] |
2023/03/27 | 14:53:59 | stop collect, bump version | install v0.0.2, see migrate notes [2] |
2023/03/27 | 15:03:12 | start collect again | mainly waiting for Argon2023 now [3] |
2023/03/29 | 10:22:24 | collect completed | |
2023/03/29 | 15:46:44 | snapshot and collect again | download backlog from last 10 days |
2023/03/30 | 05:52:38 | collect completed | |
2023/03/30 | 08:58:50 | snapshot and collect again | download backlog from last ~16 hours |
2023/03/30 | 09:53:34 | collect completed | bandwidth usage statistics [4] |
2023/03/30 | 10:05:40 | start assemble | still running v0.0.2 [5] |
2023/03/30 | 16:06:39 | assemble done | 0.9B sans (25GiB, 7GiB zipped in 15m) |
2023/04/02 | 23:31:37 | snapshot and collect again | download backlog, again |
2023/04/03 | 03:54:18 | collect completed | |
2023/04/03 | 08:52:28 | snapshot and collect again | final before assembling for real use |
2023/04/03 | 09:22:22 | collect completed | |
2023/04/03 | 09:30:00 | start assemble | [5] |
2023/04/03 | 16:12:38 | assemble done | 0.91B SANs (25.2GiB) from 3.74B certs |
2024/02/10 | 09:10:20 | snapshot and start collect | still running v0.0.2 [6] |
2024/02/12 | 03:54:13 | abort collection | not needed for our paper contribs [7] |
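The durations quoted in the summary can be recomputed from the timestamps in the table. The sketch below only uses rows from the timeline; the helper name elapsed is ours. Note that the initial collect span includes the roughly nine-minute stop for the v0.0.2 upgrade.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"  # date and time columns of the timeline, joined by a space


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two timeline rows."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


initial_collect = elapsed("2023/03/18 20:05:30", "2023/03/29 10:22:24")
first_assemble = elapsed("2023/03/30 10:05:40", "2023/03/30 16:06:39")
final_assemble = elapsed("2023/04/03 09:30:00", "2023/04/03 16:12:38")

print(f"initial collect: {initial_collect.total_seconds() / 86400:.1f} days")
print(f"first assemble:  {first_assemble}")
print(f"final assemble:  {final_assemble}")
```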
Notes
1
$ ct-sans snapshot >snapshot.stdout
$ ct-sans collect --workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m >collect.stdout 2>collect.stderr
2
In addition to adding the assemble command, v0.0.2 stores notice.txt files in each log's directory automatically. This ensures that the output in stdout can be discarded as opposed to being stored and managed manually in the long run (e.g., grep for NOTICE prints when assembling data sets).
Commit ad9fb49670e28414637761bac4b8e8940e2d6770 includes a Go program that transforms an existing collect.stderr file to notice.txt files.
Steps to migrate:
- [x] Stop (ctrl+c, wait)
- [x] Move collect.{stdout,stderr} to data/notes/
- [x] `grep NOTICE data/notes/collect.stdout | wc -l` gives 6919 lines
- [x] Run the program in the above commit with the appropriate directory and noticeFile paths. See output below.
- [x] `wc -l $(find . -name notice.txt)` -> total says 6919 lines
- [x] go install git.cs.kau.se/rasmoste/ct-sans@latest, downloaded v0.0.2
- [x] run the same collect command as in note (1); this will not overwrite the previous collect files because they have been moved to data/notes/. In the future we will not need to store any of this, but doing it now just in case something goes wrong.
- [x] The only two logs that had entries left to download resumed
Output from migrate program and sanity check:
$ go run .
2023/03/27 14:57:41 Google 'Argon2023' log: 608 notices
2023/03/27 14:57:41 Google 'Argon2024' log: 101 notices
2023/03/27 14:57:41 Google 'Xenon2023' log: 2119 notices
2023/03/27 14:57:41 Google 'Xenon2024' log: 170 notices
2023/03/27 14:57:41 Cloudflare 'Nimbus2023' Log: 2194 notices
2023/03/27 14:57:41 Cloudflare 'Nimbus2024' Log: 164 notices
2023/03/27 14:57:41 DigiCert Yeti2024 Log: 17 notices
2023/03/27 14:57:41 DigiCert Yeti2025 Log: no notices
2023/03/27 14:57:41 DigiCert Nessie2023 Log: 155 notices
2023/03/27 14:57:41 DigiCert Nessie2024 Log: 19 notices
2023/03/27 14:57:41 DigiCert Nessie2025 Log: no notices
2023/03/27 14:57:41 Sectigo 'Sabre' CT log: 1140 notices
2023/03/27 14:57:41 Let's Encrypt 'Oak2023' log: 156 notices
2023/03/27 14:57:41 Let's Encrypt 'Oak2024H1' log: 14 notices
2023/03/27 14:57:41 Let's Encrypt 'Oak2024H2' log: no notices
2023/03/27 14:57:41 Trust Asia Log2023: 62 notices
2023/03/27 14:57:41 Trust Asia Log2024-2: no notices
$ wc -l $(find . -name notice.txt)
101 ./data/logs/eecdd064d5db1acec55cb79db4cd13a23287467cbcecdec351485946711fb59b/notice.txt
14 ./data/logs/3b5377753e2db9804e8b305b06fe403b67d84fc3f4c7bd000d2d726fe1fad417/notice.txt
155 ./data/logs/b3737707e18450f86386d605a9dc11094a792db1670c0b87dcf0030e7936a59a/notice.txt
62 ./data/logs/e87ea7660bc26cf6002ef5725d3fe0e331b9393bb92fbf58eb3b9049daf5435a/notice.txt
164 ./data/logs/dab6bf6b3fb5b6229f9bc2bb5c6be87091716cbb51848534bda43d3048d7fbab/notice.txt
608 ./data/logs/e83ed0da3ef5063532e75728bc896bc903d3cbd1116beceb69e1777d6d06bd6e/notice.txt
156 ./data/logs/b73efb24df9c4dba75f239c5ba58f46c5dfc42cf7a9f35c49e1d098125edb499/notice.txt
1140 ./data/logs/5581d4c2169036014aea0b9b573c53f0c0e43878702508172fa3aa1d0713d30c/notice.txt
19 ./data/logs/73d99e891b4c9678a0207d479de6b2c61cd0515e71192a8c6b80107ac17772b5/notice.txt
170 ./data/logs/76ff883f0ab6fb9551c261ccf587ba34b4a4cdbb29dc68420a9fe6674c5a3a74/notice.txt
2119 ./data/logs/adf7befa7cff10c88b9d3d9c1e3e186ab467295dcfb10c24ca858634ebdc828a/notice.txt
2194 ./data/logs/7a328c54d8b72db620ea38e0521ee98416703213854d3bd22bc13a57a352eb52/notice.txt
17 ./data/logs/48b0e36bdaa647340fe56a02fa9d30eb1c5201cb56dd2c81d9bbbfab39d88473/notice.txt
6919 total
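The per-log counts printed by the migrate program should add up to the same 6919 total. A minimal sketch of that cross-check, with the log lines copied from the output above (timestamps dropped) and a helper name, count_notices, that is ours rather than part of ct-sans:

```python
import re

# Per-log lines from the migrate program output, without the timestamps.
MIGRATE_OUTPUT = """\
Google 'Argon2023' log: 608 notices
Google 'Argon2024' log: 101 notices
Google 'Xenon2023' log: 2119 notices
Google 'Xenon2024' log: 170 notices
Cloudflare 'Nimbus2023' Log: 2194 notices
Cloudflare 'Nimbus2024' Log: 164 notices
DigiCert Yeti2024 Log: 17 notices
DigiCert Yeti2025 Log: no notices
DigiCert Nessie2023 Log: 155 notices
DigiCert Nessie2024 Log: 19 notices
DigiCert Nessie2025 Log: no notices
Sectigo 'Sabre' CT log: 1140 notices
Let's Encrypt 'Oak2023' log: 156 notices
Let's Encrypt 'Oak2024H1' log: 14 notices
Let's Encrypt 'Oak2024H2' log: no notices
Trust Asia Log2023: 62 notices
Trust Asia Log2024-2: no notices
"""


def count_notices(output: str) -> int:
    """Sum the notice counts; 'no notices' counts as zero."""
    total = 0
    for line in output.splitlines():
        match = re.search(r": (\d+|no) notices$", line)
        if match is None:
            continue  # skip anything that is not a per-log summary line
        total += 0 if match.group(1) == "no" else int(match.group(1))
    return total


print(count_notices(MIGRATE_OUTPUT))
```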
3
For some reason Nimbus2023 is stuck at
{"tree_size":512926523,"RootHash":[41,19,83,107,69,253,233,106,68,143,173,151,177,196,60,228,22,57,246,105,184,51,24,50,230,153,233,189,214,93,132,186]}
while trying to fetch until
{"sth_version":0,"tree_size":513025681,"timestamp":1679169572616,"sha256_root_hash":"0SzzS0M2RP5BHC6M9bvOPySYJadPi9nnk2Dsav4NKKs=","tree_head_signature":"BAMARjBEAiBXrmT+W2Ct+32DX/XL+YwS9Ut4rnOG6Y+A4Lxbf/6TogIgYEM32vweDC0QStwMq1PzIvm97cQhj6bUSdZWq/wMkNw=","log_id":"ejKMVNi3LbYg6jjgUh7phBZwMhOFTTvSK8E6V6NS61I="}
These tree heads are not inconsistent, and a restart should resolve the problem.
(There is likely a corner case somewhere that made the fetcher exit or halt. We should debug this further at some point, but it has not happened more than once.)
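To put the two tree heads in perspective, the sketch below computes how many entries were left to download and when the target tree head was signed. It assumes the timestamp field is milliseconds since the Unix epoch, as specified for signed tree heads in RFC 6962; the variable names are ours.

```python
from datetime import datetime, timezone

stuck_tree_size = 512926523       # tree size the fetcher was stuck at
target_tree_size = 513025681      # tree size of the signed tree head to reach
sth_timestamp_ms = 1679169572616  # "timestamp" field of that tree head

# Entries that still had to be downloaded before reaching the target.
remaining = target_tree_size - stuck_tree_size

# Milliseconds since the Unix epoch, interpreted as UTC.
sth_time = datetime.fromtimestamp(sth_timestamp_ms / 1000, tz=timezone.utc)

print(f"{remaining} entries left to download")
print(f"tree head signed at {sth_time:%Y-%m-%d %H:%M:%S} UTC")
```

The signing time falls a few minutes before the first snapshot in the timeline, which is consistent with this being the tree head recorded by that snapshot.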
4
Quick overview:
$ vnstat -d
ens160 / daily
day rx | tx | total | avg. rate
------------------------+-------------+-------------+---------------
2023-03-18 1.49 TiB | 17.07 GiB | 1.51 TiB | 153.44 Mbit/s
2023-03-19 3.77 TiB | 41.21 GiB | 3.81 TiB | 387.83 Mbit/s
2023-03-20 3.09 TiB | 36.67 GiB | 3.13 TiB | 318.26 Mbit/s
2023-03-21 3.11 TiB | 32.24 GiB | 3.14 TiB | 319.61 Mbit/s
2023-03-22 2.08 TiB | 25.98 GiB | 2.10 TiB | 213.89 Mbit/s
2023-03-23 1.16 TiB | 15.59 GiB | 1.18 TiB | 119.97 Mbit/s
2023-03-24 1.17 TiB | 15.44 GiB | 1.18 TiB | 120.44 Mbit/s
2023-03-25 1.18 TiB | 15.72 GiB | 1.19 TiB | 121.55 Mbit/s
2023-03-26 707.47 GiB | 9.64 GiB | 717.11 GiB | 71.30 Mbit/s
2023-03-27 448.80 GiB | 6.43 GiB | 455.23 GiB | 45.26 Mbit/s
2023-03-28 451.49 GiB | 6.49 GiB | 457.98 GiB | 45.53 Mbit/s
2023-03-29 1.01 TiB | 12.73 GiB | 1.03 TiB | 104.45 Mbit/s
2023-03-30 256.75 GiB | 3.40 GiB | 260.15 GiB | 59.59 Mbit/s
------------------------+-------------+-------------+---------------
estimated 591.55 GiB | 7.84 GiB | 599.39 GiB |
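The average rate column can be related to the daily totals with a short calculation. The sketch below assumes that vnstat's TiB is a binary unit (1024**4 bytes), that Mbit/s is decimal, and that a full day is 86400 seconds; the function name is ours. Because the table shows rounded totals, the result only approximates the reported rate.

```python
TIB = 1024**4  # assumption: vnstat reports binary units


def avg_rate_mbit(total_tib: float, seconds: int = 86400) -> float:
    """Average rate in Mbit/s for a given total transferred over a period."""
    return total_tib * TIB * 8 / seconds / 1e6


# 2023-03-19: 3.81 TiB in total, reported as 387.83 Mbit/s.
print(f"{avg_rate_mbit(3.81):.2f} Mbit/s")
```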
5
Use at most 58GiB RAM for sorting and 8 parallel sort workers. More workers than this do not improve performance according to the GNU sort manual. We're also setting the LC_ALL=C variable to ensure consistent sort order (see man).
$ export LC_ALL=C
$ ct-sans assemble -b 58 -p 8 >assemble.stdout
(We don't need to change the default directories, because the collected data is stored in ./data and /tmp is a fine place to put things on our system.)
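With LC_ALL=C, sort compares raw byte values instead of applying locale-specific collation, so the result is the same on every system. The sketch below mimics that ordering, plus deduplication, on a few made-up SANs. It illustrates the ordering only and is not how ct-sans assembles the data set.

```python
# Hypothetical input; the duplicate is there to show deduplication.
sans = ["b.example.org", "A.example.org", "a.example.org", "b.example.org"]

# Comparing the encoded bytes orders strings by byte value, as the C locale
# does: uppercase ASCII sorts before lowercase ASCII.
byte_sorted_unique = sorted({s.encode("utf-8") for s in sans})

for san in byte_sorted_unique:
    print(san.decode("utf-8"))
```

A locale such as en_US.UTF-8 would typically order a.example.org before A.example.org, which is why pinning the locale matters when sorted files are later compared or merged.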
6
There are 0.91B unique SANs in the 25.2GiB dataset (6.1GiB compressed):
$ du -shb data/archive/2023-04-03-ct-sans
27050799992 data/archive/2023-04-03-ct-sans
$ python3 -c "print(f'{27050799992 / 1024**3:.1f}GiB')"
25.2GiB
$ du -shb data/archive/2023-04-03-ct-sans.zip
6526876407 data/archive/2023-04-03-ct-sans.zip
$ python3 -c "print(f'{6526876407 / 1024**3:.1f}GiB')"
6.1GiB
$ wc -l data/archive/2023-04-03-ct-sans/sans.lst
907332515 data/archive/2023-04-03-ct-sans/sans.lst
$ python3 -c "print(f'{907332515 / 1000**3:.2f}B')"
0.91B
These SANs were found in 3.74B certificates from 17 CT logs:
$ grep "In total," data/archive/2023-04-03-ct-sans/README.md
In total, 3743244652 certificates were downloaded from 17 CT logs;
$ python3 -c "print(f'{3743244652 / 1000**3:.2f}B')"
3.74B
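A few ratios follow directly from the numbers above. This is a rough sketch: the directory size includes files other than sans.lst (such as README.md), so the bytes-per-SAN figure is only an approximation.

```python
dataset_bytes = 27050799992  # du -shb of the uncompressed directory
zip_bytes = 6526876407       # du -shb of the zip archive
unique_sans = 907332515      # lines in sans.lst
certificates = 3743244652    # certificates downloaded from the 17 CT logs

print(f"compressed size: {zip_bytes / dataset_bytes:.1%} of the original")
print(f"bytes per SAN:   {dataset_bytes / unique_sans:.1f}")
print(f"SANs per cert:   {unique_sans / certificates:.2f}")
```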
6
$ ct-sans snapshot >snapshot.stdout
$ ct-sans collect --workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m >collect.stdout 2>collect.stderr
7
We decided to abort another round of ct-sans (and following onion-grab) measurements because it is not strictly needed to achieve our goals. If we want to make more measurements for the sake of making the ct-sans data set available, we should automate it rather than doing it manually as in this timeline.