Diffstat (limited to 'docs')
-rw-r--r-- docs/operations.md 218
1 files changed, 218 insertions, 0 deletions
diff --git a/docs/operations.md b/docs/operations.md
new file mode 100644
index 0000000..458ec13
--- /dev/null
+++ b/docs/operations.md
@@ -0,0 +1,218 @@
+# Operations
+
+This document describes our ct-sans data collection, including information about
+the local system and a timeline leading up to assembling the 2023-04-03 dataset.
+
+## Summary
+
+The initial download time for the current CT logs was 11 days (March 2023). The
+time to assemble the final dataset of 0.91B unique SANs (25.2GiB) was ~6.7 hours.
+
+The assembled data set can be downloaded [here](TODO).
+
+## Local system
+
+We're running Ubuntu in a VM:
+
+ $ lsb_release -a
+ No LSB modules are available.
+ Distributor ID: Ubuntu
+ Description: Ubuntu 22.04.2 LTS
+ Release: 22.04
+ Codename: jammy
+
+Our VM is configured with 62.9GiB RAM, one CPU core with 32 CPU threads, and a
+~2TiB SSD:
+
+ $ grep MemTotal /proc/meminfo
+ MemTotal: 65948412 kB
+ $ grep -c processor /proc/cpuinfo
+ 32
+ $ grep 'cpu cores' /proc/cpuinfo | uniq
+ cpu cores : 1
+ $ df -BG /home
+ Filesystem 1G-blocks Used Available Use% Mounted on
+ /dev/mapper/ubuntu--vg-ubuntu--lv 2077G 220G 1772G 12% /
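
The 62.9GiB figure can be reproduced from the `MemTotal` line, since `/proc/meminfo` uses "kB" to mean units of 1024 bytes. A minimal sketch; the helper name `memtotal_gib` is only illustrative and not part of ct-sans:

```python
def memtotal_gib(meminfo_line: str) -> float:
    """Convert a 'MemTotal: <n> kB' line from /proc/meminfo to GiB."""
    label, value, unit = meminfo_line.split()
    if label != "MemTotal:" or unit != "kB":
        raise ValueError(f"unexpected meminfo line: {meminfo_line!r}")
    # /proc/meminfo reports kB as 1024-byte units, so GiB = kB / 1024**2.
    return int(value) / 1024**2


print(f"{memtotal_gib('MemTotal: 65948412 kB'):.1f}GiB")  # 62.9GiB
```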
+
+This VM shares a 1x10Gbps link with other network VMs that we have no control
+over. We installed `vnstat` to track our own bandwidth usage over time:
+
+ # apt install vnstat
+ # systemctl enable vnstat.service
+ # systemctl start vnstat.service
+
+We also installed Go version 1.20, see [install instructions][]:
+
+ $ go version
+ go version go1.20.2 linux/amd64
+
+[install instructions]: https://go.dev/doc/install
+
+The versions of `git.cs.kau.se/rasmoste/ct-sans@VERSION` that we ran are listed
+in the timeline below.
+
+## Timeline
+
+| date | time (UTC) | event | notes |
+| ---------- | ---------- | --------------------------- | ------------------------------------- |
+| 2023/03/18 | 20:05:30 | snapshot and start collect | running v0.0.1, see command notes [1] |
+| 2023/03/27 | 14:53:59 | stop collect, bump version | install v0.0.2, see migrate notes [2] |
+| 2023/03/27 | 15:03:12 | start collect again | mainly waiting for Argon2023 now [3] |
+| 2023/03/29 | 10:22:24 | collect completed | |
+| 2023/03/29 | 15:46:44 | snapshot and collect again | download backlog from last 10 days |
+| 2023/03/30 | 05:52:38 | collect completed | |
+| 2023/03/30 | 08:58:50 | snapshot and collect again | download backlog from last ~16 hours |
+| 2023/03/30 | 09:53:34 | collect completed | bandwidth usage statistics [4] |
+| 2023/03/30 | 10:05:40 | start assemble | still running v0.0.2 [5] |
+| 2023/03/30 | 16:06:39 | assemble done | 0.9B sans (25GiB, 7GiB zipped in 15m) |
+| 2023/04/02 | 23:31:37 | snapshot and collect again | download backlog, again |
+| 2023/04/03 | 03:54:18 | collect completed | |
+| 2023/04/03 | 08:52:28 | snapshot and collect again | final before assembling for real use |
+| 2023/04/03 | 09:22:22 | collect completed | |
+| 2023/04/03 | 09:30:00 | start assemble | [5] |
+| 2023/04/03 | 16:12:38 | assemble done | 0.91B SANs (25.2GiB) from 3.74B certs |
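
The durations quoted in the summary can be derived from the timeline rows. The sketch below only subtracts timestamps copied from the table; the helper `elapsed` is illustrative, and the initial collect span includes the short stop for the v0.0.2 migration:

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"


def elapsed(start: str, end: str):
    """Return the time between two 'date time (UTC)' cells of the timeline."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# First snapshot and collect, until the first "collect completed" row.
initial_collect = elapsed("2023/03/18 20:05:30", "2023/03/29 10:22:24")
# Final assemble that produced the 2023-04-03 dataset.
final_assemble = elapsed("2023/04/03 09:30:00", "2023/04/03 16:12:38")

print(initial_collect)  # 10 days, 14:16:54
print(final_assemble)   # 6:42:38
```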
+
+## Notes
+
+### 1
+
+ $ ct-sans snapshot >snapshot.stdout
+ $ ct-sans collect --workers 40 --batch-disk 131072 --batch-req 2048 --metrics 60m >collect.stdout 2>collect.stderr
+
+### 2
+
+In addition to adding the assemble command, `v0.0.2` stores notice.txt files in
+each log's directory automatically. This ensures that the output in stdout can
+be discarded as opposed to being stored and managed manually in the long run
+(e.g., grep for NOTICE prints when assembling data sets).
+
+Commit `ad9fb49670e28414637761bac4b8e8940e2d6770` includes a Go program that
+transforms an existing `collect.stderr` file to `notice.txt` files.
+
+Steps to migrate:
+
+ - [x] Stop (ctrl+c, wait)
+ - [x] Move collect.{stdout,stderr} to data/notes/
+ - [x] `grep NOTICE data/notes/collect.stdout | wc -l` gives 6919 lines
+ - [x] run the program in the above commit with the appropriate `directory` and
+ `noticeFile` paths. See output below.
+ - [x] `wc -l $(find . -name notice.txt)` -> total says 6919 lines
+ - [x] `go install git.cs.kau.se/rasmoste/ct-sans@latest`, downloaded v0.0.2
+ - [x] run the same collect command as in note (1); this will not overwrite the
+ previous collect files because they have been moved to data/notes/. In the
+ future we will not need to store any of this, but doing it now just in case
+ something goes wrong.
+ - [x] The only two logs that had entries left to download resumed
+
+Output from migrate program and sanity check:
+
+ $ go run .
+ 2023/03/27 14:57:41 Google 'Argon2023' log: 608 notices
+ 2023/03/27 14:57:41 Google 'Argon2024' log: 101 notices
+ 2023/03/27 14:57:41 Google 'Xenon2023' log: 2119 notices
+ 2023/03/27 14:57:41 Google 'Xenon2024' log: 170 notices
+ 2023/03/27 14:57:41 Cloudflare 'Nimbus2023' Log: 2194 notices
+ 2023/03/27 14:57:41 Cloudflare 'Nimbus2024' Log: 164 notices
+ 2023/03/27 14:57:41 DigiCert Yeti2024 Log: 17 notices
+ 2023/03/27 14:57:41 DigiCert Yeti2025 Log: no notices
+ 2023/03/27 14:57:41 DigiCert Nessie2023 Log: 155 notices
+ 2023/03/27 14:57:41 DigiCert Nessie2024 Log: 19 notices
+ 2023/03/27 14:57:41 DigiCert Nessie2025 Log: no notices
+ 2023/03/27 14:57:41 Sectigo 'Sabre' CT log: 1140 notices
+ 2023/03/27 14:57:41 Let's Encrypt 'Oak2023' log: 156 notices
+ 2023/03/27 14:57:41 Let's Encrypt 'Oak2024H1' log: 14 notices
+ 2023/03/27 14:57:41 Let's Encrypt 'Oak2024H2' log: no notices
+ 2023/03/27 14:57:41 Trust Asia Log2023: 62 notices
+ 2023/03/27 14:57:41 Trust Asia Log2024-2: no notices
+ $ wc -l $(find . -name notice.txt)
+ 101 ./data/logs/eecdd064d5db1acec55cb79db4cd13a23287467cbcecdec351485946711fb59b/notice.txt
+ 14 ./data/logs/3b5377753e2db9804e8b305b06fe403b67d84fc3f4c7bd000d2d726fe1fad417/notice.txt
+ 155 ./data/logs/b3737707e18450f86386d605a9dc11094a792db1670c0b87dcf0030e7936a59a/notice.txt
+ 62 ./data/logs/e87ea7660bc26cf6002ef5725d3fe0e331b9393bb92fbf58eb3b9049daf5435a/notice.txt
+ 164 ./data/logs/dab6bf6b3fb5b6229f9bc2bb5c6be87091716cbb51848534bda43d3048d7fbab/notice.txt
+ 608 ./data/logs/e83ed0da3ef5063532e75728bc896bc903d3cbd1116beceb69e1777d6d06bd6e/notice.txt
+ 156 ./data/logs/b73efb24df9c4dba75f239c5ba58f46c5dfc42cf7a9f35c49e1d098125edb499/notice.txt
+ 1140 ./data/logs/5581d4c2169036014aea0b9b573c53f0c0e43878702508172fa3aa1d0713d30c/notice.txt
+ 19 ./data/logs/73d99e891b4c9678a0207d479de6b2c61cd0515e71192a8c6b80107ac17772b5/notice.txt
+ 170 ./data/logs/76ff883f0ab6fb9551c261ccf587ba34b4a4cdbb29dc68420a9fe6674c5a3a74/notice.txt
+ 2119 ./data/logs/adf7befa7cff10c88b9d3d9c1e3e186ab467295dcfb10c24ca858634ebdc828a/notice.txt
+ 2194 ./data/logs/7a328c54d8b72db620ea38e0521ee98416703213854d3bd22bc13a57a352eb52/notice.txt
+ 17 ./data/logs/48b0e36bdaa647340fe56a02fa9d30eb1c5201cb56dd2c81d9bbbfab39d88473/notice.txt
+ 6919 total
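
The sanity check amounts to confirming that the per-log notice counts add up to the 6919 NOTICE lines found by `grep`. A small sketch with the counts copied from the migrate output above (the dictionary is just a transcription, not something the tool produces):

```python
# Notice counts per log, transcribed from the migrate program's output.
notices = {
    "Google 'Argon2023' log": 608,
    "Google 'Argon2024' log": 101,
    "Google 'Xenon2023' log": 2119,
    "Google 'Xenon2024' log": 170,
    "Cloudflare 'Nimbus2023' Log": 2194,
    "Cloudflare 'Nimbus2024' Log": 164,
    "DigiCert Yeti2024 Log": 17,
    "DigiCert Yeti2025 Log": 0,
    "DigiCert Nessie2023 Log": 155,
    "DigiCert Nessie2024 Log": 19,
    "DigiCert Nessie2025 Log": 0,
    "Sectigo 'Sabre' CT log": 1140,
    "Let's Encrypt 'Oak2023' log": 156,
    "Let's Encrypt 'Oak2024H1' log": 14,
    "Let's Encrypt 'Oak2024H2' log": 0,
    "Trust Asia Log2023": 62,
    "Trust Asia Log2024-2": 0,
}

total = sum(notices.values())
logs_with_notices = sum(1 for count in notices.values() if count > 0)

print(total)              # 6919, matching the grep and wc totals
print(logs_with_notices)  # 13, matching the number of notice.txt files
```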
+
+### 3
+
+For some reason Nimbus2023 is stuck at
+
+ {"tree_size":512926523,"RootHash":[41,19,83,107,69,253,233,106,68,143,173,151,177,196,60,228,22,57,246,105,184,51,24,50,230,153,233,189,214,93,132,186]}
+
+while trying to fetch until
+
+ {"sth_version":0,"tree_size":513025681,"timestamp":1679169572616,"sha256_root_hash":"0SzzS0M2RP5BHC6M9bvOPySYJadPi9nnk2Dsav4NKKs=","tree_head_signature":"BAMARjBEAiBXrmT+W2Ct+32DX/XL+YwS9Ut4rnOG6Y+A4Lxbf/6TogIgYEM32vweDC0QStwMq1PzIvm97cQhj6bUSdZWq/wMkNw=","log_id":"ejKMVNi3LbYg6jjgUh7phBZwMhOFTTvSK8E6V6NS61I="}
+
+These tree heads are not inconsistent, and a restart should resolve the problem.
+
+(There is likely a corner case somewhere that made the fetcher exit or halt. We
+should debug this further at some point, but it has not happened more than once.)
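
To put the two tree heads side by side, the sketch below computes how many entries were left to fetch, re-encodes the stuck root hash as base64 so it is in the same form as `sha256_root_hash`, and converts the signed tree head's millisecond timestamp to UTC. It does not verify a consistency proof; it only makes the two JSON blobs comparable. We assume here that `RootHash` is a plain array of bytes, as its 32 values suggest.

```python
import base64
import json
from datetime import datetime, timezone

# The two JSON objects quoted above, copied verbatim.
stuck = json.loads(
    '{"tree_size":512926523,"RootHash":[41,19,83,107,69,253,233,106,68,143,'
    '173,151,177,196,60,228,22,57,246,105,184,51,24,50,230,153,233,189,214,'
    '93,132,186]}'
)
target = json.loads(
    '{"sth_version":0,"tree_size":513025681,"timestamp":1679169572616,'
    '"sha256_root_hash":"0SzzS0M2RP5BHC6M9bvOPySYJadPi9nnk2Dsav4NKKs=",'
    '"tree_head_signature":"BAMARjBEAiBXrmT+W2Ct+32DX/XL+YwS9Ut4rnOG6Y+A4Lxbf'
    '/6TogIgYEM32vweDC0QStwMq1PzIvm97cQhj6bUSdZWq/wMkNw=",'
    '"log_id":"ejKMVNi3LbYg6jjgUh7phBZwMhOFTTvSK8E6V6NS61I="}'
)

# Entries still missing between the stuck position and the snapshot.
remaining = target["tree_size"] - stuck["tree_size"]
# Same encoding as sha256_root_hash, to make the two roots comparable by eye.
stuck_root_b64 = base64.b64encode(bytes(stuck["RootHash"])).decode()
# STH timestamps are milliseconds since the Unix epoch.
sth_time = datetime.fromtimestamp(target["timestamp"] / 1000, tz=timezone.utc)

print(remaining)  # 99158
print(stuck_root_b64)
print(sth_time.isoformat(timespec="seconds"))  # 2023-03-18T19:59:32+00:00
```

The roots differ, as expected for two different tree sizes, and the tree head's timestamp falls a few minutes before the first snapshot in the timeline.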
+
+### 4
+
+Quick overview:
+
+ $ vnstat -d
+ ens160 / daily
+
+ day rx | tx | total | avg. rate
+ ------------------------+-------------+-------------+---------------
+ 2023-03-18 1.49 TiB | 17.07 GiB | 1.51 TiB | 153.44 Mbit/s
+ 2023-03-19 3.77 TiB | 41.21 GiB | 3.81 TiB | 387.83 Mbit/s
+ 2023-03-20 3.09 TiB | 36.67 GiB | 3.13 TiB | 318.26 Mbit/s
+ 2023-03-21 3.11 TiB | 32.24 GiB | 3.14 TiB | 319.61 Mbit/s
+ 2023-03-22 2.08 TiB | 25.98 GiB | 2.10 TiB | 213.89 Mbit/s
+ 2023-03-23 1.16 TiB | 15.59 GiB | 1.18 TiB | 119.97 Mbit/s
+ 2023-03-24 1.17 TiB | 15.44 GiB | 1.18 TiB | 120.44 Mbit/s
+ 2023-03-25 1.18 TiB | 15.72 GiB | 1.19 TiB | 121.55 Mbit/s
+ 2023-03-26 707.47 GiB | 9.64 GiB | 717.11 GiB | 71.30 Mbit/s
+ 2023-03-27 448.80 GiB | 6.43 GiB | 455.23 GiB | 45.26 Mbit/s
+ 2023-03-28 451.49 GiB | 6.49 GiB | 457.98 GiB | 45.53 Mbit/s
+ 2023-03-29 1.01 TiB | 12.73 GiB | 1.03 TiB | 104.45 Mbit/s
+ 2023-03-30 256.75 GiB | 3.40 GiB | 260.15 GiB | 59.59 Mbit/s
+ ------------------------+-------------+-------------+---------------
+ estimated 591.55 GiB | 7.84 GiB | 599.39 GiB |
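
For a rough total of the received traffic, the daily `rx` values can be summed after converting them to a common unit. This is only a sketch: the values are copied from the output above, they are already rounded by `vnstat`, and they cover all traffic on the interface, including the partial last day.

```python
UNIT_TO_GIB = {"GiB": 1, "TiB": 1024}

# (day, rx) pairs transcribed from the vnstat output.
rx_per_day = [
    ("2023-03-18", "1.49 TiB"),
    ("2023-03-19", "3.77 TiB"),
    ("2023-03-20", "3.09 TiB"),
    ("2023-03-21", "3.11 TiB"),
    ("2023-03-22", "2.08 TiB"),
    ("2023-03-23", "1.16 TiB"),
    ("2023-03-24", "1.17 TiB"),
    ("2023-03-25", "1.18 TiB"),
    ("2023-03-26", "707.47 GiB"),
    ("2023-03-27", "448.80 GiB"),
    ("2023-03-28", "451.49 GiB"),
    ("2023-03-29", "1.01 TiB"),
    ("2023-03-30", "256.75 GiB"),
]


def to_gib(amount: str) -> float:
    """Convert a vnstat amount such as '1.49 TiB' to GiB."""
    value, unit = amount.split()
    return float(value) * UNIT_TO_GIB[unit]


total_gib = sum(to_gib(rx) for _, rx in rx_per_day)
print(f"{total_gib / 1024:.2f} TiB received")  # 19.88 TiB received
```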
+
+### 5
+
+Use at most 58GiB RAM for sorting and 8 parallel sort workers. According to the
+[GNU sort manual][], more than 8 workers yields diminishing performance gains.
+We also set `LC_ALL=C` to ensure a consistent sort order (see `man sort`).
+
+ $ export LC_ALL=C
+ $ ct-sans assemble -b 58 -p 8 >assemble.stdout
+
+[GNU sort manual]: https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
+
+(We don't need to change the default directories, because the collected data is
+stored in ./data and /tmp is a fine place to put things on our system.)
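
As an illustration of what these settings mean for GNU sort, the sketch below builds a `sort` invocation with a 58GiB buffer, 8 workers, and `LC_ALL=C`. How `ct-sans assemble` invokes sort internally is an assumption here; the function names, file names, and the exact flag combination are hypothetical, although `-u`, `-S`, `--parallel`, `-T`, and `-o` are standard GNU sort options.

```python
import os
import shlex
import subprocess


def sort_command(inputs, output, buffer_gib=58, workers=8, tmp_dir="/tmp"):
    """Build a GNU sort command that deduplicates and merges the input files."""
    return [
        "sort",
        "-u",                        # keep only unique lines
        "-S", f"{buffer_gib}G",      # main-memory buffer size
        f"--parallel={workers}",     # number of concurrent sort workers
        "-T", tmp_dir,               # where temporary files are written
        "-o", output,
        *inputs,
    ]


def sort_env():
    """Environment with LC_ALL=C so lines are compared byte by byte."""
    return dict(os.environ, LC_ALL="C")


def run_sort(inputs, output):
    """Run the sort; not called here because it needs real input files."""
    subprocess.run(sort_command(inputs, output), env=sort_env(), check=True)


# Hypothetical file names, only to show the resulting command line.
print(shlex.join(sort_command(["a.lst", "b.lst"], "sans.lst")))
```

With `LC_ALL=C` the ordering does not depend on the system locale, so the same input produces the same `sans.lst` on any machine.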
+
+### 6
+
+There are 0.91B unique SANs in the 25.2GiB dataset (6.1GiB compressed):
+
+ $ du -shb data/archive/2023-04-03-ct-sans
+ 27050799992 data/archive/2023-04-03-ct-sans
+ $ python3 -c "print(f'{27050799992 / 1024**3:.1f}GiB')"
+ 25.2GiB
+ $ du -shb data/archive/2023-04-03-ct-sans.zip
+ 6526876407 data/archive/2023-04-03-ct-sans.zip
+ $ python3 -c "print(f'{6526876407 / 1024**3:.1f}GiB')"
+ 6.1GiB
+ $ wc -l data/archive/2023-04-03-ct-sans/sans.lst
+ 907332515 data/archive/2023-04-03-ct-sans/sans.lst
+ $ python3 -c "print(f'{907332515 / 1000**3:.2f}B')"
+ 0.91B
+
+These SANs were found in 3.74B certificates from 17 CT logs:
+
+ $ grep "In total," data/archive/2023-04-03-ct-sans/README.md
+ In total, 3743244652 certificates were downloaded from 17 CT logs;
+ $ python3 -c "print(f'{3743244652 / 1000**3:.2f}B')"
+ 3.74B