author     Rasmus Dahlberg <rasmus@rgdd.se>  2023-04-13 09:19:30 +0200
committer  Rasmus Dahlberg <rasmus@rgdd.se>  2023-04-13 09:19:30 +0200
commit     8fb3d9985504e01a3abdd7dbe1d7c86b2110c7b0 (patch)
tree       39dde3036111634dfe1907c77e20b38490d5f283
parent     8e0fa61c06fd12c502ea171bee65f5fd63ccb158 (diff)

Add measurement setup and operations timeline

-rw-r--r--  docs/operations.md  826
1 file changed, 824 insertions(+), 2 deletions(-)
diff --git a/docs/operations.md b/docs/operations.md
index 1528c32..3f67a92 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -1,3 +1,825 @@
-# Operations
+# onion-grab dataset
-Placeholder.
+This document describes our `onion-grab` data collection, including information
+about the local systems and a timeline of our operations leading up to the
+results for [Tranco top-1m][] and [SANs in CT logs][] during April 2023.
+
+[Tranco top-1m]: https://tranco-list.eu/
+[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+
+## Summary
+
+The initial tests against [Tranco top-1m][] took ~1 day. We found 207 unique
+two-label `.onion` domains across 285 sites that set Onion-Location.
+
+The full measurement for [SANs in CT logs][] took ~10 days. We found 3330
+unique two-label `.onion` domains configured across 26937 unique sites. 13956
+of those sites have the same Onion-Location configuration as Twitter, which
+likely means that they copied parts of Twitter's HTML.
+
+The collected data sets are available here:
+
+ - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
+ - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
+
+For further information about system configurations and operations, read on.
+
+## Local systems
+
+We have three mostly identical Ubuntu VMs:
+
+ $ lsb_release -a
+ No LSB modules are available.
+ Distributor ID: Ubuntu
+ Description: Ubuntu 22.04.2 LTS
+ Release: 22.04
+ Codename: jammy
+
+VM-1 is configured with 62.9GiB RAM, 32 CPU threads (each reported as a
+single-core processor), and a ~2TiB SSD:
+
+    $ grep MemTotal /proc/meminfo
+    MemTotal:       65948412 kB
+ $ grep -c processor /proc/cpuinfo
+ 32
+ $ grep 'cpu cores' /proc/cpuinfo | uniq
+ cpu cores : 1
+ $ df -BG /home
+ Filesystem 1G-blocks Used Available Use% Mounted on
+ /dev/mapper/ubuntu--vg-ubuntu--lv 2077G 220G 1772G 12% /
+
+VM-2 and VM-3 are each configured with 62.9GiB RAM, 16 CPU threads (each
+reported as a single-core processor), and a ~60GiB SSD:
+
+ $ grep MemTotal /proc/meminfo
+ MemTotal: 65822508 kB
+ $ grep -c processor /proc/cpuinfo
+ 16
+ $ grep 'cpu cores' /proc/cpuinfo | uniq
+ cpu cores : 1
+ $ df -BG /home
+ Filesystem 1G-blocks Used Available Use% Mounted on
+ /dev/mapper/ubuntu--vg-ubuntu--lv 61G 11G 48G 18% /
+
+These VMs share a 1x10Gbps link with other VMs on the same network that we have
+no control over. We installed `vnstat` to track our bandwidth usage over time:
+
+ # apt install vnstat
+ # systemctl enable vnstat.service
+ # systemctl start vnstat.service
+
+We also installed Go version 1.20; see [install instructions][]:
+
+ $ go version
+ go version go1.20.2 linux/amd64
+
+[install instructions]: https://go.dev/doc/install
+
+We stopped and disabled `systemd-resolved` and populated `/etc/resolv.conf` with
+
+ $ cat /etc/resolv.conf
+ nameserver 8.8.8.8
+ nameserver 8.8.4.4
+
+which gives us a setup that [supports 1500 DNS look-ups][] per second and VM.
+
+[supports 1500 DNS look-ups]: https://developers.google.com/speed/public-dns/docs/isp
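+
+For reference, this amounts to something like the following on a stock Ubuntu
+install (a sketch of the likely commands, not a verbatim log; `/etc/resolv.conf`
+is normally a symlink to the systemd-resolved stub):
+
+    # systemctl disable --now systemd-resolved.service
+    # rm /etc/resolv.conf
+    # printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' > /etc/resolv.conf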
+
+We set
+
+ $ ulimit -Sn 100000
+ # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+
+before running `onion-grab`. The complete `ulimit -a` and `sysctl -a` outputs
+are available in our dataset. The versions of `onion-grab` are listed below.
+
+Finally, we [installed Mullvad VPN][] so that our `onion-grab` measurements can
+run from Melbourne (VM-1), New York (VM-2), and Stockholm (VM-3). Remember to
+set the same DNS resolvers as above (`mullvad dns set custom 8.8.8.8 8.8.4.4`).
+
+In the full measurement, we had to replace Stockholm with Frankfurt (see notes).
+
+[installed Mullvad VPN]: https://mullvad.net/en/help/install-mullvad-app-linux/
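+
+Selecting an exit location and connecting is then done with the `mullvad` CLI;
+for example, for Melbourne on VM-1 (a sketch using Mullvad's location codes):
+
+    $ mullvad relay set location au mel
+    $ mullvad dns set custom 8.8.8.8 8.8.4.4
+    $ mullvad connect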
+
+## Timeline
+
+| date | time (UTC) | event | notes |
+| ---------- | ---------- | --------------------------- | ------------------------------------------- |
+| 2023/04/02 | 23:26:27 | test run with tranco top-1m | to estimate reasonable repetition count [1] |
+| 2023/04/03 | 12:47:43 | test run with tranco top-1m | to estimate reasonable repetition count [1] |
+| 2023/04/03 | 17:20:00 | shuffle ct-sans dataset | deterministic per-VM seed, 15m/shuffle [2] |
+| 2023/04/03 | 18:18:47 | test run with tranco top-1m | to estimate reasonable repetition count [1] |
+| 2023/04/03 | 20:03 | transfer shuffled dataset | from VM-1 to VM-2 (1-3MB/s, painfully slow) |
+| 2023/04/03 | 20:03 | transfer shuffled dataset | from VM-1 to VM-3 (1-3MB/s, painfully slow) |
+| 2023/04/03 | 22:36:06 | start onion-grab (au mel) | checkout v0.0.2, set measure.sh params [3] |
+| 2023/04/03 | 22:35:36   | start onion-grab (us nyc)   | checkout v0.0.2, set measure.sh params [4]  |
+| 2023/04/03 | 22:35:38 | start onion-grab (se sto) | checkout v0.0.2, set measure.sh params [5] |
+| 2023/04/04 | 15:30 | se sto relay bw drop | store vnstat -h stats w/ daily cron job [6] |
+| 2023/04/05 | 06:30 | kill onion-grab (se sto) | all Stockholm relays are very slow [7] |
+| 2023/04/05 | 07:02:13 | start onion-grab (de fra) | all Swedish relays are very slow [8] |
+| 2023/04/11 | 04:26:26 | us nyc completed | minor exit bug [9] |
+| 2023/04/11 | 04:30:28 | au mel completed | minor exit bug [9] |
+| 2023/04/11 | 20:25:50 | de fra stopped | ran out of memory for unknown reason [10] |
+| 2023/04/11 | 22:36:25 | de fra started again | use start line we know is processed [10,11] |
+| 2023/04/11 | 23:43:19   | de fra stopped              | ran out of memory for unknown reason [12]   |
+| 2023/04/12 | 08:42:30 | de fra started again | use start line we know is processed [12,13] |
+| 2023/04/12 | 11:50 | prepare dataset (au mel) | only moving files on VM-1 [14] |
+| 2023/04/12 | 14:00 | prepare dataset (us nyc) | moving files on VM-2, transfer to VM-1 [15] |
+| 2023/04/12 | 16:50 | prepare dataset (se sto) | moving files on VM-3, transfer to VM-1 [16] |
+| 2023/04/12 | 17:00 | save bandwidths at VM-{1,2} | forgot to move them earlier [17] |
+| 2023/04/13 | 00:35:38 | de fra completed | minor exit bug [18] |
+| 2023/04/13 | 05:40 | prepare dataset (de fra) | moving files on VM-3, transfer to VM-1 [19] |
+| 2023/04/13 | 05:50 | experiment is completed | datasets are ready, zipped, and documented |
+
+## Notes
+
+### 1
+
+We downloaded [Tranco top-1m][], permalink [Z2XKG][] (2023-04-03):
+
+ $ sha256sum tranco_Z2XKG-1m.csv.zip
+ 3e078a84e9aae7dbaf1207aac000038f1e51e20e8ccc35563da8b175d38a39dd tranco_Z2XKG-1m.csv.zip
+ $ unzip tranco_Z2XKG-1m.csv.zip
+ $ cut -d',' -f2 top-1m.csv > top-1m.lst
+
+[Z2XKG]: https://tranco-list.eu/list/Z2XKG/1000000
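+
+A quick sanity check that the extracted list has the expected number of
+entries (1M, as assumed below):
+
+    $ wc -l top-1m.lst
+    1000000 top-1m.lst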
+
+This gives us a list of 1M domains to perform test-runs on. The idea:
+
+ 1. Make visits at a wanted rate (1450/s, below the 1500 DNS lookup limit)
+ 2. Make visits at several slower rates (100/s, ..., 1400/s)
+ 3. Repeat this from three locations (Stockholm, New York, Melbourne)
+ 4. Hypothesis: the same number of Onion-Location setups are discovered when
+    running at the most rapid rate from three locations as when running at a
+    lower rate from the same three locations; and the error rates are roughly
+    the same regardless of whether we use a lower or higher rate.
+
+We used `onion-grab`'s `scripts/test.sh` to perform the above experiment from
+VM-1. The link for downloading the data is listed above in the summary. You
+should see 3 subdirectories with results from 28 different measurements.
+
+Let's look at the results in more detail: the error rates printed in the
+`stderr.txt` files, as well as the output parsed with `scripts/digest.py`.
+
+#### Scan: Stockholm with limit 1450/s
+
+ $ digest.py -i 20230402-232627/se17-wireguard-l1450.txt 2>&1 |
+ tail -n6 | head -n4
+ digest.py:25 INFO: found 245 HTTP headers with Onion-Location
+ digest.py:26 INFO: found 42 HTML meta attributes with Onion-Location
+ digest.py:27 INFO: found 283 unqiue domain names that set Onion-Location
+ digest.py:28 INFO: found 205 unique two-label onion addresses in the process
+
+#### Scan: Stockholm, New York, Melbourne with limit 1450/s (combined)
+
+ $ digest.py -i 20230402-232627/*l1450.txt 2>&1 | tail -n4 | head -n2
+ digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
+ digest.py:28 INFO: found 207 unique two-label onion addresses in the process
+
+Note that we found more Onion-Location setups here with the combined scan.
+
+#### Scan: Stockholm, New York, Melbourne with limits 100, 500, 1450 (combined)
+
+ $ cat 20230402-232627/stderr.txt | tail -n5 | head -n2
+ digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
+ digest.py:28 INFO: found 207 unique two-label onion addresses in the process
+
+Note that we did not find more Onion-Location setups with these nine
+measurements (three rates from three locations). This observation also holds
+if `scripts/digest.py` is run with all 28 outputs:
+
+    $ ./scripts/digest.py -i \
+        20230402-232627/*-*-* \
+        20230403-124743/*-*-* \
+        20230403-181847/*-*-* 2>&1 | tail -n4 | head -n2
+ digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
+ digest.py:28 INFO: found 207 unique two-label onion addresses in the process
+
+#### Error rates
+
+Below is some pretty-printed output of the error rates shown in the respective
+`stderr.txt` files, ordered by the relay and limit that we set. Each run
+visits 1M sites; the `connected` column counts successes, and all columns
+after it count failed connection attempts. E.g., the first row has 82814 DNS
+lookup errors.
+
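+As a sanity check of how to read these tables: each row's columns sum to
+exactly the 1M visited sites. E.g., for the first `us18-wireguard` row:
+
+    $ python3 -c "print(711816 + 82814 + 51543 + 87147 + 2042 + 5449 + 58932 + 257)"
+    1000000
+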
+| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other) | 3xx | eof | ctx | ??? |
+| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
+| us18-wireguard | 100 | 100.0 | 287 | 711816 | 82814 (72767 843 9204) | 51543 (21279 30264) | 87147 (77235 9912) | 2042 | 5449 | 58932 | 257 |
+| us18-wireguard | 500 | 500.3 | 285 | 711373 | 83333 (72811 1304 9218) | 54058 (24064 29994) | 86728 (76803 9925) | 2160 | 5414 | 56689 | 245 |
+| us18-wireguard | 1000 | 1001.0 | 286 | 711081 | 82882 (72804 852 9226) | 54763 (24599 30164) | 86840 (77011 9829) | 1760 | 5086 | 57333 | 255 |
+| us18-wireguard | 1200 | 1201.5 | 286 | 711741 | 82841 (72800 855 9186) | 53041 (22654 30387) | 86885 (77111 9774) | 1803 | 4955 | 58485 | 249 |
+| us18-wireguard | 1400 | 1402.1 | 287 | 710481 | 82894 (72805 1468 8621) | 59711 (29489 30222) | 86597 (76897 9700) | 1638 | 4975 | 53450 | 254 |
+| us18-wireguard | 1450 | 1452.2 | 287 | 708649 | 82866 (72820 1272 8774) | 60294 (30460 29834) | 86506 (76602 9904) | 1887 | 5233 | 54298 | 267 |
+
+| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other) | 3xx | eof | ctx | ??? |
+| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
+| au-syd-wg-002 | 100 | 100.0 | 285 | 723854 | 83319 (72800 1317 9202) | 48693 (14767 33926) | 91658 (81324 10334) | 1810 | 5235 | 45149 | 282 |
+| au-syd-wg-002 | 500 | 500.3 | 285 | 723410 | 83119 (72791 1119 9209) | 51229 (16767 34462) | 91585 (81208 10377) | 1830 | 4680 | 43876 | 271 |
+| au-syd-wg-002 | 1000 | 1001.0 | 285 | 724144 | 83052 (72771 1075 9206) | 50697 (16591 34106) | 91678 (81442 10236) | 1491 | 4922 | 43733 | 283 |
+| au-syd-wg-002 | 1200 | 1192.3 | 286 | 723169 | 83090 (72820 1122 9148) | 51408 (16685 34723) | 91571 (81354 10217) | 1413 | 5024 | 44052 | 273 |
+| au-syd-wg-002 | 1400 | 1391.8 | 286 | 721119 | 83305 (72796 1906 8603) | 55236 (21640 33596) | 91339 (81197 10142) | 842 | 5752 | 42124 | 283 |
+| au-syd-wg-002 | 1450 | 1431.3 | 285 | 720439 | 83182 (72793 1498 8891) | 56817 (23193 33624) | 91376 (81049 10327) | 1100 | 5486 | 41334 | 266 |
+
+| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other) | 3xx | eof | ctx | ??? |
+| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
+| se17-wireguard | 100 | 100.0 | 286 | 724643 | 83146 (72400 954 9792) | 48497 (14711 33786) | 92230 (81881 10349) | 2081 | 5815 | 43325 | 263 |
+| se17-wireguard | 500 | 500.3 | 288 | 723176 | 84208 (72453 1367 10388) | 48685 (15239 33446) | 91664 (81341 10323) | 2073 | 5513 | 44416 | 265 |
+| se17-wireguard | 1000 | 1001.0 | 289 | 723834 | 83156 (72427 962 9767) | 49559 (16347 33212) | 91847 (81572 10275) | 1852 | 5638 | 43856 | 258 |
+| se17-wireguard | 1200 | 1201.5 | 289 | 724093 | 83078 (72450 905 9723) | 48780 (15597 33183) | 91868 (81656 10212) | 1823 | 5708 | 44389 | 261 |
+| se17-wireguard | 1200 | 1201.5 | 289 | 723788 | 83081 (72397 950 9734) | 49070 (15848 33222) | 91745 (81595 10150) | 1790 | 5670 | 44589 | 267 |
+| se17-wireguard | 1201 | 1202.5 | 288 | 723642 | 83063 (72413 909 9741) | 48923 (15769 33154) | 92120 (81575 10545) | 1823 | 5322 | 44839 | 268 |
+| se17-wireguard | 1202 | 1202.1 | 290 | 723846 | 83055 (72452 912 9691) | 48999 (15916 33083) | 91860 (81519 10341) | 1813 | 5497 | 44669 | 261 |
+| se17-wireguard | 1203 | 1204.5 | 289 | 723772 | 83051 (72479 882 9690) | 48926 (15775 33151) | 91945 (81630 10315) | 1825 | 5502 | 44716 | 263 |
+| se17-wireguard | 1204 | 1205.5 | 290 | 723816 | 83109 (72462 902 9745) | 49256 (16161 33095) | 92015 (81551 10464) | 1762 | 5364 | 44420 | 258 |
+| se17-wireguard | 1400 | 1402.1 | 288 | 721902 | 83808 (72426 1341 10041) | 51820 (18732 33088) | 91409 (81308 10101) | 1727 | 5725 | 43345 | 264 |
+| se17-wireguard | 1446 | 1448.2 | 290 | 720637 | 83037 (72463 924 9650) | 49421 (16422 32999) | 91416 (81132 10284) | 1801 | 5517 | 47903 | 268 |
+| se17-wireguard | 1447 | 1449.2 | 286 | 720927 | 83038 (72480 930 9628) | 49361 (16463 32898) | 91630 (81243 10387) | 1807 | 5399 | 47580 | 258 |
+| se17-wireguard | 1448 | 1450.2 | 288 | 720841 | 83016 (72492 933 9591) | 49251 (16209 33042) | 91636 (81236 10400) | 1803 | 5410 | 47783 | 260 |
+| se17-wireguard | 1449 | 1449.4 | 288 | 720456 | 83065 (72459 922 9684) | 49513 (16554 32959) | 91479 (81171 10308) | 1786 | 5459 | 47981 | 261 |
+| se17-wireguard | 1450 | 1450.3 | 288 | 720684 | 83036 (72476 915 9645) | 49348 (16266 33082) | 91608 (81238 10370) | 1734 | 5404 | 47932 | 254 |
+| se17-wireguard | 1450 | 1450.0 | 287 | 719193 | 83193 (72428 1319 9446) | 53567 (20562 33005) | 91390 (81135 10255) | 1956 | 5775 | 44641 | 285 |
+
+From the looks of it, the number of successful connections decreases somewhat
+as we approach the 1450/s limit. Comparing the most and least successful runs
+with regard to the number of connects, the spread per location is:
+
+ - Melbourne: 3705
+ - New York: 3167
+ - Stockholm: 5450
+
+These differences are mostly due to more TCP timeouts and context deadlines.
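+
+These numbers are the differences between the largest and smallest `connected`
+values in the respective tables above:
+
+    $ python3 -c "print(724144 - 720439)"  # au-syd-wg-002
+    3705
+    $ python3 -c "print(711816 - 708649)"  # us18-wireguard
+    3167
+    $ python3 -c "print(724643 - 719193)"  # se17-wireguard
+    5450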
+
+#### What does this mean?
+
+Running from three different locations at limit 1450/s finds the same number of
+Onion-Location setups as all 28 measurements combined. That's what we wanted.
+
+Connect errors (mainly TCP timeouts and context deadline errors) increase
+slightly as we use the higher limits. This is not what we wanted. However, the
+increase in connect errors per 1M sites is only 0.3-0.5%. These errors are
+transient, and should mostly be accounted for by having 3x tries per domain.
+
+(Each scan is running with a shuffled list, similar to our full measurement.)
+
+**Conclusion:** scanning from three different locations at limit 1450/s strikes
+a good balance between found Onion-Locations, errors, and timeliness of results.
+
+### 2
+
+The [ct-sans dataset][] that we scan with `onion-grab` in the full measurement
+was collected and assembled on 2023-04-03. It contains 0.91B unique SANs.
+
+[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+
+To avoid biases, such as all VMs encountering the same errors due to the order
+in which the sites are visited, the dataset is shuffled separately for each VM
+before use.
+
+We did all shuffling on VM-1 because it has the most available disk space.
+
+Prepare shuffled dataset for VM-1:
+
+ $ seed="2023-04-03-vm-1"
+    $ time shuf \
+        --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null) \
+        -o vm-1.lst 2023-04-03-ct-sans/sans.lst
+
+ real 13m40.637s
+ user 10m30.368s
+ sys 2m28.062s
+ $ time sha256sum vm-1.lst
+ 4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b vm-1.lst
+
+ real 2m51.630s
+ user 2m33.246s
+ sys 0m11.460s
+
+Prepare shuffled dataset for VM-2:
+
+ $ seed="2023-04-03-vm-2"
+    $ time shuf \
+        --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null) \
+        -o vm-2.lst 2023-04-03-ct-sans/sans.lst
+
+ real 14m35.500s
+ user 11m31.577s
+ sys 2m31.447s
+ $ time sha256sum vm-2.lst
+ 46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff vm-2.lst
+
+ real 3m7.084s
+ user 2m36.416s
+ sys 0m19.012s
+
+Prepare shuffled dataset for VM-3:
+
+ $ seed="2023-04-03-vm-3"
+    $ time shuf \
+        --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null) \
+        -o vm-3.lst 2023-04-03-ct-sans/sans.lst
+
+ real 14m37.878s
+ user 11m37.963s
+ sys 2m20.373s
+ $ time sha256sum vm-3.lst
+ c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6 vm-3.lst
+
+ real 3m6.324s
+ user 2m36.804s
+ sys 0m17.056s
+
+Double-check that we have the expected number of lines:
+
+    $ time wc -l vm-?.lst 2023-04-03-ct-sans/sans.lst
+ 907332515 vm-1.lst
+ 907332515 vm-2.lst
+ 907332515 vm-3.lst
+ 907332515 2023-04-03-ct-sans/sans.lst
+ 3629330060 total
+
+ real 7m54.915s
+ user 0m59.213s
+ sys 1m25.353s
+
+**Note:** `shuf` is memory-hungry and needs roughly 2x the size of the input
+file, plus overhead; anything less than ~60GiB of memory will be insufficient
+for a 25GiB dataset like ours.
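+
+Had memory been tighter, a two-pass disk-backed shuffle would have been an
+option (a sketch we did not need to run; unlike the above it is not seeded):
+first scatter the lines into random buckets, then `shuf` each bucket. This
+still yields a uniform permutation, while only one bucket at a time must fit
+in memory.
+
+    $ awk 'BEGIN { srand() } { print > ("bucket-" int(rand() * 16)) }' sans.lst
+    $ for b in bucket-*; do shuf "$b"; done > shuffled.lst
+    $ rm bucket-*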
+
+### 3
+
+ $ ulimit -Sn 100000
+ $ ulimit -a >ulimit.txt
+ # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+ # sysctl -a >sysctl.txt
+ $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
+ $ git log | head -n1
+ commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
+ $ cd scripts
+ $ sha256sum vm-1.lst
+ 4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b vm-1.lst
+ $ git diff
+ diff --git a/scripts/measure.sh b/scripts/measure.sh
+ index a520c6d..269b5ad 100755
+ --- a/scripts/measure.sh
+ +++ b/scripts/measure.sh
+ @@ -8,11 +8,11 @@
+ # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+ #
+
+ -relay_country=se
+ -relay_city=sto
+ +relay_country=au
+ +relay_city=mel
+ limit=1450
+ num_workers=10000
+ -input_file=example.lst
+ +input_file=vm-1.lst
+ timeout_s=30
+ response_max_mib=64
+ metrics_interval=1h
+
+So, we selected Melbourne relays.
+
+ $ ./measure.sh 2>measure.stderr
+
+### 4
+
+ $ ulimit -Sn 100000
+ $ ulimit -a >ulimit.txt
+ # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+ # sysctl -a >sysctl.txt
+ $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
+ $ git log | head -n1
+ commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
+ $ cd scripts
+ $ sha256sum vm-2.lst
+ 46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff vm-2.lst
+ $ git diff
+ diff --git a/scripts/measure.sh b/scripts/measure.sh
+ index a520c6d..31b2f9e 100755
+ --- a/scripts/measure.sh
+ +++ b/scripts/measure.sh
+ @@ -8,11 +8,11 @@
+ # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+ #
+
+ -relay_country=se
+ -relay_city=sto
+ +relay_country=us
+ +relay_city=nyc
+ limit=1450
+ num_workers=10000
+ -input_file=example.lst
+ +input_file=vm-2.lst
+ timeout_s=30
+ response_max_mib=64
+ metrics_interval=1h
+
+So, we selected New York relays.
+
+ $ ./measure.sh 2>measure.stderr
+
+### 5
+
+ $ ulimit -Sn 100000
+ $ ulimit -a >ulimit.txt
+ # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+ # sysctl -a >sysctl.txt
+ $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
+ $ git log | head -n1
+ commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
+ $ cd scripts
+ $ sha256sum vm-3.lst
+ c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6 vm-3.lst
+ $ git diff
+ diff --git a/scripts/measure.sh b/scripts/measure.sh
+ index a520c6d..4cc0913 100755
+ --- a/scripts/measure.sh
+ +++ b/scripts/measure.sh
+ @@ -12,7 +12,7 @@ relay_country=se
+ relay_city=sto
+ limit=1450
+ num_workers=10000
+ -input_file=example.lst
+ +input_file=vm-3.lst
+ timeout_s=30
+ response_max_mib=64
+ metrics_interval=1h
+
+So, we selected Stockholm relays (default).
+
+ $ ./measure.sh 2>measure.stderr
+
+### 6
+
+Notice that Stockholm relays are "slow". Bandwidth appears to have dropped to
+~1/10 of what it was during the initial part of the measurement. It is unclear
+whether we get more errors yet, and whether this will sort itself out. We add
+a cron job that stores hourly bandwidth stats every day at 23:59 to get more
+fine-grained data:
+
+ $ mkdir /home/rasmoste/vnstat
+ $ crontab -e
+
+And add at the end of the file:
+
+ 59 23 * * * vnstat -h >"/home/rasmoste/vnstat/$(date)"
+
+(Added this on all three VMs.)
+
+### 7
+
+(On VM-3.)
+
+Bandwidth stats:
+
+ $ cat Tue\ Apr\ \ 4\ 11\:59\:01\ PM\ UTC\ 2023
+
+ ens160 / hourly
+
+ hour rx | tx | total | avg. rate
+ ------------------------+-------------+-------------+---------------
+ 2023-04-04
+ 00:00 82.61 GiB | 12.78 GiB | 95.39 GiB | 227.61 Mbit/s
+ 01:00 80.93 GiB | 12.70 GiB | 93.63 GiB | 223.41 Mbit/s
+ 02:00 80.90 GiB | 12.68 GiB | 93.58 GiB | 223.30 Mbit/s
+ 03:00 81.13 GiB | 12.63 GiB | 93.77 GiB | 223.74 Mbit/s
+ 04:00 88.59 GiB | 12.97 GiB | 101.57 GiB | 242.35 Mbit/s
+ 05:00 85.10 GiB | 12.93 GiB | 98.04 GiB | 233.92 Mbit/s
+ 06:00 82.97 GiB | 12.84 GiB | 95.81 GiB | 228.61 Mbit/s
+ 07:00 79.05 GiB | 12.62 GiB | 91.67 GiB | 218.72 Mbit/s
+ 08:00 87.83 GiB | 12.81 GiB | 100.64 GiB | 240.13 Mbit/s
+ 09:00 81.22 GiB | 12.62 GiB | 93.84 GiB | 223.91 Mbit/s
+ 10:00 79.26 GiB | 12.57 GiB | 91.83 GiB | 219.12 Mbit/s
+ 11:00 81.70 GiB | 12.67 GiB | 94.37 GiB | 225.17 Mbit/s
+ 12:00 97.83 GiB | 13.21 GiB | 111.04 GiB | 264.94 Mbit/s
+ 13:00 82.47 GiB | 12.59 GiB | 95.06 GiB | 226.83 Mbit/s
+ 14:00 78.42 GiB | 11.46 GiB | 89.88 GiB | 214.45 Mbit/s
+ 15:00 27.42 GiB | 5.95 GiB | 33.37 GiB | 79.62 Mbit/s
+ 16:00 23.30 GiB | 5.37 GiB | 28.67 GiB | 68.42 Mbit/s
+ 17:00 28.12 GiB | 6.03 GiB | 34.15 GiB | 81.48 Mbit/s
+ 18:00 48.01 GiB | 8.76 GiB | 56.77 GiB | 135.46 Mbit/s
+ 19:00 40.23 GiB | 7.73 GiB | 47.97 GiB | 114.46 Mbit/s
+ 20:00 55.55 GiB | 9.63 GiB | 65.18 GiB | 155.52 Mbit/s
+ 21:00 35.10 GiB | 7.06 GiB | 42.16 GiB | 100.60 Mbit/s
+ 22:00 20.94 GiB | 5.00 GiB | 25.94 GiB | 61.91 Mbit/s
+ 23:00 21.19 GiB | 4.95 GiB | 26.14 GiB | 68.03 Mbit/s
+ ------------------------+-------------+-------------+---------------
+
+We were hoping that this was a transient error, but all relays in Stockholm
+appear to underperform. The rate has dropped as a result, and so has the number
+of successes. See separate data and log files in our dataset (`se-sto/`).
+
+It will be faster, and give more accurate results, to start from a new location.
+
+To kill it: find the PID with `pidof onion-grab`, then `kill <PID>`.
+
+Move `measure.stderr` to the data directory so it is not overwritten when we
+restart.
+
+### 8
+
+(On VM-3.)
+
+We experienced the same "slowness" with both Gothenburg and Malmo relays.
+After moving our measurement to Frankfurt, we observed good bandwidth again.
+
+ diff --git a/scripts/measure.sh b/scripts/measure.sh
+ index a520c6d..d46f9c1 100755
+ --- a/scripts/measure.sh
+ +++ b/scripts/measure.sh
+ @@ -8,11 +8,11 @@
+ # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+ #
+
+ -relay_country=se
+ -relay_city=sto
+ +relay_country=de
+ +relay_city=fra
+ limit=1450
+ num_workers=10000
+ -input_file=example.lst
+ +input_file=vm-3.lst
+ timeout_s=30
+ response_max_mib=64
+ metrics_interval=1h
+
+So, we selected Frankfurt relays.
+
+Then, without any other restarts and in the same tmux pane as before:
+
+ $ ./measure.sh 2>measure.stderr
+
+### 9
+
+The summary prints are shown in `onion-grab`'s stderr output (meaning the Go
+receiver routine waited at least one timeout for answers and then shut down);
+however, `onion-grab` hangs after that, so the measure.sh script doesn't
+exit.
+
+ - VM-1 (au mel) processed up until: 907330676
+ - VM-2 (us nyc) processed up until: 907330662
+
+To be compared with the number of entries in the ct-sans dataset: 907332515.
+
+ $ python3 -c "print(f'{907332515 - 907330676}')"
+ 1839
+ $ python3 -c "print(f'{907332515 - 907330662}')"
+ 1853
+
+So, it appears that we have ~1800 workers that were unable to provide their
+final answers (most likely timeouts) before the receiver routine shut down.
+This explains why `onion-grab` hangs: there are still workers waiting to send
+their answers to a receiver that is no longer reading them.
+
+Besides the outstanding answers most likely being timeouts, they do not
+concern the same ~1800 sites on all machines, since the dataset was shuffled
+separately per VM.
+
+**Action:** ctrl+C the measurement script that is waiting for `onion-grab` to
+complete; we already have the `onion-grab` output that we want stored on disk.
+
+### 10
+
+The latest `onion-grab` stderr print was at 2023/04/11 20:25:50; then the
+process died due to running out of memory. The latest progress print was:
+
+ 2023/04/11 20:02:33 INFO: metrics@receiver:
+
+ Processed: 819368251
+
+So, we can safely continue without missing any sites with Onion-Location
+configured by starting a new measurement from line ~819368251.
+
+ $ python3 -c "print(f'{907332515 - 819368251}')"
+ 87964264
+ $ tail -n87964264 vm-3.lst > vm-3-remaining.lst
+ $ wc -l vm-3-remaining.lst
+ 87964264 vm-3-remaining.lst
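+
+To double-check the cut point, line 907332515 - 87964264 + 1 = 819368252 of
+the shuffled list should equal the first line of the remaining list (a
+suggested check, not from our logs):
+
+    $ sed -n '819368252p' vm-3.lst
+    $ head -n1 vm-3-remaining.lst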
+
+### 11
+
+Restart `onion-grab` from VM-3 with the final domain names to visit.
+
+ $ git diff
+ diff --git a/scripts/measure.sh b/scripts/measure.sh
+ index a520c6d..6d77c66 100755
+ --- a/scripts/measure.sh
+ +++ b/scripts/measure.sh
+ @@ -8,11 +8,11 @@
+ # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+ #
+
+ -relay_country=se
+ -relay_city=sto
+ +relay_country=de
+ +relay_city=fra
+ limit=1450
+ num_workers=10000
+ -input_file=example.lst
+ +input_file=vm-3-remaining.lst
+ timeout_s=30
+ response_max_mib=64
+ metrics_interval=1h
+ $ ./measure.sh 2>measure-remaining.stderr
+
+(`onion-grab` results are written to a separate directory that is timestamped,
+so there is no risk that the above command will overwrite any collected data.)
+
+### 12
+
+The latest `onion-grab` stderr print was at 2023/04/11 23:43:19; then the
+process died due to running out of memory again. The latest progress print
+was:
+
+ 2023/04/11 23:36:31 INFO: metrics@receiver:
+
+ Processed: 5217381
+
+So, we can safely continue without missing any sites with Onion-Location
+configured by starting a new measurement from line ~5217381.
+
+ $ python3 -c "print(f'{87964264 - 5217381}')"
+ 82746883
+ $ tail -n82746883 vm-3-remaining.lst > vm-3-remaining-2.lst
+ $ wc -l vm-3-remaining-2.lst
+ 82746883 vm-3-remaining-2.lst
+
+### 13
+
+Restart `onion-grab` from VM-3 with the final domain names to visit, this time
+reducing the number of workers to see if that keeps us from running out of
+memory. If this doesn't work, we will have to bump the amount of memory in our
+VM.
+
+(The large number of workers is in any case not necessary when latency is low;
+see the sanity check below.)
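+
+A rough sanity check via Little's law (workers needed = rate * average visit
+latency): 4000 workers sustain 1450 visits/s as long as the average visit
+completes within ~2.8s, whereas 10000 workers allow for ~6.9s.
+
+    $ python3 -c "print(f'{4000 / 1450:.1f}')"
+    2.8
+    $ python3 -c "print(f'{10000 / 1450:.1f}')"
+    6.9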
+
+ $ git diff
+ diff --git a/scripts/measure.sh b/scripts/measure.sh
+ index a520c6d..3b2e54b 100755
+ --- a/scripts/measure.sh
+ +++ b/scripts/measure.sh
+ @@ -8,11 +8,11 @@
+ # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+ #
+
+ -relay_country=se
+ -relay_city=sto
+ +relay_country=de
+ +relay_city=fra
+ limit=1450
+ -num_workers=10000
+ -input_file=example.lst
+ +num_workers=4000
+ +input_file=vm-3-remaining-2.lst
+ timeout_s=30
+ response_max_mib=64
+ metrics_interval=1h
+ $ ./measure.sh 2>measure-remaining-2.stderr
+
+### 14
+
+Renaming and moving output in VM-1:
+
+ $ mv data/20230403-223517 au-mel
+ $ rmdir data
+ $ mv au-mel/au-mel-l1450.stderr au-mel/onion-grab.stderr
+ $ mv au-mel/au-mel-l1450.stdout au-mel/onion-grab.stdout
+ $ mv sysctl.txt au-mel/
+ $ mv ulimit.txt au-mel/
+ $ mv measure.stderr au-mel/
+ $ ls -l au-mel/
+ total 6992
+ -rw-rw-r-- 1 rasmoste rasmoste 800 Apr 3 22:36 measure.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 3749490 Apr 11 08:21 onion-grab.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 3346026 Apr 11 04:29 onion-grab.stdout
+ -rw-rw-r-- 1 rasmoste rasmoste 42500 Apr 3 22:11 sysctl.txt
+ -rw-rw-r-- 1 rasmoste rasmoste 823 Apr 3 22:11 ulimit.txt
+ $ mv au-mel ~/exp/onion-grab/data/2023-04-03-ct-sans/
+
+### 15
+
+Renaming and moving output in VM-2:
+
+ $ mv data/20230403-223519 us-nyc
+ $ rmdir data
+ $ mv us-nyc/us-nyc-l1450.stdout us-nyc/onion-grab.stdout
+ $ mv us-nyc/us-nyc-l1450.stderr us-nyc/onion-grab.stderr
+ $ mv sysctl.txt us-nyc/
+ $ mv ulimit.txt us-nyc/
+ $ mv measure.stderr us-nyc/
+ $ ls -l us-nyc
+ total 6784
+ -rw-rw-r-- 1 rasmoste rasmoste 800 Apr 3 22:35 measure.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 3553624 Apr 11 08:21 onion-grab.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 3326545 Apr 11 04:25 onion-grab.stdout
+ -rw-rw-r-- 1 rasmoste rasmoste 42531 Apr 3 22:12 sysctl.txt
+ -rw-rw-r-- 1 rasmoste rasmoste 823 Apr 3 22:11 ulimit.txt
+
+Zip and checksum before moving to VM-1:
+
+ $ zip -r us-nyc.zip us-nyc/
+ $ sha256sum us-nyc.zip
+ 8759b8e7192390cc8f125a795c55b55ad9ecadb27344ce88004998ca89b7c4be us-nyc.zip
+
+Transfer to VM-1, check that the checksum is OK, then unzip.
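+
+The transfer itself can be done with `scp` (a sketch, assuming SSH access to
+VM-1 under the hypothetical host alias `vm-1`):
+
+    $ scp us-nyc.zip vm-1:exp/onion-grab/data/2023-04-03-ct-sans/
+    $ ssh vm-1 'cd exp/onion-grab/data/2023-04-03-ct-sans && sha256sum us-nyc.zip && unzip us-nyc.zip'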
+
+### 16
+
+Renaming and moving output in VM-3:
+
+ $ mv data/20230403-223521 se-sto
+ $ mv se-sto/se-sto-l1450.stderr se-sto/onion-grab.stderr
+ $ mv se-sto/se-sto-l1450.stdout se-sto/onion-grab.stdout
+ $ cp ulimit.txt se-sto/
+ $ cp sysctl.txt se-sto/
+ $ mkdir se-sto/bw
+ $ cp ~/vnstat/"Tue Apr 4 11:59:01 PM UTC 2023" se-sto/bw
+ $ cp ~/vnstat/"Wed Apr 5 11:59:01 PM UTC 2023" se-sto/bw
+ $ ls -l se-sto
+ total 912
+ drwxrwxr-x 2 rasmoste rasmoste 4096 Apr 12 16:55 bw
+ -rw-rw-r-- 1 rasmoste rasmoste 801 Apr 3 22:35 measure.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 441711 Apr 5 06:36 onion-grab.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 424925 Apr 5 06:27 onion-grab.stdout
+ -rw-rw-r-- 1 rasmoste rasmoste 42529 Apr 12 16:54 sysctl.txt
+ -rw-rw-r-- 1 rasmoste rasmoste 823 Apr 12 16:54 ulimit.txt
+ $ zip -r se-sto.zip se-sto/
+ $ sha256sum se-sto.zip
+ 6fcd5640b1022828d19f3585b2a9c9488ce5c681a81a61c22b1bd4cbbe326b49 se-sto.zip
+
+Move to VM-1, check checksum and unzip.
+
+### 17
+
+VM-1:
+
+ $ mv ~/vnstat au-mel/bw
+
+Then stop the cronjob that creates bw output (`crontab -e`).
+
+VM-2:
+
+ $ mv ~/vnstat bw
+ $ zip -r bw.zip bw/
+ $ sha256sum bw.zip
+ c4753326fcdb4dd136af81c1359cfe37fe6756726c497f39d3c33f799fc975f3 bw.zip
+
+Transfer to VM-1, check checksum, unzip and put in us-nyc directory. Then stop
+the cronjob that creates bw output in VM-2 as well.
+
+### 18
+
+`onion-grab` hangs on shutdown, similar to VM-1 and VM-2 [9]. The final
+summary print shows processed until 82746708, which should be compared to the
+82746883 lines of vm-3-remaining-2.lst. I.e., 175 missing workers/answers.
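+
+The same arithmetic as before:
+
+    $ python3 -c "print(f'{82746883 - 82746708}')"
+    175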
+
+Same action as in [9], ctrl+C measurement script.
+
+### 19
+
+Renaming and moving in VM-3, first run:
+
+ $ mv data/20230405-070154 de-fra
+ $ mv de-fra/de-fra-l1450.stderr de-fra/onion-grab.stderr
+ $ mv de-fra/de-fra-l1450.stdout de-fra/onion-grab.stdout
+ $ mv measure.stderr de-fra/measure.stderr
+ $ mv ulimit.txt de-fra/
+ $ mv sysctl.txt de-fra/
+
+Second run:
+
+ $ mv data/20230411-223623/de-fra-l1450.stderr de-fra/onion-grab-2.stderr
+ $ mv data/20230411-223623/de-fra-l1450.stdout de-fra/onion-grab-2.stdout
+ $ rmdir data/20230411-223623
+ $ mv measure-remaining.stderr de-fra/measure-2.stderr
+
+Third run:
+
+ $ mv data/20230412-084228/de-fra-l1450.stderr de-fra/onion-grab-3.stderr
+ $ mv data/20230412-084228/de-fra-l1450.stdout de-fra/onion-grab-3.stdout
+ $ rmdir data/20230412-084228
+ $ mv measure-remaining-2.stderr de-fra/measure-3.stderr
+
+Grab bandwidths, excluding the output from April 4 since this measurement did
+not start until April 5:
+
+ $ rm ~/vnstat/"Tue Apr 4 11:59:01 PM UTC 2023"
+ $ vnstat -h >"/home/rasmoste/vnstat/$(date)"
+ $ mv ~/vnstat de-fra/bw
+
+Overview:
+
+ $ ls -l de-fra
+ total 6768
+ drwxrwxr-x 2 rasmoste rasmoste 4096 Apr 13 05:39 bw
+ -rw-rw-r-- 1 rasmoste rasmoste 1019 Apr 11 23:43 measure-2.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 810 Apr 12 08:42 measure-3.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 1009 Apr 11 20:25 measure.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 24004 Apr 11 23:43 onion-grab-2.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 23002 Apr 11 23:42 onion-grab-2.stdout
+ -rw-rw-r-- 1 rasmoste rasmoste 318627 Apr 13 05:38 onion-grab-3.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 312774 Apr 13 00:34 onion-grab-3.stdout
+ -rw-rw-r-- 1 rasmoste rasmoste 3117995 Apr 11 20:25 onion-grab.stderr
+ -rw-rw-r-- 1 rasmoste rasmoste 3034130 Apr 11 20:25 onion-grab.stdout
+ -rw-rw-r-- 1 rasmoste rasmoste 42529 Apr 3 22:12 sysctl.txt
+ -rw-rw-r-- 1 rasmoste rasmoste 823 Apr 3 22:11 ulimit.txt
+
+Then stop the cronjob that creates bw outputs (`crontab -e`).
+
+Zip, checksum, and transfer to VM-1:
+
+ $ zip -r de-fra.zip de-fra/
+ $ sha256sum de-fra.zip
+ 2ea1f053decea3915b29bc60c2f954da55ea48f6d8ab9f47112caddf3a2e2f7f de-fra.zip