From 8fb3d9985504e01a3abdd7dbe1d7c86b2110c7b0 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg <rasmus@rgdd.se>
Date: Thu, 13 Apr 2023 09:19:30 +0200
Subject: Add measurement setup and operations timeline

---
 docs/operations.md | 826 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 824 insertions(+), 2 deletions(-)

diff --git a/docs/operations.md b/docs/operations.md
index 1528c32..3f67a92 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -1,3 +1,825 @@
-# Operations
+# onion-grab dataset
 
-Placeholder.
+This document describes our `onion-grab` data collection, including information
+about the local systems and a timeline for our operations leading up to the
+results for [Tranco top-1m][] and [SANs in CT logs][] during April, 2023.
+
+[Tranco top-1m]: https://tranco-list.eu/
+[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+
+## Summary
+
+The time to conduct initial tests against [Tranco top-1m][] was ~1 day.  207
+unique two-label `.onion` domains were found from 285 Onion-Location sites.
+
+The time to conduct the full measurement for [SANs in CT logs][] was ~10 days.
+3330 unique two-label `.onion` domains were configured from 26937 unique sites.
+13956 of those unique sites have the same Onion-Location configuration as
+Twitter, which likely means that they copied some of their HTML attributes.
+
+The collected data sets are available here:
+
+  - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
+  - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
+
+For further information about system configurations and operations, read on.
+
+## Local systems
+
+We have three mostly identical Ubuntu VMs:
+
+    $ lsb_release -a
+    No LSB modules are available.
+    Distributor ID: Ubuntu
+    Description:    Ubuntu 22.04.2 LTS
+    Release:        22.04
+    Codename:       jammy
+
+VM-1 is configured with 62.9GiB RAM, one CPU core with 32 CPU threads, and a
+~2TiB SSD:
+
+    $ grep MemTotal /proc/meminfo
+    MemTotal:       65948412 kB
+    $ grep -c processor /proc/cpuinfo
+    32
+    $ grep 'cpu cores' /proc/cpuinfo | uniq
+    cpu cores       : 1
+    $ df -BG /home
+    Filesystem                        1G-blocks  Used Available Use% Mounted on
+    /dev/mapper/ubuntu--vg-ubuntu--lv     2077G  220G     1772G  12% /
+
+VM-2 and VM-3 are configured with 62.9GiB RAM, one CPU core with 16 CPU threads,
+and a ~60GiB SSD (each):
+
+    $ grep MemTotal /proc/meminfo
+    MemTotal:       65822508 kB
+    $ grep -c processor /proc/cpuinfo
+    16
+    $ grep 'cpu cores' /proc/cpuinfo | uniq
+    cpu cores       : 1
+    $ df -BG /home
+    Filesystem                        1G-blocks  Used Available Use% Mounted on
+    /dev/mapper/ubuntu--vg-ubuntu--lv       61G   11G       48G  18% /
+
+These VMs share a 1x10Gbps link with other network VMs that we have no control
+over.  We installed `vnstat` to track our bandwidth-usage over time:
+
+    # apt install vnstat
+    # systemctl enable vnstat.service
+    # systemctl start vnstat.service
+
+We also installed Go version 1.20, see [install instructions][]:
+
+    $ go version
+    go version go1.20.2 linux/amd64
+
+[install instructions]: https://go.dev/doc/install
+
+We stopped and disabled `systemd-resolved`, populating `/etc/resolv.conf` with
+
+    $ cat /etc/resolv.conf
+    nameserver 8.8.8.8
+    nameserver 8.8.4.4
+
+which gives us a setup that [supports 1500 DNS look-ups][] per second per VM.
+
+[supports 1500 DNS look-ups]: https://developers.google.com/speed/public-dns/docs/isp
+
+We set
+
+    $ ulimit -Sn 100000
+    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+
+before running `onion-grab`.  The complete outputs of these commands with `-a`
+are available in our dataset.  The versions of `onion-grab` are listed below.
+
+Finally, we [installed Mullvad VPN][] so that our `onion-grab` measurements can
+run from Melbourne (VM-1), New York (VM-2) and Stockholm (VM-3).  Remember to
+set the same DNS resolvers as above (`mullvad dns set custom 8.8.8.8 8.8.4.4`).
+
+In the full measurement, we had to replace Stockholm with Frankfurt (see notes).
+
+[installed Mullvad VPN]: https://mullvad.net/en/help/install-mullvad-app-linux/
+
+## Timeline
+
+| date       | time (UTC) | event                       | notes                                       |
+| ---------- | ---------- | --------------------------- | ------------------------------------------- |
+| 2023/04/02 | 23:26:27   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
+| 2023/04/03 | 12:47:43   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
+| 2023/04/03 | 17:20:00   | shuffle ct-sans dataset     | deterministic per-VM seed, 15m/shuffle [2]  |
+| 2023/04/03 | 18:18:47   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
+| 2023/04/03 | 20:03      | transfer shuffled dataset   | from VM-1 to VM-2 (1-3MB/s, painfully slow) |
+| 2023/04/03 | 20:03      | transfer shuffled dataset   | from VM-1 to VM-3 (1-3MB/s, painfully slow) |
+| 2023/04/03 | 22:36:06   | start onion-grab (au mel)   | checkout v0.0.2, set measure.sh params [3]  |
+| 2023/04/03 | 22:35:36   | start onion-grab (us ny)    | checkout v0.0.2, set measure.sh params [4]  |
+| 2023/04/03 | 22:35:38   | start onion-grab (se sto)   | checkout v0.0.2, set measure.sh params [5]  |
+| 2023/04/04 | 15:30      | se sto relay bw drop        | store vnstat -h stats w/ daily cron job [6] |
+| 2023/04/05 | 06:30      | kill onion-grab (se sto)    | all Stockholm relays are very slow [7]      |
+| 2023/04/05 | 07:02:13   | start onion-grab (de fra)   | all Swedish relays are very slow [8]        |
+| 2023/04/11 | 04:26:26   | us nyc completed            | minor exit bug [9]                          |
+| 2023/04/11 | 04:30:28   | au mel completed            | minor exit bug [9]                          |
+| 2023/04/11 | 20:25:50   | de fra stopped              | ran out of memory for unknown reason [10]   |
+| 2023/04/11 | 22:36:25   | de fra started again        | use start line we know is processed [10,11] |
+| 2023/04/11 | 23:43:19   | de fra stopped              | ran out of memory for unknown reason [12]   |
+| 2023/04/12 | 08:42:30   | de fra started again        | use start line we know is processed [12,13] |
+| 2023/04/12 | 11:50      | prepare dataset (au mel)    | only moving files on VM-1 [14]              |
+| 2023/04/12 | 14:00      | prepare dataset (us nyc)    | moving files on VM-2, transfer to VM-1 [15] |
+| 2023/04/12 | 16:50      | prepare dataset (se sto)    | moving files on VM-3, transfer to VM-1 [16] |
+| 2023/04/12 | 17:00      | save bandwidths at VM-{1,2} | forgot to move them earlier [17]            |
+| 2023/04/13 | 00:35:38   | de fra completed            | minor exit bug [18]                         |
+| 2023/04/13 | 05:40      | prepare dataset (de fra)    | moving files on VM-3, transfer to VM-1 [19] |
+| 2023/04/13 | 05:50      | experiment is completed     | datasets are ready, zipped, and documented  |
+
+## Notes
+
+### 1
+
+We downloaded [Tranco top-1m][], permalink [Z2XKG][] (2023-04-03):
+
+    $ sha256sum tranco_Z2XKG-1m.csv.zip
+    3e078a84e9aae7dbaf1207aac000038f1e51e20e8ccc35563da8b175d38a39dd  tranco_Z2XKG-1m.csv.zip 
+    $ unzip tranco_Z2XKG-1m.csv.zip
+    $ cut -d',' -f2 top-1m.csv > top-1m.lst
+
+[Z2XKG]: https://tranco-list.eu/list/Z2XKG/1000000
+
+This gives us a list of 1M domains to perform test-runs on.  The idea:
+
+  1. Make visits at a wanted rate (1450/s, below the 1500 DNS lookup limit)
+  2. Make visits at several slower rates (100/s, ..., 1400/s)
+  3. Repeat this from three locations (Stockholm, New York, Melbourne)
+  4. Hypothesis: observe that the same number of Onion-Location setups are
+     discovered when running at the most rapid rate from three locations when
+     compared to a lower rate at the same three locations; and that the error
+     rates are roughly the same regardless of if we use a lower or higher rate.
+
+We used `onion-grab`'s `scripts/test.sh` to perform the above experiment from
+VM-1.  The link for downloading the data is listed above in the summary.  You
+should see 3 subdirectories with results from 28 different measurements.
+
+Let's look at the results in more detail: the error rates that are printed in
+the `stderr.txt` files, as well as the parsed output using `scripts/digest.py`.
+
+#### Scan: Stockholm with limit 1450/s
+
+    $ digest.py -i 20230402-232627/se17-wireguard-l1450.txt 2>&1 |
+    tail -n6 | head -n4
+    digest.py:25 INFO: found 245 HTTP headers with Onion-Location
+    digest.py:26 INFO: found 42 HTML meta attributes with Onion-Location
+    digest.py:27 INFO: found 283 unqiue domain names that set Onion-Location
+    digest.py:28 INFO: found 205 unique two-label onion addresses in the process
+
+#### Scan: Stockholm, New York, Melbourne with limit 1450/s (combined)
+
+    $ digest.py -i 20230402-232627/*l1450.txt 2>&1 | tail -n4 | head -n2
+    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
+    digest.py:28 INFO: found 207 unique two-label onion addresses in the process
+
+Note that we found more Onion-Location setups here with the combined scan.
+
+#### Scan: Stockholm, New York, Melbourne with limits 100, 500, 1450 (combined)
+
+    $ cat 20230402-232627/stderr.txt | tail -n5 | head -n2
+    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
+    digest.py:28 INFO: found 207 unique two-label onion addresses in the process
+
+Note that we did not find more Onion-Location setups now with 9x measurements.
+This observation holds true if `scripts/digest.py` is run with all 28 outputs:
+
+    $ ./scripts/digest.py -i\
+              20230402-232627/*-*-*\
+              20230403-124743/*-*-*\
+              20230403-181847/*-*-* 2>&1 | tail -n4 | head -n2
+    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
+    digest.py:28 INFO: found 207 unique two-label onion addresses in the process
+
+#### Error rates
+
+Below is some pretty-printed output from the error rates shown in the
+respective `stderr.txt` files, ordered by the relay and limit that we set.  The
+maximum number of connects is 1M; all columns after that provide info about
+failed connection attempts.  E.g., the first row has 82814 DNS lookup errors.
+
+| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
+| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
+| us18-wireguard |  100    |  100.0 | 287    | 711816    | 82814 (72767  843  9204)     | 51543 (21279 30264)   | 87147 (77235  9912) | 2042 | 5449 | 58932 | 257  |
+| us18-wireguard |  500    |  500.3 | 285    | 711373    | 83333 (72811 1304  9218)     | 54058 (24064 29994)   | 86728 (76803  9925) | 2160 | 5414 | 56689 | 245  |
+| us18-wireguard | 1000    | 1001.0 | 286    | 711081    | 82882 (72804  852  9226)     | 54763 (24599 30164)   | 86840 (77011  9829) | 1760 | 5086 | 57333 | 255  |
+| us18-wireguard | 1200    | 1201.5 | 286    | 711741    | 82841 (72800  855  9186)     | 53041 (22654 30387)   | 86885 (77111  9774) | 1803 | 4955 | 58485 | 249  |
+| us18-wireguard | 1400    | 1402.1 | 287    | 710481    | 82894 (72805 1468  8621)     | 59711 (29489 30222)   | 86597 (76897  9700) | 1638 | 4975 | 53450 | 254  |
+| us18-wireguard | 1450    | 1452.2 | 287    | 708649    | 82866 (72820 1272  8774)     | 60294 (30460 29834)   | 86506 (76602  9904) | 1887 | 5233 | 54298 | 267  |
+
+| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
+| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
+| au-syd-wg-002  |  100    |  100.0 | 285    | 723854    | 83319 (72800 1317  9202)     | 48693 (14767 33926)   | 91658 (81324 10334) | 1810 | 5235 | 45149 | 282  |
+| au-syd-wg-002  |  500    |  500.3 | 285    | 723410    | 83119 (72791 1119  9209)     | 51229 (16767 34462)   | 91585 (81208 10377) | 1830 | 4680 | 43876 | 271  |
+| au-syd-wg-002  | 1000    | 1001.0 | 285    | 724144    | 83052 (72771 1075  9206)     | 50697 (16591 34106)   | 91678 (81442 10236) | 1491 | 4922 | 43733 | 283  |
+| au-syd-wg-002  | 1200    | 1192.3 | 286    | 723169    | 83090 (72820 1122  9148)     | 51408 (16685 34723)   | 91571 (81354 10217) | 1413 | 5024 | 44052 | 273  |
+| au-syd-wg-002  | 1400    | 1391.8 | 286    | 721119    | 83305 (72796 1906  8603)     | 55236 (21640 33596)   | 91339 (81197 10142) |  842 | 5752 | 42124 | 283  |
+| au-syd-wg-002  | 1450    | 1431.3 | 285    | 720439    | 83182 (72793 1498  8891)     | 56817 (23193 33624)   | 91376 (81049 10327) | 1100 | 5486 | 41334 | 266  |
+
+| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
+| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
+| se17-wireguard |  100    |  100.0 | 286    | 724643    | 83146 (72400  954  9792)     | 48497 (14711 33786)   | 92230 (81881 10349) | 2081 | 5815 | 43325 | 263  |
+| se17-wireguard |  500    |  500.3 | 288    | 723176    | 84208 (72453 1367 10388)     | 48685 (15239 33446)   | 91664 (81341 10323) | 2073 | 5513 | 44416 | 265  |
+| se17-wireguard | 1000    | 1001.0 | 289    | 723834    | 83156 (72427  962  9767)     | 49559 (16347 33212)   | 91847 (81572 10275) | 1852 | 5638 | 43856 | 258  |
+| se17-wireguard | 1200    | 1201.5 | 289    | 724093    | 83078 (72450  905  9723)     | 48780 (15597 33183)   | 91868 (81656 10212) | 1823 | 5708 | 44389 | 261  |
+| se17-wireguard | 1200    | 1201.5 | 289    | 723788    | 83081 (72397  950  9734)     | 49070 (15848 33222)   | 91745 (81595 10150) | 1790 | 5670 | 44589 | 267  |
+| se17-wireguard | 1201    | 1202.5 | 288    | 723642    | 83063 (72413  909  9741)     | 48923 (15769 33154)   | 92120 (81575 10545) | 1823 | 5322 | 44839 | 268  |
+| se17-wireguard | 1202    | 1202.1 | 290    | 723846    | 83055 (72452  912  9691)     | 48999 (15916 33083)   | 91860 (81519 10341) | 1813 | 5497 | 44669 | 261  |
+| se17-wireguard | 1203    | 1204.5 | 289    | 723772    | 83051 (72479  882  9690)     | 48926 (15775 33151)   | 91945 (81630 10315) | 1825 | 5502 | 44716 | 263  |
+| se17-wireguard | 1204    | 1205.5 | 290    | 723816    | 83109 (72462  902  9745)     | 49256 (16161 33095)   | 92015 (81551 10464) | 1762 | 5364 | 44420 | 258  |
+| se17-wireguard | 1400    | 1402.1 | 288    | 721902    | 83808 (72426 1341 10041)     | 51820 (18732 33088)   | 91409 (81308 10101) | 1727 | 5725 | 43345 | 264  |
+| se17-wireguard | 1446    | 1448.2 | 290    | 720637    | 83037 (72463  924  9650)     | 49421 (16422 32999)   | 91416 (81132 10284) | 1801 | 5517 | 47903 | 268  |
+| se17-wireguard | 1447    | 1449.2 | 286    | 720927    | 83038 (72480  930  9628)     | 49361 (16463 32898)   | 91630 (81243 10387) | 1807 | 5399 | 47580 | 258  |
+| se17-wireguard | 1448    | 1450.2 | 288    | 720841    | 83016 (72492  933  9591)     | 49251 (16209 33042)   | 91636 (81236 10400) | 1803 | 5410 | 47783 | 260  |
+| se17-wireguard | 1449    | 1449.4 | 288    | 720456    | 83065 (72459  922  9684)     | 49513 (16554 32959)   | 91479 (81171 10308) | 1786 | 5459 | 47981 | 261  |
+| se17-wireguard | 1450    | 1450.3 | 288    | 720684    | 83036 (72476  915  9645)     | 49348 (16266 33082)   | 91608 (81238 10370) | 1734 | 5404 | 47932 | 254  |
+| se17-wireguard | 1450    | 1450.0 | 287    | 719193    | 83193 (72428 1319  9446)     | 53567 (20562 33005)   | 91390 (81135 10255) | 1956 | 5775 | 44641 | 285  |
+
+From the looks of it, the number of successful connections decreases somewhat as
+we approach the 1450/s limit.  The difference in connects between the most
+successful and the least successful run is, per location:
+
+  - Melbourne: 3705
+  - New York: 3167
+  - Stockholm: 5450
+
+These differences are mostly due to more TCP timeouts and context deadlines.
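+
+As a rough cross-check, the per-location differences can be recomputed from
+the `connected` column of the three tables.  The sketch below is illustrative
+only: the numbers are copied by hand from the tables, and the helper name
+`spread` is our own and not part of `onion-grab`.
+
```python
# Connected counts copied from the three tables above, one list per location.
connected = {
    "New York": [711816, 711373, 711081, 711741, 710481, 708649],
    "Melbourne": [723854, 723410, 724144, 723169, 721119, 720439],
    "Stockholm": [
        724643, 723176, 723834, 724093, 723788, 723642, 723846, 723772,
        723816, 721902, 720637, 720927, 720841, 720456, 720684, 719193,
    ],
}


def spread(counts):
    """Difference between the most and the least successful run."""
    return max(counts) - min(counts)


spreads = {location: spread(counts) for location, counts in connected.items()}
for location, diff in sorted(spreads.items()):
    # Each run visits 1M sites, so diff / 1e6 is the share of sites affected.
    print(f"{location}: {diff} ({diff / 1_000_000:.2%} of 1M sites)")
```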
+
+#### What does this mean
+
+Running from three different locations at limit 1450/s finds the same number of
+Onion-Location setups as all 28 measurements combined.  That's what we wanted.
+
+Connect errors (mainly TCP timeouts and context deadline errors) increase
+slightly as we use the higher limits.  This is not what we wanted.  However, the
+increase in connect errors per 1M sites is only 0.3-0.5%.  These errors are
+transient, and should mostly be accounted for by having 3x tries per domain.
+
+(Each scan is running with a shuffled list, similar to our full measurement.)
+
+**Conclusion:** scanning from three different locations at limit 1450/s strikes
+a good balance between found Onion-Locations, errors, and timeliness of results.
+
+### 2
+
+The [ct-sans dataset][] that we will `onion-grab` in the full measurement was
+collected and assembled on 2023-04-03.  It contains 0.91B unique SANs.
+
+[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+
+To avoid biases like encountering the same errors at all VMs due to the order in
+which the sites were visited, the dataset is shuffled separately for each VM.
+
+We did all shuffling on VM-1 because it has the most disk available.
+
+Prepare shuffled dataset for VM-1:
+
+    $ seed="2023-04-03-vm-1"
+    $ time shuf\
+          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
+	  -o vm-1.lst 2023-04-03-ct-sans/sans.lst
+    
+    real    13m40.637s
+    user    10m30.368s
+    sys     2m28.062s
+    $ time sha256sum vm-1.lst
+    4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b  vm-1.lst
+
+    real    2m51.630s
+    user    2m33.246s
+    sys     0m11.460s
+
+Prepare shuffled dataset for VM-2:
+
+    $ seed="2023-04-03-vm-2"
+    $ time shuf\
+          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
+	  -o vm-2.lst 2023-04-03-ct-sans/sans.lst
+    
+    real    14m35.500s
+    user    11m31.577s
+    sys     2m31.447s
+    $ time sha256sum vm-2.lst
+    46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff  vm-2.lst
+
+    real    3m7.084s
+    user    2m36.416s
+    sys     0m19.012s
+    
+Prepare shuffled dataset for VM-3:
+
+    $ seed="2023-04-03-vm-3"
+    $ time shuf\
+          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
+	  -o vm-3.lst 2023-04-03-ct-sans/sans.lst
+    
+    real    14m37.878s
+    user    11m37.963s
+    sys     2m20.373s
+    $ time sha256sum vm-3.lst
+    c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6  vm-3.lst
+
+    real    3m6.324s
+    user    2m36.804s
+    sys     0m17.056s
+
+Double-check that we have the expected number of lines:
+
+    $ time wc -l vm-?.lst 2023-04-03-ct-sans/sans.lst
+       907332515 vm-1.lst
+       907332515 vm-2.lst
+       907332515 vm-3.lst
+       907332515 2023-04-03-ct-sans/sans.lst
+      3629330060 total
+    
+    real    7m54.915s
+    user    0m59.213s
+    sys     1m25.353s
+
+**Note:** `shuf` is memory-hungry and needs ~2x the size of the input file.  So,
+anything less than ~60GiB memory will be insufficient for a 25GiB dataset.
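+
+The idea of a deterministic per-VM seed can be sketched in Python as follows.
+This is an illustration only: it uses Python's `random.Random` rather than
+`shuf` with an AES-CTR stream, so it will not reproduce the checksums above.
+The helper name `shuffled_for` and the example domain names are hypothetical.
+
```python
import random


def shuffled_for(vm_name, lines, date="2023-04-03"):
    """Return a copy of lines, shuffled with a seed derived from the VM name."""
    seed = f"{date}-{vm_name}"  # same seed format as in the commands above
    rng = random.Random(seed)   # a string seed gives a reproducible sequence
    out = list(lines)
    rng.shuffle(out)
    return out


# Hypothetical stand-in for sans.lst; the real list has ~0.91B entries.
sans = [f"site-{i}.example" for i in range(1000)]
vm1 = shuffled_for("vm-1", sans)
vm2 = shuffled_for("vm-2", sans)
```
+
+Re-running with the same seed gives the same order, and each VM visits the same
+set of sites in a different order.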
+
+### 3
+
+    $ ulimit -Sn 100000
+    $ ulimit -a >ulimit.txt
+    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+    # sysctl -a >sysctl.txt
+    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
+    $ git log | head -n1
+    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
+    $ cd scripts
+    $ sha256sum vm-1.lst
+    4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b  vm-1.lst
+    $ git diff
+    diff --git a/scripts/measure.sh b/scripts/measure.sh
+    index a520c6d..269b5ad 100755
+    --- a/scripts/measure.sh
+    +++ b/scripts/measure.sh
+    @@ -8,11 +8,11 @@
+     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+     #
+    
+    -relay_country=se
+    -relay_city=sto
+    +relay_country=au
+    +relay_city=mel
+     limit=1450
+     num_workers=10000
+    -input_file=example.lst
+    +input_file=vm-1.lst
+     timeout_s=30
+     response_max_mib=64
+     metrics_interval=1h
+
+So, we selected Melbourne relays.
+
+    $ ./measure.sh 2>measure.stderr
+
+### 4
+
+    $ ulimit -Sn 100000
+    $ ulimit -a >ulimit.txt
+    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+    # sysctl -a >sysctl.txt
+    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
+    $ git log | head -n1
+    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
+    $ cd scripts
+    $ sha256sum vm-2.lst
+    46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff  vm-2.lst
+    $ git diff
+    diff --git a/scripts/measure.sh b/scripts/measure.sh
+    index a520c6d..31b2f9e 100755
+    --- a/scripts/measure.sh
+    +++ b/scripts/measure.sh
+    @@ -8,11 +8,11 @@
+     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+     #
+    
+    -relay_country=se
+    -relay_city=sto
+    +relay_country=us
+    +relay_city=nyc
+     limit=1450
+     num_workers=10000
+    -input_file=example.lst
+    +input_file=vm-2.lst
+     timeout_s=30
+     response_max_mib=64
+     metrics_interval=1h
+
+So, we selected New York relays.
+
+    $ ./measure.sh 2>measure.stderr
+
+### 5
+
+    $ ulimit -Sn 100000
+    $ ulimit -a >ulimit.txt
+    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
+    # sysctl -a >sysctl.txt
+    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
+    $ git log | head -n1
+    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
+    $ cd scripts
+    $ sha256sum vm-3.lst
+    c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6  vm-3.lst
+    $ git diff
+    diff --git a/scripts/measure.sh b/scripts/measure.sh
+    index a520c6d..4cc0913 100755
+    --- a/scripts/measure.sh
+    +++ b/scripts/measure.sh
+    @@ -12,7 +12,7 @@ relay_country=se
+     relay_city=sto
+     limit=1450
+     num_workers=10000
+    -input_file=example.lst
+    +input_file=vm-3.lst
+     timeout_s=30
+     response_max_mib=64
+     metrics_interval=1h
+
+So, we selected Stockholm relays (default).
+
+    $ ./measure.sh 2>measure.stderr
+
+### 6
+
+We noticed that the Stockholm relays are "slow".  Bandwidth appears to have
+dropped to 1/10 of the initial part of the measurement.  It is unclear if there
+are more errors yet, and if this will sort itself out.  Adding a cron job that
+prints hourly bandwidth stats every day at 23:59 to store more fine-grained data:
+
+    $ mkdir /home/rasmoste/vnstat
+    $ crontab -e
+
+And add at the end of the file:
+
+    59 23 * * * vnstat -h >"/home/rasmoste/vnstat/$(date)"
+
+(Added this on all three VMs.)
+
+### 7
+
+(In VM-3.)
+
+Bandwidth stats:
+
+    $ cat Tue\ Apr\ \ 4\ 11\:59\:01\ PM\ UTC\ 2023
+    
+     ens160  /  hourly
+    
+             hour        rx      |     tx      |    total    |   avg. rate
+         ------------------------+-------------+-------------+---------------
+         2023-04-04
+             00:00     82.61 GiB |   12.78 GiB |   95.39 GiB |  227.61 Mbit/s
+             01:00     80.93 GiB |   12.70 GiB |   93.63 GiB |  223.41 Mbit/s
+             02:00     80.90 GiB |   12.68 GiB |   93.58 GiB |  223.30 Mbit/s
+             03:00     81.13 GiB |   12.63 GiB |   93.77 GiB |  223.74 Mbit/s
+             04:00     88.59 GiB |   12.97 GiB |  101.57 GiB |  242.35 Mbit/s
+             05:00     85.10 GiB |   12.93 GiB |   98.04 GiB |  233.92 Mbit/s
+             06:00     82.97 GiB |   12.84 GiB |   95.81 GiB |  228.61 Mbit/s
+             07:00     79.05 GiB |   12.62 GiB |   91.67 GiB |  218.72 Mbit/s
+             08:00     87.83 GiB |   12.81 GiB |  100.64 GiB |  240.13 Mbit/s
+             09:00     81.22 GiB |   12.62 GiB |   93.84 GiB |  223.91 Mbit/s
+             10:00     79.26 GiB |   12.57 GiB |   91.83 GiB |  219.12 Mbit/s
+             11:00     81.70 GiB |   12.67 GiB |   94.37 GiB |  225.17 Mbit/s
+             12:00     97.83 GiB |   13.21 GiB |  111.04 GiB |  264.94 Mbit/s
+             13:00     82.47 GiB |   12.59 GiB |   95.06 GiB |  226.83 Mbit/s
+             14:00     78.42 GiB |   11.46 GiB |   89.88 GiB |  214.45 Mbit/s
+             15:00     27.42 GiB |    5.95 GiB |   33.37 GiB |   79.62 Mbit/s
+             16:00     23.30 GiB |    5.37 GiB |   28.67 GiB |   68.42 Mbit/s
+             17:00     28.12 GiB |    6.03 GiB |   34.15 GiB |   81.48 Mbit/s
+             18:00     48.01 GiB |    8.76 GiB |   56.77 GiB |  135.46 Mbit/s
+             19:00     40.23 GiB |    7.73 GiB |   47.97 GiB |  114.46 Mbit/s
+             20:00     55.55 GiB |    9.63 GiB |   65.18 GiB |  155.52 Mbit/s
+             21:00     35.10 GiB |    7.06 GiB |   42.16 GiB |  100.60 Mbit/s
+             22:00     20.94 GiB |    5.00 GiB |   25.94 GiB |   61.91 Mbit/s
+             23:00     21.19 GiB |    4.95 GiB |   26.14 GiB |   68.03 Mbit/s
+         ------------------------+-------------+-------------+---------------
+
+We were hoping that this was a transient error, but all relays in Stockholm
+appear to underperform.  The rate has dropped as a result, and the number of
+successes as well.  See separate data and log files in our dataset (`se-sto/`).
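+
+To read the table, it helps to relate the hourly totals to the average rate
+column.  The sketch below assumes that GiB means 2^30 bytes and Mbit means 10^6
+bits; under that assumption the result agrees with the rows above to within
+rounding.  The function name `avg_rate_mbit_s` is our own.
+
```python
def avg_rate_mbit_s(total_gib, seconds=3600):
    """Average rate in Mbit/s for total_gib GiB transferred in `seconds`."""
    bits = total_gib * 2**30 * 8
    return bits / seconds / 1e6


before = avg_rate_mbit_s(95.39)  # total from the 00:00 row
after = avg_rate_mbit_s(25.94)   # total from the 22:00 row
print(f"{before:.2f} Mbit/s -> {after:.2f} Mbit/s ({after / before:.0%})")
```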
+
+It will be faster, and give more accurate results, to start from a new location.
+
+Kill: `pidof onion-grab`, `kill <PID>`.
+
+Move `measure.stderr` to the data dir to not overwrite it when we restart.
+
+### 8
+
+(In VM-3.)
+
+We experienced the same "slowness" with both Gothenburg and Malmo relays.  When
+moving our measurement to Frankfurt, good bandwidth is observed again.
+
+    diff --git a/scripts/measure.sh b/scripts/measure.sh
+    index a520c6d..d46f9c1 100755
+    --- a/scripts/measure.sh
+    +++ b/scripts/measure.sh
+    @@ -8,11 +8,11 @@
+     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+     #
+    
+    -relay_country=se
+    -relay_city=sto
+    +relay_country=de
+    +relay_city=fra
+     limit=1450
+     num_workers=10000
+    -input_file=example.lst
+    +input_file=vm-3.lst
+     timeout_s=30
+     response_max_mib=64
+     metrics_interval=1h
+
+So, we selected Frankfurt relays.
+
+Without any other restarts in the same tmux pane as before:
+
+    $ ./measure.sh 2>measure.stderr
+
+### 9
+
+The summary prints (which means that the Go receiver routine waited for an
+answer for at least one timeout and then shut down) are shown in `onion-grab`'s
+stderr output.  However, `onion-grab` hangs after that, so the `measure.sh`
+script doesn't exit.
+
+  - VM-1 (au mel) processed up until: 907330676
+  - VM-2 (us nyc) processed up until: 907330662
+
+To be compared with the number of entries in the ct-sans dataset: 907332515.
+
+    $ python3 -c "print(f'{907332515 - 907330676}')"
+    1839
+    $ python3 -c "print(f'{907332515 - 907330662}')"
+    1853
+
+So, it appears that we have ~1800 workers that were unable to provide their
+final answers (most likely timeouts) before the receiver routine shut down.
+This explains why `onion-grab` hangs, i.e., there are still workers waiting to
+send their answers to the receiver, which is not reading answers anymore.
+
+In addition to the outstanding answers most likely being timeouts, it is not the
+same ~1800 answers on all machines since the dataset was shuffled for all VMs.
+
+**Action:** ctrl+C the measurement script that is waiting for `onion-grab` to
+complete; we already have the `onion-grab` output that we want stored to disk.
+
+### 10
+
+The latest `onion-grab` stderr print was at 2023/04/11 20:25:50; it then died
+due to too little memory.  The latest progress print was:
+
+    2023/04/11 20:02:33 INFO: metrics@receiver:
+    
+      Processed: 819368251
+
+So, we can safely continue without missing any sites with Onion-Location
+configured by starting a new measurement from line ~819368251.
+
+    $ python3 -c "print(f'{907332515 - 819368251}')"
+    87964264
+    $ tail -n87964264 vm-3.lst > vm-3-remaining.lst
+    $ wc -l vm-3-remaining.lst
+    87964264 vm-3-remaining.lst
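+
+The resume step above amounts to skipping the lines that are known to be
+processed.  A minimal Python sketch of the same idea follows; the helper name
+`remaining_lines` and the example domain names are hypothetical, and the real
+measurement used `tail` as shown.
+
```python
from itertools import islice


def remaining_lines(lines, processed):
    """Skip the first `processed` entries and yield the rest, like tail above."""
    return islice(lines, processed, None)


total, processed = 907332515, 819368251
print(total - processed)  # number of lines left to visit

# Tiny stand-in for vm-3.lst to show which entries are kept.
demo = [f"site-{i}.example" for i in range(10)]
rest = list(remaining_lines(demo, 4))
```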
+
+### 11
+
+Restart `onion-grab` from VM-3 with the final domain names to visit.
+
+    $ git diff
+    diff --git a/scripts/measure.sh b/scripts/measure.sh
+    index a520c6d..6d77c66 100755
+    --- a/scripts/measure.sh
+    +++ b/scripts/measure.sh
+    @@ -8,11 +8,11 @@
+     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+     #
+    
+    -relay_country=se
+    -relay_city=sto
+    +relay_country=de
+    +relay_city=fra
+     limit=1450
+     num_workers=10000
+    -input_file=example.lst
+    +input_file=vm-3-remaining.lst
+     timeout_s=30
+     response_max_mib=64
+     metrics_interval=1h
+    $ ./measure.sh 2>measure-remaining.stderr
+
+(`onion-grab` results are written to a separate directory that is timestamped,
+so there is no risk that the above command will overwrite any collected data.)
+
+### 12
+
+The latest `onion-grab` stderr print was at 2023/04/11 23:43:19; it then died
+due to too little memory.  The latest progress print was:
+
+    2023/04/11 23:36:31 INFO: metrics@receiver:
+    
+      Processed: 5217381
+
+So, we can safely continue without missing any sites with Onion-Location
+configured by starting a new measurement from line ~5217381.
+
+    $ python3 -c "print(f'{87964264 - 5217381}')"
+    82746883
+    $ tail -n82746883 vm-3-remaining.lst > vm-3-remaining-2.lst
+    $ wc -l vm-3-remaining-2.lst
+    82746883 vm-3-remaining-2.lst
+
+### 13
+
+Restart `onion-grab` on VM-3 with the final domain names to visit.  This time
+we reduce the number of workers to see if that keeps us from running out of
+memory.  If this doesn't work, we will have to bump the amount of memory in our
+VM.
+
+(The large number of workers is not necessary anyway with low latency.)
+
+    $ git diff
+    diff --git a/scripts/measure.sh b/scripts/measure.sh
+    index a520c6d..3b2e54b 100755
+    --- a/scripts/measure.sh
+    +++ b/scripts/measure.sh
+    @@ -8,11 +8,11 @@
+     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
+     #
+    
+    -relay_country=se
+    -relay_city=sto
+    +relay_country=de
+    +relay_city=fra
+     limit=1450
+    -num_workers=10000
+    -input_file=example.lst
+    +num_workers=4000
+    +input_file=vm-3-remaining-2.lst
+     timeout_s=30
+     response_max_mib=64
+     metrics_interval=1h
+    $ ./measure.sh 2>measure-remaining-2.stderr
+
+### 14
+
+Renaming and moving output in VM-1:
+
+    $ mv data/20230403-223517 au-mel
+    $ rmdir data 
+    $ mv au-mel/au-mel-l1450.stderr au-mel/onion-grab.stderr
+    $ mv au-mel/au-mel-l1450.stdout au-mel/onion-grab.stdout
+    $ mv sysctl.txt au-mel/
+    $ mv ulimit.txt au-mel/
+    $ mv measure.stderr au-mel/
+    $ ls -l au-mel/
+    total 6992
+    -rw-rw-r-- 1 rasmoste rasmoste     800 Apr  3 22:36 measure.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 3749490 Apr 11 08:21 onion-grab.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 3346026 Apr 11 04:29 onion-grab.stdout
+    -rw-rw-r-- 1 rasmoste rasmoste   42500 Apr  3 22:11 sysctl.txt
+    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt
+    $ mv au-mel ~/exp/onion-grab/data/2023-04-03-ct-sans/
+
+### 15
+
+Renaming and moving output in VM-2:
+
+    $ mv data/20230403-223519 us-nyc
+    $ rmdir data
+    $ mv us-nyc/us-nyc-l1450.stdout us-nyc/onion-grab.stdout
+    $ mv us-nyc/us-nyc-l1450.stderr us-nyc/onion-grab.stderr
+    $ mv sysctl.txt us-nyc/
+    $ mv ulimit.txt us-nyc/
+    $ mv measure.stderr us-nyc/
+    $ ls -l us-nyc
+    total 6784
+    -rw-rw-r-- 1 rasmoste rasmoste     800 Apr  3 22:35 measure.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 3553624 Apr 11 08:21 onion-grab.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 3326545 Apr 11 04:25 onion-grab.stdout
+    -rw-rw-r-- 1 rasmoste rasmoste   42531 Apr  3 22:12 sysctl.txt
+    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt
+
+Zip and checksum before moving to VM-1:
+
+    $ zip -r us-nyc.zip us-nyc/
+    $ sha256sum us-nyc.zip
+    8759b8e7192390cc8f125a795c55b55ad9ecadb27344ce88004998ca89b7c4be  us-nyc.zip
+
+Transfer to VM-1, check that the checksum is OK, then unzip.
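+
+One way to do this kind of check on the receiving side is `sha256sum -c`, which
+compares a file against a recorded digest.  The sketch below is self-contained
+and uses a throwaway stand-in file (`demo.zip`, a hypothetical name) in a
+temporary directory rather than the real archive:
+
+```shell
+# Create a stand-in archive in a scratch directory (not the real us-nyc.zip).
+tmp=$(mktemp -d)
+printf 'example payload\n' > "$tmp/demo.zip"
+
+# Sending side: record the digest next to the archive.
+(cd "$tmp" && sha256sum demo.zip > demo.zip.sha256)
+
+# Receiving side: sha256sum -c exits non-zero if the digest does not match.
+if (cd "$tmp" && sha256sum -c demo.zip.sha256); then
+    verified=yes
+else
+    verified=no
+fi
+echo "verified: $verified"
+rm -r "$tmp"
+```
+
+Relying on the exit status rather than on the printed text keeps the check
+usable in scripts.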
+
+### 16
+
+Renaming and moving in VM-3:
+
+    $ mv data/20230403-223521 se-sto
+    $ mv se-sto/se-sto-l1450.stderr se-sto/onion-grab.stderr
+    $ mv se-sto/se-sto-l1450.stdout se-sto/onion-grab.stdout
+    $ cp ulimit.txt se-sto/
+    $ cp sysctl.txt se-sto/
+    $ mkdir se-sto/bw
+    $ cp ~/vnstat/"Tue Apr  4 11:59:01 PM UTC 2023" se-sto/bw
+    $ cp ~/vnstat/"Wed Apr  5 11:59:01 PM UTC 2023" se-sto/bw
+    $ ls -l se-sto
+    total 912
+    drwxrwxr-x 2 rasmoste rasmoste   4096 Apr 12 16:55 bw
+    -rw-rw-r-- 1 rasmoste rasmoste    801 Apr  3 22:35 measure.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 441711 Apr  5 06:36 onion-grab.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 424925 Apr  5 06:27 onion-grab.stdout
+    -rw-rw-r-- 1 rasmoste rasmoste  42529 Apr 12 16:54 sysctl.txt
+    -rw-rw-r-- 1 rasmoste rasmoste    823 Apr 12 16:54 ulimit.txt
+    $ zip -r se-sto.zip se-sto/
+    $ sha256sum se-sto.zip
+    6fcd5640b1022828d19f3585b2a9c9488ce5c681a81a61c22b1bd4cbbe326b49  se-sto.zip
+
+Move to VM-1, check the checksum, and unzip.
+
+### 17
+
+VM-1:
+
+    $ mv ~/vnstat au-mel/bw
+
+Then stop the cronjob that creates bw output (`crontab -e`).
+
+VM-2:
+
+    $ mv ~/vnstat bw
+    $ zip -r bw.zip bw/
+    $ sha256sum bw.zip
+    c4753326fcdb4dd136af81c1359cfe37fe6756726c497f39d3c33f799fc975f3  bw.zip
+
+Transfer to VM-1, check the checksum, unzip, and put it in the us-nyc
+directory.  Then stop the cronjob that creates bw output in VM-2 as well.
+
+### 18
+
+`onion-grab` hangs on shutdown, similar to VM-1 and VM-2 [9].  The final
+summary print shows that processing reached 82746708, which should be compared
+to the size of 82746883 (vm-3-remaining-2.lst).  I.e., 175 workers/answers are
+missing.
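+
+The missing count is the difference between the two numbers above.  As a quick
+sanity check (the variable names are only illustrative):
+
+```shell
+# Values copied from the text above.
+list_size=82746883       # size given for vm-3-remaining-2.lst
+last_processed=82746708  # "processed until" value in the final summary print
+
+# Entries that never produced an answer before the hang.
+missing=$((list_size - last_processed))
+echo "missing: $missing"
+```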
+
+Same action as in [9]: ctrl+C the measurement script.
+
+### 19
+
+Renaming and moving in VM-3, first run:
+
+    $ mv data/20230405-070154 de-fra
+    $ mv de-fra/de-fra-l1450.stderr de-fra/onion-grab.stderr
+    $ mv de-fra/de-fra-l1450.stdout de-fra/onion-grab.stdout
+    $ mv measure.stderr de-fra/measure.stderr
+    $ mv ulimit.txt de-fra/
+    $ mv sysctl.txt de-fra/
+
+Second run:
+
+    $ mv data/20230411-223623/de-fra-l1450.stderr de-fra/onion-grab-2.stderr
+    $ mv data/20230411-223623/de-fra-l1450.stdout de-fra/onion-grab-2.stdout
+    $ rmdir data/20230411-223623
+    $ mv measure-remaining.stderr de-fra/measure-2.stderr
+
+Third run:
+
+    $ mv data/20230412-084228/de-fra-l1450.stderr de-fra/onion-grab-3.stderr
+    $ mv data/20230412-084228/de-fra-l1450.stdout de-fra/onion-grab-3.stdout
+    $ rmdir data/20230412-084228
+    $ mv measure-remaining-2.stderr de-fra/measure-3.stderr
+
+Grab bandwidths, excluding the output from April 4 since this measurement
+started on April 5:
+
+    $ rm ~/vnstat/"Tue Apr  4 11:59:01 PM UTC 2023"
+    $ vnstat -h >"/home/rasmoste/vnstat/$(date)"
+    $ mv ~/vnstat de-fra/bw
+
+Overview:
+
+    $ ls -l de-fra
+    total 6768
+    drwxrwxr-x 2 rasmoste rasmoste    4096 Apr 13 05:39 bw
+    -rw-rw-r-- 1 rasmoste rasmoste    1019 Apr 11 23:43 measure-2.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste     810 Apr 12 08:42 measure-3.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste    1009 Apr 11 20:25 measure.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste   24004 Apr 11 23:43 onion-grab-2.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste   23002 Apr 11 23:42 onion-grab-2.stdout
+    -rw-rw-r-- 1 rasmoste rasmoste  318627 Apr 13 05:38 onion-grab-3.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste  312774 Apr 13 00:34 onion-grab-3.stdout
+    -rw-rw-r-- 1 rasmoste rasmoste 3117995 Apr 11 20:25 onion-grab.stderr
+    -rw-rw-r-- 1 rasmoste rasmoste 3034130 Apr 11 20:25 onion-grab.stdout
+    -rw-rw-r-- 1 rasmoste rasmoste   42529 Apr  3 22:12 sysctl.txt
+    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt
+
+Then stop the cronjob that creates bw outputs (`crontab -e`).
+
+Zip, checksum, and transfer to VM-1:
+
+    $ zip -r de-fra.zip de-fra/
+    $ sha256sum de-fra.zip
+    2ea1f053decea3915b29bc60c2f954da55ea48f6d8ab9f47112caddf3a2e2f7f  de-fra.zip
-- 
cgit v1.2.3