commit    8fb3d9985504e01a3abdd7dbe1d7c86b2110c7b0 (patch)
tree      39dde3036111634dfe1907c77e20b38490d5f283 /docs
parent    8e0fa61c06fd12c502ea171bee65f5fd63ccb158 (diff)
author    Rasmus Dahlberg <rasmus@rgdd.se>  2023-04-13 09:19:30 +0200
committer Rasmus Dahlberg <rasmus@rgdd.se>  2023-04-13 09:19:30 +0200

Add measurement setup and operations timeline

Diffstat (limited to 'docs'):

    docs/operations.md | 826 +-
    1 file changed, 824 insertions, 2 deletions
# onion-grab dataset

This document describes our `onion-grab` data collection, including information
about the local systems and a timeline for our operations leading up to the
results for [Tranco top-1m][] and [SANs in CT logs][] during April, 2023.

[Tranco top-1m]: https://tranco-list.eu/
[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md

## Summary

The time to conduct initial tests against [Tranco top-1m][] was ~1 day. 207
unique two-label `.onion` domains were found from 285 Onion-Location sites.

The time to conduct the full measurement for [SANs in CT logs][] was ~10 days.
3330 unique two-label `.onion` domains were configured from 26937 unique sites.
13956 of those unique sites have the same Onion-Location configuration as
Twitter, which likely means that they copied some of Twitter's HTML attributes.

The collected datasets are available here:

  - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
  - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip

For further information about system configurations and operations, read on.

## Local systems

We have three mostly identical Ubuntu VMs:

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 22.04.2 LTS
    Release:        22.04
    Codename:       jammy

VM-1 is configured with 62.9GiB RAM, one CPU core with 32 CPU threads, and a
~2TiB SSD:

    $ grep MemTotal /proc/meminfo
    MemTotal: 65948412 kB
    $ grep -c processor /proc/cpuinfo
    32
    $ grep 'cpu cores' /proc/cpuinfo | uniq
    cpu cores : 1
    $ df -BG /home
    Filesystem                        1G-blocks  Used Available Use% Mounted on
    /dev/mapper/ubuntu--vg-ubuntu--lv     2077G  220G     1772G  12% /

VM-2 and VM-3 are configured with 62.9GiB RAM, one CPU core with 16 CPU threads,
and a ~60GiB SSD (each):

    $ grep MemTotal /proc/meminfo
    MemTotal: 65822508 kB
    $ grep -c processor /proc/cpuinfo
    16
    $ grep 'cpu cores' /proc/cpuinfo | uniq
    cpu cores : 1
    $ df -BG /home
    Filesystem                        1G-blocks  Used Available Use% Mounted on
    /dev/mapper/ubuntu--vg-ubuntu--lv       61G   11G       48G  18% /

These VMs share a 1x10Gbps link with other VMs on the network that we have no
control over. We installed `vnstat` to track our bandwidth usage over time:

    # apt install vnstat
    # systemctl enable vnstat.service
    # systemctl start vnstat.service

We also installed Go version 1.20, see [install instructions][]:

    $ go version
    go version go1.20.2 linux/amd64

[install instructions]: https://go.dev/doc/install

We stopped and disabled `systemd-resolved`, populating `/etc/resolv.conf` with

    $ cat /etc/resolv.conf
    nameserver 8.8.8.8
    nameserver 8.8.4.4

which gives us a setup that [supports 1500 DNS look-ups][] per second and VM.

[supports 1500 DNS look-ups]: https://developers.google.com/speed/public-dns/docs/isp

We set

    $ ulimit -Sn 100000
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"

before running `onion-grab`. The complete outputs of these commands with `-a`
are available in our dataset. The versions of `onion-grab` are listed below.
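The two settings above bound how many file descriptors and how many source
ports a scan can use at once. As a sanity check on the head-room, the usable
ephemeral-port count can be derived from the configured range (the arithmetic
below is our own illustration, not something `onion-grab` prints):

```shell
# Number of usable source ports given the configured ephemeral port range.
range="1024 65535"              # as set via net.ipv4.ip_local_port_range
set -- $range                   # $1 = low end, $2 = high end
echo $(( $2 - $1 + 1 ))         # prints 64512
```

With 10000 workers and a 30s timeout, 64512 ports leave comfortable margin.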
Finally, we [installed Mullvad VPN][] so that our `onion-grab` measurements can
run from Melbourne (VM-1), New York (VM-2), and Stockholm (VM-3). Remember to
set the same DNS resolvers as above (`mullvad dns set custom 8.8.8.8 8.8.4.4`).

In the full measurement, we had to replace Stockholm with Frankfurt (see notes).

[installed Mullvad VPN]: https://mullvad.net/en/help/install-mullvad-app-linux/

## Timeline

| date       | time (UTC) | event                       | notes                                       |
| ---------- | ---------- | --------------------------- | ------------------------------------------- |
| 2023/04/02 | 23:26:27   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
| 2023/04/03 | 12:47:43   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
| 2023/04/03 | 17:20:00   | shuffle ct-sans dataset     | deterministic per-VM seed, 15m/shuffle [2]  |
| 2023/04/03 | 18:18:47   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
| 2023/04/03 | 20:03      | transfer shuffled dataset   | from VM-1 to VM-2 (1-3MB/s, painfully slow) |
| 2023/04/03 | 20:03      | transfer shuffled dataset   | from VM-1 to VM-3 (1-3MB/s, painfully slow) |
| 2023/04/03 | 22:36:06   | start onion-grab (au mel)   | checkout v0.0.2, set measure.sh params [3]  |
| 2023/04/03 | 22:35:36   | start onion-grab (us nyc)   | checkout v0.0.2, set measure.sh params [4]  |
| 2023/04/03 | 22:35:38   | start onion-grab (se sto)   | checkout v0.0.2, set measure.sh params [5]  |
| 2023/04/04 | 15:30      | se sto relay bw drop        | store vnstat -h stats w/ daily cron job [6] |
| 2023/04/05 | 06:30      | kill onion-grab (se sto)    | all Stockholm relays are very slow [7]      |
| 2023/04/05 | 07:02:13   | start onion-grab (de fra)   | all Swedish relays are very slow [8]        |
| 2023/04/11 | 04:26:26   | us nyc completed            | minor exit bug [9]                          |
| 2023/04/11 | 04:30:28   | au mel completed            | minor exit bug [9]                          |
| 2023/04/11 | 20:25:50   | de fra stopped              | ran out of memory for unknown reason [10]   |
| 2023/04/11 | 22:36:25   | de fra started again        | use start line we know is processed [10,11] |
| 2023/04/11 | 23:43:19   | de fra stopped              | ran out of memory for unknown reason [12]   |
| 2023/04/12 | 08:42:30   | de fra started again        | use start line we know is processed [12,13] |
| 2023/04/12 | 11:50      | prepare dataset (au mel)    | only moving files on VM-1 [14]              |
| 2023/04/12 | 14:00      | prepare dataset (us nyc)    | moving files on VM-2, transfer to VM-1 [15] |
| 2023/04/12 | 16:50      | prepare dataset (se sto)    | moving files on VM-3, transfer to VM-1 [16] |
| 2023/04/12 | 17:00      | save bandwidths at VM-{1,2} | forgot to move them earlier [17]            |
| 2023/04/13 | 00:35:38   | de fra completed            | minor exit bug [18]                         |
| 2023/04/13 | 05:40      | prepare dataset (de fra)    | moving files on VM-3, transfer to VM-1 [19] |
| 2023/04/13 | 05:50      | experiment is completed     | datasets are ready, zipped, and documented  |

## Notes

### 1

We downloaded [Tranco top-1m][], permalink [Z2XKG][] (2023-04-03):

    $ sha256sum tranco_Z2XKG-1m.csv.zip
    3e078a84e9aae7dbaf1207aac000038f1e51e20e8ccc35563da8b175d38a39dd  tranco_Z2XKG-1m.csv.zip
    $ unzip tranco_Z2XKG-1m.csv.zip
    $ cut -d',' -f2 top-1m.csv > top-1m.lst

[Z2XKG]: https://tranco-list.eu/list/Z2XKG/1000000

This gives us a list of 1M domains to perform test runs on. The idea:

 1. Make visits at a wanted rate (1450/s, below the 1500 DNS look-up limit)
 2. Make visits at several slower rates (100/s, ..., 1400/s)
 3. Repeat this from three locations (Stockholm, New York, Melbourne)
 4. Hypothesis: observe that the same number of Onion-Location setups are
    discovered when running at the most rapid rate from three locations when
    compared to a lower rate at the same three locations; and that the error
    rates are roughly the same regardless of whether we use a lower or higher
    rate.

We used `onion-grab`'s `scripts/test.sh` to perform the above experiment from
VM-1. The link for downloading the data is listed above in the summary. You
should see 3 subdirectories with results from 28 different measurements.

Let's look at the results in more detail: the error rates that are printed in
the `stderr.txt` files, as well as the parsed output using `scripts/digest.py`.

#### Scan: Stockholm with limit 1450/s

    $ digest.py -i 20230402-232627/se17-wireguard-l1450.txt 2>&1 |
        tail -n6 | head -n4
    digest.py:25 INFO: found 245 HTTP headers with Onion-Location
    digest.py:26 INFO: found 42 HTML meta attributes with Onion-Location
    digest.py:27 INFO: found 283 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 205 unique two-label onion addresses in the process

#### Scan: Stockholm, New York, Melbourne with limit 1450/s (combined)

    $ digest.py -i 20230402-232627/*l1450.txt 2>&1 | tail -n4 | head -n2
    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 207 unique two-label onion addresses in the process

Note that we found more Onion-Location setups here with the combined scan.

#### Scan: Stockholm, New York, Melbourne with limits 100, 500, 1450 (combined)

    $ cat 20230402-232627/stderr.txt | tail -n5 | head -n2
    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 207 unique two-label onion addresses in the process

Note that we did not find more Onion-Location setups now with 9x measurements.
This observation holds true if `scripts/digest.py` is run with all 28 outputs:

    $ ./scripts/digest.py -i\
        20230402-232627/*-*-*\
        20230403-124743/*-*-*\
        20230403-181847/*-*-* 2>&1 | tail -n4 | head -n2
    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 207 unique two-label onion addresses in the process

#### Error rates

Below is some pretty-printed output from the error rates shown in the
respective `stderr.txt` files, ordered by the relay and limit that we set.
The maximum number of connects is 1M; all columns after that provide info about
failed connection attempts. E.g., the first row has 82814 DNS look-up errors.

| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
| us18-wireguard | 100     | 100.0  | 287    | 711816    | 82814 (72767 843 9204)       | 51543 (21279 30264)   | 87147 (77235 9912)  | 2042 | 5449 | 58932 | 257  |
| us18-wireguard | 500     | 500.3  | 285    | 711373    | 83333 (72811 1304 9218)      | 54058 (24064 29994)   | 86728 (76803 9925)  | 2160 | 5414 | 56689 | 245  |
| us18-wireguard | 1000    | 1001.0 | 286    | 711081    | 82882 (72804 852 9226)       | 54763 (24599 30164)   | 86840 (77011 9829)  | 1760 | 5086 | 57333 | 255  |
| us18-wireguard | 1200    | 1201.5 | 286    | 711741    | 82841 (72800 855 9186)       | 53041 (22654 30387)   | 86885 (77111 9774)  | 1803 | 4955 | 58485 | 249  |
| us18-wireguard | 1400    | 1402.1 | 287    | 710481    | 82894 (72805 1468 8621)      | 59711 (29489 30222)   | 86597 (76897 9700)  | 1638 | 4975 | 53450 | 254  |
| us18-wireguard | 1450    | 1452.2 | 287    | 708649    | 82866 (72820 1272 8774)      | 60294 (30460 29834)   | 86506 (76602 9904)  | 1887 | 5233 | 54298 | 267  |

| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
| au-syd-wg-002  | 100     | 100.0  | 285    | 723854    | 83319 (72800 1317 9202)      | 48693 (14767 33926)   | 91658 (81324 10334) | 1810 | 5235 | 45149 | 282  |
| au-syd-wg-002  | 500     | 500.3  | 285    | 723410    | 83119 (72791 1119 9209)      | 51229 (16767 34462)   | 91585 (81208 10377) | 1830 | 4680 | 43876 | 271  |
| au-syd-wg-002  | 1000    | 1001.0 | 285    | 724144    | 83052 (72771 1075 9206)      | 50697 (16591 34106)   | 91678 (81442 10236) | 1491 | 4922 | 43733 | 283  |
| au-syd-wg-002  | 1200    | 1192.3 | 286    | 723169    | 83090 (72820 1122 9148)      | 51408 (16685 34723)   | 91571 (81354 10217) | 1413 | 5024 | 44052 | 273  |
| au-syd-wg-002  | 1400    | 1391.8 | 286    | 721119    | 83305 (72796 1906 8603)      | 55236 (21640 33596)   | 91339 (81197 10142) | 842  | 5752 | 42124 | 283  |
| au-syd-wg-002  | 1450    | 1431.3 | 285    | 720439    | 83182 (72793 1498 8891)      | 56817 (23193 33624)   | 91376 (81049 10327) | 1100 | 5486 | 41334 | 266  |

| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
| se17-wireguard | 100     | 100.0  | 286    | 724643    | 83146 (72400 954 9792)       | 48497 (14711 33786)   | 92230 (81881 10349) | 2081 | 5815 | 43325 | 263  |
| se17-wireguard | 500     | 500.3  | 288    | 723176    | 84208 (72453 1367 10388)     | 48685 (15239 33446)   | 91664 (81341 10323) | 2073 | 5513 | 44416 | 265  |
| se17-wireguard | 1000    | 1001.0 | 289    | 723834    | 83156 (72427 962 9767)       | 49559 (16347 33212)   | 91847 (81572 10275) | 1852 | 5638 | 43856 | 258  |
| se17-wireguard | 1200    | 1201.5 | 289    | 724093    | 83078 (72450 905 9723)       | 48780 (15597 33183)   | 91868 (81656 10212) | 1823 | 5708 | 44389 | 261  |
| se17-wireguard | 1200    | 1201.5 | 289    | 723788    | 83081 (72397 950 9734)       | 49070 (15848 33222)   | 91745 (81595 10150) | 1790 | 5670 | 44589 | 267  |
| se17-wireguard | 1201    | 1202.5 | 288    | 723642    | 83063 (72413 909 9741)       | 48923 (15769 33154)   | 92120 (81575 10545) | 1823 | 5322 | 44839 | 268  |
| se17-wireguard | 1202    | 1202.1 | 290    | 723846    | 83055 (72452 912 9691)       | 48999 (15916 33083)   | 91860 (81519 10341) | 1813 | 5497 | 44669 | 261  |
| se17-wireguard | 1203    | 1204.5 | 289    | 723772    | 83051 (72479 882 9690)       | 48926 (15775 33151)   | 91945 (81630 10315) | 1825 | 5502 | 44716 | 263  |
| se17-wireguard | 1204    | 1205.5 | 290    | 723816    | 83109 (72462 902 9745)       | 49256 (16161 33095)   | 92015 (81551 10464) | 1762 | 5364 | 44420 | 258  |
| se17-wireguard | 1400    | 1402.1 | 288    | 721902    | 83808 (72426 1341 10041)     | 51820 (18732 33088)   | 91409 (81308 10101) | 1727 | 5725 | 43345 | 264  |
| se17-wireguard | 1446    | 1448.2 | 290    | 720637    | 83037 (72463 924 9650)       | 49421 (16422 32999)   | 91416 (81132 10284) | 1801 | 5517 | 47903 | 268  |
| se17-wireguard | 1447    | 1449.2 | 286    | 720927    | 83038 (72480 930 9628)       | 49361 (16463 32898)   | 91630 (81243 10387) | 1807 | 5399 | 47580 | 258  |
| se17-wireguard | 1448    | 1450.2 | 288    | 720841    | 83016 (72492 933 9591)       | 49251 (16209 33042)   | 91636 (81236 10400) | 1803 | 5410 | 47783 | 260  |
| se17-wireguard | 1449    | 1449.4 | 288    | 720456    | 83065 (72459 922 9684)       | 49513 (16554 32959)   | 91479 (81171 10308) | 1786 | 5459 | 47981 | 261  |
| se17-wireguard | 1450    | 1450.3 | 288    | 720684    | 83036 (72476 915 9645)       | 49348 (16266 33082)   | 91608 (81238 10370) | 1734 | 5404 | 47932 | 254  |
| se17-wireguard | 1450    | 1450.0 | 287    | 719193    | 83193 (72428 1319 9446)      | 53567 (20562 33005)   | 91390 (81135 10255) | 1956 | 5775 | 44641 | 285  |

From the looks of it, the number of successful connections decreases somewhat
as we approach the 1450/s limit. Comparing the most and least successful runs
with regard to the number of connects, we get per location:

  - Melbourne: 3705
  - New York: 3167
  - Stockholm: 5450

These differences are mostly due to more TCP timeouts and context deadlines.

#### What does this mean

Running from three different locations at limit 1450/s finds the same number of
Onion-Location setups as all 28 measurements combined. That's what we wanted.

Connect errors (mainly TCP timeouts and context deadline errors) increase
slightly as we use the higher limits. This is not what we wanted. However, the
increase in connect errors per 1M sites is only 0.3-0.5%. These errors are
transient, and should mostly be accounted for by having 3x tries per domain.

(Each scan is running with a shuffled list, similar to our full measurement.)

**Conclusion:** scanning from three different locations at limit 1450/s strikes
a good balance between found Onion-Locations, errors, and timeliness of results.

### 2

The [ct-sans dataset][] that we will `onion-grab` in the full measurement was
collected and assembled on 2023-04-03. It contains 0.91B unique SANs.
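The per-VM shuffling below feeds `shuf` a deterministic keystream, so a given
seed always yields the same permutation and any VM can re-derive its list. A
toy demonstration of that property (seed and file names here are illustrative,
not the ones used in the measurement):

```shell
# Shuffle stdin deterministically: the "randomness" is an AES-256-CTR
# keystream derived from the seed, so equal seeds give equal permutations.
shuffle() {
    shuf --random-source <(openssl enc -aes-256-ctr \
        -pass pass:"$1" -nosalt </dev/zero 2>/dev/null)
}

seq 1 10 | shuffle "demo-seed" > a.lst
seq 1 10 | shuffle "demo-seed" > b.lst
cmp -s a.lst b.lst && echo "same seed, same order"
```

A different seed (e.g., per VM, as below) gives an independent permutation.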
[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md

To avoid biases like encountering the same errors at all VMs due to the order
in which the sites were visited, the dataset is shuffled separately before use.

We did all shuffling on VM-1 because it has the most disk available.

Prepare shuffled dataset for VM-1:

    $ seed="2023-04-03-vm-1"
    $ time shuf\
        --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
        -o vm-1.lst 2023-04-03-ct-sans/sans.lst

    real    13m40.637s
    user    10m30.368s
    sys     2m28.062s
    $ time sha256sum vm-1.lst
    4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b  vm-1.lst

    real    2m51.630s
    user    2m33.246s
    sys     0m11.460s

Prepare shuffled dataset for VM-2:

    $ seed="2023-04-03-vm-2"
    $ time shuf\
        --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
        -o vm-2.lst 2023-04-03-ct-sans/sans.lst

    real    14m35.500s
    user    11m31.577s
    sys     2m31.447s
    $ time sha256sum vm-2.lst
    46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff  vm-2.lst

    real    3m7.084s
    user    2m36.416s
    sys     0m19.012s

Prepare shuffled dataset for VM-3:

    $ seed="2023-04-03-vm-3"
    $ time shuf\
        --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
        -o vm-3.lst 2023-04-03-ct-sans/sans.lst

    real    14m37.878s
    user    11m37.963s
    sys     2m20.373s
    $ time sha256sum vm-3.lst
    c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6  vm-3.lst

    real    3m6.324s
    user    2m36.804s
    sys     0m17.056s

Double-check that we have the expected number of lines:

    $ time wc -l vm-?.lst 2023-04-03-ct-sans/sans.lst
     907332515 vm-1.lst
     907332515 vm-2.lst
     907332515 vm-3.lst
     907332515 2023-04-03-ct-sans/sans.lst
    3629330060 total

    real    7m54.915s
    user    0m59.213s
    sys     1m25.353s

**Note:** `shuf` is memory-hungry and needs ~2x the size of the input file.
So, anything less than ~60GiB memory will be insufficient for a 25GiB dataset.

### 3

    $ ulimit -Sn 100000
    $ ulimit -a >ulimit.txt
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    # sysctl -a >sysctl.txt
    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
    $ git log | head -n1
    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
    $ cd scripts
    $ sha256sum vm-1.lst
    4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b  vm-1.lst
    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..269b5ad 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #

    -relay_country=se
    -relay_city=sto
    +relay_country=au
    +relay_city=mel
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-1.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected Melbourne relays.

    $ ./measure.sh 2>measure.stderr

### 4

    $ ulimit -Sn 100000
    $ ulimit -a >ulimit.txt
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    # sysctl -a >sysctl.txt
    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
    $ git log | head -n1
    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
    $ cd scripts
    $ sha256sum vm-2.lst
    46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff  vm-2.lst
    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..31b2f9e 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #

    -relay_country=se
    -relay_city=sto
    +relay_country=us
    +relay_city=nyc
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-2.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected New York relays.
    $ ./measure.sh 2>measure.stderr

### 5

    $ ulimit -Sn 100000
    $ ulimit -a >ulimit.txt
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    # sysctl -a >sysctl.txt
    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
    $ git log | head -n1
    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
    $ cd scripts
    $ sha256sum vm-3.lst
    c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6  vm-3.lst
    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..4cc0913 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -12,7 +12,7 @@ relay_country=se
     relay_city=sto
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-3.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected Stockholm relays (default).

    $ ./measure.sh 2>measure.stderr

### 6

Notice that the Stockholm relays are "slow". Bandwidth appears to have dropped
to 1/10 of what it was during the initial part of the measurement. It is still
unclear whether there are more errors, and whether this will sort itself out.
We add a cron job that prints hourly bandwidth stats every day at 23:59 to
store more fine-grained data:

    $ mkdir /home/rasmoste/vnstat
    $ crontab -e

And add at the end of the file:

    59 23 * * * vnstat -h >"/home/rasmoste/vnstat/$(date)"

(Added this on all three VMs.)

### 7

(In VM-3.)

Bandwidth stats:

    $ cat Tue\ Apr\ \ 4\ 11\:59\:01\ PM\ UTC\ 2023

     ens160  /  hourly

     hour        rx         |     tx      |    total    |   avg. rate
    ------------------------+-------------+-------------+---------------
     2023-04-04
     00:00      82.61 GiB   |   12.78 GiB |   95.39 GiB |  227.61 Mbit/s
     01:00      80.93 GiB   |   12.70 GiB |   93.63 GiB |  223.41 Mbit/s
     02:00      80.90 GiB   |   12.68 GiB |   93.58 GiB |  223.30 Mbit/s
     03:00      81.13 GiB   |   12.63 GiB |   93.77 GiB |  223.74 Mbit/s
     04:00      88.59 GiB   |   12.97 GiB |  101.57 GiB |  242.35 Mbit/s
     05:00      85.10 GiB   |   12.93 GiB |   98.04 GiB |  233.92 Mbit/s
     06:00      82.97 GiB   |   12.84 GiB |   95.81 GiB |  228.61 Mbit/s
     07:00      79.05 GiB   |   12.62 GiB |   91.67 GiB |  218.72 Mbit/s
     08:00      87.83 GiB   |   12.81 GiB |  100.64 GiB |  240.13 Mbit/s
     09:00      81.22 GiB   |   12.62 GiB |   93.84 GiB |  223.91 Mbit/s
     10:00      79.26 GiB   |   12.57 GiB |   91.83 GiB |  219.12 Mbit/s
     11:00      81.70 GiB   |   12.67 GiB |   94.37 GiB |  225.17 Mbit/s
     12:00      97.83 GiB   |   13.21 GiB |  111.04 GiB |  264.94 Mbit/s
     13:00      82.47 GiB   |   12.59 GiB |   95.06 GiB |  226.83 Mbit/s
     14:00      78.42 GiB   |   11.46 GiB |   89.88 GiB |  214.45 Mbit/s
     15:00      27.42 GiB   |    5.95 GiB |   33.37 GiB |   79.62 Mbit/s
     16:00      23.30 GiB   |    5.37 GiB |   28.67 GiB |   68.42 Mbit/s
     17:00      28.12 GiB   |    6.03 GiB |   34.15 GiB |   81.48 Mbit/s
     18:00      48.01 GiB   |    8.76 GiB |   56.77 GiB |  135.46 Mbit/s
     19:00      40.23 GiB   |    7.73 GiB |   47.97 GiB |  114.46 Mbit/s
     20:00      55.55 GiB   |    9.63 GiB |   65.18 GiB |  155.52 Mbit/s
     21:00      35.10 GiB   |    7.06 GiB |   42.16 GiB |  100.60 Mbit/s
     22:00      20.94 GiB   |    5.00 GiB |   25.94 GiB |   61.91 Mbit/s
     23:00      21.19 GiB   |    4.95 GiB |   26.14 GiB |   68.03 Mbit/s
    ------------------------+-------------+-------------+---------------

We were hoping that this was a transient error, but all relays in Stockholm
appear to underperform. The rate has dropped as a result, and so has the
number of successes. See separate data and log files in our dataset (`se-sto/`).

It will be faster, and give more accurate results, to start from a new location.

Kill: `pidof onion-grab`, then `kill <PID>`.

Move `measure.stderr` to the data dir to not overwrite it when we restart.
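For a rough quantification of the drop, one can average the `avg. rate` column
before and after 15:00 (values transcribed from the table above; hourly
averages smooth out instantaneous dips, so the worst-case drop is larger than
these means suggest):

```shell
# Mean of a whitespace-separated list of Mbit/s values.
avg() { printf '%s\n' $1 | awk '{ s += $1; n++ } END { printf "%.0f\n", s/n }'; }

# avg. rate column: 00:00-14:00 (before the drop) and 15:00-23:00 (after).
before="227.61 223.41 223.30 223.74 242.35 233.92 228.61 218.72 240.13 223.91 219.12 225.17 264.94 226.83 214.45"
after="79.62 68.42 81.48 135.46 114.46 155.52 100.60 61.91 68.03"

avg "$before"   # prints 229
avg "$after"    # prints 96
```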
### 8

(In VM-3.)

We experienced the same "slowness" with both Gothenburg and Malmo relays. When
moving our measurement to Frankfurt, good bandwidth is observed again.

    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..d46f9c1 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #

    -relay_country=se
    -relay_city=sto
    +relay_country=de
    +relay_city=fra
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-3.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected Frankfurt relays.

Without any other restarts, in the same tmux pane as before:

    $ ./measure.sh 2>measure.stderr

### 9

The summary prints (which mean that the Go receiver routine waited for an
answer for at least one timeout and then shut down) are shown in `onion-grab`'s
stderr output. However, `onion-grab` hangs after that, so the `measure.sh`
script doesn't exit.

  - VM-1 (au mel) processed up until: 907330676
  - VM-2 (us nyc) processed up until: 907330662

To be compared with the number of entries in the ct-sans dataset: 907332515.

    $ python3 -c "print(f'{907332515 - 907330676}')"
    1839
    $ python3 -c "print(f'{907332515 - 907330662}')"
    1853

So, it appears that we have ~1800 workers that were unable to deliver their
final answers (most likely timeouts) before the receiver routine shut down.
This explains why `onion-grab` hangs: there are still workers waiting to send
their answers to the receiver, which is no longer reading them.

In addition to the outstanding answers most likely being timeouts, it is not
the same ~1800 answers on all machines, since the dataset was shuffled per VM.

**Action:** ctrl+C the measurement script that is waiting for `onion-grab` to
complete; we already have the `onion-grab` output that we want stored on disk.
### 10

The latest `onion-grab` stderr print was at 2023/04/11 20:25:50; the process
then died due to too little memory. The latest progress print was:

    2023/04/11 20:02:33 INFO: metrics@receiver:

      Processed: 819368251

So, we can safely continue without missing any sites with Onion-Location
configured by starting a new measurement from line ~819368251.

    $ python3 -c "print(f'{907332515 - 819368251}')"
    87964264
    $ tail -n87964264 vm-3.lst > vm-3-remaining.lst
    $ wc -l vm-3-remaining.lst
    87964264 vm-3-remaining.lst

### 11

Restart `onion-grab` from VM-3 with the final domain names to visit.

    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..6d77c66 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #

    -relay_country=se
    -relay_city=sto
    +relay_country=de
    +relay_city=fra
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-3-remaining.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h
    $ ./measure.sh 2>measure-remaining.stderr

(`onion-grab` results are written to a separate directory that is timestamped,
so there is no risk that the above command will overwrite any collected data.)

### 12

The latest `onion-grab` stderr print was at 2023/04/11 23:43:19; the process
then died due to too little memory again. The latest progress print was:

    2023/04/11 23:36:31 INFO: metrics@receiver:

      Processed: 5217381

So, we can safely continue without missing any sites with Onion-Location
configured by starting a new measurement from line ~5217381.

    $ python3 -c "print(f'{87964264 - 5217381}')"
    82746883
    $ tail -n82746883 vm-3-remaining.lst > vm-3-remaining-2.lst
    $ wc -l vm-3-remaining-2.lst
    82746883 vm-3-remaining-2.lst

### 13

Restart `onion-grab` from VM-3 with the final domain names to visit, this time
reducing the number of workers to see if that can keep us from blowing up.
If this doesn't work, we will have to bump the amount of memory in our VM.

(The large number of workers is not necessary anyway given the low latency.)

    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..3b2e54b 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #

    -relay_country=se
    -relay_city=sto
    +relay_country=de
    +relay_city=fra
     limit=1450
    -num_workers=10000
    -input_file=example.lst
    +num_workers=4000
    +input_file=vm-3-remaining-2.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h
    $ ./measure.sh 2>measure-remaining-2.stderr

### 14

Renaming and moving output in VM-1:

    $ mv data/20230403-223517 au-mel
    $ rmdir data
    $ mv au-mel/au-mel-l1450.stderr au-mel/onion-grab.stderr
    $ mv au-mel/au-mel-l1450.stdout au-mel/onion-grab.stdout
    $ mv sysctl.txt au-mel/
    $ mv ulimit.txt au-mel/
    $ mv measure.stderr au-mel/
    $ ls -l au-mel/
    total 6992
    -rw-rw-r-- 1 rasmoste rasmoste     800 Apr  3 22:36 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3749490 Apr 11 08:21 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3346026 Apr 11 04:29 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste   42500 Apr  3 22:11 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt
    $ mv au-mel ~/exp/onion-grab/data/2023-04-03-ct-sans/

### 15

Renaming and moving output in VM-2:

    $ mv data/20230403-223519 us-nyc
    $ rmdir data
    $ mv us-nyc/us-nyc-l1450.stdout us-nyc/onion-grab.stdout
    $ mv us-nyc/us-nyc-l1450.stderr us-nyc/onion-grab.stderr
    $ mv sysctl.txt us-nyc/
    $ mv ulimit.txt us-nyc/
    $ mv measure.stderr us-nyc/
    $ ls -l us-nyc
    total 6784
    -rw-rw-r-- 1 rasmoste rasmoste     800 Apr  3 22:35 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3553624 Apr 11 08:21 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3326545 Apr 11 04:25 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste   42531 Apr  3 22:12 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt

Zip and checksum before moving to VM-1:

    $ zip -r us-nyc.zip us-nyc/
    $ sha256sum us-nyc.zip
    8759b8e7192390cc8f125a795c55b55ad9ecadb27344ce88004998ca89b7c4be  us-nyc.zip

Transfer to VM-1, check that the checksum is OK, then unzip.

### 16

Renaming and moving in VM-3:

    $ mv data/20230403-223521 se-sto
    $ mv se-sto/se-sto-l1450.stderr se-sto/onion-grab.stderr
    $ mv se-sto/se-sto-l1450.stdout se-sto/onion-grab.stdout
    $ cp ulimit.txt se-sto/
    $ cp sysctl.txt se-sto/
    $ mkdir se-sto/bw
    $ cp ~/vnstat/"Tue Apr  4 11:59:01 PM UTC 2023" se-sto/bw
    $ cp ~/vnstat/"Wed Apr  5 11:59:01 PM UTC 2023" se-sto/bw
    $ ls -l se-sto
    total 912
    drwxrwxr-x 2 rasmoste rasmoste   4096 Apr 12 16:55 bw
    -rw-rw-r-- 1 rasmoste rasmoste    801 Apr  3 22:35 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 441711 Apr  5 06:36 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 424925 Apr  5 06:27 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste  42529 Apr 12 16:54 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste    823 Apr 12 16:54 ulimit.txt
    $ zip -r se-sto.zip se-sto/
    $ sha256sum se-sto.zip
    6fcd5640b1022828d19f3585b2a9c9488ce5c681a81a61c22b1bd4cbbe326b49  se-sto.zip

Move to VM-1, check the checksum, and unzip.

### 17

VM-1:

    $ mv ~/vnstat au-mel/bw

Then stop the cron job that creates bw output (`crontab -e`).

VM-2:

    $ mv ~/vnstat bw
    $ zip -r bw.zip bw/
    $ sha256sum bw.zip
    c4753326fcdb4dd136af81c1359cfe37fe6756726c497f39d3c33f799fc975f3  bw.zip

Transfer to VM-1, check the checksum, unzip, and put in the us-nyc directory.
Then stop the cron job that creates bw output in VM-2 as well.

### 18

`onion-grab` hangs on shutdown, similar to VM-1 and VM-2 [9]. The final summary
print shows processed up until 82746708, which should be compared to the
82746883 lines of vm-3-remaining-2.lst. I.e., 175 missing workers/answers.

Same action as in [9]: ctrl+C the measurement script.
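The restart bookkeeping in notes 10 and 12 is plain line arithmetic: subtract
the last `Processed:` count from the input size and `tail` the remainder. A toy
version with a 10-line input where 6 lines were already processed (file names
are illustrative stand-ins for the real lists):

```shell
# Toy stand-in for resuming vm-3.lst after a crash.
printf 'site%d\n' $(seq 1 10) > input.lst
total=$(wc -l < input.lst)               # 10 lines in total
processed=6                              # last "Processed:" count from stderr
tail -n "$(( total - processed ))" input.lst > remaining.lst
wc -l < remaining.lst                    # prints 4
head -n1 remaining.lst                   # prints site7
```

Since `Processed:` is a completed count, resuming at the next line loses
nothing; at worst a few in-flight sites near the boundary are re-visited.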
### 19

Renaming and moving in VM-3, first run:

    $ mv data/20230405-070154 de-fra
    $ mv de-fra/de-fra-l1450.stderr de-fra/onion-grab.stderr
    $ mv de-fra/de-fra-l1450.stdout de-fra/onion-grab.stdout
    $ mv measure.stderr de-fra/measure.stderr
    $ mv ulimit.txt de-fra/
    $ mv sysctl.txt de-fra/

Second run:

    $ mv data/20230411-223623/de-fra-l1450.stderr de-fra/onion-grab-2.stderr
    $ mv data/20230411-223623/de-fra-l1450.stdout de-fra/onion-grab-2.stdout
    $ rmdir data/20230411-223623
    $ mv measure-remaining.stderr de-fra/measure-2.stderr

Third run:

    $ mv data/20230412-084228/de-fra-l1450.stderr de-fra/onion-grab-3.stderr
    $ mv data/20230412-084228/de-fra-l1450.stdout de-fra/onion-grab-3.stdout
    $ rmdir data/20230412-084228
    $ mv measure-remaining-2.stderr de-fra/measure-3.stderr

Grab bandwidths, excluding the output from April 4 since this measurement
started on April 5:

    $ rm ~/vnstat/"Tue Apr  4 11:59:01 PM UTC 2023"
    $ vnstat -h >"/home/rasmoste/vnstat/$(date)"
    $ mv ~/vnstat de-fra/bw

Overview:

    $ ls -l de-fra
    total 6768
    drwxrwxr-x 2 rasmoste rasmoste    4096 Apr 13 05:39 bw
    -rw-rw-r-- 1 rasmoste rasmoste    1019 Apr 11 23:43 measure-2.stderr
    -rw-rw-r-- 1 rasmoste rasmoste     810 Apr 12 08:42 measure-3.stderr
    -rw-rw-r-- 1 rasmoste rasmoste    1009 Apr 11 20:25 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste   24004 Apr 11 23:43 onion-grab-2.stderr
    -rw-rw-r-- 1 rasmoste rasmoste   23002 Apr 11 23:42 onion-grab-2.stdout
    -rw-rw-r-- 1 rasmoste rasmoste  318627 Apr 13 05:38 onion-grab-3.stderr
    -rw-rw-r-- 1 rasmoste rasmoste  312774 Apr 13 00:34 onion-grab-3.stdout
    -rw-rw-r-- 1 rasmoste rasmoste 3117995 Apr 11 20:25 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3034130 Apr 11 20:25 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste   42529 Apr  3 22:12 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt

Then stop the cron job that creates bw outputs (`crontab -e`).

Zip, checksum, and transfer to VM-1:

    $ zip -r de-fra.zip de-fra/
    $ sha256sum de-fra.zip
    2ea1f053decea3915b29bc60c2f954da55ea48f6d8ab9f47112caddf3a2e2f7f  de-fra.zip
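The "check that the checksum is OK" step after each transfer can be made
mechanical with `sha256sum -c`, which recomputes the digest and compares it
against a recorded checksum file (file names below are illustrative, not the
real archives):

```shell
# On the sending VM: record the digest next to the archive.
echo "example archive payload" > archive.zip     # stand-in for a real zip
sha256sum archive.zip > archive.zip.sha256

# On the receiving VM, after transferring both files: verify before unzipping.
sha256sum -c archive.zip.sha256                  # prints "archive.zip: OK"
```

A non-zero exit status from `sha256sum -c` means the transfer was corrupted
and the archive should be re-sent rather than unzipped.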