# onion-grab dataset

This document describes our `onion-grab` data collection, including information
about the local systems and a timeline of the operations that produced the
results for [Tranco top-1m][] and [SANs in CT logs][] during April 2023.

[Tranco top-1m]: https://tranco-list.eu/
[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md

## Summary

The time to conduct initial tests against [Tranco top-1m][] was ~1 day.  207
unique two-label `.onion` domains were found across 285 sites that set
Onion-Location.

The time to conduct the full measurement for [SANs in CT logs][] was ~10 days.
3330 unique two-label `.onion` domains were found across 26937 unique sites that
set Onion-Location.  13956 of those sites have the same Onion-Location
configuration as Twitter, which likely means that they copied some of Twitter's
HTML attributes.

The collected data sets are available here:

  - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
  - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
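
To work with the data locally, the archives can simply be fetched and unpacked,
e.g. (a sketch; any HTTP client works):

    $ wget https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
    $ wget https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
    $ unzip 2023-04-03-tranco.zip
    $ unzip 2023-04-03-ct-sans.zip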

For further information about system configurations and operations, read on.

## Local systems

We have three mostly identical Ubuntu VMs:

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 22.04.2 LTS
    Release:        22.04
    Codename:       jammy

VM-1 is configured with 62.9GiB RAM, 32 CPU threads (each reported as a
single-core processor), and a ~2TiB SSD:

    $ grep MemTotal /proc/meminfo
    MemTotal:       65948412 kB
    $ grep -c processor /proc/cpuinfo
    32
    $ grep 'cpu cores' /proc/cpuinfo | uniq
    cpu cores       : 1
    $ df -BG /home
    Filesystem                        1G-blocks  Used Available Use% Mounted on
    /dev/mapper/ubuntu--vg-ubuntu--lv     2077G  220G     1772G  12% /

VM-2 and VM-3 are each configured with 62.8GiB RAM, 16 CPU threads (each
reported as a single-core processor), and a ~60GiB SSD:

    $ grep MemTotal /proc/meminfo
    MemTotal:       65822508 kB
    $ grep -c processor /proc/cpuinfo
    16
    $ grep 'cpu cores' /proc/cpuinfo | uniq
    cpu cores       : 1
    $ df -BG /home
    Filesystem                        1G-blocks  Used Available Use% Mounted on
    /dev/mapper/ubuntu--vg-ubuntu--lv       61G   11G       48G  18% /

These VMs share a single 10Gbps link with other VMs on the network that we have
no control over.  We installed `vnstat` to track our bandwidth usage over time:

    # apt install vnstat
    # systemctl enable vnstat.service
    # systemctl start vnstat.service

We also installed Go version 1.20 (see the [install instructions][]):

    $ go version
    go version go1.20.2 linux/amd64

[install instructions]: https://go.dev/doc/install

We stopped and disabled `systemd-resolved`, then populated `/etc/resolv.conf` with

    $ cat /etc/resolv.conf
    nameserver 8.8.8.8
    nameserver 8.8.4.4

which gives us a setup that [supports 1500 DNS look-ups][] per second per VM.

[supports 1500 DNS look-ups]: https://developers.google.com/speed/public-dns/docs/isp
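
A minimal sketch of how such a setup can be achieved on Ubuntu 22.04 (assuming
`systemd-resolved` is the only thing managing `/etc/resolv.conf`; exact steps
may differ):

    # systemctl stop systemd-resolved
    # systemctl disable systemd-resolved
    # rm /etc/resolv.conf    # usually a symlink to the systemd-resolved stub
    # printf 'nameserver 8.8.8.8\nnameserver 8.8.4.4\n' >/etc/resolv.conf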

We set

    $ ulimit -Sn 100000
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"

before running `onion-grab`.  The complete outputs of these commands with `-a`
are available in our dataset.  The versions of `onion-grab` are listed below.
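
Note that `ulimit` only affects the current shell and its children, so it has to
be set in the same shell that later runs the measurement script.  A quick sanity
check of both settings (a sketch):

    $ ulimit -Sn
    100000
    $ sysctl net.ipv4.ip_local_port_range
    net.ipv4.ip_local_port_range = 1024     65535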

Finally, we [installed Mullvad VPN][] so that our `onion-grab` measurements can
run from Melbourne (VM-1), New York (VM-2) and Stockholm (VM-3).  Remember to
set the same DNS resolvers as above (`mullvad dns set custom 8.8.8.8 8.8.4.4`).

In the full measurement, we had to replace Stockholm with Frankfurt (see notes).
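
For reference, selecting a relay location and connecting with the Mullvad CLI
looks roughly like this (a sketch; VM-1's Melbourne selection shown):

    $ mullvad relay set location au mel
    $ mullvad dns set custom 8.8.8.8 8.8.4.4
    $ mullvad connect
    $ mullvad status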

[installed Mullvad VPN]: https://mullvad.net/en/help/install-mullvad-app-linux/

## Timeline

| date       | time (UTC) | event                       | notes                                       |
| ---------- | ---------- | --------------------------- | ------------------------------------------- |
| 2023/04/02 | 23:26:27   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
| 2023/04/03 | 12:47:43   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
| 2023/04/03 | 17:20:00   | shuffle ct-sans dataset     | deterministic per-VM seed, 15m/shuffle [2]  |
| 2023/04/03 | 18:18:47   | test run with tranco top-1m | to estimate reasonable repetition count [1] |
| 2023/04/03 | 20:03      | transfer shuffled dataset   | from VM-1 to VM-2 (1-3MB/s, painfully slow) |
| 2023/04/03 | 20:03      | transfer shuffled dataset   | from VM-1 to VM-3 (1-3MB/s, painfully slow) |
| 2023/04/03 | 22:36:06   | start onion-grab (au mel)   | checkout v0.0.2, set measure.sh params [3]  |
| 2023/04/03 | 22:35:36   | start onion-grab (us nyc)   | checkout v0.0.2, set measure.sh params [4]  |
| 2023/04/03 | 22:35:38   | start onion-grab (se sto)   | checkout v0.0.2, set measure.sh params [5]  |
| 2023/04/04 | 15:30      | se sto relay bw drop        | store vnstat -h stats w/ daily cron job [6] |
| 2023/04/05 | 06:30      | kill onion-grab (se sto)    | all Stockholm relays are very slow [7]      |
| 2023/04/05 | 07:02:13   | start onion-grab (de fra)   | all Swedish relays are very slow [8]        |
| 2023/04/11 | 04:26:26   | us nyc completed            | minor exit bug [9]                          |
| 2023/04/11 | 04:30:28   | au mel completed            | minor exit bug [9]                          |
| 2023/04/11 | 20:25:50   | de fra stopped              | ran out of memory for unknown reason [10]   |
| 2023/04/11 | 22:36:25   | de fra started again        | use start line we know is processed [10,11] |
| 2023/04/11 | 23:43:19   | de fra stopped              | ran out of memory for unknown reason [12]   |
| 2023/04/12 | 08:42:30   | de fra started again        | use start line we know is processed [12,13] |
| 2023/04/12 | 11:50      | prepare dataset (au mel)    | only moving files on VM-1 [14]              |
| 2023/04/12 | 14:00      | prepare dataset (us nyc)    | moving files on VM-2, transfer to VM-1 [15] |
| 2023/04/12 | 16:50      | prepare dataset (se sto)    | moving files on VM-3, transfer to VM-1 [16] |
| 2023/04/12 | 17:00      | save bandwidths at VM-{1,2} | forgot to move them earlier [17]            |
| 2023/04/13 | 00:35:38   | de fra completed            | minor exit bug [18]                         |
| 2023/04/13 | 05:40      | prepare dataset (de fra)    | moving files on VM-3, transfer to VM-1 [19] |
| 2023/04/13 | 05:50      | experiment is completed     | datasets are ready, zipped, and documented  |
| 2023/07/06 |            | move source to tpo gitlab   | git.cs.kau.se/rasmoste is not a stable home |

## Notes

### 1

We downloaded [Tranco top-1m][], permalink [Z2XKG][] (2023-04-03):

    $ sha256sum tranco_Z2XKG-1m.csv.zip
    3e078a84e9aae7dbaf1207aac000038f1e51e20e8ccc35563da8b175d38a39dd  tranco_Z2XKG-1m.csv.zip 
    $ unzip tranco_Z2XKG-1m.csv.zip
    $ cut -d',' -f2 top-1m.csv > top-1m.lst

[Z2XKG]: https://tranco-list.eu/list/Z2XKG/1000000

This gives us a list of 1M domains to perform test-runs on.  The idea:

  1. Make visits at a wanted rate (1450/s, below the 1500 DNS lookup limit)
  2. Make visits at several slower rates (100/s, ..., 1400/s)
  3. Repeat this from three locations (Stockholm, New York, Melbourne)
  4. Hypothesis: the same number of Onion-Location setups is discovered when
     running at the most rapid rate from the three locations as when running at
     a lower rate from the same three locations; and the error rates are roughly
     the same regardless of whether we use a lower or higher rate.

We used `onion-grab`'s `scripts/test.sh` to perform the above experiment from
VM-1.  The link for downloading the data is listed above in the summary.  You
should see 3 subdirectories with results from 28 different measurements.

Let's look at the results in more detail: the error rates that are printed in
the `stderr.txt` files, as well as the parsed output using `scripts/digest.py`.

#### Scan: Stockholm with limit 1450/s

    $ digest.py -i 20230402-232627/se17-wireguard-l1450.txt 2>&1 |
    tail -n6 | head -n4
    digest.py:25 INFO: found 245 HTTP headers with Onion-Location
    digest.py:26 INFO: found 42 HTML meta attributes with Onion-Location
    digest.py:27 INFO: found 283 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 205 unique two-label onion addresses in the process

#### Scan: Stockholm, New York, Melbourne with limit 1450/s (combined)

    $ digest.py -i 20230402-232627/*l1450.txt 2>&1 | tail -n4 | head -n2
    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 207 unique two-label onion addresses in the process

Note that we found more Onion-Location setups here with the combined scan.

#### Scan: Stockholm, New York, Melbourne with limits 100, 500, 1450 (combined)

    $ cat 20230402-232627/stderr.txt | tail -n5 | head -n2
    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 207 unique two-label onion addresses in the process

Note that we did not find any more Onion-Location setups with 9x as many
measurements.  This observation holds true if `scripts/digest.py` is run with
all 28 outputs:

    $ ./scripts/digest.py -i\
              20230402-232627/*-*-*\
              20230403-124743/*-*-*\
              20230403-181847/*-*-* 2>&1 | tail -n4 | head -n2
    digest.py:27 INFO: found 285 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 207 unique two-label onion addresses in the process

#### Error rates

Below is some pretty-printed output of the error rates shown in the respective
`stderr.txt` files, ordered by the relay and limit that we set.  The maximum
possible number of connects is 1M; the columns after `connected` provide info
about failed connection attempts.  E.g., the first row has 82814 DNS lookup
errors.

| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
| us18-wireguard |  100    |  100.0 | 287    | 711816    | 82814 (72767  843  9204)     | 51543 (21279 30264)   | 87147 (77235  9912) | 2042 | 5449 | 58932 | 257  |
| us18-wireguard |  500    |  500.3 | 285    | 711373    | 83333 (72811 1304  9218)     | 54058 (24064 29994)   | 86728 (76803  9925) | 2160 | 5414 | 56689 | 245  |
| us18-wireguard | 1000    | 1001.0 | 286    | 711081    | 82882 (72804  852  9226)     | 54763 (24599 30164)   | 86840 (77011  9829) | 1760 | 5086 | 57333 | 255  |
| us18-wireguard | 1200    | 1201.5 | 286    | 711741    | 82841 (72800  855  9186)     | 53041 (22654 30387)   | 86885 (77111  9774) | 1803 | 4955 | 58485 | 249  |
| us18-wireguard | 1400    | 1402.1 | 287    | 710481    | 82894 (72805 1468  8621)     | 59711 (29489 30222)   | 86597 (76897  9700) | 1638 | 4975 | 53450 | 254  |
| us18-wireguard | 1450    | 1452.2 | 287    | 708649    | 82866 (72820 1272  8774)     | 60294 (30460 29834)   | 86506 (76602  9904) | 1887 | 5233 | 54298 | 267  |

| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
| au-syd-wg-002  |  100    |  100.0 | 285    | 723854    | 83319 (72800 1317  9202)     | 48693 (14767 33926)   | 91658 (81324 10334) | 1810 | 5235 | 45149 | 282  |
| au-syd-wg-002  |  500    |  500.3 | 285    | 723410    | 83119 (72791 1119  9209)     | 51229 (16767 34462)   | 91585 (81208 10377) | 1830 | 4680 | 43876 | 271  |
| au-syd-wg-002  | 1000    | 1001.0 | 285    | 724144    | 83052 (72771 1075  9206)     | 50697 (16591 34106)   | 91678 (81442 10236) | 1491 | 4922 | 43733 | 283  |
| au-syd-wg-002  | 1200    | 1192.3 | 286    | 723169    | 83090 (72820 1122  9148)     | 51408 (16685 34723)   | 91571 (81354 10217) | 1413 | 5024 | 44052 | 273  |
| au-syd-wg-002  | 1400    | 1391.8 | 286    | 721119    | 83305 (72796 1906  8603)     | 55236 (21640 33596)   | 91339 (81197 10142) |  842 | 5752 | 42124 | 283  |
| au-syd-wg-002  | 1450    | 1431.3 | 285    | 720439    | 83182 (72793 1498  8891)     | 56817 (23193 33624)   | 91376 (81049 10327) | 1100 | 5486 | 41334 | 266  |

| relay hostname | limit/s | rate/s | onions | connected | dns (NotFound/Timeout/Other) | tcp (Timeout/Syscall) | tls (Cert/Other)    | 3xx  | eof  | ctx   | ???  |
| -------------- | ------- | ------ | ------ | --------- | ---------------------------- | --------------------- | ------------------- | ---- | ---- | ----- | ---- |
| se17-wireguard |  100    |  100.0 | 286    | 724643    | 83146 (72400  954  9792)     | 48497 (14711 33786)   | 92230 (81881 10349) | 2081 | 5815 | 43325 | 263  |
| se17-wireguard |  500    |  500.3 | 288    | 723176    | 84208 (72453 1367 10388)     | 48685 (15239 33446)   | 91664 (81341 10323) | 2073 | 5513 | 44416 | 265  |
| se17-wireguard | 1000    | 1001.0 | 289    | 723834    | 83156 (72427  962  9767)     | 49559 (16347 33212)   | 91847 (81572 10275) | 1852 | 5638 | 43856 | 258  |
| se17-wireguard | 1200    | 1201.5 | 289    | 724093    | 83078 (72450  905  9723)     | 48780 (15597 33183)   | 91868 (81656 10212) | 1823 | 5708 | 44389 | 261  |
| se17-wireguard | 1200    | 1201.5 | 289    | 723788    | 83081 (72397  950  9734)     | 49070 (15848 33222)   | 91745 (81595 10150) | 1790 | 5670 | 44589 | 267  |
| se17-wireguard | 1201    | 1202.5 | 288    | 723642    | 83063 (72413  909  9741)     | 48923 (15769 33154)   | 92120 (81575 10545) | 1823 | 5322 | 44839 | 268  |
| se17-wireguard | 1202    | 1202.1 | 290    | 723846    | 83055 (72452  912  9691)     | 48999 (15916 33083)   | 91860 (81519 10341) | 1813 | 5497 | 44669 | 261  |
| se17-wireguard | 1203    | 1204.5 | 289    | 723772    | 83051 (72479  882  9690)     | 48926 (15775 33151)   | 91945 (81630 10315) | 1825 | 5502 | 44716 | 263  |
| se17-wireguard | 1204    | 1205.5 | 290    | 723816    | 83109 (72462  902  9745)     | 49256 (16161 33095)   | 92015 (81551 10464) | 1762 | 5364 | 44420 | 258  |
| se17-wireguard | 1400    | 1402.1 | 288    | 721902    | 83808 (72426 1341 10041)     | 51820 (18732 33088)   | 91409 (81308 10101) | 1727 | 5725 | 43345 | 264  |
| se17-wireguard | 1446    | 1448.2 | 290    | 720637    | 83037 (72463  924  9650)     | 49421 (16422 32999)   | 91416 (81132 10284) | 1801 | 5517 | 47903 | 268  |
| se17-wireguard | 1447    | 1449.2 | 286    | 720927    | 83038 (72480  930  9628)     | 49361 (16463 32898)   | 91630 (81243 10387) | 1807 | 5399 | 47580 | 258  |
| se17-wireguard | 1448    | 1450.2 | 288    | 720841    | 83016 (72492  933  9591)     | 49251 (16209 33042)   | 91636 (81236 10400) | 1803 | 5410 | 47783 | 260  |
| se17-wireguard | 1449    | 1449.4 | 288    | 720456    | 83065 (72459  922  9684)     | 49513 (16554 32959)   | 91479 (81171 10308) | 1786 | 5459 | 47981 | 261  |
| se17-wireguard | 1450    | 1450.3 | 288    | 720684    | 83036 (72476  915  9645)     | 49348 (16266 33082)   | 91608 (81238 10370) | 1734 | 5404 | 47932 | 254  |
| se17-wireguard | 1450    | 1450.0 | 287    | 719193    | 83193 (72428 1319  9446)     | 53567 (20562 33005)   | 91390 (81135 10255) | 1956 | 5775 | 44641 | 285  |

From the looks of it, the number of successful connections decreases somewhat
as we approach the 1450/s limit.  Comparing the most and least successful runs
in terms of the number of connects, we get the following differences per
location:

  - Melbourne: 3705
  - New York: 3167
  - Stockholm: 5450

These differences are mostly due to more TCP timeouts and context deadlines.
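
For reference, these deltas can be reproduced from the `connected` columns in
the tables above, using the same kind of one-liners we use elsewhere:

    $ python3 -c "print(f'{724144 - 720439}')"  # Melbourne: best (limit 1000) vs worst (limit 1450)
    3705
    $ python3 -c "print(f'{711816 - 708649}')"  # New York: best (limit 100) vs worst (limit 1450)
    3167
    $ python3 -c "print(f'{724643 - 719193}')"  # Stockholm: best (limit 100) vs worst (limit 1450)
    5450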

#### What does this mean

Running from three different locations at limit 1450/s finds the same number of
Onion-Location setups as all 28 measurements combined.  That's what we wanted.

Connect errors (mainly TCP timeouts and context deadline errors) increase
slightly as we use the higher limits.  This is not what we wanted.  However, the
increase in connect errors per 1M sites is only 0.3-0.5%.  These errors are
transient, and should mostly be mitigated by each domain being tried from three
locations (3x tries per domain).

(Each scan is running with a shuffled list, similar to our full measurement.)

**Conclusion:** scanning from three different locations at limit 1450/s strikes
a good balance between found Onion-Locations, errors, and timeliness of results.

### 2

The [ct-sans dataset][] that we will scan with `onion-grab` in the full
measurement was collected and assembled on 2023-04-03.  It contains 0.91B unique
SANs.

[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md

To avoid biases, like all VMs encountering the same errors due to the order in
which the sites are visited, the dataset is shuffled separately for each VM
before use.

We did all shuffling on VM-1 because it has the most disk available.

Prepare shuffled dataset for VM-1:

    $ seed="2023-04-03-vm-1"
    $ time shuf\
          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
          -o vm-1.lst 2023-04-03-ct-sans/sans.lst
    
    real    13m40.637s
    user    10m30.368s
    sys     2m28.062s
    $ time sha256sum vm-1.lst
    4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b  vm-1.lst

    real    2m51.630s
    user    2m33.246s
    sys     0m11.460s

Prepare shuffled dataset for VM-2:

    $ seed="2023-04-03-vm-2"
    $ time shuf\
          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
          -o vm-2.lst 2023-04-03-ct-sans/sans.lst
    
    real    14m35.500s
    user    11m31.577s
    sys     2m31.447s
    $ time sha256sum vm-2.lst
    46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff  vm-2.lst

    real    3m7.084s
    user    2m36.416s
    sys     0m19.012s
    
Prepare shuffled dataset for VM-3:

    $ seed="2023-04-03-vm-3"
    $ time shuf\
          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
          -o vm-3.lst 2023-04-03-ct-sans/sans.lst
    
    real    14m37.878s
    user    11m37.963s
    sys     2m20.373s
    $ time sha256sum vm-3.lst
    c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6  vm-3.lst

    real    3m6.324s
    user    2m36.804s
    sys     0m17.056s

Double-check that we have the expected number of lines:

    $ time wc -l vm-?.lst 2023-04-03-ct-sans/sans.lst
       907332515 vm-1.lst
       907332515 vm-2.lst
       907332515 vm-3.lst
       907332515 2023-04-03-ct-sans/sans.lst
      3629330060 total
    
    real    7m54.915s
    user    0m59.213s
    sys     1m25.353s

**Note:** `shuf` is memory-hungry and needs ~2x the size of the input file.  So,
anything less than ~60GiB memory will be insufficient for a 25GiB dataset.
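
The `--random-source` construction above is deterministic for a given seed, so a
shuffle can be verified by re-running it and comparing checksums.  A small
sanity check on a toy list (a sketch):

    $ seq 10 >toy.lst
    $ seed="2023-04-03-vm-1"
    $ shuf\
          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
          -o toy-1.lst toy.lst
    $ shuf\
          --random-source <(openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt </dev/zero 2>/dev/null)\
          -o toy-2.lst toy.lst
    $ sha256sum toy-1.lst toy-2.lst    # identical digests => same permutation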

### 3

    $ ulimit -Sn 100000
    $ ulimit -a >ulimit.txt
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    # sysctl -a >sysctl.txt
    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
    $ git log | head -n1
    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
    $ cd scripts
    $ sha256sum vm-1.lst
    4bf4b2701e01dd7886757987a2a9f2750aff677c2bd9f3e28d6ca8a1b7c25a3b  vm-1.lst
    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..269b5ad 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #
    
    -relay_country=se
    -relay_city=sto
    +relay_country=au
    +relay_city=mel
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-1.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected Melbourne relays.

    $ ./measure.sh 2>measure.stderr

### 4

    $ ulimit -Sn 100000
    $ ulimit -a >ulimit.txt
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    # sysctl -a >sysctl.txt
    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
    $ git log | head -n1
    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
    $ cd scripts
    $ sha256sum vm-2.lst
    46f6c4af1e215f2d2cfb3ee302c8e3d02f43d4d918eb42f300a818e68f73f7ff  vm-2.lst
    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..31b2f9e 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #
    
    -relay_country=se
    -relay_city=sto
    +relay_country=us
    +relay_city=nyc
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-2.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected New York relays.

    $ ./measure.sh 2>measure.stderr

### 5

    $ ulimit -Sn 100000
    $ ulimit -a >ulimit.txt
    # sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    # sysctl -a >sysctl.txt
    $ go install git.cs.kau.se/rasmoste/onion-grab@v0.0.2
    $ git log | head -n1
    commit abce43c4ad9000e0c5c83d31c2185986ab8a54c9
    $ cd scripts
    $ sha256sum vm-3.lst
    c2df53320c1e7ab21355c9ebc1e53b1a8f564c9e7a2bd3e24f2cc8fca8b9eaf6  vm-3.lst
    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..4cc0913 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -12,7 +12,7 @@ relay_country=se
     relay_city=sto
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-3.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected Stockholm relays (default).

    $ ./measure.sh 2>measure.stderr

### 6

Notice that Stockholm relays are "slow".  Bandwidth appears to have dropped to
~1/10 of what it was during the initial part of the measurement.  It is unclear
yet whether there are more errors, and whether this will sort itself out.  We
added a cron job that stores hourly bandwidth stats every day at 23:59 to keep
more fine-grained data:

    $ mkdir /home/rasmoste/vnstat
    $ crontab -e

And add at the end of the file:

    59 23 * * * vnstat -h >"/home/rasmoste/vnstat/$(date)"

(Added this on all three VMs.)

### 7

(In VM-3)

Bandwidth stats:

    $ cat Tue\ Apr\ \ 4\ 11\:59\:01\ PM\ UTC\ 2023
    
     ens160  /  hourly
    
             hour        rx      |     tx      |    total    |   avg. rate
         ------------------------+-------------+-------------+---------------
         2023-04-04
             00:00     82.61 GiB |   12.78 GiB |   95.39 GiB |  227.61 Mbit/s
             01:00     80.93 GiB |   12.70 GiB |   93.63 GiB |  223.41 Mbit/s
             02:00     80.90 GiB |   12.68 GiB |   93.58 GiB |  223.30 Mbit/s
             03:00     81.13 GiB |   12.63 GiB |   93.77 GiB |  223.74 Mbit/s
             04:00     88.59 GiB |   12.97 GiB |  101.57 GiB |  242.35 Mbit/s
             05:00     85.10 GiB |   12.93 GiB |   98.04 GiB |  233.92 Mbit/s
             06:00     82.97 GiB |   12.84 GiB |   95.81 GiB |  228.61 Mbit/s
             07:00     79.05 GiB |   12.62 GiB |   91.67 GiB |  218.72 Mbit/s
             08:00     87.83 GiB |   12.81 GiB |  100.64 GiB |  240.13 Mbit/s
             09:00     81.22 GiB |   12.62 GiB |   93.84 GiB |  223.91 Mbit/s
             10:00     79.26 GiB |   12.57 GiB |   91.83 GiB |  219.12 Mbit/s
             11:00     81.70 GiB |   12.67 GiB |   94.37 GiB |  225.17 Mbit/s
             12:00     97.83 GiB |   13.21 GiB |  111.04 GiB |  264.94 Mbit/s
             13:00     82.47 GiB |   12.59 GiB |   95.06 GiB |  226.83 Mbit/s
             14:00     78.42 GiB |   11.46 GiB |   89.88 GiB |  214.45 Mbit/s
             15:00     27.42 GiB |    5.95 GiB |   33.37 GiB |   79.62 Mbit/s
             16:00     23.30 GiB |    5.37 GiB |   28.67 GiB |   68.42 Mbit/s
             17:00     28.12 GiB |    6.03 GiB |   34.15 GiB |   81.48 Mbit/s
             18:00     48.01 GiB |    8.76 GiB |   56.77 GiB |  135.46 Mbit/s
             19:00     40.23 GiB |    7.73 GiB |   47.97 GiB |  114.46 Mbit/s
             20:00     55.55 GiB |    9.63 GiB |   65.18 GiB |  155.52 Mbit/s
             21:00     35.10 GiB |    7.06 GiB |   42.16 GiB |  100.60 Mbit/s
             22:00     20.94 GiB |    5.00 GiB |   25.94 GiB |   61.91 Mbit/s
             23:00     21.19 GiB |    4.95 GiB |   26.14 GiB |   68.03 Mbit/s
         ------------------------+-------------+-------------+---------------

We were hoping that this was a transient error, but all relays in Stockholm
appear to underperform.  The rate has dropped as a result, and so has the number
of successes.  See separate data and log files in our dataset (`se-sto/`).

It will be faster, and give more accurate results, to start from a new location.

Kill it: find the PID with `pidof onion-grab`, then `kill <PID>`.

Move `measure.stderr` to the data dir so that it is not overwritten when we
restart.

### 8

(In VM-3.)

We experienced the same "slowness" with both Gothenburg and Malmo relays.  After
moving our measurement to Frankfurt, we observed good bandwidth again.

    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..d46f9c1 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #
    
    -relay_country=se
    -relay_city=sto
    +relay_country=de
    +relay_city=fra
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-3.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h

So, we selected Frankfurt relays.

With no other restarts, and in the same tmux pane as before:

    $ ./measure.sh 2>measure.stderr

### 9

The summary prints (which means that the Go receiver routine waited at least one
timeout for further answers and then shut down) are shown in `onion-grab`'s
stderr output; however, `onion-grab` hangs after that, so the measure.sh script
doesn't exit.

  - VM-1 (au mel) processed up until: 907330676
  - VM-2 (us nyc) processed up until: 907330662

To be compared with the number of entries in the ct-sans dataset: 907332515.

    $ python3 -c "print(f'{907332515 - 907330676}')"
    1839
    $ python3 -c "print(f'{907332515 - 907330662}')"
    1853

So, it appears that ~1800 workers were unable to deliver their final answers
(most likely timeouts) before the receiver routine shut down.  This explains why
`onion-grab` hangs: there are still workers waiting to send their answers to a
receiver that is no longer reading them.

In addition to the outstanding answers most likely being timeouts, they do not
correspond to the same ~1800 domains on all machines, since the dataset was
shuffled separately for each VM.

**Action:** ctrl+C the measurement script that is waiting for `onion-grab` to
complete; we already have the `onion-grab` output that we want stored on disk.

### 10

The latest `onion-grab` stderr print was at 2023/04/11 20:25:50; the process
then died due to running out of memory.  The latest progress print was:

    2023/04/11 20:02:33 INFO: metrics@receiver:
    
      Processed: 819368251

So, we can safely continue without missing any sites that have Onion-Location
configured by starting a new measurement on the lines after line 819368251:

    $ python3 -c "print(f'{907332515 - 819368251}')"
    87964264
    $ tail -n87964264 vm-3.lst > vm-3-remaining.lst
    $ wc -l vm-3-remaining.lst
    87964264 vm-3-remaining.lst
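
One way to sanity-check the cut point is to confirm that the first remaining
line is line 819368252 of the shuffled input (a sketch):

    $ head -n1 vm-3-remaining.lst
    $ sed -n '819368252p' vm-3.lst    # should print the same domain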

### 11

Restart `onion-grab` on VM-3 with the final domain names to visit.

    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..6d77c66 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #
    
    -relay_country=se
    -relay_city=sto
    +relay_country=de
    +relay_city=fra
     limit=1450
     num_workers=10000
    -input_file=example.lst
    +input_file=vm-3-remaining.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h
    $ ./measure.sh 2>measure-remaining.stderr

(`onion-grab` results are written to a separate directory that is timestamped,
so there is no risk that the above command will overwrite any collected data.)

### 12

The latest `onion-grab` stderr print was at 2023/04/11 23:43:19; the process
then died due to running out of memory.  The latest progress print was:

    2023/04/11 23:36:31 INFO: metrics@receiver:
    
      Processed: 5217381

So, we can safely continue without missing any sites that have Onion-Location
configured by starting a new measurement on the lines after line 5217381 of
vm-3-remaining.lst:

    $ python3 -c "print(f'{87964264 - 5217381}')"
    82746883
    $ tail -n82746883 vm-3-remaining.lst > vm-3-remaining-2.lst
    $ wc -l vm-3-remaining-2.lst
    82746883 vm-3-remaining-2.lst

### 13

Restart `onion-grab` on VM-3 with the final domain names to visit, but with a
reduced number of workers to see if that keeps us from running out of memory
again.  If this doesn't work we will have to bump the amount of memory in our
VM.

(The large number of workers is not necessary anyway when latency is low.)

    $ git diff
    diff --git a/scripts/measure.sh b/scripts/measure.sh
    index a520c6d..3b2e54b 100755
    --- a/scripts/measure.sh
    +++ b/scripts/measure.sh
    @@ -8,11 +8,11 @@
     # lists 8.8.8.8 and 8.8.4.4, output of uname -a and sysctl -a is ..., etc.
     #
    
    -relay_country=se
    -relay_city=sto
    +relay_country=de
    +relay_city=fra
     limit=1450
    -num_workers=10000
    -input_file=example.lst
    +num_workers=4000
    +input_file=vm-3-remaining-2.lst
     timeout_s=30
     response_max_mib=64
     metrics_interval=1h
    $ ./measure.sh 2>measure-remaining-2.stderr

### 14

Renaming and moving output in VM-1:

    $ mv data/20230403-223517 au-mel
    $ rmdir data 
    $ mv au-mel/au-mel-l1450.stderr au-mel/onion-grab.stderr
    $ mv au-mel/au-mel-l1450.stdout au-mel/onion-grab.stdout
    $ mv sysctl.txt au-mel/
    $ mv ulimit.txt au-mel/
    $ mv measure.stderr au-mel/
    $ ls -l au-mel/
    total 6992
    -rw-rw-r-- 1 rasmoste rasmoste     800 Apr  3 22:36 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3749490 Apr 11 08:21 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3346026 Apr 11 04:29 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste   42500 Apr  3 22:11 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt
    $ mv au-mel ~/exp/onion-grab/data/2023-04-03-ct-sans/

### 15

Renaming and moving output in VM-2:

    $ mv data/20230403-223519 us-nyc
    $ rmdir data
    $ mv us-nyc/us-nyc-l1450.stdout us-nyc/onion-grab.stdout
    $ mv us-nyc/us-nyc-l1450.stderr us-nyc/onion-grab.stderr
    $ mv sysctl.txt us-nyc/
    $ mv ulimit.txt us-nyc/
    $ mv measure.stderr us-nyc/
    $ ls -l us-nyc
    total 6784
    -rw-rw-r-- 1 rasmoste rasmoste     800 Apr  3 22:35 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3553624 Apr 11 08:21 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3326545 Apr 11 04:25 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste   42531 Apr  3 22:12 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt

Zip and checksum before moving to VM-1:

    $ zip -r us-nyc.zip us-nyc/
    $ sha256sum us-nyc.zip
    8759b8e7192390cc8f125a795c55b55ad9ecadb27344ce88004998ca89b7c4be  us-nyc.zip

Transfer to VM-1, check that the checksum is OK, then unzip.
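
A sketch of that step, assuming SSH access to VM-1 under the hostname `vm-1`
(hostname and paths are illustrative):

    $ scp us-nyc.zip vm-1:~/exp/onion-grab/data/2023-04-03-ct-sans/
    $ ssh vm-1 'cd ~/exp/onion-grab/data/2023-04-03-ct-sans && sha256sum us-nyc.zip && unzip -q us-nyc.zip'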

### 16

Renaming and moving in VM-3:

    $ mv data/20230403-223521 se-sto
    $ mv se-sto/se-sto-l1450.stderr se-sto/onion-grab.stderr
    $ mv se-sto/se-sto-l1450.stdout se-sto/onion-grab.stdout
    $ cp ulimit.txt se-sto/
    $ cp sysctl.txt se-sto/
    $ mkdir se-sto/bw
    $ cp ~/vnstat/"Tue Apr  4 11:59:01 PM UTC 2023" se-sto/bw
    $ cp ~/vnstat/"Wed Apr  5 11:59:01 PM UTC 2023" se-sto/bw
    $ ls -l se-sto
    total 912
    drwxrwxr-x 2 rasmoste rasmoste   4096 Apr 12 16:55 bw
    -rw-rw-r-- 1 rasmoste rasmoste    801 Apr  3 22:35 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 441711 Apr  5 06:36 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 424925 Apr  5 06:27 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste  42529 Apr 12 16:54 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste    823 Apr 12 16:54 ulimit.txt
    $ zip -r se-sto.zip se-sto/
    $ sha256sum se-sto.zip
    6fcd5640b1022828d19f3585b2a9c9488ce5c681a81a61c22b1bd4cbbe326b49  se-sto.zip

Move to VM-1, check checksum and unzip.

### 17

VM-1: 

    $ mv ~/vnstat au-mel/bw

Then stop the cronjob that creates bw output (`crontab -e`).

VM-2:

    $ mv ~/vnstat bw
    $ zip -r bw.zip bw/
    $ sha256sum bw.zip
    c4753326fcdb4dd136af81c1359cfe37fe6756726c497f39d3c33f799fc975f3  bw.zip

Transfer to VM-1, check checksum, unzip and put in us-nyc directory.  Then stop
the cronjob that creates bw output in VM-2 as well.

### 18

`onion-grab` hangs on shutdown, similar to what happened on VM-1 and VM-2 [9].
The final summary print shows 82746708 processed, which should be compared to
the 82746883 lines of vm-3-remaining-2.lst, i.e., 175 missing workers/answers.
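
For reference, the same kind of arithmetic as in [9]:

    $ python3 -c "print(f'{82746883 - 82746708}')"
    175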

Same action as in [9]: ctrl+C the measurement script.

### 19

Renaming and moving in VM-3, first run:

    $ mv data/20230405-070154 de-fra
    $ mv de-fra/de-fra-l1450.stderr de-fra/onion-grab.stderr
    $ mv de-fra/de-fra-l1450.stdout de-fra/onion-grab.stdout
    $ mv measure.stderr de-fra/measure.stderr
    $ mv ulimit.txt de-fra/
    $ mv sysctl.txt de-fra/

Second run:

    $ mv data/20230411-223623/de-fra-l1450.stderr de-fra/onion-grab-2.stderr
    $ mv data/20230411-223623/de-fra-l1450.stdout de-fra/onion-grab-2.stdout
    $ rmdir data/20230411-223623
    $ mv measure-remaining.stderr de-fra/measure-2.stderr

Third run:

    $ mv data/20230412-084228/de-fra-l1450.stderr de-fra/onion-grab-3.stderr
    $ mv data/20230412-084228/de-fra-l1450.stdout de-fra/onion-grab-3.stdout
    $ rmdir data/20230412-084228
    $ mv measure-remaining-2.stderr de-fra/measure-3.stderr

Grab bandwidth stats, excluding the output from April 4 since this measurement
started on April 5:

    $ rm ~/vnstat/"Tue Apr  4 11:59:01 PM UTC 2023"
    $ vnstat -h >"/home/rasmoste/vnstat/$(date)"
    $ mv ~/vnstat de-fra/bw

Overview:

    $ ls -l de-fra
    total 6768
    drwxrwxr-x 2 rasmoste rasmoste    4096 Apr 13 05:39 bw
    -rw-rw-r-- 1 rasmoste rasmoste    1019 Apr 11 23:43 measure-2.stderr
    -rw-rw-r-- 1 rasmoste rasmoste     810 Apr 12 08:42 measure-3.stderr
    -rw-rw-r-- 1 rasmoste rasmoste    1009 Apr 11 20:25 measure.stderr
    -rw-rw-r-- 1 rasmoste rasmoste   24004 Apr 11 23:43 onion-grab-2.stderr
    -rw-rw-r-- 1 rasmoste rasmoste   23002 Apr 11 23:42 onion-grab-2.stdout
    -rw-rw-r-- 1 rasmoste rasmoste  318627 Apr 13 05:38 onion-grab-3.stderr
    -rw-rw-r-- 1 rasmoste rasmoste  312774 Apr 13 00:34 onion-grab-3.stdout
    -rw-rw-r-- 1 rasmoste rasmoste 3117995 Apr 11 20:25 onion-grab.stderr
    -rw-rw-r-- 1 rasmoste rasmoste 3034130 Apr 11 20:25 onion-grab.stdout
    -rw-rw-r-- 1 rasmoste rasmoste   42529 Apr  3 22:12 sysctl.txt
    -rw-rw-r-- 1 rasmoste rasmoste     823 Apr  3 22:11 ulimit.txt

Then stop the cronjob that creates bw outputs (`crontab -e`).

Zip, checksum, and transfer to VM-1:

    $ zip -r de-fra.zip de-fra/
    $ sha256sum de-fra.zip
    2ea1f053decea3915b29bc60c2f954da55ea48f6d8ab9f47112caddf3a2e2f7f  de-fra.zip