author Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-11 19:43:25 +0000
committer Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-11 19:43:25 +0000
commit c6201722350800ee6f2cb457d4db5a09df5d85cc
tree c62883d3489053985fef7c1895a1f0e8190e3b34
parent 970a3582e772ab878df324753515a650cede18d8
parent c02eedad2906c34e2b5c8102a91f5342a6d79bb8
Merge branch 'rgdd/minor-nits' into 'main'

Fix a few minor nits

See merge request tpo/onion-services/onion-grab!3

-rw-r--r-- README.md 8
-rw-r--r-- docs/operations.md 16
2 files changed, 14 insertions, 10 deletions
diff --git a/README.md b/README.md
index 4ed4db5..507a894 100644
--- a/README.md
+++ b/README.md
@@ -114,17 +114,17 @@ In other words, the digest script prints some information and writes two files:
possibly with subdomains, paths, etc., that were removed. Such pruning of
the set Onion-Location values is useful to estimate the number of onions.
-See [scripts/test.sh](./scripts/test.sh) and if you are looking to test
-different `onion-grab` configuration. You may find
+See [scripts/test.sh](./scripts/test.sh) if you are looking to test different
+`onion-grab` configurations. You may find
[scripts/measure.sh](scripts/measure.sh) to be a useful measurement script.
## Running a larger measurement
-See [docs/operations.md](TODO)
+See [docs/operations.md](./docs/operations.md)
for measurements of [Tranco top-1M][] and [ct-sans][].
[Tranco top-1M]: https://tranco-list.eu/latest_list
-[ct-sans]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[ct-sans]: https://git.rgdd.se/ct-sans/about/docs/operations.md
## Contact
diff --git a/docs/operations.md b/docs/operations.md
index 1e437f6..a7785f1 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -5,7 +5,7 @@ about the local systems and a timeline for our operations leading up to the
results for [Tranco top-1m][] and [SANs in CT logs][] during April, 2023.
[Tranco top-1m]: https://tranco-list.eu/
-[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[SANs in CT logs]: https://git.rgdd.se/ct-sans/about/docs/operations.md
## Summary
@@ -14,13 +14,17 @@ unique two-label `.onion` domains were found from 285 Onion-Location sites.
The time to conduct the full measurement for [SANs in CT logs][] was ~10 days.
3330 unique two-label `.onion` domains were configured from 26937 unique sites.
-13956 of those unique sites have the same Onion-Location configuration as
-Twitter, which likely means that they copied some of their HTML attributes.
+13956 of those "unique sites" had the same Onion-Location configuration as
+Twitter. At first this was surprising, but it was eventually explained by
+onion-grab following redirects without attributing the configured Onion-Location
+with the redirected destination, see [20] towards the bottom of the timeline.
The collected data sets are available here:
- - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
- - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
+ - <https://dart.cse.kau.se/ol-measurements-and-fp/onion-grab/2023-04-03-tranco.zip>,
+ `sha256sum` 1f4a0b4009486bce83262f8e3a58ec50757c3f49305cfa427dadbb10dc4b8c1b
+ - <https://dart.cse.kau.se/ol-measurements-and-fp/onion-grab/2023-04-03-ct-sans.zip>,
+ `sha256sum` 8d476da6077c7bff2c0afbe444344c9549ad0d1b64cacfd525a7c65dec68529c
For further information about system configurations and operations, read on.
@@ -270,7 +274,7 @@ a good balance between found Onion-Locations, errors, and timeliness of results.
The [ct-sans dataset][] that we will `onion-grab` in the full measurement was
collected and assembled at 2023-04-03. It contains 0.91B unique SANs.
-[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[ct-sans dataset]: https://git.rgdd.se/ct-sans/about/docs/operations.md
To avoid biases like encountering the same errors at all VMs due to the order in
which the sites were visited, the dataset is shuffled separately before use.
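
The updated `docs/operations.md` now lists a SHA-256 digest next to each data set. A minimal Python sketch for checking a downloaded archive against those digests follows. The names `sha256_file`, `verify_download`, and `EXPECTED_SHA256` are hypothetical and are not part of onion-grab. On most systems, running `sha256sum` on the file gives the same answer.

```python
import hashlib
import os

# Digests as listed in docs/operations.md (hypothetical table name).
EXPECTED_SHA256 = {
    "2023-04-03-tranco.zip": "1f4a0b4009486bce83262f8e3a58ec50757c3f49305cfa427dadbb10dc4b8c1b",
    "2023-04-03-ct-sans.zip": "8d476da6077c7bff2c0afbe444344c9549ad0d1b64cacfd525a7c65dec68529c",
}


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected=None):
    """Return True if the file's digest matches.

    If no digest is given, look it up by file name in EXPECTED_SHA256
    (raises KeyError for unknown names). Comparison is case-insensitive.
    """
    if expected is None:
        expected = EXPECTED_SHA256[os.path.basename(path)]
    return sha256_file(path) == expected.lower()
```

For example, `verify_download("2023-04-03-tranco.zip")` looks up the expected digest by file name, so the archives should keep their original names after download.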
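
The new explanation of the 13956 Twitter-like sites hinges on redirects: a site from the input list may redirect elsewhere, and an Onion-Location seen at the destination was counted for the original site. A minimal sketch of recording both hosts follows. It assumes that only the `Onion-Location` HTTP header is checked (HTML `<meta http-equiv>` tags are ignored) and uses Python's `urllib.request`, which follows redirects by default. This is not how onion-grab is implemented; `attribute`, `fetch`, `group_by_final_site`, and `OnionLocationRecord` are hypothetical.

```python
import urllib.request
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class OnionLocationRecord:
    requested_site: str            # host that appeared in the input list
    final_site: str                # host that actually served the response
    onion_location: Optional[str]  # Onion-Location header value, if any

    @property
    def redirected_off_site(self) -> bool:
        return self.requested_site != self.final_site


def attribute(requested_url, final_url, onion_location):
    """Attribute an Onion-Location value to the host that served it."""
    return OnionLocationRecord(
        requested_site=urlsplit(requested_url).hostname or "",
        final_site=urlsplit(final_url).hostname or "",
        onion_location=onion_location,
    )


def fetch(url, timeout=10.0):
    """Fetch url, following redirects, and return an attributed record.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # geturl() is the final URL after any redirects were followed.
        return attribute(url, resp.geturl(), resp.headers.get("Onion-Location"))


def group_by_final_site(records):
    """Map each serving host to the set of Onion-Location values it sent."""
    out = {}
    for r in records:
        if r.onion_location:
            out.setdefault(r.final_site, set()).add(r.onion_location)
    return out
```

Grouping by `final_site` collapses many input sites that redirect to the same destination into a single entry, which is the effect the updated summary describes.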