author | Rasmus Dahlberg <rasmus@rgdd.se> | 2025-03-11 19:43:25 +0000 |
---|---|---|
committer | Rasmus Dahlberg <rasmus@rgdd.se> | 2025-03-11 19:43:25 +0000 |
commit | c6201722350800ee6f2cb457d4db5a09df5d85cc (patch) | |
tree | c62883d3489053985fef7c1895a1f0e8190e3b34 | |
parent | 970a3582e772ab878df324753515a650cede18d8 (diff) | |
parent | c02eedad2906c34e2b5c8102a91f5342a6d79bb8 (diff) |
Merge branch 'rgdd/minor-nits' into 'main'
Fix a few minor nits
See merge request tpo/onion-services/onion-grab!3
-rw-r--r-- | README.md | 8 | ||||
-rw-r--r-- | docs/operations.md | 16 |
2 files changed, 14 insertions, 10 deletions
@@ -114,17 +114,17 @@ In other words, the digest script prints some information and writes two files:
 possibly with subdomains, paths, etc., that were removed. Such pruning of the
 set Onion-Location values is useful to estimate the number of onions.

-See [scripts/test.sh](./scripts/test.sh) and if you are looking to test
-different `onion-grab` configuration. You may find
+See [scripts/test.sh](./scripts/test.sh) if you are looking to test different
+`onion-grab` configurations. You may find
 [scripts/measure.sh](scripts/measure.sh) to be a useful measurement script.

 ## Running a larger measurement

-See [docs/operations.md](TODO)
+See [docs/operations.md](./docs/operations.md)
 for measurements of [Tranco top-1M][] and [ct-sans][].

 [Tranco top-1M]: https://tranco-list.eu/latest_list
-[ct-sans]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[ct-sans]: https://git.rgdd.se/ct-sans/about/docs/operations.md

 ## Contact

diff --git a/docs/operations.md b/docs/operations.md
index 1e437f6..a7785f1 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -5,7 +5,7 @@ about the local systems and a timeline for our operations leading up to the
 results for [Tranco top-1m][] and [SANs in CT logs][] during April, 2023.

 [Tranco top-1m]: https://tranco-list.eu/
-[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[SANs in CT logs]: https://git.rgdd.se/ct-sans/about/docs/operations.md

 ## Summary

@@ -14,13 +14,17 @@ unique two-label `.onion` domains were found from 285 Onion-Location sites.

 The time to conduct the full measurement for [SANs in CT logs][] was ~10 days.
 3330 unique two-label `.onion` domains were configured from 26937 unique sites.
-13956 of those unique sites have the same Onion-Location configuration as
-Twitter, which likely means that they copied some of their HTML attributes.
+13956 of those "unique sites" had the same Onion-Location configuration as
+Twitter. At first this was surprising, but it was eventually explained by
+onion-grab following redirects without attributing the configured Onion-Location
+with the redirected destination, see [20] towards the bottom of the timeline.

 The collected data sets are available here:

- - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
- - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
+ - <https://dart.cse.kau.se/ol-measurements-and-fp/onion-grab/2023-04-03-tranco.zip>,
+ `sha256sum` 1f4a0b4009486bce83262f8e3a58ec50757c3f49305cfa427dadbb10dc4b8c1b
+ - <https://dart.cse.kau.se/ol-measurements-and-fp/onion-grab/2023-04-03-ct-sans.zip>,
+ `sha256sum` 8d476da6077c7bff2c0afbe444344c9549ad0d1b64cacfd525a7c65dec68529c

 For further information about system configurations and operations, read on.

@@ -270,7 +274,7 @@ a good balance between found Onion-Locations, errors, and timeliness of results.
 The [ct-sans dataset][] that we will `onion-grab` in the full measurement was
 collected and assembled at 2023-04-03. It contains 0.91B unique SANs.

-[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[ct-sans dataset]: https://git.rgdd.se/ct-sans/about/docs/operations.md

 To avoid biases like encountering the same errors at all VMs due to the order in
 which the sites were visited, the dataset is shuffled separately before use.
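
The updated docs/operations.md text lists a `sha256sum` digest next to each data set archive. As a minimal sketch of how a reader might check a downloaded archive against those digests, the Python snippet below hashes local files and compares the result. It is not part of the commit: the helper names `sha256_of_file` and `verify` are hypothetical, and it assumes the archives were already downloaded into one directory under their original file names.

```python
import hashlib
from pathlib import Path

# Digests copied from the updated docs/operations.md text in this commit.
# The local file names are an assumption: archives saved under their
# original names.
EXPECTED = {
    "2023-04-03-tranco.zip":
        "1f4a0b4009486bce83262f8e3a58ec50757c3f49305cfa427dadbb10dc4b8c1b",
    "2023-04-03-ct-sans.zip":
        "8d476da6077c7bff2c0afbe444344c9549ad0d1b64cacfd525a7c65dec68529c",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each archive name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = None
        else:
            results[name] = sha256_of_file(path) == expected
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use flat for large archives. On systems with coreutils, running `sha256sum` on each file and comparing the output by eye gives the same check.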