From 4179fc6d46c33a12bccbddcffe7957f18a9cc882 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 11 Mar 2025 20:17:44 +0100
Subject: Fix typo

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 4ed4db5..2090ff1 100644
--- a/README.md
+++ b/README.md
@@ -114,8 +114,8 @@ In other words, the digest script prints some information and writes two files:
 possibly with subdomains, paths, etc., that were removed. Such pruning of the set
 Onion-Location values is useful to estimate the number of onions.

-See [scripts/test.sh](./scripts/test.sh) and if you are looking to test
-different `onion-grab` configuration. You may find
+See [scripts/test.sh](./scripts/test.sh) if you are looking to test different
+`onion-grab` configurations. You may find
 [scripts/measure.sh](scripts/measure.sh) to be a useful measurement script.

 ## Running a larger measurement
-- 
cgit v1.2.3

From 8d328e47834f097bf6174dd95794a8bb2ba12885 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 11 Mar 2025 20:19:41 +0100
Subject: Fix broken links

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 2090ff1..507a894 100644
--- a/README.md
+++ b/README.md
@@ -120,11 +120,11 @@ See [scripts/test.sh](./scripts/test.sh) if you are looking to test different

 ## Running a larger measurement

-See [docs/operations.md](TODO)
+See [docs/operations.md](./docs/operations.md)
 for measurements of [Tranco top-1M][] and [ct-sans][].
 [Tranco top-1M]: https://tranco-list.eu/latest_list
-[ct-sans]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[ct-sans]: https://git.rgdd.se/ct-sans/about/docs/operations.md

 ## Contact

-- 
cgit v1.2.3

From 7a113f9d4654a68c0752d383a0e8eda3880eec11 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 11 Mar 2025 20:28:21 +0100
Subject: Update summary in operations timeline in light of [20]

---
 docs/operations.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/operations.md b/docs/operations.md
index 1e437f6..ba5e7f0 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -14,8 +14,10 @@ unique two-label `.onion` domains were found from 285 Onion-Location sites.

 The time to conduct the full measurement for [SANs in CT logs][] was ~10 days.
 3330 unique two-label `.onion` domains were configured from 26937 unique sites.
-13956 of those unique sites have the same Onion-Location configuration as
-Twitter, which likely means that they copied some of their HTML attributes.
+13956 of those "unique sites" had the same Onion-Location configuration as
+Twitter. At first this was surprising, but it was eventually explained by
+onion-grab following redirects without attributing the configured Onion-Location
+with the redirected destination, see [20] towards the bottom of the timeline.

 The collected data sets are available here:

-- 
cgit v1.2.3

From 97cfbcc98ee07ee37504b983a09a579069b05c43 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 11 Mar 2025 20:31:14 +0100
Subject: Set final dataset links and add sha256 checksums

---
 docs/operations.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/operations.md b/docs/operations.md
index ba5e7f0..103969f 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -21,8 +21,10 @@ with the redirected destination, see [20] towards the bottom of the timeline.
 The collected data sets are available here:

- - https://dart.cse.kau.se/onion-grab/2023-04-03-tranco.zip
- - https://dart.cse.kau.se/onion-grab/2023-04-03-ct-sans.zip
+ - ,
+ `sha256sum` 1f4a0b4009486bce83262f8e3a58ec50757c3f49305cfa427dadbb10dc4b8c1b
+ - ,
+ `sha256sum` 8d476da6077c7bff2c0afbe444344c9549ad0d1b64cacfd525a7c65dec68529c

 For further information about system configurations and operations, read on.

-- 
cgit v1.2.3

From c02eedad2906c34e2b5c8102a91f5342a6d79bb8 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Tue, 11 Mar 2025 20:36:36 +0100
Subject: Fix links that point at obsolete repos

---
 docs/operations.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/operations.md b/docs/operations.md
index 103969f..a7785f1 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -5,7 +5,7 @@ about the local systems and a timeline for our operations leading up to the
 results for [Tranco top-1m][] and [SANs in CT logs][] during April, 2023.

 [Tranco top-1m]: https://tranco-list.eu/
-[SANs in CT logs]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[SANs in CT logs]: https://git.rgdd.se/ct-sans/about/docs/operations.md

 ## Summary

@@ -274,7 +274,7 @@ a good balance between found Onion-Locations, errors, and timeliness of results.
 The [ct-sans dataset][] that we will `onion-grab` in the full measurement was
 collected and assembled at 2023-04-03. It contains 0.91B unique SANs.

-[ct-sans dataset]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md
+[ct-sans dataset]: https://git.rgdd.se/ct-sans/about/docs/operations.md

 To avoid biases like encountering the same errors at all VMs due to the order in
 which the sites were visited, the dataset is shuffled separately before use.
-- 
cgit v1.2.3
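
Reviewer note on the "Set final dataset links and add sha256 checksums" patch: the recorded `sha256sum` values let a reader confirm that a downloaded archive matches the published one. The following is only a minimal sketch of such a check, not part of `onion-grab`; the helper names `sha256_of` and `verify` and the local file name in the usage note are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that large archives do not need
    to fit in memory. (Hypothetical helper, for illustration only.)
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's digest equals the published checksum."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Assuming the Tranco archive was saved locally as `tranco.zip` (a hypothetical path), `verify("tranco.zip", "1f4a0b4009486bce83262f8e3a58ec50757c3f49305cfa427dadbb10dc4b8c1b")` would be expected to return `True` for an intact download.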