author | Rasmus Dahlberg <rasmus@rgdd.se> | 2025-03-11 19:06:40 +0000 |
---|---|---|
committer | Rasmus Dahlberg <rasmus@rgdd.se> | 2025-03-11 19:06:40 +0000 |
commit | 970a3582e772ab878df324753515a650cede18d8 (patch) | |
tree | 8ad90042ae02357711bb2b20e469016ee0e90783 | |
parent | 0cfc1759e5548dcfb133a6268c5d49976871cd43 (diff) | |
parent | 1114087cfbb9425f4128e27741d46161530be8e5 (diff) |
Merge branch 'rgdd/fix-redirect-bug' into 'main'
fix: Attribute OL with the domain we arrived at
See merge request tpo/onion-services/onion-grab!2
-rw-r--r-- | README.md | 4 |
-rw-r--r-- | docs/operations.md | 18 |
-rw-r--r-- | main.go | 9 |
-rwxr-xr-x | scripts/digest2.py | 9 |
4 files changed, 37 insertions, 3 deletions
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -78,6 +78,10 @@ Sites with Onion-Location are printed to stdout, here showing that
 `www.torproject.org` configures it with an HTTP header while `www.qubes-os.org`
 does it with an HTML attribute. All three sites connected successfully.
 
+In case that onion-grab is redirected, e.g., from `example.org` to
+`www.example.org`, then any found Onion-Location configuration is associated
+with the *redirected domain name* rather than the original domain name.
+
 In case of errors, the type of error is identified with relatively few `???`.
 
 ### Scripts
diff --git a/docs/operations.md b/docs/operations.md
index 9c213d6..1e437f6 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -131,6 +131,7 @@ In the full measurement, we had to replace Stockholm with Frankfurt (see notes).
 | 2023/04/13 | 05:40 | prepare dataset (de fra) | moving files on VM-3, transfer to VM-1 [19] |
 | 2023/04/13 | 05:50 | experiment is completed | datasets are ready, zipped, and documented |
 | 2023/07/06 | | move source to tpo gitlab | git.cs.kau.se/rasmoste is not a stable home |
+| 2024/07/16 | | onion-grab bug report | wrt. how redirects are followed [20] |
 
 ## Notes
 
@@ -824,3 +825,20 @@ Zip, checksum, and transfer to VM-1:
 $ zip -r de-fra.zip de-fra/
 $ sha256sum de-fra.zip
 2ea1f053decea3915b29bc60c2f954da55ea48f6d8ab9f47112caddf3a2e2f7f de-fra.zip
+
+### 20
+
+Pier found that onion-grab follows redirects without correctly attributing the
+Onion-Location configuration to the destination it was redirected to, see:
+
+ - <https://gitlab.torproject.org/tpo/onion-services/onion-grab/-/issues/1>
+
+This explained an anomaly where it looked like a lot of sites were, e.g.,
+configuring Twitter/X's Onion-Location when in fact they were redirecting.
+
+Use `scripts/digest2.py` to get a cleaner picture of the distribution of sites
+that use HTTP and HTML for configuring Onion-Location. Since this was found
+before the camera-ready deadline, we were able to update §4.1.2 accordingly.
+
+To avoid this bug in the future, onion-grab was patched on 2025-03-11 to (still)
+follow redirects but then associate Onion-Location with the final destination.
diff --git a/main.go b/main.go
--- a/main.go
+++ b/main.go
@@ -129,6 +129,15 @@ func work(ctx context.Context, cli *http.Client, timeout time.Duration, question
 	}
 	defer rsp.Body.Close()
 
+	// We're using an HTTP client that follows redirects. Ensure that we're
+	// attributing any found Onion-Location header with the domain name we were
+	// redirected to, rather than the original domain name that we started
+	// from. If there are no redirects this assignment doesn't change anything.
+	//
+	// More details on why this behavior matters:
+	// https://gitlab.torproject.org/tpo/onion-services/onion-grab/-/issues/1
+	answer.Domain = rsp.Request.URL.Host
+
 	onion, ok := onionloc.HTTP(rsp)
 	if ok {
 		answer.HTTP = onion
diff --git a/scripts/digest2.py b/scripts/digest2.py
index d01293b..0e81840 100755
--- a/scripts/digest2.py
+++ b/scripts/digest2.py
@@ -2,9 +2,12 @@
 
 __program_description ='''
 A script that digests the output of onion-grab. Meant to be used for sorting
-out the number of onion addresses and how they were discovered via O-L. It
-is digest "2" because this was added after discovering a redirect bug. So,
-this output gives a better view of how common HTTP and HTML config really is.
+out the number of onion addresses and how they were discovered via O-L. It is
+digest "2" because this was added after discovering a redirect bug. So, this
+output gives a better view of how common HTTP and HTML config really is because
+here we're doing the analysis based solely on *unique onion addresses*.
+
+Consider using this script if you need to look at data collected before v0.1.0.
 '''
 
 import sys
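The redirect behavior that the `main.go` change relies on can be checked in isolation. The following is a minimal sketch, not onion-grab's actual code: the names `fetch` and `demo`, the two local test servers, and the `http://example.onion/` value are all hypothetical, and a plain header lookup stands in for `onionloc.HTTP`. It only illustrates that Go's default `http.Client` follows redirects and that `rsp.Request.URL.Host` then names the host that served the final response.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// fetch (hypothetical helper) requests rawURL and returns the host we asked
// for, the host we arrived at after any redirects, and the Onion-Location
// header value of the final response (empty if absent).
func fetch(cli *http.Client, rawURL string) (asked, arrived, onion string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", "", err
	}
	rsp, err := cli.Get(rawURL)
	if err != nil {
		return "", "", "", err
	}
	defer rsp.Body.Close()

	// rsp.Request is the request that produced this response. When the
	// client followed redirects, that is the last request in the chain.
	return u.Host, rsp.Request.URL.Host, rsp.Header.Get("Onion-Location"), nil
}

// demo (hypothetical) starts two local servers: src redirects to dst, and
// dst sets a placeholder Onion-Location header.
func demo() (asked, arrived, onion string, err error) {
	dst := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Onion-Location", "http://example.onion/") // placeholder value
	}))
	defer dst.Close()

	src := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, dst.URL, http.StatusFound)
	}))
	defer src.Close()

	return fetch(http.DefaultClient, src.URL)
}

func main() {
	asked, arrived, onion, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("asked=%s arrived=%s onion=%s\n", asked, arrived, onion)
}
```

In this sketch the two hosts differ only by port, but the attribution question is the same as for `example.org` and `www.example.org`: the header belongs to `arrived`, not `asked`.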