author Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-11 19:06:40 +0000
committer Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-11 19:06:40 +0000
commit 970a3582e772ab878df324753515a650cede18d8
tree 8ad90042ae02357711bb2b20e469016ee0e90783
parent 0cfc1759e5548dcfb133a6268c5d49976871cd43
parent 1114087cfbb9425f4128e27741d46161530be8e5
Merge branch 'rgdd/fix-redirect-bug' into 'main'

fix: Attribute OL with the domain we arrived at

See merge request tpo/onion-services/onion-grab!2
-rw-r--r--  README.md            4
-rw-r--r--  docs/operations.md  18
-rw-r--r--  main.go              9
-rwxr-xr-x  scripts/digest2.py   9
4 files changed, 37 insertions, 3 deletions
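
The fix in `main.go` below relies on Go's `rsp.Request.URL.Host`, which names the host that served the final response after redirects. As a rough illustration of the same idea outside of Go, here is a minimal Python sketch using only the standard library. It is not part of this commit: `final_host`, `start_server`, `make_redirector`, and `demo` are hypothetical names, and two throwaway local servers stand in for `example.org` and `www.example.org`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Opener that ignores proxy environment variables, so the local demo below
# always talks to the test servers directly (an assumption for the demo).
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def final_host(url, timeout=10.0):
    """Fetch url, following redirects, and return the host we arrived at."""
    with _OPENER.open(url, timeout=timeout) as rsp:
        # rsp.url is the URL of the response that was finally served; it
        # equals the requested URL when no redirect took place.
        return urlsplit(rsp.url).netloc


class Destination(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site we are redirected to."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def make_redirector(location):
    """Build a handler that answers every GET with a 302 to location."""

    class Redirector(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(302)
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, *args):
            pass

    return Redirector


def start_server(handler_cls):
    """Start a throwaway HTTP server on a free local port (demo only)."""
    srv = HTTPServer(("127.0.0.1", 0), handler_cls)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv


def demo():
    """Return (original host, destination host, host we arrived at)."""
    dst = start_server(Destination)
    dst_host = "127.0.0.1:%d" % dst.server_address[1]
    src = start_server(make_redirector("http://%s/" % dst_host))
    src_host = "127.0.0.1:%d" % src.server_address[1]
    try:
        arrived = final_host("http://%s/" % src_host)
    finally:
        for srv in (src, dst):
            srv.shutdown()
            srv.server_close()
    return src_host, dst_host, arrived


if __name__ == "__main__":
    original, destination, arrived = demo()
    print("asked:", original, "arrived at:", arrived)
```

In this sketch the host reported by `final_host` is the destination's, not the one that was asked for, which mirrors the attribution rule described in the README change below.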
diff --git a/README.md b/README.md
index 9f357b3..4ed4db5 100644
--- a/README.md
+++ b/README.md
@@ -78,6 +78,10 @@ Sites with Onion-Location are printed to stdout, here showing that
`www.torproject.org` configures it with an HTTP header while `www.qubes-os.org`
does it with an HTML attribute. All three sites connected successfully.
+If onion-grab is redirected, e.g., from `example.org` to
+`www.example.org`, any Onion-Location configuration it finds is associated
+with the *redirected domain name* rather than the original domain name.
+
In case of errors, the type of error is identified with relatively few `???`.
### Scripts
diff --git a/docs/operations.md b/docs/operations.md
index 9c213d6..1e437f6 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -131,6 +131,7 @@ In the full measurement, we had to replace Stockholm with Frankfurt (see notes).
| 2023/04/13 | 05:40 | prepare dataset (de fra) | moving files on VM-3, transfer to VM-1 [19] |
| 2023/04/13 | 05:50 | experiment is completed | datasets are ready, zipped, and documented |
| 2023/07/06 | | move source to tpo gitlab | git.cs.kau.se/rasmoste is not a stable home |
+| 2024/07/16 | | onion-grab bug report | wrt. how redirects are followed [20] |
## Notes
@@ -824,3 +825,20 @@ Zip, checksum, and transfer to VM-1:
$ zip -r de-fra.zip de-fra/
$ sha256sum de-fra.zip
2ea1f053decea3915b29bc60c2f954da55ea48f6d8ab9f47112caddf3a2e2f7f de-fra.zip
+
+### 20
+
+Pier found that onion-grab follows redirects without correctly attributing the
+Onion-Location configuration to the destination it was redirected to, see:
+
+ - <https://gitlab.torproject.org/tpo/onion-services/onion-grab/-/issues/1>
+
+This explained an anomaly where it looked like a lot of sites were, e.g.,
+configuring Twitter/X's Onion-Location when in fact they were redirecting.
+
+Use `scripts/digest2.py` to get a cleaner picture of the distribution of sites
+that use HTTP and HTML for configuring Onion-Location. Since this was found
+before the camera-ready deadline, we were able to update §4.1.2 accordingly.
+
+To avoid this bug in the future, onion-grab was patched on 2025-03-11 to (still)
+follow redirects but then associate Onion-Location with the final destination.
diff --git a/main.go b/main.go
index 539e86f..72f6dd1 100644
--- a/main.go
+++ b/main.go
@@ -129,6 +129,15 @@ func work(ctx context.Context, cli *http.Client, timeout time.Duration, question
}
defer rsp.Body.Close()
+ // We're using an HTTP client that follows redirects. Ensure that we're
+ // attributing any found Onion-Location header to the domain name we were
+ // redirected to, rather than the original domain name that we started
+ // from. If there are no redirects this assignment doesn't change anything.
+ //
+ // More details on why this behavior matters:
+ // https://gitlab.torproject.org/tpo/onion-services/onion-grab/-/issues/1
+ answer.Domain = rsp.Request.URL.Host
+
onion, ok := onionloc.HTTP(rsp)
if ok {
answer.HTTP = onion
diff --git a/scripts/digest2.py b/scripts/digest2.py
index d01293b..0e81840 100755
--- a/scripts/digest2.py
+++ b/scripts/digest2.py
@@ -2,9 +2,12 @@
__program_description ='''
A script that digests the output of onion-grab. Meant to be used for sorting
-out the number of onion addresses and how they were discovered via O-L. It
-is digest "2" because this was added after discovering a redirect bug. So,
-this output gives a better view of how common HTTP and HTML config really is.
+out the number of onion addresses and how they were discovered via O-L. It is
+digest "2" because this was added after discovering a redirect bug. So, this
+output gives a better view of how common HTTP and HTML config really is because
+here we're doing the analysis based solely on *unique onion addresses*.
+
+Consider using this script if you need to look at data collected before v0.1.0.
'''
import sys