# onion-grab

A tool that visits a list of domains over HTTPS to see if they have
[Onion-Location][] configured.

[Onion-Location]: https://community.torproject.org/onion-services/advanced/onion-location/

**Warning:** research prototype.

## Quickstart
### Install

You will need a [Go compiler][] on the local system:

    $ which go >/dev/null || echo "Go compiler is not in PATH"

[Go compiler]: https://go.dev/doc/install

Install `onion-grab`:

    $ go install gitlab.torproject.org/tpo/onion-services/onion-grab@latest

List all options:

    $ onion-grab -h

### Basic usage

Store one domain per line in a file:

    $ cat domains.lst
    www.eff.org
    www.qubes-os.org
    www.torproject.org

Run onion-grab with default parameters:

    $ onion-grab -i domains.lst
    2023/04/07 20:29:45 INFO: ctrl+C to exit prematurely
    2023/04/07 20:29:45 INFO: starting 128 workers with limit 64/s
    2023/04/07 20:29:45 INFO: starting work receiver
    2023/04/07 20:29:45 INFO: starting work generator
    www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
    2023/04/07 20:29:50 INFO: metrics@receiver:
    Processed: 3
    Success: 3 (Onion-Location:2)
    Failure: 0 (See breakdown below)
    Req: 0 (Before sending request)
    DNS: 0 (NotFound:0 Timeout:0 Other:0)
    TCP: 0 (Timeout:0 Syscall:0)
    TLS: 0 (Cert:0 Other:0)
    3xx: 0 (Too many redirects)
    EOF: 0 (Unclear meaning)
    CTX: 0 (Deadline exceeded)
    ???: 0 (Other errors)
    2023/04/07 20:29:51 INFO: about to exit in at most 11s, reading remaining answers
    2023/04/07 20:29:57 INFO: metrics@receiver: summary:
    Processed: 3
    Success: 3 (Onion-Location:2)
    Failure: 0 (See breakdown below)
    Req: 0 (Before sending request)
    DNS: 0 (NotFound:0 Timeout:0 Other:0)
    TCP: 0 (Timeout:0 Syscall:0)
    TLS: 0 (Cert:0 Other:0)
    3xx: 0 (Too many redirects)
    EOF: 0 (Unclear meaning)
    CTX: 0 (Deadline exceeded)
    ???: 0 (Other errors)
    2023/04/07 20:29:57 INFO: measurement duration was 12s

Sites with Onion-Location are printed to stdout, here showing that
`www.torproject.org` configures it with an HTTP header while `www.qubes-os.org`
does it with an HTML attribute. All three sites connected successfully.
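
To illustrate the two places a site can set Onion-Location, here is a minimal Python sketch that looks for the `Onion-Location` HTTP header and for a `<meta http-equiv="onion-location" content="...">` tag in the HTML body. It is not how onion-grab itself is implemented (onion-grab is written in Go); the names `MetaOnionLocation` and `find_onion_location` are hypothetical, and the sketch assumes the response headers and body have already been fetched.

```python
from html.parser import HTMLParser


class MetaOnionLocation(HTMLParser):
    """Collects the content of <meta http-equiv="onion-location"> tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "onion-location":
            self.values.append(attrs.get("content") or "")


def find_onion_location(headers, body):
    """Return (header_value, attribute_value); "" means not set.

    Hypothetical helper: `headers` is a dict of response headers and
    `body` is the decoded HTML document.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    header = lowered.get("onion-location", "")

    parser = MetaOnionLocation()
    parser.feed(body)
    attribute = parser.values[0] if parser.values else ""
    return header, attribute
```

The returned pair mirrors the `header=` and `attribute=` fields in the output above: an empty field means that mechanism was not used by the site.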

If onion-grab is redirected, e.g., from `example.org` to `www.example.org`,
any Onion-Location configuration it finds is associated with the
*redirected domain name* rather than the original domain name.

Errors are classified by type in the metrics output, with relatively few
ending up in the catch-all `???` category.

### Scripts

Digest the results, here stored as `onion-grab.stdout`:

    $ cat onion-grab.stdout
    www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
    $ ./scripts/digest.py -i onion-grab.stdout
    digest.py:25 INFO: found 1 HTTP headers with Onion-Location
    digest.py:26 INFO: found 1 HTML meta attributes with Onion-Location
    digest.py:27 INFO: found 2 unqiue domain names that set Onion-Location
    digest.py:28 INFO: found 2 unique two-label onion addresses in the process
    digest.py:30 INFO: storing domains with valid Onion-Location configurations in domains.txt
    digest.py:35 INFO: storing two-label onion addresses that domains referenced in onions.txt
    $ cat domains.txt
    www.qubes-os.org http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html
    $ cat onions.txt
    qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion www.qubes-os.org
    2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion www.torproject.org

In other words, the digest script prints some information and writes two files:

- `domains.txt`: domains with a valid Onion-Location configuration. The
  listed Onion-Location values are de-duplicated and space-separated.
- `onions.txt`: two-label `.onion` addresses that were discovered. The listed
  domains referenced this address in their Onion-Location configuration,
  possibly with subdomains, paths, etc., that were removed. Pruning the
  Onion-Location values this way is useful for estimating the number of onions.
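
The pruning described above can be sketched in a few lines of Python. This is an illustration only, not the code in `scripts/digest.py`: the function names are hypothetical, and the sketch only checks that the hostname ends in `.onion`, whereas a real digest would likely validate the onion address more strictly.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parse_line(line):
    """Split 'domain header=URL attribute=URL' into (domain, [values])."""
    domain, header, attribute = line.split()
    # Keep everything after the first '=', then drop empty fields.
    values = [header.partition("=")[2], attribute.partition("=")[2]]
    return domain, [v for v in values if v]


def two_label_onion(value):
    """Reduce an Onion-Location URL to '<address>.onion', else None."""
    host = urlsplit(value).hostname or ""
    labels = host.split(".")
    if len(labels) < 2 or labels[-1] != "onion" or not labels[-2]:
        return None
    # Subdomains, ports, paths and query strings are all discarded here.
    return ".".join(labels[-2:])


def digest(lines):
    """Return (domain -> Onion-Location values, onion -> domains)."""
    domains = defaultdict(set)
    onions = defaultdict(set)
    for line in lines:
        domain, values = parse_line(line)
        for value in values:
            onion = two_label_onion(value)
            if onion is None:
                continue  # assumption: skip values that are not .onion URLs
            domains[domain].add(value)
            onions[onion].add(domain)
    return domains, onions
```

Because `onions` is keyed by the two-label address, several domains (or several paths on the same onion service) collapse into a single entry, which is what makes the count a reasonable estimate of distinct onions.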

See [scripts/test.sh](./scripts/test.sh) if you are looking to test
different `onion-grab` configurations. You may also find
[scripts/measure.sh](scripts/measure.sh) to be a useful measurement script.

## Running a larger measurement

See [docs/operations.md](TODO) for measurements of [Tranco top-1M][] and
[ct-sans][].

[Tranco top-1M]: https://tranco-list.eu/latest_list
[ct-sans]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md

## Contact
- rasmus (at) rgdd (dot) se
## Licence
BSD 2-Clause License