# onion-grab

A tool that visits a list of domains over HTTPS to see if they have
[Onion-Location][] configured.

[Onion-Location]: https://community.torproject.org/onion-services/advanced/onion-location/

**Warning:** research prototype. The source code may also be moved.

## Quickstart

You will need a Go compiler on the local system:

    $ which go >/dev/null || echo "Go compiler is not in PATH"

Install `onion-grab`:

    $ go install git.cs.kau.se/rasmoste/onion-grab@latest

List all options:

    $ onion-grab -h

### Basic usage

Store domains in a file, one domain per line:

    $ cat domains.lst
    www.eff.org
    www.qubes-os.org
    www.torproject.org
    $ onion-grab -i domains.lst
    2023/03/25 17:43:30 INFO: starting await handler, ctrl+C to exit
    2023/03/25 17:43:30 INFO: starting 2 workers
    2023/03/25 17:43:30 INFO: starting work aggregator
    2023/03/25 17:43:30 INFO: generating work
    www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
    2023/03/25 17:43:40 INFO: about to exit, reading remaining answers
    2023/03/25 17:43:50 SUMMARY: 3/3 connected, 2 sites configured Onion-Location

Sites with Onion-Location are printed to stdout. Here, `www.torproject.org`
configures it with an HTTP header, while `www.qubes-os.org` does so with an
HTML attribute. All three sites connected successfully.

### Working with a larger list

Below, the [Tranco top-1M][] list is used as an example: 100 workers are
specified (`-w 100`), metrics are printed every 15s (`-m 15s`), and a sanity
check against a site that has Onion-Location configured and should be
reachable (`-c rgdd.se`) is carried out every 60s (`-C 60s`).

    $ cut -d ',' -f2 top-1m.csv > top-1m.lst
    $ onion-grab -i top-1m.lst -w 100 -m 15s -C 60s -c rgdd.se
    2023/03/25 17:44:20 INFO: starting await handler, ctrl+C to exit
    2023/03/25 17:44:20 INFO: starting checker
    2023/03/25 17:44:20 INFO: starting 100 workers
    2023/03/25 17:44:20 INFO: starting work aggregator
    2023/03/25 17:44:20 INFO: generating work
    nytimes.com header=https://www.nytimesn7cgmftshazwhfgzm37qxb44r64ytbb2dj3x62d2lljsciiyd.onion/ attribute=
    twitter.com header= attribute=https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/
    theguardian.com header=https://www.guardian2zotagl6tmjucg3lrhxdk4dw3lhbqnkvvkywawy3oqfoprid.onion/international attribute=
    2023/03/25 17:44:22 Transport: unhandled response frame type *http.http2UnknownFrame
    2023/03/25 17:44:31 Transport: unhandled response frame type *http.http2UnknownFrame
    dw.com header=https://www.dwnewsgngmhlplxy6o2twtfgjnrnjxbegbwqx6wnotdhkzt562tszfid.onion/ attribute=
    brave.com header=https://brave4u7jddbv7cyviptqjc7jusxh72uik7zt6adtckl5f4nwy2v72qd.onion/index.html attribute=
    2023/03/25 17:44:35 INFO: currently 72.3 sites/s, 72.3 sites/s since start
    guardian.co.uk header=https://www.guardian2zotagl6tmjucg3lrhxdk4dw3lhbqnkvvkywawy3oqfoprid.onion/international attribute=
    proton.me header=https://protonmailrmez3lotccipshtkleegetolb73fuirgj7r4o4vfu7ozyd.onion/ attribute=
    voanews.com header=https://www.voanews5aitmne6gs2btokcacixclgfl43cv27sirgbauyyjylwpdtqd.onion/ attribute=
    2023/03/25 17:44:50 INFO: currently 64.3 sites/s, 68.3 sites/s since start
    ^C2023/03/25 17:44:51 INFO: about to exit, reading remaining answers
    2023/03/25 17:44:51 NOTICE: only read up until line 2089
    2023/03/25 17:45:01 SUMMARY: 1488/2089 connected, 8 sites configured Onion-Location

[Tranco top-1M]: https://tranco-list.eu/latest_list

Note that `ctrl+C` can be used to exit early, as shown above.

To continue from where you left off (line `2089`), specify the `-n` option on
the next run:

    $ onion-grab -i top-1m.lst -w 100 -m 15s -C 60s -c rgdd.se -n 2089
    2023/03/25 17:45:57 INFO: starting await handler, ctrl+C to exit
    2023/03/25 17:45:57 INFO: starting checker
    2023/03/25 17:45:57 INFO: starting 100 workers
    2023/03/25 17:45:57 INFO: starting work aggregator
    2023/03/25 17:45:57 INFO: generating work
    cia.gov header= attribute=http://ciadotgov4sjwlzihbbgxnqg3xiyrg7so2r2o3lt5wz5ypk4sxyjstad.onion
    2023/03/25 17:46:12 INFO: currently 79.9 sites/s, 79.9 sites/s since start
    propublica.org header=http://p53lf57qovyuvwsc6xnrppyply3vtqm7l6pcobkmyqsiofyeznfu5uqd.onion/ attribute=
    theintercept.com header=https://54dus3ggt7uxz7wjvhkia2ntxmz5lkhbvgohrwur43trt3d6vrcvfmqd.onion/ attribute=
    torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
    2023/03/25 17:46:27 INFO: currently 75.1 sites/s, 77.5 sites/s since start
    ^C2023/03/25 17:46:28 INFO: about to exit, reading remaining answers
    2023/03/25 17:46:28 NOTICE: only read up until line 4487 (line 2398 relative to start)
    2023/03/25 17:46:38 SUMMARY: 1609/2399 connected, 4 sites configured Onion-Location

## Known issues

### Too many parallel workers

**TODO:** update this section; the `-c` and `-C` options have been ripped out.

Here's what would happen if the local system cannot handle the number of
workers:

    $ onion-grab -i top-1m.lst -w 1000 -m 15s -C 60s -c rgdd.se
    2023/03/25 17:47:36 INFO: starting await handler, ctrl+C to exit
    2023/03/25 17:47:36 INFO: starting checker
    2023/03/25 17:47:36 INFO: starting 1000 workers
    2023/03/25 17:47:36 INFO: starting work aggregator
    2023/03/25 17:47:36 INFO: generating work
    2023/03/25 17:47:36 Transport: unhandled response frame type *http.http2UnknownFrame
    twitter.com header= attribute=https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/
    2023/03/25 17:47:37 Transport: unhandled response frame type *http.http2UnknownFrame
    brave.com header=https://brave4u7jddbv7cyviptqjc7jusxh72uik7zt6adtckl5f4nwy2v72qd.onion/index.html attribute=
    2023/03/25 17:47:51 INFO: currently 151.3 sites/s, 151.3 sites/s since start
    2023/03/25 17:48:06 INFO: currently 78.1 sites/s, 114.7 sites/s since start
    2023/03/25 17:48:21 INFO: currently 121.9 sites/s, 117.1 sites/s since start
    2023/03/25 17:48:36 INFO: currently 78.1 sites/s, 107.4 sites/s since start
    2023/03/25 17:48:46 ERROR: checker expected onion for {Domain:rgdd.se OK:false HTTP: HTML:}
    2023/03/25 17:48:46 NOTICE: only read up until line 7442
    2023/03/25 17:48:46 INFO: about to exit, reading remaining answers
    2023/03/25 17:48:56 SUMMARY: 232/7442 connected, 2 sites configured Onion-Location

Only 232 of the 7442 sites connected, and the run stopped once the checker
failed to find Onion-Location for `rgdd.se`. This is most likely an OS
problem, not an onion-grab problem. Debug hints:

- Stop and disable `systemd-resolved`, then specify a recursive resolver that
  can handle the expected load.
- You may need to tinker with per-process resource limits and kernel
  tunables; see `ulimit -a` and `sysctl -a` for what can be configured. For
  example, if you find that the error is caused by too many open files, try
  increasing the value of `ulimit -n`.

**Credit:** Björn Töpel helped debug this issue.

**Note:** domains with Onion-Location are likely to be missed if `-n 7442` is
used here in a subsequent run.
For example, with `-C 60s` and an average of 100 domains/s, it would be wise
to roll back _at least_ 6000 lines (60s × 100 domains/s). This should be a
last-resort option; it is mainly here to sanity-check long measurements.

### Misc notes

We use the default `net.Dial` function, which in turn uses
[goLookupIPCNAMEOrder][] for DNS lookups with the recursive name servers in
`/etc/resolv.conf`. For example, with

    $ cat /etc/resolv.conf
    nameserver 8.8.8.8
    nameserver 8.8.4.4

the query is first sent to `8.8.8.8`, and then to `8.8.4.4` if no valid answer
has been received yet ([lines 663-778][]).

[goLookupIPCNAMEOrder]: https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/net/dnsclient_unix.go#L595-L804
[lines 663-778]: https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/net/dnsclient_unix.go#L663-L778

## Contact

- rasmus (at) rgdd (dot) se

## Licence

BSD 2-Clause License