From 3066034cdf6b577fc15e9f389f01bdee7b8f8537 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Sat, 25 Mar 2023 17:58:55 +0100
Subject: Add drafty README

---
 README.md | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 136 insertions(+), 2 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 6f4314a..8356ba4 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,137 @@
-# find-onion
+# onion-grab
 
-docdoc
+A tool that visits a list of domains over HTTPS to see if they have
+[Onion-Location][] configured.
+
+[Onion-Location]: https://community.torproject.org/onion-services/advanced/onion-location/
+
+**Warning:** research prototype. The source code may also be moved.
+
+## Quickstart
+
+You will need a Go compiler on the local system:
+
+    $ which go >/dev/null || echo "Go compiler is not in PATH"
+
+Install `onion-grab`:
+
+    $ go install git.cs.kau.se/rasmoste/onion-grab@latest
+
+List all options:
+
+    $ onion-grab -h
+
+### Basic usage
+
+Store domains in a file; one domain per line:
+
+    $ cat domains.lst
+    www.eff.org
+    www.qubes-os.org
+    www.torproject.org
+    $ onion-grab -i domains.lst
+    2023/03/25 17:43:30 INFO: starting await handler, ctrl+C to exit
+    2023/03/25 17:43:30 INFO: starting 2 workers
+    2023/03/25 17:43:30 INFO: starting work aggregator
+    2023/03/25 17:43:30 INFO: generating work
+    www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
+    www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
+    2023/03/25 17:43:40 INFO: about to exit, reading remaining answers
+    2023/03/25 17:43:50 SUMMARY: 3/3 connected, 2 sites configured Onion-Location
+
+Sites with Onion-Location are printed to stdout, here showing that
+`www.torproject.org` configures it with an HTTP header while `www.qubes-os.org`
+does it with an HTML attribute. All three sites connected successfully.
+
+### Working with a larger list
+
+Below the [Tranco top-1M][] list is used as an example; 100 workers are
+specified, metrics are printed every 15s, and sanity-checks against a site with
+Onion-Location which should be reachable are carried out every 60s.
+
+    $ cut -d ',' -f2 top-1m.csv > top-1m.lst
+    $ onion-grab -i top-1m.lst -w 100 -m 15s -C 60s -c rgdd.se
+    2023/03/25 17:44:20 INFO: starting await handler, ctrl+C to exit
+    2023/03/25 17:44:20 INFO: starting checker
+    2023/03/25 17:44:20 INFO: starting 100 workers
+    2023/03/25 17:44:20 INFO: starting work aggregator
+    2023/03/25 17:44:20 INFO: generating work
+    nytimes.com header=https://www.nytimesn7cgmftshazwhfgzm37qxb44r64ytbb2dj3x62d2lljsciiyd.onion/ attribute=
+    twitter.com header= attribute=https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/
+    theguardian.com header=https://www.guardian2zotagl6tmjucg3lrhxdk4dw3lhbqnkvvkywawy3oqfoprid.onion/international attribute=
+    2023/03/25 17:44:22 Transport: unhandled response frame type *http.http2UnknownFrame
+    2023/03/25 17:44:31 Transport: unhandled response frame type *http.http2UnknownFrame
+    dw.com header=https://www.dwnewsgngmhlplxy6o2twtfgjnrnjxbegbwqx6wnotdhkzt562tszfid.onion/ attribute=
+    brave.com header=https://brave4u7jddbv7cyviptqjc7jusxh72uik7zt6adtckl5f4nwy2v72qd.onion/index.html attribute=
+    2023/03/25 17:44:35 INFO: currently 72.3 sites/s, 72.3 sites/s since start
+    guardian.co.uk header=https://www.guardian2zotagl6tmjucg3lrhxdk4dw3lhbqnkvvkywawy3oqfoprid.onion/international attribute=
+    proton.me header=https://protonmailrmez3lotccipshtkleegetolb73fuirgj7r4o4vfu7ozyd.onion/ attribute=
+    voanews.com header=https://www.voanews5aitmne6gs2btokcacixclgfl43cv27sirgbauyyjylwpdtqd.onion/ attribute=
+    2023/03/25 17:44:50 INFO: currently 64.3 sites/s, 68.3 sites/s since start
+    ^C2023/03/25 17:44:51 INFO: about to exit, reading remaining answers
+    2023/03/25 17:44:51 NOTICE: only read up until line 2089
+    2023/03/25 17:45:01 SUMMARY: 1488/2089 connected, 8 sites configured Onion-Location
+
+[Tranco top-1M]: https://tranco-list.eu/latest_list
+
+Note that `ctrl+C` can be used to exit early as shown above. To continue from
+where you left off (line `2089`), specify the `-n` option on the next run:
+
+    $ onion-grab -i top-1m.lst -w 100 -m 15s -C 60s -c rgdd.se -n 2089
+    2023/03/25 17:45:57 INFO: starting await handler, ctrl+C to exit
+    2023/03/25 17:45:57 INFO: starting checker
+    2023/03/25 17:45:57 INFO: starting 100 workers
+    2023/03/25 17:45:57 INFO: starting work aggregator
+    2023/03/25 17:45:57 INFO: generating work
+    cia.gov header= attribute=http://ciadotgov4sjwlzihbbgxnqg3xiyrg7so2r2o3lt5wz5ypk4sxyjstad.onion
+    2023/03/25 17:46:12 INFO: currently 79.9 sites/s, 79.9 sites/s since start
+    propublica.org header=http://p53lf57qovyuvwsc6xnrppyply3vtqm7l6pcobkmyqsiofyeznfu5uqd.onion/ attribute=
+    theintercept.com header=https://54dus3ggt7uxz7wjvhkia2ntxmz5lkhbvgohrwur43trt3d6vrcvfmqd.onion/ attribute=
+    torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
+    2023/03/25 17:46:27 INFO: currently 75.1 sites/s, 77.5 sites/s since start
+    ^C2023/03/25 17:46:28 INFO: about to exit, reading remaining answers
+    2023/03/25 17:46:28 NOTICE: only read up until line 4487 (line 2398 relative to start)
+    2023/03/25 17:46:38 SUMMARY: 1609/2399 connected, 4 sites configured Onion-Location
+
+## Known issues
+
+### Too many parallel workers
+
+Here's what would happen if the local system cannot handle the number of workers:
+
+    $ onion-grab -i top-1m.lst -w 1000 -m 15s -C 60s -c rgdd.se
+    2023/03/25 17:47:36 INFO: starting await handler, ctrl+C to exit
+    2023/03/25 17:47:36 INFO: starting checker
+    2023/03/25 17:47:36 INFO: starting 1000 workers
+    2023/03/25 17:47:36 INFO: starting work aggregator
+    2023/03/25 17:47:36 INFO: generating work
+    2023/03/25 17:47:36 Transport: unhandled response frame type *http.http2UnknownFrame
+    twitter.com header= attribute=https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/
+    2023/03/25 17:47:37 Transport: unhandled response frame type *http.http2UnknownFrame
+    brave.com header=https://brave4u7jddbv7cyviptqjc7jusxh72uik7zt6adtckl5f4nwy2v72qd.onion/index.html attribute=
+    2023/03/25 17:47:51 INFO: currently 151.3 sites/s, 151.3 sites/s since start
+    2023/03/25 17:48:06 INFO: currently 78.1 sites/s, 114.7 sites/s since start
+    2023/03/25 17:48:21 INFO: currently 121.9 sites/s, 117.1 sites/s since start
+    2023/03/25 17:48:36 INFO: currently 78.1 sites/s, 107.4 sites/s since start
+    2023/03/25 17:48:46 ERROR: checker expected onion for {Domain:rgdd.se OK:false HTTP: HTML:}
+    2023/03/25 17:48:46 NOTICE: only read up until line 7442
+    2023/03/25 17:48:46 INFO: about to exit, reading remaining answers
+    2023/03/25 17:48:56 SUMMARY: 232/7442 connected, 2 sites configured Onion-Location
+
+On a Debian system, it appears that all future HTTP GET requests made by
+`onion-grab` will fail if a worker overload happens. The exact cause is
+unclear. Other programs may be affected too, e.g., `curl` and `Firefox`.
+
+To get back into a normal state, try:
+
+    # systemctl restart systemd-resolved
+
+**Note:** domains with Onion-Location are likely to be missed if `-n 7442` is
+used here in a subsequent run. For example, with `-C 60s` and an average of 100
+domains/s, it would be wise to roll-back _at least_ 6000 lines.
+
+Get in touch if you know a fix, e.g., based on `ulimit` and `sysctl` tinkering.
+
+## Contact
+
+ - rasmus (at) rgdd (dot) se
-- 
cgit v1.2.3