-rw-r--r-- README.md 138
1 files changed, 136 insertions, 2 deletions
diff --git a/README.md b/README.md
index 6f4314a..8356ba4 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,137 @@
-# find-onion
+# onion-grab
-docdoc
+A tool that visits a list of domains over HTTPS to see if they have
+[Onion-Location][] configured.
+
+[Onion-Location]: https://community.torproject.org/onion-services/advanced/onion-location/
+
+**Warning:** this is a research prototype. The source code may also be moved.
+
+## Quickstart
+
+You will need a Go compiler on the local system:
+
+ $ which go >/dev/null || echo "Go compiler is not in PATH"
+
+Install `onion-grab`:
+
+ $ go install git.cs.kau.se/rasmoste/onion-grab@latest
+
+List all options:
+
+ $ onion-grab -h
+
+### Basic usage
+
+Store domains in a file; one domain per line:
+
+ $ cat domains.lst
+ www.eff.org
+ www.qubes-os.org
+ www.torproject.org
+ $ onion-grab -i domains.lst
+ 2023/03/25 17:43:30 INFO: starting await handler, ctrl+C to exit
+ 2023/03/25 17:43:30 INFO: starting 2 workers
+ 2023/03/25 17:43:30 INFO: starting work aggregator
+ 2023/03/25 17:43:30 INFO: generating work
+ www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
+ www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
+ 2023/03/25 17:43:40 INFO: about to exit, reading remaining answers
+ 2023/03/25 17:43:50 SUMMARY: 3/3 connected, 2 sites configured Onion-Location
+
+Sites with Onion-Location are printed to stdout, here showing that
+`www.torproject.org` configures it with an HTTP header while `www.qubes-os.org`
+does it with an HTML attribute. All three sites connected successfully.
+
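Annotation (not part of the commit): the result lines above have the shape `<domain> header=<url> attribute=<url>`, so they are easy to post-process. Below is a minimal Go sketch that reads such lines from stdin and splits them into fields. It assumes whitespace-separated fields and URLs without spaces, which matches the sample output but is not a documented format guarantee. The `result` type and `parseResult` function are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// result holds one parsed output line (hypothetical type for illustration).
type result struct {
	Domain, Header, Attribute string
}

// parseResult parses "<domain> header=<url> attribute=<url>".
// It returns false for anything else, e.g. timestamped log lines.
func parseResult(line string) (result, bool) {
	fields := strings.Fields(line)
	if len(fields) != 3 ||
		!strings.HasPrefix(fields[1], "header=") ||
		!strings.HasPrefix(fields[2], "attribute=") {
		return result{}, false
	}
	return result{
		Domain:    fields[0],
		Header:    strings.TrimPrefix(fields[1], "header="),
		Attribute: strings.TrimPrefix(fields[2], "attribute="),
	}, true
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if r, ok := parseResult(scanner.Text()); ok {
			fmt.Printf("%s\theader=%q\tattribute=%q\n", r.Domain, r.Header, r.Attribute)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
	}
}
```

Lines that do not match the three-field shape are skipped, so the sketch tolerates log messages mixed into its input.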
+### Working with a larger list
+
+The example below uses the [Tranco top-1M][] list with 100 workers (`-w`),
+metrics printed every 15s (`-m`), and a sanity check every 60s (`-C`) against a
+site (`-c`) that configures Onion-Location and should be reachable.
+
+ $ cut -d ',' -f2 top-1m.csv > top-1m.lst
+ $ onion-grab -i top-1m.lst -w 100 -m 15s -C 60s -c rgdd.se
+ 2023/03/25 17:44:20 INFO: starting await handler, ctrl+C to exit
+ 2023/03/25 17:44:20 INFO: starting checker
+ 2023/03/25 17:44:20 INFO: starting 100 workers
+ 2023/03/25 17:44:20 INFO: starting work aggregator
+ 2023/03/25 17:44:20 INFO: generating work
+ nytimes.com header=https://www.nytimesn7cgmftshazwhfgzm37qxb44r64ytbb2dj3x62d2lljsciiyd.onion/ attribute=
+ twitter.com header= attribute=https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/
+ theguardian.com header=https://www.guardian2zotagl6tmjucg3lrhxdk4dw3lhbqnkvvkywawy3oqfoprid.onion/international attribute=
+ 2023/03/25 17:44:22 Transport: unhandled response frame type *http.http2UnknownFrame
+ 2023/03/25 17:44:31 Transport: unhandled response frame type *http.http2UnknownFrame
+ dw.com header=https://www.dwnewsgngmhlplxy6o2twtfgjnrnjxbegbwqx6wnotdhkzt562tszfid.onion/ attribute=
+ brave.com header=https://brave4u7jddbv7cyviptqjc7jusxh72uik7zt6adtckl5f4nwy2v72qd.onion/index.html attribute=
+ 2023/03/25 17:44:35 INFO: currently 72.3 sites/s, 72.3 sites/s since start
+ guardian.co.uk header=https://www.guardian2zotagl6tmjucg3lrhxdk4dw3lhbqnkvvkywawy3oqfoprid.onion/international attribute=
+ proton.me header=https://protonmailrmez3lotccipshtkleegetolb73fuirgj7r4o4vfu7ozyd.onion/ attribute=
+ voanews.com header=https://www.voanews5aitmne6gs2btokcacixclgfl43cv27sirgbauyyjylwpdtqd.onion/ attribute=
+ 2023/03/25 17:44:50 INFO: currently 64.3 sites/s, 68.3 sites/s since start
+ ^C2023/03/25 17:44:51 INFO: about to exit, reading remaining answers
+ 2023/03/25 17:44:51 NOTICE: only read up until line 2089
+ 2023/03/25 17:45:01 SUMMARY: 1488/2089 connected, 8 sites configured Onion-Location
+
+[Tranco top-1M]: https://tranco-list.eu/latest_list
+
+Note that `ctrl+C` can be used to exit early as shown above. To continue from
+where you left off (line `2089`), specify the `-n` option on the next run:
+
+ $ onion-grab -i top-1m.lst -w 100 -m 15s -C 60s -c rgdd.se -n 2089
+ 2023/03/25 17:45:57 INFO: starting await handler, ctrl+C to exit
+ 2023/03/25 17:45:57 INFO: starting checker
+ 2023/03/25 17:45:57 INFO: starting 100 workers
+ 2023/03/25 17:45:57 INFO: starting work aggregator
+ 2023/03/25 17:45:57 INFO: generating work
+ cia.gov header= attribute=http://ciadotgov4sjwlzihbbgxnqg3xiyrg7so2r2o3lt5wz5ypk4sxyjstad.onion
+ 2023/03/25 17:46:12 INFO: currently 79.9 sites/s, 79.9 sites/s since start
+ propublica.org header=http://p53lf57qovyuvwsc6xnrppyply3vtqm7l6pcobkmyqsiofyeznfu5uqd.onion/ attribute=
+ theintercept.com header=https://54dus3ggt7uxz7wjvhkia2ntxmz5lkhbvgohrwur43trt3d6vrcvfmqd.onion/ attribute=
+ torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
+ 2023/03/25 17:46:27 INFO: currently 75.1 sites/s, 77.5 sites/s since start
+ ^C2023/03/25 17:46:28 INFO: about to exit, reading remaining answers
+ 2023/03/25 17:46:28 NOTICE: only read up until line 4487 (line 2398 relative to start)
+ 2023/03/25 17:46:38 SUMMARY: 1609/2399 connected, 4 sites configured Onion-Location
+
+## Known issues
+
+### Too many parallel workers
+
+Here is what happens when the local system cannot handle the number of workers:
+
+ $ onion-grab -i top-1m.lst -w 1000 -m 15s -C 60s -c rgdd.se
+ 2023/03/25 17:47:36 INFO: starting await handler, ctrl+C to exit
+ 2023/03/25 17:47:36 INFO: starting checker
+ 2023/03/25 17:47:36 INFO: starting 1000 workers
+ 2023/03/25 17:47:36 INFO: starting work aggregator
+ 2023/03/25 17:47:36 INFO: generating work
+ 2023/03/25 17:47:36 Transport: unhandled response frame type *http.http2UnknownFrame
+ twitter.com header= attribute=https://twitter3e4tixl4xyajtrzo62zg5vztmjuricljdp2c5kshju4avyoid.onion/
+ 2023/03/25 17:47:37 Transport: unhandled response frame type *http.http2UnknownFrame
+ brave.com header=https://brave4u7jddbv7cyviptqjc7jusxh72uik7zt6adtckl5f4nwy2v72qd.onion/index.html attribute=
+ 2023/03/25 17:47:51 INFO: currently 151.3 sites/s, 151.3 sites/s since start
+ 2023/03/25 17:48:06 INFO: currently 78.1 sites/s, 114.7 sites/s since start
+ 2023/03/25 17:48:21 INFO: currently 121.9 sites/s, 117.1 sites/s since start
+ 2023/03/25 17:48:36 INFO: currently 78.1 sites/s, 107.4 sites/s since start
+ 2023/03/25 17:48:46 ERROR: checker expected onion for {Domain:rgdd.se OK:false HTTP: HTML:}
+ 2023/03/25 17:48:46 NOTICE: only read up until line 7442
+ 2023/03/25 17:48:46 INFO: about to exit, reading remaining answers
+ 2023/03/25 17:48:56 SUMMARY: 232/7442 connected, 2 sites configured Onion-Location
+
+On a Debian system, it appears that all future HTTP GET requests made by
+`onion-grab` will fail if a worker overload happens. The exact cause is
+unclear. Other programs may be affected too, e.g., `curl` and `Firefox`.
+
+To get back into a normal state, try:
+
+ # systemctl restart systemd-resolved
+
+**Note:** domains with Onion-Location are likely to be missed if `-n 7442` is
+used here in a subsequent run, because requests may have been failing since the
+last successful sanity check. For example, with `-C 60s` and an average of 100
+domains/s, it would be wise to roll back _at least_ 60 × 100 = 6000 lines.
+
+Get in touch if you know a fix, e.g., based on `ulimit` and `sysctl` tinkering.
+
+## Contact
+
+ - rasmus (at) rgdd (dot) se