# onion-grab

A tool that visits a list of domains over HTTPS to see if they have
[Onion-Location][] configured.

[Onion-Location]: https://community.torproject.org/onion-services/advanced/onion-location/

**Warning:** research prototype.

## Quickstart

### Install

You will need a [Go compiler][] on the local system:

    $ which go >/dev/null || echo "Go compiler is not in PATH"

[Go compiler]: https://go.dev/doc/install

Install `onion-grab`:

    $ go install gitlab.torproject.org/tpo/onion-services/onion-grab@latest

List all options:

    $ onion-grab -h

### Basic usage

Store one domain per line in a file:

    $ cat domains.lst
    www.eff.org
    www.qubes-os.org
    www.torproject.org

Run onion-grab with default parameters:

    $ onion-grab -i domains.lst
    2023/04/07 20:29:45 INFO: ctrl+C to exit prematurely
    2023/04/07 20:29:45 INFO: starting 128 workers with limit 64/s
    2023/04/07 20:29:45 INFO: starting work receiver
    2023/04/07 20:29:45 INFO: starting work generator
    www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
    2023/04/07 20:29:50 INFO: metrics@receiver:
    
      Processed: 3
        Success: 3 (Onion-Location:2)
        Failure: 0 (See breakdown below)
            Req: 0 (Before sending request)
            DNS: 0 (NotFound:0 Timeout:0 Other:0)
            TCP: 0 (Timeout:0 Syscall:0)
            TLS: 0 (Cert:0 Other:0)
            3xx: 0 (Too many redirects)
            EOF: 0 (Unclear meaning)
            CTX: 0 (Deadline exceeded)
            ???: 0 (Other errors)
    
    2023/04/07 20:29:51 INFO: about to exit in at most 11s, reading remaining answers
    2023/04/07 20:29:57 INFO: metrics@receiver: summary:
    
      Processed: 3
        Success: 3 (Onion-Location:2)
        Failure: 0 (See breakdown below)
            Req: 0 (Before sending request)
            DNS: 0 (NotFound:0 Timeout:0 Other:0)
            TCP: 0 (Timeout:0 Syscall:0)
            TLS: 0 (Cert:0 Other:0)
            3xx: 0 (Too many redirects)
            EOF: 0 (Unclear meaning)
            CTX: 0 (Deadline exceeded)
            ???: 0 (Other errors)
    
    2023/04/07 20:29:57 INFO: measurement duration was 12s

Sites with Onion-Location are printed to stdout, here showing that
`www.torproject.org` configures it with an HTTP header while `www.qubes-os.org`
does it with an HTML attribute.  All three sites connected successfully.
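Judging from the example output, each stdout line has the form `<domain> header=<value> attribute=<value>`, where an empty value means that mechanism was not used. A minimal Go sketch for parsing such lines follows; the `Result` type and `parseLine` helper are hypothetical, and it assumes values never contain spaces.

```go
package main

import (
	"fmt"
	"strings"
)

// Result holds one parsed onion-grab stdout line (hypothetical type).
type Result struct {
	Domain    string
	Header    string // Onion-Location HTTP header value, empty if unset
	Attribute string // Onion-Location HTML attribute value, empty if unset
}

// parseLine parses "<domain> header=<value> attribute=<value>".
// Assumption: values contain no whitespace, as in the example output.
func parseLine(line string) (Result, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return Result{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	hdr, ok1 := strings.CutPrefix(fields[1], "header=")
	attr, ok2 := strings.CutPrefix(fields[2], "attribute=")
	if !ok1 || !ok2 {
		return Result{}, fmt.Errorf("malformed line: %q", line)
	}
	return Result{Domain: fields[0], Header: hdr, Attribute: attr}, nil
}

func main() {
	lines := []string{
		"www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/",
		"www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=",
	}
	for _, line := range lines {
		r, err := parseLine(line)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%s header=%t attribute=%t\n", r.Domain, r.Header != "", r.Attribute != "")
	}
}
```

`strings.CutPrefix` requires Go 1.20 or later.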

If errors occur, their types are broken down in the metrics output, and relatively few errors should end up in the unclassified `???` category.

### Scripts

Digest the results, here stored as `onion-grab.stdout`:

    $ cat onion-grab.stdout
    www.qubes-os.org header= attribute=http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org header=http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html attribute=
    $ ./scripts/digest.py -i onion-grab.stdout
    digest.py:25 INFO: found 1 HTTP headers with Onion-Location
    digest.py:26 INFO: found 1 HTML meta attributes with Onion-Location
    digest.py:27 INFO: found 2 unique domain names that set Onion-Location
    digest.py:28 INFO: found 2 unique two-label onion addresses in the process
    digest.py:30 INFO: storing domains with valid Onion-Location configurations in domains.txt
    digest.py:35 INFO: storing two-label onion addresses that domains referenced in onions.txt
    $ cat domains.txt
    www.qubes-os.org http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/
    www.torproject.org http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html
    $ cat onions.txt
    qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion www.qubes-os.org
    2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion www.torproject.org

In other words, the digest script prints summary statistics and writes two files:

  - `domains.txt`: domains that configured valid Onion-Location values, either
    as an HTTP header or as an HTML attribute.  The listed Onion-Location
    values are de-duplicated and space-separated.
  - `onions.txt`: two-label `.onion` addresses that were discovered.  The listed
    domains referenced this address in their Onion-Location configuration,
    possibly with subdomains, paths, etc., that were removed.  Such pruning of
    the configured Onion-Location values is useful to estimate the number of
    onions.
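The pruning into two-label `.onion` addresses can be sketched in Go as follows. The `twoLabelOnion` helper is hypothetical and only keeps the last two host labels; `digest.py` may prune or validate values differently (for example, this sketch does not check that the address is a well-formed v3 onion).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// twoLabelOnion reduces an Onion-Location value such as
// "http://www.example.onion/path" to "example.onion" (hypothetical helper).
func twoLabelOnion(onionLocation string) (string, error) {
	u, err := url.Parse(onionLocation)
	if err != nil {
		return "", err
	}
	labels := strings.Split(strings.ToLower(u.Hostname()), ".")
	n := len(labels)
	if n < 2 || labels[n-1] != "onion" || labels[n-2] == "" {
		return "", fmt.Errorf("not an onion URL: %q", onionLocation)
	}
	return labels[n-2] + ".onion", nil
}

func main() {
	// domain -> Onion-Location value, as listed in domains.txt
	found := map[string]string{
		"www.qubes-os.org":   "http://qubesosfasa4zl44o4tws22di6kepyzfeqv3tg4e3ztknltfxqrymdad.onion/",
		"www.torproject.org": "http://2gzyxa5ihm7nsggfxnu52rck2vv4rvmdlkiu3zzui5du4xyclen53wid.onion/index.html",
	}
	onions := map[string][]string{} // two-label onion -> referencing domains
	for domain, loc := range found {
		onion, err := twoLabelOnion(loc)
		if err != nil {
			continue
		}
		onions[onion] = append(onions[onion], domain)
	}
	fmt.Println("unique two-label onions:", len(onions))
}
```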

See [scripts/test.sh](./scripts/test.sh) if you are looking to test different
`onion-grab` configurations.  You may also find
[scripts/measure.sh](./scripts/measure.sh) useful for running measurements.

## Running a larger measurement

See [docs/operations.md](TODO)
for measurements of [Tranco top-1M][] and [ct-sans][].

[Tranco top-1M]: https://tranco-list.eu/latest_list
[ct-sans]: https://git.cs.kau.se/rasmoste/ct-sans/-/blob/main/docs/operations.md

## Contact

  - rasmus (at) rgdd (dot) se

## Licence

BSD 2-Clause License