From d44f024386316a364ddf9bc17762883cea3ddfc0 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Mon, 27 Mar 2023 12:46:52 +0200
Subject: Clean-up debug notes in README

---
 README.md | 39 ++++++++++----------------------------
 1 file changed, 10 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 379a94e..c4444f6 100644
--- a/README.md
+++ b/README.md
@@ -118,40 +118,21 @@ Here's what would happen if the local system cannot handle the number of workers
     2023/03/25 17:48:46 INFO: about to exit, reading remaining answers
     2023/03/25 17:48:56 SUMMARY: 232/7442 connected, 2 sites configured Onion-Location
 
-On a Debian system, it appears that all future HTTP GET requests made by
-`onion-grab` will fail if a worker overload happens. The exact cause is
-unclear. Other programs may be affected too, e.g., `curl` and `Firefox`.
+This is most likely an OS problem; not an onion-grab problem. Debug hints:
 
-To get back into a normal state, try:
+  - Stop and disable `systemd-resolved`, then specify a recursive resolver that
+    can handle the expected load.
+  - You may need to tinker with kernel tunables, see `ulimit -a` and `sysctl -a`
+    for what can be configured. For example, if you find that the error is
+    caused by too many open files, try increasing the value of `ulimit -n`.
 
-    # systemctl restart systemd-resolved
+**Credit:** Björn Töpel helped debugging this issue.
 
 **Note:** domains with Onion-Location are likely to be missed if `-n 7442` is
 used here in a subsequent run. For example, with `-C 60s` and an average of 100
-domains/s, it would be wise to roll-back _at least_ 6000 lines.
-
-More debug notes:
-
-  - My system is not fully utilized wrt. CPU/MEM/BW; an odd thing is that it
-    seems to work fine to run multiple onion-grab instances as separate
-    commands, e.g., 3x `-w 280` to get up to ~225 Mbps utilization (max).
-    Added options `-s START` and `-e END` to specify that only lines `[START,
-    END)` should be processed in the input file to make this less clunky.
-  - Tinkering with with options in http.Transport doesn't seem help.
-  - Using multiple http.Client doesn't help (e.g., one per worker)
-  - An odd thing is that after errors, it appears that only DNS is dead. E.g.,
-    `curl https://www.rgdd.se` fails while `curl --resolve
-    www.rgdd.se:443:213.164.207.87` succeeds. Replacing the system's DNS with a
-    local unbound process doesn't seem to help though. (It appears that no UDP
-    connections are going through.)
-  - Tinkering with the options in `ulimit -a` and `sysctl -a` is probably the
-    right approach, but so far have not been able to make that work.
-
-After some rubber-ducking with Björn Töpel: disable systemd-resolved, then set
-a DNS resolver in /etc/resolve.conf (or mullvad app) that can handle the load.
-
-XXX: document this properly, and still mention the ulimit / sysctl stuff since
-this will likely become a bottleneck too on certain systems and qps loads.
+domains/s, it would be wise to roll-back _at least_ 6000 lines. This should be
+a last-resort option, and is mainly here to sanity-check long measurements.
+
 
 ## Contact
 
-- 
cgit v1.2.3
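
As a rough sketch of the debug hints this commit adds to the README, the commands below show one way to stop `systemd-resolved`, point the system at a recursive resolver, and raise the open-file limit before a run. This assumes a Debian-like system with systemd; the resolver address (a local `unbound` on 127.0.0.1) and the `ulimit -n` value are illustrative assumptions, not part of the commit.

    # Assumption: Debian-like system with systemd; adjust the resolver
    # address and limit values to your own setup.

    # Stop and disable the systemd stub resolver so it no longer proxies DNS.
    sudo systemctl stop systemd-resolved
    sudo systemctl disable systemd-resolved

    # Point the system at a recursive resolver that can handle the load,
    # e.g., a local unbound instance listening on 127.0.0.1 (assumed setup).
    sudo rm -f /etc/resolv.conf   # drop the stub-resolver symlink, if any
    echo "nameserver 127.0.0.1" | sudo tee /etc/resolv.conf

    # Raise the open-file limit for the current shell before starting
    # onion-grab; 65535 is an arbitrary example value.
    ulimit -n 65535

For persistent limits, `/etc/security/limits.conf` or a systemd unit's `LimitNOFILE=` would be the usual place; as the README notes, `ulimit -a` and `sysctl -a` list the remaining knobs that may need tuning at higher query rates.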