author Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-12 18:18:45 +0100
committer Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-12 18:37:10 +0100
commit 30d279bf0f2868ccbf0b77ed2aa9a4eb1be49ae2 (patch)
tree de5ab8be80d3f4ebd6f356fec3a273d4dda1e4fa
parent c32a70494dc025c707f1a5e59d2078abf094236a (diff)
timeline: Add rate-limit bug
-rw-r--r-- docs/operations.md 25
1 files changed, 25 insertions, 0 deletions
diff --git a/docs/operations.md b/docs/operations.md
index be3cee0..914f7b4 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -74,6 +74,7 @@ The versions of `git.cs.kau.se/rasmoste/ct-sans@VERSION` are listed below.
| 2023/04/03 | 16:12:38 | assemble done | 0.91B SANs (25.2GiB) from 3.74B certs |
| 2024/02/10 | 09:10:20 | snapshot and start collect | still running v0.0.2 [6] |
| 2024/02/12 | 03:54:13 | abort collection | not needed for our paper contribs [7] |
+| 2025/01/06 | | find rate-limit bug | [8] |
## Notes
@@ -232,3 +233,27 @@ We decided to abort another round of ct-sans (and following onion-grab)
measurements because it is not strictly needed to achieve our goals. If we want
to make more measurements for the sake of making the ct-sans data set available,
we should automate it rather than doing it manually as in this timeline.
+
+## 8
+
+The upstream library that we use does not enforce rate limits on the
+get-entries endpoint, even though it has a code path for this; see:
+
+ - <https://github.com/google/certificate-transparency-go/issues/898>
+
+A fix was applied to ct-sans on 2025-03-12; use tag v0.1.0.
+
+What this rate-limit bug means for us: our workers did not back off on HTTP
+status 429. There was no log output for these kinds of responses, so two years
+later it is impossible to tell whether we received any status 429 or not.
+
+What can be said is that we receive such responses for Google and Let's
+Encrypt when running with the same number of workers today. For Let's Encrypt,
+throughput is roughly the same as in our measurements once we adjust the number
+of workers so that we see few or no status 429. For Google, we get roughly half
+the throughput compared to our measurements. Since we tuned the number of
+workers [manually][] for each log two years ago (including finding the point
+where more workers gave worse performance, presumably because of too many
+status 429), we were probably not overshooting by much. But again, hard to say.
+
+[manually]: https://git.rgdd.se/ct-sans/tree/utils_ct.go?h=v0.0.2#n42