timeline: Add rate-limit bug

author: Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-12 18:18:45 +0100
committer: Rasmus Dahlberg <rasmus@rgdd.se> 2025-03-12 18:37:10 +0100
commit: 30d279bf0f2868ccbf0b77ed2aa9a4eb1be49ae2 (patch)
tree: de5ab8be80d3f4ebd6f356fec3a273d4dda1e4fa
parent: c32a70494dc025c707f1a5e59d2078abf094236a (diff)
1 files changed, 25 insertions, 0 deletions
diff --git a/docs/operations.md b/docs/operations.md
index be3cee0..914f7b4 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -74,6 +74,7 @@ The versions of `git.cs.kau.se/rasmoste/ct-sans@VERSION` are listed below.
 | 2023/04/03 | 16:12:38   | assemble done               | 0.91B SANs (25.2GiB) from 3.74B certs |
 | 2024/02/10 | 09:10:20   | snapshot and start collect  | still running v0.0.2 [6]              |
 | 2024/02/12 | 03:54:13   | abort collection            | not needed for our paper contribs [7] |
+| 2025/01/06 |            | find rate-limit bug         | [8]                                   |
 
 ## Notes
 
@@ -232,3 +233,27 @@ We decided to abort another round of ct-sans (and following onion-grab)
 measurements because it is not strictly needed to achieve our goals.  If we want
 to make more measurements for the sake of making the ct-sans data set available,
 we should automate it rather than doing it manually as in this timeline.
+
+## 8
+
+The upstream library that we're using doesn't enforce rate-limits on the
+get-entries endpoint (despite the library having a codepath for this), see:
+
+  - <https://github.com/google/certificate-transparency-go/issues/898>
+
+A fix was applied to ct-sans on 2025-03-12; use tag v0.1.0.
+
+What this rate-limit bug means for us: our workers did not back off on HTTP
+status 429.  There was no log output for these kinds of responses, so it is
+impossible to tell two years later whether we were seeing any status 429.
+
+What can be said is that we are receiving such responses for Google and Let's
+Encrypt when running with the same number of workers today.  For Let's Encrypt,
+throughput is roughly the same as when we measured once we adjust the number of
+workers so that we don't see any/many status 429.  For Google, we get roughly
+half the throughput compared to when we measured.  Since we tuned the number of
+workers [manually][] for each log two years ago (including finding when more
+workers gave worse performance -- presumably because of too many status 429),
+we were probably not overshooting by a lot.  But again, hard to say.
+
+[manually]: https://git.rgdd.se/ct-sans/tree/utils_ct.go?h=v0.0.2#n42