author Rasmus Dahlberg <rgdd@glasklarteknik.se> 2025-01-18 13:39:40 +0100
committer Rasmus Dahlberg <rgdd@glasklarteknik.se> 2025-01-18 16:34:14 +0100
commit 8a17817c61f14a727a1017a5bcd4b1ea82964528
tree ec0fa96bfc683e906413106f2db2b99d710dc389 /docs
parent 2d3b1f2cb0c05385c1702f1a7d74fa08d52c262f
prometheus: Refine based on input from anarcat
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40677
Diffstat (limited to 'docs')
-rw-r--r-- docs/metrics.md 83
1 files changed, 52 insertions, 31 deletions
diff --git a/docs/metrics.md b/docs/metrics.md
index 1dea0ef..aac873e 100644
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -1,59 +1,56 @@
# Metrics
-The `silentct-mon` program emits Prometheus metrics -- enable using the `-m`
-option. For a *bash example* of how to create appropriate alerts from these
-Prometheus metrics, see [scripts/silentct-check](../scripts/silentct-check).
+`silentct-mon` can output Prometheus metrics -- enable using the `-m` option.
-## `"silentct_log_size"`
+## Examples of useful alerts
+
+ - **The monitor is falling behind on downloading a particular log**, e.g.,
+ `silentct_log_size - silentct_log_index > 65536`.
+ - **The monitor hasn't seen a fresh timestamp from a particular log**, e.g.,
+ `time() - silentct_log_timestamp > 24*60*60`.
+ - **The monitor needs restarting**, e.g., `silentct_need_restart != 0`
+ - **Unexpected certificates have been found**, e.g.,
+ `silentct_unexpected_certificate_count > 0`.
+
+## `"silentct_error_counter"`
```
-# HELP silentct_log_size The number of entries in the log.
-# TYPE silentct_log_size gauge
-silentct_log_size{id="TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="} 6.07308178e+08
+# HELP silentct_error_counter The number of errors propagated to the main loop.
+# TYPE silentct_error_counter counter
+silentct_error_counter 0
```
-`id` is a unique log identifier in base64 (computed as in RFC 6962, §3.2).
+Do not use for alerting, this metric is too noisy and currently used for debug.
## `"silentct_log_index"`
```
# HELP silentct_log_index The next log entry to be downloaded.
# TYPE silentct_log_index gauge
-silentct_log_index{id="TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="} 6.07307424e+08
+silentct_log_index{log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"} 7.30980064e+08
```
-`id` is a unique log identifier in base64 (computed as in RFC 6962, §3.2).
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
-## `"silentct_log_timestamp"`
-
-```
-# HELP silentct_log_timestamp The log's UNIX timestamp in ms.
-# TYPE silentct_log_timestamp gauge
-silentct_log_timestamp{id="TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="} 1.735992491111e+12
-```
-
-`id` is a unique log identifier in base64 (computed as in RFC 6962, §3.2).
-
-## `"silentct_certificate_alert"`
+## `"silentct_log_size"`
```
-# HELP silentct_certificate_alert The time the certificate without allowlisting was found.
-# TYPE silentct_certificate_alert gauge
-silentct_certificate_alert{stored_at="/path/to/state/crt_found/<log-hex-id>-<log-index>.json"} 1.735992551e+09
+# HELP silentct_log_size The number of entries in the log.
+# TYPE silentct_log_size gauge
+silentct_log_size{log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"} 7.31044085e+08
```
-`stored_at` is where the log entry is stored on the monitor's local file system.
-For convenience, the parsed log-entry certificate is also available as `.ascii`.
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
-## `"silentct_error_counter"`
+## `"silentct_log_timestamp"`
```
-# HELP silentct_error_counter The number of errors propagated to the main loop.
-# TYPE silentct_error_counter counter
-silentct_error_counter 0
+# HELP silentct_log_timestamp The log's UNIX timestamp in ms.
+# TYPE silentct_log_timestamp gauge
+silentct_log_timestamp{log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"} 1.737202578179e+12
```
-Do not use for alerting, this metric is too noisy and currently used for debug.
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
## `"silentct_need_restart"`
@@ -65,3 +62,27 @@ silentct_need_restart 0
Restarts are normally not needed; but here's a metric until the `silentct-mon`
implementation can assure that all corner-cases are handled without restarts.
+
+## `"silentct_unexpected_certificate_count"`
+
+```
+# HELP silentct_unexpected_certificate_count Number of certificates without any allowlisting
+# TYPE silentct_unexpected_certificate_count gauge
+silentct_unexpected_certificate_count{crt_sans="example.org www.example.org",log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df",log_index="1234"} 1
+```
+
+`crt_sans` are the subject alternative names in the unexpected certificate,
+space separated.
+
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
+
+`log_index` specifies the log entry that contains the unexpected certificate.
+
+See `STATE_DIRECTORY/crt_found/<log_id>-<log_index>.*` for further details. The
+`.json` file contains the downloaded log entry. The `.ascii` file contains the
+parsed leaf certificate in a human-readable format to make debugging easier.
+
+Allowlist an unexpected certificate by ingesting it from a trusted certificate
+requester. Alternatively: stop the monitor, manually move the unexpected
+certificate from the "alerting" dictionary to the "legitimate" dictionary in
+`STATE_DIRECTORY/crt_index.json`, save, and then start the monitor again.
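
This commit replaces the base64 `id` label with a hex `log_id` label. Both forms encode the same 32-byte value, which RFC 6962 §3.2 defines as the SHA-256 hash of the log's DER-encoded SubjectPublicKeyInfo. Decoding the base64 identifier from the removed lines yields exactly the hex identifier in the added lines. Below is a minimal Python sketch of both computations. The function names are hypothetical and are not part of `silentct`.

```python
import base64
import hashlib


def log_id_from_spki(spki_der: bytes) -> str:
    """RFC 6962 §3.2: SHA-256 over the DER-encoded SubjectPublicKeyInfo, as hex."""
    return hashlib.sha256(spki_der).hexdigest()


def log_id_b64_to_hex(b64_id: str) -> str:
    """Convert an old-style base64 log ID (removed `id` label) to hex (`log_id`)."""
    raw = base64.b64decode(b64_id, validate=True)
    if len(raw) != 32:  # a SHA-256 digest is always 32 bytes
        raise ValueError(f"expected 32 bytes, got {len(raw)}")
    return raw.hex()


if __name__ == "__main__":
    # Identifier from the removed example lines; the added lines show its hex form.
    print(log_id_b64_to_hex("TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="))
    # prints 4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df
```

This conversion is useful when existing dashboards or alert rules still match on the old base64 `id` label and need to be rewritten against `log_id`.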
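
The example alerts in this commit are PromQL expressions that Prometheus evaluates. To make their logic concrete, here is a Python sketch of the same four conditions applied to already-scraped sample values. The function names, and the idea of evaluating the rules outside Prometheus at all, are illustrative assumptions. One detail matters here. According to its HELP text, `silentct_log_timestamp` is in milliseconds, while PromQL's `time()` returns seconds. The sketch therefore divides by 1000 before comparing against the 24-hour threshold. In PromQL, the analogous form would be `time() - silentct_log_timestamp / 1000 > 24*60*60`.

```python
import time
from typing import List, Optional

# Thresholds copied from the example alerts in docs/metrics.md.
MAX_BACKLOG_ENTRIES = 65536
MAX_TIMESTAMP_AGE_S = 24 * 60 * 60  # one day, in seconds


def per_log_alerts(log_size: float, log_index: float, log_timestamp_ms: float,
                   now_s: Optional[float] = None) -> List[str]:
    """Evaluate the per-log example alerts for the samples of one log_id."""
    if now_s is None:
        now_s = time.time()
    alerts = []
    # silentct_log_size - silentct_log_index > 65536
    if log_size - log_index > MAX_BACKLOG_ENTRIES:
        alerts.append("falling behind")
    # silentct_log_timestamp is in ms, so convert it to seconds before
    # comparing it with a wall-clock time in seconds.
    if now_s - log_timestamp_ms / 1000 > MAX_TIMESTAMP_AGE_S:
        alerts.append("stale timestamp")
    return alerts


def monitor_alerts(need_restart: float, unexpected_counts: List[float]) -> List[str]:
    """Evaluate the monitor-wide example alerts."""
    alerts = []
    if need_restart != 0:  # silentct_need_restart != 0
        alerts.append("needs restart")
    # silentct_unexpected_certificate_count > 0, for any label combination
    if any(count > 0 for count in unexpected_counts):
        alerts.append("unexpected certificates")
    return alerts
```

The sample values in the diff are a size of 7.31044085e+08 and an index of 7.30980064e+08. That leaves a backlog of 64021 entries, just under the 65536 threshold, so the "falling behind" alert would not fire for that log.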
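
Each `silentct_unexpected_certificate_count` sample carries enough labels to locate the stored evidence on disk. Below is a minimal Python sketch that parses one sample line in the Prometheus text exposition format and derives the `crt_found/<log_id>-<log_index>.json` and `.ascii` paths. The parser is deliberately simplified: it skips HELP/TYPE handling and ignores any trailing timestamp. The state directory passed in is an assumption and must match the monitor's actual `STATE_DIRECTORY`. The helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Sample line copied from the silentct_unexpected_certificate_count example.
SAMPLE = ('silentct_unexpected_certificate_count{crt_sans="example.org www.example.org",'
          'log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df",'
          'log_index="1234"} 1')

# metric_name{labels} value  (an optional trailing timestamp is ignored)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
# label="value", where the value may contain \\, \" and \n escapes
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
ESCAPES = {'\\\\': '\\', '\\"': '"', '\\n': '\n'}


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float]:
    """Split one exposition-format sample into (name, labels, value)."""
    m = SAMPLE_RE.match(line)
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        key: re.sub(r'\\[\\"n]', lambda e: ESCAPES[e.group(0)], raw)
        for key, raw in LABEL_RE.findall(m.group("labels") or "")
    }
    return m.group("name"), labels, float(m.group("value"))


def crt_found_files(state_dir: str, labels: Dict[str, str]) -> Tuple[Path, Path]:
    """Return the .json and .ascii paths named after <log_id>-<log_index>."""
    stem = f"{labels['log_id']}-{labels['log_index']}"
    found = Path(state_dir) / "crt_found"
    return found / f"{stem}.json", found / f"{stem}.ascii"
```

Because `crt_sans` is space separated, `labels["crt_sans"].split()` recovers the list of subject alternative names. For the sample above, calling `crt_found_files` with a hypothetical state directory such as `/var/lib/silentct` yields files named `<log_id>-1234.json` and `<log_id>-1234.ascii` under its `crt_found` subdirectory.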