From 8a17817c61f14a727a1017a5bcd4b1ea82964528 Mon Sep 17 00:00:00 2001
From: Rasmus Dahlberg
Date: Sat, 18 Jan 2025 13:39:40 +0100
Subject: prometheus: Refine based on input from anarcat

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40677
---
 docs/metrics.md | 83 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 31 deletions(-)

diff --git a/docs/metrics.md b/docs/metrics.md
index 1dea0ef..aac873e 100644
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -1,59 +1,56 @@
 # Metrics
 
-The `silentct-mon` program emits Prometheus metrics -- enable using the `-m`
-option. For a *bash example* of how to create appropriate alerts from these
-Prometheus metrics, see [scripts/silentct-check](../scripts/silentct-check).
+`silentct-mon` can output Prometheus metrics -- enable using the `-m` option.
 
-## `"silentct_log_size"`
+## Examples of useful alerts
+
+  - **The monitor is falling behind on downloading a particular log**, e.g.,
+    `silentct_log_size - silentct_log_index > 65536`.
+  - **The monitor hasn't seen a fresh timestamp from a particular log**, e.g.,
+    `time() - silentct_log_timestamp > 24*60*60`.
+  - **The monitor needs restarting**, e.g., `silentct_need_restart != 0`
+  - **Unexpected certificates have been found**, e.g.,
+    `silentct_unexpected_certificate_count > 0`.
+
+## `"silentct_error_counter"`
 
 ```
-# HELP silentct_log_size The number of entries in the log.
-# TYPE silentct_log_size gauge
-silentct_log_size{id="TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="} 6.07308178e+08
+# HELP silentct_error_counter The number of errors propagated to the main loop.
+# TYPE silentct_error_counter counter
+silentct_error_counter 0
 ```
 
-`id` is a unique log identifier in base64 (computed as in RFC 6962, §3.2).
+Do not use for alerting, this metric is too noisy and currently used for debug.
 
 ## `"silentct_log_index"`
 
 ```
 # HELP silentct_log_index The next log entry to be downloaded.
 # TYPE silentct_log_index gauge
-silentct_log_index{id="TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="} 6.07307424e+08
+silentct_log_index{log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"} 7.30980064e+08
 ```
 
-`id` is a unique log identifier in base64 (computed as in RFC 6962, §3.2).
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
 
-## `"silentct_log_timestamp"`
-
-```
-# HELP silentct_log_timestamp The log's UNIX timestamp in ms.
-# TYPE silentct_log_timestamp gauge
-silentct_log_timestamp{id="TnWjJ1yaEMM4W2zU3z9S6x3w4I4bjWnAsfpksWKaOd8="} 1.735992491111e+12
-```
-
-`id` is a unique log identifier in base64 (computed as in RFC 6962, §3.2).
-
-## `"silentct_certificate_alert"`
+## `"silentct_log_size"`
 
 ```
-# HELP silentct_certificate_alert The time the certificate without allowlisting was found.
-# TYPE silentct_certificate_alert gauge
-silentct_certificate_alert{stored_at="/path/to/state/crt_found/-.json"} 1.735992551e+09
+# HELP silentct_log_size The number of entries in the log.
+# TYPE silentct_log_size gauge
+silentct_log_size{log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"} 7.31044085e+08
 ```
 
-`stored_at` is where the log entry is stored on the monitor's local file system.
-For convenience, the parsed log-entry certificate is also available as `.ascii`.
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
 
-## `"silentct_error_counter"`
+## `"silentct_log_timestamp"`
 
 ```
-# HELP silentct_error_counter The number of errors propagated to the main loop.
-# TYPE silentct_error_counter counter
-silentct_error_counter 0
+# HELP silentct_log_timestamp The log's UNIX timestamp in ms.
+# TYPE silentct_log_timestamp gauge
+silentct_log_timestamp{log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"} 1.737202578179e+12
 ```
 
-Do not use for alerting, this metric is too noisy and currently used for debug.
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
 
 ## `"silentct_need_restart"`
 
@@ -65,3 +62,27 @@ silentct_need_restart 0
 
 Restarts are normally not needed; but here's a metric until the `silentct-mon`
 implementation can assure that all corner-cases are handled without restarts.
+
+## `"silentct_unexpected_certificate_count"`
+
+```
+# HELP silentct_unexpected_certificate_count Number of certificates without any allowlisting
+# TYPE silentct_unexpected_certificate_count gauge
+silentct_unexpected_certificate_count{crt_sans="example.org www.example.org",log_id="4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df",log_index="1234"} 1
+```
+
+`crt_sans` are the subject alternative names in the unexpected certificate,
+space separated.
+
+`log_id` is a unique log identifier in hex, computed as in RFC 6962 §3.2.
+
+`log_index` specifies the log entry that contains the unexpected certificate.
+
+See `STATE_DIRECTORY/crt_found/-.*` for further details. The
+`.json` file contains the downloaded log entry. The `.ascii` file contains the
+parsed leaf certificate in a human-readable format to make debugging easier.
+
+Allowlist an unexpected certificate by ingesting it from a trusted certificate
+requester. Alternatively: stop the monitor, manually move the unexpected
+certificate from the "alerting" dictionary to the "legitimate" dictionary in
+`STATE_DIRECTORY/crt_index.json`, save, and then start the monitor again.
-- 
cgit v1.2.3
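
The four expressions under "Examples of useful alerts" in the patch can be tried offline against the sample metric lines it documents. The following is a minimal Python sketch, not part of silentct: `parse_metrics`, `check_alerts`, and `SAMPLE` are hypothetical names, and the parser handles only the small subset of the Prometheus text format that appears in the patch (no timestamps, no escaped quotes in practice). One assumption is worth flagging: the HELP text gives `silentct_log_timestamp` in ms, so the sketch divides by 1000 before comparing against a time in seconds; the expression as written in the list does not show that conversion.

```python
import re
import time

LOG_ID = "4e75a3275c9a10c3385b6cd4df3f52eb1df0e08e1b8d69c0b1fa64b1629a39df"

# Sample exposition lines, using the example values shown in the patch.
SAMPLE = '''\
silentct_log_index{log_id="LOGID"} 7.30980064e+08
silentct_log_size{log_id="LOGID"} 7.31044085e+08
silentct_log_timestamp{log_id="LOGID"} 1.737202578179e+12
silentct_need_restart 0
silentct_unexpected_certificate_count{crt_sans="example.org www.example.org",log_id="LOGID",log_index="1234"} 1
'''.replace("LOGID", LOG_ID)

SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse metric lines into (name, labels, value) tuples; skips comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def check_alerts(samples, now):
    """Evaluate the four example alert conditions; `now` is UNIX seconds."""
    by_name = {}
    for name, labels, value in samples:
        by_name.setdefault(name, []).append((labels, value))

    alerts = []
    # silentct_log_size - silentct_log_index > 65536, matched per log_id.
    index = {l.get("log_id"): v for l, v in by_name.get("silentct_log_index", [])}
    for labels, size in by_name.get("silentct_log_size", []):
        log_id = labels.get("log_id")
        if log_id in index and size - index[log_id] > 65536:
            alerts.append(f"falling behind on log {log_id}")
    # time() - silentct_log_timestamp > 24*60*60 (assumption: ms to s first).
    for labels, ts_ms in by_name.get("silentct_log_timestamp", []):
        if now - ts_ms / 1000.0 > 24 * 60 * 60:
            alerts.append(f"no fresh timestamp from log {labels.get('log_id')}")
    # silentct_need_restart != 0
    for _, value in by_name.get("silentct_need_restart", []):
        if value != 0:
            alerts.append("monitor needs restarting")
    # silentct_unexpected_certificate_count > 0
    for labels, count in by_name.get("silentct_unexpected_certificate_count", []):
        if count > 0:
            alerts.append(
                f"unexpected certificate for {labels.get('crt_sans')} "
                f"at log_index {labels.get('log_index')}"
            )
    return alerts


if __name__ == "__main__":
    for message in check_alerts(parse_metrics(SAMPLE), now=time.time()):
        print(message)
```

With the sample values, the backlog is 64021 entries, which stays under the 65536 threshold, so only the unexpected-certificate condition fires while the log timestamp is still within a day of `now`.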