Event monitoring with Prometheus
The operation of the service can also be monitored with Prometheus.
Prometheus
Prometheus is an open-source toolkit for systems monitoring and alerting. Support for it is optional, so it must be explicitly enabled when starting the service.
Scraper implementers can find the full metrics listing below.
Metric naming
Metric names follow the official Prometheus naming guidelines where possible. The application prefix for all metrics is asr, e.g., asr_build_info or asr_uptime_seconds_total.
Metric families and metric label conventions
A metric family is a group of entries with the same name but different labels.
Example:
asr_requests_count_total{module="gender"} 7
asr_requests_count_total{module="speech", recognizer="pl.general_model"} 5
asr_requests_count_total{module="speech", recognizer="pl.context_model"} 2
In this example, asr_requests_count_total is the metric name, whereas module and recognizer are its labels.
Labels keep the metrics within a family orthogonal: each event belonging to a family matches exactly one of the family’s label sets, so it is recorded exactly once.
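To make the family and label terminology concrete, here is a minimal Python sketch that splits the example lines above into a metric name, a label set, and a value, and then sums the values per family. The helper names `parse_sample` and `family_totals` are hypothetical, and the parser is deliberately simplified: it assumes label values contain no escaped quotes or closing braces, which a production scraper would have to handle.

```python
import re
from collections import defaultdict

# One sample line: metric name, optional {label set}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
# One label pair inside the braces, e.g. module="speech".
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one exposition line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def family_totals(lines):
    """Sum the values of all samples that share a metric name."""
    totals = defaultdict(float)
    for line in lines:
        # Skip blank lines and comment lines (those starting with '#').
        if not line.strip() or line.startswith("#"):
            continue
        name, _labels, value = parse_sample(line)
        totals[name] += value
    return dict(totals)


# The example lines from the text above.
EXAMPLE = [
    'asr_requests_count_total{module="gender"} 7',
    'asr_requests_count_total{module="speech", recognizer="pl.general_model"} 5',
    'asr_requests_count_total{module="speech", recognizer="pl.context_model"} 2',
]
```

Because the label sets are orthogonal, summing a family never counts an event twice, so `family_totals(EXAMPLE)` gives the overall number of requests in the example.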
Metrics listing
asr_build_info gauge
A pseudo-metric which provides application build information via labels. Its value is always 1.
Label | Description |
---|---|
version | the SemVer version of the service |
asr_channels_usage_count gauge
The number of channels currently acquired, i.e., the number of requests being handled simultaneously. A channel is an abstract concept representing a separate connection maintained with the service.
asr_licence_info gauge
A pseudo-metric which provides licence information via labels. Its value is 1 if a licence is assigned to the service (it does not have to be valid though), 0 otherwise. Labels are exposed only when the value is 1.
Label | Description |
---|---|
channels | the limit of the service channels (for the definition of a channel, see the asr_channels_usage_count gauge) |
asr_licence_expiration_seconds gauge
The time, in seconds, left until the licence assigned to the service expires. The presence of the metric is dynamic: it is exposed only if the licence has an expiration date.
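Since this gauge may be absent, a consumer has to treat "no metric" differently from "little time left". The following Python sketch shows one way to do that. It assumes the scraped samples have already been collected into a plain dictionary mapping metric names to values, and both the helper names and the 14-day threshold are hypothetical choices for illustration.

```python
from datetime import timedelta

METRIC = "asr_licence_expiration_seconds"


def licence_time_left(samples):
    """Return the time left as a timedelta, or None if the gauge is absent.

    `samples` is assumed to be a dict of metric name -> numeric value.
    """
    seconds = samples.get(METRIC)
    if seconds is None:
        # The gauge is not exposed, so no expiration time is reported.
        return None
    return timedelta(seconds=seconds)


def licence_expires_soon(samples, threshold=timedelta(days=14)):
    """True only when the gauge is present and below the threshold."""
    left = licence_time_left(samples)
    return left is not None and left < threshold
```

For example, `licence_expires_soon({METRIC: 86400.0})` reports that one day is within the threshold, whereas an empty dictionary never triggers the check.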
asr_processed_audio_seconds_total counter family
The total duration, in seconds, of audio processed by the service using a configuration that matches the counter’s labels.
Label | Description |
---|---|
module | see the asr_requests_total counter family |
recognizer | see the asr_requests_total counter family |
asr_requests_total counter family
A single counter from this family shows the number of requests handled by the service using a configuration that matches the counter’s labels.
The counter is replicated for each value of the status label, which describes the outcome of the request and falls into one of three categories:
completed, when the request is accepted and returns an interpretable result;
dismissed, when the request is rejected ahead of time, e.g., because of invalid input data;
error, when the request fails even though its input data was accepted.
Label | Description |
---|---|
module | the name of the recognition module |
recognizer | the name of the recognizer used by the module; it may be omitted when the module has only one recognizer |
status | the outcome of the request |
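As an illustration of how the status label can be used, the Python sketch below computes, per module, the share of requests that ended with each outcome. It assumes the asr_requests_total samples have already been parsed into (labels, value) pairs, and the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def status_shares(samples):
    """Return {module: {status: share of that module's requests}}.

    `samples` is an iterable of (labels, value) pairs taken from the
    asr_requests_total family; counters that differ only in the
    recognizer label are added together.
    """
    per_module = defaultdict(lambda: defaultdict(float))
    for labels, value in samples:
        per_module[labels["module"]][labels["status"]] += value

    shares = {}
    for module, by_status in per_module.items():
        total = sum(by_status.values())
        shares[module] = {
            status: (count / total if total else 0.0)
            for status, count in by_status.items()
        }
    return shares


# Hypothetical samples, for illustration only.
SAMPLES = [
    ({"module": "speech", "recognizer": "pl.general_model", "status": "completed"}, 6.0),
    ({"module": "speech", "recognizer": "pl.context_model", "status": "completed"}, 2.0),
    ({"module": "speech", "recognizer": "pl.general_model", "status": "dismissed"}, 1.0),
    ({"module": "speech", "recognizer": "pl.general_model", "status": "error"}, 1.0),
]
```

The counters are cumulative, so these shares cover the whole period since the counters started. To look at a recent window instead, one would apply the same computation to the differences between two scrapes.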
asr_runtime_info gauge
A pseudo-metric which provides application runtime information via labels. Its value is always 1.
Label | Description |
---|---|
hostname | the name of the service’s host machine |
asr_sessions_info gauge
A pseudo-metric which provides information about the session cache via labels. Its value is always 1.
Label | Description |
---|---|
timeout_seconds | the time after which an inactive session is marked for deletion |
asr_sessions_cached_count gauge
The number of sessions whose data may be reused by another request.
asr_sessions_memory_usage_bytes gauge
The total size, in bytes, of the audio data held by all cached sessions.
asr_status_info gauge family
A single gauge from this family shows the current status of the service or one of its components. The number of gauges is dynamic and depends on the current service configuration. The service status gauge is always exposed, whereas gauges for modules that have not been requested are not exposed.
A status may be:
off, denoted as 0, when a component is available but disabled, e.g., it might have been explicitly disabled and needs to be turned on, or it did not receive a proper configuration and failed to load;
on, denoted as 1, when a component is ready and serving;
busy, denoted as 2, when a component is available but temporarily inaccessible, e.g., in a transient state while loading its configuration.
Alternatively, a value of -1 may be reported for an unknown condition. It may imply an internal failure that is not necessarily related to the component being observed.
Label | Description |
---|---|
module | the name of the service’s tracked component; the service’s status itself does not have this label assigned |
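A consumer of this family usually wants the numeric values translated back into the status names. The Python sketch below does this with an enumeration. It assumes the samples have already been parsed into (labels, value) pairs. The `"service"` key used for the unlabelled gauge and the decision to map any unlisted value to the unknown status are assumptions made for this example, not documented behaviour of the service.

```python
from enum import IntEnum


class Status(IntEnum):
    """Numeric status values as described in the listing above."""
    UNKNOWN = -1
    OFF = 0
    ON = 1
    BUSY = 2


def decode_status(value):
    """Translate a gauge value into a Status member.

    Values outside the documented set are treated as UNKNOWN
    (an assumption made for this sketch).
    """
    try:
        return Status(int(value))
    except (ValueError, OverflowError):
        return Status.UNKNOWN


def decode_family(samples):
    """Return {component: Status} for (labels, value) pairs.

    The gauge without a module label describes the service itself and
    is stored under the hypothetical key "service".
    """
    return {
        labels.get("module", "service"): decode_status(value)
        for labels, value in samples
    }
```

For instance, `decode_family([({}, 1.0), ({"module": "speech"}, 2.0)])` reports the service as on and the speech module as busy.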
asr_uptime_seconds_total counter
The time, in seconds, that has elapsed since the service started.