Time Series Performance and Limits
When using Outlyer’s service to collect metrics, it is useful to be aware of certain performance trade-offs and constraints, as well as the limits against which metrics are validated.
Before reading on, we recommend reading the key concepts to better understand the time series data model.
Performance Concepts
When storing and querying time series data there are certain trade-offs which must be made in order to maintain performance and availability. Some of the more important concepts are described below:
Label Count
The number of labels per metric can negatively impact performance. Each combination of label names and values effectively creates a separate series, and a wide query must retrieve and aggregate all of those series at query time.
Although a limit applies to the label count, we recommend keeping the number of labels to a reasonable minimum to maintain good query performance over large time ranges.
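To see why label count matters, note that every distinct combination of label values forms its own series, and each extra label multiplies the possible combinations rather than adding to them. The sketch below is a minimal illustration that assumes samples are held in memory as hypothetical (metric name, label dict) pairs. It does not describe Outlyer's storage format. It counts the distinct series in a batch and computes the worst-case upper bound as the product of distinct values per label.

```python
from math import prod


def count_series(samples):
    """Count distinct series: one per unique (metric name, label set) pair."""
    return len({(name, frozenset(labels.items())) for name, labels in samples})


def max_series(values_per_label):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(values_per_label.values())


# Hypothetical samples for illustration only.
samples = [
    ("http.requests", {"host": "web-1", "status": "200"}),
    ("http.requests", {"host": "web-1", "status": "500"}),
    ("http.requests", {"host": "web-2", "status": "200"}),
    ("http.requests", {"host": "web-1", "status": "200"}),  # same series as the first
]

print(count_series(samples))  # 3
# Adding one more label multiplies the bound instead of adding to it.
print(max_series({"host": 50, "status": 5}))  # 250
print(max_series({"host": 50, "status": 5, "region": 4}))  # 1000
```

Here a fourth label with only four values raises the worst case from 250 to 1000 series, which a wide query would have to read and aggregate.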
Label Name & Value Length
The length of label names and values affects the performance of the index when storing and querying data. The longer the strings for either name or value, the poorer the performance of the time series index and the more storage is required to persist each data point.
As with label count, limits apply. We still recommend keeping string lengths to a reasonable minimum where possible to maintain good storage and retrieval performance for your data.
Cardinality
Cardinality describes the number of distinct values for a label. Each value effectively creates a separate series, so the higher the cardinality, the worse the storage and query performance of that metric.
A metric with high cardinality typically has at least one label whose value comes from an unbounded data source, such as an email address or user ID.
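One way to find labels like these before sending data is to count the distinct values each label takes across a batch of samples. The sketch below assumes the same hypothetical in-memory (metric name, label dict) representation. The `threshold` parameter is an arbitrary cut-off chosen for the example. It is not an Outlyer limit.

```python
from collections import defaultdict


def label_cardinality(samples):
    """Return the number of distinct values seen for each label name."""
    values = defaultdict(set)
    for _name, labels in samples:
        for key, value in labels.items():
            values[key].add(value)
    return {key: len(vals) for key, vals in values.items()}


def high_cardinality_labels(samples, threshold):
    """List label names whose distinct-value count exceeds a chosen threshold."""
    card = label_cardinality(samples)
    return sorted(key for key, n in card.items() if n > threshold)


# A "user_id" label grows with the number of users, which is an unbounded source.
samples = [
    ("login.count", {"region": "eu", "user_id": f"u{i}"}) for i in range(1000)
] + [
    ("login.count", {"region": "us", "user_id": f"u{i}"}) for i in range(1000, 1500)
]

print(label_cardinality(samples))  # {'region': 2, 'user_id': 1500}
print(high_cardinality_labels(samples, threshold=100))  # ['user_id']
```

The `region` label stays bounded no matter how much traffic arrives, while `user_id` keeps growing. In practice, that growth is the signal worth watching.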
Churn
Churn describes the longevity of a set of metrics. If a set of metrics has a relatively short lifetime, as is often the case with container metrics, it will have a high churn rate. At Outlyer we calculate churn every hour. Churn is related to cardinality, such that a metric with high cardinality will also churn often.
A high churn percentage is undesirable because series fragments are stored separately and stitched together at query time, which makes query response times much slower.
You can monitor the hourly churn for your account by using the Outlyer Agent integration.
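The exact formula Outlyer uses for hourly churn is not given here. The sketch below therefore uses an assumed definition for illustration only: the percentage of series active in the current hour that were not present in the previous hour. Series are represented as hypothetical (metric name, label string) tuples. The example contrasts long-lived hosts with short-lived containers that each carry a new container ID label.

```python
def churn_percent(previous_hour, current_hour):
    """Percentage of series in the current hour that were absent in the previous hour.

    Assumed definition for illustration only; not Outlyer's documented formula.
    Each argument is a set of series identifiers.
    """
    if not current_hour:
        return 0.0
    new_series = current_hour - previous_hour
    return 100.0 * len(new_series) / len(current_hour)


# Long-lived hosts: the same series appear hour after hour.
hosts_prev = {("cpu.usage", "host=web-1"), ("cpu.usage", "host=web-2")}
hosts_now = {("cpu.usage", "host=web-1"), ("cpu.usage", "host=web-2")}
print(churn_percent(hosts_prev, hosts_now))  # 0.0

# Short-lived containers: each replacement container gets a new ID label.
containers_prev = {("cpu.usage", f"container=c{i}") for i in range(10)}
containers_now = {("cpu.usage", f"container=c{i}") for i in range(8, 18)}
print(churn_percent(containers_prev, containers_now))  # 80.0
```

Under this definition, 8 of the 10 container series in the current hour are new. Each of those is a fresh series fragment that a query spanning both hours has to stitch together.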
Validation & Limits
To maintain service availability, limits and validation are applied to each metric sample entering the system. Any metric sample marked as invalid is discarded. However, the reason for the validation failure is recorded and can be tracked using the Outlyer Agent integration.
An invalid metric sample may have been marked as such by multiple rules, and each rule counts as an individual failure. There is therefore a many-to-one relationship between validation failures and metric samples.
Label Names & Values
The following rules currently apply to metric samples sent to Outlyer:
- Each metric sample can contain up to 45 labels, each with a key of up to 40 characters and a value of up to 80 characters. Exceptions to the value length can be made; to request one, please contact customer support.
- The label’s key and value must both match the following regular expression:
  ^[A-Za-z0-9][-._A-Za-z0-9^@/]*[A-Za-z0-9]$
  You can test keys and values against the expression using a regular expression tester.
- The label key cannot start with either of the following reserved prefixes:
  ol.
  atlas.
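The rules above can be checked client-side before samples are sent. The sketch below is a hypothetical pre-flight check and not part of any Outlyer library. It assumes that lengths are counted in characters and that each label dict belongs to a single sample. It reports every rule a sample breaks rather than stopping at the first one, which mirrors the many-to-one relationship between validation failures and samples.

```python
import re

# Limits and rules as listed above; the constant and function names are illustrative.
MAX_LABELS = 45
MAX_KEY_LENGTH = 40
MAX_VALUE_LENGTH = 80
LABEL_PATTERN = re.compile(r"^[A-Za-z0-9][-._A-Za-z0-9^@/]*[A-Za-z0-9]$")
RESERVED_PREFIXES = ("ol.", "atlas.")


def validate_labels(labels):
    """Return every rule a sample's labels break; an empty list means valid."""
    failures = []
    if len(labels) > MAX_LABELS:
        failures.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if len(key) > MAX_KEY_LENGTH:
            failures.append(f"key too long: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            failures.append(f"value too long for key {key!r}")
        if not LABEL_PATTERN.fullmatch(key):
            failures.append(f"key does not match pattern: {key!r}")
        if not LABEL_PATTERN.fullmatch(value):
            failures.append(f"value does not match pattern for key {key!r}")
        if key.startswith(RESERVED_PREFIXES):
            failures.append(f"key uses reserved prefix: {key!r}")
    return failures


print(validate_labels({"host": "web-1", "env": "prod"}))  # []
# Reports two failures: the reserved "ol." prefix and the trailing underscore.
print(validate_labels({"ol.source": "agent_"}))
```

As written, the expression requires a first and a last alphanumeric character, so it only matches strings of at least two characters. A single-character key or value such as `a` would be rejected.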
Query
When querying, the following may occur due to performance limits of the system:
- The query takes too long and is terminated. This may be caused by the performance issues described above, the complexity of the query, or the size of the time range.
- The query returns too many series. This may be caused by querying a label on a metric with high cardinality. To work around this, remove the high-cardinality label from the query, reduce the time window of the dashboard, or constrain the query further. Ideally, avoid the problem altogether by reducing the cardinality and/or churn of the data being sent.
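Removing a high-cardinality label from a query works because series that differ only in that label collapse into one result. The local sketch below illustrates the effect without using Outlyer's query syntax. It assumes hypothetical series held as (label dict, value) pairs and sums the values while grouping only by the labels that are kept.

```python
from collections import defaultdict


def aggregate_by(series, keep):
    """Sum series values, grouping only by the label names in `keep`.

    Labels not listed are dropped, so series that differ only in those
    labels collapse into a single result series.
    """
    grouped = defaultdict(float)
    for labels, value in series:
        group_key = tuple(sorted((k, v) for k, v in labels.items() if k in keep))
        grouped[group_key] += value
    return dict(grouped)


# One series per (region, user_id): the user_id label has high cardinality.
series = [({"region": "eu", "user_id": f"u{i}"}, 1.0) for i in range(300)] + [
    ({"region": "us", "user_id": f"u{i}"}, 1.0) for i in range(300, 500)
]

by_user = aggregate_by(series, keep={"region", "user_id"})
by_region = aggregate_by(series, keep={"region"})
print(len(by_user))  # 500
print(by_region)  # {(('region', 'eu'),): 300.0, (('region', 'us'),): 200.0}
```

Keeping `user_id` returns 500 series. Dropping it returns two series with the same totals, which is the trade-off the workaround relies on.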