Using VictoriaMetrics for Prometheus LTS
Not the LTS you are used to
Nowadays everyone knows about Prometheus, but few people dig deeper into its core functionality, such as how the data is stored. It is a critical point, though, because the storage layer can be both a performance bottleneck and a liability, since you often base your whole alerting on the metrics stored in this TSDB (Time Series DataBase). For retention periods that go beyond a simple buffer, we talk about Long Term Storage, or "LTS" for short.
Still, the vanilla Prometheus TSDB is not the only one that exists, and there are alternatives: the three best known are Cortex, M3 and Thanos. There is also Mimir, which I have not tested myself, and the one I want to cover in this little blog post: VictoriaMetrics.
The following is written from the point of view of an SRE: deployment, management, usage, integration with the ecosystem and so on.
VictoriaMetrics is distributed as single binaries that can be easily installed, upgraded and tested. I’ve been running it in production at $DAYJOB for almost 2 years now, upgraded it a few times and never had an issue doing so. It is configured entirely through its startup options. We use a single 4-CPU instance (even though it supports clustering, see below) with SSD storage. It is super efficient storage-wise: we keep almost 2.5 trillion datapoints in 1 TB (an average of 3M active timeseries).
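To give an idea of what "configured through its startup options" looks like in practice, here is a minimal single-node sketch. The flag names are real VictoriaMetrics options; the paths, retention value and port are placeholders, not our production settings:

```shell
# Single-node VictoriaMetrics: one binary, no config file.
# -storageDataPath : where the TSDB lives (SSD recommended)
# -retentionPeriod : retention in months (here: 24 months of LTS)
# -httpListenAddr  : Prometheus-compatible HTTP API endpoint
./victoria-metrics-prod \
  -storageDataPath=/var/lib/victoria-metrics \
  -retentionPeriod=24 \
  -httpListenAddr=:8428
```

Prometheus then ships samples to it via `remote_write`, pointed at `http://<vm-host>:8428/api/v1/write`.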
It also has a UI in case of need, but we operate it mostly through the API. One of the features that we came to love is the ability to backfill data for recording rules.
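The backfilling relies on the fact that the import endpoints accept timestamps in the past. As a hedged sketch, here is a push of historical points over the JSON line import API; the endpoint is real, but the metric name, labels and timestamps are made-up placeholders:

```shell
# Push historical datapoints into VictoriaMetrics via its JSON line
# import API. Timestamps are in milliseconds and may lie in the past,
# which is what makes backfilling recording-rule series possible.
curl -X POST 'http://localhost:8428/api/v1/import' -d '
{"metric":{"__name__":"job:requests:rate5m","job":"api"},"values":[42.0,43.5],"timestamps":[1700000000000,1700000060000]}'
```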
To be explored
There are a few topics that I have not really explored yet about VictoriaMetrics:
- clustering: running an HA cluster for storing the timeseries
- vmagent: a lightweight replacement for Prometheus instances that only act as scrape satellites. A coworker did run a small-scale experiment, though.
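For reference, a minimal vmagent setup is just one binary pointed at an existing Prometheus scrape config and a remote write target. The flags below are real vmagent options; the config path and URL are placeholders, and this is a sketch rather than the setup from that experiment:

```shell
# vmagent reuses a standard Prometheus scrape config and forwards the
# scraped samples to remote storage over the remote_write protocol.
./vmagent-prod \
  -promscrape.config=/etc/prometheus/prometheus.yml \
  -remoteWrite.url=http://victoria-metrics:8428/api/v1/write
```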