Nico
Creator of this small website
Dec 30, 2022 2 min read

Using VictoriaMetrics for Prometheus LTS

Not the LTS you are used to

Nowadays everyone knows about Prometheus, but few people¹ dig deeper into its core functionality, such as how it stores its data. Storage is a critical point, though: it weighs on performance as much as it can become a liability, since you often base your whole alerting on the metrics stored in this TSDB (Time Series DataBase). When the retention period is more than a simple buffer, we talk about Long Term Storage, or “LTS” for short.

Still, the vanilla Prometheus TSDB is not the only one that exists. The three best-known alternatives are Cortex, M3 and Thanos. There is also Mimir, which I have not tested myself, and the one I want to cover in this little blog post: VictoriaMetrics.

Using VictoriaMetrics

The following is written from the point of view of an SRE: deployment, management, usage, integration with the ecosystem and so on.

VictoriaMetrics ships as single binaries that are easy to install, upgrade and test. I’ve been running it in production at $DAYJOB for almost 2 years now, upgraded it a few times and never had an issue doing so. It is configured entirely through its startup options. We use a single 4-CPU instance (even though it supports clustering, see below) with SSD storage. It’s super efficient on storage: we hold almost 2.5 trillion datapoints on 1 TB (with an average of 3M active time series).
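To put the storage figure in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: both inputs are the rounded numbers quoted above, and I assume that 1 TB means 10**12 bytes and that the whole terabyte is counted as used.

```python
# Back-of-the-envelope storage cost per datapoint.
# Both inputs are rounded figures from the post, so the result is an estimate.
DATAPOINTS = 2.5e12  # "almost 2.5 trillion" datapoints
DISK_BYTES = 1e12    # 1 TB, taken as 10**12 bytes (assumption)


def bytes_per_datapoint(disk_bytes: float, datapoints: float) -> float:
    """Average number of on-disk bytes used by a single datapoint."""
    return disk_bytes / datapoints


if __name__ == "__main__":
    ratio = bytes_per_datapoint(DISK_BYTES, DATAPOINTS)
    print(f"{ratio:.2f} bytes per datapoint")
```

For comparison, an uncompressed sample made of an 8-byte timestamp and an 8-byte float value takes 16 bytes, so these figures would put the on-disk cost well below one byte per datapoint.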

It also has a UI in case of need, but we mostly operate it through the API. One of the features that we came to love is the ability to backfill data for recording rules.

Since our different Prometheus instances all write to this TSDB using remote write, we compute our alerts against it too. For this we use vmalert, which is compatible with the Prometheus syntax.
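The backfilling of recording rules mentioned above goes through vmalert's replay mode. The sketch below only assembles such a command line. The flag names are the ones I recall from the vmalert documentation, so check them against `vmalert -help` for your version. The rules file, the host name and the dates are made up for the example.

```python
import shlex


def vmalert_replay_command(rules_file: str, vm_url: str,
                           time_from: str, time_to: str) -> list[str]:
    """Build the argument list for a vmalert replay (backfill) run.

    The same VictoriaMetrics URL is used to read the source series
    and to write the recomputed series back.
    """
    return [
        "vmalert",
        f"-rule={rules_file}",
        f"-datasource.url={vm_url}",
        f"-remoteWrite.url={vm_url}",
        f"-replay.timeFrom={time_from}",
        f"-replay.timeTo={time_to}",
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = vmalert_replay_command(
        "recording.rules.yml",
        "http://victoriametrics.example:8428",
        "2022-12-01T00:00:00Z",
        "2022-12-30T00:00:00Z",
    )
    print(shlex.join(cmd))
```

Building the arguments as a list rather than a single string makes it easy to hand them to `subprocess.run` without worrying about shell quoting.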

Speaking of syntax, VictoriaMetrics uses MetricsQL, which is an improvement over PromQL while staying compatible with it.
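As a small illustration, here is how one might query a single-node instance through its Prometheus-compatible HTTP API, first with a plain PromQL expression and then with a MetricsQL one. This is a minimal sketch: the host name and the metric are hypothetical, and 8428 is the default port of the single-node version. The MetricsQL variant relies on the fact that the lookbehind window in square brackets may be omitted.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url: str, expr: str) -> str:
    """Return the URL of an instant query against /api/v1/query."""
    params = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def instant_query(base_url: str, expr: str, timeout: float = 10.0) -> dict:
    """Run the instant query and return the decoded JSON response."""
    url = instant_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://victoriametrics.example:8428"  # hypothetical host
    # Valid in both PromQL and MetricsQL.
    promql = "sum(rate(http_requests_total[5m])) by (job)"
    # MetricsQL only: the [5m] window is left out.
    metricsql = "sum(rate(http_requests_total)) by (job)"
    for expr in (promql, metricsql):
        # Only the URLs are printed here; call instant_query() to hit a real server.
        print(instant_query_url(base, expr))
```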

To be explored

There are a few topics about VictoriaMetrics that I have not really explored yet:

  • clustering: running an HA cluster for storing the time series
  • vmagent: a lightweight replacement for Prometheus instances that only act as satellites. A coworker did a small-scale experiment, though.

Notes


  1. Nonetheless, some people do care, and they even present it at conferences, as Aurélien did ↩︎