[READMORE]
You can split a single InfluxDB measurement into smaller measurements for performance tuning or for application aspects you may not have planned while initially storing data.
In the influx
shell, run the following statement:
SELECT * INTO <destination_measurement> FROM <source_measurement> [WHERE <criteria>] GROUP BY *
Don't forget to use the GROUP BY *
at the end, or tags from the source measurement won't be copied over.
Background
We use InfluxDB as the backend for a customer-facing component that summarizes a large dataset updated on a daily basis. The use-case is read-intensive by nature and uses statistical features in influx to provide a high-level view of underlying data. Writes occur daily in large batches. The collected raw data in first recorded in RDS/PostgreSQL due to the mission-critical nature of the dataset and later shipped to Influx by a separate process. While the uptime of Influx is important, it's also critical for us to reload Influx from scratch in case something untoward occurs. We periodically drain the Postgres data into flat files and archive them in S3 so that our storage costs are balanced.
The InfluxDB instance is hosted on a single node AWS m5.large
machine (2 vCPU, 8GB RAM) for it's memory, compute, and network throughput features. CPU does hit 100% frequently during queries but response time is good for some ugly queries. We had issues managing write-throughput on a smaller instance and writes erroring out with failed to write point batch to database
/ cache maximum memory size exceeded
. Tweaking cache-max-memory-size
and reducing our fields and tags helped, but as the dataset grew, query performance wasn't good enough. Moving to a large instance was a helpful quick fix.
The critical fix that halved our query time however was splitting our measurements. Among some of InfluxDB's most interesting features is the ability to query multiple measurements with a single query using a regex in the FROM
clause.
So instead of:
SELECT <fields> FROM <measurement> WHERE type =~ /^(1|2)$/ GROUP BY __
you can run:
SELECT <fields> FROM /^measurement_(1|2)$/ GROUP BY type, __
This may or may not make sense depending on how/what data you're storing, but it's something to consider when deciding your schema. The good news is that you can experiment with the split at any time using SELECT * INTO
as shown above and update your queries to run on multiple measurements.