InfluxDB/Grafana fill(previous) Issue

Cjkeenan · September 8, 2021, 7:33am

So I have been using InfluxDB and Grafana for history and data graphing for over 6 months now, and it has been great so far. But one issue keeps popping up: when no data point is available at the start of the time range, there is a gap at the start of the graph, regardless of the fill type. I think this is due to the very nature of IoT devices, which are logged to the database on an irregular basis (only when a value changes), but I was hoping someone had run into this and maybe thought up a workaround.

Example past 48h:

Example past 24h:

Notice from the 48h graph that this particular metric has not changed in the past 24 hours, so when looking only at the past 24h there is "no data" to reference. It is not that there is no data, though; the data simply has not changed.

Also note that even the 48h graph is missing a chunk of data at the start, because the last change happened before the time range began.

Both of these graphs were generated using this query:

After researching a bit online, I found that some places suggest pushing data into the DB at a regular interval, regardless of whether it changes. But that seems like a poor solution: it doesn't really solve the problem but hides it, at the cost of efficiency and database size. I also found issues on both the InfluxDB and Grafana GitHub repositories that describe this problem, but both are quite old (5+ years) and remain unresolved despite what seems like overwhelming community demand.

If anyone has any ideas on a way to fix this I would very much appreciate it.

RRodman · September 8, 2021, 7:30pm

You will not find a "solution" or "fix" for this aside from injecting the data you don't want to reinject, because it is not actually a bug or a problem. That is why the "issue" has been ignored for over 5 years.

This is the proper behavior of time series data and a query on that data. When you define a time range, that range is literally ALL the query looks at, so there is no way for it to pull a value from earlier than your range to backfill with.

The fill option will NEVER fill in based on data outside your time range. Period.
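To make the range-bound behavior concrete, here is a small Python simulation of a `GROUP BY time()` query that takes the last value per bucket with `fill(previous)`. It only models the semantics described above and is not InfluxDB's implementation; the function name, integer timestamps, and sorted input are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]


def group_by_fill_previous(points: List[Point], start: int, end: int,
                           interval: int) -> List[Tuple[int, Optional[float]]]:
    """Model GROUP BY time(interval) with last() and fill(previous) over [start, end).

    Only points with start <= t < end are looked at, mirroring how the query
    never reads outside its time range. Points must be sorted by time.
    """
    buckets = []
    previous: Optional[float] = None  # nothing to carry forward yet
    t = start
    while t < end:
        in_bucket = [v for (ts, v) in points if t <= ts < t + interval]
        if in_bucket:
            previous = in_bucket[-1]  # last() of this bucket
        buckets.append((t, previous))  # stays None until the first in-range point
        t += interval
    return buckets


# The change at t=0 is outside [30, 70), so the first two buckets are empty.
print(group_by_fill_previous([(0, 20.0), (50, 21.5)], start=30, end=70, interval=10))
# [(30, None), (40, None), (50, 21.5), (60, 21.5)]
```

The value recorded at t=0 exists in the data, but because it lies before the range start there is nothing for `fill(previous)` to carry forward into the 30 and 40 buckets, which is exactly the gap seen on the graphs.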

Changing the way a query works, making it ignore your start date and look further back for an existing data point, would invalidate the data for 99% of actual use cases, not to mention adding overhead and, in some cases, dramatically increasing the time it takes a query to execute.

That's not even getting into the logic issues. How far back should it look? A day, a week, a month, a year, 10 years? Where is the cutoff? What happens if there is no previous value? How do you tell whether you're no longer receiving data or your source is corrupted if your charts always use the most recent point they have to fill in? You could be assuming everything is fine based on a 10-year-old status because your data source stopped updating, and you didn't notice because your charts still show lines.
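The staleness question becomes easy to answer when every source re-sends its value at a known interval: any series whose newest point is older than, say, twice that interval has probably stopped reporting. A minimal Python sketch of that check, where `stale_series`, the series names, and the threshold are hypothetical; in practice the newest timestamp per series would come from a query against your database.

```python
from typing import Dict, List


def stale_series(last_seen: Dict[str, int], now: int, max_age: int) -> List[str]:
    """Return the names of series whose newest point is older than max_age seconds.

    last_seen maps each (hypothetical) series name to its newest timestamp.
    Only meaningful if every source re-sends its value at a known interval.
    """
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


# With a 1-hour heartbeat, allow 2 hours before flagging a series.
last_seen = {"livingroom_temp": 9_000, "garage_door": 1_000}
print(stale_series(last_seen, now=10_000, max_age=7_200))  # ['garage_door']
```

With change-only logging the same check is useless, since a sensor that simply hasn't changed looks identical to one that has died.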

The basic rule of thumb with time series data is to inject data at least as often as the time interval you are going to use in Grafana, etc.

If you're worried about database size due to the injection, limit the age of your series data so that data older than x days is purged automatically.

In your examples it looks like you're using a 2-hour time interval, in which case you would only need to inject the value once every 2 hours to eliminate the blank data.

Doing a pull and inject will also make sure your data isn't stale and is always accurate, even if it really isn't changing.
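The pull-and-inject approach amounts to re-emitting the last known value whenever nothing new has arrived within the interval. The Python below applies that idea offline to a change-only log; `with_heartbeats` and its parameters are hypothetical, and a real logger would do the same thing with a timer rather than by rewriting a list.

```python
from typing import List, Tuple

Point = Tuple[int, float]


def with_heartbeats(points: List[Point], end: int, interval: int) -> List[Point]:
    """Re-emit the last value every `interval` until the next change (or `end`).

    points: change-only (timestamp, value) pairs sorted by time.
    end: exclusive upper bound for generated heartbeat points.
    """
    out: List[Point] = []
    for i, (ts, value) in enumerate(points):
        out.append((ts, value))  # the real change
        next_change = points[i + 1][0] if i + 1 < len(points) else end
        t = ts + interval
        while t < next_change:
            out.append((t, value))  # unchanged value, re-injected
            t += interval
    return out


# Hours of the day: changes at 08:00 and 15:00, 2-hour heartbeat, up to 18:00.
print(with_heartbeats([(8, 1.0), (15, 0.0)], end=18, interval=2))
# [(8, 1.0), (10, 1.0), (12, 1.0), (14, 1.0), (15, 0.0), (17, 0.0)]
```

Every 2-hour window now contains at least one point, so a 2-hour `GROUP BY` bucket always has a value, whatever time range is selected.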

EDIT: Database size doesn't seem to be much of a concern. Five months of per-second recording of my PC and Pi stats, plus every data point from every IoT device in my house (including 4 particulate matter sensors and 16 energy sensors that update every second), has resulted in 400 MB of data. So I'm looking at about a gigabyte a year to record every possible data point my home creates with per-second database writes.
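A quick check of that extrapolation, assuming growth stays roughly linear over the year:

```latex
\frac{400\ \text{MB}}{5\ \text{months}} \times 12\ \frac{\text{months}}{\text{year}}
  = 960\ \frac{\text{MB}}{\text{year}} \approx 1\ \frac{\text{GB}}{\text{year}}
```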

Cjkeenan · September 8, 2021, 9:24pm

I figured as much, but then why haven't they closed the issue? Why haven't they given a definitive answer and reasoning for why it won't change? This exact issue is even listed in their FAQ, which refers to the GitHub issue, but there is no discussion of next steps, either to fix it or to keep the current behavior.

I am not sure I follow the logic of this feature breaking people's setups. If their data is within their selected time range, then no harm, no foul. The only thing that would change is when the user wants to prefetch a bit of data. Maybe you could even do this without changing fill(previous), by adding something like prefetch(30d), so it would be completely optional and tailored to the particular query.
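The proposed prefetch could be approximated client-side by taking the newest point inside a bounded lookback window and re-stamping it at the start of the range, so that an ordinary `fill(previous)` has something to carry forward. This is a hypothetical sketch: `prefetch(30d)` is not an existing InfluxQL option, and `seed_range` and its `lookback` parameter only illustrate the idea. The lookback bound is the same cutoff question raised earlier in the thread; with no point in the window, nothing is added.

```python
from typing import List, Tuple

Point = Tuple[int, float]


def seed_range(points: List[Point], start: int, end: int, lookback: int) -> List[Point]:
    """Return the points in [start, end), seeded with the newest earlier point.

    Only a point in [start - lookback, start) may act as the seed (the
    hypothetical prefetch window). It is re-stamped at `start`. If the window
    is empty, or a point already sits exactly at `start`, nothing is added.
    Points must be sorted by time.
    """
    before = [(ts, v) for (ts, v) in points if start - lookback <= ts < start]
    inside = [(ts, v) for (ts, v) in points if start <= ts < end]
    if before and (not inside or inside[0][0] > start):
        return [(start, before[-1][1])] + inside
    return inside


points = [(0, 20.0), (50, 21.5)]
print(seed_range(points, start=30, end=70, lookback=40))  # [(30, 20.0), (50, 21.5)]
print(seed_range(points, start=30, end=70, lookback=20))  # [(50, 21.5)]
```

Making the window an explicit per-query parameter keeps the default behavior unchanged, which is the point of proposing it as an optional addition rather than a change to fill(previous).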

That's the thing: I don't really have a set time range that I always use. I tend to look at both macro and micro views depending on the metric.

RRodman · September 8, 2021, 9:58pm

Grafana and Influx are used in a LOT of corporate, commercial, and industrial settings. Anything that adds overhead, changes the query structure, adds complications, or requires changing existing setups to update is considered a breaking change and is very unlikely to make it in unless it's a security issue.

That being said, there may be a third-party plugin for Grafana that provides the functionality you are after, but not one I am aware of. I just inject the data at a set interval if there hasn't been an update (the interval varies depending on the data source).

As you can see from just 2 of my 10 dashboards, I am charting a LOT of data, and like I said, a year's worth of stored info only runs about 1 to 1.5 GB.

I think the best solution for you (assuming you don’t want to keep the data for a long period) would be to set up a retention period on your data.

Now, if you're interested in going down a rabbit hole, we could combine retention policies with continuous queries to give you detailed data for the past x days and then scrub all older data down to just one data point per x minutes or days, etc. Basically, this lowers the resolution of older data, leaving just a summary.
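In effect, the combination keeps recent data at full resolution and replaces older data with one aggregate per time bucket. The Python below only simulates that end result on a list of points; it is not how InfluxDB runs continuous queries or retention policies, and `downsample`, `raw_window`, and `bucket` are hypothetical names. Using the mean as the summary is an assumption, though the InfluxDB downsampling guide's example also averages.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[int, float]


def downsample(points: List[Point], now: int, raw_window: int,
               bucket: int) -> Tuple[List[Point], List[Point]]:
    """Split points into (summary, recent).

    recent: points newer than now - raw_window, kept at full resolution.
    summary: one mean per `bucket` seconds for everything older, stamped
    at the bucket start.
    """
    cutoff = now - raw_window
    recent = [(ts, v) for (ts, v) in points if ts >= cutoff]
    groups: Dict[int, List[float]] = defaultdict(list)
    for ts, v in points:
        if ts < cutoff:
            groups[ts - ts % bucket].append(v)  # floor to bucket start
    summary = [(start, mean(vs)) for start, vs in sorted(groups.items())]
    return summary, recent


points = [(0, 1.0), (10, 3.0), (65, 5.0), (130, 7.0), (135, 9.0)]
summary, recent = downsample(points, now=140, raw_window=20, bucket=60)
print(summary)  # [(0, 2.0), (60, 5.0)]
print(recent)   # [(130, 7.0), (135, 9.0)]
```

In InfluxDB itself, the retention policy would then delete the raw points older than the cutoff, leaving only the downsampled series for long-range views.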

Here's info from Influx about using this combination to trim data, so you can look into it in more detail:
Downsampling and data retention | InfluxDB OSS 1.7 Documentation

april.brandt · February 19, 2023, 9:48am