[FEATURE] Data grouping for large datasets #4053

Closed
jesusgp22 opened this issue Mar 21, 2017 · 27 comments · Fixed by #8255

Comments

@jesusgp22

Expected Behavior

Other charting libraries automatically group series data instead of rendering every single point (example: http://www.highcharts.com/stock/demo/data-grouping) and here is a detailed explanation of how it's done in amcharts: https://www.amcharts.com/kbase/understanding-data-grouping-of-stock-chart/

Possible Solution

The solutions outlined by Highcharts and amCharts are a bit different:

  • Highcharts calculates the point-to-pixel density: if each rendered point would be smaller than a pixel, there is no need to render every one of them. Instead, the data is grouped and, by default, the average value of each group is used (the sum is used for bar charts)
  • Amcharts implements a customizable solution where the user sets the group size
  • In both cases the group size and value are calculated automatically
  • A group value function callback could be added for other solutions like max or min value instead of average
  • On zoom or resize groups should be recalculated
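
The grouping idea in the list above could be sketched roughly as follows. This is only an illustration, not how Highcharts or amCharts actually implement it: `groupByPixel`, `mean` and `sum` are hypothetical names, the points are assumed to be `{x, y}` objects sorted by `x`, and `widthPx` stands for the width of the drawing area in pixels.

```javascript
// Hypothetical helper: keeps at most one grouped value per horizontal pixel.
// points: array of {x, y} sorted by x; widthPx: drawable width in pixels;
// aggregate: callback that turns an array of y values into a single value.
function groupByPixel(points, widthPx, aggregate) {
  if (points.length <= widthPx) {
    return points.slice(); // fewer points than pixels: nothing to group
  }
  var minX = points[0].x;
  var span = (points[points.length - 1].x - minX) || 1; // avoid division by zero
  var buckets = [];
  for (var i = 0; i < points.length; i++) {
    // map x to a pixel column in [0, widthPx - 1]
    var col = Math.min(widthPx - 1, Math.floor(((points[i].x - minX) / span) * widthPx));
    if (!buckets[col]) {
      buckets[col] = {xSum: 0, ys: []};
    }
    buckets[col].xSum += points[i].x;
    buckets[col].ys.push(points[i].y);
  }
  var grouped = [];
  for (var c = 0; c < buckets.length; c++) {
    if (!buckets[c]) {
      continue; // no data fell into this pixel column
    }
    grouped.push({
      x: buckets[c].xSum / buckets[c].ys.length,
      y: aggregate(buckets[c].ys)
    });
  }
  return grouped;
}

// Example group value callbacks: average (line charts) and sum (bar charts).
function mean(values) {
  return sum(values) / values.length;
}

function sum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}
```

Passing a different callback (for example one returning the maximum or minimum) changes the group value without touching the grouping itself. On zoom or resize the function would simply be called again with the visible points and the new width.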

Context

Data visualizations often need to render thousands of points. Rendering every point is not necessary because:

  • the user might not be able to understand so much data
  • the rendering is too slow
  • calculating group averages might display more interesting values than each point
  • This solution improves performance because calculating groups should always be faster than rendering on the canvas [this might be up for debate]
@etimberg
Member

This looks like an interesting feature that could form the basis of a lot of features. I feel that this could live as a plugin that ships from the main repository but that isn't included in the main build (due to size requirements that this introduces). I'm happy to look at a PR that starts these features out.

Another option is to do min max decimation which at most has 4 points per pixel
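
A minimal sketch of min-max decimation, under the assumption that "4 points per pixel" means keeping the first, minimum, maximum and last point of each pixel column. The function name and parameters are illustrative and not part of any Chart.js API; points are assumed to be `{x, y}` objects sorted by `x`.

```javascript
// Sketch of min-max decimation (names are illustrative, not a Chart.js API).
// points: array of {x, y} sorted by x; widthPx: drawable width in pixels.
function minMaxDecimate(points, widthPx) {
  if (points.length <= 4 * widthPx) {
    return points.slice(); // already at or below 4 points per pixel
  }
  var minX = points[0].x;
  var span = (points[points.length - 1].x - minX) || 1; // avoid division by zero

  // maps an x value to a pixel column in [0, widthPx - 1]
  function columnOf(x) {
    return Math.min(widthPx - 1, Math.floor(((x - minX) / span) * widthPx));
  }

  var result = [];
  var start = 0;
  while (start < points.length) {
    var col = columnOf(points[start].x);
    var end = start;  // index of the last point in this column
    var minI = start; // index of the lowest y in this column
    var maxI = start; // index of the highest y in this column
    while (end + 1 < points.length && columnOf(points[end + 1].x) === col) {
      end++;
      if (points[end].y < points[minI].y) minI = end;
      if (points[end].y > points[maxI].y) maxI = end;
    }
    // keep the four candidates in their original order, without duplicates
    var keep = [start, minI, maxI, end].sort(function (a, b) { return a - b; });
    for (var k = 0; k < keep.length; k++) {
      if (k === 0 || keep[k] !== keep[k - 1]) {
        result.push(points[keep[k]]);
      }
    }
    start = end + 1;
  }
  return result;
}
```

Unlike averaging, this keeps real data points, so peaks and dips stay visible in the rendered line.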

@jesusgp22
Author

jesusgp22 commented Mar 22, 2017

Thank you for your quick answer @etimberg

I am just getting started with Chart.js (but I already love it). Could you give me some hints on how to get started with this?

I have to get this implemented anyway and I'd be happy to collaborate.

@etimberg
Member

Some examples of plugins are: https://github.com/chartjs/chartjs-plugin-zoom https://github.com/chartjs/chartjs-plugin-annotation

The plugin docs are: https://github.com/chartjs/Chart.js/blob/master/docs/developers/plugins.md

What I think you would need to do is hook into beforeUpdate or resize and use the size of the chart to remove items from the chart.data object. The chart would then render with less data until beforeUpdate is called again. You would need to keep a complete copy of the original data somewhere so that you can re-decimate on the fly.

@jesusgp22
Author

I'll make sure to follow up here when I make some progress on this

@arikkfir

arikkfir commented Apr 7, 2017

This would be highly beneficial for us too. Thanks for looking into this.

@drewhammond

Another vote for this feature here

@CAPTelmo2165

@jesusgp22 Did you ever implement a plugin for this? This would be super helpful right now.

@Evertvdw

There is something like this, maybe you can use it as a starting point:
https://github.com/AlbinoDrought/chartjs-plugin-downsample

@neon-dev

Because I had massive performance issues with charts containing tens of thousands of data points, I implemented pretty much what got suggested here. Thought I'd share it after I found this topic.

The following code can be registered globally as a default plugin via Chart.pluginService.register(reduceDataPointsPlugin) (not recommended), or per chart as plugins: [reduceDataPointsPlugin] in the chart configuration object.

var reduceDataPointsPlugin = {
  beforeUpdate: function (chart, options) {
    filterData(chart);
  }
};

function filterData(chart) {
  var maxRenderedPointsX = 800;
  var datasets = chart.data.datasets;
  if (!chart.data.origDatasetsData) {
    chart.data.origDatasetsData = [];
    for (var i in datasets) {
      chart.data.origDatasetsData.push(datasets[i].data);
    }
  }
  var originalDatasetsData = chart.data.origDatasetsData;
  var chartOptions = chart.options.scales.xAxes[0];
  var startX = chartOptions.time.min;
  var endX = chartOptions.time.max;

  if (startX && typeof startX === 'object')
    startX = startX._d.getTime();
  if (endX && typeof endX === 'object')
    endX = endX._d.getTime();

  for (var i = 0; i < originalDatasetsData.length; i++) {
    var originalData = originalDatasetsData[i];

    if (!originalData.length)
      continue;

    var firstElement = {index: 0, time: null};
    var lastElement = {index: originalData.length - 1, time: null};

    for (var j = 0; j < originalData.length; j++) {
      var time = originalData[j].x;
      if (time >= startX && (firstElement.time === null || time < firstElement.time)) {
        firstElement.index = j;
        firstElement.time = time;
      }
      if (time <= endX && (lastElement.time === null || time > lastElement.time)) {
        lastElement.index = j;
        lastElement.time = time;
      }
    }
    var startIndex = firstElement.index <= lastElement.index ? firstElement.index : lastElement.index;
    var endIndex = firstElement.index >= lastElement.index ? firstElement.index : lastElement.index;
    datasets[i].data = reduce(originalData.slice(startIndex, endIndex + 1), maxRenderedPointsX);
  }
}

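// Returns a reduced version of the data array, averaging the x and y values of each chunk.
function reduce(data, maxCount) {
  if (data.length <= maxCount)
    return data;
  // blockSize is usually fractional; slice() truncates fractional indices toward zero
  var blockSize = data.length / maxCount;
  var reduced = [];
  for (var i = 0; i < data.length;) {
    // the "+ 1" usually makes neighbouring chunks share one boundary point
    var chunk = data.slice(i, (i += blockSize) + 1);
    reduced.push(average(chunk));
  }
  return reduced;
}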

function average(chunk) {
  var x = 0;
  var y = 0;
  for (var i = 0; i < chunk.length; i++) {
    x += chunk[i].x;
    y += chunk[i].y;
  }
  return {x: Math.round(x / chunk.length), y: y / chunk.length};
}

Disclaimer:

It's only a proof of concept and therefore only works with time points (millis, not date objects) on the x axis. It also assumes there is only one x axis, so it ignores the limits of other axes. Updating (adding/removing) data only works if you hold references to the original data arrays and work with them. The datasets on the chart obviously cannot be used for that anymore, since they only hold the view values.
The code could also be quite a bit cleaner (notice the floating point array index access?).
You most likely will have to modify the code to your needs.
The plugin is partly based on @Evertvdw's code as well as @bbb31's code.

How it works and what it does:

First everything outside the visible range gets cut (a massive performance gain on its own; this should happen natively inside Chart.js if you ask me), then the remaining data points get reduced to at most 800 points, each the average of the data points it replaces. This number can be configured via maxRenderedPointsX, but it would be better to make it a chart option.
Zooming and panning are fully supported and (unlike chartjs-plugin-downsample) will maintain the granularity based on the currently displayed portion of the data, which means zooming in will show previously hidden (averaged) data points and zooming out will hide them again. With this you will always see a fixed number of data points, or fewer if there are not enough in the current range.
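
The first step (cutting everything outside the visible range) does not have to scan every point when the data is already sorted by x. The sketch below is an alternative to the linear scan in the snippet above, not a drop-in replacement: the function names are hypothetical, and it assumes `data` holds `{x, y}` objects in ascending `x` order.

```javascript
// Binary search: index of the first point with x >= value (data sorted by x).
function firstIndexAtOrAbove(data, value) {
  var lo = 0;
  var hi = data.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (data[mid].x < value) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Binary search: index of the first point with x > value (data sorted by x).
function firstIndexAbove(data, value) {
  var lo = 0;
  var hi = data.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (data[mid].x <= value) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Returns only the points whose x lies inside [minX, maxX].
function cropToRange(data, minX, maxX) {
  return data.slice(firstIndexAtOrAbove(data, minX), firstIndexAbove(data, maxX));
}
```

The cropped array could then be handed to a reducing step such as reduce(), so that the point budget is spent only on what is actually visible.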

@michalkolek

michalkolek commented May 4, 2018

@neon-dev I am getting this error when I add your snippet:

Uncaught TypeError: Cannot read property '_children' of null
    at Object.dataset (Chart.js:11077)
    at createMapper (Chart.js:11236)
    at Object.afterDatasetsUpdate (Chart.js:11345)
    at Object.notify (Chart.js:6754)
    at Chart.updateDatasets (Chart.js:4288)
(...)
var mappers = {
	dataset: function(source) {
		var index = source.fill;
		var chart = source.chart;
		var meta = chart.getDatasetMeta(index);
		var visible = meta && chart.isDatasetVisible(index);
		var points = (visible && meta.dataset._children) || []; // <---- line 11077
		var length = points.length || 0;

Any hints?

@neon-dev

neon-dev commented May 5, 2018

Since you did not provide a fiddle or further information about your data and chart version, I created a working fiddle using my plugin:
https://jsfiddle.net/donyxzqz/

Hope it helps. As I already said, it most likely needs to be modified to work for your use case.
For example, I've already changed the reduce function to chunk the data based on their time distance. In the posted code, data points are simply chunked by count, so a chart with varying distances between points loses far too much information in sparse regions when they are displayed alongside dense regions.
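
Chunking by distance rather than by count might look like the following. This is a guess at the idea, not neon-dev's actual updated code: `reduceByInterval` is a hypothetical name, and the data is assumed to be sorted by x and already limited to the range [startX, endX].

```javascript
// Hypothetical variant of reduce(): splits the visible x range into maxCount
// equally wide intervals and averages the points that fall into each one.
function reduceByInterval(data, startX, endX, maxCount) {
  var interval = (endX - startX) / maxCount;
  if (data.length <= maxCount || !(interval > 0)) {
    return data;
  }
  var reduced = [];
  var bucket = -1; // interval index of the points currently being summed
  var sumX = 0;
  var sumY = 0;
  var count = 0;

  function flush() {
    if (count > 0) {
      reduced.push({x: Math.round(sumX / count), y: sumY / count});
    }
    sumX = 0;
    sumY = 0;
    count = 0;
  }

  for (var i = 0; i < data.length; i++) {
    var b = Math.min(maxCount - 1, Math.floor((data[i].x - startX) / interval));
    if (b !== bucket) {
      flush(); // the previous interval is complete
      bucket = b;
    }
    sumX += data[i].x;
    sumY += data[i].y;
    count++;
  }
  flush(); // emit the last interval
  return reduced;
}
```

With this approach a dense burst of points collapses into a single averaged point, while isolated points in sparse regions keep an interval of their own.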

@khumarahn

@neon-dev Thanks for your code. I am trying to adapt it, but my axis is numerical (linear), not time. How do I find min and max values on the x axis?

@neon-dev

@khumarahn Regular min and max values are stored in the ticks option per axis, so you need to change time to ticks:

  var startX = chartOptions.ticks.min;
  var endX = chartOptions.ticks.max;

Quick example: https://jsfiddle.net/donyxzqz/4/ (you should rename other variables which are still called or referring to time)

@khumarahn

Thanks! I also found that chart.scales["x-axis-0"].min and chart.scales["x-axis-0"].max work, probably set by the zoom plugin.
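
The two ways of reading the visible range mentioned in this exchange could be combined in one helper. This is a sketch for the Chart.js 2.x object layout used in this thread; `getVisibleRange` is a hypothetical name, and the fallback assumes that the scale's min and max may not be populated yet when the helper is called.

```javascript
// Hypothetical helper for Chart.js 2.x: returns {min, max} of the visible x range.
// Prefers the values on the live scale and falls back to the axis options.
function getVisibleRange(chart, scaleId) {
  var scale = chart.scales && chart.scales[scaleId];
  if (scale && typeof scale.min === 'number' && typeof scale.max === 'number') {
    return {min: scale.min, max: scale.max};
  }
  var axis = chart.options.scales.xAxes[0];
  // time axes keep their limits under "time", other axes under "ticks"
  var limits = (axis.type === 'time' ? axis.time : axis.ticks) || {};
  return {min: limits.min, max: limits.max};
}
```

In the plugin above, startX and endX could then be taken from getVisibleRange(chart, 'x-axis-0') instead of reading chartOptions.time directly.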

@PodaruDragos

@neon-dev are you considering packaging this as a plugin?

@neon-dev

neon-dev commented Jun 14, 2018

@PodaruDragos sorry, no plans. Best would be to integrate the main performance boost (data truncation outside the visible area) directly into Chart.js with a PR in my opinion, but I don't have the time and experience with the Chart.js source for it. If we had that, the grouping feature could be a plugin on its own or also become a native feature.
Packing both into a single dirty plugin like I did is not really desirable, so I hope someone else tackles it to properly implement it in one or the other way.

@PodaruDragos

PodaruDragos commented Jun 14, 2018

@neon-dev I am willing to help if I can. I don't have any experience with the library, but this seems like a good idea.

@outOFFspace

any progress on this?

@almersawi

Any update?

@leeoniya

leeoniya commented Dec 16, 2019

Chart.js v3 should be much faster, but if you want something even faster than that, here's a shameless plug: https://github.com/leeoniya/uPlot

@benmccann
Contributor

Yes, this has actually been implemented for line charts in v3:

function fastPath(ctx, points, spanGaps) {

It hasn't been implemented for other chart types yet

@almersawi

Thank you!

@PodaruDragos

@benmccann are there any benchmarks for how fast this is compared with master?

@benmccann
Contributor

benmccann commented Dec 16, 2019

development on v3 is occurring on master, so you probably mean how much faster is master than 2.9

we've made a lot of improvements in the most recent versions. 2.9.3 is a lot faster than 2.8. I would recommend checking out our new performance docs for other tips on speeding up your charts: https://www.chartjs.org/docs/latest/general/performance.html

we're working on deploying a benchmark currently, but 3.x is currently somewhere around 2-3x faster than 2.x. we're still working on additional improvements that should make it at least another 2-3x faster

@dudelkuchen

Based on the suggestions here I have written a plugin that displays only one data point per pixel. In my example fiddle, the plugin seems to be 2-8 times faster than plain chart.js 2.9.4, depending on the amount of data and canvas size.

https://jsfiddle.net/dudelkuchen/xboa584h/30/

https://github.com/dudelkuchen/chartjs-plugin-largedatasets

@benmccann linked a pull request Jan 3, 2021 that will close this issue
@wvanrensburg

@etimberg Why was this closed? That PR didn't even solve the original user's suggestion.

@benmccann
Contributor

It's recommended to do this in a plugin. We added the hooks necessary to allow such a plugin. We also added data decimation in core: https://www.chartjs.org/docs/latest/configuration/decimation.html
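
For reference, enabling the core decimation plugin in Chart.js 3 might look roughly like this. It is a sketch based on the linked decimation docs, not a tested setup: the dataset contents and the `points` variable are made up for illustration, and the docs list further requirements (a line dataset, a linear or time x axis, and data that does not need parsing).

```javascript
// Made-up sample data: already in {x, y} form so that no parsing is needed.
var points = [];
for (var i = 0; i < 100000; i++) {
  points.push({x: i, y: Math.sin(i / 500)});
}

// Chart configuration sketch for Chart.js 3 with core decimation enabled.
var config = {
  type: 'line',
  data: {
    datasets: [{label: 'Large dataset', data: points}]
  },
  options: {
    parsing: false,   // decimation requires data that needs no parsing
    animation: false, // optional, but avoids extra work with large datasets
    scales: {
      x: {type: 'linear'}
    },
    plugins: {
      decimation: {
        enabled: true,
        algorithm: 'min-max'
      }
    }
  }
};
```

The config object would be passed to new Chart(ctx, config) as usual.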
