Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: improved diffing #103

Closed
wants to merge 7,942 commits into from
Closed

Feature: improved diffing #103

wants to merge 7,942 commits into from

Conversation

ramezrafla
Copy link

See Issue #45

Summary: the diff-ing engine used by Meteor lets through content that has not changed, especially relevant when we are display arrays. We added a fast comparison function and used it in two key locations to prevent needless changes to the DOM.

martijnwalraven and others added 30 commits February 9, 2016 12:43
Added check of params for template's methods events and helpers
@ramezrafla
Copy link
Author

Thanks @mitar, I was only asking as it was left in limbo on the forum https://forums.meteor.com/t/community-involvement-setup/29538/15

Essentially our goal was not incremental DOM per se, it was to improve rendering and avoid wasted calls to DOM. I don't think our solution is optimal, but it is very good with the constraints around it.

Look forward to you completing your review, in the mean time, I'll try to see if I can invest time in this:

If you want to help, you can in meantime investigate (maybe you already did) why looping in Minimongo does not work as expected. So we all agree that if one document is inserted in the middle, we would want only that DOM element to be added there and this is it. Why is this not happening? Could #each ... in help here? So instead of trying to mask a problem by diffing, I would like that we do not even trigger a change notification.

It's likely a Tracker / diff issue (something MDG seems to have known about, and so have you)

As far as urgency, there is a message to the community: "Blaze is getting love and it's getting better!"

This could be a temporary fix.

@mitar
Copy link
Contributor

mitar commented Sep 21, 2016

It's likely a Tracker / diff issue (something MDG seems to have known about, and so have you)

No, this is not what is known. It is known that if a document for a particular entry changes at all, that one will be rerun completely, because default equality for data context is conservative. But this for insertion is something new, at least to me. And is not how it was documented in the past. So if this is happening, I would see this as a bug, maybe even regression. And I would like to know where it is coming from. It should work both for arrays with _id and Minimongo.

What I think we have to get to is:

  • rerendering is triggered anytime anything in data context changes (then, we can optimize how data context changes are detected, with similar approach to yours)
  • but not if some other documents in Minimongo are changed, or some other ones are inserted somewhere else, that should be fixed on Minimongo/array tracking level

As far as urgency, there is a message to the community: "Blaze is getting love and it's getting better!"

I think nobody in Meteor community expects "be quick and break things" approach to development. In any case, I do not see that as a good reason to not do things properly. I have also not seen 1000 upvotes or comments to quickly merge on this pull request, so I do not see any urgency from others either.

I know that when you make a pull request you feel amazing and would love to get things merged. But again, when you make a pull request which was not discussed before, then you have to take responsibility to hold your own horses. This is on you.

@mitar
Copy link
Contributor

mitar commented Sep 22, 2016

With the current approach, the 'changed' objects trickle down to be compared with what is actually in the DOM, field by field. The comparison is far from perfect and converts content to DOM elements to compare them.

Yes, this is why I am proposing that we do comparison with previous value of HTMLJS for that object. So instead of converting to DOM elements, we compare HTMLJS itself. And only apply those changes which we find different between HTMLJS values.

Again, this is what I am saying that could be an interesting optimization which would work similar to virtual DOM, now that you investigated incremental DOM and are saying that it is not a good match.

And I agree that we should also optimize the whole pipeline for Tracker recomputations as well. But now imagine that we can make DOM rendering faster without Tracker recomputations optimization. Then when we have both, so great!

You have very ambitious plans for Blaze (e.g. different materializers).

No, this is a very simple change, for this particular approach to materializing with comparing HTMLJS, because you are interested in this. Or was.

Yes observe-sequence is where we need to work, except it looks like a big task at the moment (diff-ing engine). Will require more dedication. Not to mention it's a core package and we have to work with MDG developers who are super busy. Won't happen right away.

It would still be great if we can solve the issue there as well. I do not think we should worry too much about pull requests not getting in, they are pretty efficient on merging pull requests recently. And even if it goes late on, it goes. But somebody has to make, to even go in at all.

@@ -352,13 +352,16 @@ Blaze._materializeView = function (view, parentView, _workStack, _intoArray) {
view._isInRender = false;

if (! c.firstRun) {
// added by ramezrafla@zegenie to skip affecting DOM for unchanged content
if (Blaze._deepCompare(lastHtmljs,htmljs)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are comparing last and new HTMLJS already. Why you wrote (or I understood) that we are talking about data layer comparison.

So yea, this is similar to what I had in mind with virtual DOM. To compare old and new HTMLJS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this part of the change is something I like.

I do think we can do better though. :-)

First, why are you not simply using EJSON.equals? If you think that one can be optimized, than I would suggest make pull requests to it. Hm, but do we want to introduce a new dependency for Blaze?

So, one thing I think we can improve is that it seems that HTMLJS can contain not-rendered templates. See this code snippet. This means that if you are comparing any such lastHtmljs and htmljs they will not be equal, yes? So maybe it would be better if we make one pass over the HTMLJS tree, converting it into a pure HTMLJS tree. And then compare. The issue with that is that this might confuse how rendered callback is called. We would have to mark which segments of HTMLJS require their view 'rendered' to be called. But my worry is that now with any more complicated HTMLJS segment we are not improving much. What do you think?

Next idea: now we are all or nothing. We compare HTMLJS, if it is equal, great. But if there is one attribute difference, we still go in and then create DOM elements for whole HTMLJS tree and compare them, no, inserting/updating them? Can you point me to where DOM elements are compared?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently I worry that this prevents only the innermost HTMLJS trees from being rerendered.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to where DOM elements are compared?

In the past, the _isContentEqual did just that. We are no longer doing that now, we are comparing data before it becomes DOM.

So currently I worry that this prevents only the innermost HTMLJS trees from being rerendered.

If the deepcompare finds the objects to be the same, it makes sense to skip re-rendering, unless I missed something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past, the _isContentEqual did just that. We are no longer doing that now, we are comparing data before it becomes DOM.

I mean, where it is comparing does it have to replace a DOM element with a new one. It seems to me that it does not do a new DOM element and then compares with current DOM, if necessary. But it removes a DOM element if previous view is invalided, it then creates a DOM element for new state of the view, and adds it. Even if comparing previous and new DOM element would show that it is almost the same, or even exactly the same.

If the deepcompare finds the objects to be the same, it makes sense to skip re-rendering, unless I missed something.

Yes, but I see those checks to be run only for very leaf rendering, nothing before. :-(

// Are the HTMLjs entities `a` and `b` the same? We could be
// more elaborate here but the point is to catch the most basic
// cases.
Blaze._isContentEqual = function (a, b) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you didn't just improve this method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have simply renamed the new function as ._isContentEqual, except this function is meant for DOM comparison, and we are comparing objects / primitives, so to avoid confusion I broke with the past.

@@ -226,6 +226,11 @@ Blaze.Each = function (argFunc, contentFunc, elseFunc) {
});
},
changedAt: function (id, newItem, oldItem, index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is at the data level, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we have two additional diff levels right (see my original PR comment)

  1. Data level
  2. Field level in htmljs

I think 1 will being gone once we improve the original diff-ing which calls the observe handlers

@mitar
Copy link
Contributor

mitar commented Sep 22, 2016

Can you check difference for the 01-dbmon example between current Blaze version and one with this pull request?

@@ -352,13 +352,16 @@ Blaze._materializeView = function (view, parentView, _workStack, _intoArray) {
view._isInRender = false;

if (! c.firstRun) {
// added by ramezrafla@zegenie to skip affecting DOM for unchanged content
if (Blaze._deepCompare(lastHtmljs,htmljs)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked into this change using this example and I could not get it to in fact come to this if (! c.firstRun) { condition with any other view than lookup:something.

If I did:

Collection.update({}, {$set: {value: 'a'}})

Change was from aa to a0.

And then if I did:

Collection.update({}, {$set: {value: 'a', test: 'b'}})

Change was from aa to aa.

So, I think that Blaze._isContentEqual moving up is a great idea for cases like above, where I am not minimizing my fields selector well in Minimongo. But I am not sure if there is any benefit of changing the function to deep-comparison? Can you please adapt my example to the case where you get something else than those simple values Blaze._isContentEqual is already checking? I could imagine only very extreme cases where you would be using {{value}} where value is not a simple variable, but an object, and you are using its stringified value to be shown in the template. But is anyone really doing that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we had deep variables. I think for a complex App, it is to be expected.

Here is your example adapted to our test case:
var obj = { a: { b : 1}}; Collection.update({}, {$set: obj});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it does not happen. So what I am saying is that you will never really do in your app:

{{value}}

Where:

value = {a: {b: 1}};

You might do:

{{value.a.b}}

But what I am saying is that in the comparison, Blaze just compares then values of bs, so primitive values.

So can you give me a realistic example where the value of lastHtmljs and htmljs is something else than what it is checked by Blaze._isContentEqual? I have updated my example with complex objects to test this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I just used deepCompare for the sake of convenience and because I haven't gone through all the options. deepCompare short circuits comparison if it's a primitive so I am reusing the same function for risk mitigation as the data level.

But I did see somewhere in the code that htmljs can be views. Also, we skipped DOM comparison altogether that was there before, which was my biggest concern. If I was too cautious, then we can simply simplify the compare function. But, why would the original developers convert to DOM if it was always primitives?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I did see somewhere in the code that htmljs can be views. Also, we skipped DOM comparison altogether that was there before, which was my biggest concern. If I was too cautious, then we can simply simplify the compare function.

Yes, but at a higher level. These views which have !computation.firstRun seems to be very simple ones.

Which is also a problem probably, that higher-levels are recreated.

So can you make in your codebase (on which you are doing these optimizations here) a check to print out when computation.firstRun is false and htmljs is something complicated.

But, why would the original developers convert to DOM if it was always primitives?

I have no idea why they convert at all before the if. The fact that you moved if out is great, but I have no idea why they did it like this. Does converting to DOM has some side-effects we do not know?

@@ -226,6 +226,11 @@ Blaze.Each = function (argFunc, contentFunc, elseFunc) {
});
},
changedAt: function (id, newItem, oldItem, index) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my example I could not get to this issue. If I instrument changedAt method to print out every time is called, and then I click "add 5" button, this is not called at all, not for 5 or any other value in the Minimongo database.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried even to limit my fields selector to {value: 1, test: 1}. Then:

Collection.update({}, {$set: {value: 'a'}})
newItem, oldItem: Object {value: "a", _id: "s84f5D5bwFgPzwvmk"} Object {value: 0, _id: "s84f5D5bwFgPzwvmk"}
Collection.update({}, {$set: {value: 'a', test: 'b'}})
newItem, oldItem: Object {value: "a", _id: "s84f5D5bwFgPzwvmk", test: "b"} Object {value: "a", _id: "s84f5D5bwFgPzwvmk"}
Collection.update({}, {$set: {value: 'a', test: 'b', test2: 'c'}})

The last change changes a field outside of the projection. And changedAt is not called.

So, I could not reproduce that changedAt would be called unnecessary if a value was not really changed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, our test case was as per above comment:
var obj = { a: { b : 1}}; Collection.update({}, {$set: obj});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I could not reproduce that changedAt when data projected using fields didn't really change.

For example, if I have the following cursor over which I am doing #each:

Collection.find({}, {sort: {'value.a': 1}, fields: {'value.a': 1, test: 1}});

And values are inserted like:

Collection.insert({value: {a: 1}});

When I do:

Collection.update({}, {$set: {'value.b': 'b'}})

changedAt is not called at all.

And if I insert one one document in the middle, only Blaze for that one document is called, and no changedAt at all, for nothing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference with what I did is that it was server side update with method. If you can replicate it, it means that the problem is in the publish / diff-ing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make an example showing what you observed?

We need tests for these changes anyway.

@mitar
Copy link
Contributor

mitar commented Sep 22, 2016

(I wanted to make this an inline reply to a inline comment, but I am realizing it is pretty valuable, so I am moving it top-level.)

Yes, sorry about that. I understood something else under "data level". For me this is more data context level, which has its own issues. Sorry for misunderstanding.

Maybe we should define terminology here better. To my knowledge we have issues at:

  • data context level, equality being what it is, which triggers reruns on any non-primitive value change:
    • possible solutions:
      • get rid of data context
      • allow custom data context with custom equality function
  • tracker level, which is similar to data context level, but means that in general any reactive dependency might rerun more often than necessary
    • possible solutions:
      • using something like computed field to be precise about dependency changes
  • no tracker caching level: if you have autorun with multiple dependencies, when one of those change, also other are recomputed, instead of previously returned value just cached and returned
  • minimongo and array processing level: we want changes to be triggered only when data really changes, and only for those changes
  • blaze data level (what you are talking about here, I think): that propagation of data through blaze might be ineffective and be done when data has not really changed
  • HTMLJS materialization level: when HTMLJS does not really change, but Blaze still builds DOM elements and does diff against DOM
  • interacting with DOM level: querying DOM for state is often expensive

Did I miss anything?

@ramezrafla
Copy link
Author

ramezrafla commented Sep 22, 2016

@mitar
You obviously know a lot more of the Blaze history than I do, I just jumped recently to resolve the needless re-rendering when no data has changed. I am sure I will learn all these issues as I go carefully through the code and the issues.

ignore 3rd party changes in DOM and just trust the previous HTMLJS state and apply it blindly (should probably be configurable per template)

We have an App that affects the DOM directly and had no issues. I think it has to do with carefully managing collisions. We do not with JS affect anything that is handled by Blaze. If that separation is done, we're good

blaze data level (what you are talking about here, I think): that propagation of data through blaze might be ineffective and be done when data has not really changed
HTMLJS materialization level: when HTMLJS does not really change, but Blaze still builds DOM elements and does diff against DOM

Right! So the first one has to do with the diff-ing algorithm in the compare-sequence lib you had talked about before. This is where the real fix should be applied, unless it breaks something else (is this used by Tracker?)

The second is where I believe we have a lot of savings. We don't build DOM elements to diff against DOM. This is bad design. We should always diff raw data, not display layer. This is what makes Blaze fast (like VirtualDOM) and is the reason Incremental DOM is slower than Virtual DOM.

Edit: And given so many unchanged objects can be fed through, that DOM element creation for diffing is really problematic

@mitar mitar mentioned this pull request Sep 24, 2016
@mitar
Copy link
Contributor

mitar commented Sep 26, 2016

You obviously know a lot more of the Blaze history than I do, I just jumped recently to resolve the needless re-rendering when no data has changed. I am sure I will learn all these issues as I go carefully through the code and the issues.

Oh, maybe I do know about history, but about internals we are all learning. I think you looked much deeper than I ever did in details of rendering itself. So we are all learning here.

We have an App that affects the DOM directly and had no issues. I think it has to do with carefully managing collisions. We do not with JS affect anything that is handled by Blaze. If that separation is done, we're good

Yes, me too. Or you leave to Blaze, or you use something else. But do not mix. :-)

So the first one has to do with the diff-ing algorithm in the compare-sequence lib you had talked about before. This is where the real fix should be applied, unless it breaks something else (is this used by Tracker?)

But I have not been able to reproduce it so that I could look what is happening.

But I was using Minimongo and not arrays directly.

The second is where I believe we have a lot of savings. We don't build DOM elements to diff against DOM. This is bad design. We should always diff raw data, not display layer. This is what makes Blaze fast (like VirtualDOM) and is the reason Incremental DOM is slower than Virtual DOM.

Yes, but I could not really find where diffing is done. I have found diffing for attributes, but for DOM it seems like Blaze simply removes old element and adds a new one, when something changes. It does not try to diff and patch the old one?

@ramezrafla
Copy link
Author

ramezrafla commented Sep 26, 2016

From view.js

var rangesAndNodes = Blaze._materializeDOM(htmljs, [], view);
if (! Blaze._isContentEqual(lastHtmljs, htmljs)) {
domrange.setMembers(rangesAndNodes);
Blaze._fireCallbacks(view, 'rendered');
}

So this creates the DOM elements before comparing

@mitar
Copy link
Contributor

mitar commented Sep 27, 2016

So this creates the DOM elements before comparing

Yes, but it compares HTMLJS, not DOM.

For DOM it seems (inside setMembers) that it simply removes old and adds new, I do not see any logic to really compare old DOM and new DOM to see the minimal change. Hm.

@mitar
Copy link
Contributor

mitar commented Oct 4, 2016

BTW, this is waiting for tests/reproducible example for the changedAt issue.

The second change I like and I think we should add it to Blaze, but I will maybe add it with _isContentEqual, but also tests would be needed. This one seems pretty easy to do.

@mitar
Copy link
Contributor

mitar commented Nov 15, 2016

So I think this should be split into two pull requests. One for moving _isContentEqual, and one for data. And we need tests for both, to have a case when that code is match at all.

@mitar
Copy link
Contributor

mitar commented Dec 30, 2016

I cleaned master branch to remove all Meteor history (#112). This means the pull request should be also updated to new history to not base it on old commits. Sorry for this extra work.

mitar added a commit that referenced this pull request Dec 31, 2016
@mitar
Copy link
Contributor

mitar commented Dec 31, 2016

Based on this pull request, I made a change in e17991a, which prevents unnecessary materialization of DOM. This is what I have able to make a reproduction myself. For the other part, checking a change in data layer I have not been able to make a reproduction.

I am closing this pull request. If we find a good reproduction for that other case, we could look into optimizing that as well.

For more ideas how to optimize Blaze see #111.

@mitar mitar closed this Dec 31, 2016
@arggh
Copy link

arggh commented Dec 31, 2016

@mitar Did you benchmark this change on any of your test cases?

@mitar
Copy link
Contributor

mitar commented Dec 31, 2016

Sadly not, I do not have any real case where this is happening. I am relaying on the report in this pull request. Feel free to make a test case/benchmark and help here.

@arggh
Copy link

arggh commented Dec 31, 2016

I might have something useful in one of my projects for a benchmark.

@ramezrafla
Copy link
Author

Beautiful @mitar! Thanks!

@arggh, we caught this as we outputted to the console every re-render. This can improve performance dramatically by skipping DOM manipulation if we have a very dynamic page. I am guessing this change is going to be very noticeable for a lot of people. Look forward to your benchmark.

There is also room for improvement in the actual diffing but that's a bigger job.

@mitar
Copy link
Contributor

mitar commented Jan 1, 2017

One more thing. I was unable to get to not c.firstRun for anything else besides primitives for which Blaze._isContentEqual is good at checking equality. If anyone can get there with something which Blaze._isContentEqual is not comparing well, we might want to upgrade Blaze._isContentEqual to something which checks values in depth.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.