-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
array-extras adds set methods #299
Conversation
Intersection of two or more arrays. | ||
|
||
@method intersect | ||
@param {Array} First array (or array-like thing - is passed through Y.Array() first.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@param
tags require a type and a parameter name, not just type and description. Also, do you think you could format your doc comments to match the formatting of the other doc comments in array-extras.js? I worked hard to keep them consistent. :)
arrays. This function will receive two arguments: the first | ||
will be an item from the first array, the second will be an item | ||
from the second array. This might be useful for arrays of objects, | ||
for example. Note that the return value may contain duplicate values - you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems unintuitive to me that the returned array might include duplicate values. If a value exists in the intersection, there's no reason (other than implementation laziness) to re-add it to the intersection if it's encountered again.
Edit: Urk, the diff changed before I posted this comment, so GitHub added it to the wrong line. Sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of my test cases is 'numbers near'. it's a definite edge case, although with dates might not be so.
E.g.
A.intersect([1,5,7], [2, 5, 9], function (a, b) {return Math.abs(a-b) < 2;});
should give [1,2,5]. but it goes wrong if you try and use unique using the same comparison function - in this case, 1 and 2 will be considered the same and 2 will be removed.
With numbers it looks a bit unlikely, but I could see a 'dates +- 2 days' type use case.
(Edit: sorry, I won't push again!)
I have some reservations about the complexity of the implementations of |
OK, thanks Ryan. It's already a little while past the end of my day... |
I felt like A.intersect = function (array) {
var args = Array.prototype.slice.call(arguments, 1),
i, len, testFn;
if (typeof args[args.length - 1] === 'function') {
testFn = args.pop();
}
return A.filter(A.unique(array, testFn), function (value) {
return A.every(args, function (otherArray) {
if (testFn) {
for (i = 0, len = otherArray.length; i < len; i++) {
if (testFn(value, otherArray[i])) {
return true;
}
}
} else {
return A.indexOf(otherArray, value) > -1;
}
});
});
}; (note that this ensures uniqueness of the items in the first array, which I think makes sense, but I think @mattparker disagrees). I think something similar could be done for I'd also recommend that you mention in the API docs that these methods are going to be really slow for large arrays, or for large numbers of arrays, since the default implementation relies on |
There were two reasons why I ended up as it is, although they may not be good reasons. I'd initially thought of this as a standalone sub-module (array-set or something) and wasn't going to depend on array-extras. Obviously if they end up all together this is irrelevant (and may be in any case). Secondly, this was where I realised how shocking the performance of native array methods like each() are, compared to simple for() loops. I did a bit of testing on on jsperf and then came across others and (for each(), anyway), it was pretty significant. So I decided to bloat a bit for the sake of speed. Again, that may be wrong. I've done some simple jsperf tests again now: http://jsperf.com/a-intersect-variations The bloated, non-reuse is up to twice as fast on the few tests I've run so far. So I think there's a trade-off between code reuse and performance on the page (or k-weight vs on-the-page) - I'm happy to go either way. To be consistent I might want to look at use of indexOf too (http://jsperf.com/indexof-native) It did also make me wonder whether we could/should add a switch in to Y.Array (Y.Array.useNative(false) to turn off the use of the native each() etc implementations where/while their performance remains poor). On the uniqueness thing: I think we're talking about slightly different things. From your snippet Ryan it seems like you're unique-ing the first array passed in, before intersecting with the second. I was talking about not unique-ing the result after intersection. I'll try a better use-case example of why I don't think we should unique() anything in this method, but leave it up to the user. I have two arrays of objects: the first has objects which represent flowers; the second has objects which represent cars. Both flowers and cars have a property 'colour'. I want to use intersect to get an array of things where I know there's a car with a matching coloured flower (1). var matchingThings = Y.Array.intersect(flowers, cars, function (oFlower, oCar) {
return oFlower.colour === oCar.colour;
}); If I use my comparison function in a call to unique on the first array before intersecting, I'll reduce all my flowers down to a set that has just one of each colour. If I use it afterwards, I'll merge all my cars and flowers. Neither are what I want. I'm actually using these methods with Models and ModelLists, and my past experience has been: 1. expect objects (especially dates), and 2. don't assume too much. In general, the comparison function we pass into this method is for comparing two items from different arrays, not for comparing two items from the same array. Granted that with strings and numbers and in lots of object use-cases these could be the same; but I don't think we should assume it. We should note in the docs that calling unique first on your input arrays might reduce their size and increase performance; and that they may want to afterwards too. (1) So I can book the publicity photos, because as we all know flowers sell cars. |
As I mentioned above, this intersection algorithm is inherently slow regardless of how the iteration is performed. The only way to make it significantly faster would be to have knowledge of the contents and sortedness of the arrays, which are things we can't assume if we want it to be generic. The specific speed differences in the jsperf test cases drop significantly if you remove the I'm not overly concerned about k-weight here. I'm concerned about the readability and maintainability of the code.
There are much more efficient algorithms that could be used to determine the intersection of two ModelLists. I'm very wary of trying to stretch a generic I think there's a compelling case for a generic |
Maybe I'm over-arguing this!
Yes, fair enough, sorry.
(My use case examples have got muddled. and I forgot what I actually did. I'm not doing that for real! Sorry). I suppose my worry is that once an intersect() is there it gets used but then doesn't do enough for reasonable use-cases. I really think we ought to be able to accommodate arrays of Dates or other simple objects without making it too awful to read and maintain. What do you think about:
Also, do you have a view on the methods that ought to be in this: intersect(), union() and diff() seem reasonable to me; the original ticket mentioned xor too, for example? And separately, do you have a view on being able to optionally 'turn off' use of native methods in Y.Array? Matt |
The methods in
+1, obviously. :)
I think I've come around to not liking the comparison function. It encourages the use of this low-level
I still think that returning a result that contains duplicate values is unexpected behavior and less useful than returning a deduped result set, but I'm interested in hearing what others think about this.
Definitely opposed to this. Much of the value of
This is already possible. You can set the global These are my opinions as former maintainer of the array utils, but I'm no longer the gatekeeper. I'd love to hear opinions from the YUI team or anyone else. |
OK.
Fine. Benefit was some slight simplification of the code, and I didn't think there was going to be any performance benefit in accepting lots. But I'd prefer to keep it with lots.
Thank you, didn't know that.
Yes. I'll pause now until we get some other input... |
We discussed this in today's YUI Open Roundtable. The recording should show up eventually at http://www.youtube.com/watch?v=rKrzggbBpfY General consensus among those on the hangout was that |
Hey, Thanks, I watched the video - sorry I didn't realise this would be on the agenda. I'm happy to work on these along the lines discussed. I still think that even the low-level version should support arrays of Dates and object literals with a custom comparator, but am quite happy to be out-voted. The original idea for this was as a separate array-set gallery module. I might re-introduce this, with versions with comparator functions. I might also start thinking about more performant versions when something is known about the data. And an array-set module could include even more esoteric set operations like xor or whatever. Matt |
Sounds good Matt! I like the idea of a separate higher-level module with more functionality. I wish we'd had you in the hangout. It'd be good if the agenda for these things was set further in advance to ensure that everyone can attend; I only heard about the agenda in IRC a few minutes before the hangout began. |
@mattparker @rgrove Would a new PR work for this or do you want to keep it here? I figured a new one on a new branch would be best since we don't get all the old history. Trying to clean up older Pull Requests |
Yes a new PR would work: I got a bit confused with which branch I should be on which is partly why nothing's happened. Should it be in dev-3.x? |
Since this is new functionality, issue it against Just make sure you reference this one in your new one so we have "history". |
Adds methods intersect(), union() and diff() to Y.Array as part of the array-extras module. Addresses ticket #2532209.