Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Window dense rank #248

Closed
wants to merge 4 commits into from
Closed

Conversation

frosforever
Copy link
Contributor

Builds on top of #236, and replaces #225.

Adds Window function support with denseRank as the first supported function.

TODO in other later PRs:

  • More window functions
  • Support for window framing
  • Support for TypedAggregates


//Need different name because companion class has `orderByMany` defined as well
//Need a class and not object in order to define what `T` is explicitly. Otherwise it's a mess
//This makes for some pretty horrid syntax though.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very open to suggestions on how to make this syntax nicer. https://github.com/typelevel/frameless/pull/248/files#diff-c5eeb2825da1c4337deff8bccfefb194R93 is pretty gross in my view.


}

//TODO: Move these to the other funcs?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should WindowFunction be its own object or should we lop them in with the other functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it done in vanilla? I would follow that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's in with the regular functions in functions.scala but that file is also 2500 lines long and seems a bit excessive.
We've also split out AggregateFunctions despite those also originally coming from functions.scala


trait WindowFunctions {

//TODO: TypedAggregate version that can be used in `agg`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Punting to a later PR. I'm still unsure how to do this right.

@codecov-io
Copy link

codecov-io commented Feb 10, 2018

Codecov Report

Merging #248 into master will decrease coverage by 0.83%.
The diff coverage is 57.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #248      +/-   ##
==========================================
- Coverage   96.99%   96.15%   -0.84%     
==========================================
  Files          52       54       +2     
  Lines         866      885      +19     
  Branches       10        9       -1     
==========================================
+ Hits          840      851      +11     
- Misses         26       34       +8
Impacted Files Coverage Δ
...in/scala/frameless/functions/WindowFunctions.scala 100% <100%> (ø)
dataset/src/main/scala/frameless/TypedWindow.scala 55.55% <55.55%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba9abbe...cadb02b. Read the comment docs.

@OlivierBlanvillain OlivierBlanvillain mentioned this pull request Feb 11, 2018
76 tasks

}

//TODO: Move these to the other funcs?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it done in vanilla? I would follow that.

import shapeless.{ HList, ProductArgs }

trait OrderedWindow
trait PartitionedWindow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using a HKT instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by that here. I use these traits to ensure that a partition or an order has been assigned to a window as some window functions require one or both to be present (eg denseRank needs an ordering). An ordering might be on multiple disparate types and has no affect on the return type of the window. What would be gained by making OrderedWindow a HKT here or do you mean something else?

@imarios
Copy link
Contributor

imarios commented Mar 2, 2018

Hey @frosforever, anything else you want to add here? Should I go over it and merge if everything looks ok?

@frosforever
Copy link
Contributor Author

Hey @imarios, sorry this has languished a bit. I've left some comments on things I'm a bit stuck on that aren't really blockers but I'm very open to suggestions.
I'm also unsure how best to test a window function creation without testing a window function. Ideally those things could be tested separately but I don't believe there's any way to introspect a spark untyped window definition.

@OlivierBlanvillain
Copy link
Contributor

@frosforever Do you plan to work again on this PR? It's more than 6 months old, if not then we should probably close it...

@frosforever
Copy link
Contributor Author

Hey @OlivierBlanvillain! Sadly this PR is stuck in a bit of an impasse and I have not had time to return to it. I'd love to get back to this at some point but have no idea when that might happen.
I'm 100% behind "always be closing" so feel free to kill this and I'll open a new one when I get the time to work on it. Thanks for the maintenance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants