Develop a Roadmap #210
cc @stsievert, I wonder if you could post your summer plans in here, and we can use this issue to develop a wishlist, and from there a prioritized roadmap. |
Here's my scattered wishlist. I will add to it and clean things up.
Optimization
Algorithms
Scikit-Learn compatibility
Miscellaneous
|
My wishlist is similar. Should we edit this wishlist in some place that's more amenable to changes than a GitHub issue? Maybe a fork of dask-ml or a Google Doc.
I have put some thoughts down at https://docs.google.com/document/d/1jsCmPcXlXsSLgdFYgXgngj_P1EkumwZ3MrjkoVaMTjY but it's much less clear than this. |
Thanks @TomAugspurger @stsievert! I think that this is great! It would be awesome if you could expand each of those topics into a 1-3 sentence description. |
Do you have a plan to support other algorithms, such as tree-based methods and support vector machines? |
@stsievert I'd be interested to have a look at your notes (people on my team may be interested in combining Dask with PyTorch and/or TensorFlow at some point in the future). It looks like your Google document is not public though. Would you be willing to make it public? |
@lesteve I think I've made it public. I'd still label it as a work in progress though. |
Better integration with the various DL frameworks is certainly in scope. If you have / develop thoughts on how this should be done, then please share them :) |
Great, thanks!
I would say we are just getting started so we probably have more questions than answers for now. |
Especially with PyTorch. It certainly feels like there should be an integration with PyTorch because it has torch.distributed and torch.multiprocessing. I almost added it before, and I've added it now. |
I recommend that we move integration with deep learning frameworks to a
separate issue.
@lesteve is this an issue that you would be comfortable starting?
…On Mon, Jul 2, 2018 at 11:08 AM, Scott Sievert ***@***.***> wrote:
If you have / develop thoughts on how this should be done, then please
share them
Especially with PyTorch. It certainly feels like there should be an
integration with PyTorch because it has torch.distributed and
torch.multiprocessing. I almost added it before, and I've added it now.
|
I opened #268. |
Is there any appetite to port many of the classifiers built for larger-than-memory datasets from spark-ml? I would love to see dask_ml have some of this functionality natively. https://spark.apache.org/docs/latest/ml-classification-regression.html Here's a specific example of a decision tree that has been adapted for larger-than-memory training datasets: https://spark.apache.org/docs/latest/mllib-decision-tree.html Spark-ml also has many useful feature extractors, selectors, and transformers. |
I think that most (all?) of those would be considered in scope.
LogisticRegression is already implemented (at least for the binary case,
not sure about multinomial).
Are there particular feature extraction / selection methods you're missing
from Dask-ML? http://dask-ml.readthedocs.io/en/latest/modules/api.html
…On Tue, Jul 10, 2018 at 7:44 AM, js3711 ***@***.***> wrote:
Is there any appetite to port many of the classifiers built for larger
than memory datasets from spark-ml? I would love to see dask_ml have some
of this functionality.
https://spark.apache.org/docs/latest/ml-classification-regression.html
Here's a specific example of a decision tree that has been adapted for
larger than memory training datasets:
https://spark.apache.org/docs/latest/mllib-decision-tree.html
Spark-ml also has many useful feature extraction, selection, and
transformers.
|
At the moment I am specifically interested in a distributed decision tree implementation with customized stopping criteria. I currently use trees as a feature extraction method on large datasets. More generally, it would be great not to have to hop over to Scala :) |
Hello All,
I'd like to help Dask-ML develop a roadmap, and @TomAugspurger requested that I open an issue here to kickstart the discussion.
For those of you unfamiliar, a roadmap is a listing of near- and long-term goals that the project has, but which have yet to be implemented. These goals are generally larger than a single pull request. "Add GPU Support" or "Port to Python 3" are examples of roadmap items a project might have. However, goals can also include activities that seem small and mundane, but are critical to the project, such as "Improve Documentation" or "Achieve 100% Test Coverage."
The purpose of this issue is to let roadmap items be listed and commented on in the discussion. Once we have a reasonable sense of what folks would like to see on the roadmap, its contents should make their way to the website/docs in some fashion. Ideally, the items end up listed in priority order, with higher-priority items closer to the top.
Some good examples of project roadmaps are:
A decent example of a project roadmap is Spyder's Roadmap which looks more like a timeline with milestones.
Roadmaps serve a joint purpose:
TL;DR: Roadmaps communicate a project's intentions. It would be a good idea for Dask-ML to have one, and I am happy to help facilitate!
So where would you like to see Dask-ML go?
This is a mirror issue to dask/dask#3589