-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/spectral clustering #665
base: main
Are you sure you want to change the base?
Conversation
Add module boilerplate. Add `_kmeans` function with naive signature. Add test class and trivial clustering test case.
Add `spectral_clustering` function. Add test class and test overabundant cluster exception.
Add `numpy.random.default_rng` to `_kmeans` for random number sampling. Fixes bug with test cases where clusters could get merged from rare random conditions.
Add perfectly separable `spectral_clustering` test.
return clusters | ||
|
||
|
||
def _kmeans(X, k, seed=37): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add docstring. Also, is there a reason to specify the seed by default instead of seed=None
like in other functions?
H : Hypergraph | ||
Hypergraph | ||
k : int, optional | ||
Number of clusters to find. If unspecified, computes spectral gap. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work, the spectral gap is not an integer in general?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry I meant the location of the spectral gap. I will expand on this later this week with updates to the PR.
if k is None: | ||
raise NotImplementedError( | ||
"Choosing a number of clusters organically is currently unsupported. Please specify an integer value for paramater 'k'!" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say either we implement this k=None
default option, or we remove the default value and these lines (until it's maybe implemented).
"spectral_clustering", | ||
] | ||
|
||
MAX_ITERATIONS = 10_000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless this parameter is planned to be used in other functions, I'd define it in _kmeans(), the only function using it? Maybe even as a parameter of the function?
Thanks! @thomasrobiglio is probably best placed to review the method, I just made quick general comments more about the formatting |
Thank you @kaiser-dan! I will have a look over the week |
Sure thing, I will also be working on some better tests when I get a chance. Unclear when that will be, unfortunately. |
Summary
Partially addresses #604.
Add
xgi.communities
module.Add
spectral_clustering
method to cluster a given hypergraph into K-many groups.Description
Added a new module to begin resolving #604 and adding community detection capabilities.
Added a simple version of spectral clustering, clustering the given hypergraph into a specified number of communities based on a single K-means application of the hypergraph embedding given by the spectra as computed with
normalized_hypergraph_laplacian
- a heuristic suggested in [1].Notes
If the user is agnostic to the number of groups, there is reason to believe to spectral gap may give a nice default number of groups. However, I am unsure how to handle the case of a hypergraph having no discernible community structure. The spectral gap would be uninformative there. For this reason, I have not yet added that functionality and instead an error is raised if
k
is unspecified.Concerns
The tests do depend on a random seed being fixed - I am unsure how the underlying OS affects this. Furthermore, I am unsure if the tests I've written, specifically this one, are an ideal way to test the correctness of the clustering method.
References
[1] D. Zhou, J. Huang, and B. Schölkopf, “Learning with Hypergraphs: Clustering, Classification, and Embedding,” in Advances in Neural Information Processing Systems, MIT Press, 2006. Accessed: Nov. 10, 2023. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2006/hash/dff8e9c2ac33381546d96deea9922999-Abstract.html