Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/spectral clustering #665

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

kaiser-dan
Copy link
Contributor

@kaiser-dan kaiser-dan commented Feb 14, 2025

Summary


Partially addresses #604.

Add xgi.communities module.
Add spectral_clustering method to cluster a given hypergraph into K-many groups.

Description


Added a new module to begin resolving #604 and adding community detection capabilities.
Added a simple version of spectral clustering, clustering the given hypergraph into a specified number of communities based on a single K-means application of the hypergraph embedding given by the spectra as computed with normalized_hypergraph_laplacian - a heuristic suggested in [1].

Notes


If the user is agnostic to the number of groups, there is reason to believe to spectral gap may give a nice default number of groups. However, I am unsure how to handle the case of a hypergraph having no discernible community structure. The spectral gap would be uninformative there. For this reason, I have not yet added that functionality and instead an error is raised if k is unspecified.

Concerns


The tests do depend on a random seed being fixed - I am unsure how the underlying OS affects this. Furthermore, I am unsure if the tests I've written, specifically this one, are an ideal way to test the correctness of the clustering method.

References


[1] D. Zhou, J. Huang, and B. Schölkopf, “Learning with Hypergraphs: Clustering, Classification, and Embedding,” in Advances in Neural Information Processing Systems, MIT Press, 2006. Accessed: Nov. 10, 2023. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2006/hash/dff8e9c2ac33381546d96deea9922999-Abstract.html

Add module boilerplate.
Add `_kmeans` function with naive signature.
Add test class and trivial clustering test case.
Add `spectral_clustering` function.
Add test class and test overabundant cluster exception.
@kaiser-dan kaiser-dan marked this pull request as ready for review February 17, 2025 00:34
return clusters


def _kmeans(X, k, seed=37):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add docstring. Also, is there a reason to specify the seed by default instead of seed=None like in other functions?

H : Hypergraph
Hypergraph
k : int, optional
Number of clusters to find. If unspecified, computes spectral gap.
Copy link
Collaborator

@maximelucas maximelucas Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work, the spectral gap is not an integer in general?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry I meant the location of the spectral gap. I will expand on this later this week with updates to the PR.

Comment on lines +46 to +49
if k is None:
raise NotImplementedError(
"Choosing a number of clusters organically is currently unsupported. Please specify an integer value for paramater 'k'!"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say either we implement this k=None default option, or we remove the default value and these lines (until it's maybe implemented).

"spectral_clustering",
]

MAX_ITERATIONS = 10_000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless this parameter is planned to be used in other functions, I'd define it in _kmeans(), the only function using it? Maybe even as a parameter of the function?

@maximelucas
Copy link
Collaborator

Thanks! @thomasrobiglio is probably best placed to review the method, I just made quick general comments more about the formatting

@thomasrobiglio
Copy link
Collaborator

Thank you @kaiser-dan! I will have a look over the week

@kaiser-dan
Copy link
Contributor Author

Thank you @kaiser-dan! I will have a look over the week

Sure thing, I will also be working on some better tests when I get a chance. Unclear when that will be, unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants