Feature/spectral clustering #665

kaiser-dan · 2025-02-14T03:56:23Z

Summary

Partially addresses #604.

Add xgi.communities module.
Add spectral_clustering method to cluster a given hypergraph into K-many groups.

Description

Added a new module to begin resolving #604 and adding community detection capabilities.
Added a simple version of spectral clustering: the hypergraph is clustered into a specified number of communities by a single K-means run on the spectral embedding obtained from normalized_hypergraph_laplacian, a heuristic suggested in [1].
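As a rough illustration of the embedding step described above, here is a minimal NumPy sketch. It assumes the normalized Laplacian form from [1], I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, built from a dense incidence matrix; the conventions of xgi's normalized_hypergraph_laplacian may differ, and the names `zhou_normalized_laplacian` and `spectral_embedding` are hypothetical, not part of this PR. The K-means step on the rows of the embedding is left out.

```python
import numpy as np


def zhou_normalized_laplacian(incidence, edge_weights=None):
    """Return I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for an incidence matrix H."""
    H = np.asarray(incidence, dtype=float)  # shape (n_nodes, n_edges)
    n_nodes, n_edges = H.shape
    if edge_weights is None:
        w = np.ones(n_edges)
    else:
        w = np.asarray(edge_weights, dtype=float)
    node_degree = H @ w        # d(v): total weight of the edges containing v
    edge_size = H.sum(axis=0)  # delta(e): number of nodes in edge e
    if np.any(node_degree <= 0) or np.any(edge_size <= 0):
        raise ValueError("isolated nodes and empty edges are not handled here")
    scaled = H / np.sqrt(node_degree)[:, None]  # Dv^{-1/2} H
    theta = scaled @ np.diag(w / edge_size) @ scaled.T
    return np.eye(n_nodes) - theta


def spectral_embedding(incidence, k):
    """Eigenvalues (ascending) and the eigenvectors of the k smallest ones."""
    laplacian = zhou_normalized_laplacian(incidence)
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return eigenvalues, eigenvectors[:, :k]


# Two disconnected groups: nodes 0-2 and nodes 3-5.
# Columns are the hyperedges {0, 1, 2}, {1, 2}, {3, 4, 5}, {4, 5}.
H_example = np.array(
    [
        [1, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
)
eigenvalues, embedding = spectral_embedding(H_example, k=2)
```

In this disconnected example the two smallest eigenvalues are zero up to rounding, one per connected component, and each row of `embedding` is the 2-dimensional point that K-means would then cluster.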

Notes

If the user is agnostic to the number of groups, there is reason to believe the spectral gap may give a good default. However, I am unsure how to handle a hypergraph with no discernible community structure, where the spectral gap would be uninformative. For this reason, I have not yet added that functionality; instead, an error is raised if k is unspecified.
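For reference, the usual eigengap heuristic can be sketched as follows. This is only an illustration of the idea, not code from this PR: `eigengap_k` is a hypothetical helper, and it does not solve the no-structure case. It returns the size of the gap alongside k so that a caller could, as one possible design, refuse to pick a default when the gap is small.

```python
import numpy as np


def eigengap_k(eigenvalues, max_k=None):
    """Return (k, gap): the k with the largest jump after the k-th smallest eigenvalue."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))
    if vals.size < 2:
        raise ValueError("need at least two eigenvalues")
    if max_k is not None and max_k < 1:
        raise ValueError("max_k must be at least 1")
    # Only consider k = 1, ..., limit.
    limit = vals.size - 1 if max_k is None else min(max_k, vals.size - 1)
    gaps = np.diff(vals[: limit + 1])  # gaps[i] = vals[i + 1] - vals[i]
    best = int(np.argmax(gaps))        # ties resolve to the smallest k
    return best + 1, float(gaps[best])


k, gap = eigengap_k([0.0, 0.01, 0.02, 0.9, 1.0, 1.1])
```

Here the largest jump sits between the third and fourth eigenvalues, so the heuristic suggests three groups.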

Concerns

The tests depend on a fixed random seed, and I am unsure how the underlying OS affects this. Furthermore, I am unsure whether the tests I've written, specifically this one, are an ideal way to test the correctness of the clustering method.
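One way to make such tests less sensitive to the seed is to compare clusterings as partitions, ignoring which integer label each group happens to receive. The sketch below assumes the clustering is returned as a dict mapping node to cluster label (the commit log mentions a dict return type, but the exact layout is an assumption), and `same_partition` is a hypothetical test helper. It does not remove the dependence on initialization for data that is not cleanly separable.

```python
def same_partition(clusters_a, clusters_b):
    """True if two {node: label} dicts group the nodes identically, ignoring label names."""

    def groups(clusters):
        by_label = {}
        for node, label in clusters.items():
            by_label.setdefault(label, set()).add(node)
        # A set of frozensets forgets both label values and ordering.
        return {frozenset(members) for members in by_label.values()}

    return groups(clusters_a) == groups(clusters_b)


expected = {1: 0, 2: 0, 3: 1}
relabelled = {1: 1, 2: 1, 3: 0}  # same grouping, labels swapped
different = {1: 0, 2: 1, 3: 1}   # node 2 moved to the other group
```

A perfectly separable test case could then assert `same_partition(result, expected)` instead of comparing label values directly.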

References

[1] D. Zhou, J. Huang, and B. Schölkopf, “Learning with Hypergraphs: Clustering, Classification, and Embedding,” in Advances in Neural Information Processing Systems, MIT Press, 2006. Accessed: Nov. 10, 2023. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2006/hash/dff8e9c2ac33381546d96deea9922999-Abstract.html

maximelucas · 2025-02-17T08:56:02Z

xgi/communities/spectral.py

+    return clusters
+
+
+def _kmeans(X, k, seed=37):


Add docstring. Also, is there a reason to specify the seed by default instead of seed=None like in other functions?

maximelucas · 2025-02-17T08:58:14Z

xgi/communities/spectral.py

+    H : Hypergraph
+        Hypergraph
+    k : int, optional
+        Number of clusters to find. If unspecified, computes spectral gap.


How would this work? The spectral gap is not an integer in general.

Ah, sorry, I meant the location of the spectral gap. I will expand on this later this week with updates to the PR.

maximelucas · 2025-02-17T09:00:12Z

xgi/communities/spectral.py

+    if k is None:
+        raise NotImplementedError(
+            "Choosing a number of clusters organically is currently unsupported. Please specify an integer value for parameter 'k'!"
+        )


I'd say either we implement this k=None default option, or we remove the default value and these lines (until it's maybe implemented).
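A sketch of the second option, where k is a required argument and is validated up front rather than defaulting to None. The helper name `validate_num_clusters` and the choice of exception types are assumptions for illustration; the PR's own check for too many clusters may raise something else.

```python
import numbers


def validate_num_clusters(k, num_nodes):
    """Return k unchanged if it is a usable cluster count, otherwise raise."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(k, bool) or not isinstance(k, numbers.Integral):
        raise TypeError(f"k must be an integer, got {type(k).__name__}")
    if k < 1:
        raise ValueError("k must be at least 1")
    if k > num_nodes:
        raise ValueError(f"cannot form {k} clusters from {num_nodes} nodes")
    return k
```

With a required k, a call that passes None fails immediately with a TypeError, so the NotImplementedError branch is no longer needed.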

maximelucas · 2025-02-17T09:01:21Z

xgi/communities/spectral.py

+    "spectral_clustering",
+]
+
+MAX_ITERATIONS = 10_000


Unless this parameter is planned to be used in other functions, I'd define it in _kmeans(), the only function using it? Maybe even as a parameter of the function?
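To make the two suggestions on `_kmeans` concrete (a `seed=None` default and the iteration cap as a function parameter), here is a minimal sketch of Lloyd's algorithm. It is not the PR's implementation: `kmeans_sketch`, the `max_iter` name, the initialization from k distinct data points, and the row-index-to-label dict return are all assumptions.

```python
import numpy as np


def kmeans_sketch(X, k, seed=None, max_iter=10_000):
    """Cluster the rows of X into k groups with Lloyd's algorithm.

    Returns a dict mapping row index to an integer cluster label.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    rng = np.random.default_rng(seed)  # seed=None draws fresh entropy
    # Initialise the centroids at k distinct data points.
    centroids = X[rng.choice(n, size=k, replace=False)]
    labels = np.full(n, -1)
    for _ in range(max_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return {i: int(label) for i, label in enumerate(labels)}


points = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
                   [10.0, 0.0], [10.1, 0.0], [10.2, 0.0]])
clusters = kmeans_sketch(points, k=2, seed=0)
```

Passing an explicit seed makes a run repeatable, while the `None` default matches the convention the review comment refers to.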

maximelucas · 2025-02-17T09:02:34Z

Thanks! @thomasrobiglio is probably best placed to review the method; I just made quick general comments, mostly about formatting.

thomasrobiglio · 2025-02-17T09:23:12Z

Thank you @kaiser-dan! I will have a look over the week

kaiser-dan · 2025-02-17T14:49:01Z

> Thank you @kaiser-dan! I will have a look over the week

Sure thing, I will also be working on some better tests when I get a chance. Unclear when that will be, unfortunately.

kaiser-dan added 2 commits February 13, 2025 22:49

feat: add kmeans skeleton

a30dfcf

Add module boilerplate. Add `_kmeans` function with naive signature. Add test class and trivial clustering test case.

feat: add spectral clustering skeleton

ebfec54

Add `spectral_clustering` function. Add test class and test overabundant cluster exception.

kaiser-dan mentioned this pull request Feb 14, 2025

Add community detection algorithms #604

Open

kaiser-dan added 10 commits February 14, 2025 18:48

refactor: rename commdetect to communities

7ab53aa

refactor: change _kmeans return type to dict

7928d1c

test(kmeans): add simple unit tests

cf3ab50

test(kmeans): add perfectly separable unit tests

b930ef7

feat(kmeans): implement kmeans

2f323ff

refactor(kmeans): add numpy rng, fix seed

15b4e37

Add `numpy.random.default_rng` to `_kmeans` for random number sampling. Fixes bug with test cases where clusters could get merged from rare random conditions.

test(spectral): add spectral clustering test

20f1371

Add perfectly separable `spectral_clustering` test.

fix(spectral): fix node indexing in cluster return

a423281

test(spectral): add spectral clustering test

ac3ac4a

doc: Add spectral_clustering docstring

bca61bc

kaiser-dan marked this pull request as ready for review February 17, 2025 00:34
