Clustering with deep learning: Taxonomy and new methods

  • Authors : Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel, Daniel Cremers
  • Journal : arXiv
  • Year : 2018
  • Link : https://arxiv.org/pdf/1801.07648.pdf

Abstract

  • Clustering methods based on deep neural networks ➔ High representational power

Introduction

  • The main objective of clustering is to separate data into groups of similar data points.
  • However, the performance of current clustering methods is highly dependent on the input data.
  • Dimensionality reduction and representation learning have been extensively used alongside clustering in order to map the input data into a feature space where separation is easier. By utilizing deep neural networks, it is possible to learn non-linear mappings that allow transforming data into more clustering-friendly representations without manual feature extraction/selection.
  • The general pipeline of most deep-learning-based clustering methods
    • An auto-encoder is first trained with the standard mean squared error reconstruction loss. The auto-encoder is then fine-tuned with a combined loss function consisting of the auto-encoder reconstruction loss and a clustering-specific loss (see the sketch below).
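
A minimal sketch of this two-phase pipeline in PyTorch. All layer sizes, the choice of a k-means-style clustering loss, and the data loaders are illustrative assumptions, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

# Dummy data standing in for a real dataset (batches of 784-dim inputs); sizes are illustrative.
pretrain_loader = finetune_loader = [torch.randn(32, 784) for _ in range(10)]

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
decoder = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 784))
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
mse = nn.MSELoss()

# Phase 1: pre-train the auto-encoder with the reconstruction loss only.
for x in pretrain_loader:
    z = encoder(x)
    loss = mse(decoder(z), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 2: fine-tune with reconstruction loss + a clustering-specific loss.
# Here a simple k-means-style loss against centroids `mu`; in practice the centroids
# are usually initialized by running k-means on the pre-trained embeddings.
mu = torch.randn(10, 10, requires_grad=True)   # K = 10 cluster centers in the latent space
optimizer = torch.optim.Adam(params + [mu], lr=1e-4)
alpha = 0.1                                    # weight of the clustering term

for x in finetune_loader:
    z = encoder(x)
    recon_loss = mse(decoder(z), x)
    dist = torch.cdist(z, mu)                  # distance of each embedding to every centroid
    cluster_loss = dist.min(dim=1).values.pow(2).mean()
    loss = alpha * cluster_loss + (1 - alpha) * recon_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```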

Architectures of Neural Networks

Multi-layer Perceptron (MLP)

  • Feedforward network; the output of every hidden layer is the input to the next one.

Convolutional Neural Network (CNN)

  • Useful for applications to images, if locality and shift-equivariance/invariance of feature extraction is desired.

Deep Belief Network (DBN)

  • Generative graphical model, consisting of several layers of latent variables.
  • Composed of several shallow networks such as restricted Boltzmann machines, such that the hidden layer of each sub-network serves as the visible layer of the next sub-network.

Generative Adversarial Network (GAN)

  • The generator learns a distribution of interest to produce samples. The discriminator learns to distinguish between real samples and generated ones.

Variational Auto-encoder (VAE)

  • A Bayesian network with an autoencoder architecture that learns the data distribution (generative model).

Non-clustering loss

No Non-clustering loss

  • No additional non-clustering loss function is used, and the network model is only constrained by the clustering loss.
  • Danger of worse representations/results, or theoretically even of collapsing clusters.

Auto-encoder reconstruction loss

  • During training, the decoder tries to reconstruct the input x from its representation z in a latent space Z, making sure that useful information has not been lost by the encoding phase (see the formula below).
  • Auto-encoders can successfully learn useful representations in the cases where the latent space's dimensionality is different from the input's or when random noise is injected into the input.
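
A standard formulation of this loss is the mean squared reconstruction error; the notation below is an assumption, with $f_{enc}$ and $f_{dec}$ denoting the encoder and decoder:

$$ L_{rec} = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - f_{dec}\big(f_{enc}(x_i)\big) \right\|^2 $$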

Self-augmentation loss

  • A loss term pushes together the representation of an original sample and the representations of its augmentations (see the formula below).
  • Here T is the augmentation function, f(x) is the representation generated by the model, and s is some measure of similarity.
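
With the symbols defined above, the loss takes the following form (this follows the usual self-augmentation formulation; the averaging over N samples is an assumption):

$$ L = - \frac{1}{N} \sum_{x} s\big(f(x),\, f(T(x))\big) $$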

Clustering loss

No Clustering loss

  • Even if a neural network has only non-clustering losses, the features it extracts can be used for clustering after training.

K-means loss

  • In a k-means-friendly representation, data points are evenly distributed around the cluster centers. To obtain such a distribution, the network is trained to minimize the distance between embedded points and their cluster centers (see the formula below).
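
A common form of the k-means loss, with $z_i$ the embedded data point, $\mu_k$ the cluster centers, and $s_{ik}$ boolean cluster-assignment variables (notation assumed here):

$$ L_{km} = \sum_{i=1}^{N} \sum_{k=1}^{K} s_{ik} \, \| z_i - \mu_k \|^2 $$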

Cluster Assignment Hardening

  • Requires using soft assignments of data points to clusters. Student's t-distribution can be used as the kernel to measure the similarity between embedded points and centroids, giving a soft-assignment distribution Q (see the formulas below).
  • The cluster assignment hardening loss enforces making these soft assignment probabilities stricter.
    • It pushes the cluster assignment probability distribution Q to approach an auxiliary (target) distribution P which guarantees this constraint.
  • The loss is formulated as the divergence between the two probability distributions (e.g. the KL divergence).
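
A sketch of the usual DEC-style formulation, where $z_i$ is an embedded point, $\mu_j$ a centroid, and $\nu$ the degrees of freedom of the Student's t-distribution (notation assumed here):

$$ q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2 / \nu \right)^{-\frac{\nu + 1}{2}}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2 / \nu \right)^{-\frac{\nu + 1}{2}}}, \qquad p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} \left( q_{ij'}^2 / \sum_i q_{ij'} \right)} $$

$$ L = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} $$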

Balanced assignments loss

  • Enforces having balanced cluster assignments by penalizing the divergence between the cluster assignment distribution and the uniform distribution (see the formula below), where U is the uniform distribution and G is the probability distribution of assigning a point to each cluster.
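
A sketch of the loss, with G defined from the soft assignments $q_{ik}$ (this definition is an assumption consistent with the section above):

$$ L_{ba} = \mathrm{KL}(G \,\|\, U), \qquad g_k = P(y = k) = \frac{1}{N} \sum_{i} q_{ik} $$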

Locality-preserving loss

  • The embedding is encouraged to preserve the local structure of the original data (see the formula below), where Nk(i) is the set of k nearest neighbors of the data point xi, and s(xi, xj) is a similarity measure between the points xi and xj.
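
The loss then has the following form, with $z_i$ the embedding of $x_i$ (this follows the usual locality-preserving formulation; the exact notation is assumed):

$$ L_{lp} = \sum_{i} \sum_{j \in N_k(i)} s(x_i, x_j) \, \| z_i - z_j \|^2 $$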

Group Sparsity loss

  • The hidden units are divided into G groups, where G is the assumed number of clusters, and a sparsity penalty is applied to each group of the representation (see the formula below).
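
A sketch of the loss, where $z_i^{(g)}$ is the part of the representation of $x_i$ belonging to group $g$ and $\lambda_g$ is a per-group weight (often chosen proportional to the square root of the group size); this notation is an assumption:

$$ L_{gs} = \sum_{i=1}^{N} \sum_{g=1}^{G} \lambda_g \, \big\| z_i^{(g)} \big\| $$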

Cluster classification loss

  • Cluster assignments obtained during cluster updates can be used as "mock" class labels, and a classification loss on these labels provides an additional training signal for the network.

Agglomerative clustering loss

  • Agglomerative clustering merges the clusters with maximal affinity at each step; the loss is computed after several merging steps, and optimizing it pushes the network towards an embedding that is better suited to this merging-based clustering.

Method to combine the losses

L(θ) = α Lc(θ) + (1 − α) Ln(θ), where Lc(θ) is the clustering loss, Ln(θ) is the non-clustering loss, and α is in [0, 1].
  • The following are methods to assign and schedule the values of α (a minimal sketch follows this list).
    • Pre-training, fine-tuning : First, α is set to 0 (the network is trained using the non-clustering loss only). Subsequently, α is set to 1 (the non-clustering network branches are removed and the clustering loss is used to train the obtained network)
    • Joint training : 0 < α < 1
    • Variable schedule : α is varied during the training dependent on a chosen schedule.
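
A compact sketch of these scheduling options; the linear ramp is only an illustrative choice, not prescribed by the paper:

```python
def combined_loss(clustering_loss, non_clustering_loss, alpha):
    # L(theta) = alpha * L_c(theta) + (1 - alpha) * L_n(theta)
    return alpha * clustering_loss + (1 - alpha) * non_clustering_loss

# Pre-training, fine-tuning: alpha = 0 during pre-training, then alpha = 1 for fine-tuning.
# Joint training:            a fixed 0 < alpha < 1 throughout training.
# Variable schedule:         alpha depends on training progress, e.g. a linear ramp
#                            (illustrative assumption):
def alpha_schedule(epoch, warmup_epochs=50):
    return min(1.0, epoch / warmup_epochs)
```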

Cluster updates

  • Jointly updated with the network model
  • Alternatingly updated with the network model

After network training

  • Clustering a similar dataset
  • Obtaining better results

Validation metrics

  • Clustering accuracy (ACC) and normalized mutual information (NMI), both in [0, 1] (see the snippet below).
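
A common way to compute these metrics; the paper only names them, so this helper is an assumption (ACC uses the Hungarian algorithm to find the best one-to-one match between cluster IDs and class labels):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between predicted cluster IDs and true class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_pred.max(), y_true.max()) + 1
    count = np.zeros((n, n), dtype=np.int64)       # count[p, t] = points in cluster p with label t
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    row, col = linear_sum_assignment(count.max() - count)   # maximize matched counts
    return count[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]                        # same grouping, permuted cluster IDs
print(clustering_accuracy(y_true, y_pred))                  # 1.0
print(normalized_mutual_info_score(y_true, y_pred))         # 1.0
```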