Clustering with deep learning: Taxonomy and new methods
Clustering methods based on deep neural networks ➔ High representational power
Introduction
The main objective of clustering is to separate data into groups of similar data points.
The performance of current clustering methods is, however, highly dependent on the input data.
Dimensionality reduction and representation learning have been extensively used alongside clustering in order to map the input data into a feature space where separation is easier. By utilizing deep neural networks, it is possible to learn non-linear mappings that allow transforming data into more clustering-friendly representations without manual feature extraction/selection.
The general pipeline of most deep-learning-based clustering methods is as follows:
An auto-encoder is first trained with the standard mean squared error reconstruction loss. The auto-encoder is then fine-tuned with a combined loss function consisting of the auto-encoder reconstruction loss and a clustering-specific loss.
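A minimal sketch of this two-phase pipeline (not the paper's exact code; PyTorch, the layer sizes, learning rate, loss weight alpha, random placeholder data, and the simple k-means-style clustering term are all illustrative assumptions):

```python
import torch
import torch.nn as nn

d_in, d_z, k = 784, 10, 10          # input dim, latent dim, number of clusters (assumed values)
encoder = nn.Sequential(nn.Linear(d_in, 500), nn.ReLU(), nn.Linear(500, d_z))
decoder = nn.Sequential(nn.Linear(d_z, 500), nn.ReLU(), nn.Linear(500, d_in))
x = torch.rand(256, d_in)           # stand-in for a real data loader

# Phase 1: pre-training with the reconstruction loss only.
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    z = encoder(x)
    nn.functional.mse_loss(decoder(z), x).backward()
    opt.step()

# Initialize cluster centers (ideally via k-means on the embeddings); random here for brevity.
centers = torch.randn(k, d_z, requires_grad=True)

# Phase 2: fine-tuning with a combined loss  alpha * L_cluster + (1 - alpha) * L_reconstruction.
alpha = 0.1
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()) + [centers], lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    z = encoder(x)
    rec = nn.functional.mse_loss(decoder(z), x)
    clu = torch.cdist(z, centers).min(dim=1).values.pow(2).mean()  # distance to nearest center (k-means style)
    (alpha * clu + (1 - alpha) * rec).backward()
    opt.step()
```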
Architecture of Neural networks
Multi-layer Perceptron (MLP)
Feedforward network: the output of every hidden layer is the input to the next one.
Convolutional Neural network (CNN)
Useful for image data, where locality and shift-equivariance/invariance of the feature extraction are desired.
Deep belief network (DBN)
Generative graphical model consisting of several layers of latent variables.
Composed of several shallow networks such as restricted Boltzmann machines, stacked such that the hidden layer of each sub-network serves as the visible layer of the next sub-network.
Generative Adversarial network (GAN)
The generator learns a distribution of interest to produce samples. The discriminator learns to distinguish between real samples and generated ones.
Variational Auto encoder (VAE)
A Bayesian network with an autoencoder architecture that learns the data distribution (generative model).
Non-clustering loss
No Non-clustering loss
No additional non-clustering loss function is used; the network model is constrained only by the clustering loss.
Danger of worse representations/results, or, in theory, even of collapsing clusters.
Auto-encoder reconstruction loss
During training, the decoder tries to reconstruct the input x from its representation z in a latent space Z, making sure that useful information has not been lost during the encoding phase.
Auto-encoders can successfully learn useful representations even when the hidden representation's dimensionality differs from the input's, or when random noise is injected into the input (denoising auto-encoders).
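In its simplest form (a sketch; f denotes the encoder and g the decoder), the reconstruction loss over a dataset of N points is:

$$L_{rec} = \frac{1}{N}\sum_{i=1}^{N} \lVert x_i - g(f(x_i)) \rVert_2^2$$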
Self-augmentation loss
A loss term pushes together the representation of each original sample and that of its augmentation (see the sketch below).
Here T is the augmentation function, f(x) is the representation generated by the model, and s is some measure of similarity.
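A typical formulation of this term (a sketch consistent with the variables above; the exact similarity s varies between methods) is the negative average similarity between each sample's representation and that of its augmentation:

$$L = -\frac{1}{N}\sum_{n=1}^{N} s\big(f(x_n),\, f(T(x_n))\big)$$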
Clustering loss
No Clustering loss
Even if a neural network has only non-clustering losses, the features it extracts can be used for clustering after training.
K-means loss
Data points are evenly distributed around the cluster centers. To obtain such a distribution, a k-means-style objective is minimized in the embedding space:
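A common way to write such a loss (a sketch; z_i = f(x_i) is the embedding of x_i, μ_k are the cluster centers, and s_ik is the assignment of point i to cluster k) is:

$$L = \sum_{i=1}^{N}\sum_{k=1}^{K} s_{ik}\,\lVert z_i - \mu_k \rVert^2$$

which is minimized jointly over the network parameters, the cluster centers, and the assignments.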
Cluster Assignment Hardening
Requires using soft assignments of data points to clusters. Student's t-distribution can be used as the kernel to measure the similarity between points and centroids, yielding a soft-assignment distribution Q:
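With the degrees of freedom of the Student's t-distribution set to 1 (as in DEC), the soft assignment of embedded point z_i to cluster center μ_j is:

$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2\right)^{-1}}$$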
The cluster assignment hardening loss enforces making these soft assignment probabilities stricter.
This is done by letting the cluster assignment probability distribution Q approach an auxiliary (target) distribution P which guarantees this constraint.
The loss is then formulated as the divergence between the two probability distributions (e.g., the KL divergence):
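One concrete instantiation (the KL divergence together with DEC's choice of auxiliary target distribution P, obtained by squaring and renormalizing the soft assignments; other divergences and targets are possible) reads:

$$L = \mathrm{KL}(P\,\|\,Q) = \sum_i \sum_j p_{ij} \log\frac{p_{ij}}{q_{ij}}, \qquad p_{ij} = \frac{q_{ij}^2 / \sum_{i'} q_{i'j}}{\sum_{j'}\left( q_{ij'}^2 / \sum_{i'} q_{i'j'} \right)}$$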
Balanced assignments loss
Enforce having balanced cluster assignments.
The loss is the divergence (e.g. KL) between G and U, where U is the uniform distribution and G is the probability distribution of assigning a point to each cluster:
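One natural choice for G (an assumption consistent with the soft assignments q_ik above) is the empirical frequency of assigning points to each cluster:

$$g_k = P(y = k) = \frac{1}{N}\sum_{i=1}^{N} q_{ik}$$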
Locality-preserving loss
This loss preserves the local structure of the data in the embedded space. Here N_k(i) is the set of k nearest neighbors of the data point x_i, and s(x_i, x_j) is a similarity measure between the points x_i and x_j.
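A representative form of this loss (a sketch, with z_i = f(x_i) the embedding of x_i) penalizes neighboring input points being mapped far apart:

$$L = \sum_{i}\sum_{j \in N_k(i)} s(x_i, x_j)\,\lVert z_i - z_j \rVert^2$$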
Group Sparsity loss
The hidden units are divided into G groups, where G is the assumed number of clusters, and a group-sparsity penalty is applied to the representation.
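A sketch of such a penalty (where f_g(x_i) denotes the g-th group of hidden units for sample x_i and λ_g are per-group weights; the exact weighting scheme is an assumption here):

$$L = \sum_{i=1}^{N}\sum_{g=1}^{G} \lambda_g\,\lVert f_g(x_i) \rVert$$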
Cluster classification loss
Agglomerative clustering loss
Method to combine the losses
The total loss is L(θ) = α Lc(θ) + (1 − α) Ln(θ), where Lc(θ) is the clustering loss, Ln(θ) is the non-clustering loss, and α ∈ [0, 1].
The following are methods to assign and schedule the values of α.
Pre-training, fine-tuning: first, α is set to 0 (the network is trained using the non-clustering loss only). Subsequently, α is set to 1 (the non-clustering network branches are removed and the clustering loss is used to train the obtained network).
Joint training: 0 < α < 1, so both loss terms are optimized jointly.
Variable schedule: α is varied during training according to a chosen schedule (see the sketch after this list).
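As an illustration (a sketch; the switch epoch, the fixed trade-off value, and the linear ramp below are arbitrary assumptions, not values from the paper), the three policies can be expressed as simple schedules over the training epoch:

```python
def alpha_pretrain_finetune(epoch: int, switch_epoch: int = 100) -> float:
    # Pre-training then fine-tuning: non-clustering loss only, then clustering loss only.
    return 0.0 if epoch < switch_epoch else 1.0

def alpha_joint(epoch: int, value: float = 0.5) -> float:
    # Joint training: a fixed trade-off between the clustering and non-clustering losses.
    return value

def alpha_variable(epoch: int, max_epoch: int = 200) -> float:
    # Variable schedule: e.g. linearly ramp the clustering loss in over training.
    return min(1.0, epoch / max_epoch)
```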
Cluster updates
Jointly updated with the network model
Alternatingly updated with the network model
After network training
Clustering a similar dataset
Obtaining better results
Validation metrics
Clustering accuracy (ACC) and normalized mutual information (NMI), both in [0, 1].
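As a practical sketch (using SciPy's linear_sum_assignment and scikit-learn's normalized_mutual_info_score; the helper clustering_accuracy below is an illustrative name, not a library function), both metrics can be computed from predicted cluster labels and ground-truth labels:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    # Build the contingency matrix between predicted clusters and true labels,
    # then find the best one-to-one cluster-to-label mapping (Hungarian algorithm).
    n = max(y_pred.max(), y_true.max()) + 1
    w = np.zeros((n, n), dtype=np.int64)
    for p, t in zip(y_pred, y_true):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)   # negate to maximize matched counts
    return w[row, col].sum() / y_pred.size

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])      # same partition, permuted cluster labels
print(clustering_accuracy(y_true, y_pred))            # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0
```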