Clustering with deep learning: Taxonomy and new methods

  • Authors : Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Maximilian Strobel, Daniel Cremers
  • Journal : arXiv
  • Year : 2018
  • Link : https://arxiv.org/pdf/1801.07648.pdf

Abstract

  • Clustering methods based on deep neural networks ➔ High representational power

Introduction

  • The main objective of clustering is to separate data into groups of similar data points.
  • However, the performance of current clustering methods is highly dependent on the input data.
  • Dimensionality reduction and representation learning have been extensively used alongside clustering in order to map the input data into a feature space where separation is easier. By utilizing deep neural networks, it is possible to learn non-linear mappings that allow transforming data into more clustering-friendly representations without manual feature extraction/selection.
  • The general pipeline of most deep-learning-based clustering methods
    • An auto-encoder is first trained with the standard mean squared error reconstruction loss. The auto-encoder is then fine-tuned with a combined loss function consisting of the auto-encoder reconstruction loss and a clustering-specific loss (see the sketch below).
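
A minimal sketch of this two-phase pipeline in PyTorch. All layer sizes, the choice of a k-means-style clustering loss, and the data loaders are illustrative assumptions, not the paper's reference implementation:

```python
import torch
import torch.nn as nn

# Dummy data standing in for a real dataset (batches of 784-dim inputs); sizes are illustrative.
pretrain_loader = finetune_loader = [torch.randn(32, 784) for _ in range(10)]

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
decoder = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 784))
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
mse = nn.MSELoss()

# Phase 1: pre-train the auto-encoder with the reconstruction loss only.
for x in pretrain_loader:
    z = encoder(x)
    loss = mse(decoder(z), x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 2: fine-tune with reconstruction loss + a clustering-specific loss.
# Here a simple k-means-style loss against centroids `mu`; in practice the centroids
# are usually initialized by running k-means on the pre-trained embeddings.
mu = torch.randn(10, 10, requires_grad=True)   # K = 10 cluster centers in the latent space
optimizer = torch.optim.Adam(params + [mu], lr=1e-4)
alpha = 0.1                                    # weight of the clustering term

for x in finetune_loader:
    z = encoder(x)
    recon_loss = mse(decoder(z), x)
    dist = torch.cdist(z, mu)                  # distance of each embedding to every centroid
    cluster_loss = dist.min(dim=1).values.pow(2).mean()
    loss = alpha * cluster_loss + (1 - alpha) * recon_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```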

Architectures of Neural Networks

Multi-layer Perceptron (MLP)

  • Feedforward network; the output of every hidden layer is the input to the next one.

Convolutional Neural Network (CNN)

  • Useful for applications to images, if locality and shift-equivariance/invariance of feature extraction is desired.

Deep Belief Network (DBN)

  • Generative graphical model, consisting of several layers of latent variables.
  • Composed of several shallow networks such as restricted Boltzmann machines, such that the hidden layer of each sub-network serves as the visible layer of the next sub-network.

Generative Adversarial Network (GAN)

  • The generator learns a distribution of interest to produce samples. The discriminator learns to distinguish between real samples and generated ones.

Variational Auto-encoder (VAE)

  • A Bayesian network with an autoencoder architecture that learns the data distribution (generative model).

Non-clustering loss

No Non-clustering loss

  • No additional non-clustering loss function is used, and the network model is only constrained by the clustering loss.
  • Danger of worse representations/results, or theoretically even of collapsing clusters.

Auto-encoder reconstruction loss

  • During training, the decoder tries to reconstruct the input x from its representation z in a latent space Z, making sure that useful information has not been lost by the encoding phase (see the formula below).
  • Auto-encoders can successfully learn useful representations in the cases where the latent space's dimensionality is different from the input's or when random noise is injected into the input.
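
A standard formulation of this loss is the mean squared reconstruction error; the notation below is an assumption, with $f_{enc}$ and $f_{dec}$ denoting the encoder and decoder:

$$ L_{rec} = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - f_{dec}\big(f_{enc}(x_i)\big) \right\|^2 $$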

Self-augmentation loss

  • A loss term pushes together the representation of an original sample and the representations of its augmentations (see the formula below).
  • Here T is the augmentation function, f(x) is the representation generated by the model, and s is some measure of similarity.
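
With the symbols defined above, the loss takes the following form (this follows the usual self-augmentation formulation; the averaging over N samples is an assumption):

$$ L = - \frac{1}{N} \sum_{x} s\big(f(x),\, f(T(x))\big) $$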

Clustering loss

No Clustering loss

  • Even if a neural network has only non-clustering losses, the features it extracts can be used for clustering after training.

K-means loss

  • In a k-means-friendly representation, data points are evenly distributed around the cluster centers. To obtain such a distribution, the network is trained to minimize the distance between embedded points and their cluster centers (see the formula below).
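
A common form of the k-means loss, with $z_i$ the embedded data point, $\mu_k$ the cluster centers, and $s_{ik}$ boolean cluster-assignment variables (notation assumed here):

$$ L_{km} = \sum_{i=1}^{N} \sum_{k=1}^{K} s_{ik} \, \| z_i - \mu_k \|^2 $$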

Cluster Assignment Hardening

  • Requires using soft assignments of data points to clusters. Student's t-distribution can be used as the kernel to measure the similarity between embedded points and centroids, giving a soft-assignment distribution Q (see the formulas below).
  • The cluster assignment hardening loss enforces making these soft assignment probabilities stricter.
    • It pushes the cluster assignment probability distribution Q to approach an auxiliary (target) distribution P which guarantees this constraint.
  • The loss is formulated as the divergence between the two probability distributions (e.g. the KL divergence).
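
A sketch of the usual DEC-style formulation, where $z_i$ is an embedded point, $\mu_j$ a centroid, and $\nu$ the degrees of freedom of the Student's t-distribution (notation assumed here):

$$ q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2 / \nu \right)^{-\frac{\nu + 1}{2}}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2 / \nu \right)^{-\frac{\nu + 1}{2}}}, \qquad p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} \left( q_{ij'}^2 / \sum_i q_{ij'} \right)} $$

$$ L = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} $$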

Balanced assignments loss

  • Enforces having balanced cluster assignments by penalizing the divergence between the cluster assignment distribution and the uniform distribution (see the formula below), where U is the uniform distribution and G is the probability distribution of assigning a point to each cluster.
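
A sketch of the loss, with G defined from the soft assignments $q_{ik}$ (this definition is an assumption consistent with the section above):

$$ L_{ba} = \mathrm{KL}(G \,\|\, U), \qquad g_k = P(y = k) = \frac{1}{N} \sum_{i} q_{ik} $$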

Locality-preserving loss

  • The embedding is encouraged to preserve the local structure of the original data (see the formula below), where Nk(i) is the set of k nearest neighbors of the data point xi, and s(xi, xj) is a similarity measure between the points xi and xj.
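
The loss then has the following form, with $z_i$ the embedding of $x_i$ (this follows the usual locality-preserving formulation; the exact notation is assumed):

$$ L_{lp} = \sum_{i} \sum_{j \in N_k(i)} s(x_i, x_j) \, \| z_i - z_j \|^2 $$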

Group Sparsity loss

  • The hidden units are divided into G groups, where G is the assumed number of clusters, and a sparsity penalty is applied to each group of the representation (see the formula below).
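
A sketch of the loss, where $z_i^{(g)}$ is the part of the representation of $x_i$ belonging to group $g$ and $\lambda_g$ is a per-group weight (often chosen proportional to the square root of the group size); this notation is an assumption:

$$ L_{gs} = \sum_{i=1}^{N} \sum_{g=1}^{G} \lambda_g \, \big\| z_i^{(g)} \big\| $$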

Cluster classification loss

  • Cluster assignments obtained during cluster updates can be used as "mock" class labels, and a classification loss on these labels provides an additional training signal for the network.

Agglomerative clustering loss

  • Agglomerative clustering merges the clusters with maximal affinity at each step; the loss is computed after several merging steps, and optimizing it pushes the network towards an embedding that is better suited to this merging-based clustering.

Method to combine the losses

L(θ) = α Lc(θ) + (1 − α) Ln(θ), where Lc(θ) is the clustering loss, Ln(θ) is the non-clustering loss, and α is in [0, 1].
  • The following are methods to assign and schedule the values of α (a minimal sketch follows this list).
    • Pre-training, fine-tuning : First, α is set to 0 (the network is trained using the non-clustering loss only). Subsequently, α is set to 1 (the non-clustering network branches are removed and the clustering loss is used to train the obtained network)
    • Joint training : 0 < α < 1
    • Variable schedule : α is varied during the training dependent on a chosen schedule.
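
A compact sketch of these scheduling options; the linear ramp is only an illustrative choice, not prescribed by the paper:

```python
def combined_loss(clustering_loss, non_clustering_loss, alpha):
    # L(theta) = alpha * L_c(theta) + (1 - alpha) * L_n(theta)
    return alpha * clustering_loss + (1 - alpha) * non_clustering_loss

# Pre-training, fine-tuning: alpha = 0 during pre-training, then alpha = 1 for fine-tuning.
# Joint training:            a fixed 0 < alpha < 1 throughout training.
# Variable schedule:         alpha depends on training progress, e.g. a linear ramp
#                            (illustrative assumption):
def alpha_schedule(epoch, warmup_epochs=50):
    return min(1.0, epoch / warmup_epochs)
```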

Cluster updates

  • Jointly updated with the network model
  • Alternatingly updated with the network model

After network training

  • Clustering a similar dataset
  • Obtaining better results

Validation metrics

  • Clustering accuracy (ACC) and normalized mutual information (NMI), both in [0, 1] (see the snippet below).
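
A common way to compute these metrics; the paper only names them, so this helper is an assumption (ACC uses the Hungarian algorithm to find the best one-to-one match between cluster IDs and class labels):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between predicted cluster IDs and true class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_pred.max(), y_true.max()) + 1
    count = np.zeros((n, n), dtype=np.int64)       # count[p, t] = points in cluster p with label t
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    row, col = linear_sum_assignment(count.max() - count)   # maximize matched counts
    return count[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]                        # same grouping, permuted cluster IDs
print(clustering_accuracy(y_true, y_pred))                  # 1.0
print(normalized_mutual_info_score(y_true, y_pred))         # 1.0
```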