Self-supervised visual feature learning with deep neural networks: A survey
To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, were proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels.
General pipeline of Self-supervised learning (SSL)
ConvNets trained with pretext tasks can learn kernels that capture low-level and high-level features that are helpful for other downstream tasks.
Terms and definitions
Pretext tasks: pre-designed tasks for networks to solve; visual features are learned by optimizing the objective functions of the pretext tasks.
Downstream tasks: computer vision applications used to evaluate the quality of features learned by self-supervised learning. These applications can greatly benefit from the pre-trained models when training data are scarce.
Pseudo labels: the labels used in pretext tasks; they are generated automatically based on the structure of the data.
Since no human annotations are needed to generate pseudo labels during self-supervised training, a main advantage of self-supervised learning methods is that they can easily be scaled to large-scale datasets at very low cost.
Self-Supervised Learning
Formulation
In SSL, the network is likewise trained with data X_i paired with a pseudo label P_i, where P_i is automatically generated for a pre-defined pretext task without involving any human annotation.
Given a set of N training data $D = \{P_i\}_{i=0}^{N}$, the training loss function is defined as:

$loss(D) = \min_{\theta} \frac{1}{N} \sum_{i=1}^{N} loss(X_i, P_i)$
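As a concrete sketch of this formulation (not the paper's implementation), the loop below uses rotation prediction as one possible pretext task: the pseudo label P_i is the rotation applied to X_i, and the ConvNet is trained by minimizing loss(X_i, P_i). The names `backbone`, `pretext_head`, and `unlabeled_loader` are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch: rotation prediction as the pretext task.
# The pseudo label P_i is the index of the rotation applied to X_i,
# so supervision is derived purely from the data (no human annotation).

def make_pretext_batch(images):
    """Rotate each image by 0/90/180/270 degrees; return the rotated
    images together with the rotation indices as pseudo labels."""
    ks = torch.randint(0, 4, (images.size(0),))                 # pseudo labels P_i
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, ks)])
    return rotated, ks

backbone = nn.Sequential(                                        # stand-in ConvNet
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
pretext_head = nn.Linear(32, 4)                                  # 4 rotation classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(pretext_head.parameters()), lr=0.01)

for images in unlabeled_loader:                                  # assumed DataLoader of unlabeled images
    x, p = make_pretext_batch(images)                            # X_i and pseudo labels P_i
    o = pretext_head(backbone(x))                                # predictions O_i
    loss = criterion(o, p)                                       # loss(X_i, P_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```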
Architecture for learning image features
AlexNet, VGG, ResNet, GoogLeNet, DenseNet, RNN
Commonly used Pretext and Downstream tasks
Self-supervised visual feature learning schema.
The ConvNet is trained by minimizing errors between pseudo labels P and predictions O of the ConvNet. Since the pseudo labels are generated based on the structure of the data, no human annotations are involved during the whole process.
When image classification is chosen as a downstream task to evaluate the quality of image features learned by self-supervised learning methods, the self-supervised pre-trained model is applied to each image to extract features, which are then used to train a classifier such as an SVM. The classification performance on testing data is compared with that of other self-supervised models to evaluate the quality of the learned features.
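A minimal sketch of this evaluation protocol, assuming the pre-trained `backbone` from the previous sketch and hypothetical `train_loader`/`test_loader` for the labeled downstream dataset; the SVM comes from scikit-learn.

```python
import numpy as np
import torch
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

# Sketch of downstream evaluation: freeze the self-supervised backbone,
# extract features, and train a simple classifier (here a linear SVM).

@torch.no_grad()
def extract_features(backbone, loader):
    backbone.eval()
    feats, labels = [], []
    for images, y in loader:              # downstream data with human labels
        feats.append(backbone(images).cpu().numpy())
        labels.append(y.numpy())
    return np.concatenate(feats), np.concatenate(labels)

train_x, train_y = extract_features(backbone, train_loader)
test_x, test_y = extract_features(backbone, test_loader)

clf = LinearSVC()                          # classifier trained on frozen features
clf.fit(train_x, train_y)
print("downstream accuracy:", accuracy_score(test_y, clf.predict(test_x)))
```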
Qualitative evaluation: visualization methods used to qualitatively assess the features learned by self-supervised learning.
The transfer-learning performance on these high-level vision tasks demonstrates the generalization ability of the learned features.
Image feature learning
Generation-based image feature learning
Image generation with GANs, with inpainting, with super resolution, with colorization
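For instance, in colorization the target is simply the original color image, so training pairs can be built directly from unlabeled data; a rough sketch (helper name and grayscale conversion are assumptions for illustration):

```python
import torch

# Sketch of a generation-based pretext: colorization.
# Input = grayscale version of the image, target = original color image,
# so the "label" comes for free from the data itself.

def make_colorization_pair(images):
    """images: (N, 3, H, W) RGB tensor in [0, 1]."""
    gray = (0.299 * images[:, 0] + 0.587 * images[:, 1]
            + 0.114 * images[:, 2]).unsqueeze(1)      # (N, 1, H, W) luminance
    return gray, images                                # network input, reconstruction target

# training step, assuming some encoder-decoder `colorizer`:
# gray, target = make_colorization_pair(batch)
# loss = torch.nn.functional.mse_loss(colorizer(gray), target)
```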
Context-based image feature learning
Learning with context similarity, with spatial context structure
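As an illustration of the spatial-context idea, a pretext task can ask the network to predict the relative position of a patch with respect to the center patch of a 3x3 grid; a rough sketch of the pseudo-label generation (function name and patch size are assumptions):

```python
import torch

# Sketch of a spatial-context pretext: sample the centre patch and one of
# its 8 neighbours from a 3x3 grid; the pseudo label is the neighbour index.

def make_context_pair(image, patch=32):
    """image: (3, H, W) tensor with H, W >= 3 * patch."""
    grid = [(r, c) for r in range(3) for c in range(3) if (r, c) != (1, 1)]
    label = torch.randint(0, 8, (1,)).item()           # pseudo label in {0, ..., 7}
    r, c = grid[label]
    centre = image[:, patch:2 * patch, patch:2 * patch]
    neighbour = image[:, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
    return centre, neighbour, label
```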
Free semantic label-based image feature learning
Learning with labels generated by game engines, with labels generated by hard-coded programs
Summary
Performance: comparable to supervised methods on some downstream tasks.
Reproducibility: most methods use AlexNet as the base network, pre-train on the ImageNet dataset, and then evaluate on the same downstream tasks for quality evaluation.
Evaluation metrics: more evaluation metrics are needed to assess the quality of the learned features at different levels; the current solution is to use performance on downstream tasks as a proxy for feature quality.