Recently there has been great interest in Transformers, not only in NLP but also in Computer Vision (CV). We ask whether a Transformer can be used for face recognition by incorporating EfficientNet into ViT, and whether it outperforms CNNs. We therefore investigate the performance of Transformer models on face recognition. The models are trained on a large-scale face recognition database, CASIA-WebFace, and evaluated on several mainstream benchmarks, including the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP, and AgeDB databases. We demonstrate that Transformer models achieve performance comparable to CNNs with a similar number of parameters and MACs. Face-Transformer mainly uses the ViT (Vision Transformer) architecture; here we examine whether transfer learning and fine-tuning with EfficientNet, merged into ViT, yields better results.
- To learn a representation of face images that is invariant to variations in lighting, pose, and expression.
- To achieve state-of-the-art results on face recognition benchmarks by fine-tuning with EfficientNet and introducing the model into ViT.
- To be robust to variations in input image quality, as measured on the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP, and AgeDB evaluation databases.
- To be efficient in terms of computational cost and memory.
This code is mainly adapted from Vision Transformer, DeiT, and face.evoLVe. In addition to PyTorch and torchvision, install vit_pytorch by Phil Wang, efficientnet_pytorch by Luke Melas-Kyriazi, and the timm package by Ross Wightman. We sincerely appreciate their contributions.

All required packages are listed in `requirements.txt`. Install them all with:

```shell
pip install -r requirements.txt
```
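A quick, dependency-free way to confirm the extra packages are importable before training (module names taken from the install instructions above; `importlib.util.find_spec` checks availability without importing):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of module names that are not importable."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Modules required beyond torch/torchvision, per the instructions above.
required = ["vit_pytorch", "efficientnet_pytorch", "timm"]
print(missing_packages(required))  # [] once everything is installed
```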
Files in the `vit_pytorch` folder:

```
.
├── __init__.py
├── vit.py
├── vit_face.py
└── vits_face.py
```
Files in the `util` folder:

```
.
├── __init__.py
├── test.py
├── utils.py
└── verification.py
```
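`verification.py` evaluates pair verification on the benchmarks above. The core idea, sketched here with toy vectors rather than the repository's actual code, is to score each image pair by the cosine similarity of their embeddings and compare it against a threshold:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_same_identity(emb1, emb2, threshold=0.5):
    """Declare a pair 'same person' if similarity exceeds the threshold."""
    return cosine_similarity(emb1, emb2) > threshold

# Toy embeddings: identical vectors score 1.0, orthogonal vectors 0.0.
print(is_same_identity([1.0, 0.0], [1.0, 0.0]))   # True
print(is_same_identity([1.0, 0.0], [0.0, 1.0]))   # False
```

In the standard protocol the threshold itself is chosen by cross-validation over the benchmark's folds.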
- You can download the training databases, e.g., CASIA-WebFace (version: casia-webface), and put them in the `Data` folder.

| Dataset | Baidu Netdisk | Password | Google Drive | OneDrive | Website | GitHub |
|---|---|---|---|---|---|---|
| ms1m-retinaface | LINK | 4ouw | LINK | | | |
| CASIA-Webface | LINK | | LINK | | | |
| UMDFace | LINK | | LINK | | | |
| VGG2 | LINK | | LINK | | | |
| MS1M-IBUG | LINK | | | | | |
| MS1M-ArcFace | LINK | | LINK | | | |
| MS1M-RetinaFace | LINK | 8eb3 | LINK | | | |
| Asian-Celeb | LINK | | | | | |
| Glint-Mini | LINK | 10m5 | | | | |
| Glint360K | LINK | o3az | | | | |
| DeepGlint | LINK | | | | | |
| WebFace260M | | | | | LINK | |
| IMDB-Face | | | | | | |
| Celeb500k | | | | | | |
| MegaFace | LINK | 5f8m | LINK | | | |
| DigiFace-1M | LINK | | LINK | | | |

- You can download the testing databases as follows and put them in the `eval` folder.

| Dataset | Baidu Netdisk | Password | Google Drive |
|---|---|---|---|
| LFW | LINK | dfj0 | LINK |
| SLLFW | LINK | l1z6 | LINK |
| CALFW | LINK | vvqe | LINK |
| CPLFW | LINK | jyp9 | LINK |
| TALFW | LINK | izrg | LINK |
| CFP_FP | LINK | 4fem | LINK |
| AGEDB | LINK | rlqf | LINK |

The links above refer to Insightface.
| Dataset | Folder | Google Drive | Kaggle |
|---|---|---|---|
| casia-webface | Data | LINK | LINK |
| agedb_30, calfw, cfp_ff, cfp_fp, cplfw, lfw, sllfw, talfw | eval | LINK | |
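A small sanity check, following the folder names used above (`Data` for training data, `eval` for benchmarks), to confirm the expected layout before launching a run:

```python
import os

def check_layout(root="."):
    """Report which of the expected dataset folders exist under root."""
    expected = ["Data", "eval"]  # training data and evaluation benchmarks
    return {name: os.path.isdir(os.path.join(root, name)) for name in expected}

print(check_layout())  # e.g. {'Data': True, 'eval': True} once both are in place
```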
- Train EfficientNet + ViT:

```shell
CUDA_VISIBLE_DEVICES='0' python3 -u train.py -b <batch_size> -w 0 -d casia -n <network_name> -head CosFace --outdir <path_to_model> --warmup-epochs 0 --lr 3e-5 -r <path_to_model>
```
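The `-head CosFace` option selects the large-margin cosine loss. Its core logit adjustment, subtracting a margin `m` from the target-class cosine and scaling everything by `s`, can be sketched in plain Python; the `s` and `m` values here are common choices from the CosFace paper, not necessarily this repository's defaults:

```python
def cosface_logits(cosines, target, s=64.0, m=0.35):
    """Apply the CosFace margin: s * (cos_theta - m) for the target class,
    s * cos_theta for every other class."""
    return [s * (c - m) if i == target else s * c
            for i, c in enumerate(cosines)]

# Cosines of the angles between an embedding and each class weight vector.
logits = cosface_logits([0.9, 0.2, -0.1], target=0)
print(logits)  # approximately [35.2, 12.8, -6.4]
```

The adjusted logits are then fed to an ordinary cross-entropy loss, which forces the target cosine to exceed the others by at least the margin.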
| Model | Google Drive |
|---|---|
| ViT-P8S8 | LINK |
| EfficientNet + ViT | LINK |
The content of the `property` file for the casia-webface dataset is as follows:
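The file's exact content is not reproduced here. In InsightFace-style datasets the `property` file is conventionally a single comma-separated line of `num_classes,height,width`; that convention is an assumption on our part, so verify it against your copy. A parser sketch under that assumption:

```python
def read_property(path):
    """Parse an InsightFace-style property file: 'num_classes,height,width'.

    NOTE: the format is assumed, not confirmed by this repository's docs.
    """
    with open(path) as f:
        num_classes, height, width = (int(v) for v in f.read().strip().split(","))
    return num_classes, (height, width)

# Hypothetical usage with illustrative values, e.g. a file containing "10572,112,112":
# num_classes, image_size = read_property("Data/casia-webface/property")
```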
```shell
python3 test.py --model <path_to_model> --network <network_name> --batch_size <batch_size> --target <eval_data>
```
This repository is based on the research paper Face Transformer for Recognition [LINK] and is forked from zhongyy/Face-Transformer.