
Face Transformer - Rethinking the model by incorporating EfficientNet into ViT

Recently there has been great interest in Transformers, not only in NLP but also in Computer Vision (CV). We ask whether a Transformer can be used for face recognition by incorporating EfficientNet into ViT, and whether it outperforms CNNs. We therefore investigate the performance of Transformer models on face recognition. The models are trained on the large-scale face recognition database CASIA-WebFace and evaluated on several mainstream benchmarks: the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP and AGEDB databases. We demonstrate that Transformer models achieve performance comparable to CNNs with a similar number of parameters and MACs. The Face Transformer mainly uses the ViT (Vision Transformer) architecture. Here we investigate whether transfer learning and fine-tuning with EfficientNet, merged into ViT, yields better results.


Objectives

  • To learn a representation of face images that is invariant to variations in lighting, pose, and expression.

  • To achieve state-of-the-art results on face recognition benchmarks by fine-tuning with EfficientNet and merging it into ViT.

  • To be robust to variations in input image quality, evaluated on the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP and AGEDB databases.

  • To be efficient in terms of computational cost and memory.

Model Architecture


Usage Instructions

1. Preparation

This code is mainly adapted from Vision Transformer, DeiT, and face.evoLVe. In addition to PyTorch and torchvision, install vit_pytorch by Phil Wang, efficientnet_pytorch by Luke Melas-Kyriazi, and timm by Ross Wightman. We sincerely appreciate their contributions.

All required packages are listed in requirements.txt. Install them with:

pip install -r requirements.txt

Files in the vit_pytorch folder:

.
├── __init__.py
├── vit.py
├── vit_face.py
└── vits_face.py

Files in the util folder:

.
├── __init__.py
├── test.py
├── utils.py
└── verification.py

2. Databases

  • You can download the training database, CASIA-WebFace (version: casia-webface), and put it in the folder Data.

    | Dataset | Baidu Netdisk | Password | Google Drive | OneDrive | Website | GitHub |
    |---|---|---|---|---|---|---|
    | ms1m-retinaface | LINK | 4ouw | LINK | | | |
    | CASIA-Webface | LINK | | LINK | | | |
    | UMDFace | LINK | | LINK | | | |
    | VGG2 | LINK | | LINK | | | |
    | MS1M-IBUG | LINK | | | | | |
    | MS1M-ArcFace | LINK | | LINK | | | |
    | MS1M-RetinaFace | LINK | 8eb3 | LINK | | | |
    | Asian-Celeb | LINK | | | | | |
    | Glint-Mini | LINK | 10m5 | | | | |
    | Glint360K | LINK | o3az | | | | |
    | DeepGlint | LINK | | | | | |
    | WebFace260M | LINK | | | | | |
    | IMDB-Face | | | | | | |
    | Celeb500k | | | | | | |
    | MegaFace | LINK | 5f8m | LINK | | | |
    | DigiFace-1M | LINK | | LINK | | | |
  • You can download the testing databases as follows and put them in the folder eval.

    | Dataset | Baidu Netdisk | Password | Google Drive |
    |---|---|---|---|
    | LFW | LINK | dfj0 | LINK |
    | SLLFW | LINK | l1z6 | LINK |
    | CALFW | LINK | vvqe | LINK |
    | CPLFW | LINK | jyp9 | LINK |
    | TALFW | LINK | izrg | LINK |
    | CFP_FP | LINK | 4fem | LINK |
    | AGEDB | LINK | rlqf | LINK |

    Refer to InsightFace.

  • Quick Links

    | Dataset | Folder | Google Drive | Kaggle |
    |---|---|---|---|
    | casia-webface | Data | LINK | LINK |
    | agedb_30, calfw, cfp_ff, cfp_fp, cplfw, lfw, sllfw, talfw | eval | LINK | |

3. Train Models

  • EfficientNet + ViT

    CUDA_VISIBLE_DEVICES='0' python3 -u train.py -b <batch_size> -w 0 -d casia -n <network_name> -head CosFace --outdir <path_to_model> --warmup-epochs 0 --lr 3e-5 -r <path_to_model>
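    The -head CosFace flag selects a large-margin cosine classification head. A minimal sketch of CosFace-style logits, assuming L2-normalized embeddings and class weights (the scale s and margin m here are illustrative; the repo's exact values may differ):

    ```python
    # Hypothetical sketch of a CosFace (large-margin cosine) head.
    import torch
    import torch.nn.functional as F

    def cosface_logits(embeddings, weight, labels, s=64.0, m=0.35):
        """Scaled cosine logits with an additive margin on the target class."""
        cos = F.normalize(embeddings) @ F.normalize(weight).t()  # (B, C) cosines
        onehot = F.one_hot(labels, weight.size(0)).float()
        return s * (cos - m * onehot)  # subtract margin only at the true class

    emb = torch.randn(4, 512)       # face embeddings (batch of 4)
    w = torch.randn(10572, 512)     # one weight vector per CASIA-WebFace identity
    y = torch.tensor([0, 1, 2, 3])
    logits = cosface_logits(emb, w, y)
    print(logits.shape)  # torch.Size([4, 10572])
    ```

    The margin penalizes the true-class cosine during training, forcing embeddings of the same identity to cluster more tightly than a plain softmax would.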

4. Pretrained Models and Test Models (on LFW, SLLFW, CALFW, CPLFW, TALFW, CFP_FP, AGEDB)

You can download the following pretrained models:

| Model | Google Drive |
|---|---|
| ViT-P8S8 | LINK |
| EfficientNet + ViT | LINK |

You can test the models as follows.

The property file for the casia-webface dataset contains: 10572,112,112 (number of classes, image height, image width).

python3 test.py --model <path_to_model> --network <network_name> --batch_size <batch_size> --target <eval_data>
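The benchmarks above are pair-verification tasks: given two face images, decide whether they show the same identity. A minimal sketch of that protocol (a hypothetical helper, not the repo's util/verification.py), assuming embeddings for each pair are already computed:

```python
# Hypothetical sketch of pair verification: given embeddings for image pairs
# and same/different labels, pick the cosine threshold with best accuracy.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def verification_accuracy(emb_a, emb_b, same):
    """Best accuracy over a grid of cosine-similarity thresholds."""
    sims = F.cosine_similarity(emb_a, emb_b)      # one score per pair
    best = 0.0
    for t in torch.linspace(-1, 1, 201):          # scan candidate thresholds
        acc = ((sims > t) == same).float().mean().item()
        best = max(best, acc)
    return best

# Toy pairs: identical embeddings are "same", random ones "different".
a = torch.randn(6, 512)
b = torch.cat([a[:3], torch.randn(3, 512)])
labels = torch.tensor([True, True, True, False, False, False])
acc = verification_accuracy(a, b, labels)
print(acc)
```

In practice the threshold is chosen by cross-validation over folds of the evaluation pairs rather than on the test pairs themselves.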

References

This repository accompanies the research paper Face Transformer for Recognition [LINK] and is forked from zhongyy/Face-Transformer.