---
language:
- en
library_name: timm
tags:
- vision
- image-classification
- vit
- anti-spoofing
- face-liveness
- celeba-spoof
- deep-learning
- pytorch
- huggingface
datasets:
- celeba-spoof
model_name: ViT-Base-Patch16-224 Face Anti-Spoofing (CelebA Spoof PDA)
license: mit
tasks:
- name: Face Anti-Spoofing
  type: image-classification
inference: true
metrics:
- accuracy
- f1
- auc
- precision
- recall
- specificity
- far
- frr
- eer
model-index:
- name: ViT-Base-Patch16-224 Anti-Spoofing (CelebA Spoof PDA)
  results:
  - task:
      type: image-classification
      name: Face Anti-Spoofing
    dataset:
      name: CelebA Spoof (PDA Splits 19–21)
      type: celeba-spoof
      split: test
      size: 1747
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8329
    - name: F1-score
      type: f1
      value: 0.878
    - name: AUC-ROC
      type: auc
      value: 0.9561
    - name: Precision (PPV)
      type: precision
      value: 0.7974
    - name: Recall (TPR)
      type: recall
      value: 0.9768
    - name: Specificity
      type: specificity
      value: 0.6021
    - name: FAR (False Acceptance Rate)
      type: far
      value: 0.3979
    - name: FRR (False Rejection Rate)
      type: frr
      value: 0.0232
    - name: EER (Equal Error Rate)
      type: eer
      value: 0.1083
---
Vision Transformer for Face Anti-Spoofing (CelebA Spoof PDA)
This repository contains a fine-tuned Vision Transformer (ViT-Base-Patch16-224) model for face anti-spoofing on the CelebA Spoof (PDA) dataset.
The model was trained on the first 18 splits of the dataset and evaluated on splits 19–21, following the standard CelebA Spoof partitioning strategy.
Overview
The goal of this project is a deep learning classifier that distinguishes live faces from spoofed ones under real-world conditions.
The model is a ViT fine-tuned on CelebA Spoof data that was augmented on the GPU, using the following training techniques:
- Focal Loss for class imbalance
- Threshold optimization
- Weighted regularization
- Early stopping
- Hyperparameter tuning (via W&B sweeps)
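To make the first item in the list concrete, here is a minimal sketch of focal loss for a single sample, using the α=0.25 and γ=2.0 values reported under Training Details. It assumes the simplest form, in which one α is applied to every sample; a class-dependent α is a common variant, and this card does not say which one train_advanced.py uses. The function names are illustrative.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Plain cross-entropy for one sample, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Focal loss for one sample: cross-entropy scaled by alpha * (1 - p_true) ** gamma.

    Assumption: a single alpha for all samples (not a per-class alpha).
    """
    return alpha * (1.0 - p_true) ** gamma * cross_entropy(p_true)


# Well-classified samples (high p_true) are down-weighted far more than hard ones.
for p in (0.95, 0.60, 0.10):
    ce, fl = cross_entropy(p), focal_loss(p)
    print(f"p_true={p:.2f}  cross-entropy={ce:.4f}  focal={fl:.6f}  ratio={fl / ce:.4f}")
```

The ratio column is the modulating factor α(1 − p)^γ: it shrinks quickly as the prediction becomes confident, so training gradients are dominated by hard or misclassified samples.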
Dataset
Dataset: CelebA Spoof (PDA)
- Training splits: 1–18
- Testing splits: 19–21
- Classes: Binary classification (Live vs Spoof)
- Total test samples: 1,747
  - Live: 1,076
  - Spoof: 671
Data Augmentation Pipeline
Augmentation was GPU-accelerated with Kornia and ran on an NVIDIA RTX A5000 (host with 32 vCPUs).
It was designed to improve generalization across lighting conditions, poses, and spoof media.
Augmentation strategy:
| Class | Augmentations per image | Techniques |
|---|---|---|
| Live | 8× | Random flip, rotation, color jitter, Gaussian blur/noise, perspective, elastic transform, sharpness adjustment |
| Spoof | 2× | Same set, applied with lower probability |
Core augmentation methods:
- Heavy, medium, and light pipelines (with variable transform intensity)
- GPU-based batch processing with Kornia
- Normalization aligned with ViT preprocessing (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
The complete augmentation logic is implemented in augument_data.py.
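The normalization step listed above can be illustrated without any image library. The mean and std values are the ones given in this card (they are the widely used ImageNet channel statistics). The helper name and the assumption that pixels arrive as 8-bit RGB values are illustrative, not taken from augument_data.py.

```python
MEAN = (0.485, 0.456, 0.406)  # per-channel mean, from this card
STD = (0.229, 0.224, 0.225)   # per-channel std, from this card


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))


print(normalize_pixel((0, 0, 0)))        # darkest input: most negative values
print(normalize_pixel((255, 255, 255)))  # brightest input: most positive values
```

In a tensor pipeline the same arithmetic is applied per channel to the whole batch; inference inputs need the identical statistics, otherwise the model sees a shifted input distribution.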
Model Architecture
The base model is a ViT-Base-Patch16-224, initialized with pretrained ImageNet weights and fine-tuned for binary classification.
A custom classification head was added:
LayerNorm(embed_dim) → Dropout(0.1) → Linear(512) → GELU → Dropout(0.1) → Linear(2)
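As a rough PyTorch sketch, the head described above could look like the following. The embedding width of 768 is the standard value for ViT-Base; how the head is attached to the backbone in train_advanced.py is not shown here, and the random tensor only stands in for pooled ViT features.

```python
import torch
from torch import nn

EMBED_DIM = 768  # embedding width of ViT-Base

# LayerNorm -> Dropout -> Linear(512) -> GELU -> Dropout -> Linear(2)
head = nn.Sequential(
    nn.LayerNorm(EMBED_DIM),
    nn.Dropout(0.1),
    nn.Linear(EMBED_DIM, 512),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(512, 2),  # two logits, one per class (live, spoof)
)

# Stand-in for a batch of 4 pooled feature vectors from the backbone.
features = torch.randn(4, EMBED_DIM)

head.eval()  # disable dropout for a deterministic forward pass
with torch.no_grad():
    logits = head(features)

print(logits.shape)
```

With a timm Vision Transformer, one plausible way to use such a module is to assign it to the model's classifier attribute (`model.head`); treat that as an assumption about the training code rather than a description of it.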
Model configuration:
- Patch size: 16
- Dropout: 0.1
- Optimizer: AdamW
- Scheduler: Cosine Annealing with warm-up
- Batch size: 128
- Mixed precision: Enabled (AMP)
- Early stopping and F1-based checkpointing
The full training procedure is implemented in train_advanced.py.
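The "Cosine Annealing with warm-up" schedule can be sketched as a plain function of the epoch index. The base learning rate (3e-4) and epoch count (50) come from the Training Details table; the 5-epoch linear warm-up, the minimum learning rate of 0, and stepping once per epoch are assumptions made for illustration, since this card does not state them.

```python
import math


def lr_at_epoch(epoch, base_lr=3e-4, warmup_epochs=5, total_epochs=50, min_lr=0.0):
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine decay.

    warmup_epochs and min_lr are assumed values, not taken from the training script.
    """
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for epoch in (0, 4, 5, 27, 49):
    print(f"epoch {epoch:2d}: lr = {lr_at_epoch(epoch):.2e}")
```

The warm-up keeps early updates small while the new classification head is still randomly initialized; the cosine phase then lowers the rate smoothly toward the end of training.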
Training Details
| Parameter | Value |
|---|---|
| Dataset | Augmented CelebA Spoof (Splits 1–18) |
| Optimizer | AdamW |
| Learning Rate | 3e-4 (swept) |
| Weight Decay | 0.05 |
| Batch Size | 128 |
| Epochs | 50 |
| Loss | Focal Loss (α=0.25, γ=2.0) |
| Early Stopping | Patience = 10, Δ = 0.001 |
| Threshold Optimization | Enabled |
| Scheduler | CosineAnnealingLR |
| Mixed Precision | True |
| Device | NVIDIA RTX A5000 |
Training and validation metrics were tracked using Weights & Biases for all runs.
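The early-stopping rule in the table (patience = 10, Δ = 0.001) can be sketched as a small helper. It assumes the monitored quantity is a validation score where higher is better, such as F1; the class name and the toy scores below are illustrative and not taken from train_advanced.py.

```python
class EarlyStopping:
    """Stop when the score has not improved by more than min_delta for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one validation score; return True when training should stop."""
        if score > self.best + self.min_delta:
            self.best = score        # meaningful improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1     # no improvement beyond min_delta
        return self.bad_epochs >= self.patience


# Toy run with a short patience so the stop is visible.
stopper = EarlyStopping(patience=3, min_delta=0.001)
for epoch, f1 in enumerate([0.80, 0.85, 0.8505, 0.849, 0.850]):
    if stopper.step(f1):
        print(f"stopping after epoch {epoch}, best score {stopper.best}")
        break
```

Note that the gain from 0.85 to 0.8505 is smaller than Δ, so it counts as a non-improving epoch; Δ exists precisely to ignore such noise-level changes.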
Testing Procedure
Testing was conducted on splits 19–21, following the CelebA Spoof PDA protocol.
The testing pipeline (test.py) evaluates the model at both the per-image and per-subject levels, generating:
- Accuracy, F1, AUC
- Precision, Recall, Specificity, NPV
- FAR, FRR, and EER
- Confusion Matrix
- ROC Curve
Results and plots are automatically exported to disk during testing.
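For readers unfamiliar with FAR, FRR, and EER, the sketch below shows one simple way to compute them from scores. It assumes each score is the predicted probability of the live class, that an image is accepted as live when its score is at or above the threshold, and that the EER is approximated at the candidate threshold where FAR and FRR are closest. The function names and toy data are illustrative; test.py may compute the EER differently, for example by interpolating the ROC curve.

```python
def far_frr(scores, labels, threshold):
    """FAR and FRR at one threshold. labels: 1 = live, 0 = spoof."""
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    live = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= threshold for s in spoof) / len(spoof)  # spoof accepted as live
    frr = sum(s < threshold for s in live) / len(live)     # live rejected as spoof
    return far, frr


def equal_error_rate(scores, labels):
    """Return (eer, threshold) at the observed score where |FAR - FRR| is smallest."""
    best = None
    for t in sorted(set(scores)):
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy data: four spoof images followed by four live images.
scores = [0.10, 0.20, 0.35, 0.80, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(far_frr(scores, labels, 0.5))
print(equal_error_rate(scores, labels))
```

FAR and FRR describe a single operating threshold, while the EER summarizes the whole score distribution. That is why a model can report an EER well below its FAR when the chosen threshold favors a low FRR.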
Results
Overall Performance
| Metric | Score |
|---|---|
| Accuracy | 83.29% |
| AUC-ROC | 0.9561 |
| F1-Score | 0.8780 |
Detection Metrics
| Metric | Value |
|---|---|
| Precision (PPV) | 0.7974 |
| Recall (TPR) | 0.9768 |
| Specificity | 0.6021 |
| NPV | 0.9417 |
Error Rates
| Metric | Value |
|---|---|
| False Acceptance Rate (FAR) | 0.3979 |
| False Rejection Rate (FRR) | 0.0232 |
| Equal Error Rate (EER) | 0.1083 |
Confusion Matrix
| | Predicted Spoof | Predicted Live |
|---|---|---|
| Actual Spoof | 404 | 267 |
| Actual Live | 25 | 1051 |
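The threshold-dependent metrics in the tables above follow directly from these four counts when live is taken as the positive class. The short check below recomputes them; the variable names are illustrative.

```python
# Counts from the confusion matrix, with live as the positive class.
tp, fn = 1051, 25   # actual live: predicted live / predicted spoof
fp, tn = 267, 404   # actual spoof: predicted live / predicted spoof

total = tp + fn + fp + tn

metrics = {
    "accuracy": (tp + tn) / total,
    "precision": tp / (tp + fp),
    "recall": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "npv": tn / (tn + fn),
    "f1": 2 * tp / (2 * tp + fp + fn),
    "far": fp / (fp + tn),   # spoof images accepted as live
    "frr": fn / (fn + tp),   # live images rejected as spoof
}

for name, value in metrics.items():
    print(f"{name:12s} {value:.4f}")
```

Rounded to four decimals, the output matches the reported values. AUC-ROC and EER cannot be recovered this way because they depend on the full score distribution rather than on a single threshold.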