Vision Transformer for Face Anti-Spoofing (CelebA Spoof PDA)

This repository contains a fine-tuned Vision Transformer (ViT-Base-Patch16-224) model for face anti-spoofing on the CelebA Spoof (PDA) dataset.
The model was trained on the first 18 splits of the dataset and evaluated on splits 19–21, following the standard CelebA Spoof partitioning strategy.


Overview

The objective of this project is to develop a robust deep learning–based system capable of distinguishing live from spoofed faces in real-world conditions.
The model leverages a ViT backbone fine-tuned on GPU-augmented CelebA Spoof data, combined with advanced training techniques, including:

  • Focal Loss for class imbalance (see the sketch after this list)
  • Threshold optimization
  • Weighted regularization
  • Early stopping
  • Hyperparameter tuning (via W&B sweeps)
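
As an illustration, a minimal focal-loss module consistent with the settings reported in Training Details (α = 0.25, γ = 2.0) might look like the following. This is a sketch, not necessarily the exact implementation in train_advanced.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Minimal focal loss (Lin et al., 2017) for two-class logits.
    A sketch; train_advanced.py may structure this differently."""
    def __init__(self, alpha: float = 0.25, gamma: float = 2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample CE
        pt = torch.exp(-ce)                                      # prob. of the true class
        return (self.alpha * (1.0 - pt) ** self.gamma * ce).mean()
```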

Dataset

Dataset: CelebA Spoof (PDA)

  • Training splits: 1–18
  • Testing splits: 19–21
  • Classes: Binary classification (Live vs Spoof)
  • Total test samples: 1,747
    • Live: 1,076
    • Spoof: 671

Data Augmentation Pipeline

The augmentation process was GPU-accelerated using Kornia and executed on an NVIDIA RTX A5000 GPU (32-vCPU host).
Augmentation was designed to improve model generalization across lighting, pose, and spoof mediums.

Augmentation strategy:

| Class | Augmentation techniques applied per image |
|-------|-------------------------------------------|
| Live  | Random flip, rotation, color jitter, Gaussian blur/noise, perspective, elastic transform, sharpness adjustment |
| Spoof | Same set, applied with lower probability |

Core augmentation methods:

  • Heavy, medium, and light pipelines (with variable transform intensity)
  • GPU-based batch processing with Kornia
  • Normalization aligned with ViT preprocessing (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

The complete augmentation logic is implemented in augument_data.py.
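
For illustration, a GPU batch pipeline along these lines can be assembled with Kornia. The transform choices and probabilities below are illustrative, not copied from augument_data.py:

```python
import torch
import kornia.augmentation as K

# Illustrative Kornia pipeline; the real transform set and probabilities
# live in augument_data.py and may differ.
device = "cuda" if torch.cuda.is_available() else "cpu"
augment = torch.nn.Sequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomRotation(degrees=15.0, p=0.5),
    K.ColorJitter(0.2, 0.2, 0.2, 0.1, p=0.5),
    K.RandomGaussianBlur((3, 3), (0.1, 2.0), p=0.3),
    K.RandomPerspective(distortion_scale=0.2, p=0.3),
    K.RandomSharpness(sharpness=0.5, p=0.3),
    K.Normalize(mean=torch.tensor([0.485, 0.456, 0.406]),
                std=torch.tensor([0.229, 0.224, 0.225])),
).to(device)

batch = torch.rand(128, 3, 224, 224, device=device)  # images scaled to [0, 1]
augmented = augment(batch)                            # one GPU pass per batch
```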


Model Architecture

The base model is a ViT-Base-Patch16-224, initialized with pretrained ImageNet weights and fine-tuned for binary classification.
A custom classification head was added:

LayerNorm(embed_dim) → Dropout(0.1) → Linear(512) → GELU → Dropout(0.1) → Linear(2)
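
In PyTorch terms, this head is a direct translation of the specification above (embed_dim = 768 for ViT-Base):

```python
import torch.nn as nn

embed_dim = 768  # hidden size of ViT-Base-Patch16-224

# Custom classification head as specified above.
classifier_head = nn.Sequential(
    nn.LayerNorm(embed_dim),
    nn.Dropout(0.1),
    nn.Linear(embed_dim, 512),
    nn.GELU(),
    nn.Dropout(0.1),
    nn.Linear(512, 2),  # two-class logits (live vs spoof)
)
```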

Model configuration:

  • Patch size: 16
  • Dropout: 0.1
  • Optimizer: AdamW
  • Scheduler: Cosine Annealing with warm-up
  • Batch size: 128
  • Mixed precision: Enabled (AMP)
  • Early stopping and F1-based checkpointing

The full training procedure is implemented in train_advanced.py.
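
A skeleton of this setup might look as follows. The timm model name and the 5-epoch warm-up length are assumptions of this sketch; train_advanced.py is the authoritative version:

```python
import timm
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

device = "cuda"
# Model name and warm-up length are assumptions; the custom head from the
# previous section would replace the default classifier.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2).to(device)

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
scheduler = SequentialLR(
    optimizer,
    schedulers=[LinearLR(optimizer, start_factor=0.1, total_iters=5),  # warm-up
                CosineAnnealingLR(optimizer, T_max=45)],               # cosine decay
    milestones=[5],
)
scaler = torch.cuda.amp.GradScaler()  # mixed-precision (AMP) training

def train_step(images, labels, criterion):
    """One AMP training step; criterion can be the FocalLoss sketched earlier."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():            # forward pass in mixed precision
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()              # scaled backward pass
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
# scheduler.step() is called once per epoch.
```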


Training Details

| Parameter | Value |
|-----------|-------|
| Dataset | Augmented CelebA Spoof (Splits 1–18) |
| Optimizer | AdamW |
| Learning Rate | 3e-4 (swept) |
| Weight Decay | 0.05 |
| Batch Size | 128 |
| Epochs | 50 |
| Loss | Focal Loss (α=0.25, γ=2.0) |
| Early Stopping | Patience = 10, Δ = 0.001 |
| Threshold Optimization | Enabled |
| Scheduler | CosineAnnealingLR |
| Mixed Precision | True |
| Device | NVIDIA RTX A5000 |

Training and validation metrics were tracked using Weights & Biases for all runs.
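
The early-stopping rule in the table (patience = 10, Δ = 0.001) amounts to a counter on the monitored validation metric. A minimal sketch, assuming the F1-based checkpointing metric is the one monitored:

```python
class EarlyStopping:
    """Stop when the monitored metric (e.g. validation F1) fails to improve
    by min_delta for `patience` consecutive epochs. A sketch of the rule in
    the table above; train_advanced.py may structure this differently."""
    def __init__(self, patience: int = 10, min_delta: float = 0.001):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("-inf")
        self.counter = 0

    def should_stop(self, metric: float) -> bool:
        if metric > self.best + self.min_delta:
            self.best, self.counter = metric, 0   # improvement: reset counter
        else:
            self.counter += 1                     # no meaningful improvement
        return self.counter >= self.patience
```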


Testing Procedure

Testing was conducted on splits 19–21, following the CelebA Spoof PDA protocol. The testing pipeline (test.py) evaluates the model at the per-image and per-subject levels, generating:

  • Accuracy, F1, AUC
  • Precision, Recall, Specificity, NPV
  • FAR, FRR, and EER
  • Confusion Matrix
  • ROC Curve

Results and plots are automatically exported to disk during testing.
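
For reference, FAR, FRR, and EER can be derived from per-image scores roughly as follows. Variable names are illustrative, not taken from test.py; live is treated as the positive class, consistent with the metrics reported below:

```python
import numpy as np
from sklearn.metrics import roc_curve

def error_rates(y_true: np.ndarray, y_score: np.ndarray, threshold: float = 0.5):
    """y_true: 1 = live, 0 = spoof; y_score: the model's live probability."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fnr = 1.0 - tpr
    i = np.nanargmin(np.abs(fnr - fpr))       # operating point where FAR ≈ FRR
    eer = (fpr[i] + fnr[i]) / 2.0
    y_pred = (y_score >= threshold).astype(int)
    far = np.mean(y_pred[y_true == 0] == 1)   # spoof accepted as live
    frr = np.mean(y_pred[y_true == 1] == 0)   # live rejected as spoof
    return far, frr, eer
```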


Results

Overall Performance

| Metric | Score |
|--------|-------|
| Accuracy | 83.29% |
| AUC-ROC | 0.9561 |
| F1-Score | 0.8780 |

Detection Metrics

| Metric | Value |
|--------|-------|
| Precision (PPV) | 0.7974 |
| Recall (TPR) | 0.9768 |
| Specificity | 0.6021 |
| NPV | 0.9417 |

Error Rates

| Metric | Value |
|--------|-------|
| False Acceptance Rate (FAR) | 0.3979 |
| False Rejection Rate (FRR) | 0.0232 |
| Equal Error Rate (EER) | 0.1083 |

Confusion Matrix

|              | Predicted Spoof | Predicted Live |
|--------------|-----------------|----------------|
| Actual Spoof | 404             | 267            |
| Actual Live  | 25              | 1051           |