Training with Hugging Face Jobs
Hugging Face Jobs provides fully managed cloud infrastructure for training models without the hassle of setting up GPUs, managing dependencies, or configuring environments locally. This is particularly valuable for SFT training, which can be resource-intensive and time-consuming.
Why Use Jobs for SFT Training?
- Scalable Infrastructure: Access to high-end GPUs (A100, L4, etc.) without hardware investment
- Zero Setup: No need to manage CUDA drivers, Docker containers, or environment configurations
- Cost Effective: Pay only for compute time used, with automatic shutdown after completion
- Integrated Workflow: Seamless integration with Hugging Face Hub for model storage and sharing
- Monitoring: Built-in logging and progress tracking through the Hub interface
Requirements
To use Hugging Face Jobs, you need:
- A Pro, Team, or Enterprise Hugging Face plan (available from the Hugging Face pricing page)
- Authentication via hf auth login
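If you prefer to authenticate from Python instead of the CLI (for example, inside a notebook), the huggingface_hub library offers equivalent helpers; a minimal sketch:

```python
from huggingface_hub import login, whoami

login()  # prompts for a Hugging Face token with write access
print(whoami()["name"])  # confirms which account is authenticated
```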
Running SFT with Jobs: Two Approaches
The easiest way to run TRL with Hugging Face Jobs is to use TRL's built-in scripts. They rely on uv to manage dependencies and on hf jobs to launch the training job.
This guide walks you through using TRL's built-in scripts to train a model with Hugging Face Jobs. If you want to use a custom script instead, declare its dependencies inline as uv script metadata and run it with hf jobs uv run.
Create a custom training script with inline dependencies
```python
# sft_training.py
# /// script
# dependencies = [
#     "trl[sft]>=0.7.0",
#     "transformers>=4.36.0",
#     "datasets>=2.14.0",
#     "accelerate>=0.24.0",
#     "peft>=0.7.0",
# ]
# ///
from trl import SFTTrainer, SFTConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base")

# Load dataset
dataset = load_dataset("HuggingFaceTB/smoltalk2", "SFT")

# Configure training
config = SFTConfig(
    output_dir="./smollm3-jobs-sft",
    per_device_train_batch_size=4,
    learning_rate=5e-5,
    max_steps=1000,
    logging_steps=50,
    save_steps=200,
    push_to_hub=True,
    hub_model_id="your-username/smollm3-jobs-sft",
)

# Train
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["smoltalk_everyday_convs_reasoning_Qwen3_32B_think"],
    args=config,
)
trainer.train()
```

Then run with the Jobs CLI:
```bash
# Run the UV script on Jobs
hf jobs uv run \
    --flavor a10g-large \
    --timeout 2h \
    --secrets HF_TOKEN \
    sft_training.py
```

Hardware Selection for SFT
Choose the right hardware flavor based on your model size and training requirements:
For SmolLM3-3B (Recommended):
- a10g-large: 24GB GPU memory, cost-effective for most SFT tasks
- a100-large: 40GB GPU memory, fastest training with larger batch sizes
- l4x1: 24GB GPU memory, a single L4 GPU as a lower-cost alternative
For Larger Models (7B+):
- a100-large: Required for 7B+ models
- l4x4: Multi-GPU setup for distributed training
Budget Options:
- t4-small: 16GB GPU memory, slower but economical for experimentation
- l4x1: 24GB GPU memory, good balance of cost and performance
For a detailed comparison of the different hardware flavors, check out the Jobs pricing page.
Advanced Jobs Configuration
```bash
# Use TRL's maintained SFT script directly
hf jobs uv run \
    --flavor a10g-large \
    --timeout 2h \
    --secrets HF_TOKEN \
    "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py" \
    --model_name_or_path HuggingFaceTB/SmolLM3-3B-Base \
    --dataset_name HuggingFaceTB/smoltalk2_everyday_convs_think \
    --learning_rate 5e-5 \
    --per_device_train_batch_size 4 \
    --max_steps 1000 \
    --output_dir smollm3-sft-jobs \
    --push_to_hub \
    --hub_model_id your-username/smollm3-sft \
    --report_to trackio
```

Environment Variables and Secrets:
If you're working with a custom script, use the --secrets flag to pass in secrets (such as HF_TOKEN) and the --env flag for plain environment variables.
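Inside the job, values passed with --secrets and --env show up as ordinary environment variables, so a custom script can read them with os.environ; a minimal sketch (the variable names simply mirror the command below):

```python
import os

# Secrets and env vars passed via `hf jobs uv run` are plain environment variables here
hf_token = os.environ["HF_TOKEN"]                          # required secret
wandb_key = os.environ.get("WANDB_API_KEY")                # optional secret
project = os.environ.get("WANDB_PROJECT", "smollm3-sft")   # plain env var with a fallback
```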
```bash
hf jobs uv run \
    --flavor a10g-large \
    --timeout 3h \
    --secrets HF_TOKEN=your_token \
    --secrets WANDB_API_KEY=your_wandb_key \
    --env WANDB_PROJECT=smollm3-sft \
    --env CUDA_VISIBLE_DEVICES=0 \
    my_sft_training.py
```

Monitoring Your Training Job
To check on your training job, you can use the hf jobs command or go to Job Settings on the Hub.
Check Job Status:
```bash
# List all jobs
hf jobs ps -a

# Get detailed job information
hf jobs inspect <job_id>

# Stream job logs in real-time
hf jobs logs <job_id> --follow

# Cancel a running job if needed
hf jobs cancel <job_id>
```

LoRA/PEFT on Jobs (optional)
Enable LoRA when using TRL’s maintained SFT script by passing PEFT flags. See the script for authoritative flags and defaults: https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py.
```bash
hf jobs uv run \
    --flavor a10g-large \
    --timeout 2h \
    --secrets HF_TOKEN \
    "https://raw.githubusercontent.com/huggingface/trl/main/trl/scripts/sft.py" \
    --model_name_or_path HuggingFaceTB/SmolLM3-3B-Base \
    --dataset_name HuggingFaceTB/smoltalk2_everyday_convs_think \
    --output_dir smollm3-lora-sft-jobs \
    --per_device_train_batch_size 4 \
    --learning_rate 5e-5 \
    --max_steps 1000 \
    --report_to trackio \
    --push_to_hub \
    --hub_model_id your-username/smollm3-lora-sft \
    --use_peft \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules all-linear
```

Notes:
- Confirm flag names in the TRL SFT script before running: https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py
- LoRA trains small adapters, which you can keep separate or merge later for deployment.
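If you would rather stay with the custom-script workflow from earlier, the same LoRA setup can be expressed in Python by passing a peft_config to SFTTrainer; a minimal sketch, with hyperparameters mirroring the command above:

```python
# /// script
# dependencies = ["trl", "peft", "datasets"]
# ///
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("HuggingFaceTB/smoltalk2", "SFT")

# LoRA hyperparameters match the CLI flags used above
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM3-3B-Base",  # SFTTrainer also accepts a model id string
    train_dataset=dataset["smoltalk_everyday_convs_reasoning_Qwen3_32B_think"],
    args=SFTConfig(output_dir="./smollm3-lora-sft-jobs", max_steps=1000),
    peft_config=peft_config,
)
trainer.train()
```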
Monitoring with Trackio
The commands above pass --report_to trackio, which sends training metrics to Trackio so you can monitor your training job while it runs.
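If you also want to log custom values from your own script, Trackio exposes a wandb-style API (init / log / finish); a minimal sketch, assuming that API and an illustrative project name:

```python
import trackio

# Assumes Trackio's wandb-style API; the project name is illustrative
trackio.init(project="smollm3-sft")
for step in range(3):
    trackio.log({"train/loss": 1.0 / (step + 1), "step": step})
trackio.finish()
```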
Cost Estimation
Approximate costs for SmolLM3-3B SFT training (1000 steps):
- l4x1: ~$3-4 per hour (24GB GPU memory)
- a10g-large: ~$4-6 per hour (24GB GPU memory)
- a100-large: ~$8-12 per hour (40GB GPU memory)
Training typically takes 30-90 minutes for 1000 steps depending on hardware and configuration, making Jobs cost-effective compared to local GPU rental or cloud instances.
Cost-Saving Tips:
- Use smaller batch sizes with gradient accumulation to fit on cheaper GPUs (see the sketch after this list)
- Start with shorter training runs (500 steps) to validate your setup
- Use l4x1 for initial experiments, then scale to faster GPUs for production
- Set appropriate timeouts to avoid unexpected charges
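As a concrete example of the first tip, here is a minimal sketch of how the SFTConfig from the custom script above could trade per-device batch size for gradient accumulation (the exact numbers are illustrative):

```python
from trl import SFTConfig

# Keeps the effective batch size at 4 (1 x 4), matching the earlier config,
# while using far less GPU memory per step
config = SFTConfig(
    output_dir="./smollm3-jobs-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,  # optional: trades extra compute for lower memory
    max_steps=1000,
)
```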
Troubleshooting Common Issues
Out of Memory Errors:
- Reduce per_device_train_batch_size
- Enable gradient checkpointing
- Use a smaller max_length
Timeout Issues:
- Increase the --timeout value
- Reduce training steps or use more powerful hardware
- Optimize data loading and preprocessing
Authentication Errors:
- Ensure HF_TOKEN is correctly set as a secret
- Verify your Hugging Face account has the required plan
- Check token permissions for model uploads
Resources and Further Reading
- Hugging Face Jobs Documentation - Complete Jobs guide
- TRL Jobs Training Guide - TRL-specific Jobs examples
- Jobs Pricing - Current pricing for different hardware flavors
- Jobs CLI Reference - Command-line interface details