# CatkinChen/nethack-ppo-ablation-baseline
This repository contains a complete Sequential Skill RL model trained on NetHack/MiniHack environments.
## Model Components
### 1. PPO Policy (ppo_policy.pth)
- Type: Proximal Policy Optimization agent
- Environment: MiniHack-Room-5x5-v0
- Training Steps: 50,000
- Features:
  - Curiosity-driven exploration: True
  - Random Network Distillation: False
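
The exact contents of `ppo_policy.pth` are not documented here, so before reusing it you may want to inspect what it stores. A minimal sketch, assuming the file is a standard dict-like PyTorch checkpoint (no key names are confirmed by this repository):

```python
import torch

# Assumption: ppo_policy.pth is a dict-like PyTorch checkpoint.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
checkpoint = torch.load('ppo_policy.pth', map_location=device)

# List the top-level entries to see what the checkpoint actually stores
# (e.g. policy weights, optimizer state, training step counter).
if isinstance(checkpoint, dict):
    for key in checkpoint:
        print(key)
```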
### 2. VAE Model (vae_model.pth)
- Type: Variational Autoencoder
- Purpose: Encodes NetHack observations into latent skill representations
- Latent Dimension: 96
- Architecture: Unknown
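
Because the encoder architecture is not documented, the following is only an illustrative sketch of how a VAE encoder with a 96-dimensional latent space is typically used to turn observations into skill latents. The `ObservationVAE` class, its `encode` method, and the observation size are hypothetical placeholders, not part of this repository:

```python
import torch

LATENT_DIM = 96  # latent dimension reported above

class ObservationVAE(torch.nn.Module):
    """Hypothetical stand-in; the real encoder architecture is not documented."""

    def __init__(self, obs_dim: int, latent_dim: int = LATENT_DIM):
        super().__init__()
        self.encoder = torch.nn.Linear(obs_dim, 2 * latent_dim)

    def encode(self, obs: torch.Tensor) -> torch.Tensor:
        # Split the encoder output into mean and log-variance, then sample
        # a latent with the reparameterization trick.
        mu, logvar = self.encoder(obs).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

# obs_dim = 128 is an arbitrary placeholder for a flattened observation.
vae = ObservationVAE(obs_dim=128)
latents = vae.encode(torch.randn(4, 128))
print(latents.shape)  # torch.Size([4, 96])
```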
### 3. HMM Model (hmm_model.pth)
- Type: Hidden Markov Model (Sticky HDP-HMM)
- Purpose: Models sequential skill transitions and dynamics
- Integration: Used for intrinsic reward computation
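
The intrinsic-reward formula itself is not documented in this card. A common pattern with an HMM over latent skill sequences is to reward surprising (low-probability) skill transitions; the sketch below illustrates that idea with a fixed random transition matrix as a placeholder for the actual Sticky HDP-HMM stored in `hmm_model.pth`:

```python
import numpy as np

# Illustrative only: a surprise-style intrinsic reward from an HMM's
# skill-transition matrix. The 8-state matrix here is a random placeholder,
# not the parameters of hmm_model.pth.
rng = np.random.default_rng(0)
n_skills = 8
transition = rng.dirichlet(np.ones(n_skills), size=n_skills)

def intrinsic_reward(prev_skill: int, next_skill: int) -> float:
    # Rare skill transitions yield larger intrinsic rewards.
    return float(-np.log(transition[prev_skill, next_skill] + 1e-8))

print(intrinsic_reward(0, 3))
```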
## Usage
```python
import torch

from training.online_rl import train_online_ppo_with_pretrained_models

# Select the compute device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load individual components
ppo_checkpoint = torch.load('ppo_policy.pth', map_location=device)
vae_data = torch.load('vae_model.pth', map_location=device)
hmm_data = torch.load('hmm_model.pth', map_location=device)

# Use for inference or continued training
results = train_online_ppo_with_pretrained_models(
    env_name="MiniHack-Room-5x5-v0",
    vae_repo_id="CatkinChen/nethack-vae-hmm",
    hmm_repo_id="CatkinChen/nethack-hmm",
    test_mode=True,
)
```
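
If the checkpoint files are not already on disk, they can be fetched from the Hub first. The snippet below assumes the three `.pth` files sit at the root of this repository:

```python
from huggingface_hub import hf_hub_download

# Download the three checkpoints from this repository.
repo_id = "CatkinChen/nethack-ppo-ablation-baseline"
ppo_path = hf_hub_download(repo_id=repo_id, filename="ppo_policy.pth")
vae_path = hf_hub_download(repo_id=repo_id, filename="vae_model.pth")
hmm_path = hf_hub_download(repo_id=repo_id, filename="hmm_model.pth")
```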
## Training Configuration
- Environment: MiniHack-Room-5x5-v0
- Learning Rate: 0.0005
- Batch Size: 32
- Training Time: 0.02 seconds
- Device: cuda
- Seed: None
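
For reference, the same settings expressed as a plain Python dictionary; the key names are illustrative and may not match the argument names expected by the training code:

```python
config = {
    "env_name": "MiniHack-Room-5x5-v0",
    "learning_rate": 5e-4,
    "batch_size": 32,
    "device": "cuda",
    "seed": None,
    "use_curiosity": True,   # curiosity-driven exploration enabled
    "use_rnd": False,        # Random Network Distillation disabled
}
```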
## Performance
Training completed successfully with the following exploration settings:
- Curiosity-driven exploration: True
- Random Network Distillation: False
Generated on: 2025-09-19 14:38:50