CatkinChen/nethack-ppo-ablation-baseline_full_curiosity

This repository contains a complete Sequential Skill RL model, consisting of a PPO policy, a VAE, and an HMM, trained on NetHack/MiniHack environments.

Model Components

1. PPO Policy (`ppo_policy.pth`)

Type: Proximal Policy Optimization agent
Environment: MiniHack-River-Narrow-v0
Training Steps: 195
Features:
- Curiosity-driven exploration: True
- Random Network Distillation: False

2. VAE Model (`vae_model.pth`)

Type: Variational Autoencoder
Purpose: Encodes NetHack observations into latent skill representations
Latent Dimension: 96
Architecture: Unknown
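
Because the architecture is not documented here, the following is only a minimal sketch of the two operations any standard VAE relies on: the reparameterized sample and the KL term against a standard normal prior. The function names and the placeholder encoder outputs are assumptions for illustration and are not taken from this repository; only the latent dimension (96) and the seed (51) come from this card.

```python
import math
import random

LATENT_DIM = 96  # latent dimension stated in this model card


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


rng = random.Random(51)        # 51 is the seed listed in this card
mu = [0.0] * LATENT_DIM        # placeholder encoder outputs, not real model values
logvar = [0.0] * LATENT_DIM    # log-variance of 0 means unit variance
z = reparameterize(mu, logvar, rng)
print(len(z), kl_to_standard_normal(mu, logvar))
```

With a zero mean and unit variance the posterior equals the prior, so the KL term vanishes; any shift in the mean or variance makes it positive.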

3. HMM Model (`hmm_model.pth`)

Type: Hidden Markov Model (Sticky HDP-HMM)
Purpose: Models sequential skill transitions and dynamics
Integration: Used for intrinsic reward computation
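
To illustrate what "sticky" means, the sketch below computes the expected transition row under the sticky HDP-HMM prior for a finite truncation, where a self-transition bias kappa adds extra probability mass to staying in the current state. The helper name and all numeric values are hypothetical, and the implementation in this repository may differ.

```python
def sticky_expected_row(beta, j, alpha, kappa):
    """Expected transition distribution out of state j under the sticky prior.

    pi_j ~ DP(alpha + kappa, (alpha * beta + kappa * delta_j) / (alpha + kappa)),
    so E[pi_j] is the base measure beta with extra mass kappa placed on state j.
    """
    total = alpha + kappa
    return [(alpha * b + (kappa if k == j else 0.0)) / total for k, b in enumerate(beta)]


beta = [0.25, 0.25, 0.25, 0.25]  # hypothetical global state weights (sum to 1)
alpha, kappa = 1.0, 3.0          # hypothetical concentration and stickiness values
row = sticky_expected_row(beta, j=0, alpha=alpha, kappa=kappa)
print(row)
```

Setting kappa to 0 recovers the ordinary HDP-HMM, where every row has the same expectation beta; larger kappa makes inferred skills persist for longer stretches.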

Usage

import torch
from training.online_rl import train_online_ppo_with_pretrained_models

# Select the device to load the checkpoints onto
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load individual components.
# weights_only=False unpickles arbitrary Python objects, so only load files you trust.
ppo_checkpoint = torch.load('ppo_policy.pth', map_location=device, weights_only=False)
vae_data = torch.load('vae_model.pth', map_location=device, weights_only=False)
hmm_data = torch.load('hmm_model.pth', map_location=device, weights_only=False)

# Use for inference or continued training
results = train_online_ppo_with_pretrained_models(
    env_name="MiniHack-River-Narrow-v0",
    vae_repo_id="None",
    hmm_repo_id="None",
    test_mode=True
)
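
Since the checkpoint layout is not documented in this card, it can help to inspect what each loaded object contains before using it. The helper below is a hypothetical sketch, not part of this repository: it walks a nested dictionary and reports each key with either its tensor shape or its Python type. The stand-in dictionary and its key names are invented for illustration; in practice you would pass `ppo_checkpoint`, `vae_data`, or `hmm_data`.

```python
def summarize_checkpoint(obj, prefix="", max_depth=2):
    """Return 'key: description' lines for a (possibly nested) checkpoint object."""
    if not isinstance(obj, dict):
        return [f"{prefix or '<root>'}: {type(obj).__name__}"]
    lines = []
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            lines.append(f"{name}: dict with {len(value)} entries")
            if max_depth > 1:
                lines.extend(summarize_checkpoint(value, name + ".", max_depth - 1))
        else:
            # Tensors and arrays expose .shape; everything else is reported by type.
            shape = getattr(value, "shape", None)
            desc = f"shape={tuple(shape)}" if shape is not None else type(value).__name__
            lines.append(f"{name}: {desc}")
    return lines


# Stand-in object with invented key names; replace with a real loaded checkpoint.
fake_checkpoint = {"model_state_dict": {"fc.weight": [0.0]}, "latent_dim": 96}
for line in summarize_checkpoint(fake_checkpoint):
    print(line)
```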

Training Configuration

Environment: MiniHack-River-Narrow-v0
Learning Rate: 0.0003
Training Time: 4517.04 seconds
Device: cuda
Train Seed: 51
Eval Seed: 51
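
For reproducing a run, the values above could be gathered into a single configuration object. This is a sketch only: the class and field names are assumptions and do not correspond to any known interface in this repository.

```python
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from this model card; field names are hypothetical.
    env_name: str = "MiniHack-River-Narrow-v0"
    learning_rate: float = 0.0003
    device: str = "cuda"
    train_seed: int = 51
    eval_seed: int = 51


config = TrainConfig()
# Seeds only Python's built-in RNG; NumPy, PyTorch, and the environment
# each need their own seeding calls for a fully reproducible run.
random.seed(config.train_seed)
print(asdict(config))
```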

Performance

Training completed successfully with the following configuration:

Curiosity-driven exploration: True
Random Network Distillation: False

Generated on: 2025-09-28 03:14:19
