library_name: pytorch
pipeline_tag: reinforcement-learning
tags:
  - nethack
  - ppo
  - vae
  - hmm
  - minihack
  - sequential-skills

CatkinChen/nethack-ppo-ablation-baseline_full_curiosity

This repository contains a complete Sequential Skill RL model trained on the MiniHack-Room-Random-15x15-v0 environment. It bundles three components: a PPO policy, a VAE, and an HMM.

Model Components

1. PPO Policy (ppo_policy.pth)

  • Type: Proximal Policy Optimization agent
  • Environment: MiniHack-Room-Random-15x15-v0
  • Training Steps: 195
  • Features:
    • Curiosity-driven exploration: True
    • Random Network Distillation: False

2. VAE Model (vae_model.pth)

  • Type: Variational Autoencoder
  • Purpose: Encodes NetHack observations into latent skill representations
  • Latent Dimension: 96
  • Architecture: Unknown

3. HMM Model (hmm_model.pth)

  • Type: Hidden Markov Model (Sticky HDP-HMM)
  • Purpose: Models sequential skill transitions and dynamics
  • Integration: Used for intrinsic reward computation
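To illustrate what "sticky" means in a Sticky HDP-HMM, the sketch below computes the expected transition matrix under the standard finite (weak-limit) approximation, where row j is drawn from a Dirichlet with parameters `alpha * beta + kappa * e_j`. The function name and the values of `alpha`, `kappa`, and `beta` are hypothetical and chosen for illustration; they are not the hyperparameters of this model, and this is not the repository's implementation.

```python
def sticky_transition_means(beta, alpha, kappa):
    """Expected transition matrix under a finite sticky HDP-HMM prior.

    Row j follows Dirichlet(alpha * beta + kappa * e_j), so its mean is
    (alpha * beta + kappa * e_j) / (alpha + kappa), assuming beta sums to 1.
    A larger kappa puts more prior mass on self-transitions.
    """
    n = len(beta)
    rows = []
    for j in range(n):
        row = [
            (alpha * beta[k] + (kappa if k == j else 0.0)) / (alpha + kappa)
            for k in range(n)
        ]
        rows.append(row)
    return rows


# Hypothetical values: three skills, moderate stickiness.
means = sticky_transition_means(beta=[0.5, 0.3, 0.2], alpha=1.0, kappa=4.0)
```

With `kappa = 0` every row reduces to `beta`, which is the non-sticky case. Increasing `kappa` raises the diagonal entries, so inferred skills tend to persist across consecutive time steps.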

Usage

import torch
# Assumes the project's `training` package is importable from the working directory
from training.online_rl import train_online_ppo_with_pretrained_models

# Select the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load individual components.
# weights_only=False unpickles arbitrary Python objects, so only load files you trust.
ppo_checkpoint = torch.load('ppo_policy.pth', map_location=device, weights_only=False)
vae_data = torch.load('vae_model.pth', map_location=device, weights_only=False)
hmm_data = torch.load('hmm_model.pth', map_location=device, weights_only=False)

# Use for inference or continued training
results = train_online_ppo_with_pretrained_models(
    env_name="MiniHack-Room-Random-15x15-v0",
    vae_repo_id="CatkinChen/nethack-vae-hmm",
    hmm_repo_id="CatkinChen/nethack-hmm",
    test_mode=True
)
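This card does not document the internal layout of the three checkpoint files, so it can help to inspect them before use. The helper below is a minimal sketch, not part of this repository: `summarize_checkpoint` is a hypothetical name, and it only assumes that the loaded object is a nested dict whose leaves may expose a `shape` attribute (as tensors do).

```python
def summarize_checkpoint(obj, prefix="", max_depth=3):
    """Return one line per leaf describing a nested checkpoint object.

    Dicts are walked recursively up to max_depth levels; any other value
    is reported by type name, plus its shape when it has one.
    """
    if isinstance(obj, dict) and max_depth > 0:
        lines = []
        for key, value in obj.items():
            lines.extend(
                summarize_checkpoint(value, f"{prefix}{key}.", max_depth - 1)
            )
        return lines

    name = prefix.rstrip(".") or "<root>"
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return [f"{name}: {type(obj).__name__} shape={tuple(shape)}"]
    return [f"{name}: {type(obj).__name__}"]
```

For example, after the `torch.load` calls above, `print("\n".join(summarize_checkpoint(ppo_checkpoint)))` lists the top-level keys and tensor shapes stored in `ppo_policy.pth`.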

Training Configuration

  • Environment: MiniHack-Room-Random-15x15-v0
  • Learning Rate: 0.0003
  • Training Time: 4420.29 seconds
  • Device: cuda
  • Train Seed: 51
  • Eval Seed: 51

Performance

Training completed successfully with the following exploration settings:

  • Curiosity-driven exploration: True
  • Random Network Distillation: False

Generated on: 2025-09-27 18:58:38