---
library_name: pytorch
pipeline_tag: reinforcement-learning
tags:
- nethack
- ppo
- vae
- hmm
- minihack
- sequential-skills
---

# CatkinChen/nethack-ppo-ablation-baseline_full_curiosity

This repository contains a complete Sequential Skill RL model trained on NetHack/MiniHack environments.

## Model Components

### 1. PPO Policy (`ppo_policy.pth`)

- **Type**: Proximal Policy Optimization agent
- **Environment**: MiniHack-Room-Random-15x15-v0
- **Training Steps**: 195
- **Features**:
  - Curiosity-driven exploration: True
  - Random Network Distillation: False

### 2. VAE Model (`vae_model.pth`)

- **Type**: Variational Autoencoder
- **Purpose**: Encodes NetHack observations into latent skill representations
- **Latent Dimension**: 96
- **Architecture**: Unknown

### 3. HMM Model (`hmm_model.pth`)

- **Type**: Hidden Markov Model (Sticky HDP-HMM)
- **Purpose**: Models sequential skill transitions and dynamics
- **Integration**: Used for intrinsic reward computation

## Usage

```python
import torch
from training.online_rl import train_online_ppo_with_pretrained_models

# Load the complete model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load individual components
ppo_checkpoint = torch.load('ppo_policy.pth', map_location=device, weights_only=False)
vae_data = torch.load('vae_model.pth', map_location=device, weights_only=False)
hmm_data = torch.load('hmm_model.pth', map_location=device, weights_only=False)

# Use for inference or continued training
results = train_online_ppo_with_pretrained_models(
    env_name="MiniHack-Room-Random-15x15-v0",
    vae_repo_id="CatkinChen/nethack-vae-hmm",
    hmm_repo_id="CatkinChen/nethack-hmm",
    test_mode=True
)
```

## Training Configuration

- **Environment**: MiniHack-Room-Random-15x15-v0
- **Learning Rate**: 0.0003
- **Training Time**: 4420.29 seconds
- **Device**: cuda
- **Train Seed**: 51
- **Eval Seed**: 51

## Performance

Training completed successfully with the following configuration:

- Curiosity-driven exploration: True
- Random Network Distillation: False

Generated on: 2025-09-27 18:58:38
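
This card does not document what each `.pth` file contains (the VAE architecture is listed as Unknown), so it can help to inspect a loaded checkpoint before wiring it into code. The helper below is a minimal sketch, not part of this repository: `summarize_checkpoint` is a hypothetical name, and it assumes only that the loaded object is a nested dictionary whose leaves may expose a `.shape` attribute, as tensors do.

```python
from collections.abc import Mapping


def _describe(value):
    """Describe a leaf by type name, plus its shape when it has one."""
    shape = getattr(value, "shape", None)
    if shape is not None:
        return f"{type(value).__name__}{tuple(shape)}"
    return type(value).__name__


def summarize_checkpoint(obj, prefix="", max_depth=2):
    """Return one line per entry of a nested checkpoint dictionary.

    Hypothetical helper: nested mappings are expanded up to `max_depth`
    levels below the top-level keys; anything deeper is reported by type.
    """
    if not isinstance(obj, Mapping):
        return [f"{prefix or '<root>'}: {_describe(obj)}"]
    lines = []
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping) and value and max_depth > 0:
            # Recurse into non-empty nested dictionaries (e.g. a state dict).
            lines.extend(summarize_checkpoint(value, name, max_depth - 1))
        else:
            lines.append(f"{name}: {_describe(value)}")
    return lines
```

With PyTorch installed, something like `print("\n".join(summarize_checkpoint(torch.load('ppo_policy.pth', map_location='cpu', weights_only=False))))` would list the top-level keys and tensor shapes. Because `weights_only=False` unpickles arbitrary Python objects, load only checkpoints from a source you trust.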