---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
    metrics:
    - type: mean_reward
      value: -113.57 +/- 74.63
      name: mean_reward
      verified: false
---

# PPO Agent for LunarLander-v2

## Model Description

This is a Proximal Policy Optimization (PPO) agent trained to play the LunarLander-v2 environment from OpenAI Gym. The model was trained using a custom PyTorch implementation of the PPO algorithm.

## Model Details

- **Model Type**: Reinforcement Learning Agent (PPO)
- **Architecture**: Actor-Critic Neural Network
- **Framework**: PyTorch
- **Environment**: LunarLander-v2 (OpenAI Gym)
- **Algorithm**: Proximal Policy Optimization (PPO)
- **Training Library**: Custom PyTorch implementation

## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Total Timesteps | 50,000 |
| Learning Rate | 0.00025 |
| Number of Environments | 4 |
| Steps per Environment | 128 |
| Batch Size | 512 |
| Minibatch Size | 128 |
| Number of Minibatches | 4 |
| Update Epochs | 4 |
| Discount Factor (γ) | 0.99 |
| GAE Lambda (λ) | 0.95 |
| Clip Coefficient | 0.2 |
| Value Function Coefficient | 0.5 |
| Entropy Coefficient | 0.01 |
| Max Gradient Norm | 0.5 |

Note that the batch size of 512 follows from 4 parallel environments × 128 steps per rollout, and each update splits it into 4 minibatches of 128.

### Training Configuration

- **Seed**: 1 (for reproducibility)
- **Device**: CUDA
- **Learning Rate Annealing**: Enabled
- **Generalized Advantage Estimation (GAE)**: Enabled
- **Advantage Normalization**: Enabled
- **Value Loss Clipping**: Enabled

## Performance

### Evaluation Results

- **Environment**: LunarLander-v2
- **Mean Reward**: -113.57 ± 74.63

The agent obtains a mean reward of -113.57 with a standard deviation of 74.63 over evaluation episodes. For context, LunarLander-v2 is conventionally considered solved at an average return of 200, so this agent has not yet learned a reliable landing policy.
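The exact evaluation script is not included in this card. As a minimal sketch of how a mean ± std figure like the one above could be computed, assuming the classic Gym step/reset API (pre-0.26; Gymnasium instead returns `(obs, info)` from `reset` and splits `done` into `terminated`/`truncated`) and a trained policy exposed as a plain callable:

```python
import gym
import numpy as np
import torch

def evaluate(policy, n_episodes: int = 10, seed: int = 1):
    """Roll out `policy` (any callable mapping an observation tensor to a
    discrete action) and report the mean and std of episodic returns."""
    env = gym.make("LunarLander-v2")
    env.seed(seed)  # classic Gym API; Gymnasium uses env.reset(seed=...)
    returns = []
    for _ in range(n_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            with torch.no_grad():
                action = policy(torch.as_tensor(obs, dtype=torch.float32))
            obs, reward, done, _ = env.step(int(action))
            total += reward
        returns.append(total)
    env.close()
    return float(np.mean(returns)), float(np.std(returns))
```

For a stochastic policy head `actor` (a hypothetical module producing action logits), one could pass `policy = lambda obs: torch.distributions.Categorical(logits=actor(obs)).sample().item()`.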
## Usage

This model can be used for:

- Reinforcement learning research and experimentation
- Educational purposes, to understand a PPO implementation
- Baseline comparison for LunarLander-v2 experiments
- A fine-tuning starting point for similar control tasks

## Technical Implementation

### Architecture Details

The model uses an Actor-Critic architecture implemented in PyTorch (an illustrative sketch appears in the appendix at the end of this card):

- **Actor Network**: Outputs action probabilities for the discrete action space
- **Critic Network**: Estimates state values for advantage computation
- **Shared Features**: The actor and critic may share early feature-extraction layers

### PPO Algorithm Features

- **Clipped Surrogate Objective**: Prevents large policy updates
- **Value Function Clipping**: Stabilizes value function learning
- **Generalized Advantage Estimation**: Reduces variance in advantage estimates
- **Multiple Epochs**: Updates the policy multiple times per batch of experience

## Environment Information

**LunarLander-v2** is a classic control task in which an agent must learn to:

- Land a lunar lander safely on a landing pad
- Control thrust and rotation to manage descent
- Balance fuel efficiency with landing accuracy
- Handle a continuous state space and a discrete action space

**Action Space**: Discrete(4)

- 0: Do nothing
- 1: Fire left orientation engine
- 2: Fire main engine
- 3: Fire right orientation engine

**Observation Space**: Box(8) containing:

- Position (x, y)
- Velocity (x, y)
- Angle and angular velocity
- Left and right leg ground contact flags

## Training Environment

- **Framework**: Custom PyTorch PPO implementation
- **Parallel Environments**: 4 concurrent environments for data collection
- **Total Timesteps**: 50,000 across all environments
- **Experience Collection**: On-policy learning with trajectory batches

## Limitations and Considerations

- The agent's mean reward is negative with high variance, so it does not yet land reliably
- Training was limited to 50,000 timesteps, which is likely insufficient for strong performance on this task
- Performance may vary significantly across episodes due to the stochastic nature of the environment
- The model has not been tested on variations of the LunarLander environment

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{cllv2_ppo_lunarlander,
  author = {Adilbai},
  title = {PPO Agent for LunarLander-v2},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Adilbai/cLLv2}
}
```

## License

Please refer to the repository license for usage terms and conditions.

## Contact

For questions or issues regarding this model, please open an issue in the model repository or contact the model author.
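## Appendix: Illustrative Architecture Sketch

The repository's training code is not reproduced in this card. The sketch below illustrates the Actor-Critic architecture and clipped surrogate objective described under Technical Implementation; the hidden layer sizes, Tanh activations, and separate (non-shared) actor and critic trunks are assumptions, not a description of the actual checkpoint.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Sketch of a PPO actor-critic for LunarLander-v2:
    8-dim observations, 4 discrete actions. Hidden sizes are assumptions."""
    def __init__(self, obs_dim: int = 8, n_actions: int = 4, hidden: int = 64):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),  # action logits
        )
        self.critic = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),          # state-value estimate
        )

    def forward(self, obs: torch.Tensor):
        dist = Categorical(logits=self.actor(obs))
        return dist, self.critic(obs).squeeze(-1)

def clipped_policy_loss(new_logp, old_logp, advantages, clip_coef: float = 0.2):
    """PPO clipped surrogate objective, negated for gradient descent."""
    ratio = (new_logp - old_logp).exp()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef) * advantages
    return -torch.min(unclipped, clipped).mean()
```

The default `clip_coef` of 0.2 matches the clip coefficient in the hyperparameter table above; clamping the probability ratio keeps each update close to the data-collecting policy.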