Ksenia Se (Kseniase)
Recent Activity
Replied to their post · about 2 hours ago
11 Fascinating new Policy Optimization techniques
Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their known shortcomings. Here are 11 of them:
1. BAlanced Policy Optimization (BAPO) → https://huggingface.co/papers/2510.18927
Dynamically adjusts the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse (see the first sketch after this list)
2. Training-Free GRPO → https://huggingface.co/papers/2510.08191
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior
3. Asymmetric Importance Sampling Policy Optimization (ASPO) → https://huggingface.co/papers/2510.06062
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive-advantage tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable (see the second sketch after this list)
4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence
5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration, and assign rewards more effectively
6. Information Gain-based Policy Optimization (IGPO) → https://huggingface.co/papers/2510.14967
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning (see the last sketch after this list)
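To make (1) concrete, here is a minimal PyTorch-style sketch of a PPO surrogate with separate lower and upper clip bounds, plus a toy rule for nudging them. The adaptation rule is an illustrative assumption, not BAPO's exact schedule.

```python
# Minimal sketch of (1): a PPO-style surrogate with separate lower/upper clip
# bounds and a toy rule for adjusting them. The adaptation rule is an
# illustrative assumption, not BAPO's exact schedule.
import torch

def clipped_surrogate(logp_new, logp_old, advantages, clip_low=0.2, clip_high=0.2):
    """PPO-style loss where the lower and upper clip offsets can differ."""
    ratio = torch.exp(logp_new - logp_old)                 # pi_new / pi_old per token
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    surrogate = torch.min(ratio * advantages, clipped * advantages)
    return -surrogate.mean()

def adjust_clip_bounds(advantages, clip_low, clip_high, target_pos_frac=0.5, step=0.01):
    """Toy balancing rule: widen the bound on whichever side (positive or
    negative advantages) currently contributes less, so neither side dominates."""
    pos_frac = (advantages > 0).float().mean().item()
    if pos_frac < target_pos_frac:
        clip_high = clip_high + step   # let positive-advantage tokens push harder
    else:
        clip_low = clip_low + step     # let negative-advantage tokens push harder
    return clip_low, clip_high
```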
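A rough sketch of the "flip" in (3): the reciprocal ratio and the dual-clip constant below are assumptions chosen for illustration, not ASPO's exact formulas.

```python
# Rough sketch of (3): flip the importance-sampling ratio for positive-advantage
# tokens and add a dual-clipping safeguard. The reciprocal "flip" and the
# dual-clip constant are illustrative assumptions, not ASPO's exact formulas.
import torch

def flipped_is_loss(logp_new, logp_old, advantages,
                    clip_low=0.2, clip_high=0.2, dual_clip=3.0):
    ratio = torch.exp(logp_new - logp_old)

    # For positive-advantage tokens, invert the ratio so tokens that have
    # already been boosted are not pushed even harder.
    flipped = torch.where(advantages > 0, 1.0 / (ratio + 1e-8), ratio)

    clipped = torch.clamp(flipped, 1.0 - clip_low, 1.0 + clip_high)
    surrogate = torch.min(flipped * advantages, clipped * advantages)

    # Dual clipping (as in dual-clip PPO): bound how negative the surrogate
    # can get for negative-advantage tokens, keeping gradients stable.
    surrogate = torch.where(advantages < 0,
                            torch.max(surrogate, dual_clip * advantages),
                            surrogate)
    return -surrogate.mean()
```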
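And a sketch of the dense-feedback idea in (6): reward each turn by how much the model's belief in the reference answer increased. The `answer_logprob` helper is hypothetical, standing in for whatever scores the reference answer under the current policy.

```python
# Sketch of (6): a dense per-turn reward equal to the increase in the model's
# probability of the reference answer. `answer_logprob` is a hypothetical helper
# (log-prob of the reference answer under the current policy, given the context).
import math
from typing import Callable, List

def information_gain_rewards(contexts_per_turn: List[str],
                             reference_answer: str,
                             answer_logprob: Callable[[str, str], float]) -> List[float]:
    """Reward for turn t = p(answer | context after t) - p(answer | context after t-1)."""
    beliefs = [math.exp(answer_logprob(ctx, reference_answer))
               for ctx in contexts_per_turn]
    return [beliefs[t] - beliefs[t - 1] for t in range(1, len(beliefs))]
```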
Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
Replied to their post · 7 days ago
12 Awesome GitHub repos to upgrade your AI coding
Coding is one field where AI has been welcomed with open arms. Here’s a collection to help you take your AI-assisted coding workflows to the next level of convenience and efficiency:
1. Smol Developer → https://github.com/smol-ai/developer
A lightweight AI “junior dev” that takes your product spec and automatically scaffolds or helps you build full codebases
2. Tabby → https://github.com/TabbyML/tabby
A self-hosted AI coding assistant that runs locally as an alternative to GitHub Copilot. Easy to integrate, GPU-friendly, and doesn’t rely on the cloud
3. Beads (bd) Issue Tracker → https://github.com/steveyegge/beads
Gives coding agents long-term memory, letting them organize, plan, and execute complex tasks reliably across sessions
4. MetaGPT → https://github.com/FoundationAgents/MetaGPT
A multi-agent framework that imitates a software company team using LLMs. It assigns AI agents roles like PM, Architect, and Developer to produce user stories, designs, specs, and final code
5. Open Interpreter → https://github.com/openinterpreter/open-interpreter
Gives you ChatGPT’s coding power with full local control – no limits, no sandbox – so you can automate, analyze, and create anything right from your desktop through a chat interface
6. OpenSpec → https://github.com/Fission-AI/OpenSpec
A lightweight, spec-driven development tool that helps humans and AI agree on what to build before any code is written
7. PR-Agent → https://github.com/qodo-ai/pr-agent
An AI code reviewer that automatically reviews, describes, and improves pull requests across GitHub, GitLab, and other platforms
8. BabyAGI → https://github.com/yoheinakajima/babyagi
An AI framework that gives agents the ability to write, manage, and refine their own functions, turning them from passive tools into active, self-building systems
9 ...⬇️
Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis