Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

posted an update about 3 hours ago
11 Fascinating new Policy Optimization techniques

Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their known weaknesses. Here are 11 of them:

1. BAlanced Policy Optimization (BAPO) → https://huggingface.co/papers/2510.18927
Dynamically adjusts the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse (see the clipping sketch after this list)

2. Training-Free GRPO → https://huggingface.co/papers/2510.08191
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge as a token prior, which is then applied during inference to guide the model’s behavior

3. Asymmetric Importance Sampling Policy Optimization (ASPO) → https://huggingface.co/papers/2510.06062
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable (see the weighting sketch below)

4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model’s own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration, and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence

5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent’s experiences to understand how different states connect, guide exploration, and assign rewards more effectively

6. Information Gain-based Policy Optimization (IGPO) → https://huggingface.co/papers/2510.14967
Uses the model’s own belief updates to create dense, informative feedback for smoother multi-turn learning (see the reward-shaping sketch below)

Read further below ⬇️

If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
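To make the BAPO idea concrete, here is a minimal PyTorch sketch of a PPO-style surrogate with asymmetric clipping bounds. The function name and the fixed clip_pos/clip_neg values are illustrative assumptions; the paper adjusts the bounds dynamically during training rather than fixing them:

```python
import torch

def balanced_clip_loss(logp_new, logp_old, adv, clip_pos=0.3, clip_neg=0.1):
    """PPO-style clipped surrogate with asymmetric bounds (illustrative).

    Positive-advantage tokens get a wider trust region than negative ones,
    so negative gradients cannot dominate the update and drive entropy
    collapse. BAPO itself tunes these bounds dynamically; the fixed values
    here are placeholder assumptions.
    """
    ratio = torch.exp(logp_new - logp_old)          # pi_new / pi_old per token
    upper = torch.where(adv >= 0, 1.0 + clip_pos, 1.0 + clip_neg)
    lower = torch.where(adv >= 0, 1.0 - clip_neg, 1.0 - clip_pos)
    clipped = torch.clamp(ratio, min=lower, max=upper)
    return -torch.min(ratio * adv, clipped * adv).mean()
```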
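And a sketch of the ASPO weighting, under the assumption that "flipping" means inverting the ratio for positive-advantage tokens; the dual_clip value is a placeholder, and the paper's exact clipping scheme may differ:

```python
import torch

def aspo_style_weights(logp_new, logp_old, adv, dual_clip=3.0):
    """Illustrative asymmetric importance weights.

    Standard PPO weights every token by ratio = pi_new / pi_old. Here the
    ratio is inverted for positive-advantage tokens, so tokens the new
    policy already prefers receive smaller (not larger) updates, while a
    soft upper clip on the inverted ratio keeps gradients bounded.
    """
    ratio = torch.exp(logp_new - logp_old)
    flipped = torch.clamp(1.0 / ratio, max=dual_clip)  # soft dual-clipping
    return torch.where(adv > 0, flipped, ratio)
```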
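Finally, IGPO's dense feedback can be pictured as rewarding each turn by how much it raises the model's belief in the ground-truth answer. A toy version, assuming you can probe the policy's answer probability after every turn (how that probe is computed is an assumption here, not the paper's estimator):

```python
import torch

def information_gain_rewards(answer_logprob_per_turn):
    """Turn-level rewards from belief updates (toy illustration).

    answer_logprob_per_turn: 1-D tensor of the policy's log-probability of
    the ground-truth answer measured after each interaction turn. Reward
    for turn t = belief after turn t minus belief after turn t-1, giving
    dense intermediate feedback instead of one sparse final reward.
    """
    belief = answer_logprob_per_turn.exp()
    prev = torch.cat([belief.new_zeros(1), belief[:-1]])
    return belief - prev

# e.g. beliefs 0.1 -> 0.4 -> 0.7 yield rewards [0.1, 0.3, 0.3]
print(information_gain_rewards(torch.log(torch.tensor([0.1, 0.4, 0.7]))))
```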
replied to their post 7 days ago
12 Awesome GitHub repos to upgrade your AI coding

Coding is the field where AI is welcomed with open arms. Here’s a collection to help you take your AI-assisted coding workflows to the next level of convenience and efficiency:

1. Smol Developer → https://github.com/smol-ai/developer
A lightweight AI “junior dev” that takes your product spec and automatically scaffolds or helps you build full codebases

2. Tabby → https://github.com/TabbyML/tabby
A self-hosted AI coding assistant that runs locally as an alternative to GitHub Copilot. Easy to integrate, GPU-friendly, and doesn’t rely on the cloud

3. Beads (bd) Issue Tracker → https://github.com/steveyegge/beads
Gives coding agents long-term memory, letting them organize, plan, and execute complex tasks reliably across sessions

4. MetaGPT → https://github.com/FoundationAgents/MetaGPT
A multi-agent framework that imitates a software company team using LLMs. It assigns AI agents roles like PM, Architect, and Developer to produce user stories, designs, specs, and final code

5. Open Interpreter → https://github.com/openinterpreter/open-interpreter
Gives you ChatGPT’s coding power with full local control – no limits, no sandbox – so you can automate, analyze, and create anything right from your desktop through a chat interface (a usage sketch follows after this list)

6. OpenSpec → https://github.com/Fission-AI/OpenSpec
A lightweight, spec-driven development tool that helps humans and AI agree on what to build before any code is written

7. PR-Agent → https://github.com/qodo-ai/pr-agent
An AI code reviewer that automatically reviews, describes, and improves pull requests across GitHub, GitLab, and other platforms

8. BabyAGI → https://github.com/yoheinakajima/babyagi
A self-building AI framework that gives agents the ability to write, manage, and refine their own functions, turning them from passive tools into active, self-building systems

9. ...⬇️

Subscribe to the Turing Post: https://www.turingpost.com/subscribe – your shortcut to deep, clear AI analysis
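For a feel of Open Interpreter's chat-driven workflow, here is a minimal sketch based on the Python API the project has documented; treat the package name, import path, and attributes as assumptions and check the repo's README for the current interface:

```python
# pip install open-interpreter   (assumed package name; see the repo README)
from interpreter import interpreter

# Keep auto_run off so the assistant asks for approval before executing
# any code it writes (a safety knob described in the project's docs).
interpreter.auto_run = False

# Chat-driven automation: the model writes and, with your approval,
# runs local code to carry out the request.
interpreter.chat("Summarize the CSV files in ./data and plot row counts.")
```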

Organizations

Turing Post · Journalists on Hugging Face · Social Post Explorers · Hugging Face Discord Community · Sandbox