Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
Abstract
A reinforcement learning approach enables large language models to translate unseen languages by leveraging in-context linguistic knowledge rather than memorizing specific languages.
Prior work has shown that large language models (LLMs) can translate unseen or low-resource languages by undergoing continued training or even by encoding a grammar book in their context. However, both methods typically overfit specific languages, with limited zero-shot transfer at test time. To translate extremely low-resource languages at scale, we argue that LLMs must acquire the meta-skill of utilizing in-context linguistic knowledge rather than memorizing specific languages. In this paper, we propose a reinforcement learning (RL) approach to unseen language translation given rich linguistic context, using a surface-level translation metric (chrF) as the reward. Empirically, despite the lightweight reward, our RL-trained models effectively extract and apply relevant linguistic information from the provided context, leading to better translations on completely unseen languages than in-context learning or supervised fine-tuning. Our analyses suggest that outcome-based RL can extend beyond conventional reasoning tasks like math and coding to serve as a recipe for language learning from context.
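The abstract uses chrF, a character n-gram F-score, as the only reward signal. To illustrate what a "surface-level" reward like this measures, here is a minimal, simplified sentence-level chrF sketch in Python. Several details are our assumptions and are not taken from the paper: the function name `chrf_reward`, the defaults (character n-gram orders 1 to 6 and beta = 2, the usual chrF settings), and the handling of short strings. This page does not describe the paper's exact reward implementation. Library implementations such as sacrebleu treat edge cases differently, so their scores may not match this sketch exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing whitespace, as chrF does."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf_reward(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF in [0, 100], used here as a scalar reward.

    Hypothetical helper (not the paper's code). It averages character n-gram
    precision and recall over orders 1..max_n, then combines them into an
    F-beta score. With beta=2, recall counts twice as much as precision.
    Orders the strings are too short for are skipped, so results can differ
    slightly from library implementations.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # not enough characters for this n-gram order
        matches = sum((hyp & ref).values())  # clipped (min-count) n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


# Example: reward a sampled translation against a reference.
print(chrf_reward("the cat sat", "the cat sat"))  # 100.0
print(chrf_reward("the cats sat", "the cat sat"))  # partial credit, between 0 and 100
```

Because the score is built from character n-gram overlap, a translation that gets a stem right but an affix wrong still earns partial credit under this sketch. An unrelated output earns little or none. In an outcome-based RL setup, this scalar would be computed for each sampled translation and then passed to the policy-gradient update.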
Community
In this paper, we propose a reinforcement learning approach to unseen language translation given rich linguistic context. We argue that LLMs can acquire the meta-skill of utilizing in-context knowledge rather than memorizing specific languages, and can thus generalize to unseen languages.
The paper shows that reinforcement learning with a surface-level chrF reward can train models to extract and apply linguistic information from rich in-context descriptions, outperforming both in-context learning and supervised fine-tuning on completely unseen languages. This suggests outcome-based RL enables acquisition of a transferable meta-skill rather than language-specific memorization.
How might the chrF reward’s emphasis on surface overlap shape which aspects of the provided linguistic context the model learns to prioritize during RL?
I made a podcast on it with ResearchPod; it makes it easy to get the key concepts on the go:
https://researchpod.app/episode/07923618-2812-4b7c-b9ec-5e9a94a7451d
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation? (2026)
- Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax (2026)
- Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation (2026)
- Reference-Free Reinforcement Learning Fine-Tuning for MT: A Seq2Seq Perspective (2026)
- Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation (2026)
- Crosslingual On-Policy Self-Distillation for Multilingual Reasoning (2026)
- Rethinking the Multilingual Reasoning Gap with Layer Swap (2026)