

Daily Papers

by AK and the research community

Mar 13

BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

Can reinforcement learning with hard, verifiable rewards teach a compact language model to reason about physics, or does it primarily learn to pattern-match toward correct answers? We study this question by training a 1.5B-parameter reasoning model on beam statics, a classic engineering problem, using parameter-efficient reinforcement learning with verifiable rewards (RLVR). The rewards are binary correctness signals from symbolic solvers, and no teacher-generated reasoning traces are used. The best BeamPERL checkpoint achieves a 66.7% improvement in Pass@1 over the base model. However, the learned competence is anisotropic: the model generalizes compositionally (to more loads) but fails under topological shifts (moved supports), even though these require the same equilibrium equations. Intermediate checkpoints yield the strongest reasoning, whereas continued optimization degrades robustness even as reward is maintained. These findings reveal a key limitation of outcome-level alignment: reinforcement learning with exact physics rewards induces procedural solution templates rather than internalization of the governing equations. The precision of the reward signal, even when analytically exact, does not by itself guarantee transferable physical reasoning. Our results suggest that verifiable rewards may need to be paired with structured reasoning scaffolding to move beyond template matching toward robust scientific reasoning.
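To make "binary correctness rewards" concrete, here is a minimal sketch of such a reward for one beam-statics case: a beam on a pin and a roller under downward point loads. It is not the BeamPERL reward. The class and function names, the closed-form solve (standing in for a symbolic solver), the 1% tolerance, and the omission of parsing the model's text output into two numbers are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PointLoad:
    magnitude: float  # downward force, e.g. in kN
    position: float   # distance from the left end of the beam, e.g. in m


def support_reactions(pin_x, roller_x, loads):
    """Exact vertical reactions (R_pin, R_roller) of a beam on a pin and a roller.

    Uses the two static equilibrium equations; with only vertical loads,
    horizontal equilibrium is trivially satisfied.
    """
    if math.isclose(pin_x, roller_x):
        raise ValueError("supports must be at distinct positions")
    # Moment equilibrium about the pin:
    #   R_roller * (roller_x - pin_x) = sum_i P_i * (x_i - pin_x)
    r_roller = sum(p.magnitude * (p.position - pin_x) for p in loads) / (roller_x - pin_x)
    # Vertical force equilibrium: R_pin + R_roller = sum_i P_i
    r_pin = sum(p.magnitude for p in loads) - r_roller
    return r_pin, r_roller


def binary_reward(predicted, pin_x, roller_x, loads, rel_tol=1e-2):
    """Return 1.0 if both predicted reactions match the exact ones, else 0.0.

    There is no partial credit: this is an outcome-level check that ignores
    how the model reached its answer.
    """
    if len(predicted) != 2:
        return 0.0
    exact = support_reactions(pin_x, roller_x, loads)
    correct = all(
        math.isclose(p, e, rel_tol=rel_tol, abs_tol=1e-6)
        for p, e in zip(predicted, exact)
    )
    return 1.0 if correct else 0.0


# Same load, two support layouts: the equations are identical, the answers differ.
load = [PointLoad(magnitude=12.0, position=4.0)]
print(support_reactions(0.0, 10.0, load))  # supports at both ends
print(support_reactions(2.0, 8.0, load))   # supports moved inward
```

The usage lines mirror the "moved supports" shift described above: the same two equilibrium equations apply in both layouts, but an answer memorized for end supports (7.2 and 4.8) earns zero reward once the supports move (where the exact reactions are 8.0 and 4.0).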

Impact of local bunching factors in single-pass THz free electron lasers

In simulations of modern free-electron lasers (FELs), shot noise plays a crucial role. Although shot noise is inversely proportional to the number of electrons, it is typically modeled using macroparticles whose bunching factors are set to match those of the much larger number of real electrons. For short-wavelength FELs, the macroparticles are assumed to be uniformly distributed on the scale of the resonant wavelength, since shot noise dominates the initial radiation, for instance in the self-amplified spontaneous emission (SASE) regime. In this paper, we show that this assumption does not hold at longer wavelengths, particularly in the THz range, where the bunch current profile is not uniform even within one resonant wavelength. Instead, the current profile dominates the initial bunching factors, which can be several orders of magnitude higher than those arising from shot noise. The slice-based bunching factors and bunching phases are derived for Gaussian distributions and compared with shot noise under the assumption that the current within each slice remains constant. Using the THz FEL at the photoinjector test facility at DESY in Zeuthen (PITZ) as a case study, we benchmark the influence of the current profile through simulations at very low bunch charge, where the full number of electrons can be modeled with the Genesis1.3 code. Additional simulations with the nominal working parameters of the PITZ THz FEL have been compared with experimental data, indicating better agreement when the actual current profile is taken into account.

2 authors · Dec 1, 2025
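The slice picture in the THz FEL abstract can be illustrated numerically. This sketch assumes a Gaussian current profile I(z) ∝ exp(-z²/2σ_z²). It defines the bunching factor of one resonant-wavelength slice as b = ∫ I(z) e^{-ikz} dz / ∫ I(z) dz with k = 2π/λ, the continuous analogue of (1/N) Σ_j e^{-ikz_j}, and compares |b| with the rms shot-noise amplitude 1/√N of N uncorrelated electrons (whose mean |b|² is 1/N). It is not the paper's analytical derivation or Genesis1.3's particle loading. The function names, the e^{-ikz} sign convention, and the midpoint-rule integration are choices made for illustration only.

```python
import cmath
import math


def gaussian_current(z, sigma_z):
    """Unnormalized Gaussian current profile centred at z = 0."""
    return math.exp(-0.5 * (z / sigma_z) ** 2)


def slice_bunching_factor(z_start, wavelength, sigma_z, n_steps=2000):
    """Bunching factor of the charge in the slice [z_start, z_start + wavelength].

    b = sum_j I(z_j) exp(-i k z_j) / sum_j I(z_j), a midpoint-rule estimate of
    the ratio of integrals over the slice, with k = 2*pi / wavelength.
    Returns (|b|, arg b in radians).
    """
    k = 2.0 * math.pi / wavelength
    dz = wavelength / n_steps
    numerator = 0j
    denominator = 0.0
    for j in range(n_steps):
        z = z_start + (j + 0.5) * dz  # midpoint of the j-th sub-interval
        weight = gaussian_current(z, sigma_z)
        numerator += weight * cmath.exp(-1j * k * z)
        denominator += weight
    b = numerator / denominator
    return abs(b), cmath.phase(b)


def shot_noise_level(n_electrons):
    """RMS |b| for N electrons with independent, uniformly random phases.

    For such electrons the mean of |b|^2 is 1/N, so the rms amplitude is 1/sqrt(N).
    """
    return 1.0 / math.sqrt(n_electrons)


# Arbitrary units: wavelength = 1, rms bunch length = 3 wavelengths.
for z0 in (-0.5, 3.0, 6.0):
    magnitude, phase = slice_bunching_factor(z0, wavelength=1.0, sigma_z=3.0)
    print(f"slice at z={z0:+.1f}: |b| = {magnitude:.2e}, phase = {phase:+.2f} rad")
print(f"shot noise for 1e6 electrons: {shot_noise_level(1e6):.1e}")
```

With these hypothetical, illustrative parameters (not PITZ values), the slice on the bunch tail starting at z = 6 has |b| of order 0.1. This is far above both the nearly flat central slice and the 10⁻³ shot-noise level of 10⁶ electrons. The example also shows that a current gradient within a single wavelength, rather than random particle positions, sets the initial bunching.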