QuantumCircuit_Optimization_RL

⚛️ Overview

QuantumCircuit_Optimization_RL is a Reinforcement Learning (RL) agent based on the Proximal Policy Optimization (PPO) algorithm. It is trained to act within a quantum circuit simulation environment, where its goal is to rearrange, simplify, and rewrite sequences of quantum gates so that the circuit computes the same result with fewer gates and lower circuit depth. This is crucial for running circuits on noise-prone near-term quantum hardware, where every additional gate compounds error.
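For intuition, here is a toy illustration of the kind of redundancy the agent learns to eliminate (a minimal sketch using Qiskit purely as an example SDK; this card does not state which simulator or gate encoding the training environment uses): two consecutive Hadamard gates compose to the identity, so cancelling them preserves the computation while reducing depth.

```python
# Illustration only: the kind of rewrite the agent learns to discover.
from qiskit import QuantumCircuit

redundant = QuantumCircuit(2)
redundant.h(0)
redundant.h(0)        # H · H = I, so this pair can be cancelled
redundant.cx(0, 1)

optimized = QuantumCircuit(2)
optimized.cx(0, 1)    # same unitary, fewer gates

print(redundant.depth(), optimized.depth())  # 3 vs. 1
```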

🧠 Model Architecture

The model is structured as an RL agent interacting with a dedicated quantum circuit environment (a code sketch of this environment follows the list below):

  • Algorithm: PPO, chosen for stable and efficient training in both discrete and continuous action spaces.
  • Policy Network: A standard Multi-Layer Perceptron (MLP) that takes the current circuit state (encoded gate sequence) as input and outputs a probability distribution over the optimization actions.
  • State Space: The representation of the current quantum circuit configuration and gate sequence (vectorized into 256 features).
  • Action Space: A discrete set of 10 primitive circuit optimization actions, such as "apply commutative rule," "cancel redundant gates," or "substitute a block with an equivalent lower-depth block."
  • Reward Function: A positive reward proportional to the immediate reduction in circuit depth, plus a large terminal reward when the lowest possible depth is reached.
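A minimal sketch of how such an environment could be wired up, assuming a Gymnasium-style interface and the stable-baselines3 PPO implementation (neither library, nor the `simulator` wrapper used below, is specified in this card):

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class CircuitOptEnv(gym.Env):
    """Sketch of the circuit-optimization environment described above:
    256-feature state encoding, 10 discrete rewrite actions, and a reward
    equal to the immediate depth reduction plus a terminal bonus."""

    def __init__(self, simulator):
        super().__init__()
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(256,), dtype=np.float32)
        self.action_space = spaces.Discrete(10)
        self.sim = simulator  # hypothetical quantum-circuit simulator wrapper

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.sim.sample_circuit()                    # draw a new circuit to optimize
        return self.sim.encode_state(), {}

    def step(self, action):
        depth_before = self.sim.depth()
        self.sim.apply_rewrite(action)               # one of the 10 primitive rules
        reward = float(depth_before - self.sim.depth())
        done = self.sim.is_minimal()                 # no further reduction found
        if done:
            reward += 10.0                           # terminal bonus (illustrative value)
        return self.sim.encode_state(), reward, done, False, {}


# Training (assuming a concrete `simulator` implementation):
# from stable_baselines3 import PPO
# PPO("MlpPolicy", CircuitOptEnv(simulator), verbose=1).learn(total_timesteps=1_000_000)
```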

🎯 Intended Use

This agent is intended for use by quantum programmers and hardware developers:

  1. Automatic Circuit Compilation: Integrating the agent into quantum compilation toolchains (e.g., the Qiskit or Cirq transpilers) to automatically generate highly optimized, low-depth circuits before execution (see the inference sketch after this list).
  2. Hardware Calibration: Generating the shortest possible benchmark circuits to test the fidelity of quantum gates on new hardware.
  3. Research: Studying the limits of circuit compressibility for various quantum algorithms.
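For use case 1, a hypothetical inference loop might look like the following (the `CircuitOptEnv` sketch from above, the checkpoint filename, and the `simulator` wrapper are all assumptions, not artifacts shipped with this card):

```python
from stable_baselines3 import PPO

env = CircuitOptEnv(simulator)                    # environment sketched earlier
agent = PPO.load("quantum_circuit_opt_rl.zip")    # hypothetical checkpoint name

obs, _ = env.reset()
done = False
while not done:
    action, _ = agent.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated

print("final depth:", env.sim.depth())
```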

⚠️ Limitations

  1. Hardware Specificity: The optimal circuit may depend heavily on the target quantum hardware's specific gate set (e.g., native gates). This model is trained on a general set of unitary gates.
  2. Computational Cost: The PPO training process is computationally expensive, requiring extensive interaction with a quantum circuit simulator.
  3. Local Optima: RL agents can get stuck in a local minimum of circuit depth, failing to find the globally minimum-depth circuit for very complex problems.