---
base_model: deepseek-ai/DeepSeek-Prover-V2-7B
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - trl-sft
  - mathematics
  - conjecture-proving
  - flash-attention-2
---

# DeepSeek‑Prover‑V2‑7B · LoRA Adapter

This repository hosts a LoRA adapter fine‑tuned on top of `deepseek-ai/DeepSeek-Prover-V2-7B` using 🤗 trl's `SFTTrainer`.


## Training Setup

| Hyper‑parameter | Value |
|---|---|
| Learning rate | 2 × 10⁻⁴ |
| Batch size / device | 16 |
| Gradient accumulation steps | 1 |
| Effective batch size | 16 |
| Epochs | 1 |
| Scheduler | linear |
| Warm‑up ratio | 0.03 |
| Weight decay | 0.01 |
| Seed | 42 |
| Sequence length | 1792 |
| Flash‑Attention‑2 | ✅ (`use_flash_attention_2=True`) |
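
As a rough sketch, these values map onto 🤗 `TrainingArguments` as shown below. This is a hypothetical reconstruction, not the exact script used for this run; `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyper-parameters listed above.
training_args = TrainingArguments(
    output_dir="outputs",              # placeholder, not part of this repo
    learning_rate=2e-4,                # 2 × 10⁻⁴
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,     # effective batch size 16
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    weight_decay=0.01,
    seed=42,
    bf16=True,                         # bfloat16 training
)
# The 1792-token sequence length is set on the trl side
# (e.g. max_seq_length=1792, depending on the trl version).
```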

### LoRA configuration

| Setting | Value |
|---|---|
| Rank r | 16 |
| α | 32 |
| Dropout | 0.05 |
| Target modules | all linear layers |
| Modules saved | `embed_tokens`, `lm_head` |
| Bias | none |

**RoPE scaling:** YaRN, factor = 16.0, β_fast = 32.0, β_slow = 1.0
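
A minimal `LoraConfig` matching the table above might look like the sketch below; the `"all-linear"` shorthand requires a recent peft release, and the RoPE/YaRN settings live in the base model's config rather than in the adapter.

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",                  # all linear layers
    modules_to_save=["embed_tokens", "lm_head"],  # trained and saved in full
    bias="none",
    task_type="CAUSAL_LM",
)
```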

Training was performed on GPUs with bfloat16 precision (`torch_dtype=torch.bfloat16`).
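
For reference, loading the base model the same way (bfloat16 weights plus Flash‑Attention‑2) can be sketched as below; `attn_implementation="flash_attention_2"` is the current transformers spelling of the older `use_flash_attention_2=True` flag.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the base model with the precision and attention backend used for training.
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-7B",
    torch_dtype=torch.bfloat16,               # bfloat16 precision
    attn_implementation="flash_attention_2",  # Flash-Attention-2 kernels
    trust_remote_code=True,
)
```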


## Loss Curve

*Training loss curves (figure).*


## Usage

```python
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM  # AutoPeftModelForCausalLM lives in peft, not transformers

# Load the LoRA adapter together with its base model.
model = AutoPeftModelForCausalLM.from_pretrained(
    "your-username/DeepSeek-Prover-V2-7B-conjecture-chat-new-config-20250724_0955",
    trust_remote_code=True,
)
tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Prover-V2-7B", trust_remote_code=True)

prompt = "Prove that the sum of two even numbers is even."
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device), max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```
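
If you want to serve the model without a peft dependency, the adapter can be folded into the base weights with peft's `merge_and_unload`; the output path below is just a placeholder.

```python
# Merge the LoRA weights into the base model and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("deepseek-prover-v2-7b-conjecture-merged")  # placeholder path
tok.save_pretrained("deepseek-prover-v2-7b-conjecture-merged")
```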