maomaocun committed
Commit 0c170b9 · verified · 1 Parent(s): 5c8ae0b

Update README.md

Files changed (1): README.md (+11 −12)
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 ---
-# LLaDA-Prometheus
+# dLLM-Var
 
 ## Model Description
 
@@ -22,7 +22,7 @@ To load and use this model with Hugging Face Transformers:
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-model_name = "maomaocun/LLaDA-Prometheus"
+model_name = "maomaocun/dLLM-Var"
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to("cuda")
 
@@ -35,7 +35,7 @@ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 input_ids = inputs['input_ids']
 attention_mask = inputs.get('attention_mask', torch.ones_like(input_ids))
-for chunk in model.generate(
+result = model.generate(
     input_ids=input_ids,
     attention_mask=attention_mask,
     max_gen_length=1024,
@@ -43,10 +43,9 @@ for chunk in model.generate(
     threshold=0.9,
     streaming=True,
     eos_token_id=126348
-):
-    all_generated_ids = torch.cat([input_ids, chunk], dim=-1)
-    text = tokenizer.batch_decode(all_generated_ids, skip_special_tokens=True)[0]
-    print(text, end='', flush=True)
+)
+text = tokenizer.batch_decode(result, skip_special_tokens=True)
+print(text)
 ```
 
 For block diffusion-style inference, customize the generation loop to manage KV cache and block outputs as needed.
@@ -55,10 +54,10 @@ For block diffusion-style inference, customize the generation loop to manage KV
 
 The following table compares performance across key evaluation benchmarks. Results are reported as accuracy percentages where applicable.
 
-| Model | GSM8K | GPQA | BBH | MATH | HumanEval | MBPP | MMLU-Pro | MMLU-Generate |
-|--------------------------------|-------|-------|-------|-------|-----------|-------|----------|---------------|
-| LLaDA 8B Base in Pure Diffusion | 69.06 | 31.91 | 44.77 | 30.84 | 32.92 | 40.8 | 24.26 | 65.9 |
-| LLaDA 8B Instruct in Pure Diffusion | 77.48 | 29.01 | 51.49 | 22.32 | 38.71 | 39.2 | 36.41 | 65.5 |
-| LLaDA-Prometheus in Block Diffusion | 77.4 | 33.03 | 48.74 | 31.94 | 40.24 | 42 | 33.45 | 65.53 |
+| Model | GSM8K | GPQA | BBH | MATH | HumanEval | MBPP | MMLU-Generate |
+|--------------------------------|-------|-------|-------|-------|-----------|-------|---------------|
+| LLaDA 8B Base in Pure Diffusion | 69.06 | 31.91 | 44.77 | 30.84 | 32.92 | 40.80 | 65.9 |
+| LLaDA 8B Instruct in Semi-ar Diffusion | 77.48 | 29.01 | 51.49 | 22.32 | 38.71 | 39.20 | 65.5 |
+| dLLM-Var Block Diffusion | 77.40 | 33.03 | 48.74 | 31.94 | 40.24 | 42.00 | 65.53 |
 
 These results demonstrate competitive performance, particularly in code generation (HumanEval, MBPP) and reasoning tasks (BBH, MATH), with gains over the base instruct variant in several areas.
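
A note on the updated snippet: `tokenizer.batch_decode` returns a list of strings, one per sequence in the batch, so `print(text)` as written prints a one-element Python list rather than the generated text itself. For a single prompt, indexing the first element recovers the plain string:

```python
# batch_decode returns List[str]; take the first (and only) sequence.
text = tokenizer.batch_decode(result, skip_special_tokens=True)[0]
print(text)
```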
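The README's closing pointer ("customize the generation loop to manage KV cache and block outputs as needed") leaves the loop itself to the reader. Below is a minimal sketch of what such a block-wise loop could look like. It assumes the model's custom `generate` (loaded via `trust_remote_code`) accepts the `max_gen_length`, `threshold`, and `eos_token_id` arguments shown in the README and returns the full token sequence including the prompt; the `block_length` parameter and the helper function itself are illustrative, not part of the model's documented API.

```python
import torch

def block_diffusion_generate(model, input_ids, eos_token_id=126348,
                             max_gen_length=1024, block_length=64):
    """Illustrative block-wise decoding loop (not the model's official API).

    Each iteration asks the model to denoise one more block conditioned on
    everything generated so far; a real implementation would also reuse the
    KV cache for the fixed prefix instead of re-encoding it every step.
    """
    generated = input_ids
    for _ in range(max_gen_length // block_length):
        out = model.generate(
            input_ids=generated,
            attention_mask=torch.ones_like(generated),
            max_gen_length=block_length,  # decode one block at a time
            threshold=0.9,                # confidence threshold from the README
            eos_token_id=eos_token_id,
        )
        new_tokens = out[:, generated.shape[1]:]  # keep only the new block
        generated = torch.cat([generated, new_tokens], dim=-1)
        if (new_tokens == eos_token_id).any():    # stop once EOS appears
            break
    return generated
```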