maomaocun committed
Commit 0c170b9 · verified · 1 Parent(s): 5c8ae0b

Update README.md

Files changed (1): README.md (+11 −12)
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 ---
-# LLaDA-Prometheus
+# dLLM-Var
 
 ## Model Description
 
@@ -22,7 +22,7 @@ To load and use this model with Hugging Face Transformers:
 import torch
 from transformers import AutoTokenizer, AutoModelForCausalLM
 
-model_name = "maomaocun/LLaDA-Prometheus"
+model_name = "maomaocun/dLLM-Var"
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to("cuda")
 
@@ -35,7 +35,7 @@ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
 input_ids = inputs['input_ids']
 attention_mask = inputs.get('attention_mask', torch.ones_like(input_ids))
-for chunk in model.generate(
+result = model.generate(
     input_ids=input_ids,
     attention_mask=attention_mask,
     max_gen_length=1024,
@@ -43,10 +43,9 @@ for chunk in model.generate(
     threshold=0.9,
     streaming=True,
     eos_token_id=126348
-):
-    all_generated_ids = torch.cat([input_ids, chunk], dim=-1)
-    text = tokenizer.batch_decode(all_generated_ids, skip_special_tokens=True)[0]
-    print(text, end='', flush=True)
+)
+text = tokenizer.batch_decode(result, skip_special_tokens=True)
+print(text)
 ```
 
 For block diffusion-style inference, customize the generation loop to manage KV cache and block outputs as needed.
@@ -55,10 +54,10 @@ For block diffusion-style inference, customize the generation loop to manage KV
 
 The following table compares performance across key evaluation benchmarks. Results are reported as accuracy percentages where applicable.
 
-| Model | GSM8K | GPQA | BBH | MATH | HumanEval | MBPP | MMLU-Pro | MMLU-Generate |
-|--------------------------------|-------|-------|-------|-------|-----------|-------|----------|---------------|
-| LLaDA 8B Base in Pure Diffusion | 69.06 | 31.91 | 44.77 | 30.84 | 32.92 | 40.8 | 24.26 | 65.9 |
-| LLaDA 8B Instruct in Pure Diffusion | 77.48 | 29.01 | 51.49 | 22.32 | 38.71 | 39.2 | 36.41 | 65.5 |
-| LLaDA-Prometheus in Block Diffusion | 77.4 | 33.03 | 48.74 | 31.94 | 40.24 | 42 | 33.45 | 65.53 |
+| Model | GSM8K | GPQA | BBH | MATH | HumanEval | MBPP | MMLU-Generate |
+|--------------------------------|-------|-------|-------|-------|-----------|-------|---------------|
+| LLaDA 8B Base in Pure Diffusion | 69.06 | 31.91 | 44.77 | 30.84 | 32.92 | 40.80 | 65.9 |
+| LLaDA 8B Instruct in Semi-ar Diffusion | 77.48 | 29.01 | 51.49 | 22.32 | 38.71 | 39.20 | 65.5 |
+| dLLM-Var Block Diffusion | 77.40 | 33.03 | 48.74 | 31.94 | 40.24 | 42.00 | 65.53 |
 
 These results demonstrate competitive performance, particularly in code generation (HumanEval, MBPP) and reasoning tasks (BBH, MATH), with gains over the base instruct variant in several areas.
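
A note on the updated snippet: `tokenizer.batch_decode` returns a list of strings, one per sequence in the batch, so `print(text)` as written prints a one-element Python list rather than the generated text itself. For a single prompt, indexing the first element recovers the plain string:

```python
# batch_decode returns List[str]; take the first (and only) sequence.
text = tokenizer.batch_decode(result, skip_special_tokens=True)[0]
print(text)
```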
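The README's closing pointer ("customize the generation loop to manage KV cache and block outputs as needed") leaves the loop itself to the reader. Below is a minimal sketch of what such a block-wise loop could look like. It assumes the model's custom `generate` (loaded via `trust_remote_code`) accepts the `max_gen_length`, `threshold`, and `eos_token_id` arguments shown in the README and returns the full token sequence including the prompt; the `block_length` parameter and the helper function itself are illustrative, not part of the model's documented API.

```python
import torch

def block_diffusion_generate(model, input_ids, eos_token_id=126348,
                             max_gen_length=1024, block_length=64):
    """Illustrative block-wise decoding loop (not the model's official API).

    Each iteration asks the model to denoise one more block conditioned on
    everything generated so far; a real implementation would also reuse the
    KV cache for the fixed prefix instead of re-encoding it every step.
    """
    generated = input_ids
    for _ in range(max_gen_length // block_length):
        out = model.generate(
            input_ids=generated,
            attention_mask=torch.ones_like(generated),
            max_gen_length=block_length,  # decode one block at a time
            threshold=0.9,                # confidence threshold from the README
            eos_token_id=eos_token_id,
        )
        new_tokens = out[:, generated.shape[1]:]  # keep only the new block
        generated = torch.cat([generated, new_tokens], dim=-1)
        if (new_tokens == eos_token_id).any():    # stop once EOS appears
            break
    return generated
```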