---
license: llama3
language:
- en
- ja
- zh
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
library_name: transformers
---

# <span style="color: red; ">New version of Llama3-ELAINE-medLLM-instruct-8B is available</span>

[Llama3-ELAINE-medLLM-instruct-8B_v0.1](https://huggingface.co/kenyano/Llama3-ELAINE-medLLM-instruct-8B_v0.1)


-----------------------

# ELAINE-medLLM - Built with Llama3-8B

ELAINE (EngLish-jApanese-chINesE)-medLLM is a trilingual (English, Japanese, Chinese) large language model adapted to the biomedical domain, based on Llama-3-8B.
The training dataset was carefully curated in terms of volume and diversity to adapt the model to the biomedical domain and endow it with trilingual capability while preserving the knowledge and abilities of the base model.
Training follows a two-stage path: continued pre-training followed by supervised fine-tuning (SFT).
ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without severely sacrificing the base model's general capabilities.


## Model Details

* **Model type**: Please refer to [Llama 3 Github](https://github.com/meta-llama/llama3) for details on the model architecture.
* **Language(s)**: English, Japanese, Chinese
* **Library**: [DeepSpeed](https://github.com/microsoft/DeepSpeed)
* **Tokenizer**: Please refer to the [Llama 3 model card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the tokenizer.
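
Since the card declares `library_name: transformers`, the checkpoint can also be loaded directly with the standard Auto classes. A minimal loading sketch (the repo id is the one used in the sample code further down):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kenyano/Llama3-ELAINE-medLLM-instruct-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision keeps the 8B weights at ~16 GB
    device_map="auto",          # place layers automatically on the available devices
)
```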


## Evaluation Benchmarks

The evaluation benchmark datasets and evaluation code can be obtained from [this GitHub repository](https://github.com/aistairc/medLLM_QA_benchmark).
The details of the benchmarks are as follows.
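
All of the benchmarks below are multiple-choice QA sets, so the scores reported later are option-level accuracies. A minimal scoring sketch; the real harness lives in the repository above, and the `answer`/`prediction` field names here are illustrative, not its actual schema:

```python
def accuracy(examples: list[dict]) -> float:
    """Fraction of examples whose predicted option letter matches the gold letter."""
    correct = sum(
        ex["prediction"].strip().upper() == ex["answer"].strip().upper()
        for ex in examples
    )
    return correct / len(examples)

# Two toy examples: one correct, one wrong -> accuracy 0.5
print(accuracy([
    {"answer": "B", "prediction": "b"},
    {"answer": "D", "prediction": "A"},
]))
```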

### English evaluation benchmarks

  - [MedQA](https://arxiv.org/abs/2009.13081)
  - [MedQA-4op](https://arxiv.org/abs/2009.13081)
  - [MMLU](https://arxiv.org/abs/2009.03300)
  - [MedMCQA](https://proceedings.mlr.press/v174/pal22a.html)
  - [PubMedQA](https://doi.org/10.18653/v1/D19-1259)

### Japanese evaluation benchmarks
  - [IgakuQA](https://github.com/jungokasai/IgakuQA)
    - We concatenated the original exam data from 2018 to 2022 into a single JSON file (see the sketch after this list).
  - [JJSIMQA](https://arxiv.org/abs/2310.10083)
  - DenQA
    - DenQA contains exam problems and their answers from the Japan National Dentistry Examination for the past two years (2023 and 2024), extracted from the official website of the Ministry of Health, Labour and Welfare in Japan (https://www.mhlw.go.jp/stf/english/index.html).
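
The IgakuQA concatenation mentioned above takes only a few lines of standard-library Python. A sketch under assumed file names (the actual layout is defined in the benchmark repository):

```python
import json
from pathlib import Path

# Hypothetical layout: one JSON list of exam questions per year.
merged = []
for year in range(2018, 2023):
    merged.extend(json.loads((Path("igakuqa") / f"{year}.json").read_text(encoding="utf-8")))

# Write the five years back out as a single file.
Path("igakuqa_2018_2022.json").write_text(
    json.dumps(merged, ensure_ascii=False, indent=2), encoding="utf-8"
)
```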


### Chinese evaluation benchmarks
  - [MedQA](https://arxiv.org/abs/2009.13081)
  - [MedQA-4op](https://arxiv.org/abs/2009.13081)
  - [CMExam](https://arxiv.org/abs/2306.03030)


## Training Datasets

### Continued pre-training

For continued pre-training, we collected English, Japanese, and Chinese text in the biomedical domain.
The collected domain text falls into six categories: 1) scientific papers, 2) medical guidelines, 3) biomedical web text, 4) biomedical textbooks, 5) PubMed abstracts, and 6) PubMed Central (PMC) archives.
For Japanese PubMed abstracts, we used the original English PubMed abstracts translated into Japanese.
We used only openly licensed text, except for the Japanese biomedical papers from [J-STAGE](https://www.jstage.jst.go.jp/browse/-char/en).

### Instruction supervised fine-tuning

We collected various conversational QA datasets in the biomedical domain from different data sources.
For English, we used Medical Meadow from MedAlpaca and the HealthCareMagic and icliniq datasets used in ChatDoctor.
For Chinese and English, we adapted the augmented QA dataset from HuatuoGPT-2.
For Japanese, we used existing general-domain Alpaca datasets translated into Japanese.
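
For SFT, each conversational QA pair has to be rendered into the Llama-3 chat format before tokenization. A sketch of one common way to do this with the tokenizer's chat template; the QA pair itself is invented for illustration, and the paper's exact preprocessing may differ:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kenyano/Llama3-ELAINE-medLLM-instruct-8B")

# Illustrative QA pair; the real SFT data comes from the sources listed above.
qa = {
    "question": "What are common symptoms of anemia?",
    "answer": "Typical symptoms include fatigue, pallor, and shortness of breath.",
}

# Render user/assistant turns into one training string with Llama-3 special tokens.
text = tokenizer.apply_chat_template(
    [
        {"role": "user", "content": qa["question"]},
        {"role": "assistant", "content": qa["answer"]},
    ],
    tokenize=False,
)
print(text)
```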

## Results



### English benchmark


| model_name                                     | MMLU   | MedMCQA | MedQA  | MedQA-4op | PubMedQA | Avg    |
|------------------------------------------------|--------|---------|--------|-----------|----------|--------|
| google_gemma-7b-it                             | 50.55  | 41.07   | 33.12  | 39.67     | 67.07    | 46.30  |
| meta-llama_Llama-2-7b-chat-hf                  | 48.71  | 35.97   | 30.99  | 38.09     | 63.64    | 43.48  |
| meta-llama_Meta-Llama-3-8B-Instruct            | 72.79  | 60.89   | 57.65  | 61.28     | 78.99    | 66.32  |
| tokyotech-llm_Llama-3-Swallow-8B-Instruct-v0.1 | 66.88  | 53.85   | 47.95  | 56.07     | 64.65    | 57.88  |
| medalpaca_medalpaca-7b                         | 51.48  | 36.02   | 31.15  | 39.35     | 55.15    | 42.63  |
| epfl-llm_meditron-7b                           | 47.32  | 34.35   | 29.18  | 32.26     | 39.19    | 36.46  |
| aaditya_Llama3-OpenBioLLM-8B                   | 73.43  | 55.03   | 50.00  | 56.78     | 65.86    | 60.22  |
| FreedomIntelligence_Apollo-7B                  | 68.17  | 53.85   | 45.98  | 53.86     | 75.35    | 59.44  |
| Llama3-ELAINE-medLLM-instruct-8B               | 72.69  | 55.07   | 55.76  | 61.36     | 75.35    | 64.05  |


### Japanese benchmark

| model_name                                     | DenQA  | IgakuQA | JJSIMQA | Avg    |
|------------------------------------------------|--------|---------|---------|--------|
| google_gemma-7b-it                             | 13.71  | 25.51   | 12.09   | 17.10  |
| meta-llama_Llama-2-7b-chat-hf                  | 12.03  | 20.80   | 10.55   | 14.46  |
| meta-llama_Meta-Llama-3-8B-Instruct            | 19.72  | 40.45   | 25.93   | 28.70  |
| tokyotech-llm_Llama-3-Swallow-8B-Instruct-v0.1 | 23.78  | 44.01   | 26.81   | 31.53  |
| medalpaca_medalpaca-7b                         | 10.91  | 17.74   | 10.77   | 13.14  |
| epfl-llm_meditron-7b                           | 9.79   | 18.20   | 8.35    | 12.11  |
| aaditya_Llama3-OpenBioLLM-8B                   | 18.18  | 33.03   | 21.98   | 24.40  |
| FreedomIntelligence_Apollo-7B                  | 17.90  | 32.28   | 20.66   | 23.61  |
| Llama3-ELAINE-medLLM-instruct-8B               | 22.24  | 43.36   | 24.40   | 30.00  |


### Chinese benchmark

| model_name                                     | CMExam | MedQA  | MedQA-4op | Avg    |
|------------------------------------------------|--------|--------|-----------|--------|
| google_gemma-7b-it                             | 30.90  | 29.03  | 34.96     | 31.63  |
| meta-llama_Llama-2-7b-chat-hf                  | 25.43  | 25.37  | 32.30     | 27.70  |
| meta-llama_Meta-Llama-3-8B-Instruct            | 52.01  | 62.99  | 68.40     | 61.13  |
| tokyotech-llm_Llama-3-Swallow-8B-Instruct-v0.1 | 41.11  | 45.05  | 51.27     | 45.81  |
| medalpaca_medalpaca-7b                         | 23.58  | 24.99  | 30.11     | 26.23  |
| epfl-llm_meditron-7b                           | 23.85  | 25.46  | 29.82     | 26.38  |
| aaditya_Llama3-OpenBioLLM-8B                   | 39.07  | 42.59  | 48.73     | 43.46  |
| FreedomIntelligence_Apollo-7B                  | 49.99  | 58.29  | 62.99     | 57.09  |
| Llama3-ELAINE-medLLM-instruct-8B               | 48.85  | 55.80  | 61.59     | 55.41  |


## Sample usage

```python
import torch
from transformers import pipeline

# Load the model in half precision and place it automatically on the
# available devices.
pipe = pipeline("text-generation",
                model="kenyano/Llama3-ELAINE-medLLM-instruct-8B",
                torch_dtype=torch.float16,
                device_map="auto",
                trust_remote_code=True)

# Each list pairs one system message with independent user questions;
# the three lists ask the same health questions in Japanese, Chinese,
# and English.
messages_ja = [
    {"role": "system", "content": "あなたはAIヘルスアシスタントです"},
    {"role": "user", "content": "高血圧とはどれくらいの血圧でしょうか?"},
    {"role": "user", "content": "うつ病はどのようにすれば治りますか?"},
    {"role": "user", "content": "自閉症はどんな原因が考えられますか?"},
    {"role": "user", "content": "アレルギー性鼻炎がありますが、いい薬はありますか?"},
    {"role": "user", "content": "脳梗塞とはどんな病気で、治療法はあるでしょうか?"},
    {"role": "user", "content": "突発性難聴とはどんな病気ですか?治療法はありますか?"},
    {"role": "user", "content": "緑内障と白内障の違いを教えて"},
]

messages_ch = [
    {"role": "system", "content": "你是一名人工智能健康助理。"},
    {"role": "user", "content": "高血压有多高?"},
    {"role": "user", "content": "如何治愈抑郁症?"},
    {"role": "user", "content": "自闭症的可能病因是什么?"},
    {"role": "user", "content": "我有过敏性鼻炎,有什么好药吗?"},
    {"role": "user", "content": "什么是中风,有治疗方法吗?"},
    {"role": "user", "content": "什么是突发性听力损失? 有治疗方法吗?"},
    {"role": "user", "content": "青光眼和白内障有什么区别?"},
]

messages_en = [
    {"role": "system", "content": "You are an AI Health Assistant"},
    {"role": "user", "content": "How high is hypertension?"},
    {"role": "user", "content": "How can depression be cured?"},
    {"role": "user", "content": "What are the possible causes of autism?"},
    {"role": "user", "content": "I have allergic rhinitis, are there any good medications?"},
    {"role": "user", "content": "What is a stroke and is there a treatment for it?"},
    {"role": "user", "content": "What is sudden hearing loss? Is there a treatment?"},
    {"role": "user", "content": "Tell me the difference between glaucoma and cataract."},
]

messages = messages_ja  # switch to messages_ch or messages_en as needed

for i in range(len(messages) - 1):
    # Pair the shared system message with one user question at a time.
    inputs = [messages[0], messages[i + 1]]
    prompt = pipe.tokenizer.apply_chat_template(inputs, tokenize=False, add_generation_prompt=True)

    # Near-greedy decoding; raise temperature for more varied answers.
    outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.01)

    print("-" * 10)
    print(f"{messages[i + 1]['role']}: {messages[i + 1]['content']}")
    print(outputs[0]["generated_text"][len(prompt):])
```

 
## Risks and Limitations

The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.


## Known issues
The current model sometimes fails to stop and keeps generating text past the end of its answer. We plan to fix this issue in the coming months.
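
Until then, a workaround commonly used with Llama-3-based instruct models is to pass both Llama-3 terminator tokens as end-of-sequence ids at generation time. A sketch reusing the `pipe` object and `prompt` from the sample above (this mitigates, but may not fully eliminate, the runaway generation):

```python
# Stop on either Llama-3 terminator instead of only the default EOS token.
terminators = [
    pipe.tokenizer.eos_token_id,
    pipe.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.01,
    eos_token_id=terminators,
)
```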



## Acknowledgements

We thank Meta Research for releasing Llama 3 under a generous open license.


## Authors

- Ken Yano
- Zheheng Luo
- Jimin Huang
- Qianqian Xie
- Masaki Asada
- Chenhan Yuan
- Kailai Yang
- Makoto Miwa
- Sophia Ananiadou
- Jun'ichi Tsujii


## Contact

- Ken Yano [[email protected]]




## How to cite

If you find our work helpful, please cite the following paper.

```bibtex
@inproceedings{yano-etal-2025-elaine,
    title = "{ELAINE}-med{LLM}: Lightweight {E}nglish {J}apanese {C}hinese Trilingual Large Language Model for Bio-medical Domain",
    author = "Yano, Ken  and
      Luo, Zheheng  and
      Huang, Jimin  and
      Xie, Qianqian  and
      Asada, Masaki  and
      Yuan, Chenhan  and
      Yang, Kailai  and
      Miwa, Makoto  and
      Ananiadou, Sophia  and
      Tsujii, Jun{'}ichi",
    editor = "Rambow, Owen  and
      Wanner, Leo  and
      Apidianaki, Marianna  and
      Al-Khalifa, Hend  and
      Eugenio, Barbara Di  and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.313/",
    pages = "4670--4688",
 }
```