
CT-JMedRoBERTa

CT-JMedRoBERTa is a Japanese RoBERTa-based model for medical text understanding, fine-tuned from alabnii/jmedroberta-base-manbyo-wordpiece on CT-RATE-JPN, a large-scale dataset of Japanese chest CT reports, for multi-label classification of 18 common thoracic CT findings.

The model leverages the medical-domain vocabulary coverage of JMedRoBERTa and achieves strong and stable performance on Japanese radiology reports.


Model Overview

  • Base model: alabnii/jmedroberta-base-manbyo-wordpiece
  • Task: Multi-label classification (18 abnormal findings; a minimal training sketch follows this list)
  • Training data: CT-RATE-JPN (Japanese translations of CT-RATE reports)
  • Input: Japanese radiology reports
  • Output: Probabilities (0–1) for each finding
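
Because the classification head is configured with problem_type="multi_label_classification", Transformers trains it with BCEWithLogitsLoss over 18 independent sigmoid outputs. The snippet below is a minimal sketch of that setup, not the authors' actual training script; the toy reports and multi-hot label vectors are placeholders:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

base = "alabnii/jmedroberta-base-manbyo-wordpiece"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(
    base,
    num_labels=18,
    problem_type="multi_label_classification",  # selects BCEWithLogitsLoss
)
model.train()  # enable dropout for training

# Placeholder batch: two reports with 18-dimensional multi-hot label vectors.
texts = [
    "右肺上葉に結節影を認めます。",  # "A nodular shadow is seen in the right upper lobe."
    "特記すべき異常所見はありません。",  # "No remarkable abnormal findings."
]
labels = torch.zeros(2, 18)
labels[0, 9] = 1.0  # hypothetical index for a "lung nodule" finding

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loss = model(**inputs, labels=labels).loss  # BCEWithLogitsLoss over all 18 findings
loss.backward()  # an optimizer step would follow in a real training loop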

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned checkpoint; loading the base model
# (alabnii/jmedroberta-base-manbyo-wordpiece) here would attach a randomly
# initialized classification head instead of the trained one.
model_name = "YYama0/CT-JMedRoBERTa"
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=18,
    problem_type="multi_label_classification"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

def infer(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.sigmoid(logits)  # independent per-finding probabilities

texts = ["両肺に淡い浸潤影を認めます。"]  # "Faint infiltrative shadows are seen in both lungs."
probs = infer(texts)  # tensor of shape (1, 18), values in [0, 1]
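
To turn the probabilities into named findings, the label names stored in the checkpoint's config can be applied with a cutoff. This is a small sketch: 0.5 is an arbitrary default rather than a tuned operating point, and if the config carries no id2label mapping, Transformers falls back to generic LABEL_0 … LABEL_17 names.

def predict_findings(texts, threshold=0.5):
    probs = infer(texts)
    id2label = model.config.id2label  # finding names, if stored in the config
    return [
        [(id2label[i], round(float(p), 3)) for i, p in enumerate(row) if p >= threshold]
        for row in probs
    ]

print(predict_findings(["両肺に淡い浸潤影を認めます。"]))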

License

  • Trained on CT-RATE-JPN, which is released under CC BY-NC-SA
  • The model weights and outputs are therefore restricted to non-commercial research use

Citation

Please cite the following when using this model or the dataset:

@misc{yamagishi2024ctrep,
  title={Development of a Large-scale Dataset of Chest Computed Tomography Reports in Japanese and a High-performance Finding Classification Model},
  author={Yosuke Yamagishi and others},
  year={2024},
  eprint={2412.15907},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{yamagishi2025modernber,
  title={ModernBERT is More Efficient than Conventional BERT for Chest CT Findings Classification in Japanese Radiology Reports},
  author={Yosuke Yamagishi and others},
  year={2025},
  eprint={2503.05060},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}