Automatic Schema Induction(text-to-schema) Model

This model is a sub-task of text-to-json task that generates a JSON template given a text.

Usage

import json
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "chnaaam/luSI-v1.0"

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

text = """์•„์ด์œ (IU, ๋ณธ๋ช…: ์ด์ง€์€, ๆŽ็Ÿฅๆฉ[1], 1993๋…„ 5์›” 16์ผ~)๋Š” ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์‹ฑ์–ด์†ก๋ผ์ดํ„ฐ, ์ž‘๊ณก๊ฐ€, ๋ฐฐ์šฐ์ด๋‹ค. 2007๋…„ ๋กœ์—” ์—”ํ„ฐํ…Œ์ธ๋จผํŠธ(ํ˜„ ์นด์นด์˜ค ์—”ํ„ฐํ…Œ์ธ๋จผํŠธ) ์—ฐ์Šต์ƒ์œผ๋กœ ์ „์† ๊ณ„์•ฝ์„ ๋งบ๊ณ  15์„ธ์˜ ๋‚˜์ด์— 2008๋…„ ์ฒซ EP์ธ ๋กœ์ŠคํŠธ ์•ค ํŒŒ์šด๋“œ(Lost and Found)๋ฅผ ํ†ตํ•ด ๊ฐ€์ˆ˜๋กœ ๋ฐ๋ท”ํ–ˆ๋‹ค."""

messages = [
    {"role": "user", "content": text}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    temperature=0.0
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

json_template = json.loads(response)

print(json_template)

Output

{
  'Person': {
    'Name': '',
    'Stage name': '',
    'Real name': '',
    'Birth date': '',
    'Nationality': '',
    'Occupations': [],
    'Debut': {
      'Age': '',
      'Year': '',
      'Company': '',
      'Contract type': '',
      'EP': '',
      'EP title': ''
    }
  }
}
Downloads last month
1
Safetensors
Model size
4B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for chnaaam/luSI-v1.0

Finetuned
(12)
this model