Automatic Schema Induction (text-to-schema) Model
This model addresses a sub-task of the text-to-JSON task: it generates a JSON template (schema) from a given text.
Usage
import json

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "chnaaam/luSI-v1.0"

# Pick the best available device: CUDA GPU > Apple Silicon (MPS) > CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# trust_remote_code is required because this checkpoint ships custom model code.
# NOTE(review): only enable trust_remote_code for repositories you trust —
# it executes arbitrary Python from the model repo.
model = AutoModel.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Example input: the opening of IU's Korean Wikipedia article.
text = """아이유(IU, 본명: 이지은, 李知恩[1], 1993년 5월 16일~)는 대한민국의 싱어송라이터, 작곡가, 배우이다. 2007년 로엔 엔터테인먼트(현 카카오 엔터테인먼트) 연습생으로 전속 계약을 맺고 15세의 나이에 2008년 첫 EP인 로스트 앤 파운드(Lost and Found)를 통해 가수로 데뷔했다."""

messages = [
    {"role": "user", "content": text}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Greedy (deterministic) decoding. The original passed temperature=0.0, which
# transformers ignores when do_sample is False (the default, with a warning)
# and which is invalid when sampling; do_sample=False is the supported way to
# request deterministic output.
with torch.inference_mode():  # no autograd state needed for generation
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=1024,
        do_sample=False,
    )

# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# The model emits a JSON template (keys with empty values) describing the
# schema induced from the input text.
json_template = json.loads(response)
print(json_template)
Output
{
'Person': {
'Name': '',
'Stage name': '',
'Real name': '',
'Birth date': '',
'Nationality': '',
'Occupations': [],
'Debut': {
'Age': '',
'Year': '',
'Company': '',
'Contract type': '',
'EP': '',
'EP title': ''
}
}
}
- Downloads last month
- 1