T5-Json-Parsing-Model: Improved Evaluation Report

Fine-tuned T5 model for structured JSON metadata extraction from unstructured text


Model Overview

T5-Json-Parsing-Model is a text-to-JSON generation model based on T5, specifically fine-tuned to:

  • Parse unstructured input
  • Identify key entities and attributes
  • Output valid JSON with consistent schema

Goal: "extract metadata: ..." → {"type": "...", "name": "...", ...}
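The sketch below illustrates this input/output contract. It builds a prompt with the `extract metadata: ` prefix and checks whether a generated string is a JSON object containing a given set of keys. The helper names `build_prompt` and `follows_schema` are hypothetical. The key set is an assumption copied from the example output later in this card, because the card does not publish a formal schema.

```python
# Hypothetical helpers illustrating the "extract metadata: ..." -> JSON contract.
# EXAMPLE_KEYS is an assumption taken from the card's example output, not a documented schema.
import json

PREFIX = "extract metadata: "  # task prefix used in the card's example prompt
EXAMPLE_KEYS = {"type", "name", "yr_born", "location"}  # assumed key set


def build_prompt(text: str) -> str:
    """Prepend the task prefix to raw, unstructured text."""
    return PREFIX + text


def follows_schema(output: str, required_keys=EXAMPLE_KEYS) -> bool:
    """Return True if `output` parses as a JSON object that contains every required key."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    # Iterating a dict yields its keys, so issubset checks key presence.
    return isinstance(obj, dict) and required_keys.issubset(obj)


if __name__ == "__main__":
    print(build_prompt("John Smith, born in 1980, lives in New York."))
    print(follows_schema('{"type": "person", "name": "John Smith", "yr_born": "1980", "location": "New York"}'))
```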


Evaluation Results

| Metric | Score | Interpretation |
|---|---|---|
| Exact-Match Accuracy | 4.33% | Very low: strict JSON format not followed |
| JSON Structural Accuracy | 1.33% | Almost no outputs are valid JSON |
| ROUGE-1 | 53.92 | Good unigram overlap |
| ROUGE-2 | 38.33 | Moderate bigram overlap |
| ROUGE-L | 51.53 | Good longest-common-subsequence overlap |
| BLEU | 27.69 | Decent n-gram precision |
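The ROUGE and BLEU scores above appear to use a 0-100 scale. The following minimal unigram-overlap F1 shows roughly what ROUGE-1 measures. It assumes simple lowercase whitespace tokenization. The card does not say which implementation produced the reported numbers, and libraries such as `rouge-score` tokenize differently, so this sketch will not reproduce them exactly.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1 on lowercase whitespace tokens, scaled to 0-100 (simplified sketch)."""
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared unigram (clipped overlap).
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 4 shared unigrams: precision 4/4, recall 4/6, so F1 is approximately 80
    print(rouge1_f1("john smith new york", "john smith lives in new york"))
```

ROUGE-2 applies the same calculation to bigrams. ROUGE-L uses the longest common subsequence instead of n-gram counts.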

Insight:

The model captures much of the target content (ROUGE-1 of 53.92, BLEU of 27.69), but almost none of its outputs are syntactically valid JSON (1.33% structural accuracy).
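The card does not define its two accuracy metrics. The following sketch uses one plausible pair of definitions, and both are assumptions: exact match means the prediction equals the reference after trimming whitespace, and structural accuracy means the prediction parses as a JSON object. Under these definitions, exact match can exceed structural accuracy only if some reference strings are not valid JSON themselves. That would be worth checking, since the reported exact-match score (4.33%) is higher than the structural score (1.33%).

```python
import json


def is_valid_json_object(text: str) -> bool:
    """True if `text` parses as a JSON object (a dict), under the assumed definition."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def exact_match_accuracy(predictions, references) -> float:
    """Percentage of predictions identical to their reference after trimming whitespace."""
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100 * hits / len(references)


def json_structural_accuracy(predictions) -> float:
    """Percentage of predictions that parse as JSON objects."""
    return 100 * sum(is_valid_json_object(p) for p in predictions) / len(predictions)


if __name__ == "__main__":
    preds = ['{"type": "person", "name": "John Smith"}', '"type": "person", "name": "John Smith"']
    refs = ['{"type": "person", "name": "John Smith"}'] * 2
    print(exact_match_accuracy(preds, refs))  # 50.0
    print(json_structural_accuracy(preds))    # 50.0
```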


Inference Example

Input Prompt

extract metadata: John Smith, born in 1980, lives in New York, works as a data scientist.

### Inference

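```python
import json

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
model_id = "boopathiraj/T5-Json-Parsing-Model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

input_text = "extract metadata: John Smith, born in 1980, lives in New York, works as a data scientist."
input_ids = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True).input_ids

# Beam search with 5 beams; early_stopping ends generation once 5 complete candidates exist
generated_ids = model.generate(input_ids, max_new_tokens=50, num_beams=5, early_stopping=True)

# Decode the generated IDs to text, dropping special tokens such as </s> and <pad>
decoded_output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# Post-processing used in this card: replace the first and last characters with braces, then parse
print(json.loads("{" + decoded_output[1:-1] + "}"))
```

Output:

```
{'type': 'person', 'name': 'John Smith', 'yr_born': '1980', 'location': 'New York'}
```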

Model Details

  • Weights format: Safetensors
  • Model size: 0.2B params
  • Tensor type: F32

Model tree: boopathiraj/T5-Json-Parsing-Model is fine-tuned from the base model google-t5/t5-small.
