T5-Json-Parsing-Model: Improved Evaluation Report

Fine-tuned T5 model for structured JSON metadata extraction from unstructured text

Model Overview

T5-Json-Parsing-Model is a text-to-JSON generation model based on T5, specifically fine-tuned to:

Parse unstructured input
Identify key entities and attributes
Output valid JSON with consistent schema

Goal: "extract metadata: ..." → {"type": "...", "name": "...", ...}
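The goal line pairs a prefixed prompt with a JSON object. Below is a minimal sketch of that pairing. It assumes the training targets are plain serialized JSON objects. The helpers `build_prompt` and `build_target`, and the example record, are hypothetical and do not come from the model's actual training code.

```python
import json

TASK_PREFIX = "extract metadata: "  # prefix used in the card's prompts

def build_prompt(text: str) -> str:
    """Prepend the task prefix to raw input text (hypothetical helper)."""
    return TASK_PREFIX + text.strip()

def build_target(record: dict) -> str:
    """Serialize a metadata record as a JSON string (hypothetical helper)."""
    # ensure_ascii=False keeps non-ASCII names (e.g. accented characters) unescaped
    return json.dumps(record, ensure_ascii=False)

prompt = build_prompt("John Smith, born in 1980, lives in New York, works as a data scientist.")
target = build_target({"type": "person", "name": "John Smith", "yr_born": "1980", "location": "New York"})

print(prompt)  # extract metadata: John Smith, born in 1980, lives in New York, works as a data scientist.
print(target)  # {"type": "person", "name": "John Smith", "yr_born": "1980", "location": "New York"}
```

If the targets are plain `json.dumps` output, a well-formed prediction can be checked directly with `json.loads`.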

Evaluation Results

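| Metric | Score | Interpretation |
|---|---|---|
| Exact-Match Accuracy | 4.33% | Very low: outputs rarely match the reference exactly, as the strict JSON format is not followed |
| JSON Structural Accuracy | 1.33% | Almost no outputs are valid JSON |
| ROUGE-1 | 53.92 | Good unigram overlap |
| ROUGE-2 | 38.33 | Moderate bigram overlap |
| ROUGE-L | 51.53 | Strong longest-common-subsequence overlap (content order is largely preserved) |
| BLEU | 27.69 | Decent n-gram precision |

ROUGE and BLEU scores are reported on a 0-100 scale.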
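The card does not say how the two accuracy metrics were computed. The sketch below uses two assumed definitions:

- **Exact match:** the stripped prediction equals the stripped reference string.
- **JSON structural accuracy:** the prediction parses as a JSON object with the same set of keys as the reference.

These definitions may differ from the ones behind the reported numbers. The function names and sample data are hypothetical.

```python
import json

def exact_match_rate(preds, refs):
    """Percentage of predictions identical to their reference after stripping whitespace."""
    hits = sum(p.strip() == r.strip() for p, r in zip(preds, refs))
    return 100.0 * hits / len(refs)

def structural_match_rate(preds, refs):
    """Percentage of predictions that parse as a JSON object with the reference's keys."""
    hits = 0
    for p, r in zip(preds, refs):
        try:
            parsed = json.loads(p)
        except json.JSONDecodeError:
            continue  # invalid JSON counts as a miss
        if isinstance(parsed, dict) and parsed.keys() == json.loads(r).keys():
            hits += 1
    return 100.0 * hits / len(refs)

refs = ['{"type": "person", "name": "John Smith"}',
        '{"type": "city", "name": "Paris"}',
        '{"type": "company", "name": "Acme"}']
preds = ['{"type": "person", "name": "John Smith"}',  # exact match
         '{"type": "city", "name": "Lyon"}',          # right structure, wrong value
         '"type": "company", "name": "Acme"']         # braces missing: not valid JSON

print(round(exact_match_rate(preds, refs), 2))       # 33.33
print(round(structural_match_rate(preds, refs), 2))  # 66.67
```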

Insight:

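The model captures the content reasonably well (moderate-to-high ROUGE/BLEU overlap with the references), but its raw outputs are almost never valid JSON, so they need post-processing before they can be parsed.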
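This gap between overlap scores and JSON validity is easy to reproduce. The sketch below uses a simplified unigram F1, a ROUGE-1-style score over lowercase word tokens without the stemming and tokenization rules of a full ROUGE implementation. It is applied to a hypothetical prediction that keeps every key and value but drops the outer braces. The overlap is perfect, yet `json.loads` rejects the text.

```python
import json
import re
from collections import Counter

def unigram_f1(prediction: str, reference: str) -> float:
    """ROUGE-1-style F1 over lowercase word tokens (simplified: no stemming)."""
    pred = Counter(re.findall(r"\w+", prediction.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((pred & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def is_valid_json(text: str) -> bool:
    """True if the text parses with json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

reference = '{"type": "person", "name": "John Smith"}'
prediction = '"type": "person", "name": "John Smith"'  # hypothetical output missing the braces

print(unigram_f1(prediction, reference))  # 1.0
print(is_valid_json(prediction))          # False
```

Word-level overlap metrics ignore punctuation, so missing braces barely move ROUGE or BLEU. One plausible contributor, not confirmed by this card, is that the original T5 SentencePiece vocabulary has no tokens for `{` and `}`. Braces in training targets would therefore become unknown tokens.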

Inference Example

Input Prompt

extract metadata: John Smith, born in 1980, lives in New York, works as a data scientist.

### Inference

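```python
import json
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_id = "boopathiraj/T5-Json-Parsing-Model"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

input_text = "extract metadata: John Smith, born in 1980, lives in New York, works as a data scientist."
input_ids = tokenizer(input_text, return_tensors="pt", truncation=True, padding=True).input_ids

# Generate text with beam search (5 beams, at most 50 new tokens)
generated_ids = model.generate(input_ids, max_new_tokens=50, num_beams=5, early_stopping=True)

# Decode the generated IDs to text, dropping special tokens
decoded_output = tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# The decoded text is not valid JSON as-is: replace its first and last characters with braces, then parse
print(json.loads("{" + decoded_output[1:-1] + "}"))
```

Output:

{'type': 'person', 'name': 'John Smith', 'yr_born': '1980', 'location': 'New York'}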
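The `decoded_output[1:-1]` slice in the inference example assumes the raw output always has exactly one stray character at each end. If it does not, `json.loads` raises an exception.

Below is a more defensive sketch. It tries three repairs in order:

1. Parse the text as-is.
2. Apply the same brace-replacing repair as the example.
3. Wrap the text in braces, assuming the braces were dropped entirely.

If none of these succeeds, it returns `None` instead of raising. The helper name `parse_model_output` is hypothetical. The repair rules are assumptions about the model's raw output, not documented behavior.

```python
import json
from typing import Optional

def parse_model_output(decoded: str) -> Optional[dict]:
    """Best-effort conversion of decoded model text into a dict (hypothetical helper)."""
    text = decoded.strip()
    if not text:
        return None  # nothing to parse
    candidates = [
        text,                    # output is already valid JSON
        "{" + text[1:-1] + "}",  # the card's repair: replace first/last character with braces
        "{" + text + "}",        # assumption: braces were dropped entirely
    ]
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None  # caller decides how to handle unparseable output

print(parse_model_output('{"type": "person", "name": "John Smith"}'))  # {'type': 'person', 'name': 'John Smith'}
print(parse_model_output('"type": "person", "name": "John Smith"'))    # {'type': 'person', 'name': 'John Smith'}
print(parse_model_output("not json at all"))                           # None
```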


Model Details

Weights format: Safetensors
Model size: 0.2B params
Tensor type: F32
Base model: google-t5/t5-small
Model ID: boopathiraj/T5-Json-Parsing-Model