---
license: mit
metrics:
- rouge
tags:
- summarization
- legal
- led
- transformers
- huggingface
- legal-domain
- india
- india-legal
- indian-law
language:
- en
base_model:
- allenai/led-base-16384
---

# 🧠 Legal Summarizer Model (Indian Legal Domain)

This model is a fine-tuned version of [`allenai/led-base-16384`](https://huggingface.co/allenai/led-base-16384), trained on a curated dataset of **Indian legal documents**. It is optimized for summarizing long legal texts such as court judgments, case law, contracts, and regulatory documents from the Indian judiciary and legal system.

## 📌 Model Use Case

This model is intended for generating concise, informative summaries of complex and lengthy legal documents from the **Indian legal system**, including:

- Court judgments (Supreme Court, High Courts)
- Government acts and bills
- Contracts governed by Indian law
- Legal notices and petitions
- Regulatory texts

## 🇮🇳 Domain Specialization

Unlike general-purpose summarization models, this model was trained specifically on Indian legal content:

- Judgments and case law sourced from Indian court databases
- Indian statutes, acts, and amendments
- Public legal notices and contract templates relevant to Indian jurisprudence

As a result, the model captures the vocabulary, phrasing, and structure of Indian legal writing more accurately than a general-purpose summarizer.

## 📈 Evaluation Metrics

| Metric     | Score |
|------------|-------|
| ROUGE-1    | 50.13 |
| ROUGE-2    | 27.15 |
| ROUGE-L    | 28.14 |
| ROUGE-Lsum | 44.75 |

## 🚀 How to Use

```python
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

tokenizer = LEDTokenizer.from_pretrained("TheGod-2003/legal-summarizer")
model = LEDForConditionalGeneration.from_pretrained("TheGod-2003/legal-summarizer")

text = "Your long legal document here..."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)

# LED uses sparse local attention; give the first token global attention
# so the decoder can attend to the whole document.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    global_attention_mask=global_attention_mask,
    max_length=512,
    num_beams=4,
)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)
```
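For documents that exceed the 16,384-token encoder window, inputs are truncated and the tail of the document is lost. One common workaround is to summarize overlapping chunks and then join (or re-summarize) the partial summaries. Below is a minimal, model-agnostic sketch of the chunking step; the function name and the word-based splitting are illustrative, not part of this model's API:

```python
def chunk_words(text, chunk_size=12000, overlap=500):
    """Split text into overlapping word-based chunks.

    chunk_size and overlap are in words; pick them so each chunk
    stays safely under the model's 16,384-token limit after
    tokenization (tokens per word varies with the tokenizer).
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be passed through the tokenizer and `model.generate` as shown above, and the partial summaries concatenated or summarized once more into a final summary.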