---
base_model:
- google-t5/t5-base
library_name: transformers
license: apache-2.0
datasets:
- iedeveci/WillowNLtoFOL
- yuan-yang/MALLS-v0
language:
- en
pipeline_tag: translation
tags:
- NLTOFOL
- NL
- FOL
---

# Model Card for fvossel/t5-base-nl-to-fol

This model is a fully fine-tuned version of [`google-t5/t5-base`](https://huggingface.co/google-t5/t5-base). It was trained to translate **natural language statements into First-Order Logic (FOL)** representations.

## Model Details

### Model Description

- **Developed by:** Vossel et al. at Osnabrück University
- **Funded by:** Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project 456666331
- **Model type:** Encoder-decoder sequence-to-sequence model (T5 architecture)
- **Language(s) (NLP):** English, FOL
- **License:** This model was fine-tuned from [`google/t5-base`](https://huggingface.co/google/t5-base), which is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), and is itself released under the **Apache 2.0 License**.
- **Finetuned from model:** google/t5-base

## Uses

### Direct Use

This model is designed to translate natural language (NL) sentences into corresponding first-order logic (FOL) expressions. Use cases include:

- Automated semantic parsing and formalization of NL statements into symbolic logic.
- Supporting explainable AI systems that require symbolic reasoning over language input.
- Research in neurosymbolic AI, logic-based natural language understanding, and formal verification.
- Integration into pipelines for natural language inference, question answering, or knowledge base population.

Users should verify the correctness of generated formulas to the degree their application requires.

### Downstream Use

This model can be further fine-tuned or adapted for domain-specific formalization tasks (e.g., legal, biomedical). It is also suitable as a component of interactive systems that require formal reasoning.

### Out-of-Scope Use

- Not designed for general natural language generation.
- May struggle with ambiguous, highly figurative, or out-of-domain input.
- Outputs should not be used as final decisions in critical areas without expert review.

### Recommendations

- Validate outputs carefully before use in critical applications.
- Be aware of possible biases from the training data and synthetic data sources.
- The model is specialized for English NL and FOL; it may not generalize to other languages or logics.
- Use human-in-the-loop workflows for sensitive tasks.
- Intended for research and prototyping, not for standalone critical systems.

## How to Get Started with the Model

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load tokenizer and model
model_path = "fvossel/t5-base-nl-to-fol"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path).to(device)

# Example NL input
nl_input = "All dogs are animals."

# Prepend the instruction prompt used during fine-tuning
input_text = "translate English natural language statements into first-order logic (FOL): " + nl_input
inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(device)

# Generate the FOL prediction with beam search
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=256,
        min_length=1,
        num_beams=5,
        length_penalty=2.0,
        early_stopping=True,
    )

# Decode and print the result
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
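Because the training data replaced FOL quantifier symbols with plain-text tokens such as `FORALL` (see Training Procedure below), the model's output typically contains these placeholder tokens rather than logical symbols. The helper below is a minimal, illustrative sketch for mapping them back; the exact token inventory (e.g., `EXISTS`) is an assumption and should be checked against the model's actual outputs.

```python
# Illustrative post-processing sketch (not part of the official pipeline).
# Assumes the model emits plain-text placeholders such as FORALL/EXISTS;
# verify the actual token set against real model outputs before relying on this.
REPLACEMENTS = {
    "FORALL": "∀",   # universal quantifier (mentioned in the training procedure)
    "EXISTS": "∃",   # existential quantifier (assumed token name)
}

def to_symbols(fol_text: str) -> str:
    """Map placeholder quantifier tokens back to FOL symbols."""
    for token, symbol in REPLACEMENTS.items():
        fol_text = fol_text.replace(token, symbol)
    return fol_text

print(to_symbols("FORALL x (Dog(x) -> Animal(x))"))  # ∀ x (Dog(x) -> Animal(x))
```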
## Training Details

### Training Data

The model was fine-tuned on two datasets:

- **WillowNLtoFOL:** Contains over 16,000 NL-FOL pairs.
  Published in: Deveci, İ. E. (2024). *Transformer models for translating natural language sentences into formal logical expressions.* Licensed under CC BY-NC-ND 4.0.
- **MALLS-v0:** 34,000 NL-FOL pairs generated by GPT-4 and syntactically checked. Licensed under CC BY-NC 4.0 (Attribution-NonCommercial), subject to OpenAI's terms of use.

### Training Procedure

The model was fully fine-tuned (no LoRA) from `google/t5-base` with:

- Prompt-based instruction tuning
- Single-GPU training with float32 precision
- Preprocessing that replaced FOL quantifiers (e.g., `∀`) with tokens like `FORALL`
- A maximum input/output sequence length of 250 tokens

### Training Hyperparameters

- **Training regime:** float32 precision
- **Batch size:** 8 (per device)
- **Learning rate:** 0.001
- **Number of epochs:** 12
- **Optimizer:** AdamW
- **Adam epsilon:** 1e-8
- **Scheduler:** Linear warmup with 500 steps
- **Gradient accumulation steps:** 1
- **Weight decay:** 0.01
- **LoRA:** Not used (full fine-tuning)
- **Task type:** SEQ_2_SEQ_LM
- **Early stopping patience:** 4 epochs
- **Evaluation strategy:** per epoch
- **Save strategy:** per epoch
- **Save total limit:** 5 checkpoints
- **Best model selection metric:** eval_loss
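For reference, the sketch below shows how these hyperparameters might map onto the Hugging Face `Seq2SeqTrainingArguments` / `Seq2SeqTrainer` API. The argument values mirror the list above; the surrounding setup (output directory name, dataset objects, tokenization) is illustrative and not the exact training script used for this model.

```python
# Illustrative mapping of the listed hyperparameters onto the Hugging Face Trainer API.
# Dataset loading and tokenization are omitted; only the arguments mirror the list above.
from transformers import (
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    EarlyStoppingCallback,
)

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-base-nl-to-fol",      # illustrative output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-3,
    num_train_epochs=12,
    optim="adamw_torch",                 # AdamW optimizer
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    gradient_accumulation_steps=1,
    weight_decay=0.01,
    eval_strategy="epoch",               # "evaluation_strategy" on older transformers versions
    save_strategy="epoch",
    save_total_limit=5,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    # float32 training: neither fp16 nor bf16 is enabled (the default)
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,       # tokenized NL-FOL pairs (not shown here)
#     eval_dataset=eval_dataset,
#     callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
# )
# trainer.train()
```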