---
base_model:
- google-t5/t5-base
library_name: transformers
license: apache-2.0
datasets:
- iedeveci/WillowNLtoFOL
- yuan-yang/MALLS-v0
language:
- en
pipeline_tag: translation
tags:
- NLTOFOL
- NL
- FOL
---

# Model Card for fvossel/t5-base-nl-to-fol

This model is a fully fine-tuned version of [`google-t5/t5-base`](https://huggingface.co/google-t5/t5-base). It was trained to translate **natural language statements into First-Order Logic (FOL)** representations.

## Model Details

### Model Description

- **Developed by:** Vossel et al. at Osnabrück University
- **Funded by:** Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project 456666331
- **Model type:** Encoder-decoder sequence-to-sequence model (T5 architecture)
- **Language(s) (NLP):** English, FOL
- **License:** This model was fine-tuned from [`google/t5-base`](https://huggingface.co/google/t5-base), which is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0), and is itself released under the **Apache 2.0 License**.
- **Finetuned from model:** google/t5-base

## Uses

### Direct Use

This model is designed to translate natural language (NL) sentences into corresponding first-order logic (FOL) expressions. Use cases include:

- Automated semantic parsing and formalization of NL statements into symbolic logic.
- Supporting explainable AI systems that require symbolic reasoning over language input.
- Research in neurosymbolic AI, logic-based natural language understanding, and formal verification.
- Integration into pipelines for natural language inference, question answering, or knowledge base population.

Users should verify the correctness of generated formulas to the degree their application requires.

### Downstream Use

This model can be further fine-tuned or adapted for domain-specific formalization tasks (e.g., legal, biomedical). It is also suitable as a component of interactive systems that require formal reasoning.

### Out-of-Scope Use

- Not designed for general natural language generation.
- May struggle with ambiguous, highly figurative, or out-of-domain input.
- Outputs should not be used as final decisions in critical areas without expert review.

### Recommendations

- Validate outputs carefully before use in critical applications.
- Be aware of possible biases from the training data and synthetic data sources.
- The model is specialized for English NL and FOL; it may not generalize to other languages or logics.
- Use human-in-the-loop workflows for sensitive tasks.
- Intended for research and prototyping, not for standalone critical systems.

## How to Get Started with the Model

```python
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load tokenizer and model
model_path = "fvossel/t5-base-nl-to-fol"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path).to(device)

# Example NL input
nl_input = "All dogs are animals."

# Prepend the instruction prompt used during fine-tuning
input_text = "translate English natural language statements into first-order logic (FOL): " + nl_input
inputs = tokenizer(input_text, return_tensors="pt", padding=True).to(device)

# Generate the FOL prediction with beam search
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=256,
        min_length=1,
        num_beams=5,
        length_penalty=2.0,
        early_stopping=True,
    )

# Decode and print the result
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
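Because the training data replaced FOL quantifier symbols with plain-text tokens such as `FORALL` (see Training Procedure below), the model's output typically contains these placeholder tokens rather than logical symbols. The helper below is a minimal, illustrative sketch for mapping them back; the exact token inventory (e.g., `EXISTS`) is an assumption and should be checked against the model's actual outputs.

```python
# Illustrative post-processing sketch (not part of the official pipeline).
# Assumes the model emits plain-text placeholders such as FORALL/EXISTS;
# verify the actual token set against real model outputs before relying on this.
REPLACEMENTS = {
    "FORALL": "∀",   # universal quantifier (mentioned in the training procedure)
    "EXISTS": "∃",   # existential quantifier (assumed token name)
}

def to_symbols(fol_text: str) -> str:
    """Map placeholder quantifier tokens back to FOL symbols."""
    for token, symbol in REPLACEMENTS.items():
        fol_text = fol_text.replace(token, symbol)
    return fol_text

print(to_symbols("FORALL x (Dog(x) -> Animal(x))"))  # ∀ x (Dog(x) -> Animal(x))
```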
## Training Details

### Training Data

The model was fine-tuned on two datasets:

- **WillowNLtoFOL:** Contains over 16,000 NL-FOL pairs.
  Published in: Deveci, İ. E. (2024). *Transformer models for translating natural language sentences into formal logical expressions.* Licensed under CC BY-NC-ND 4.0.
- **MALLS-v0:** 34,000 NL-FOL pairs generated by GPT-4 and syntactically checked. Licensed under CC BY-NC 4.0 (Attribution-NonCommercial), subject to OpenAI's terms of use.

### Training Procedure

The model was fully fine-tuned (no LoRA) from `google/t5-base` with:

- Prompt-based instruction tuning
- Single-GPU training with float32 precision
- Preprocessing that replaced FOL quantifiers (e.g., `∀`) with tokens like `FORALL`
- A maximum input/output sequence length of 250 tokens

### Training Hyperparameters

- **Training regime:** float32 precision
- **Batch size:** 8 (per device)
- **Learning rate:** 0.001
- **Number of epochs:** 12
- **Optimizer:** AdamW
- **Adam epsilon:** 1e-8
- **Scheduler:** Linear warmup with 500 steps
- **Gradient accumulation steps:** 1
- **Weight decay:** 0.01
- **LoRA:** Not used (full fine-tuning)
- **Task type:** SEQ_2_SEQ_LM
- **Early stopping patience:** 4 epochs
- **Evaluation strategy:** per epoch
- **Save strategy:** per epoch
- **Save total limit:** 5 checkpoints
- **Best model selection metric:** eval_loss
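For reference, the sketch below shows how these hyperparameters might map onto the Hugging Face `Seq2SeqTrainingArguments` / `Seq2SeqTrainer` API. The argument values mirror the list above; the surrounding setup (output directory name, dataset objects, tokenization) is illustrative and not the exact training script used for this model.

```python
# Illustrative mapping of the listed hyperparameters onto the Hugging Face Trainer API.
# Dataset loading and tokenization are omitted; only the arguments mirror the list above.
from transformers import (
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
    EarlyStoppingCallback,
)

training_args = Seq2SeqTrainingArguments(
    output_dir="t5-base-nl-to-fol",      # illustrative output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=1e-3,
    num_train_epochs=12,
    optim="adamw_torch",                 # AdamW optimizer
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    gradient_accumulation_steps=1,
    weight_decay=0.01,
    eval_strategy="epoch",               # "evaluation_strategy" on older transformers versions
    save_strategy="epoch",
    save_total_limit=5,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    # float32 training: neither fp16 nor bf16 is enabled (the default)
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,       # tokenized NL-FOL pairs (not shown here)
#     eval_dataset=eval_dataset,
#     callbacks=[EarlyStoppingCallback(early_stopping_patience=4)],
# )
# trainer.train()
```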