Model Card: InferenceVision QA Fine-Tuned GPT-Neo 1.3B
Model Description
This model is a GPT-Neo (1.3B parameters) causal language model fine-tuned for question answering in the InferenceVision domain. It uses a structured prompt format:
Q: <question>
A: <answer>
This model is built upon GPT‑Neo 1.3B—an open-source autoregressive transformer model developed by EleutherAI. Originally designed to replicate aspects of GPT‑3, GPT‑Neo 1.3B contains approximately 1.3 billion parameters and was pretrained on the curated text corpus known as The Pile.
At its core, the model uses a transformer decoder architecture trained with a causal language modeling objective, allowing it to generate fluent text based on input prompts. It demonstrates strong performance on natural language benchmarks—scoring ~57% accuracy on LAMBADA, ~55% on Winogrande, and ~38% on Hellaswag.
Intended Use
The primary use of this model is to accurately answer domain-specific questions by leveraging the InferenceVision documentation. It is designed to provide precise and contextually relevant responses, making it an effective tool for assisting users seeking information related to InferenceVision.
Use Cases:
- Developer chat assistants
- Technical support chatbots
- Documentation search interfaces
- Internal developer tools
Out-of-Scope:
- Legal, financial, or healthcare guidance
- Creative writing or generalized question-answering
- Questions unrelated to InferenceVision
Training Data
The model was trained on a custom dataset, qa_data.jsonl, containing question–answer pairs from the InferenceVision project. The dataset was split into a 90% training set and a 10% evaluation set using Hugging Face's train_test_split. Training was performed on a single NVIDIA A100 GPU with 40 GB of VRAM.
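As a rough illustration, the split described above can be reproduced with the datasets library as sketched below; the column names "question" and "answer" are assumptions and may differ from the actual schema of qa_data.jsonl.
from datasets import load_dataset

# Load the QA pairs from the JSONL file. The column names used later
# ("question" and "answer") are assumed, not confirmed by the dataset itself.
dataset = load_dataset("json", data_files="qa_data.jsonl", split="train")

# 90% training / 10% evaluation split, as described above (seed chosen arbitrarily).
split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]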
Preprocessing
Each example in the dataset was formatted into a standardized prompt structure following the pattern:
Q: <question>
A: <answer>
This clear question-and-answer format helps the model learn to predict answers based on questions as input. The text prompts were then tokenized using the EleutherAI/gpt-neo-1.3B tokenizer, which converts raw text into numerical token IDs compatible with the model’s vocabulary. To ensure consistent input lengths and efficient training, tokenized sequences were truncated or padded to a fixed maximum length of 512 tokens. Padding was applied using the model’s end-of-sequence token (eos_token) by setting the pad_token_id to match it, ensuring that padding tokens did not negatively affect loss computation.
Finally, the input token IDs were duplicated into the labels field, enabling supervised learning where the model is trained to predict the next token in the sequence given the current context.
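A minimal sketch of this preprocessing step is given below, reusing the train_ds and eval_ds placeholders from the previous sketch; the preprocess function name and the "question"/"answer" column names are assumptions for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
# GPT-Neo ships without a dedicated padding token, so reuse the eos token.
tokenizer.pad_token = tokenizer.eos_token

def preprocess(example):
    # Build the "Q: ... / A: ..." prompt described above.
    text = f"Q: {example['question']}\nA: {example['answer']}"
    tokens = tokenizer(text, truncation=True, max_length=512, padding="max_length")
    # Causal LM objective: labels are a copy of the input token IDs.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

train_ds = train_ds.map(preprocess, remove_columns=train_ds.column_names)
eval_ds = eval_ds.map(preprocess, remove_columns=eval_ds.column_names)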
Training Procedure
The model was fine-tuned using Hugging Face's Trainer with the following hyperparameters (a minimal wiring sketch follows the notes below):
TrainingArguments(
    output_dir="./gpt-neo-qa",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    num_train_epochs=16,
    learning_rate=5e-5,
    fp16=True,
    logging_steps=10,
    save_steps=2000,
    save_total_limit=2,
    report_to="none"
)
- Mixed precision training (fp16=True)
- Only the last two checkpoints retained
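The full training script is not reproduced here; the sketch below shows one plausible way to wire the pieces together, reusing the tokenizer and tokenized splits from the preprocessing sketch and assuming the TrainingArguments above are bound to a training_args variable.
from transformers import AutoModelForCausalLM, Trainer

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
# Keep the model's padding token consistent with the eos-based padding above.
model.config.pad_token_id = tokenizer.eos_token_id

trainer = Trainer(
    model=model,
    args=training_args,      # the TrainingArguments shown above
    train_dataset=train_ds,  # tokenized 90% split
    eval_dataset=eval_ds,    # tokenized 10% split
)
trainer.train()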
Evaluation Results
After 16 epochs, training concluded with the following statistics; the QA-quality metrics further below were computed on the InferenceVision QA evaluation set:
- Final Training Loss: 0.0457
- Training Runtime: 698.31 seconds
- Samples per Second: 11.18
- Steps per Second: 2.80
- Total FLOPs: 2.90 × 10¹⁶
Evaluation Metrics (QA Quality):
- ROUGE-1: 0.2642
- ROUGE-L: 0.2293
- BERTScore Precision: 0.8510
- BERTScore Recall: 0.8829
- BERTScore F1: 0.8665
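For reference, ROUGE and BERTScore can be computed with the Hugging Face evaluate library as sketched below; predictions and references are placeholder lists of generated and gold answers, not part of the released code.
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

# predictions: answers generated by the fine-tuned model
# references: gold answers from the evaluation split
rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")

print("ROUGE-1:", rouge_scores["rouge1"])
print("ROUGE-L:", rouge_scores["rougeL"])
print("BERTScore F1:", sum(bert_scores["f1"]) / len(bert_scores["f1"]))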
Inference
This section provides a simple way to run inference using the fine-tuned doguilmak/inferencevision-gpt-neo-1.3B model. It uses Hugging Face Transformers to load the model and generate answers for InferenceVision-related questions. The model is optimized for domain-specific QA and works best when given clear queries formatted as questions.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "doguilmak/inferencevision-gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
def ask_question(question, max_new_tokens=50):
    # Wrap the question in the same "Q: ... A:" prompt format used during fine-tuning.
    prompt = f"Q: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.95,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Strip the prompt so only the generated answer is returned.
    return answer.replace(prompt, "").strip()
question = "What is InferenceVision?"
answer = ask_question(question)
print("Answer:", answer)
Limitations
- Limited to InferenceVision-specific domain knowledge
- May hallucinate when asked about out-of-distribution topics
- Input limited to 512 tokens — long documents or history must be shortened