Model Card: InferenceVision QA Fine-Tuned GPT-Neo 1.3B


Model Description

This model is a GPT-Neo (1.3B parameters) causal language model fine-tuned for question answering in the InferenceVision domain. It uses a structured prompt format:

Q: <question>
A: <answer>

This model is built upon GPT-Neo 1.3B, an open-source autoregressive transformer developed by EleutherAI. Originally designed to replicate aspects of GPT-3, GPT-Neo 1.3B contains approximately 1.3 billion parameters and was pretrained on The Pile, a large curated text corpus.

At its core, the model uses a transformer decoder architecture trained with a causal language modeling objective, allowing it to generate fluent text from input prompts. The base model performs well for its size on natural language benchmarks, scoring roughly 57% accuracy on LAMBADA, 55% on Winogrande, and 38% on HellaSwag.

Intended Use

The primary use of this model is to accurately answer domain-specific questions by leveraging the InferenceVision documentation. It is designed to provide precise and contextually relevant responses, making it an effective tool for assisting users seeking information related to InferenceVision.

Use Cases:

  • Developer chat assistants
  • Technical support chatbots
  • Documentation search interfaces
  • Internal developer tools

Out-of-Scope:

  • Legal, financial, or healthcare guidance
  • Creative writing or generalized question-answering
  • Questions unrelated to InferenceVision

Training Data

The model was trained on a custom dataset named qa_data.jsonl, which contains question–answer pairs from the InferenceVision project. The dataset was split into a 90% training set and a 10% evaluation set using Hugging Face's train_test_split. Training was carried out on an NVIDIA A100 GPU with 40 GB of VRAM.
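
The exact data-loading script is not part of this card; the sketch below shows one way such a dataset could be loaded and split with the datasets library, assuming question and answer fields and the 90/10 split described above (the seed value is an arbitrary assumption).

from datasets import load_dataset

# Load question–answer pairs from the JSONL file (assumed fields: "question" and "answer").
dataset = load_dataset("json", data_files="qa_data.jsonl", split="train")

# 90% training / 10% evaluation split, as described above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]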

Preprocessing

Each example in the dataset was formatted into a standardized prompt structure following the pattern:

Q: <question>
A: <answer>

This clear question-and-answer format helps the model learn to predict answers from questions given as input. The text prompts were then tokenized using the EleutherAI/gpt-neo-1.3B tokenizer, which converts raw text into numerical token IDs compatible with the model's vocabulary. To ensure consistent input lengths and efficient training, tokenized sequences were truncated or padded to a fixed maximum length of 512 tokens. Padding was applied with the model's end-of-sequence token (eos_token) by setting pad_token_id to match it, so that padding tokens did not negatively affect loss computation.

Finally, the input token IDs were duplicated into the labels field, enabling supervised learning where the model is trained to predict the next token in the sequence given the current context.
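
As a rough illustration of the preprocessing described above, the sketch below builds the Q/A prompt, tokenizes it to a fixed length of 512 tokens, reuses eos_token as the padding token, and copies input_ids into labels. The field names and the map call continue from the loading sketch in the Training Data section and are assumptions, not the card author's exact code.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo ships without a pad token; reuse EOS as described above

def preprocess(example):
    # Build the standardized "Q: <question> / A: <answer>" prompt.
    text = f"Q: {example['question']}\nA: {example['answer']}"
    tokens = tokenizer(text, truncation=True, padding="max_length", max_length=512)
    # Duplicate the input IDs into labels for causal language modeling.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

# train_dataset comes from the loading sketch in the Training Data section.
tokenized_train = train_dataset.map(preprocess, remove_columns=train_dataset.column_names)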

Training Procedure

The model was fine-tuned using Hugging Face's Trainer with the following hyperparameters:

TrainingArguments(
    output_dir="./gpt-neo-qa",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    num_train_epochs=16,
    learning_rate=5e-5,
    fp16=True,
    logging_steps=10,
    save_steps=2000,
    save_total_limit=2,
    report_to="none"
)
  • Mixed precision training (fp16=True)
  • Only the last two checkpoints retained
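
For completeness, here is a minimal sketch of how these settings might be wired into a Trainer run, assuming the TrainingArguments above are assigned to a training_args variable. The tokenized_train variable comes from the preprocessing sketch, and the choice of default_data_collator is an assumption, reasonable here because labels are already set during preprocessing.

from transformers import AutoModelForCausalLM, Trainer, default_data_collator

# Load the base model and fine-tune it with the arguments shown above.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

trainer = Trainer(
    model=model,
    args=training_args,               # the TrainingArguments shown above
    train_dataset=tokenized_train,    # tokenized dataset from the preprocessing sketch
    data_collator=default_data_collator,
)
trainer.train()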

Evaluation Results

After 16 epochs of fine-tuning, the training run produced the following statistics:

  • Final Training Loss: 0.0457
  • Training Runtime: 698.31 seconds
  • Samples per Second: 11.18
  • Steps per Second: 2.80
  • Total FLOPs: 2.90 × 10¹⁶

Evaluation Metrics (QA Quality) on the InferenceVision QA evaluation set:

  • ROUGE-1: 0.2642
  • ROUGE-L: 0.2293
  • BERTScore Precision: 0.8510
  • BERTScore Recall: 0.8829
  • BERTScore F1: 0.8665
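
The exact evaluation script is not included in this card. As a rough sketch, metrics of this kind can be computed with the evaluate library over parallel lists of generated and reference answers; the placeholder lists below are illustrative, not real evaluation data.

import evaluate

# Placeholder lists; in practice these hold the model's generated answers and the gold answers.
predictions = ["example generated answer"]
references = ["example reference answer"]

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")

print("ROUGE-1:", rouge_scores["rouge1"])
print("ROUGE-L:", rouge_scores["rougeL"])
print("BERTScore F1:", sum(bert_scores["f1"]) / len(bert_scores["f1"]))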

Inference Example

This section provides a simple way to run inference using the fine-tuned doguilmak/inferencevision-gpt-neo-1.3B model. It uses Hugging Face Transformers to load the model and generate answers for InferenceVision-related questions. The model is optimized for domain-specific QA and works best when given clear queries formatted as questions.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "doguilmak/inferencevision-gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def ask_question(question, max_new_tokens=50):
    # Build the same "Q: ... A:" prompt format used during fine-tuning.
    prompt = f"Q: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.95,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and strip the prompt so only the generated answer remains.
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer.replace(prompt, "").strip()

question = "What is InferenceVision?"
answer = ask_question(question)
print("Answer:", answer)

Limitations

  • Limited to InferenceVision-specific domain knowledge
  • May hallucinate when asked about out-of-distribution topics
  • Input limited to 512 tokens — long documents or history must be shortened