Model Card: InferenceVision QA Fine-Tuned GPT-Neo 1.3B


Model Description

This model is a GPT-Neo (1.3B parameters) causal language model fine-tuned for question answering in the InferenceVision domain. It uses a structured prompt format:

Q: <question>
A: <answer>

This model is built upon GPT-Neo 1.3B, an open-source autoregressive transformer developed by EleutherAI. Originally designed to replicate aspects of GPT-3, GPT-Neo 1.3B contains approximately 1.3 billion parameters and was pretrained on The Pile, a large curated text corpus.

At its core, the model uses a transformer decoder architecture trained with a causal language modeling objective, allowing it to generate fluent text from input prompts. The base model performs well for its size on natural language benchmarks, scoring roughly 57% accuracy on LAMBADA, 55% on Winogrande, and 38% on HellaSwag.

Intended Use

The primary use of this model is to accurately answer domain-specific questions by leveraging the InferenceVision documentation. It is designed to provide precise and contextually relevant responses, making it an effective tool for assisting users seeking information related to InferenceVision.

Use Cases:

  • Developer chat assistants
  • Technical support chatbots
  • Documentation search interfaces
  • Internal developer tools

Out-of-Scope:

  • Legal, financial, or healthcare guidance
  • Creative writing or generalized question-answering
  • Questions unrelated to InferenceVision

Training Data

The model was trained on a custom dataset named qa_data.jsonl, which contains question–answer pairs from the InferenceVision project. The dataset was split into a 90% training set and a 10% evaluation set using Hugging Face's train_test_split. Training was carried out on an NVIDIA A100 GPU with 40 GB of VRAM.
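
The exact data-loading script is not part of this card; the sketch below shows one way such a dataset could be loaded and split with the datasets library, assuming question and answer fields and the 90/10 split described above (the seed value is an arbitrary assumption).

from datasets import load_dataset

# Load question–answer pairs from the JSONL file (assumed fields: "question" and "answer").
dataset = load_dataset("json", data_files="qa_data.jsonl", split="train")

# 90% training / 10% evaluation split, as described above.
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]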

Preprocessing

Each example in the dataset was formatted into a standardized prompt structure following the pattern:

Q: <question>
A: <answer>

This clear question-and-answer format helps the model learn to predict answers from questions given as input. The text prompts were then tokenized using the EleutherAI/gpt-neo-1.3B tokenizer, which converts raw text into numerical token IDs compatible with the model's vocabulary. To ensure consistent input lengths and efficient training, tokenized sequences were truncated or padded to a fixed maximum length of 512 tokens. Padding was applied with the model's end-of-sequence token (eos_token) by setting pad_token_id to match it, so that padding tokens did not negatively affect loss computation.

Finally, the input token IDs were duplicated into the labels field, enabling supervised learning where the model is trained to predict the next token in the sequence given the current context.
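
As a rough illustration of the preprocessing described above, the sketch below builds the Q/A prompt, tokenizes it to a fixed length of 512 tokens, reuses eos_token as the padding token, and copies input_ids into labels. The field names and the map call continue from the loading sketch in the Training Data section and are assumptions, not the card author's exact code.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo ships without a pad token; reuse EOS as described above

def preprocess(example):
    # Build the standardized "Q: <question> / A: <answer>" prompt.
    text = f"Q: {example['question']}\nA: {example['answer']}"
    tokens = tokenizer(text, truncation=True, padding="max_length", max_length=512)
    # Duplicate the input IDs into labels for causal language modeling.
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

# train_dataset comes from the loading sketch in the Training Data section.
tokenized_train = train_dataset.map(preprocess, remove_columns=train_dataset.column_names)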

Training Procedure

The model was fine-tuned using Hugging Face's Trainer with the following hyperparameters:

TrainingArguments(
    output_dir="./gpt-neo-qa",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    num_train_epochs=16,
    learning_rate=5e-5,
    fp16=True,
    logging_steps=10,
    save_steps=2000,
    save_total_limit=2,
    report_to="none"
)
  • Mixed precision training (fp16=True)
  • Only the last two checkpoints retained
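
For completeness, here is a minimal sketch of how these settings might be wired into a Trainer run, assuming the TrainingArguments above are assigned to a training_args variable. The tokenized_train variable comes from the preprocessing sketch, and the choice of default_data_collator is an assumption, reasonable here because labels are already set during preprocessing.

from transformers import AutoModelForCausalLM, Trainer, default_data_collator

# Load the base model and fine-tune it with the arguments shown above.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")

trainer = Trainer(
    model=model,
    args=training_args,               # the TrainingArguments shown above
    train_dataset=tokenized_train,    # tokenized dataset from the preprocessing sketch
    data_collator=default_data_collator,
)
trainer.train()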

Evaluation Results

After 16 epochs of fine-tuning, the training run produced the following statistics:

  • Final Training Loss: 0.0457
  • Training Runtime: 698.31 seconds
  • Samples per Second: 11.18
  • Steps per Second: 2.80
  • Total FLOPs: 2.90 × 10¹⁶

Evaluation Metrics (QA Quality) on the InferenceVision QA evaluation set:

  • ROUGE-1: 0.2642
  • ROUGE-L: 0.2293
  • BERTScore Precision: 0.8510
  • BERTScore Recall: 0.8829
  • BERTScore F1: 0.8665
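
The exact evaluation script is not included in this card. As a rough sketch, metrics of this kind can be computed with the evaluate library over parallel lists of generated and reference answers; the placeholder lists below are illustrative, not real evaluation data.

import evaluate

# Placeholder lists; in practice these hold the model's generated answers and the gold answers.
predictions = ["example generated answer"]
references = ["example reference answer"]

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

rouge_scores = rouge.compute(predictions=predictions, references=references)
bert_scores = bertscore.compute(predictions=predictions, references=references, lang="en")

print("ROUGE-1:", rouge_scores["rouge1"])
print("ROUGE-L:", rouge_scores["rougeL"])
print("BERTScore F1:", sum(bert_scores["f1"]) / len(bert_scores["f1"]))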

Inference Example

This section provides a simple way to run inference using the fine-tuned doguilmak/inferencevision-gpt-neo-1.3B model. It uses Hugging Face Transformers to load the model and generate answers for InferenceVision-related questions. The model is optimized for domain-specific QA and works best when given clear queries formatted as questions.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "doguilmak/inferencevision-gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def ask_question(question, max_new_tokens=50):
    # Build the same "Q: ... A:" prompt format used during fine-tuning.
    prompt = f"Q: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=0.7,
            top_p=0.95,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and strip the prompt so only the generated answer remains.
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return answer.replace(prompt, "").strip()

question = "What is InferenceVision?"
answer = ask_question(question)
print("Answer:", answer)

Limitations

  • Limited to InferenceVision-specific domain knowledge
  • May hallucinate when asked about out-of-distribution topics
  • Input limited to 512 tokens — long documents or history must be shortened