DialoGPT-Medium LoRA Fine-tuned on Alpaca Dataset

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of microsoft/DialoGPT-medium trained on a subset of the Alpaca instruction-following dataset.

Model Details

  • Base Model: microsoft/DialoGPT-medium (345M parameters)
  • Training Method: LoRA (Low-Rank Adaptation)
  • LoRA Configuration (see the PEFT sketch after this list):
    • Rank (r): 32
    • Alpha: 64
    • Dropout: 0.1
    • Target modules: c_attn, c_proj, c_fc
  • Dataset: Stanford Alpaca (1000 samples)
  • Training Split: 800 train, 200 validation
  • Epochs: 3
  • Final Training Loss: 3.45
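
For reference, the configuration above corresponds to roughly the following PEFT setup (a minimal sketch; the original training script is not included in this repository, so anything beyond the values listed above is an assumption):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# LoRA settings matching the configuration listed above
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["c_attn", "c_proj", "c_fc"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # LoRA trains only a small fraction of the 345M parameters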

Training Setup

  • Hardware: Apple Silicon (MPS)
  • Precision: FP32 for numerical stability
  • Batch Size: 4 per device
  • Gradient Accumulation: 2 steps
  • Learning Rate: 1e-4
  • Scheduler: Cosine (see the TrainingArguments sketch below)
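
In Trainer terms, these settings map onto a TrainingArguments object along these lines (a sketch only; output_dir and any argument not listed above are placeholders or assumptions):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./dialogpt-medium-lora-alpaca",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    fp16=False,  # FP32 for numerical stability on MPS
)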

Performance

This model demonstrates improved instruction-following capabilities compared to the base DialoGPT-medium model, with responses that:

  • Follow the instruction-response format more closely
  • Provide more detailed explanations
  • Handle diverse question types (AI/ML, technical concepts, etc.)

Usage

Loading the Model

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model_name = "microsoft/DialoGPT-medium"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "Tanaybh/dialogpt-medium-qlora-alpaca")

# Add padding token (DialoGPT has no pad token by default; reuse the EOS token)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
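
If you prefer a standalone model with no PEFT dependency at inference time, the adapter can optionally be merged into the base weights. This is a sketch using PEFT's merge_and_unload; the output path is a placeholder:

# Optional: merge LoRA weights into the base model and save a plain transformers model
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./dialogpt-medium-alpaca-merged")
tokenizer.save_pretrained("./dialogpt-medium-alpaca-merged")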

Inference

def generate_response(prompt):
    # Wrap the prompt in the instruction format used during fine-tuning
    input_text = f"Instruction: {prompt}\nResponse:"
    inputs = tokenizer.encode(input_text, return_tensors="pt")

    with torch.no_grad():
        outputs = model.generate(
            inputs,
            max_new_tokens=80,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            top_p=0.8,
            repetition_penalty=1.2
        )

    # Decode and return only the text after the "Response:" marker
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response.split("Response:")[-1].strip()

# Example usage
response = generate_response("What is machine learning?")
print(response)
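
On Apple Silicon, generation can also run on the MPS backend. A minimal sketch (falls back to CPU if MPS is unavailable; inputs must live on the same device as the model):

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)

# Inside generate_response, move the encoded prompt to the same device:
# inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)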

Example Outputs

Q: What is artificial intelligence?
A: It's a computer program designed to detect patterns in images and predict them based on what they are. It can be used for machine learning, image recognition, or machine translation, among other things.

Q: What is the difference between AI and ML?
A: AI is a programming language for AI systems. ML is an algorithm for ML systems, like Deep Learning, Deep Recognition, etc. It uses the same algorithms as other algorithms, but is used in more advanced applications.

Training Details

This model was fine-tuned using a Mac-optimized approach that provides QLoRA-like benefits without requiring CUDA-specific quantization libraries:

  • Used a higher LoRA rank (32 vs. the typical 8-16)
  • Targeted more modules (c_attn, c_proj, c_fc) for broader adaptation
  • Leveraged Apple Silicon GPU (MPS) for efficient training
  • Applied FP32 precision for numerical stability

Limitations

  • Responses may occasionally be verbose or repetitive
  • Performance varies by question complexity
  • Optimized for instructional/educational content
  • May require generation parameter tuning for best results

Technical Notes

  • This is a LoRA adapter, not a full model
  • Requires the base DialoGPT-medium model to function
  • Trained on Mac hardware using MPS acceleration
  • Compatible with standard PEFT/transformers libraries

Citation

If you use this model, please cite:

@misc{dialogpt-medium-qlora-alpaca,
  author = {Tanay Bhardwaj},
  title = {DialoGPT-Medium LoRA Fine-tuned on Alpaca Dataset},
  year = {2025},
  url = {https://huggingface.co/Tanaybh/dialogpt-medium-qlora-alpaca}
}

License

MIT License - see base model license for additional terms.
