StemLink Finetuning Demo Model (PEFT Adapter)

This repository contains a PEFT adapter trained as part of the StemLink AI Engineering Bootcamp during Class 11.

The adapter was fine-tuned on the [OpenFinAL/Financial_Question_Answering](https://huggingface.co/datasets/OpenFinAL/Financial_Question_Answering) dataset.


1. Purpose

This demo is pedagogical: it shows students how to integrate a PEFT adapter into a base model and how to run inference using both **pipeline-based** and **granular tokenizer/generate** approaches, **without hardcoding** the base model; instead, we read it from the adapter's own config (base_model_name_or_path).

From the adapter config, the base model is "mistralai/Mistral-7B-v0.1".
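
As a minimal sketch of that lookup (using this repo's adapter name; only the adapter config file is fetched, no weights):

```python
from peft import PeftConfig

# Reads adapter_config.json from the adapter repo and exposes its fields.
peft_config = PeftConfig.from_pretrained("sachithgunasekara/stemlink-finetuning-demo-model")
print(peft_config.base_model_name_or_path)  # e.g., "mistralai/Mistral-7B-v0.1"
```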


2. Model Overview

  • Base Model: Read from adapter config (base_model_name_or_path)
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Domain: Financial QA
  • Training Framework: [PEFT](https://github.com/huggingface/peft)
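
As a hedged aside (beyond the adapter type listed above, these values are not asserted by this card), the LoRA hyperparameters can be inspected from the same adapter config; in recent PEFT releases, PeftConfig.from_pretrained returns a LoraConfig for LoRA adapters:

```python
from peft import PeftConfig

# Loads adapter_config.json; for a LoRA adapter this is typically a LoraConfig.
cfg = PeftConfig.from_pretrained("sachithgunasekara/stemlink-finetuning-demo-model")
print(cfg.peft_type)                                              # adapter type, e.g. LORA
print(getattr(cfg, "r", None), getattr(cfg, "lora_alpha", None))  # LoRA rank and scaling factor
print(getattr(cfg, "target_modules", None))                       # modules carrying the low-rank updates
```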

3. Installation

```bash
pip install transformers peft accelerate
```

Ensure you have a GPU-enabled environment for smooth inference.
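
If GPU memory is tight for the 7B base model, one optional variation (not part of the class demo, and assuming `pip install bitsandbytes` on a CUDA machine) is to load the base weights in 4-bit before attaching the adapter:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

adapter_model_name = "sachithgunasekara/stemlink-finetuning-demo-model"
base_model_name = PeftConfig.from_pretrained(adapter_model_name).base_model_name_or_path

# Quantize the frozen base weights to 4-bit; the LoRA weights stay in higher precision.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_model_name)
```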


4. Quick Start: Pipeline Demo

This is the easiest way to run inference using the adapter. No hardcoded base model—we fetch it from the adapter config.

```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

# Adapter repo on Hugging Face
adapter_model_name = "sachithgunasekara/stemlink-finetuning-demo-model"

# Read the base model name/path from the adapter's own config
peft_config = PeftConfig.from_pretrained(adapter_model_name)
base_model_name = peft_config.base_model_name_or_path  # e.g., "mistralai/Mistral-7B-v0.1"

# Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, device_map="auto")

# Load PEFT adapter
model = PeftModel.from_pretrained(base_model, adapter_model_name)

# Create pipeline
qa_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Demo financial question
prompt = "What is the difference between a stock and a bond?"
response = qa_pipeline(prompt, max_new_tokens=120, do_sample=True, temperature=0.7, top_p=0.9)
print(response[0]["generated_text"])
```
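
Note that `generated_text` contains the prompt followed by the continuation. If you only want the newly generated answer, the text-generation pipeline accepts `return_full_text=False`:

```python
# Return only the continuation, without echoing the prompt.
response = qa_pipeline(prompt, max_new_tokens=120, do_sample=True,
                       temperature=0.7, top_p=0.9, return_full_text=False)
print(response[0]["generated_text"])
```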

5. Granular Demo: Tokenizer + Generate

For students who want to understand the tokenization and generation process, this example shows manual tokenization and an explicit call to model.generate(), again reading the base model dynamically from the adapter config.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel, PeftConfig
import torch

adapter_model_name = "sachithgunasekara/stemlink-finetuning-demo-model"

# Discover base model from adapter config
peft_config = PeftConfig.from_pretrained(adapter_model_name)
base_model_name = peft_config.base_model_name_or_path

# Load tokenizer, base model, and attach adapter
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(base_model_name, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_model_name)

# Prepare input
question = "Explain the concept of compound interest in simple terms."
inputs = tokenizer(question, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate with common decoding params
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=160,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        eos_token_id=tokenizer.eos_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
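
Because the prompt tokens are part of `outputs[0]`, the decoded string above repeats the question. A small follow-up sketch that keeps only the newly generated tokens:

```python
# Slice off the prompt tokens so only the model's answer is decoded.
prompt_length = inputs["input_ids"].shape[1]
answer = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(answer)
```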

6. Example Financial Questions

Try these prompts to see the adapter’s specialization:

  • "What is the difference between a stock and a bond?"
  • "Explain the concept of compound interest in simple terms."
  • "What does diversification mean in investment?"
  • "How do interest rates affect inflation?"
  • "How do ETFs differ from mutual funds for long-term investors?"
  • "What are the main risks associated with corporate bonds?"

7. Notes & Tips

  • No hardcoding: Always fetch base_model_name_or_path via PeftConfig.from_pretrained(...).
  • Token limits: Adjust max_new_tokens based on your GPU memory and desired response length.
  • Decoding: Experiment with temperature, top_p, and do_sample to balance determinism vs. creativity; a short comparison is sketched after this list.
  • Safety: The adapter focuses on financial Q&A; avoid out-of-domain tasks for best results.
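
To make the decoding tip concrete, here is a small sketch (reusing `model` and `tokenizer` from the granular demo; outputs will vary) comparing greedy decoding with sampling:

```python
prompt = "What does diversification mean in investment?"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.inference_mode():
    # Greedy decoding: deterministic, always picks the most likely next token.
    greedy = model.generate(**inputs, max_new_tokens=80, do_sample=False)
    # Sampling: more varied phrasing, shaped by temperature and top_p.
    sampled = model.generate(**inputs, max_new_tokens=80, do_sample=True,
                             temperature=0.8, top_p=0.95)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```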
