Model Card: QLoRA-Tuned TinyLlama-1.1B-Chat-v1.0
Model Overview
• Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
• Fine-Tuning Strategy: QLoRA (Efficient Finetuning of Quantized LLMs)
• Architecture & Quantization Details:
o The model is loaded with a 4-bit quantization configuration (BitsAndBytes with the nf4 quantization type and a bfloat16 compute dtype), yielding significant memory savings.
o PEFT is applied through a LoraConfig with carefully chosen hyperparameters such as lora_alpha, r, and lora_dropout to balance robustness and training speed.
o Target modules are selected from the attention projection layers (e.g., q_proj, k_proj, v_proj, and o_proj), where LoRA-style adaptation is applied.
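A minimal sketch of such a LoraConfig follows. Note that the specific values for r, lora_alpha, and lora_dropout below are illustrative assumptions; the actual values used for this model are not recorded in this card.

```python
from peft import LoraConfig

# Illustrative hyperparameters -- not the ones used for this model,
# which are not recorded in this card.
lora_config = LoraConfig(
    r=16,                # rank of the LoRA update matrices
    lora_alpha=32,       # scaling factor applied to the LoRA updates
    lora_dropout=0.05,   # dropout on the LoRA layers for regularization
    bias="none",
    task_type="CAUSAL_LM",
    # Apply adapters to the attention projection layers named above.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

This config is then passed to peft's get_peft_model (or to SFTTrainer's peft_config argument) so that only the adapter weights are trained while the quantized base weights stay frozen.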
Intended Use
• Primary Application: Low-resource sentiment classification of tweets, leveraging a larger model's capacity with a reduced memory footprint.
• Task Suitability: Well suited to environments with constrained computational resources that still want the benefit of a 1.1B-parameter model.
• Deployment: Suitable for research and production settings where the combination of quantization and parameter-efficient fine-tuning offers a favorable trade-off between latency and accuracy.
Training Data & Process
• Dataset: cardiffnlp/tweet_eval (sentiment subset)
• Task: Classify tweets into one of three sentiment categories: negative, neutral, or positive.
• Processing Pipeline:
o A conversational prompt structure is used, with a system prompt instructing the model to respond with a single-word sentiment label.
o Tokenization and prompt formatting are carefully managed to ensure compatibility with SFTTrainer.
o Advanced training settings include gradient accumulation, cosine learning rate scheduling, and optimized batch sizes to accommodate the quantized model.
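The prompt construction step can be sketched as follows. The system prompt wording and the helper below are assumptions for illustration; TinyLlama-1.1B-Chat-v1.0 ships a Zephyr-style chat template, and in practice tokenizer.apply_chat_template() is the safer way to apply it.

```python
from typing import Optional

# Hypothetical prompt builder mirroring the Zephyr-style chat format used
# by TinyLlama-1.1B-Chat-v1.0. Prefer tokenizer.apply_chat_template(),
# which reads the exact template shipped with the model.
SYSTEM_PROMPT = (
    "Classify the sentiment of the tweet as exactly one word: "
    "negative, neutral, or positive."
)

LABELS = ["negative", "neutral", "positive"]  # tweet_eval sentiment label order

def build_prompt(tweet: str, label_id: Optional[int] = None) -> str:
    """Format one example; append the answer only for training prompts."""
    prompt = (
        f"<|system|>\n{SYSTEM_PROMPT}</s>\n"
        f"<|user|>\n{tweet}</s>\n"
        f"<|assistant|>\n"
    )
    if label_id is not None:
        prompt += f"{LABELS[label_id]}</s>"
    return prompt
```

At inference time the prompt is cut off after the assistant tag, so the model's first generated word can be read directly as the predicted label.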
• Quantization Configuration:
o The BitsAndBytes configuration uses 4-bit nf4 quantization with double quantization enabled, keeping memory usage low during both training and inference.
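The quantization setup described above corresponds to a standard BitsAndBytesConfig from the transformers library, roughly:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # run matmuls in bfloat16
)
```

This object is passed as quantization_config when calling AutoModelForCausalLM.from_pretrained on the base model.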
Evaluation & Performance
• Metrics: Evaluated on Macro F1 score using the standard test split for sentiment classification.
• Observations: The QLoRA-tuned approach keeps the memory footprint small, but the measured scores (see Results) fall well short of competitive sentiment baselines, so the speed/accuracy balance has not yet been demonstrated.
• Usage Recommendation: Intended for users who want to deploy a larger conversational model in memory-constrained environments; verify task performance on your own data before relying on it.
Results: The best run reached a Macro F1 of 0.21. That result has not been reproducible so far; the cause is unclear, but the input formatting and the hyperparameters are the likely suspects.
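For reference, Macro F1 is the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of its frequency. A self-contained sketch, equivalent to sklearn.metrics.f1_score with average="macro":

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Because the average is unweighted, a model that collapses to predicting a single class scores near 1/3 of that class's F1 at best, which is one way a Macro F1 as low as 0.21 can arise.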
Limitations & Caveats
• Quantization may introduce variability in output quality; therefore, benchmarking across different hardware is recommended.
• The extensive prompt-processing pipeline requires careful tuning; minor changes to the prompt template may affect performance.
• Domain adaptation is critical: while optimized for tweet sentiment, further tuning might be necessary for other text genres.