Model Card: QLoRA-Tuned TinyLlama-1.1B-Chat-v1.0
Model Overview
• Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
• Fine-Tuning Strategy: QLoRA (Efficient Finetuning of Quantized LLMs)
• Architecture & Quantization Details:
o The model is loaded with a 4-bit quantization configuration (BitsAndBytes with the nf4 quantization type and a bfloat16 compute dtype), yielding significant memory savings.
o PEFT is applied through a LoraConfig with carefully chosen hyperparameters such as lora_alpha, r, and lora_dropout to balance robustness and training speed.
o Target modules are selected from the attention projection layers (e.g., q_proj, k_proj, v_proj, and o_proj), where LoRA-style adaptation is applied.
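A minimal sketch of such a LoraConfig follows. Note that the specific values for r, lora_alpha, and lora_dropout below are illustrative assumptions; the actual values used for this model are not recorded in this card.

```python
from peft import LoraConfig

# Illustrative hyperparameters -- not the ones used for this model,
# which are not recorded in this card.
lora_config = LoraConfig(
    r=16,                # rank of the LoRA update matrices
    lora_alpha=32,       # scaling factor applied to the LoRA updates
    lora_dropout=0.05,   # dropout on the LoRA layers for regularization
    bias="none",
    task_type="CAUSAL_LM",
    # Apply adapters to the attention projection layers named above.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

This config is then passed to peft's get_peft_model (or to SFTTrainer's peft_config argument) so that only the adapter weights are trained while the quantized base weights stay frozen.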
Intended Use
• Primary Application: Low-resource sentiment classification of tweets, leveraging a larger model's capacity with a reduced memory footprint.
• Task Suitability: Well suited to environments with constrained computational resources that still want the benefit of a 1.1B-parameter model.
• Deployment: Suitable for research and production settings where the combination of quantization and parameter-efficient fine-tuning offers a favorable trade-off between latency and accuracy.
Training Data & Process
• Dataset: cardiffnlp/tweet_eval (sentiment subset)
• Task: Classify tweets into one of three sentiment categories: negative, neutral, or positive.
• Processing Pipeline:
o A conversational prompt structure is used, with a system prompt instructing the model to respond with a single-word sentiment label.
o Tokenization and prompt formatting are carefully managed to ensure compatibility with SFTTrainer.
o Advanced training settings include gradient accumulation, cosine learning rate scheduling, and optimized batch sizes to accommodate the quantized model.
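The prompt construction step can be sketched as follows. The system prompt wording and the helper below are assumptions for illustration; TinyLlama-1.1B-Chat-v1.0 ships a Zephyr-style chat template, and in practice tokenizer.apply_chat_template() is the safer way to apply it.

```python
from typing import Optional

# Hypothetical prompt builder mirroring the Zephyr-style chat format used
# by TinyLlama-1.1B-Chat-v1.0. Prefer tokenizer.apply_chat_template(),
# which reads the exact template shipped with the model.
SYSTEM_PROMPT = (
    "Classify the sentiment of the tweet as exactly one word: "
    "negative, neutral, or positive."
)

LABELS = ["negative", "neutral", "positive"]  # tweet_eval sentiment label order

def build_prompt(tweet: str, label_id: Optional[int] = None) -> str:
    """Format one example; append the answer only for training prompts."""
    prompt = (
        f"<|system|>\n{SYSTEM_PROMPT}</s>\n"
        f"<|user|>\n{tweet}</s>\n"
        f"<|assistant|>\n"
    )
    if label_id is not None:
        prompt += f"{LABELS[label_id]}</s>"
    return prompt
```

At inference time the prompt is cut off after the assistant tag, so the model's first generated word can be read directly as the predicted label.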
• Quantization Configuration:
o The BitsAndBytes configuration uses 4-bit nf4 quantization with double quantization enabled, keeping memory usage low during both training and inference.
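The quantization setup described above corresponds to a standard BitsAndBytesConfig from the transformers library, roughly:

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store base weights in 4 bits
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # run matmuls in bfloat16
)
```

This object is passed as quantization_config when calling AutoModelForCausalLM.from_pretrained on the base model.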
Evaluation & Performance
• Metrics: Evaluated on Macro F1 score using the standard test split for sentiment classification.
• Observations: The QLoRA-tuned approach keeps the memory footprint small, but the measured scores (see Results) fall well short of competitive sentiment baselines, so the speed/accuracy balance has not yet been demonstrated.
• Usage Recommendation: Intended for users who want to deploy a larger conversational model in memory-constrained environments; verify task performance on your own data before relying on it.
Results: The best run reached a Macro F1 of 0.21. That result has not been reproducible so far; the cause is unclear, but the input formatting and the hyperparameters are the likely suspects.
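For reference, Macro F1 is the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of its frequency. A self-contained sketch, equivalent to sklearn.metrics.f1_score with average="macro":

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)
```

Because the average is unweighted, a model that collapses to predicting a single class scores near 1/3 of that class's F1 at best, which is one way a Macro F1 as low as 0.21 can arise.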
Limitations & Caveats
• Quantization may introduce variability in output quality; therefore, benchmarking across different hardware is recommended.
• The extensive prompt-processing pipeline requires careful tuning; minor changes to the prompt template may affect performance.
• Domain adaptation is critical: while optimized for tweet sentiment, further tuning might be necessary for other text genres.