--- { "language": ["en"], "license": "apache-2.0", "tags": [ "text-generation", "causal-lm", "instruction-tuning", "supervised-fine-tuning", "synthetic-qa", "lora", "axolotl", "deepspeed", "transformers", "commandr", "cohere", "eu-hpc" ], "datasets": [ "axolotl_deduplicated_synthetic_qa" ], "metrics": [ "loss" ], "library_name": "transformers", "framework": "pytorch", "base_model": "CohereLabs/c4ai-command-r-v01", "model_name": "commandr-35b-sft", "pipeline_tag": "text-generation", "task_categories": ["text-generation", "instruction-following"], "model_type": "AutoModelForCausalLM", "inference": { "parameters": { "max_new_tokens": 512, "temperature": 0.7, "top_p": 0.9 } }, "trained_on": [ "Leonardo EuroHPC" ], "description": "Supervised fine-tuning (SFT) of Cohere Command-R 35B on the synthetic QA dataset using LoRA and Axolotl. The model improves conversational reasoning and instruction-following capabilities." } --- # Command-R 35B — SFT (Supervised Fine-Tuning on Synthetic QA) **Model type:** Causal Language Model **Base model:** [CohereLabs/c4ai-command-r-v01](https://huggingface.co/CohereLabs/c4ai-command-r-v01) **License:** Apache 2.0 **Framework:** [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) --- ## Overview `commandr-35b-sft` is a **supervised fine-tuned** variant of Cohere’s Command-R 35B model. Fine-tuning was performed on a high-quality instruction-following dataset using LoRA adapters, enabling improved conversational reasoning and question answering. Training was conducted on the **Leonardo EuroHPC** system. --- ## Training Setup **Objective:** Supervised fine-tuning (instruction following) **Adapter type:** LoRA **Precision:** bfloat16 **Hardware:** 8 nodes × 2 × NVIDIA A100 64GB GPUs **Framework:** DeepSpeed ZeRO-1, Axolotl, PyTorch 2.5.1+cu121 **Runtime:** ~6 hours **Dataset split:** 70% train / 30% validation --- ## Dataset **Name:** `axolotl_deduplicated_synthetic_qa.jsonl` **Type:** Instruction-following synthetic QA dataset Each sample follows a QA/chat format used in the `alpaca_chat.load_qa` schema. --- ## Hyperparameters | Parameter | Value | |------------|-------| | Sequence length | 2048 | | Micro batch size | 1 | | Gradient accumulation | 2 | | Epochs | 1 | | Learning rate | 0.0001 | | LR scheduler | cosine | | Optimizer | AdamW (8-bit) | | Warmup steps | 20 | | Weight decay | 0.0 | | LoRA rank (r) | 16 | | LoRA alpha | 32 | | LoRA dropout | 0.05 | | LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | | Gradient checkpointing | ✅ | | Flash attention | ✅ | | Auto resume | ✅ | | Loss watchdog threshold | 8.0 | | Loss watchdog patience | 20 | --- ## Tokenizer **Tokenizer type:** `AutoTokenizer` **Special token:** `<|end_of_text|>` as `pad_token`