---
license: apache-2.0
tags:
- text-generation
- kimi_k2
- muon
datasets:
- loggenix-rca
language:
- en
pipeline_tag: text-generation
---

# loggenix-nanoKimi2-test

This model was trained using the following configuration:

## Training Details

- **Base Architecture**: kimi_k2
- **Optimizer**: muon
- **Learning Rate**: 0.02
- **Weight Decay**: 0.1
- **Dataset**: loggenix-rca
- **Hidden Size**: 1024
- **Epochs**: 1

## Model Architecture

This is a Mixture of Experts (MoE) model based on the DeepseekV3 architecture.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")
model = AutoModelForCausalLM.from_pretrained("kshitijthakkar/loggenix-nanoKimi2-test")

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Script

This model was trained using a custom training script with the Muon optimizer.
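
The training script itself is not included here, but for context, below is a minimal, illustrative sketch of the kind of update the Muon optimizer performs on a 2D weight matrix: momentum accumulation, approximate orthogonalization of the update via a Newton-Schulz iteration, and decoupled weight decay. The helper names (`newton_schulz`, `muon_step`), the momentum value, and the iteration count are assumptions for illustration; only the learning rate (0.02) and weight decay (0.1) come from the training details above.

```python
import torch


def newton_schulz(G, steps=5, eps=1e-7):
    # Approximately orthogonalize G with a few Newton-Schulz iterations.
    # Coefficients and step count are illustrative assumptions, not the
    # exact values used to train this model.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X


def muon_step(param, grad, momentum_buf, lr=0.02, momentum=0.95, weight_decay=0.1):
    # One hypothetical Muon update for a 2D weight tensor:
    # 1) accumulate momentum, 2) orthogonalize the update,
    # 3) apply decoupled weight decay, 4) take the step.
    momentum_buf.mul_(momentum).add_(grad)
    update = newton_schulz(momentum_buf)
    param.mul_(1 - lr * weight_decay)
    param.add_(update, alpha=-lr)
```

In the Muon setup, updates like this are typically applied only to 2D weight matrices, while embeddings, norms, and other 1D parameters are handled by a standard optimizer such as AdamW.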