---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- triton
- kernel
- code-generation
- fine-tuned
datasets:
- triton-kernels-6k
language:
- en
pipeline_tag: text-generation
---

# Triton Kernel Code Generation Model

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct, specialized for generating Triton GPU kernels.

## Model Details

- **Base Model**: Qwen/Qwen2.5-1.5B-Instruct
- **Fine-tuned on**: 6,000 examples of Triton kernel code
- **Eval Loss**: 0.20
- **Eval Perplexity**: 1.22

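The two eval figures are consistent with each other, since perplexity is the exponential of the cross-entropy loss; a quick check:

```python
import math

eval_loss = 0.20
print(math.exp(eval_loss))  # ~1.221, matching the reported eval perplexity of 1.22
```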
					
						
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("cdreetz/kwen2.5-1.5b")
tokenizer = AutoTokenizer.from_pretrained("cdreetz/kwen2.5-1.5b")

prompt = "Write a Triton kernel for element-wise addition:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
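Since the base model is instruction-tuned, prompting through the tokenizer's chat template may work better than a raw prompt. A minimal sketch, continuing from the snippet above and assuming the fine-tune kept the Qwen2.5 chat format:

```python
messages = [{"role": "user", "content": "Write a Triton kernel for element-wise addition:"}]
# apply_chat_template wraps the request in the model's chat markup.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```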
					
						
## Training Details

- **Epochs**: 2
- **Batch Size**: 2
- **Learning Rate**: 1e-5
- **Dataset Size**: 6,000 examples

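The training script itself is not part of this card; a hypothetical `TrainingArguments` setup matching the listed hyperparameters might look like:

```python
from transformers import TrainingArguments

# Hypothetical configuration: only the epoch count, batch size, and learning
# rate come from the card; everything else is an assumption.
args = TrainingArguments(
    output_dir="kwen2.5-1.5b",  # assumed output directory
    num_train_epochs=2,
    per_device_train_batch_size=2,
    learning_rate=1e-5,
)
```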
					
						
## Performance

The model generates syntactically correct Triton kernels (a hand-written example follows the list) with proper:
- `@triton.jit` decorators
- Kernel function signatures
- Launch function implementations
- Memory access patterns
- Grid configurations

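For reference, here is a hand-written element-wise addition kernel showing each of these elements (an illustrative sketch, not model output):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # mask off out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch function: size a 1D grid so every element is covered.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```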
					
						
## Limitations

- Specialized for Triton kernel generation only
- May require prompt engineering for optimal results
- Generated kernels should be tested before production use (see the sanity check below)

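On the last point, a minimal sanity check compares kernel output against a PyTorch reference (reusing the `add` launch function from the sketch above):

```python
x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
# The kernel's output should match PyTorch's element-wise addition.
assert torch.allclose(add(x, y), x + y)
```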