How to use muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA",
    max_seq_length=2048,
)
Ghost-Coder: Qwen2.5-32B CUDA-to-HIP Translator
Ghost-Coder is a specialized LLM designed to bridge the gap between NVIDIA's proprietary CUDA and AMD's open ROCm ecosystem. This model is a fine-tuned version of Qwen2.5-Coder-32B-Instruct, optimized specifically for high-fidelity translation of GPU kernels.
Developed for the Lablab.ai AMD Developer Hackathon (2026).
Model Highlights
- Specialization: Maps complex CUDA logic (memory management, warp primitives, kernels) to functional AMD HIP code.
- Hardware-Aware: Fine-tuned specifically for execution on AMD Instinct hardware.
- Agent-Ready: Designed to be the "brain" of an autonomous, self-healing compiler loop.
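To make the kind of mapping concrete, here is a minimal Python sketch of the simplest layer of CUDA-to-HIP translation: renaming runtime API identifiers. The table lists a handful of well-known one-to-one correspondences. The function name `rename_cuda_calls` is hypothetical, and this is not a description of how Ghost-Coder works internally; the model targets the harder cases (warp primitives, kernel logic) that plain renaming does not cover.

```python
import re

# A few well-known one-to-one CUDA -> HIP renames (illustrative, not exhaustive).
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

# Match whole identifiers only, so names missing from the table are left untouched.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def rename_cuda_calls(source: str) -> str:
    """Replace known CUDA identifiers with their HIP equivalents (hypothetical helper)."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


if __name__ == "__main__":
    print(rename_cuda_calls("cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);"))
```

Identifiers that are not in the table pass through unchanged, which is where a learned translator would be expected to take over.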
Training Details
The model was fine-tuned with the Unsloth framework in a short, high-throughput "sprint" run, a configuration chosen with the aim of preserving generalization.
- Hardware: AMD Instinct MI300X (192GB VRAM)
- Base Model: Qwen2.5-Coder-32B-Instruct (4-bit QLoRA)
- Dataset: Curated subset of CASS (CUDA-to-HIP mapping pairs)
- Context Length: 4096
- Training Steps: 200
- Global Batch Size: 64
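As a rough sense of scale, the figures above can be combined in a few lines of Python. This is back-of-the-envelope arithmetic, assuming each element of the global batch is one sequence of at most the context length (no packing), so the token count is an upper bound rather than a reported number.

```python
# Values taken from the training details listed above.
TRAINING_STEPS = 200
GLOBAL_BATCH_SIZE = 64
CONTEXT_LENGTH = 4096

# One optimizer step consumes one global batch of sequences.
sequences_seen = TRAINING_STEPS * GLOBAL_BATCH_SIZE

# Upper bound: assumes every sequence fills the whole context window.
max_tokens_seen = sequences_seen * CONTEXT_LENGTH

if __name__ == "__main__":
    print(f"sequences seen:  {sequences_seen:,}")
    print(f"max tokens seen: {max_tokens_seen:,}")
```

Under these assumptions the run covers 12,800 training sequences and at most about 52.4 million tokens.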
Intended Use
Ghost-Coder is intended for use in the Ghost-Harness, an agentic workflow that:
- Translates CUDA source code to HIP.
- Attempts compilation via hipcc.
- Self-corrects based on compiler error feedback.
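The three steps above can be sketched as a small retry loop. This is a minimal illustration, not the Ghost-Harness implementation: `self_heal`, `compile_with_hipcc`, the `translate` callback signature, the `kernel.hip` file name, and the retry limit are all assumptions made for the example. The `translate` callback stands in for a call to the model.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical signature: (cuda_source, previous_hip_attempt, compiler_errors) -> hip_source
Translator = Callable[[str, Optional[str], Optional[str]], str]
Compiler = Callable[[str], Tuple[bool, str]]


def compile_with_hipcc(hip_source: str) -> Tuple[bool, str]:
    """Try to compile HIP source with hipcc; return (success, compiler stderr)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "kernel.hip"  # file name is an assumption
        obj = Path(tmp) / "kernel.o"
        src.write_text(hip_source)
        result = subprocess.run(
            ["hipcc", "-c", str(src), "-o", str(obj)],
            capture_output=True,
            text=True,
        )
        return result.returncode == 0, result.stderr


def self_heal(
    cuda_source: str,
    translate: Translator,
    compile_fn: Compiler = compile_with_hipcc,
    max_attempts: int = 3,
) -> Optional[str]:
    """Translate, compile, and retry with compiler feedback until it builds."""
    previous_hip: Optional[str] = None
    errors: Optional[str] = None
    for _ in range(max_attempts):
        hip_source = translate(cuda_source, previous_hip, errors)
        ok, stderr = compile_fn(hip_source)
        if ok:
            return hip_source
        # Feed the failed attempt and its diagnostics into the next round.
        previous_hip, errors = hip_source, stderr
    return None  # gave up after max_attempts
```

Passing the compiler in as `compile_fn` keeps the loop testable on machines without ROCm installed; a successful compile only shows that the code builds, not that the translated kernel is numerically correct.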
Acknowledgements
Special thanks to AMD and Lablab.ai for providing the compute resources and the platform to build across the AI stack.
Created by Talha
Model tree for muhammadtlha944/Ghost-Coder-Qwen2.5-32B-LoRA
- Base model: Qwen/Qwen2.5-32B
- Finetuned: Qwen/Qwen2.5-Coder-32B
- Finetuned: Qwen/Qwen2.5-Coder-32B-Instruct