---
license: apache-2.0
base_model: baseten/orpheus-3b-0.1-ft
tags:
- quantized
- gptq
- text-to-speech
- tts
- orpheus
- 8bit
library_name: transformers
pipeline_tag: text-generation
---

# Orpheus 3B 8-bit GPTQ

This is an 8-bit GPTQ quantized version of [baseten/orpheus-3b-0.1-ft](https://huggingface.co/baseten/orpheus-3b-0.1-ft).

## Model Details

- **Base Model**: [baseten/orpheus-3b-0.1-ft](https://huggingface.co/baseten/orpheus-3b-0.1-ft)
- **Quantization**: 8-bit GPTQ
- **Group Size**: 128
- **Calibration Dataset**: canopylabs/zac-sample-dataset (TTS-specific)
- **Library**: auto-gptq

## Usage

```python
import torch
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Load the quantized model
model = AutoGPTQForCausalLM.from_quantized(
    "Hariprasath28/svara-tts-v1-gptq",
    device="cuda:0",  # or "cpu"
    use_triton=False,
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    "Hariprasath28/svara-tts-v1-gptq",
    trust_remote_code=True
)

# Generate TTS tokens; the "tara:" prefix selects the speaker voice
text = "tara: Hello, this is a test of the quantized Orpheus model."
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        temperature=0.7,
        do_sample=True
    )

generated = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated)
```
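
## Decoding the Audio Tokens

The `generate` call above produces audio codec tokens rather than readable text. For the Orpheus family these are typically decoded to a waveform with the SNAC codec. Below is a minimal sketch, assuming the 24 kHz SNAC model; `extract_snac_codes` is a hypothetical placeholder for the model-specific mapping from generated token IDs back to SNAC codebook tensors (see the base model's documentation for the exact scheme).

```python
import torch
import soundfile as sf
from snac import SNAC  # pip install snac soundfile

# Load the SNAC decoder (24 kHz variant used by Orpheus)
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

# Hypothetical helper: maps the generated custom audio tokens back to
# the list of SNAC codebook tensors that `decode` expects.
codes = extract_snac_codes(outputs[0])

with torch.no_grad():
    audio = snac_model.decode(codes)  # tensor of shape (1, 1, num_samples)

sf.write("output.wav", audio.squeeze().cpu().numpy(), 24000)
```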
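
## Reproducing the Quantization

The weights were quantized with auto-gptq using the settings listed above (8-bit, group size 128) and calibration samples drawn from canopylabs/zac-sample-dataset. A rough sketch of the procedure follows; it assumes the dataset exposes a plain `text` column (an assumption, adjust to the actual schema), and `desc_act` is left at `False` since the card does not state otherwise.

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer
from datasets import load_dataset

base = "baseten/orpheus-3b-0.1-ft"
tokenizer = AutoTokenizer.from_pretrained(base)

# Build calibration examples; the "text" column name is an assumption
dataset = load_dataset("canopylabs/zac-sample-dataset", split="train")
examples = [
    tokenizer(sample["text"], return_tensors="pt")
    for sample in dataset.select(range(min(128, len(dataset))))
]
examples = [
    {"input_ids": e["input_ids"], "attention_mask": e["attention_mask"]}
    for e in examples
]

# 8-bit GPTQ with group size 128, matching the settings above;
# desc_act=False is an assumption, not stated in the card
quantize_config = BaseQuantizeConfig(bits=8, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(base, quantize_config)
model.quantize(examples)
model.save_quantized("svara-tts-v1-gptq", use_safetensors=True)
```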