Instructions to use Casual-Autopsy/RP-Spectrum-24B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Casual-Autopsy/RP-Spectrum-24B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Casual-Autopsy/RP-Spectrum-24B")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Casual-Autopsy/RP-Spectrum-24B", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Casual-Autopsy/RP-Spectrum-24B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Casual-Autopsy/RP-Spectrum-24B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Casual-Autopsy/RP-Spectrum-24B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/Casual-Autopsy/RP-Spectrum-24B
- SGLang
How to use Casual-Autopsy/RP-Spectrum-24B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Casual-Autopsy/RP-Spectrum-24B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Casual-Autopsy/RP-Spectrum-24B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Casual-Autopsy/RP-Spectrum-24B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Casual-Autopsy/RP-Spectrum-24B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use Casual-Autopsy/RP-Spectrum-24B with Docker Model Runner:
docker model run hf.co/Casual-Autopsy/RP-Spectrum-24B
Broken token?
Hi there, I tried this one out... it just produced something like this for me?!
5 *Men: I am looking to purchase a pair of sne-1 two there.0} (a few times) *1 Around (the only 1 (7 Only) 1+Model: 1+1=1 -86: 1+2 in+1 not+1: puppet show) *4 *planned events
In Other Words, Black Doting *Type: Nearest Neighbor *9 - enters Does not enter; On a scale that goes: 1 - Aschre- - - - 't Your favorite Food Store, which contains: Grocery, Super Store, 8 by Air, World-Express Stamp Books. -5les (Deaced for school. (Have a memorable vacation, a getaway (a) to make a new plan, looking for an opening, through a crack in the wall.
Any ideas?
Are you using a temp above 0.7? The model might also be a bit too creative for adaptive P and XTC as well, so those should be turned off.
I can say though that on my end the model works, as I've been using it to generate trackers. Doubt it could be a tokenizer issue.
The high creativity is intentional, as the original goal of this merge was to get a lot of tasks to generalize with creative tasks. I wanted something for some more advanced RP post/pre-response tasks.
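For anyone who wants to try the sampler advice above against an OpenAI-compatible endpoint, here is a rough Python sketch that builds a request body with the temperature capped at 0.7. The helper name `build_completion_payload` and the `MAX_TEMPERATURE` constant are made up for illustration; the field names simply mirror the curl examples at the top of the page. It does not touch XTC or adaptive P, since the names of those settings differ between backends and have to be switched off in the backend's own sampler options.

```python
import json

MODEL_ID = "Casual-Autopsy/RP-Spectrum-24B"
MAX_TEMPERATURE = 0.7  # ceiling taken from the reply above; an assumption, not a hard limit


def build_completion_payload(prompt, temperature=0.5, max_tokens=512):
    """Build an OpenAI-style /v1/completions body with a capped temperature.

    Hypothetical helper for illustration only.
    """
    # Clamp the requested temperature into [0.0, MAX_TEMPERATURE].
    safe_temperature = max(0.0, min(float(temperature), MAX_TEMPERATURE))
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": safe_temperature,
        # No XTC or adaptive-P fields are set here: their names are
        # backend-specific, so disable them in the backend's own settings.
    }


if __name__ == "__main__":
    # A request for temperature 1.2 is capped before it is sent anywhere.
    payload = build_completion_payload("Once upon a time,", temperature=1.2)
    print(json.dumps(payload, indent=2))
```

The resulting dictionary can be sent as the JSON body to whichever local server is in use, for example the `/v1/completions` endpoints shown in the vLLM and SGLang snippets.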
Maybe something went wrong when converting it to MLX... I had an issue before with Mistral merges, though it's not something that happens often. Yes, low temperatures had the same issue. I think I'll give it a new try with a different quantization...