Llama3-SEP-Chat: Philosophy Expert Assistant
This model is a LoRA-finetuned version of meta-llama/Meta-Llama-3.1-8B-Instruct, trained on a curated dataset of conversations derived from the Stanford Encyclopedia of Philosophy (SEP). It is designed to discuss philosophy in a formal yet accessible tone, drawing on the breadth of the SEP.
Model Description
The model was finetuned directly on the Instruct variant of Llama 3.1, preserving its native chat format and instruction-following capabilities while enhancing its philosophical expertise.
Training Dataset
The training data consists of multi-turn conversations derived from the Stanford Encyclopedia of Philosophy, formatted as chat interactions between a user and an assistant. The conversations maintain academic rigor while ensuring accessibility.
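As a rough illustration of what a multi-turn record might look like, the sketch below uses the same role/content message layout as the usage snippet in this card. The exact dataset schema is not documented here, so the field names, the helper `validate_conversation`, and the placeholder contents are all assumptions made for illustration.

```python
def validate_conversation(messages):
    """Check that turns alternate user/assistant, starting with the user.

    Hypothetical helper: returns the number of turns, raises ValueError otherwise.
    """
    expected = "user"
    for i, message in enumerate(messages):
        if message["role"] != expected:
            raise ValueError(f"turn {i}: expected {expected!r}, got {message['role']!r}")
        if not message["content"].strip():
            raise ValueError(f"turn {i}: empty content")
        expected = "assistant" if expected == "user" else "user"
    return len(messages)


# Placeholder conversation, not an actual record from the training data.
example = [
    {"role": "user", "content": "<question about a philosophical topic>"},
    {"role": "assistant", "content": "<answer grounded in an SEP entry>"},
    {"role": "user", "content": "<follow-up question>"},
    {"role": "assistant", "content": "<follow-up answer>"},
]

num_turns = validate_conversation(example)
print(num_turns)
```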
Chat Format
The model uses the native Llama 3.1 chat format, which the tokenizer applies through its chat template (tokenizer.apply_chat_template). No additional tokens or formatting were added during finetuning.
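For readers who want to see the layout of that format, here is a simplified sketch that renders messages with the Llama 3 header and end-of-turn tokens. The function `render_llama3_prompt` is hypothetical and only approximates the template: the tokenizer's real template may add more (for example a default system block), so the tokenizer's output remains authoritative.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render messages in the Llama 3 header/end-of-turn layout (simplified sketch)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_llama3_prompt(
    [{"role": "user", "content": "What is the categorical imperative?"}]
)
print(prompt)
```

In practice there is no need to build this string by hand; it is shown only to make the structure of a turn visible.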
Training Details
Model Configuration
- Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Training Type: LoRA (Low-Rank Adaptation)
- Quantization: 4-bit (NF4)
- Compute: Mixed Precision (bfloat16)
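To give a sense of why 4-bit quantization matters for an 8B model, the following back-of-the-envelope sketch compares weight storage at 4 bits and 16 bits per parameter. It assumes exactly 8e9 parameters (taking "8B" at face value) and ignores quantization constants, activations, adapter weights, and optimizer state, so the numbers are rough lower bounds rather than measured figures.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 8e9  # assumption: "8B" taken at face value

nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit (NF4) base weights
bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # bfloat16 weights, for comparison

print(f"NF4:  ~{nf4_gib:.1f} GiB")
print(f"bf16: ~{bf16_gib:.1f} GiB")
```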
Training Hyperparameters
- Learning Rate: 2e-5
- Train Batch Size: 16
- Gradient Accumulation Steps: 2
- Effective Batch Size: 32
- Optimizer: paged_adamw_8bit
- Training Epochs: 5
- Warmup Ratio: 0.03
- LoRA Configuration:
- Rank: 256
- Alpha: 128
- Dropout: 0.05
- Target: all-linear layers
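The arithmetic behind a few of these values can be sketched as follows. The effective batch size of 32 matches batch size times accumulation steps only if a single device was used, which is an assumption here. The alpha/rank scaling is the standard LoRA convention, and the hidden size of 4096 used for the per-layer adapter count is an assumed value for the 8B base model, not something stated in this card.

```python
train_batch_size = 16
grad_accum_steps = 2
num_devices = 1  # assumption: single device

effective_batch_size = train_batch_size * grad_accum_steps * num_devices

lora_rank = 256
lora_alpha = 128
# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank


def lora_adapter_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer (A and B matrices)."""
    return rank * (d_in + d_out)


hidden_size = 4096  # assumption: hidden size of the 8B base model
square_layer_params = lora_adapter_params(hidden_size, hidden_size, lora_rank)

print(effective_batch_size)   # 32
print(lora_scaling)           # 0.5
print(square_layer_params)    # 2097152
```

With alpha below the rank, the adapter update is scaled down by half, which tends to pair with a high rank to keep the update magnitude moderate.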
Framework Versions
- Transformers: latest
- PEFT: latest
- PyTorch: 2.1.0+cu121
- TRL: latest
- Accelerate: latest
Usage
The model can be used with the standard Hugging Face transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ruggsea/Llama3.1-Instruct-SEP-Chat"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Format your input as a list of chat messages
messages = [
    {"role": "user", "content": "What is the categorical imperative?"}
]

# Apply the chat template and tokenize in one step.
# add_generation_prompt=True appends the assistant header so the model
# writes a reply instead of continuing the user turn; tokenizing here also
# avoids adding a second beginning-of-text token.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.2,
    no_repeat_ngram_size=3,
)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
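To make the temperature and top_p settings used above more concrete, here is a toy sketch of temperature scaling followed by nucleus (top-p) filtering on a four-token distribution. It is a plain-Python illustration of the idea, not the library's implementation; the function `top_p_filter` and the example logits are made up for this purpose.

```python
import math


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability} kept after temperature scaling and nucleus filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize over the kept tokens; sampling would draw from this distribution.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.5, 0.0, -1.0]  # made-up values for illustration
kept_probs = top_p_filter(toy_logits)
print(kept_probs)  # only tokens 0 and 1 survive the filter
```

Lower temperatures sharpen the distribution before filtering, so fewer tokens are needed to reach the top_p threshold.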
Limitations
- The model's knowledge is primarily focused on philosophical concepts and may not perform as well on general topics
- As with all language models, it may occasionally generate incorrect or inconsistent information
- The model inherits any limitations and biases present in the base Llama 3.1 model and the SEP dataset
License
This model is subject to the Meta Llama 3 license terms. Please refer to Meta's licensing for usage requirements and restrictions.