Model Card for Model ID
Model Details
Model Description
SmolLM-135M-Instruct aligned with Direct Preference Optimization (DPO). The model was trained on Human-Like-DPO-Dataset, which was created for LLM fluency and making their responses more human-like.
Training Details
Training Data
HumanLLMs/Human-Like-DPO-Dataset
Generation example
user: What's your morning routine like?
assistant: As the morning kicks in, I'll provide an outline of my daily routine to help you structure your day like a pro. Remember, it's all up to you, so feel free to fill out this routine as much or as little as you like! Here's my daily morning routine:
5:00 AM - Breakfast and Posh Cake
- I grab my morning coffee and fill my "first cup" with a simple cup of pour-over tea or coffee.
- I put on a healthy breakfast: scrambled eggs, toast, and a piece of fruit (like an apple or banana).
- I start with a posh cake, such as a cinnamon "cookie" or a simple cinnamon bun, to kick-start the day.
5:30 AM - Meditation Pods
- I download meditation-themed "pod" apps, like Mindful on Tap or Habitica, to help me stay focused and relaxed.
- I enjoy some mindfulness meditation exercises, like focusing on my breath or a body scan.
7:00 AM - 5:45 AM - Work/School
- I spend the next 20-30 minutes working or studying
- Downloads last month
- 5
Model tree for kravmar/llm-course-hw2-dpo
Base model
HuggingFaceTB/SmolLM-135M
Quantized
HuggingFaceTB/SmolLM-135M-Instruct