# Model Card for kravmar/llm-course-hw2-dpo

## Model Details

### Model Description

SmolLM-135M-Instruct aligned with Direct Preference Optimization (DPO). The model was trained on the HumanLLMs/Human-Like-DPO-Dataset, which was built to improve LLM fluency and make model responses more human-like.
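DPO optimizes a simple logistic loss over preference pairs, pushing the policy to assign a higher log-probability margin to the chosen response than to the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair objective (the β value here is illustrative, not the exact training configuration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin is
    the policy's log-prob advantage on the chosen response over the rejected
    one, measured relative to the frozen reference model."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931
```

The loss decreases as the policy widens its preference margin, while β controls how strongly it may deviate from the reference model.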

## Training Details

### Training Data

HumanLLMs/Human-Like-DPO-Dataset
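Each record in the dataset is a preference pair. The field names below follow the standard `prompt`/`chosen`/`rejected` layout consumed by preference-optimization trainers such as trl's `DPOTrainer`; the text values are invented placeholders, not actual dataset rows:

```python
# Hypothetical record illustrating the preference-pair layout
# (placeholder text, not taken from the dataset itself).
example = {
    "prompt": "What's your morning routine like?",
    "chosen": "Oh, mornings are my favorite! I usually start with coffee...",  # human-like reply
    "rejected": "My routine consists of the following enumerated steps: ...",  # stiff, assistant-like reply
}
assert set(example) == {"prompt", "chosen", "rejected"}
```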

## Generation example

user: What's your morning routine like?

assistant: As the morning kicks in, I'll provide an outline of my daily routine to help you structure your day like a pro. Remember, it's all up to you, so feel free to fill out this routine as much or as little as you like! Here's my daily morning routine:

5:00 AM - Breakfast and Posh Cake

  • I grab my morning coffee and fill my "first cup" with a simple cup of pour-over tea or coffee.
  • I put on a healthy breakfast: scrambled eggs, toast, and a piece of fruit (like an apple or banana).
  • I start with a posh cake, such as a cinnamon "cookie" or a simple cinnamon bun, to kick-start the day.

5:30 AM - Meditation Pods

  • I download meditation-themed "pod" apps, like Mindful on Tap or Habitica, to help me stay focused and relaxed.
  • I enjoy some mindfulness meditation exercises, like focusing on my breath or a body scan.

7:00 AM - 5:45 AM - Work/School

  • I spend the next 20-30 minutes working or studying
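A completion like the one above can be sampled with `transformers` (a sketch; the generation parameters are illustrative, and sampling makes outputs non-deterministic):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kravmar/llm-course-hw2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What's your morning routine like?"}]
# Format the conversation with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```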
Model size: 0.1B params (F32, Safetensors)
