# Model Card for kravmar/llm-course-hw2-dpo

## Model Details

### Model Description

SmolLM-135M-Instruct aligned with Direct Preference Optimization (DPO). The model was trained on the HumanLLMs/Human-Like-DPO-Dataset, which was built to improve LLM fluency and make model responses more human-like.
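DPO optimizes a simple logistic loss over preference pairs, pushing the policy to assign a higher log-probability margin to the chosen response than to the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair objective (the β value here is illustrative, not the exact training configuration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin is
    the policy's log-prob advantage on the chosen response over the rejected
    one, measured relative to the frozen reference model."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931
```

The loss decreases as the policy widens its preference margin, while β controls how strongly it may deviate from the reference model.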

## Training Details

### Training Data

HumanLLMs/Human-Like-DPO-Dataset
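Each record in the dataset is a preference pair. The field names below follow the standard `prompt`/`chosen`/`rejected` layout consumed by preference-optimization trainers such as trl's `DPOTrainer`; the text values are invented placeholders, not actual dataset rows:

```python
# Hypothetical record illustrating the preference-pair layout
# (placeholder text, not taken from the dataset itself).
example = {
    "prompt": "What's your morning routine like?",
    "chosen": "Oh, mornings are my favorite! I usually start with coffee...",  # human-like reply
    "rejected": "My routine consists of the following enumerated steps: ...",  # stiff, assistant-like reply
}
assert set(example) == {"prompt", "chosen", "rejected"}
```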

## Generation example

user: What's your morning routine like?

assistant: As the morning kicks in, I'll provide an outline of my daily routine to help you structure your day like a pro. Remember, it's all up to you, so feel free to fill out this routine as much or as little as you like! Here's my daily morning routine:

5:00 AM - Breakfast and Posh Cake

  • I grab my morning coffee and fill my "first cup" with a simple cup of pour-over tea or coffee.
  • I put on a healthy breakfast: scrambled eggs, toast, and a piece of fruit (like an apple or banana).
  • I start with a posh cake, such as a cinnamon "cookie" or a simple cinnamon bun, to kick-start the day.

5:30 AM - Meditation Pods

  • I download meditation-themed "pod" apps, like Mindful on Tap or Habitica, to help me stay focused and relaxed.
  • I enjoy some mindfulness meditation exercises, like focusing on my breath or a body scan.

7:00 AM - 5:45 AM - Work/School

  • I spend the next 20-30 minutes working or studying
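A completion like the one above can be sampled with `transformers` (a sketch; the generation parameters are illustrative, and sampling makes outputs non-deterministic):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kravmar/llm-course-hw2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "What's your morning routine like?"}]
# Format the conversation with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```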
Model size: 0.1B params (F32, Safetensors)
