llm-course-hw2-dpo / README.md
kravmar's picture
Update README.md
644e90c verified
metadata
library_name: transformers
datasets:
  - HumanLLMs/Human-Like-DPO-Dataset
language:
  - en
base_model:
  - HuggingFaceTB/SmolLM-135M-Instruct
pipeline_tag: text-generation

Model Card for Model ID

Model Details

Model Description

SmolLM-135M-Instruct aligned with Direct Preference Optimization (DPO). The model was trained on Human-Like-DPO-Dataset, which was created for LLM fluency and making their responses more human-like.

Training Details

Training Data

HumanLLMs/Human-Like-DPO-Dataset

Generation example

user: What's your morning routine like?

assistant: As the morning kicks in, I'll provide an outline of my daily routine to help you structure your day like a pro. Remember, it's all up to you, so feel free to fill out this routine as much or as little as you like! Here's my daily morning routine:

5:00 AM - Breakfast and Posh Cake

  • I grab my morning coffee and fill my "first cup" with a simple cup of pour-over tea or coffee.
  • I put on a healthy breakfast: scrambled eggs, toast, and a piece of fruit (like an apple or banana).
  • I start with a posh cake, such as a cinnamon "cookie" or a simple cinnamon bun, to kick-start the day.

5:30 AM - Meditation Pods

  • I download meditation-themed "pod" apps, like Mindful on Tap or Habitica, to help me stay focused and relaxed.
  • I enjoy some mindfulness meditation exercises, like focusing on my breath or a body scan.

7:00 AM - 5:45 AM - Work/School

  • I spend the next 20-30 minutes working or studying