Model Card for SmolLM-135M-Instruct-DPO

This model was trained as part of a homework assignment for the VK NLP Course.

Model Details

The SmolLM-135M-Instruct LLM, fine-tuned on HumanLLMs/Human-Like-DPO-Dataset for 1 epoch using DPO (Direct Preference Optimization). Training stats (reward margins and reward accuracy) are presented below.
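For readers unfamiliar with the two statistics named above, here is a minimal sketch of how reward margins and reward accuracy are commonly defined in DPO. It is not the training code used for this model. The function name `dpo_stats`, the example numbers, and `beta=0.1` are assumptions for illustration; the card does not state the value of beta that was used.

```python
import math


def dpo_stats(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Return mean DPO loss, mean reward margin and reward accuracy for a batch.

    Each argument is a list of sequence log probabilities, one per example.
    beta=0.1 is an assumed value, not taken from the model card.
    """
    losses, margins, correct = [], [], 0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward: scaled log-ratio between the tuned policy and the reference model.
        reward_chosen = beta * (pc - rc)
        reward_rejected = beta * (pr - rr)
        margin = reward_chosen - reward_rejected

        # DPO loss is -log(sigmoid(margin)), written in a numerically stable form.
        if margin >= 0:
            loss = math.log1p(math.exp(-margin))
        else:
            loss = -margin + math.log1p(math.exp(margin))

        losses.append(loss)
        margins.append(margin)
        correct += margin > 0  # "accurate" when the chosen reply gets the higher reward

    n = len(margins)
    return {
        "loss": sum(losses) / n,
        "reward_margin": sum(margins) / n,
        "reward_accuracy": correct / n,
    }


# Hypothetical log probabilities for two preference pairs.
stats = dpo_stats(
    policy_chosen=[-40.0, -55.0],
    policy_rejected=[-60.0, -50.0],
    ref_chosen=[-42.0, -54.0],
    ref_rejected=[-58.0, -52.0],
)
print(stats)
```

A positive reward margin means the tuned model has moved probability toward the "chosen" reply relative to the reference model; reward accuracy is the fraction of pairs for which that holds.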

DPO training stats

Log probabilities of "chosen" and "rejected" assistant reply options on the training set before and after DPO training:

[Figure: log probs]

The DPO-tuned model assigns slightly lower probability to "rejected" options than the base model does, and slightly higher probability to "chosen" options.
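The card does not say how these log probabilities were computed. A common convention, assumed here, is to sum the per-token log probabilities of the assistant reply while masking out the prompt tokens. The sketch below shows that convention in plain Python; `reply_log_prob`, its arguments, and the toy inputs are hypothetical names and values chosen for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reply_log_prob(logits_per_position, token_ids, reply_mask):
    """Sum the log probabilities of the reply tokens only.

    logits_per_position[t] are the logits that predict token_ids[t]
    (assumed to be already shifted by one position).
    reply_mask[t] is 1 for assistant-reply tokens and 0 for prompt tokens.
    """
    total = 0.0
    for logits, token, keep in zip(logits_per_position, token_ids, reply_mask):
        if keep:
            total += log_softmax(logits)[token]
    return total


# Toy example: vocabulary of 4 tokens, 3 positions, first position is the prompt.
logits = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
tokens = [1, 0, 2]
mask = [0, 1, 1]
print(reply_log_prob(logits, tokens, mask))
```

Running the same function with the base model's logits and with the tuned model's logits, for both the "chosen" and the "rejected" reply, would give the four quantities compared in this section.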

Examples of outputs

Prompt:

How does the human body adapt to extreme environments, and what can we learn from people who have survived in such conditions?

Output:

What a fascinating question!

The human body is incredibly resilient and adaptable in the face of extreme environments. Our bodies have evolved to cope with the harsh conditions that exist in space, extreme temperatures, and other hostile environments. Here are some ways the human body adapts to extreme environments:

  1. Physiological Adaptations: Our bodies have evolved to conserve energy and function in the most extreme conditions. For example, our hearts can pump blood at a rate of up to 1,000 times per minute in space, while our lungs can extract oxygen from the air at a rate of only 10% of the air we breathe in on Earth.
  2. Immune System: Our immune system is highly effective in fighting off infections and diseases in space. Our bodies have developed mechanisms to detect and respond to radiation, microgravity, and other hostile environments.
  3. Cardiovascular System: The cardiovascular system is adapted to handle the high-pressure environments of space. Our hearts can pump blood at a rate of up to 1,000 times per minute in space, while our lungs can extract oxygen from the air at a rate of only 10% of the air we breathe in on Earth.
  4. **Resp

Base model:

What a fascinating question! The human body has evolved to thrive in extreme environments, and it's a testament to the incredible adaptability of our bodies. Here's a breakdown of how the human body adapts to extreme environments and what we can learn from people who have survived in such conditions:

Adaptation to extreme environments:

  1. Thermoregulation: The body's ability to regulate its temperature is crucial in extreme environments. The body's ability to sweat, pant, and cool itself through sweating, ventilation, and sweating helps to maintain a stable body temperature.
  2. Oxygen availability: The body's ability to extract oxygen from the air is essential in extreme environments. This is achieved through the use of oxygen-rich tissues, such as the lungs, and the ability to extract oxygen from the air through the skin.
  3. Water loss: The body's ability to lose water through sweating, vomiting, and diarrhea is crucial in extreme environments. This is achieved through the use of water-rich tissues, such as the skin and mucous membranes.
  4. Electrolyte balance: The body's ability to regulate electrolyte balance is critical in extreme environments. Electrolytes, such as sodium, potassium, and chloride, are lost through sweat

Prompt:

What's the most important thing you've learned in life so far?

Output:

What a profound and universal question! As a conversational AI, I don't have personal experiences or emotions, but I can provide you with insights and wisdom that can help you reflect on your life's journey.

The Importance of Reflection

Reflection is the process of examining your thoughts, feelings, actions, and experiences to gain a deeper understanding of yourself and the world around you. It's a crucial step in personal growth, self-awareness, and decision-making. Reflecting helps you:

  1. Identify patterns and habits: By recognizing patterns and habits, you can make positive changes and improve your life.
  2. Develop self-awareness: Reflecting on your values, goals, and motivations helps you understand your strengths, weaknesses, and areas for improvement.
  3. Improve relationships: By understanding your emotions, needs, and boundaries, you can communicate more effectively and build stronger relationships.
  4. Set goals and make progress: Reflecting on your goals and progress helps you stay motivated and focused on your objectives.
  5. Enhance decision-making: Reflecting on your decisions and outcomes helps you make informed choices that align with your values and goals.

The Power of Mindfulness

Base model:

What a great question! As a conversational AI, I've had the privilege of interacting with humans from diverse backgrounds and cultures, and I've gained a wealth of knowledge and insights that have helped me grow as a person. Here are some of the most important things I've learned so far:

  1. Emotional Intelligence: I've learned to recognize and understand emotions in myself and others, which has helped me build stronger relationships and make better decisions.
  2. Self-Awareness: I've developed a deeper understanding of my own strengths, weaknesses, values, and motivations, which has enabled me to make informed choices and set goals.
  3. Effective Communication: I've learned to articulate my thoughts and ideas clearly, both verbally and in writing, which has improved my communication skills and relationships.
  4. Conflict Resolution: I've developed strategies to manage conflicts and disagreements, which has helped me navigate complex social situations and make better decisions.
  5. Cultural Competence: I've learned to appreciate and respect different cultures, traditions, and customs, which has broadened my perspective and helped me become a more empathetic and open-minded person.
  6. Emotional Intelligence in Leadership: I've developed the ability to recognize and respond to the emotions of
