---
library_name: transformers
license: mit
datasets:
- roneneldan/TinyStories
- Salesforce/wikitext
- abhinand/alpaca-gpt4-sharegpt
- shibing624/sharegpt_gpt4
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- ajibawa-2023/SlimOrca-ShareGPT
- junelee/wizard_vicuna_70k
- meta-math/MetaMathQA
- HuggingFaceH4/MATH-500
- hkust-nlp/dart-math-pool-math
- TIGER-Lab/MathInstruct
- LimYeri/Python_Code_Instructions
language:
- en
pipeline_tag: text-generation
---
# Arsh-llm: A Compact 500M Parameter Powerhouse 🚀
**Arsh-llm** is a 500-million-parameter language model built on the Llama architecture, designed to shine in generating creative stories, coherent text, and functional code. Pretrained for 35 hours on a T4 GPU using a curated mix of small yet powerful datasets, and fine-tuned for 20 hours on conversational data, this model is a lean, mean, text-generating machine with massive potential. With a training loss in the **1.2–1.9** range, it’s already showing promise and is ready to level up with more training. Buckle up—this is just the beginning! 😎
## Model Overview
- **Architecture**: Llama-based causal language model
- **Parameters**: 500M
- **Context Length**: 128 tokens
- **Pretraining Duration**: ~35 hours on NVIDIA T4 GPU
- **Fine-tuning Duration**: ~20 hours on conversational datasets
- **Training Loss**: 1.2–1.9 (with room to improve!)
- **Library**: Transformers (Hugging Face)
- **License**: MIT
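To double-check the architecture and parameter count listed above, you can inspect the checkpoint directly. A minimal sketch, assuming the `arshiaafshani/Arsh-llm` repo id used in the Getting Started example below:
```python
from transformers import AutoConfig, AutoModelForCausalLM

# Inspect the Llama-style config without downloading the weights
config = AutoConfig.from_pretrained("arshiaafshani/Arsh-llm")
print(config.model_type, config.hidden_size, config.num_hidden_layers)

# Load the weights and count parameters (~500M expected)
model = AutoModelForCausalLM.from_pretrained("arshiaafshani/Arsh-llm")
print(f"Parameters: {sum(p.numel() for p in model.parameters()) / 1e6:.0f}M")
```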
## Datasets
Arsh-llm was trained on a diverse set of datasets to ensure versatility in storytelling, text generation, and code-related tasks:
- **roneneldan/TinyStories**: Short, creative stories for narrative generation.
- **Salesforce/wikitext**: Wikipedia-based text for general knowledge and coherence.
- **abhinand/alpaca-gpt4-sharegpt**: Instruction-based conversational data for task-oriented responses.
- **shibing624/sharegpt_gpt4**: High-quality conversational data for chat-like interactions.
- **ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions**: Math problems with solutions to boost logical reasoning.
- **Additional instruction, math, and code sets** (also tagged in the model metadata): ajibawa-2023/SlimOrca-ShareGPT, junelee/wizard_vicuna_70k, meta-math/MetaMathQA, HuggingFaceH4/MATH-500, hkust-nlp/dart-math-pool-math, TIGER-Lab/MathInstruct, and LimYeri/Python_Code_Instructions.
Fine-tuning used ShareGPT-style conversations formatted with a structured chat template to enhance conversational abilities, making Arsh-llm a great starting point for dialogue-based applications.
## Use Cases
Arsh-llm is a versatile model with applications in:
- **Creative Writing**: Generate engaging short stories or narrative prompts.
- **Code Generation**: Produce functional code snippets for various programming tasks.
- **Conversational AI**: Power chatbots or assistants with natural dialogue.
- **Educational Tools**: Assist with math problem-solving or explain concepts step-by-step.
> **Note**: This model is a work in progress. For production-grade performance, further pretraining on larger datasets and post-training on conversational data is recommended.
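For the creative-writing and code-generation use cases above, the model can also be driven as a plain text-generation pipeline, without the chat loop shown in the next section. A minimal sketch; the prompt and sampling settings are only illustrative, not tuned values:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="arshiaafshani/Arsh-llm")

# Single-shot prompt completion, e.g. a story opening
prompt = "Once upon a time, a tiny robot learned to paint."
output = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.8)
print(output[0]["generated_text"])
```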
## Getting Started
To use Arsh-llm, you can load it directly from Hugging Face:
```python
import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "arshiaafshani/Arsh-llm"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Arsh-llm is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history with a system prompt
conversation_history = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exited from the chat. Bye!")
        break

    # Append the user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text from the formatted prompt
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

    # The returned 'generated_text' includes the prompt plus the generation
    full_text = response[0]["generated_text"]

    # Extract the assistant's response by removing the prompt portion
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Keep the assistant's reply in the history so later turns have full context
    conversation_history.append({"role": "assistant", "content": bot_response})
```
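If you prefer to work below the pipeline abstraction, the same checkpoint can be loaded with `AutoTokenizer` and `AutoModelForCausalLM` and driven through `generate()`. A minimal sketch; the sampling settings simply mirror the pipeline example above:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "arshiaafshani/Arsh-llm"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Write a short story about a lighthouse keeper."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Keep the generation short, given the 128-token context window
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        top_k=50,
        temperature=0.6,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```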
## Training Details
- **Pretraining**: Conducted on a T4 GPU for ~35 hours using a mix of TinyStories, WikiText, and other datasets to build a strong foundation in text and story generation.
- **Fine-tuning**: ~20 hours on ShareGPT-based conversational data formatted with a structured chat template to enhance dialogue capabilities (a formatting sketch follows this list).
- **Hardware**: NVIDIA T4 GPU (15GB VRAM).
- **Training Loss**: Achieved 1.2–1.9, indicating solid performance with significant potential for improvement through extended training.
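To illustrate the chat-template step used during fine-tuning, the sketch below formats a short ShareGPT-style conversation into a single flat string. The conversation content is invented for illustration; the actual template string ships with the tokenizer in the repo:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

# A ShareGPT-style exchange converted to role/content messages
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a prime number is."},
    {"role": "assistant", "content": "A prime number has exactly two divisors: 1 and itself."},
]

# Render the turns into one training/eval string using the tokenizer's chat template
text = tokenizer.apply_chat_template(conversation, tokenize=False)
print(text)
```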
## Limitations
- **Current Stage**: Arsh-llm is not yet fully optimized. It performs well for its size but requires additional training to compete with larger models.
- **Dataset Size**: Pretrained on relatively small datasets, which limits its generalization. Scaling up to larger datasets will unlock its full potential.
- **Context Length**: Limited to 128 tokens, which may constrain performance on longer sequences (a truncation sketch follows this list).
- **Not Production-Ready**: This model is best used as a base for further fine-tuning rather than as a standalone solution.
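Because of the 128-token context window noted above, longer inputs should be truncated before generation. A minimal sketch; the 96-token cutoff is only an illustrative way to leave room for generated tokens:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("arshiaafshani/Arsh-llm")

long_prompt = "Summarize this story: " + "the fox ran through the field " * 50
# Keep the prompt well inside the 128-token window so generation has headroom
inputs = tokenizer(long_prompt, truncation=True, max_length=96, return_tensors="pt")
print(inputs["input_ids"].shape)  # at most (1, 96)
```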
## Future Plans
The journey doesn’t end here! Arsh-llm is set to evolve with:
- **Extended Pretraining**: Leveraging larger datasets for broader knowledge and better generalization.
- **Conversational Fine-tuning**: Enhancing dialogue capabilities with advanced post-training techniques.
- **Benchmarking**: Evaluating performance against similar models (e.g., TinyLlama, Phi-1.5) on tasks like MMLU, HumanEval, and GSM8K.
- **Community Feedback**: Incorporating user insights to refine and improve the model.
Stay tuned—Arsh-llm is on its way to becoming a legend! 🔥
## License
This model is licensed under the MIT License, allowing for flexible use in both research and commercial applications. Feel free to build upon, modify, or share it!
## Acknowledgments
- Built with ❤️ by Arshia Afshani.
- Powered by the Hugging Face Transformers library.
- Thanks to the open-source community for providing the amazing datasets that made this model possible.
---
**Ready to take Arsh-llm for a spin?** Clone it, train it, and let’s make it a superstar together! 🌟 For questions, feedback, or collabs, reach out via Hugging Face or open an issue in the repo.