emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit

4-bit precision


emredeveloper committed on Jan 23

Commit 0e5a729 (verified) · 1 Parent(s): 97d04de

Update README.md

Files changed (1)

README.md +88 -1


@@ -6,4 +6,91 @@ base_model:
 - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 tags:
 - cot
----

+---
+# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit
+<!-- Provide a quick summary of what the model is/does. -->
+This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed using the `bitsandbytes` library.
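To give a rough sense of the memory reduction mentioned above, here is a back-of-the-envelope sketch. It assumes a nominal parameter count of 1.5 billion (taken from the model name, not measured) and counts weight storage only. It ignores quantization constants, layers that may stay in higher precision, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given bit width."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: nominal size taken from the model name, not an exact count.
NUM_PARAMS = 1.5e9

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 16-bit baseline
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: ~{bf16_gib:.2f} GiB")
print(f"4-bit weights:  ~{nf4_gib:.2f} GiB")
print(f"reduction:      ~{bf16_gib / nf4_gib:.0f}x (weights only)")
```

The ratio is a best case for the weights alone; measure actual usage on your hardware before relying on it.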
+## Model Details
+### Model Description
+- **Developed by:** [Your Name or Organization]
+- **Funded by [optional]:** [Your Funding Source, if applicable]
+- **Shared by:** [Your Name or Organization]
+- **Model type:** Transformer-based Language Model
+- **Language(s) (NLP):** English
+- **License:** MIT
+- **Quantized from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`
+### Model Sources [optional]
+- **Repository:** [Link to your GitHub repository, if applicable]
+- **Paper [optional]:** [Link to the paper, if applicable]
+- **Demo [optional]:** [Link to a live demo, if applicable]
+## Uses
+### Direct Use
+This model is intended for research and practical applications where memory efficiency is critical. It can be used for:
+- Text generation
+- Language understanding tasks
+- Chatbots and conversational AI
+### Downstream Use [optional]
+With parameter-efficient methods such as LoRA adapters trained on top of the frozen 4-bit weights, this model can be fine-tuned for specific tasks such as:
+- Sentiment analysis
+- Text classification
+- Summarization
+### Out-of-Scope Use
+This model is not suitable for:
+- Tasks that require full 16-bit or 32-bit weight precision
+- Applications requiring extremely low latency
+## Bias, Risks, and Limitations
+The model may inherit biases present in the training data. Users should be cautious when deploying the model in sensitive applications.
+### Recommendations
+Users should evaluate the model's performance on their specific tasks and datasets before deployment. Consider fine-tuning the model for better alignment with your use case.
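One possible way to act on this recommendation is to compare perplexity between the quantized model and the base model on your own text. The sketch below assumes you have already collected per-token negative log-likelihoods (natural log) from each model; the helper name `perplexity` is hypothetical and the numbers are placeholders for illustration, not measurements of this model.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Placeholder values for illustration only, not measurements of this model.
base_nlls = [2.1, 1.8, 2.4, 2.0]
quant_nlls = [2.2, 1.9, 2.5, 2.1]

base_ppl = perplexity(base_nlls)
quant_ppl = perplexity(quant_nlls)
relative_increase = quant_ppl / base_ppl - 1.0

print(f"base={base_ppl:.2f} quantized={quant_ppl:.2f} (+{relative_increase:.1%})")
```

A small relative increase on representative data suggests the quantized model is an acceptable substitute for that task; what counts as "small" depends on the application.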
+## How to Get Started with the Model
+Use the code below to get started with the model:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+import torch
+# Quantization configuration
+quantization_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+    bnb_4bit_use_double_quant=True
+)
+# Load the tokenizer and model from this repository
+model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    quantization_config=quantization_config,
+    device_map="auto",  # place weights on the available device(s) automatically
+    trust_remote_code=True
+)
+# Generate text
+input_text = "Hello, how are you?"
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=50)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))