---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
- coder
- Math
- RL
---

![8.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/HXno2Of6OWRexX88f2wtL.png)

# **Eratosthenes-Polymath-14B-Instruct**

> **Eratosthenes-Polymath-14B-Instruct** is built on the Qwen 2.5 14B architecture and is engineered to excel at mathematical reasoning, distributed reinforcement learning (RL), and general-purpose problem solving. The model is fine-tuned on chain-of-thought reasoning datasets, optimization-focused corpora, and structured-reasoning datasets to strengthen logical deduction, multi-step reasoning, and decision-making.

## **Key Improvements**
1. **Advanced Mathematical Reasoning**:
   Excels at solving complex equations, symbolic computation, theorem proving, and step-by-step mathematical problem solving.

2. **Distributed Reinforcement Learning Expertise**:
   Fine-tuned for robust policy optimization using distributed RL techniques, giving resilient behavior across dynamic problem spaces.

3. **General-Purpose Reasoning and Problem Solving**:
   Strong across a broad range of domains, handling factual questions, logical analysis, and multi-step cognitive tasks.

4. **Long-Context Mastery**:
   Supports up to 128K tokens of context and can generate up to 8K tokens, enabling detailed, coherent long-form outputs and complex derivations (see the long-context sketch after this list).

5. **Superior Instruction Following**:
   Follows complex, structured prompts precisely, maintaining focus and clarity over extended dialogues.

6. **Coding and Algorithmic Fluency**:
   Effective at code generation, debugging, algorithm design, and optimization modeling across a range of programming languages.
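
Qwen2.5 checkpoints natively cover a 32K-token window, and the upstream Qwen2.5 model cards document a YaRN rope-scaling recipe for stretching that to roughly 128K. The snippet below is a minimal sketch of that recipe applied at load time; it assumes this fine-tune inherits the base model's rope-scaling behavior, and the factor and field values are taken from the upstream documentation rather than verified against this checkpoint.

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "prithivMLmods/Eratosthenes-Polymath-14B-Instruct"

# Sketch: enable static YaRN rope scaling for long inputs, following the
# upstream Qwen2.5 recipe (4.0 x the 32,768-token native window, about 128K).
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```

Note that static YaRN applies the scaling regardless of input length and can slightly degrade quality on short prompts, so the upstream cards suggest enabling it only when long inputs are actually needed.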

## **Quickstart with transformers**

Run the model with the `transformers` library, using `apply_chat_template` to format the conversation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Eratosthenes-Polymath-14B-Instruct"

# Load the model and tokenizer; device_map="auto" places the weights on the
# available accelerator(s) automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the connection between distributed reinforcement learning and robust policy optimization."
messages = [
    {"role": "system", "content": "You are an expert assistant specializing in mathematics, optimization, and reinforcement learning."},
    {"role": "user", "content": prompt}
]

# Render the conversation into a single prompt string with the chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
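
For interactive use, you can stream tokens to stdout as they are generated and set explicit sampling parameters. This is a minimal sketch using the `TextStreamer` helper from `transformers`; it reuses `model`, `tokenizer`, and `model_inputs` from the quickstart, and the sampling values are illustrative defaults rather than tuned recommendations for this checkpoint.

```python
from transformers import TextStreamer

# Prints decoded tokens as they arrive, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,    # sample instead of greedy decoding
    temperature=0.6,   # illustrative value, not a tuned recommendation
    top_p=0.95,
    streamer=streamer,
)
```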

## **Intended Use**
1. **Mathematical and Optimization Problem Solving**:
   Designed for solving complex mathematical problems, optimization modeling, symbolic logic, and structured derivations.

2. **Distributed Reinforcement Learning Research**:
   Supports designing, analyzing, and explaining distributed RL systems, robust policy optimization, and autonomous decision systems.

3. **General Knowledge and Reasoning**:
   Effective at answering a wide range of questions and performing structured reasoning across scientific, technical, and educational domains.

4. **Educational and Research Support**:
   Useful for students, researchers, and professionals seeking detailed explanations, derivations, and scientific insights.

5. **Code Writing and Algorithm Design**:
   Excels at creating, optimizing, and explaining algorithms, particularly those relevant to mathematical computation and optimization.

6. **Intelligent Conversational Systems**:
   Well suited to technical conversational agents and educational bots that require deep understanding and detailed reasoning.

7. **Long-Form Technical Content Generation**:
   Capable of producing structured, coherent articles, tutorials, and research papers, especially in technical and mathematical fields.

8. **Structured Data Generation**:
   Supports structured output formats such as proofs, equations, tables, and JSON for scientific and technical workflows (see the sketch after this list).
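
As an illustration of the structured-output use case, the sketch below nudges the model toward JSON with a system instruction and validates the reply. The schema and prompt are hypothetical examples; JSON adherence here is prompt-driven rather than guaranteed, so outputs should be parsed defensively.

```python
import json

# Hypothetical schema and prompt; reuses `model` and `tokenizer` from above.
messages = [
    {"role": "system", "content": 'Respond only with valid JSON of the form {"answer": string, "steps": [string]}.'},
    {"role": "user", "content": "Solve 3x + 5 = 20 and show your steps."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
reply = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)

# The model may occasionally emit malformed JSON; validate before use.
try:
    result = json.loads(reply)
except json.JSONDecodeError:
    result = None  # retry or repair in real code
```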

## **Limitations**
1. **Heavy Hardware Requirements**:
   Its large parameter count and long-context handling require powerful GPUs or TPUs with substantial memory.

2. **Potential for Training Biases**:
   Outputs may reflect biases present in the mathematical, technical, or optimization-focused datasets used during training.

3. **Less Effective in Creative Tasks**:
   Oriented toward technical and logical reasoning rather than freeform creative writing or storytelling.

4. **No Real-Time Event Awareness**:
   Limited to knowledge available before its training cutoff, with no access to live or real-world updates.

5. **Prompt Sensitivity**:
   Performance can vary with the clarity, structure, and specificity of the prompt, particularly for complex multi-step tasks.

6. **Error Propagation Risk**:
   Small inaccuracies early in a long-form output can propagate and degrade the coherence of the final answer.