update context length
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 pipeline_tag: text-generation
-base_model: ibm-granite/granite-8b-code-base
+base_model: ibm-granite/granite-8b-code-base-4k
 inference: false
 license: apache-2.0
 datasets:
@@ -19,7 +19,7 @@ tags:
 - code
 - granite
 model-index:
-- name: granite-8b-code-instruct
+- name: granite-8b-code-instruct-4k
   results:
   - task:
       type: text-generation
@@ -205,10 +205,10 @@ model-index:
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cd5057674cdb524450093d/1hzxoPwqkBJXshKVVe6_9.png)
 
-# Granite-8B-Code-Instruct
+# Granite-8B-Code-Instruct-4K
 
 ## Model Summary
-**Granite-8B-Code-Instruct** is a 8B parameter model fine tuned from *Granite-8B-Code-Base* on a combination of **permissively licensed** instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills.
+**Granite-8B-Code-Instruct-4K** is an 8B parameter model fine-tuned from *Granite-8B-Code-Base-4K* on a combination of **permissively licensed** instruction data to enhance instruction-following capabilities, including logical reasoning and problem-solving skills.
 
 - **Developers:** IBM Research
 - **GitHub Repository:** [ibm-granite/granite-code-models](https://github.com/ibm-granite/granite-code-models)
@@ -223,13 +223,13 @@ The model is designed to respond to coding related instructions and can be used
 <!-- TO DO: Check starcoder2 instruct code example that includes the template https://huggingface.co/bigcode/starcoder2-15b-instruct-v0.1 -->
 
 ### Generation
-This is a simple example of how to use **Granite-8B-Code-Instruct** model.
+This is a simple example of how to use the **Granite-8B-Code-Instruct-4K** model.
 
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 device = "cuda" # or "cpu"
-model_path = "ibm-granite/granite-8b-code-instruct"
+model_path = "ibm-granite/granite-8b-code-instruct-4k"
 tokenizer = AutoTokenizer.from_pretrained(model_path)
 # drop device_map if running on CPU
 model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
@@ -265,4 +265,4 @@ Granite Code Instruct models are trained on the following types of data.
 We train the Granite Code models using two of IBM's supercomputing clusters, namely Vela and Blue Vela, outfitted with NVIDIA A100 and H100 GPUs, respectively. These clusters provide a scalable and efficient infrastructure for training our models over thousands of GPUs.
 
 ## Ethical Considerations and Limitations
-Granite code instruct models are primarily finetuned using instruction-response pairs across a specific set of programming languages. Thus, their performance may be limited with out-of-domain programming languages. In this situation, it is beneficial providing few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models on critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to *[Granite-8B-Code-Base](https://huggingface.co/ibm-granite/granite-8b-code-base)* model card.
+Granite Code Instruct models are primarily fine-tuned on instruction-response pairs across a specific set of programming languages, so their performance may be limited on out-of-domain programming languages. In that situation, it is beneficial to provide few-shot examples to steer the model's output. Moreover, developers should perform safety testing and target-specific tuning before deploying these models in critical applications. The model also inherits ethical considerations and limitations from its base model. For more information, please refer to the *[Granite-8B-Code-Base-4K](https://huggingface.co/ibm-granite/granite-8b-code-base-4k)* model card.
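The generation snippet in the diff stops at the hunk boundary, right after the model is loaded. For completeness, here is a minimal end-to-end sketch of how the renamed checkpoint might be used. Only the model path and loading code come from this commit; the chat-template call, the prompt, and the decoding settings are illustrative assumptions (the template usage assumes the tokenizer ships one, as most instruct models on the Hub do).

```python
# Hedged completion of the card's generation example; prompt, chat-template
# usage, and decoding settings are assumptions, not taken from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "ibm-granite/granite-8b-code-instruct-4k"

tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# Build the prompt; assumes the tokenizer defines a chat template.
chat = [
    {"role": "user", "content": "Write a Python function to reverse a string."},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Tokenize and move the inputs to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a bounded number of new tokens; adjust for your use case.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Note that passing `device_map` to `from_pretrained` requires the `accelerate` package; on CPU, drop it as the card's comment suggests and the model loads on the default device.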