---
library_name: transformers
tags:
- gpt
- byte-tokenization
- mobile
- embedded
- onnx
license: cc-by-nc-4.0
datasets:
- custom
- web
language: en
widget:
- text: "In order to make pancakes, you need to"
- text: "Once upon a time"
---

<p align="center">
  <img src="logo.png" alt="IJK Technology" width="150">
</p>

<h1 align="center">IJK Technology – ByteGPT-small</h1>
|
|
|
|
**ByteGPT-small** is a small GPT-style language model trained with byte-level tokenization, inspired by the ByT5 paper. It is designed for compute- and memory-constrained devices such as mobile phones and embedded systems.
|
|
## 🚀 Overview
| - **Model Type:** GPT-style causal language model |
| - **Tokenizer:** Byte-level tokenization (from ByT5) |
| - **Intended Use:** Edge devices, mobile phones, embedded systems |
| - **Size:** Small (initial prototype) |
| - **Training:** Custom-trained from scratch |
|
|
## 🧠 Why Byte Tokenization?
| Byte tokenization offers several advantages for small-scale, efficient models: |
|
|
| 1. **Reduced Memory Footprint:** |
| Byte-level tokenization drastically reduces the size of the embedding layer, making the model suitable for devices with limited RAM. |
|
|
2. **No External Dependencies:**
Unlike subword tokenizers (e.g., SentencePiece, BPE), byte tokenization requires no external libraries; a few lines of plain Python can handle encoding and decoding (see the sketch after this list).
|
|
| 3. **Robustness to Noise:** |
| Byte-level models are more robust to misspellings, typos, and out-of-vocabulary tokens. |
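
To make points 1 and 2 concrete: a byte-level vocabulary needs only 256 embedding rows plus any reserved special tokens, versus roughly 50K rows for a GPT-2-style BPE vocabulary (at an embedding width of 512, that is about 131K embedding parameters instead of ~25M). The sketch below is illustrative only; the `offset` parameter for special tokens is hypothetical and does not reflect ByteGPT's actual ID layout.

```python
# Minimal sketch of byte-level tokenization (illustrative, not
# ByteGPT's exact ID layout): every UTF-8 byte maps to one token ID,
# so the whole vocabulary is 256 IDs plus any reserved special tokens.

def encode(text: str, offset: int = 0) -> list[int]:
    """One token ID per UTF-8 byte; `offset` leaves room for special tokens."""
    return [b + offset for b in text.encode("utf-8")]

def decode(ids: list[int], offset: int = 0) -> str:
    """Map token IDs back to bytes and decode to text."""
    return bytes(i - offset for i in ids).decode("utf-8", errors="replace")

ids = encode("pancakes, s'il vous plaît")
print(len(ids))     # one ID per byte: accented characters use two IDs
print(decode(ids))  # round-trips to the original string, typos and all
```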
|
|
## 💡 Future Plans
| This is the **first** in a series of models. While this model is not yet highly useful due to its small size, it represents the foundation for future versions. Upcoming releases will include: |
|
|
| - **Larger Models:** Scaled-up versions with better performance |
- **Distilled Models:** Using GRPO distillation to create highly efficient small models
| - **Benchmark Results:** Comparative performance on mobile devices |
|
|
## 💻 Usage
|
|
### Quick Start (with `transformers`)
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-small", trust_remote_code=True) |
| tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small") |
| |
| input_text = "What is the capital of France?" |
| inputs = tokenizer(input_text, return_tensors="pt") |
| outputs = model.generate(**inputs, max_new_tokens=100) |
| |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |
|
|
| ### Tokenizer |
|
|
The tokenizer is byte-level and compatible with Hugging Face's `AutoTokenizer`:
|
|
| ```python |
| tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small") |
| ``` |
|
|
| ### ONNX |
|
|
The model is also available in ONNX format and can be used with ONNX Runtime. The snippet below reuses the `tokenizer` from the Quick Start and reads the context length from the PyTorch `model` loaded there:
|
|
```python
import onnxruntime as ort
import torch

# Create ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Helper function to generate text with the ONNX model.
# Assumes `model` and `tokenizer` from the Quick Start above.
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0):
    input_ids = prompt_ids.clone()

    for _ in range(max_new_tokens):
        # Condition on at most the last block_size tokens, without
        # truncating the sequence we eventually return
        idx_cond = input_ids[:, -model.block_size:]

        # Run inference
        ort_inputs = {
            'input': idx_cond.cpu().numpy()
        }
        logits = ort_session.run(None, ort_inputs)[0]

        # Only take the last position's predictions
        logits = torch.from_numpy(logits)[:, -1, :]

        # Apply temperature
        if temperature != 1.0:
            logits = logits / temperature

        # Sample the next token from the distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return input_ids

# Test the generation
prompt = "Hello"
prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Generated text: Hello everyone!
# A dinner is only available for St. Loui
```
|
|
| ### Android Usage |
|
|
We've just released an Android SDK; you can find it on our [GitHub](https://github.com/ijktech/ByteGPT-Android).
|
|
| The SDK can be included in your Android project by adding the following to your `build.gradle` file: |
|
|
| ``` |
| repositories { |
| maven { |
| url = uri("https://raw.githubusercontent.com/ijktech/ByteGPT-Android/maven-repo") |
| } |
| } |
| |
| dependencies { |
| implementation("com.github.ijktech:ByteGPT-Android:1.0.9") |
| } |
| ``` |
|
|
|
|
| ### iOS Usage |
|
|
| Coming Soon! |
|
|
|
|
## 📜 License
📄 **CC-BY-NC-4.0**: Free for non-commercial use.
|
|
💼 **Commercial Use**: Contact IJK Technology Ltd for licensing at [james@ijktech.com](mailto:james@ijktech.com).
|
|
## 🛠️ About IJK Technology Ltd
| IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms. |