# Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
[[🏠 Homepage](https://open-bee.github.io/)] [[📄 Arxiv Paper (coming soon)](https://www.google.com/search?q=)] [[🤗 Models](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995)] [[🤗 Datasets (coming soon)](https://www.google.com/search?q=)] [[💻 Code (coming soon)](https://github.com/Open-Bee)]
## Introduction
We introduce **Bee-8B**, a new state-of-the-art, fully open 8B Multimodal Large Language Model (MLLM) designed to close the performance gap with proprietary models by focusing on data quality.
Bee-8B is trained on our new **Honey-Data-15M** corpus, a high-quality supervised fine-tuning (SFT) dataset of approximately 15 million samples. This dataset was meticulously created with our transparent, adaptable, and open-source data curation pipeline, **HoneyPipe**, which systematically cleans noisy data and enriches it with a novel dual-level (short and long) Chain-of-Thought (CoT) strategy.
This dataset enables Bee-8B to achieve exceptional performance, particularly in complex reasoning and factual accuracy, establishing a new standard for fully open MLLMs.
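To make the dual-level CoT idea concrete, here is a minimal sketch of what a short-CoT versus a long-CoT SFT sample could look like. The field names, tags, and content are illustrative assumptions only, not the actual Honey-Data-15M schema:

```python
# Hypothetical illustration of dual-level CoT samples (NOT the real Honey-Data-15M schema).
# Short CoT: a brief rationale for simple queries; long CoT: step-by-step reasoning for hard ones.
short_cot_sample = {
    "image": "chart_001.png",
    "question": "What is the highest bar in the chart?",
    "response": "The tallest bar is labeled 'Q3'. The answer is Q3.",  # concise, one-step rationale
}

long_cot_sample = {
    "image": "geometry_042.png",
    "question": "Find the area of the shaded region.",
    "response": (  # multi-step chain of thought for a reasoning-heavy query
        "<think>Step 1: The outer square has side 4, so its area is 16. "
        "Step 2: The inscribed circle has radius 2, so its area is 4*pi. "
        "Step 3: The shaded region is the square minus the circle: 16 - 4*pi.</think> "
        "The area is 16 - 4π."
    ),
}
```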
## Key Features
- **High-Quality, Large-Scale Dataset:** We release **Honey-Data-15M**, a new 15M-sample SFT corpus. It has undergone extensive cleaning to remove widespread noise and has been enriched with dual-level CoT reasoning to enhance advanced problem-solving capabilities.
- **Fully Open-Source Data Curation Suite:** We provide not just the data, but the entire methodology. **HoneyPipe** and its underlying framework **DataStudio** offer the community a transparent and reproducible pipeline, moving beyond static dataset releases (see the illustrative sketch after this list).
- **State-of-the-Art Open Model:** Our model, **Bee-8B**, achieves state-of-the-art performance among fully open MLLMs and is highly competitive with recent semi-open models like InternVL3.5-8B, demonstrating the power of high-quality data.
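As a rough mental model of what a HoneyPipe-style curation flow does, the sketch below chains cleaning, dual-level CoT enrichment, and verification stages. Every function name here is a hypothetical placeholder, not the released HoneyPipe/DataStudio API:

```python
# A minimal sketch of a data-curation pipeline in the spirit of HoneyPipe.
# All functions are hypothetical placeholders, not the released API.

def is_noisy(sample: dict) -> bool:
    """Cheap rule-based filter: drop empty, truncated, or mismatched QA pairs."""
    return not sample.get("question") or not sample.get("response")

def needs_long_cot(sample: dict) -> bool:
    """Route reasoning-heavy queries (math, charts, logic) to long-CoT enrichment."""
    return any(k in sample["question"].lower() for k in ("prove", "calculate", "area"))

def enrich_with_cot(sample: dict, long: bool) -> dict:
    """Placeholder for an LLM call that rewrites the response with a short or long CoT."""
    sample["response"] = ("<long-cot> " if long else "<short-cot> ") + sample["response"]
    return sample

def verify(sample: dict) -> bool:
    """Placeholder for a fidelity check, e.g. an LLM judge confirming answer consistency."""
    return len(sample["response"]) > 0

def curate(raw_samples: list[dict]) -> list[dict]:
    curated = []
    for s in raw_samples:
        if is_noisy(s):
            continue  # systematic cleaning of widespread noise
        s = enrich_with_cot(s, long=needs_long_cot(s))
        if verify(s):
            curated.append(s)
    return curated
```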
## News
- **[2025.10.13]** 🎉 **Bee-8B is Released!** Our model is now publicly available. You can download it from [Hugging Face](https://huggingface.co/collections/Open-Bee/bee-8b-68ecbf10417810d90fbd9995).
- **[2025.10.13]** 🎉 We are thrilled to release the paper and code for the Open-Bee project! More resources, including the Honey-Data-15M dataset, are coming soon.
## Quickstart
Below, we provide simple examples to show how to use Bee-8B with 🤗 Transformers.
### Using 🤗 Transformers to Chat
```python
import requests
from PIL import Image
from transformers import AutoModel, AutoProcessor
model_path = "Open-Bee/Bee-8B-RL"
# Load model
model = AutoModel.from_pretrained(
model_path,
torch_dtype="float32",
trust_remote_code=True,
).to("cuda")
# Load processor
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
# Define conversation messages
messages = [{
    "role": "user",
    "content": [
        {
            "type": "image",
            "image": "https://huggingface.co/Open-Bee/Bee-8B-RL/assets/logo.png",
        },
        {
            "type": "text",
            "text": "Based on this picture, write an advertising slogan about Bee-8B (a Fully Open Multimodal Large Language Model).",
        },
    ],
}]
# Apply chat template
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
# Load image
image_url = "https://huggingface.co/Open-Bee/Bee-8B-RL/assets/logo.png"
image = Image.open(requests.get(image_url, stream=True).raw)
# Process inputs
inputs = processor(images=image, text=text, return_tensors="pt").to("cuda")
# Generate output
generated_ids = model.generate(
    **inputs,
    max_new_tokens=16384,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.6,
)
output_ids = generated_ids[0][len(inputs.input_ids[0]):]
# Decode output
output_text = processor.decode(output_ids, skip_special_tokens=True)
# Print result
print(output_text)
```
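If you prefer to see tokens printed as they are generated rather than waiting for the full completion, a standard 🤗 Transformers `TextStreamer` can be attached to the same call. This is a generic Transformers pattern rather than Bee-specific documentation; it reuses the `model`, `processor`, and `inputs` objects from the quickstart above and assumes the processor exposes a `tokenizer` attribute, as most multimodal processors do:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated, skipping the prompt.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    **inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.6,
    streamer=streamer,
)
```

Because the chat template was applied with `enable_thinking=True`, the streamed output may include a reasoning trace before the final answer.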
## Experimental Results
<div align="center">
<img src="assets/results.png" alt="Benchmark results for Bee-8B"/>
</div>
1. **New State-of-the-Art:** Bee-8B establishes a new performance standard for fully open MLLMs, proving highly competitive with recent semi-open models across a wide array of benchmarks.
2. **Excellence in Complex Reasoning:** Thanks to the CoT-enriched Honey-Data-15M, Bee-8B shows its most significant advancements in complex math and reasoning. It achieves top scores on challenging benchmarks like **MathVerse**, **LogicVista**, and **DynaMath**.
3. **Superior Document and Chart Understanding:** The model demonstrates powerful capabilities in analyzing structured visual data, securing the top rank on the **CharXiv** benchmark for both descriptive and reasoning questions.
4. **Robust Factual Accuracy:** Bee-8B excels in general VQA tasks that test factual accuracy and core visual skills, ranking first on benchmarks like **CountBench** and **RealWorldQA**.
## Acknowledgements
Bee-8B is developed based on the architectures and codebases of the following projects: [R-4B](https://huggingface.co/YannQi/R-4B), [LLaVA-OneVision](https://github.com/LLaVA-VL/LLaVA-NeXT), [SigLIP2](https://huggingface.co/google/siglip2-so400m-patch14-384), [Qwen3](https://github.com/QwenLM/Qwen3), and evaluated using [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). We sincerely thank these projects for their outstanding contributions to the open-source community. |