---
library_name: transformers
tags:
- multimodal
- gui
license: apache-2.0
datasets:
- chakra-labs/pango
- chakra-labs/pango-sample
language:
- en
base_model:
- ByteDance-Seed/UI-TARS-7B-SFT
pipeline_tag: image-text-to-text
---

# GLADOS-1 — UI-TARS-7B-SFT

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66f5e39d1931d2d4817ab43c/ZGtCgRLXBQQbNMPJVLERW.png)

### Model Description

GLADOS-1 is the first computer-use agent (CUA) model post-trained on **collective, crowd-sourced trajectories**. Leveraging the enormous [PANGO dataset](https://huggingface.co/datasets/chakra-labs/pango-sample) (composed primarily of Chrome-based interactions), its purpose is to offer a lens into what is possible with very large trajectory corpora in computer use. It also represents the first open-sourced post-training pipeline for [UI-TARS](https://arxiv.org/pdf/2501.12326), inspired by the existing [Qwen2VL finetuning series](https://github.com/2U1/Qwen2-VL-Finetune).

This model is designed to:

- **Be compliant.** It has been taught to rigorously follow directions and output action formats compatible with downstream parsers like PyAutoGUI (see the inference sketch below).
- **Understand web productivity applications.** The Pango dataset primarily contains productivity application usage in the browser. Consequently, on OSWorld we observe significantly improved performance on the Chrome task bench.
- **Have strong intuition for visual grounding.** Our experiments are detailed more closely in our [research blog](TBD).
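A minimal inference sketch is shown below. It assumes the checkpoint loads through the standard `transformers` multimodal API like its Qwen2-VL-derived parent (UI-TARS-7B-SFT); the repo id `chakra-labs/GLADOS-1`, the screenshot path, and the instruction text are illustrative placeholders, not confirmed values.

```python
# Minimal sketch, not a reference implementation. Assumes the model follows the
# Qwen2-VL architecture of its UI-TARS-7B-SFT base and loads via transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "chakra-labs/GLADOS-1"  # hypothetical repo id; substitute the actual checkpoint

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

# A screenshot of the current GUI state plus a natural-language instruction.
screenshot = Image.open("screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Open the Settings page and enable dark mode."},
        ],
    }
]

prompt = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=[prompt], images=[screenshot], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens: the model's action prediction.
action_text = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(action_text)
```

The decoded text is the raw action prediction; downstream tooling (e.g., the UI-TARS deployment stack linked below) is responsible for parsing it into concrete PyAutoGUI calls.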

📕 Release Blog | 🤗 Code | 🔧 Deployment (via UI-TARS) | 🖥️ Running on your own computer (via UI-TARS Desktop)

## Citation

```tex
@misc{chakralabs2025glados-1,
  author = {Chakra Labs},
  title  = {GLADOS-1},
  url    = {https://github.com/Chakra-Network/GLADOS-1},
  year   = {2025}
}
```