Commit dc29cf2 · Parent(s): a3af57e

update readme

README.md CHANGED
    
@@ -15,7 +15,7 @@ library_name: transformers
 [UI-TARS-7B-SFT](https://huggingface.co/bytedance-research/UI-TARS-7B-SFT)  |
 [**UI-TARS-7B-DPO**](https://huggingface.co/bytedance-research/UI-TARS-7B-DPO)(Recommended)  |
 [UI-TARS-72B-SFT](https://huggingface.co/bytedance-research/UI-TARS-72B-SFT)  |
-[UI-TARS-72B-DPO](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)
+[**UI-TARS-72B-DPO**](https://huggingface.co/bytedance-research/UI-TARS-72B-DPO)(Recommended)
 
 ## Introduction
 
 UI-TARS is a next-generation native GUI agent model designed to interact seamlessly with graphical user interfaces (GUIs) using human-like perception, reasoning, and action capabilities. Unlike traditional modular frameworks, UI-TARS integrates all key components—perception, reasoning, grounding, and memory—within a single vision-language model (VLM), enabling end-to-end task automation without predefined workflows or manual rules.
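Since the README's metadata declares `library_name: transformers` and the diff marks UI-TARS-7B-DPO as the recommended checkpoint, below is a minimal sketch of how one might load it and ask for a GUI action. The `AutoModelForVision2Seq`/`AutoProcessor` entry points, the chat-message format, and the `screenshot.png` path are assumptions based on standard transformers vision-language usage, not details confirmed by the model card.

```python
# Minimal sketch, not the official usage: load the recommended
# UI-TARS-7B-DPO checkpoint via transformers (the README's declared
# library) and ask for the next GUI action given a screenshot.
# AutoModelForVision2Seq / AutoProcessor and the chat-template message
# format are assumed from generic transformers VLM usage; check the
# model card for the exact interface.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "bytedance-research/UI-TARS-7B-DPO"  # recommended checkpoint per the diff

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, device_map="auto")

# "screenshot.png" is a placeholder GUI screenshot, not a file from the repo.
image = Image.open("screenshot.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is the next action to open the Settings menu?"},
        ],
    }
]

# Render the chat template, bundle text + image tensors, and generate.
text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```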
