ZHANGYUXUAN-zR committed · Commit 1175a06 · verified · 1 parent: 942bcce

Update README.md

Files changed (1): README.md (+57 −3)
---
license: mit
language:
- zh
base_model:
- zai-org/GLM-4.1V-9B-Base
pipeline_tag: image-text-to-text
tags:
- agent
---

# AutoGLM-Phone-9B

<div align="center">
<img src="https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/resources/logo.svg" width="20%"/>
</div>

<p align="center">
👋 Join our <a href="https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/resources/WECHAT.md" target="_blank">WeChat</a> community
</p>

> ⚠️ This project is intended **for research and educational purposes only**.
> Any use for illegal data access, system interference, or unlawful activities is strictly prohibited.
> Please review our [Terms of Use](https://raw.githubusercontent.com/zai-org/Open-AutoGLM/refs/heads/main/resources/privacy_policy.txt) carefully.

## Project Overview

**Phone Agent** is a mobile intelligent assistant framework built on **AutoGLM**, capable of understanding smartphone screens through multimodal perception and executing automated operations to complete tasks.
The system controls devices via **ADB (Android Debug Bridge)**, uses a **vision-language model** for screen understanding, and leverages **intelligent planning** to generate and execute action sequences.

Users can simply describe tasks in natural language—for example, *“Open Xiaohongshu and search for food recommendations.”*
Phone Agent will automatically parse the intent, understand the current UI, plan the next steps, and carry out the entire workflow.
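The perceive–plan–act cycle described above can be sketched in a few lines of Python. Everything below (function names, the `Action` type, the hard-coded screen/plan tables) is invented for illustration and is not the Open-AutoGLM API; in the real system, `perceive` would take an ADB screenshot and `plan` would query the vision-language model.

```python
from dataclasses import dataclass

# Illustrative sketch of the perceive -> plan -> act loop; names are
# hypothetical, not taken from the Open-AutoGLM codebase.

@dataclass
class Action:
    kind: str          # e.g. "tap", "type", "finish"
    target: str = ""   # UI element or text payload

def perceive(step: int) -> str:
    """Stand-in for capturing an ADB screenshot and classifying the screen."""
    return ["home_screen", "search_page", "results_page"][step]

def plan(task: str, screen: str) -> Action:
    """Stand-in for the VLM planner: map (task, screen) to the next action."""
    table = {
        "home_screen": Action("tap", "Xiaohongshu"),
        "search_page": Action("type", "food recommendations"),
        "results_page": Action("finish"),
    }
    return table[screen]

def run(task: str, max_steps: int = 10) -> list[Action]:
    trace = []
    for step in range(max_steps):
        screen = perceive(step)
        action = plan(task, screen)
        trace.append(action)
        if action.kind == "finish":
            break
    return trace

trace = run("Open Xiaohongshu and search for food recommendations")
print([a.kind for a in trace])  # ['tap', 'type', 'finish']
```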

The system also includes:
- **Sensitive action confirmation mechanisms**
- **Human-in-the-loop fallback** for login or verification code scenarios
- **Remote ADB debugging**, allowing device connection via WiFi or network for flexible remote control and development
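For the remote ADB setup, the two underlying commands (`adb tcpip 5555` to restart the device's adbd in TCP/IP mode, then `adb connect <ip>:<port>`) are standard adb usage; the helper below that assembles them is only a sketch, not part of Open-AutoGLM, and the IP address is a placeholder.

```python
import shlex

# Sketch of bringing a USB-attached device onto Wi-Fi ADB. The commands are
# standard adb CLI; the wrapper itself is illustrative.

def wifi_adb_commands(device_ip: str, port: int = 5555) -> list[str]:
    """Commands to switch a device to TCP/IP mode and connect over the network."""
    return [
        f"adb tcpip {port}",                # restart adbd listening on TCP
        f"adb connect {device_ip}:{port}",  # connect to the device over Wi-Fi
    ]

for cmd in wifi_adb_commands("192.168.1.42"):
    print(shlex.split(cmd))  # argv lists, ready for e.g. subprocess.run
```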
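One way the sensitive-action confirmation and human-in-the-loop fallback could be structured is sketched below; the keyword lists, screen labels, and function names are all invented for illustration rather than taken from the implementation.

```python
# Minimal sketch of a sensitive-action gate; keywords and names are
# hypothetical, not from the Open-AutoGLM implementation.

SENSITIVE_KEYWORDS = ("pay", "transfer", "delete", "purchase")
HUMAN_FALLBACK_SCREENS = ("login", "verification_code", "captcha")

def needs_confirmation(action_desc: str) -> bool:
    """Pause and ask the user before money- or data-destructive actions."""
    desc = action_desc.lower()
    return any(k in desc for k in SENSITIVE_KEYWORDS)

def needs_human(screen_kind: str) -> bool:
    """Hand control back to the user on login / verification-code screens."""
    return screen_kind in HUMAN_FALLBACK_SCREENS

def gate(action_desc: str, screen_kind: str, confirm=input) -> bool:
    """Return True if the agent may proceed automatically."""
    if needs_human(screen_kind):
        return False  # wait for the user to log in or enter the code
    if needs_confirmation(action_desc):
        return confirm(f"Allow '{action_desc}'? [y/N] ").strip().lower() == "y"
    return True

# Scripted confirmation instead of a real input() prompt:
print(gate("Tap the Pay button", "checkout", confirm=lambda _: "y"))  # True
print(gate("Scroll down", "login"))  # False: human-in-the-loop fallback
```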

## Model Usage

We provide an open-source model usage guide to help you quickly download and deploy the model.
Please visit our **[GitHub](https://github.com/zai-org/Open-AutoGLM)** for detailed instructions.

- The model architecture is identical to **`GLM-4.1V-9B-Thinking`**.
  For deployment details, see the **[GLM-V](https://github.com/zai-org/GLM-V)** repository.

### Citation

If you find our work helpful, please cite the following paper:

```bibtex
@article{liu2024autoglm,
  title={AutoGLM: Autonomous Foundation Agents for GUIs},
  author={Liu, Xiao and Qin, Bo and Liang, Dongzhu and Dong, Guang and Lai, Hanyu and Zhang, Hanchen and Zhao, Hanlin and Iong, Iat Long and Sun, Jiadai and Wang, Jiaqi and others},
  journal={arXiv preprint arXiv:2411.00820},
  year={2024}
}
```