---
license: apache-2.0
datasets:
- ICTNLP/StreamUni
base_model:
- microsoft/Phi-4-multimodal-instruct
pipeline_tag: audio-text-to-text
library_name: adapter-transformers
---

# The model for the paper '[StreamUni: Achieving Streaming Speech Translation with a Unified Large Speech-Language Model](https://arxiv.org/abs/2507.07803v1)'

## Usage

Please refer to the [GitHub page](https://github.com/ictnlp/StreamUni) for usage instructions.

### Requirements

The Phi-4 family has been integrated into version `4.48.2` of `transformers`. You can verify the installed `transformers` version with `pip list | grep transformers`.
We suggest running with Python 3.10.
Example of the required packages:
```
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
```
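
As a quick sanity check before running the model, the pins above can be compared against the current environment. The following is a minimal sketch, not part of the StreamUni code: the helper name `check_pins` is hypothetical, and it assumes the distribution names in the list match what `importlib.metadata` finds in your environment. Local build suffixes such as `+cu124` on `torch` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements list above.
PINNED = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.6.0",
    "transformers": "4.48.2",
    "accelerate": "1.3.0",
    "soundfile": "0.13.1",
    "pillow": "11.1.0",
    "scipy": "1.15.2",
    "torchvision": "0.21.0",
    "backoff": "2.2.1",
    "peft": "0.13.2",
}


def check_pins(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    Hypothetical helper for illustration only.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            # Drop a local version suffix such as "+cu124" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If the script prints nothing, every listed package is installed at the pinned version.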

## Training Datasets

- https://huggingface.co/datasets/ICTNLP/StreamUni

## GitHub Page

- https://github.com/ictnlp/StreamUni