---
title: Bizom_Voice_Assistant
app_file: app.py
sdk: gradio
sdk_version: 5.50.0
---

# FastRTC Audio Streaming with Transcription & TTS

A real-time audio streaming application built with FastRTC that provides:

- **Speech-to-Text (STT)**: Transcribes incoming audio in real time
- **Text-to-Speech (TTS)**: Converts transcribed text back to audio
- **API Streaming Support**: Connect from external clients (Android/KMM apps) via WebRTC
- **Bidirectional Communication**: Send and receive audio with transcription feedback

## Features

- 🎤 **Real-time Audio Streaming**: Low-latency audio streaming using WebRTC
- 📝 **Automatic Transcription**: Speech-to-text using the Moonshine STT model
- 🔊 **Voice Response**: Text-to-speech using the Kokoro TTS model
- 📑 **API Support**: Connect from external applications via the WebRTC API
- 🌐 **Network Access**: Accepts connections from the network (not just localhost)
- ⏸️ **Pause Detection**: Uses the `ReplyOnPause` handler to process complete utterances

## Prerequisites

- Python 3.8 or higher
- pip (Python package manager)
- Hugging Face account with API token (for Cloudflare TURN credentials in production)
- Google Gemini API key (optional, for AI responses)

## Installation

1. **Clone or navigate to the project directory:**

   ```bash
   cd fastrtc
   ```

2. **Create a virtual environment (recommended):**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Set up environment variables:**

   Create a `.env` file in the project root:

   ```bash
   touch .env
   ```

   Add your API keys to `.env`:

   ```env
   HF_TOKEN=hf_...       # Required for Cloudflare TURN credentials (production deployment)
   GEMINI_API_KEY=...    # Optional: for AI responses via Google Gemini
   ```

   **Important:** Never commit your `.env` file to git! It should be in `.gitignore`.

4.
   **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

   This will install:

   - `fastrtc[vad, stt, tts]` - FastRTC with VAD, STT, and TTS support
   - `google-genai` - Google Generative AI client
   - `python-dotenv` - Environment variable management
   - All required dependencies (numpy, gradio, etc.)

## Usage

### Running the Server

1. **Activate your virtual environment** (if not already activated):

   ```bash
   source venv/bin/activate
   ```

2. **Run the application:**

   ```bash
   python app.py
   ```

3. **Access the web interface:**

   - Open your browser and navigate to `http://localhost:7860`
   - The Gradio interface will be available for testing

### Server Configuration

The server is configured to:

- Listen on `0.0.0.0:7860` (accepts connections from the network)
- Use the `ReplyOnPause` handler (processes audio when the user pauses speaking)
- Support bidirectional audio (`send-receive` mode)

To modify settings, edit `app.py`:

```python
stream.ui.launch(
    server_name="0.0.0.0",  # Change to "127.0.0.1" for localhost only
    server_port=7860,       # Change port if needed
    share=False             # Set to True for a public Gradio URL
)
```

## API Streaming for External Clients

This application supports connecting from external clients (e.g., Android/KMM apps) via the FastRTC WebRTC API.
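As a quick orientation before the endpoint details, here is a minimal Python sketch of the HTTP side of the handshake from a client's point of view. The `build_offer` and `parse_answer` helpers are illustrative, not FastRTC APIs, and a real offer SDP must come from an actual WebRTC peer connection:

```python
import json


def build_offer(sdp):
    """Build the JSON body for POST /webrtc/offer.

    `sdp` is the offer SDP produced by the client's peer connection.
    """
    return json.dumps({"sdp": sdp, "type": "offer"})


def parse_answer(body):
    """Extract the answer SDP and session id from the server's response."""
    msg = json.loads(body)
    if msg.get("type") != "answer":
        raise ValueError("unexpected response type: %r" % msg.get("type"))
    # webrtc_id identifies this session in later messages
    return msg["sdp"], msg["webrtc_id"]
```

The POST itself can be done with any HTTP client (`urllib`, `requests`, OkHttp on Android); after `parse_answer`, the client sets the returned SDP as its remote description.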
### Connection Endpoint

- **URL**: `http://YOUR_SERVER_IP:7860/webrtc/offer`
- **Method**: POST
- **Content-Type**: `application/json`

### Request Format

```json
{
  "sdp": "<offer SDP>",
  "type": "offer"
}
```

### Response Format

```json
{
  "sdp": "<answer SDP>",
  "type": "answer",
  "webrtc_id": "<session id>"
}
```

### Message Types

The server sends messages over the data channel in the following format:

```json
{
  "type": "fetch_output" | "log" | "error" | "warning",
  "data": "<message payload>"
}
```

#### Transcription Messages

When audio is transcribed, clients receive:

```json
{
  "type": "fetch_output",
  "data": "<transcribed text>"
}
```

#### Log Messages

The server sends log messages for debugging:

```json
{
  "type": "log",
  "data": "pause_detected" | "response_starting" | "started_talking"
}
```

### Connecting from an Android/KMM App

1. **Establish a WebRTC connection:**
   - Create a `PeerConnection` with ICE servers
   - Create an audio track from the microphone
   - Create a data channel for text messages

2. **Send a WebRTC offer:**
   - Create an offer
   - POST it to the `/webrtc/offer` endpoint
   - Receive the answer and set the remote description

3. **Handle messages:**
   - Listen for `fetch_output` messages on the data channel
   - Display the transcription text
   - Play the received audio (TTS response)

4. **Receive audio:**
   - The audio track receives the TTS audio response
   - Play it through the device speakers/headphones

For a detailed Android/KMM implementation, see the [FastRTC API Documentation](https://fastrtc.org/userguide/api/).

## Architecture

### Components

1. **STT Model** (`moonshine/base`):
   - Converts speech audio to text
   - Processes complete utterances (on pause)

2. **TTS Model** (`kokoro`):
   - Converts transcribed text to speech audio
   - Voice: `af_heart`
   - Language: `en-us`

3. **ReplyOnPause Handler**:
   - Buffers audio chunks
   - Detects when the user stops speaking
   - Processes complete utterances

4.
   **Stream Handler**:
   - Receives audio from the client
   - Transcribes it using the STT model
   - Sends the transcription via `AdditionalOutputs`
   - Generates TTS audio
   - Returns the audio to the client

### Flow Diagram

```
Client (Android/Web)
        ↓ [Audio Stream]
WebRTC Connection
        ↓
ReplyOnPause Handler (buffers audio)
        ↓ [On Pause]
Echo Handler
        ↓
STT Model → Transcription
        ↓
AdditionalOutputs → Client (via Data Channel)
        ↓
TTS Model → Audio Response
        ↓ [Audio Stream]
WebRTC Connection
        ↓
Client (plays audio)
```

## Deployment

### Cloudflare TURN Configuration

This application uses Cloudflare TURN servers for improved WebRTC connectivity, which is especially important for production deployments where clients may be behind NATs or firewalls.

**Required for production:**

- Set the `HF_TOKEN` environment variable to your Hugging Face API token
- The application will automatically configure Cloudflare TURN credentials for both client and server

**Configuration details:**

- **Client RTC configuration**: Uses the async `get_cloudflare_turn_credentials_async()` to fetch credentials dynamically
- **Server RTC configuration**: Uses `get_cloudflare_turn_credentials(ttl=360_000)` with a 100-hour TTL
- If `HF_TOKEN` is not set, the app runs without TURN configuration (and may have connectivity issues in production)

### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `HF_TOKEN` | Yes (for production) | Hugging Face API token for Cloudflare TURN credentials |
| `GEMINI_API_KEY` | No | Google Gemini API key for AI-powered responses |

### Deployment Platforms

The application can be deployed to various platforms:

1. **Cloud platforms** (AWS, GCP, Azure, etc.):
   - Set environment variables in your platform's configuration
   - Ensure port 7860 is accessible
   - The app listens on `0.0.0.0:7860` by default

2. **Docker**:

   ```dockerfile
   FROM python:3.11-slim
   WORKDIR /app
   COPY requirements.txt .
   RUN pip install -r requirements.txt
   COPY . .
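   # EXPOSE below is an optional addition, not part of the original example:
   # it documents the port the app listens on (7860, per this README) for
   # container tooling such as `docker run -P`.
   EXPOSE 7860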
   ENV HF_TOKEN=${HF_TOKEN}
   ENV GEMINI_API_KEY=${GEMINI_API_KEY}
   CMD ["python", "app.py"]
   ```

3. **Platform-as-a-Service** (Heroku, Railway, etc.):
   - Set environment variables in your platform dashboard
   - The app will automatically pick them up via `python-dotenv`

## Configuration

### TTS Options

Modify the TTS settings in `app.py`:

```python
tts_options = KokoroTTSOptions(
    voice="af_heart",  # Change voice
    speed=1.0,         # Adjust speed (0.5 - 2.0)
    lang="en-us"       # Change language
)
```

### STT Model

Change the STT model in `app.py`:

```python
stt_model = get_stt_model(model="moonshine/base")  # Change model
```

## Error Handling

The application includes error handling for:

- Empty transcriptions (yields silence)
- TTS generation errors (yields silence as a fallback)
- Connection errors (handled by FastRTC)

## Troubleshooting

### Server Not Accessible from the Network

- Ensure `server_name="0.0.0.0"` in `app.py`
- Check firewall settings
- Verify the server IP address

### No Transcription Received

- Check that audio is being sent from the client
- Verify the STT model loaded correctly
- Check the console logs for errors

### TTS Errors

- Ensure the text is not empty before calling TTS
- Check that the TTS model loaded correctly
- Verify the TTS options are valid

## Development

### Project Structure

```
fastrtc/
├── app.py             # Main application file
├── requirements.txt   # Python dependencies
├── README.md          # This file
└── venv/              # Virtual environment (gitignored)
```

### Dependencies

- `fastrtc[vad, stt, tts]` - FastRTC with VAD, STT, and TTS support
- `numpy` - Audio processing
- `gradio` - Web interface

## Resources

- [FastRTC Documentation](https://fastrtc.org/)
- [FastRTC API Guide](https://fastrtc.org/userguide/api/)
- [FastRTC Audio Streaming](https://fastrtc.org/userguide/audio/)

## License

[Add your license here]

## Contributing

[Add contribution guidelines here]
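## Appendix: Parsing Data Channel Messages

As a companion to the Message Types section above, here is a minimal client-side dispatcher for data-channel messages. It is a sketch under the message format this README documents; the function name is illustrative, not part of FastRTC:

```python
import json


def handle_server_message(raw):
    """Parse one data-channel message and classify it.

    Returns a (kind, payload) pair, where kind is one of the
    documented message types: "fetch_output" (a transcription),
    "log", "warning", or "error".
    """
    msg = json.loads(raw)
    kind = msg.get("type")
    data = msg.get("data", "")
    if kind == "fetch_output":
        # Transcription text, suitable for display in the UI
        return ("fetch_output", data)
    if kind in ("log", "warning", "error"):
        return (kind, data)
    raise ValueError("unknown message type: %r" % kind)
```

For example, `handle_server_message('{"type": "log", "data": "pause_detected"}')` returns `("log", "pause_detected")`, which a client might route to its debug console rather than the transcript view.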