Seth McKnight
Add memory diagnostics endpoints and logging enhancements (#80)
0a7f9b4

Monitoring Memory Usage in Production on Render

This document provides guidance on monitoring memory usage in production for the RAG application deployed on Render's free tier, which has a 512MB memory limit.

Integrated Memory Monitoring Tools

The application includes enhanced memory monitoring specifically optimized for Render deployments:

1. Memory Status Endpoint

The application exposes a dedicated endpoint for monitoring memory usage:

GET /memory/render-status

This endpoint returns detailed information about current memory usage, including:

  • Current memory usage in MB
  • Peak memory usage since startup
  • Memory usage trends (5-minute and 1-hour)
  • Current memory status (normal, warning, critical, emergency)
  • Actions taken if memory thresholds were exceeded

Example response:

{
  "status": "success",
  "is_render": true,
  "memory_status": {
    "timestamp": "2023-10-25T14:32:15.123456",
    "memory_mb": 342.5,
    "peak_memory_mb": 398.2,
    "context": "api_request",
    "status": "warning",
    "action_taken": "light_cleanup",
    "memory_limit_mb": 512.0
  },
  "memory_trends": {
    "current_mb": 342.5,
    "peak_mb": 398.2,
    "samples_count": 356,
    "trend_5min_mb": 12.5,
    "trend_1hour_mb": -24.3
  },
  "render_limit_mb": 512
}
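To show how a monitoring script might read this response, here is a minimal sketch. The field names come from the example above. The helper name `summarize_memory_status` and the derived fields (`used_pct`, `headroom_mb`, `growing_5min`) are illustrative assumptions, not part of the application.

```python
from typing import Any, Dict

# Trimmed copy of the example response shown above.
EXAMPLE_RESPONSE: Dict[str, Any] = {
    "status": "success",
    "memory_status": {"memory_mb": 342.5, "peak_memory_mb": 398.2, "status": "warning"},
    "memory_trends": {"trend_5min_mb": 12.5, "trend_1hour_mb": -24.3},
    "render_limit_mb": 512,
}


def summarize_memory_status(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Condense a /memory/render-status payload into a few headline numbers."""
    status = payload["memory_status"]
    trends = payload["memory_trends"]
    limit_mb = float(payload["render_limit_mb"])
    used_mb = float(status["memory_mb"])
    return {
        "status": status["status"],
        "used_mb": used_mb,
        # Share of the Render limit currently in use.
        "used_pct": round(100.0 * used_mb / limit_mb, 1),
        "headroom_mb": round(limit_mb - used_mb, 1),
        # A positive 5-minute trend means usage grew over that window.
        "growing_5min": trends["trend_5min_mb"] > 0,
    }
```

For the example payload this reports roughly 67% of the limit in use, with 169.5 MB of headroom.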

2. Detailed Diagnostics

For more detailed memory diagnostics, use:

GET /memory/diagnostics

This provides a deeper look at memory allocation and usage patterns.
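The response format of this endpoint is not documented here. As a rough illustration of the kind of allocation data Python's standard `tracemalloc` module can provide, the sketch below lists the largest allocation sites. The helper name `top_allocations` is hypothetical, and this is not necessarily how the endpoint gathers its data.

```python
import tracemalloc
from typing import List, Tuple


def top_allocations(limit: int = 5) -> List[Tuple[str, float]]:
    """Return (file:line, size in KiB) for the largest traced allocation sites.

    tracemalloc.start() must have been called first; otherwise
    take_snapshot() raises RuntimeError.
    """
    snapshot = tracemalloc.take_snapshot()
    # Statistics grouped by line number are sorted largest first.
    stats = snapshot.statistics("lineno")[:limit]
    return [
        (f"{stat.traceback[0].filename}:{stat.traceback[0].lineno}", stat.size / 1024)
        for stat in stats
    ]
```

A typical use is to call `tracemalloc.start()` early, run the operation under investigation, and then inspect `top_allocations()`.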

3. Force Memory Cleanup

If you notice memory usage approaching critical levels, you can trigger a manual cleanup:

POST /memory/force-clean
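A cleanup call could be scripted along the following lines using only the standard library. This is a sketch: the base URL is a placeholder, the function names are hypothetical, and it assumes the endpoint accepts an empty body and needs no authentication, none of which is confirmed here.

```python
import urllib.request


def build_force_clean_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /memory/force-clean."""
    url = base_url.rstrip("/") + "/memory/force-clean"
    # Assumption: an empty request body is sufficient.
    return urllib.request.Request(url, data=b"", method="POST")


def force_clean(base_url: str, timeout: float = 10.0) -> int:
    """Send the cleanup request and return the HTTP status code."""
    request = build_force_clean_request(base_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `force_clean("https://your-service.example")` would send the request; replace the placeholder host with your own service URL.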

Setting Up External Monitoring

Using Uptime Robot or Similar Services

  1. Set up a monitor to check the /health endpoint every 5 minutes
  2. Set up a separate monitor to check the /memory/render-status endpoint every 15 minutes

Automated Alerting

Configure alerts based on memory thresholds:

  1. Warning Alert: When memory usage exceeds 400MB (78% of limit)
  2. Critical Alert: When memory usage exceeds 450MB (88% of limit)
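The two alert thresholds can be expressed as a small classifier. This is a minimal sketch: the function names are illustrative, and "exceeds" is read as strictly greater than.

```python
RENDER_LIMIT_MB = 512.0
WARNING_MB = 400.0
CRITICAL_MB = 450.0


def percent_of_limit(memory_mb: float) -> int:
    """Memory usage as a whole-number percentage of the Render limit."""
    return round(100.0 * memory_mb / RENDER_LIMIT_MB)


def alert_level(memory_mb: float) -> str:
    """Map a memory reading to "ok", "warning", or "critical"."""
    if memory_mb > CRITICAL_MB:
        return "critical"
    if memory_mb > WARNING_MB:
        return "warning"
    return "ok"
```

`percent_of_limit(400)` and `percent_of_limit(450)` reproduce the 78% and 88% figures quoted above.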

Monitoring Logs in Render Dashboard

  1. Log into your Render dashboard
  2. Navigate to the service logs
  3. Filter for memory-related log messages:
    • [MEMORY CHECKPOINT]
    • [MEMORY MILESTONE]
    • Memory usage
    • WARNING: Memory usage
    • CRITICAL: Memory usage
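If you export logs and want to apply the same filter offline, a sketch like the following would do. The marker strings are the ones listed above; the helper name `memory_log_lines` is hypothetical, and the exact log line format is not assumed beyond containing one of the markers.

```python
from typing import Iterable, List

# "Memory usage" also matches the WARNING: and CRITICAL: variants.
MEMORY_MARKERS = ("[MEMORY CHECKPOINT]", "[MEMORY MILESTONE]", "Memory usage")


def memory_log_lines(lines: Iterable[str]) -> List[str]:
    """Keep only the log lines that contain one of the memory markers."""
    return [line for line in lines if any(marker in line for marker in MEMORY_MARKERS)]
```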

Memory Usage Patterns to Watch For

Warning Signs

  1. Steadily Increasing Memory: Memory trends show continuous growth instead of leveling off
  2. High Peak After Ingestion: Memory spikes above 450MB after document ingestion
  3. Failure to Release Memory: Memory does not decrease after operations complete
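The first and third warning signs can be checked mechanically against a series of memory readings. The sketch below is one simple interpretation, not the application's own logic: the function names, the minimum of five samples, and the 20 MB tolerance are all assumptions chosen for illustration.

```python
from typing import Sequence


def is_steadily_increasing(samples_mb: Sequence[float], min_samples: int = 5) -> bool:
    """True if readings never drop and end higher than they started."""
    if len(samples_mb) < min_samples:
        return False  # too few readings to call it a trend
    pairs = zip(samples_mb, samples_mb[1:])
    never_drops = all(later >= earlier for earlier, later in pairs)
    return never_drops and samples_mb[-1] > samples_mb[0]


def failed_to_release(before_mb: float, after_mb: float, tolerance_mb: float = 20.0) -> bool:
    """True if memory after an operation stays well above the level before it."""
    return after_mb - before_mb > tolerance_mb
```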

Preventative Actions

  1. Regular Cleanup: Schedule calls to /memory/force-clean during low-traffic periods
  2. Batch Processing: For large document sets, ingest in smaller batches
  3. Monitoring Before Bulk Operations: Check memory status before starting resource-intensive operations
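Batch processing and the pre-operation memory check can be combined, as in the sketch below. The `ingest` and `current_memory_mb` callables are hypothetical stand-ins for whatever ingestion routine and memory probe you use (for example, a call to /memory/render-status), and the 400 MB ceiling is borrowed from the warning threshold as an assumption.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ingest_in_batches(
    docs: Sequence[str],
    batch_size: int,
    ingest: Callable[[List[str]], None],
    current_memory_mb: Callable[[], float],
    ceiling_mb: float = 400.0,
) -> int:
    """Ingest batch by batch, stopping early if memory is above the ceiling.

    Returns the number of documents actually ingested.
    """
    done = 0
    for batch in batched(docs, batch_size):
        # Check memory before each batch, not just once at the start.
        if current_memory_mb() > ceiling_mb:
            break
        ingest(batch)
        done += len(batch)
    return done
```

The return value tells the caller how many documents remain, so the rest can be retried after a cleanup.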

Memory Optimization Features

The application includes several memory optimization features:

  1. Automatic Thresholds: Memory is monitored against three configured thresholds (400MB, 450MB, 480MB)
  2. Progressive Cleanup: Cleanup becomes more aggressive as memory pressure increases
  3. Request Circuit Breaker: New requests are rejected when memory is critically high
  4. Memory Metrics Export: Memory metrics are saved to /tmp/render_metrics/ for later analysis
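To make the threshold and circuit-breaker ideas concrete, here is a minimal sketch. It assumes the three thresholds map in order onto the warning, critical, and emergency statuses, and that the circuit breaker trips at the highest threshold. Both are assumptions, and the application's actual rules may differ.

```python
from bisect import bisect_left

THRESHOLDS_MB = (400.0, 450.0, 480.0)
STATUSES = ("normal", "warning", "critical", "emergency")


def memory_status(memory_mb: float) -> str:
    """Return the status for a reading; a threshold must be strictly exceeded."""
    # bisect_left counts how many thresholds are strictly below the reading.
    return STATUSES[bisect_left(THRESHOLDS_MB, memory_mb)]


def should_reject_request(memory_mb: float, reject_above_mb: float = 480.0) -> bool:
    """Circuit-breaker check: refuse new work once memory passes the cutoff."""
    return memory_mb > reject_above_mb
```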

Troubleshooting Memory Issues

If you encounter persistent memory issues:

  1. Review Logs: Check Render logs for memory checkpoints and milestones
  2. Analyze Trends: Use the /memory/render-status endpoint to identify patterns
  3. Check Operation Timing: Compare memory spikes against the timing of specific operations, such as document ingestion
  4. Adjust Configuration: Consider adjusting EMBEDDING_BATCH_SIZE or other parameters in config.py

Available Environment Variables

These environment variables can be configured in Render:

  • MEMORY_DEBUG=1: Enable detailed memory diagnostics
  • MEMORY_LOG_INTERVAL=10: Log memory usage every 10 seconds
  • ENABLE_TRACEMALLOC=1: Enable tracemalloc for detailed memory allocation tracking
  • RENDER=1: Enable Render-specific optimizations (automatically set on Render)
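As an illustration of how these variables might be read at startup, consider the sketch below. The function name, the returned keys, the lenient handling of truthy values, and the 10-second fallback for MEMORY_LOG_INTERVAL are assumptions, not the application's actual parsing logic.

```python
import os
from typing import Any, Dict, Mapping, Optional

_TRUTHY = {"1", "true", "yes"}


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Treat a few common spellings as 'enabled'; anything else is off."""
    return env.get(name, "").strip().lower() in _TRUTHY


def load_memory_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read the memory-related variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "memory_debug": _flag(env, "MEMORY_DEBUG"),
        # Assumed fallback of 10 seconds when the variable is unset.
        "log_interval_s": int(env.get("MEMORY_LOG_INTERVAL", "10")),
        "enable_tracemalloc": _flag(env, "ENABLE_TRACEMALLOC"),
        "on_render": _flag(env, "RENDER"),
    }
```

Passing an explicit mapping instead of relying on `os.environ` keeps the function easy to test locally.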