# Contributing
Thanks for wanting to contribute! This repository uses a strict CI and formatting policy to keep code consistent, with special emphasis on memory-efficient development for cloud deployment.
## 🧠 Memory-Constrained Development Guidelines

This project is optimized for deployment on Render's free tier (512MB RAM limit). All contributions must consider memory usage as a primary constraint.

### Memory Development Principles

- **Memory-First Design**: Consider the memory impact of every code change
- **Lazy Loading**: Initialize services only when needed
- **Resource Cleanup**: Always clean up resources in `finally` blocks or context managers
- **Memory Testing**: Test changes in memory-constrained environments
- **Monitoring Integration**: Add memory tracking to new services
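The `MemoryManager` context manager used throughout this guide lives in `src.utils.memory_utils`, which is not reproduced here. As a reference point, a minimal stand-in consistent with how it is used in this guide (context manager, `get_memory_usage()` in MB, cleanup on exit) could look like the following sketch — the real implementation may differ:

```python
# Hypothetical stand-in for src.utils.memory_utils.MemoryManager.
# Uses peak RSS from the stdlib `resource` module as an approximation;
# the repo's real class may use psutil and report current RSS instead.
import gc
import resource
import sys


class MemoryManager:
    """Context manager that reports process memory in MB and forces GC on exit."""

    def get_memory_usage(self) -> float:
        # ru_maxrss is reported in KB on Linux but in bytes on macOS
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
        return rss / divisor

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        gc.collect()  # release unreferenced objects promptly on exit
        return False  # never suppress exceptions
```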
### Memory-Aware Code Guidelines
**✅ DO - Memory-Efficient Patterns:**

```python
# Use context managers for resource cleanup
from functools import lru_cache

from src.utils.memory_utils import MemoryManager

with MemoryManager() as mem:
    # Memory-intensive operations
    embeddings = process_large_dataset(data)
# Automatic cleanup on exit

# Implement lazy loading for expensive services
@lru_cache(maxsize=1)
def get_expensive_service():
    return ExpensiveService()  # Only created once

# Use generators for large data processing
def process_documents(documents):
    for doc in documents:
        yield process_single_document(doc)  # Memory-efficient iteration
```
**❌ DON'T - Memory-Wasteful Patterns:**

```python
# Don't load all data into memory at once
all_embeddings = [embed(doc) for doc in all_documents]  # Memory spike

# Don't create multiple instances of expensive services
service1 = ExpensiveMLModel()
service2 = ExpensiveMLModel()  # Duplicates memory usage

# Don't keep large objects in global scope
GLOBAL_LARGE_DATA = load_entire_dataset()  # Always consumes memory
```
## 🛠️ Recommended Local Setup

We recommend using pyenv + venv to create a reproducible development environment. A helper script `dev-setup.sh` is included to automate the steps:

```bash
# Run the helper script (the default Python version can be overridden)
./dev-setup.sh 3.11.4
source venv/bin/activate

# Install pre-commit hooks
pip install -r dev-requirements.txt
pre-commit install
```
### Memory-Constrained Testing Environment

Test your changes in a memory-limited environment:

```bash
# Limit Python process virtual memory to simulate Render constraints
# (Linux; `ulimit -v` is not supported on macOS)
ulimit -v 524288  # 512MB limit, in KB

# Run your development server
flask run

# Check memory usage
curl http://localhost:5000/health | jq '.memory_usage_mb'
```
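To confirm the limit is actually active before testing, one quick sanity check (an illustrative snippet, not part of the repo) is to attempt an allocation past the cap and expect a `MemoryError`:

```python
def try_allocate(mb: int) -> bool:
    """Attempt to allocate `mb` megabytes; False means the memory cap kicked in."""
    try:
        block = bytearray(mb * 1024 * 1024)
        return len(block) == mb * 1024 * 1024
    except MemoryError:
        return False


# Under `ulimit -v 524288`, allocating well past 512MB should return False;
# with no cap set, it returns True.
print(try_allocate(600))
```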
## 🧪 Development Workflow

### Before Opening a PR

**Required Checks:**

- **Code Quality**: `make format` and `make ci-check`
- **Test Suite**: `pytest` (all 138 tests must pass)
- **Pre-commit**: `pre-commit run --all-files`
- **Memory Testing**: Verify memory usage stays within limits
**Memory-Specific Testing:**

```bash
# Test memory usage during development
python -c "
from src.app_factory import create_app
from src.utils.memory_utils import MemoryManager

app = create_app()
with app.app_context():
    mem = MemoryManager()
    print(f'App startup memory: {mem.get_memory_usage():.1f}MB')
    # Should be ~50MB or less
"

# Test first-request memory loading
curl -X POST http://localhost:5000/chat -H "Content-Type: application/json" \
  -d '{"message": "test"}' && \
  curl http://localhost:5000/health | jq '.memory_usage_mb'
# Should be ~200MB or less
```
### Memory Optimization Development Process

1. **Profile Before Changes**: Measure baseline memory usage
2. **Implement Changes**: Follow memory-efficient patterns
3. **Profile After Changes**: Verify the memory impact is acceptable
4. **Load Test**: Validate performance under memory constraints
5. **Document Changes**: Update memory-related documentation
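For the before/after profiling steps, the stdlib `tracemalloc` module gives a quick way to compare allocation deltas without external tools. A sketch (the lambda below stands in for whatever code path you are changing):

```python
import tracemalloc


def measure_python_allocations(fn, *args, **kwargs):
    """Run fn and return (result, peak Python-level allocations in MB)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


# Example: measure the peak for a candidate code path
result, peak_mb = measure_python_allocations(lambda: [i * i for i in range(100_000)])
print(f"Peak allocations: {peak_mb:.1f}MB")
```

Note that `tracemalloc` only tracks allocations made through Python's allocator; native memory held by ML models will not show up here, so still verify total RSS via the `/health` endpoint.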
### New Feature Development Guidelines

**When Adding New ML Services:**

```python
# Example: adding a new ML service with memory management
class NewMLService:
    def __init__(self):
        self._model = None  # Lazy loading

    @property
    def model(self):
        if self._model is None:
            with MemoryManager() as mem:
                logger.info(f"Loading model, current memory: {mem.get_memory_usage():.1f}MB")
                self._model = load_expensive_model()
                logger.info(f"Model loaded, current memory: {mem.get_memory_usage():.1f}MB")
        return self._model

    def process(self, data):
        # Use the lazily-loaded model
        return self.model.predict(data)
```
**Memory Testing for New Features:**

```python
# Add to your test file
import os

import psutil


def test_new_feature_memory_usage():
    """Test that the new feature doesn't exceed memory limits."""
    process = psutil.Process(os.getpid())

    # Measure before
    memory_before = process.memory_info().rss / 1024 / 1024  # MB

    # Execute the new feature
    result = your_new_feature()

    # Measure after
    memory_after = process.memory_info().rss / 1024 / 1024  # MB
    memory_increase = memory_after - memory_before

    # Assert the memory increase is reasonable
    assert memory_increase < 50, f"Memory increase {memory_increase:.1f}MB exceeds 50MB limit"
    assert memory_after < 300, f"Total memory {memory_after:.1f}MB exceeds 300MB limit"
```
## 🔧 CI Expectations

**Automated Checks:**

- **Code Quality**: Pre-commit hooks (black, isort, flake8)
- **Test Suite**: All 138 tests must pass
- **Memory Validation**: Memory usage checks during CI
- **Performance Regression**: Response time validation
- **Python Version**: Enforces Python >=3.10

**Memory-Specific CI Checks:**

```bash
# The CI pipeline includes memory validation
pytest tests/test_memory_constraints.py  # Memory usage tests
pytest tests/test_performance.py         # Response time validation
pytest tests/test_resource_cleanup.py    # Resource leak detection
```
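The resource-cleanup tests can follow a weakref-based pattern like the sketch below. The contents of `tests/test_resource_cleanup.py` are not shown here, and `TempBuffer` is an illustrative stand-in for an expensive resource:

```python
import gc
import weakref


class TempBuffer:
    """Illustrative stand-in for an expensive resource that must be released."""

    def __init__(self, size_mb: int):
        self.data = bytearray(size_mb * 1024 * 1024)


def test_resource_is_released_after_use():
    buf = TempBuffer(8)
    ref = weakref.ref(buf)  # track the object without keeping it alive
    del buf                 # drop the only strong reference
    gc.collect()            # collect any lingering reference cycles
    assert ref() is None, "TempBuffer leaked: something still references it"
```

If the weak reference is still alive after `gc.collect()`, some cache, global, or closure is holding the object — exactly the class of leak that matters in a 512MB container.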
## 🚀 Deployment Considerations

### Render Platform Constraints

**Resource Limits:**

- **RAM**: 512MB total (200MB steady state, 312MB headroom)
- **CPU**: 0.1 vCPU (I/O-bound workload)
- **Storage**: 1GB (current usage ~100MB)
- **Network**: Unmetered (external API calls)

**Performance Requirements:**

- **Startup Time**: <30 seconds (lazy loading)
- **Response Time**: <3 seconds for chat requests
- **Memory Stability**: No memory leaks over 24+ hours
- **Concurrent Users**: Support 20-30 simultaneous requests
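A `gunicorn.conf.py` consistent with these constraints might look like the following sketch. The repository's actual configuration may differ; the values here are illustrative, not prescriptive:

```python
# Illustrative gunicorn.conf.py for a 512MB / 0.1 vCPU instance.
bind = "0.0.0.0:5000"

# One worker: a second worker process would duplicate model memory
# and blow the 512MB cap.
workers = 1

# Threads handle the I/O-bound concurrency (20-30 simultaneous requests).
worker_class = "gthread"
threads = 8

# Recycle the worker periodically so slow leaks cannot accumulate indefinitely.
max_requests = 500
max_requests_jitter = 50

# Keep lazy loading intact: preload_app would load models eagerly at startup.
preload_app = False

timeout = 30
```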
### Production Testing

**Before Production Deployment:**

```bash
# Test with the production configuration
export FLASK_ENV=production
gunicorn -c gunicorn.conf.py app:app &

# Load test with memory monitoring
artillery run load-test.yml  # Simulate concurrent users
curl http://localhost:5000/health | jq '.memory_usage_mb'

# Memory leak detection (run for 1+ hours)
while true; do
  curl -s http://localhost:5000/health | jq '.memory_usage_mb'
  sleep 300  # Check every 5 minutes
done
```
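The values sampled by the loop above can be turned into a pass/fail leak signal. A small trend check (illustrative, not part of the repo) fits the 5-minute sampling interval:

```python
def memory_trend_mb_per_hour(samples_mb, interval_s=300):
    """Least-squares slope of memory samples, in MB per hour.

    samples_mb: memory readings taken every `interval_s` seconds.
    """
    n = len(samples_mb)
    xs = [i * interval_s / 3600 for i in range(n)]  # sample times in hours
    mean_x = sum(xs) / n
    mean_y = sum(samples_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples_mb))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Flat usage -> slope near zero; steady growth -> a clearly positive slope
flat = [210.0, 211.0, 209.5, 210.5, 210.0]
leaky = [210.0, 220.0, 230.0, 240.0, 250.0]
print(memory_trend_mb_per_hour(flat))   # near 0 MB/h
print(memory_trend_mb_per_hour(leaky))  # strongly positive MB/h
```

A sustained positive slope over an hour-long run is the "memory leak" signal the stability requirement above is guarding against.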
## 📚 Additional Resources

### Memory Optimization References

- **Memory Utils Documentation**: Comprehensive memory management utilities
- **App Factory Pattern**: Lazy loading implementation
- **Gunicorn Configuration**: Production server optimization
- **Design Documentation**: Memory architecture decisions

### Development Tools

```bash
# Memory profiling during development
pip install memory-profiler
python -m memory_profiler your_script.py

# Real-time memory monitoring
pip install psutil
python -c "
import psutil

process = psutil.Process()
print(f'Memory: {process.memory_info().rss / 1024 / 1024:.1f}MB')
"
```
## 🎯 Code Review Guidelines

### Memory-Focused Code Review

**Review Checklist:**

- Does the code follow lazy loading patterns?
- Are expensive resources properly cleaned up?
- Is memory usage tested and validated?
- Are there any potential memory leaks?
- Does the change impact startup memory?
- Is caching used appropriately?

**Memory Review Questions:**

- "What is the memory impact of this change?"
- "Could this cause a memory leak in long-running processes?"
- "Is this resource initialized only when needed?"
- "Are all expensive objects properly cleaned up?"
- "How does this scale with concurrent users?"
Thank you for contributing to memory-efficient, production-ready RAG development! Please open issues or PRs against `main` and follow these memory-conscious development practices.