Health Monitoring

Overview

The AI-Pi project runs health checks on every service (client, server, AI, and MongoDB) so that failures are detected quickly and the system stays reliable.

Architecture

graph TD
    A[Health Monitor] --> B[Client Health]
    A --> C[Server Health]
    A --> D[AI Health]
    A --> E[MongoDB Health]

    B --> F[WebSocket]
    C --> F
    D --> F
    E --> F

    F --> G[Health Dashboard]

Health Check Protocol

Each service implements a standardized health check endpoint that returns:

{
  "status": "string",      // "healthy" or "unhealthy"
  "timestamp": "number",   // Unix timestamp in milliseconds
  "version": "string",     // Service version
  "service": "string",     // Service name
  "dependencies": {        // Status of dependent services
    "service1": "string",  // "connected" or "disconnected"
    "service2": "string"
  }
}
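
A minimal Python sketch of how a service might assemble this payload. The function name `build_health_response` is hypothetical, and the rule that a service is "healthy" only when every dependency is connected is an assumption, not something the protocol above specifies.

```python
import time


def build_health_response(service, version, dependencies):
    """Build a health payload in the shape described above.

    `dependencies` maps a dependency name to True (reachable) or False.
    """
    deps = {
        name: "connected" if ok else "disconnected"
        for name, ok in dependencies.items()
    }
    # Assumption: healthy only when every dependency is connected.
    status = "healthy" if all(dependencies.values()) else "unhealthy"
    return {
        "status": status,
        "timestamp": int(time.time() * 1000),  # Unix time in milliseconds
        "version": version,
        "service": service,
        "dependencies": deps,
    }
```

For example, `build_health_response("server", "2.0", {"mongodb": True, "ai": False})` reports the server as "unhealthy" with `ai` marked "disconnected".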

Service Health Endpoints

Service   Endpoint     Port
Client    /health      3000
Server    /health      8080
AI        /health      5000
MongoDB   (Internal)   27017
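
The endpoints above can be polled in one pass. The sketch below assumes the services are reachable on `localhost` and that each returns the JSON payload described earlier; `SERVICES`, `fetch_json`, and `check_all` are illustrative names. MongoDB is left out because it exposes no HTTP endpoint.

```python
import json
from urllib.request import urlopen

# Ports come from the table above; the "localhost" host is an assumption.
SERVICES = {
    "client": "http://localhost:3000/health",
    "server": "http://localhost:8080/health",
    "ai": "http://localhost:5000/health",
}


def fetch_json(url, timeout=5):
    """GET a URL and decode the body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def check_all(services=SERVICES, fetch=fetch_json):
    """Return a mapping of service name to "healthy" or "unhealthy"."""
    results = {}
    for name, url in services.items():
        try:
            payload = fetch(url)
        except (OSError, ValueError):
            # Connection errors, HTTP errors, and timeouts are OSError
            # subclasses; malformed JSON raises ValueError.
            results[name] = "unhealthy"
            continue
        status = payload.get("status") if isinstance(payload, dict) else None
        # Anything other than an explicit "healthy" counts as unhealthy.
        results[name] = "healthy" if status == "healthy" else "unhealthy"
    return results
```

Passing `fetch` as a parameter keeps the function testable without a running stack.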

Docker Integration

Docker Compose configures health checks for each service:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:<port>/health"]
  interval: 30s
  timeout: 10s
  retries: 3
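
The `curl` test only works if `curl` is installed in the image. As a hedged alternative for images that ship Python but not `curl`, a small probe script could do the same job. The name `probe` is hypothetical, and this version treats any non-2xx response as a failure, which is slightly stricter than `curl -f`.

```python
import sys
from urllib.request import urlopen


def probe(url, timeout=5, opener=urlopen):
    """Return 0 if the endpoint answers with a 2xx status, else 1.

    Docker treats exit code 0 as healthy and 1 as unhealthy.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except OSError:
        # Refused connections, timeouts, and HTTP 4xx/5xx errors land here.
        return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(probe(sys.argv[1]))
```

Saved as, say, `probe.py` (an assumed filename), it would replace the `curl` command in the `test:` entry, with the health URL passed as the first argument.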

Monitoring Tools

  1. Docker Health

    docker ps                   # View container health in the STATUS column
    docker inspect <container>  # Detailed health info for one container
    

  2. Health Check Script

    ./scripts/check-health.sh  # Check all services
    

  3. Logging

    docker-compose logs  # View service logs
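
The JSON printed by `docker inspect` can also be read programmatically. The sketch below pulls each container's health status out of that output; `container_health` is an illustrative name, and "none" is a placeholder chosen here for containers that define no health check.

```python
import json


def container_health(inspect_output):
    """Map container name to health status from `docker inspect` JSON."""
    statuses = {}
    for container in json.loads(inspect_output):
        # Docker reports names with a leading slash, e.g. "/server".
        name = container.get("Name", "").lstrip("/")
        health = container.get("State", {}).get("Health")
        # Containers without a healthcheck have no "Health" entry.
        statuses[name] = health["Status"] if health else "none"
    return statuses
```

Feed it the text produced by `docker inspect <container>`; a container with a health check reports "starting", "healthy", or "unhealthy".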
    

Error Handling

  1. Automatic Recovery
     • Services attempt to reconnect to dependencies
     • Docker restarts unhealthy containers

  2. Notifications
     • Log critical health issues
     • Alert on repeated failures
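
One way to sketch the notification rules above: log every failed check as critical and raise a single alert once failures repeat. `FailureTracker` is a hypothetical helper, and the threshold of 3 is an assumption that simply mirrors the `retries: 3` setting in the Docker configuration.

```python
import logging

logger = logging.getLogger("health")


class FailureTracker:
    """Count consecutive failed checks per service."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}

    def record(self, service, healthy):
        """Record one check result; return True when an alert should fire."""
        if healthy:
            self.failures[service] = 0  # a success resets the streak
            return False
        count = self.failures.get(service, 0) + 1
        self.failures[service] = count
        logger.critical("%s failed health check (%d in a row)", service, count)
        # Alert once, at the moment the streak reaches the threshold.
        return count == self.threshold
```

Alerting only when the streak hits the threshold avoids a new alert on every later failure of the same outage.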

Best Practices

  1. Implementation
     • Keep health checks lightweight
     • Include relevant metrics
     • Handle timeouts gracefully

  2. Monitoring
     • Check health regularly
     • Log health status changes
     • Track dependency health

  3. Maintenance
     • Update health checks with new features
     • Review health metrics regularly
     • Test failure scenarios
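
As an illustration of handling timeouts gracefully, a dependency check can be run under a deadline so that a hung dependency is reported as "disconnected" instead of stalling the health endpoint. `check_dependency` and the 2-second default are assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def check_dependency(check, timeout=2.0):
    """Run `check()` with a deadline and return a dependency status string.

    `check` is any zero-argument callable that returns True when the
    dependency is reachable.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check)
    try:
        ok = future.result(timeout=timeout)
    except Exception:
        # Covers both a missed deadline and an error raised by the check.
        return "disconnected"
    finally:
        # Do not block the caller; a slow check finishes in the background.
        pool.shutdown(wait=False)
    return "connected" if ok else "disconnected"
```

The worker thread is not cancelled when the deadline passes, so the check itself should also carry its own network timeout.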

Version History

  • v1.0: Initial health monitoring
  • v1.1: Added dependency checks
  • v1.2: Enhanced Docker integration
  • v2.0: Updated for simplified battle mechanics