# Docker Metrics Collector

A lightweight, modular Docker monitoring tool that collects comprehensive metrics from containers, volumes, and the Docker system, then sends them to Graphite.

## 🚀 Features

### Comprehensive Metrics Collection

**Container Metrics:**

- CPU usage percentage (accurate per-container calculation)
- Memory usage (bytes and percentage)
- Disk usage per container (filesystem size in bytes)
- Network I/O (rx/tx bytes and packets)
- Block I/O (read/write bytes)
- Container state (running=2, paused=1, stopped=0)
- Health status (healthy=2, starting=1, unhealthy=0)
- Restart count

**Volume Metrics:**

- Container count per volume
- Volume labels count
- Volume usage tracking

**System Metrics:**

- Total/running/paused/stopped container counts
- Total image count and active images
- System-wide storage usage (images, containers, volumes)
- `docker system df` parsing for detailed disk usage

**Aggregated Metrics:**

- Per-container metric summaries
- Volume usage patterns (in-use vs. unused)
- Container utilization percentage

**Self-Metrics:**

- Service uptime and iteration count
- Collection duration (average and last)
- Metrics collected per iteration
- Collector success/error counts
- Export success/error counts
- Memory usage (RSS, VMS)
- CPU usage percentage
- Thread count

## Quick Start

### Using Docker Compose

```bash
# Start Graphite and the metrics collector
docker compose up -d

# View logs
docker logs -f docker-df-collector

# Access Grafana
open http://localhost:80
```

The collector gathers metrics every `INTERVAL_SECONDS` seconds (60 by default) and sends them to Graphite.
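For reference, the Graphite side of this pipeline is just the plaintext protocol: one `<path> <value> <timestamp>` line per metric over TCP. A minimal sketch of such an exporter, assuming a Graphite host reachable on port 2003 (the function names here are illustrative, not the collector's actual API):

```python
import socket
import time


def format_metric(prefix: str, name: str, value: float, timestamp: float) -> str:
    """Render one Graphite plaintext line: '<path> <value> <timestamp>'."""
    return f"{prefix}.{name} {value} {int(timestamp)}"


def send_to_graphite(lines: list, host: str = "graphite", port: int = 2003) -> None:
    """Ship a batch of plaintext lines over a single TCP connection."""
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example batch (not sent here): one state gauge for a container named "web"
batch = [format_metric("docker-metrics", "containers.web.state", 2, time.time())]
```

Batching many lines per connection, as above, is what keeps the network overhead of the plaintext protocol minimal.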
## Configuration

Configure via environment variables in `compose.yml`:

| Variable            | Description                    | Default                |
| ------------------- | ------------------------------ | ---------------------- |
| `GRAPHITE_ENDPOINT` | Graphite plaintext endpoint    | `http://graphite:2003` |
| `GRAPHITE_PREFIX`   | Prefix for all metric names    | `docker-metrics`       |
| `INTERVAL_SECONDS`  | Collection interval in seconds | `60`                   |
| `DEBUG`             | Enable debug console output    | `false`                |

## Metrics Reference

All metrics follow the pattern: `{prefix}.{category}.{name}.{metric}`

### Container Metrics

```
docker-metrics.containers.{container_name}.cpu_percent
docker-metrics.containers.{container_name}.memory_bytes
docker-metrics.containers.{container_name}.memory_percent
docker-metrics.containers.{container_name}.disk_usage_bytes
docker-metrics.containers.{container_name}.state
docker-metrics.containers.{container_name}.health
docker-metrics.containers.{container_name}.restart_count
docker-metrics.containers.{container_name}.network.rx_bytes
docker-metrics.containers.{container_name}.network.tx_bytes
docker-metrics.containers.{container_name}.blkio.read_bytes
docker-metrics.containers.{container_name}.blkio.write_bytes
```

### System Metrics

```
docker-metrics.system.containers.total
docker-metrics.system.containers.running
docker-metrics.system.images.total
docker-metrics.system.images.total_size_bytes
docker-metrics.system.containers.total_size_bytes
docker-metrics.system.volumes.total_size_bytes
```

### Aggregated Metrics

```
docker-metrics.aggregated.volumes.unused_count
docker-metrics.aggregated.system.container_utilization_percent
```

### Self-Metrics (Service Health)

```
docker-metrics.service.uptime_seconds
docker-metrics.service.iterations_total
docker-metrics.service.metrics_collected_total
docker-metrics.service.metrics_collected_last
docker-metrics.service.collection_duration_avg_seconds
docker-metrics.service.collection_duration_last_seconds
docker-metrics.service.collector.{collector_name}.success_total
docker-metrics.service.collector.{collector_name}.errors_total
docker-metrics.service.exports_success_total
docker-metrics.service.exports_errors_total
docker-metrics.service.memory_rss_bytes
docker-metrics.service.memory_vms_bytes
docker-metrics.service.memory_rss_mb
docker-metrics.service.cpu_percent
docker-metrics.service.threads_count
```

## 📊 Grafana Queries

Useful queries for visualizing your Docker metrics:

### Container Performance

- **Top 10 CPU consumers**: `aliasByNode(highestMax(docker-metrics.containers.*.cpu_percent, 10), 2)`
- **Top 10 memory users**: `aliasByNode(highestMax(docker-metrics.containers.*.memory_bytes, 10), 2)`
- **Average CPU across all containers**: `averageSeries(docker-metrics.containers.*.cpu_percent)`
- **Total memory used by all containers**: `sumSeries(docker-metrics.containers.*.memory_bytes)`
- **Container health status**: `aliasByNode(docker-metrics.containers.*.health, 2)` (`2` = healthy, `1` = starting, `0` = unhealthy, `-1` = not available)

### Network Monitoring

- **Total network traffic (RX + TX)**: `sumSeries(docker-metrics.containers.*.network.{rx,tx}_bytes)`
- **Top 5 network receivers**: `aliasByNode(highestMax(docker-metrics.containers.*.network.rx_bytes, 5), 2)`
- **Top 5 network transmitters**: `aliasByNode(highestMax(docker-metrics.containers.*.network.tx_bytes, 5), 2)`
- **Network packet rate**: `derivative(sumSeries(docker-metrics.containers.*.network.{rx,tx}_packets))` (per collection interval; wrap in `perSecond()` instead for a per-second rate)

### Storage & Disk I/O

- **Total Docker storage usage**: `sumSeries(docker-metrics.system.{images,containers,volumes}.total_size_bytes)`
- **Storage by category**: `aliasByNode(docker-metrics.system.*.total_size_bytes, 2)`
- **Top 10 containers by disk usage**: `aliasByNode(highestMax(docker-metrics.containers.*.disk_usage_bytes, 10), 2)`
- **Total disk usage across all containers**: `sumSeries(docker-metrics.containers.*.disk_usage_bytes)`
- **Container disk usage over time**: `aliasByNode(docker-metrics.containers.*.disk_usage_bytes, 2)`
- **Top 5 disk readers**: `aliasByNode(highestMax(docker-metrics.containers.*.blkio.read_bytes, 5), 2)`
- **Top 5 disk writers**: `aliasByNode(highestMax(docker-metrics.containers.*.blkio.write_bytes, 5), 2)`
- **Total I/O throughput (read + write bytes)**: `derivative(sumSeries(docker-metrics.containers.*.blkio.{read,write}_bytes))`

### System Overview

- **Container utilization %**: `docker-metrics.aggregated.system.container_utilization_percent`
- **Running vs. total containers**: `aliasByNode(docker-metrics.system.containers.{running,total}, 3)`
- **Container states breakdown**: `aliasByNode(docker-metrics.system.containers.*, 3)`
- **Unused volumes**: `docker-metrics.aggregated.volumes.unused_count`
- **Volume usage ratio**: `divideSeries(docker-metrics.aggregated.volumes.in_use_count, docker-metrics.aggregated.volumes.total_count)`

### Container Lifecycle

- **Containers by state**: `aliasByNode(docker-metrics.containers.*.state, 2)` (`2` = running, `1` = paused, `0` = stopped)
- **Restart count trends**: `aliasByNode(docker-metrics.containers.*.restart_count, 2)`
- **Containers restarted recently**: `aliasByNode(highestCurrent(docker-metrics.containers.*.restart_count, 5), 2)`

### Self-Monitoring (Service Health)

- **Service uptime**: `docker-metrics.service.uptime_seconds`
- **Collection performance**: `aliasByNode(docker-metrics.service.collection_duration_{avg,last}_seconds, 3)`
- **Metrics collected per iteration**: `docker-metrics.service.metrics_collected_last`
- **Total metrics collected**: `docker-metrics.service.metrics_collected_total`
- **Service memory usage (MB)**: `docker-metrics.service.memory_rss_mb`
- **Service CPU usage**: `docker-metrics.service.cpu_percent`
- **Collector success rates**: `aliasByNode(docker-metrics.service.collector.*.success_total, 3)`
- **Collector error counts**: `aliasByNode(docker-metrics.service.collector.*.errors_total, 3)`
- **Export success vs. errors**: `aliasByNode(docker-metrics.service.exports_{success,errors}_total, 2)`
- **Service health score**: `divideSeries(docker-metrics.service.exports_success_total, sumSeries(docker-metrics.service.exports_{success,errors}_total))`

### Advanced Queries

- **Memory usage % across containers**: `aliasByNode(docker-metrics.containers.*.memory_percent, 2)`
- **Containers using most network bandwidth**: `aliasByNode(highestMax(sumSeriesWithWildcards(docker-metrics.containers.*.network.{rx,tx}_bytes, 3), 10), 2)`
- **I/O per container (read + write)**: `aliasByNode(sumSeriesWithWildcards(docker-metrics.containers.*.blkio.{read,write}_bytes, 3), 2)`
- **Active images ratio**: `divideSeries(docker-metrics.system.images.active_count, docker-metrics.system.images.total)`

## 🛠️ Development

### Running Locally

```bash
cd src
pip install -r requirements.txt
export GRAPHITE_ENDPOINT=http://localhost:2003
export DEBUG=true
python main.py
```

### Adding Custom Collectors

Extend `BaseCollector` to create new metric collectors:

```python
import time

from collectors.base import BaseCollector


class MyCollector(BaseCollector):
    def get_name(self) -> str:
        return "mycollector"

    def collect(self) -> list:
        return [{'name': 'my.metric', 'value': 42, 'timestamp': time.time()}]
```

## Performance

- **Memory:** ~50-100 MB
- **CPU:** <1% (during collection)
- **Collection time:** 1-5 seconds
- **Network:** minimal (Graphite plaintext protocol)

## Why This Tool?

This tool brings comprehensive Docker monitoring to Graphite with:

- ✅ **Modular design** - easy to extend and customize
- ✅ **Lightweight** - minimal resource usage
- ✅ **Comprehensive** - 50+ metrics out of the box
- ✅ **Production-ready** - runs in a container, as a non-root user, with read-only socket access

Perfect for monitoring Docker hosts without complex setups.
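As an aside on the "accurate per-container calculation" mentioned in the feature list: the conventional per-container CPU percentage is derived from the usage deltas the Docker Engine stats API reports (`cpu_stats` vs. `precpu_stats`). A minimal sketch of that formula (the function name and fallback handling are illustrative, not necessarily what this collector does internally):

```python
def cpu_percent(stats: dict) -> float:
    """Per-container CPU %, computed from Docker Engine stats API deltas."""
    cpu = stats["cpu_stats"]
    precpu = stats["precpu_stats"]
    # Container CPU time consumed between the two samples (nanoseconds)
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    # Host-wide CPU time elapsed between the two samples (nanoseconds)
    system_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)
    if system_delta <= 0 or cpu_delta < 0:
        return 0.0  # first sample or counter reset: no meaningful delta yet
    # Scale by CPU count so a container saturating 2 cores reads as 200%
    online_cpus = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage", [])) or 1
    return (cpu_delta / system_delta) * online_cpus * 100.0
```

Computing the ratio against the host-wide `system_cpu_usage` delta is what makes the figure comparable across containers, matching what `docker stats` itself displays.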