Docker Metrics Collector
A lightweight, modular Docker monitoring tool that collects comprehensive metrics from containers, volumes, and the Docker system, then sends them to Graphite.
🚀 Features
Comprehensive Metrics Collection
Container Metrics:
- CPU usage percentage (accurate per-container calculation)
- Memory usage (bytes and percentage)
- Disk usage per container (filesystem size in bytes)
- Network I/O (rx/tx bytes and packets)
- Block I/O (read/write bytes)
- Container state (running=2, paused=1, stopped=0)
- Health status (healthy=2, starting=1, unhealthy=0, not available=-1)
- Restart count
Volume Metrics:
- Container count per volume
- Volume labels count
- Volume usage tracking
System Metrics:
- Total/running/paused/stopped container counts
- Total image count and active images
- System-wide storage usage (images, containers, volumes)
- Docker system df parsing for detailed disk usage
Aggregated Metrics:
- Per-container metric summaries
- Volume usage patterns (in-use vs unused)
- Container utilization percentage
Self-Metrics:
- Service uptime and iteration count
- Collection duration (average and last)
- Metrics collected per iteration
- Collector success/error counts
- Export success/error counts
- Memory usage (RSS, VMS)
- CPU usage percentage
- Thread count
Quick Start
Using Docker Compose
# Start Graphite and the metrics collector
docker compose up -d
# View logs
docker logs -f docker-df-collector
# Access Grafana
open http://localhost:80
The collector gathers metrics every INTERVAL_SECONDS seconds (60 by default) and sends them to Graphite.
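As a rough illustration of what sending to Graphite involves: the plaintext protocol accepts one `path value timestamp` line per metric over TCP. The sketch below is not the collector's actual exporter; `format_line` and `send_lines` are hypothetical names, and the host and port in the usage lines are assumptions.

```python
import socket
import time
from typing import Iterable


def format_line(path: str, value: float, timestamp: float) -> str:
    """Render one metric as 'path value timestamp' followed by a newline."""
    return f"{path} {value} {int(timestamp)}\n"


def send_lines(host: str, port: int, lines: Iterable[str], timeout: float = 5.0) -> None:
    """Open a TCP connection and write all lines as a single payload."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    line = format_line("docker-metrics.system.containers.running", 3, time.time())
    print(line, end="")
    # send_lines("localhost", 2003, [line])  # needs a reachable Graphite instance
```

Batching every line from one iteration into a single connection keeps network overhead low, which fits the "minimal network" goal of the tool.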
Configuration
Configure via environment variables in compose.yml:
| Variable | Description | Default |
|---|---|---|
| GRAPHITE_ENDPOINT | Graphite plaintext endpoint | http://graphite:2003 |
| GRAPHITE_PREFIX | Prefix for all metric names | docker-metrics |
| INTERVAL_SECONDS | Collection interval in seconds | 60 |
| DEBUG | Enable debug console output | false |
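One way these variables could be read in Python is sketched below. The `Config` dataclass and `load_config` function are hypothetical, not the tool's actual code. The sketch assumes the endpoint is always given in URL form (as in the default) and that `1`, `true`, and `yes` all enable debug output.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Config:
    graphite_host: str
    graphite_port: int
    prefix: str
    interval_seconds: int
    debug: bool


def load_config(env=None) -> Config:
    """Read settings from a mapping (os.environ by default) with the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: the endpoint has a scheme, e.g. http://graphite:2003
    endpoint = urlsplit(env.get("GRAPHITE_ENDPOINT", "http://graphite:2003"))
    return Config(
        graphite_host=endpoint.hostname or "graphite",
        graphite_port=endpoint.port or 2003,
        prefix=env.get("GRAPHITE_PREFIX", "docker-metrics"),
        interval_seconds=int(env.get("INTERVAL_SECONDS", "60")),
        # Assumption: these spellings count as "enabled"
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the process environment.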
Metrics Reference
All metrics follow the pattern: {prefix}.{category}.{name}.{metric}
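A minimal sketch of how a path following this pattern might be assembled. The helper names are hypothetical, and the sanitizing rule is an assumption: Graphite treats `.` as a node separator, and the Docker Engine API reports container names with a leading `/`, so both are cleaned up before joining.

```python
import re

# Anything outside this set could split or break a Graphite path segment
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def sanitize(segment: str) -> str:
    """Drop a leading slash and replace unsafe runs with a single underscore."""
    return _UNSAFE.sub("_", segment.lstrip("/"))


def metric_path(prefix: str, category: str, name: str, metric: str) -> str:
    """Join the four parts of the pattern; only the free-form name is sanitized."""
    return ".".join([prefix, category, sanitize(name), metric])


if __name__ == "__main__":
    print(metric_path("docker-metrics", "containers", "/web.1", "cpu_percent"))
```

The metric part is left untouched so that nested names such as `network.rx_bytes` keep their dot.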
Container Metrics
docker-metrics.containers.{container_name}.cpu_percent
docker-metrics.containers.{container_name}.memory_bytes
docker-metrics.containers.{container_name}.memory_percent
docker-metrics.containers.{container_name}.disk_usage_bytes
docker-metrics.containers.{container_name}.state
docker-metrics.containers.{container_name}.health
docker-metrics.containers.{container_name}.restart_count
docker-metrics.containers.{container_name}.network.rx_bytes
docker-metrics.containers.{container_name}.network.tx_bytes
docker-metrics.containers.{container_name}.blkio.read_bytes
docker-metrics.containers.{container_name}.blkio.write_bytes
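For context, a value like `cpu_percent` is commonly derived from two consecutive samples in the Docker stats API response. The sketch below follows the formula given in the Docker Engine API documentation; whether this collector computes it in exactly this way is an assumption, and `cpu_percent` is a hypothetical function name.

```python
def cpu_percent(stats: dict) -> float:
    """CPU usage as a percentage, computed from one Docker stats sample.

    Uses the delta between cpu_stats and precpu_stats, scaled by the
    number of online CPUs.
    """
    cpu = stats.get("cpu_stats", {})
    pre = stats.get("precpu_stats", {})
    cpu_delta = (cpu.get("cpu_usage", {}).get("total_usage", 0)
                 - pre.get("cpu_usage", {}).get("total_usage", 0))
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)
    if cpu_delta <= 0 or system_delta <= 0:
        # First sample, stopped container, or counter reset
        return 0.0
    # Fall back to the per-CPU list when online_cpus is absent
    online = (cpu.get("online_cpus")
              or len(cpu.get("cpu_usage", {}).get("percpu_usage") or [])
              or 1)
    return cpu_delta / system_delta * online * 100.0


if __name__ == "__main__":
    sample = {
        "cpu_stats": {"cpu_usage": {"total_usage": 1500},
                      "system_cpu_usage": 11000, "online_cpus": 2},
        "precpu_stats": {"cpu_usage": {"total_usage": 1000},
                         "system_cpu_usage": 10000},
    }
    print(cpu_percent(sample))
```

With this formula a container saturating two cores reports 200%, so the value is not capped at 100.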
System Metrics
docker-metrics.system.containers.total
docker-metrics.system.containers.running
docker-metrics.system.images.total
docker-metrics.system.images.total_size_bytes
docker-metrics.system.containers.total_size_bytes
docker-metrics.system.volumes.total_size_bytes
Aggregated Metrics
docker-metrics.aggregated.volumes.unused_count
docker-metrics.aggregated.system.container_utilization_percent
Self-Metrics (Service Health)
docker-metrics.service.uptime_seconds
docker-metrics.service.iterations_total
docker-metrics.service.metrics_collected_total
docker-metrics.service.metrics_collected_last
docker-metrics.service.collection_duration_avg_seconds
docker-metrics.service.collection_duration_last_seconds
docker-metrics.service.collector.{collector_name}.success_total
docker-metrics.service.collector.{collector_name}.errors_total
docker-metrics.service.exports_success_total
docker-metrics.service.exports_errors_total
docker-metrics.service.memory_rss_bytes
docker-metrics.service.memory_vms_bytes
docker-metrics.service.memory_rss_mb
docker-metrics.service.cpu_percent
docker-metrics.service.threads_count
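The counters and durations above could be maintained by a small tracker like the following. This is a sketch only: the `SelfMetrics` class is hypothetical, and it covers just the iteration, count, and duration series, not the memory, CPU, or thread metrics.

```python
from dataclasses import dataclass


@dataclass
class SelfMetrics:
    iterations_total: int = 0
    metrics_collected_total: int = 0
    metrics_collected_last: int = 0
    duration_total: float = 0.0
    duration_last: float = 0.0

    def record(self, metric_count: int, duration_seconds: float) -> None:
        """Update the counters after one collection iteration."""
        self.iterations_total += 1
        self.metrics_collected_total += metric_count
        self.metrics_collected_last = metric_count
        self.duration_total += duration_seconds
        self.duration_last = duration_seconds

    @property
    def duration_avg(self) -> float:
        """Mean collection duration over all iterations so far."""
        if self.iterations_total == 0:
            return 0.0
        return self.duration_total / self.iterations_total

    def as_metrics(self, prefix: str = "docker-metrics") -> dict:
        """Map the tracked values onto the metric paths listed above."""
        base = f"{prefix}.service"
        return {
            f"{base}.iterations_total": self.iterations_total,
            f"{base}.metrics_collected_total": self.metrics_collected_total,
            f"{base}.metrics_collected_last": self.metrics_collected_last,
            f"{base}.collection_duration_avg_seconds": self.duration_avg,
            f"{base}.collection_duration_last_seconds": self.duration_last,
        }
```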
📊 Grafana Queries
Powerful queries to visualize your Docker metrics:
Container Performance
- Top 10 CPU consumers:
aliasByNode(highestMax(docker-metrics.containers.*.cpu_percent, 10), 2)
- Top 10 memory users:
aliasByNode(highestMax(docker-metrics.containers.*.memory_bytes, 10), 2)
- Average CPU across all containers:
averageSeries(docker-metrics.containers.*.cpu_percent)
- Total memory used by all containers:
sumSeries(docker-metrics.containers.*.memory_bytes)
- Container health status (2 = healthy, 1 = starting, 0 = unhealthy, -1 = not available):
aliasByNode(docker-metrics.containers.*.health, 2)
Network Monitoring
- Total network traffic (RX + TX):
sumSeries(docker-metrics.containers.*.network.{rx,tx}_bytes)
- Top 5 network receivers:
aliasByNode(highestMax(docker-metrics.containers.*.network.rx_bytes, 5), 2)
- Top 5 network transmitters:
aliasByNode(highestMax(docker-metrics.containers.*.network.tx_bytes, 5), 2)
- Network packets per second:
perSecond(sumSeries(docker-metrics.containers.*.network.{rx,tx}_packets))
Storage & Disk I/O
- Total Docker storage usage:
sumSeries(docker-metrics.system.{images,containers,volumes}.total_size_bytes)
- Storage by category:
aliasByNode(docker-metrics.system.*.total_size_bytes, 2)
- Top 10 containers by disk usage:
aliasByNode(highestMax(docker-metrics.containers.*.disk_usage_bytes, 10), 2)
- Total disk usage across all containers:
sumSeries(docker-metrics.containers.*.disk_usage_bytes)
- Container disk usage over time:
aliasByNode(docker-metrics.containers.*.disk_usage_bytes, 2)
- Top 5 disk readers:
aliasByNode(highestMax(docker-metrics.containers.*.blkio.read_bytes, 5), 2)
- Top 5 disk writers:
aliasByNode(highestMax(docker-metrics.containers.*.blkio.write_bytes, 5), 2)
- Total block I/O throughput (bytes per second):
perSecond(sumSeries(docker-metrics.containers.*.blkio.{read,write}_bytes))
System Overview
- Container utilization %:
docker-metrics.aggregated.system.container_utilization_percent
- Running vs total containers:
aliasByNode(docker-metrics.system.containers.{running,total}, 3)
- Container states breakdown:
aliasByNode(docker-metrics.system.containers.*, 3)
- Unused volumes:
docker-metrics.aggregated.volumes.unused_count
- Volume usage ratio:
divideSeries(docker-metrics.aggregated.volumes.in_use_count, docker-metrics.aggregated.volumes.total_count)
Container Lifecycle
- Containers by state (2 = running, 1 = paused, 0 = stopped):
aliasByNode(docker-metrics.containers.*.state, 2)
- Restart count trends:
aliasByNode(docker-metrics.containers.*.restart_count, 2)
- Top 5 containers by current restart count:
aliasByNode(highestCurrent(docker-metrics.containers.*.restart_count, 5), 2)
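The numeric codes used by the state and health series could be produced by a mapping along these lines. This is a sketch, not the collector's code: the function names are hypothetical, and treating every state other than running or paused (for example exited, created, or dead) as 0 is an assumption.

```python
STATE_VALUES = {"running": 2, "paused": 1}
HEALTH_VALUES = {"healthy": 2, "starting": 1, "unhealthy": 0}


def encode_state(status: str) -> int:
    """Map a Docker container status string to the documented numeric code."""
    # Assumption: anything that is not running or paused counts as stopped
    return STATE_VALUES.get(status.lower(), 0)


def encode_health(status) -> int:
    """Map a health status to its code; -1 means no health check is available."""
    if not status:
        return -1
    return HEALTH_VALUES.get(status.lower(), -1)


if __name__ == "__main__":
    print(encode_state("running"), encode_state("exited"))
    print(encode_health("healthy"), encode_health(None))
```

Keeping "not available" at -1 rather than 0 lets a dashboard tell a failing health check apart from a container that defines none.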
Self-Monitoring (Service Health)
- Service uptime:
docker-metrics.service.uptime_seconds
- Collection performance:
aliasByNode(docker-metrics.service.collection_duration_{avg,last}_seconds, 2)
- Metrics collected per iteration:
docker-metrics.service.metrics_collected_last
- Total metrics collected:
docker-metrics.service.metrics_collected_total
- Service memory usage (MB):
docker-metrics.service.memory_rss_mb
- Service CPU usage:
docker-metrics.service.cpu_percent
- Collector success counts:
aliasByNode(docker-metrics.service.collector.*.success_total, 3)
- Collector error counts:
aliasByNode(docker-metrics.service.collector.*.errors_total, 3)
- Export success vs errors:
aliasByNode(docker-metrics.service.exports_{success,errors}_total, 2)
- Service health score (share of successful exports):
divideSeries(docker-metrics.service.exports_success_total, sumSeries(docker-metrics.service.exports_{success,errors}_total))
Advanced Queries
- Memory usage % across containers:
aliasByNode(docker-metrics.containers.*.memory_percent, 2)
- Containers using most network bandwidth:
aliasByNode(highestMax(sumSeriesWithWildcards(docker-metrics.containers.*.network.{rx,tx}_bytes, 4), 10), 2)
- I/O per container (read + write):
aliasByNode(sumSeriesWithWildcards(docker-metrics.containers.*.blkio.{read,write}_bytes, 4), 2)
- Active images ratio:
divideSeries(docker-metrics.system.images.active_count, docker-metrics.system.images.total)
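The same targets can also be fetched outside Grafana through Graphite's render API. The helper below only builds the request URL. It is a sketch that assumes the Graphite web interface is reachable at the base URL you pass in; `render_url` is a hypothetical name.

```python
from urllib.parse import urlencode


def render_url(base_url: str, target: str, frm: str = "-1h", fmt: str = "json") -> str:
    """Build a Graphite /render URL for one target expression."""
    # urlencode percent-escapes characters such as '(' , ')' and '*'
    query = urlencode({"target": target, "from": frm, "format": fmt})
    return f"{base_url.rstrip('/')}/render?{query}"


if __name__ == "__main__":
    # Assumption: Graphite's web UI is exposed on localhost port 80
    print(render_url("http://localhost:80",
                     "averageSeries(docker-metrics.containers.*.cpu_percent)"))
```

The resulting URL can be opened in a browser or passed to any HTTP client to check that a series exists before building a dashboard panel around it.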
🛠️ Development
Running Locally
cd src
pip install -r requirements.txt
export GRAPHITE_ENDPOINT=http://localhost:2003
export DEBUG=true
python main.py
Adding Custom Collectors
Extend BaseCollector to create new metric collectors:
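import time

from collectors.base import BaseCollector


class MyCollector(BaseCollector):
    def get_name(self) -> str:
        # Identifier for this collector
        return "mycollector"

    def collect(self) -> list:
        # Each metric is a dict with a name, a value, and a Unix timestamp
        return [{'name': 'my.metric', 'value': 42, 'timestamp': time.time()}]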
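To show how collectors of this shape might be driven, here is a self-contained sketch of a run loop that isolates failures and counts successes and errors per collector. The `BaseCollector` below is a stand-in defined only for this example, and the real class in `collectors.base` may differ. `BrokenCollector` and `run_collectors` are hypothetical.

```python
import time
from abc import ABC, abstractmethod


class BaseCollector(ABC):
    """Stand-in for collectors.base.BaseCollector (assumed interface)."""

    @abstractmethod
    def get_name(self) -> str: ...

    @abstractmethod
    def collect(self) -> list: ...


class MyCollector(BaseCollector):
    def get_name(self) -> str:
        return "mycollector"

    def collect(self) -> list:
        return [{"name": "my.metric", "value": 42, "timestamp": time.time()}]


class BrokenCollector(BaseCollector):
    def get_name(self) -> str:
        return "broken"

    def collect(self) -> list:
        raise RuntimeError("simulated failure")


def run_collectors(collectors):
    """Run every collector once; one failing collector does not stop the rest."""
    metrics, counts = [], {}
    for collector in collectors:
        stats = counts.setdefault(collector.get_name(), {"success": 0, "errors": 0})
        try:
            metrics.extend(collector.collect())
            stats["success"] += 1
        except Exception:
            stats["errors"] += 1
    return metrics, counts


if __name__ == "__main__":
    collected, counts = run_collectors([MyCollector(), BrokenCollector()])
    print(len(collected), counts)
```

Counting outcomes per collector name is what would make series such as `collector.{collector_name}.success_total` and `errors_total` possible.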
Performance
- Memory: ~50-100MB
- CPU: <1% (during collection)
- Collection time: 1-5 seconds
- Network: Minimal (Graphite plaintext protocol)
Why This Tool?
This tool brings comprehensive Docker monitoring to Graphite with:
✅ Modular design - Easy to extend and customize
✅ Lightweight - Minimal resource usage
✅ Comprehensive - 50+ metrics out of the box
✅ Production-ready - Runs in a container as a non-root user with read-only socket access
Perfect for monitoring Docker hosts without complex setups.