Docker Metrics Collector

A lightweight, modular Docker monitoring tool that collects comprehensive metrics from containers, volumes, and the Docker system, then sends them to Graphite.

🚀 Features

Comprehensive Metrics Collection

Container Metrics:

  • CPU usage percentage (accurate per-container calculation)
  • Memory usage (bytes and percentage)
  • Disk usage per container (filesystem size in bytes)
  • Network I/O (rx/tx bytes and packets)
  • Block I/O (read/write bytes)
  • Container state (running=2, paused=1, stopped=0)
  • Health status (healthy=2, starting=1, unhealthy=0)
  • Restart count

Volume Metrics:

  • Container count per volume
  • Volume labels count
  • Volume usage tracking

System Metrics:

  • Total/running/paused/stopped container counts
  • Total image count and active images
  • System-wide storage usage (images, containers, volumes)
  • Docker system df parsing for detailed disk usage
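
`docker system df` prints human-readable sizes ("1.234GB"), so parsing it for Graphite means converting those strings back to bytes. A hedged sketch of such a conversion (assuming Docker's decimal units; the real collector's parsing may differ):

```python
# Decimal unit factors, as Docker prints them in `docker system df`.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def size_to_bytes(size: str) -> int:
    # Try longer suffixes first so "GB" is not mistaken for "B".
    for unit, factor in sorted(UNITS.items(), key=lambda u: -len(u[0])):
        if size.endswith(unit):
            return int(float(size[: -len(unit)]) * factor)
    raise ValueError(f"unrecognized size: {size}")
```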

Aggregated Metrics:

  • Per-container metric summaries
  • Volume usage patterns (in-use vs unused)
  • Container utilization percentage

Self-Metrics:

  • Service uptime and iteration count
  • Collection duration (average and last)
  • Metrics collected per iteration
  • Collector success/error counts
  • Export success/error counts
  • Memory usage (RSS, VMS)
  • CPU usage percentage
  • Thread count

Quick Start

Using Docker Compose

```shell
# Start Graphite and the metrics collector
docker compose up -d

# View logs
docker logs -f docker-df-collector

# Access the Graphite web UI
open http://localhost:80
```

The collector gathers metrics every INTERVAL_SECONDS (60 by default) and sends them to Graphite.

Configuration

Configure via environment variables in compose.yml:

| Variable | Description | Default |
|----------|-------------|---------|
| GRAPHITE_ENDPOINT | Graphite plaintext endpoint | http://graphite:2003 |
| GRAPHITE_PREFIX | Prefix for all metric names | docker-metrics |
| INTERVAL_SECONDS | Collection interval in seconds | 60 |
| DEBUG | Enable debug console output | false |
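
Inside the collector, these variables would typically be read once at startup. A sketch of how that might look (variable names and defaults taken from the table above; the actual code in main.py may structure this differently):

```python
import os

# Read configuration from the environment, falling back to the
# documented defaults from the table above.
GRAPHITE_ENDPOINT = os.environ.get("GRAPHITE_ENDPOINT", "http://graphite:2003")
GRAPHITE_PREFIX = os.environ.get("GRAPHITE_PREFIX", "docker-metrics")
INTERVAL_SECONDS = int(os.environ.get("INTERVAL_SECONDS", "60"))
DEBUG = os.environ.get("DEBUG", "false").lower() == "true"
```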

Metrics Reference

All metrics follow the pattern: {prefix}.{category}.{name}.{metric}

Container Metrics

docker-metrics.containers.{container_name}.cpu_percent
docker-metrics.containers.{container_name}.memory_bytes
docker-metrics.containers.{container_name}.memory_percent
docker-metrics.containers.{container_name}.disk_usage_bytes
docker-metrics.containers.{container_name}.state
docker-metrics.containers.{container_name}.health
docker-metrics.containers.{container_name}.restart_count
docker-metrics.containers.{container_name}.network.rx_bytes
docker-metrics.containers.{container_name}.network.tx_bytes
docker-metrics.containers.{container_name}.blkio.read_bytes
docker-metrics.containers.{container_name}.blkio.write_bytes

System Metrics

docker-metrics.system.containers.total
docker-metrics.system.containers.running
docker-metrics.system.images.total
docker-metrics.system.images.total_size_bytes
docker-metrics.system.containers.total_size_bytes
docker-metrics.system.volumes.total_size_bytes

Aggregated Metrics

docker-metrics.aggregated.volumes.unused_count
docker-metrics.aggregated.system.container_utilization_percent

Self-Metrics (Service Health)

docker-metrics.service.uptime_seconds
docker-metrics.service.iterations_total
docker-metrics.service.metrics_collected_total
docker-metrics.service.metrics_collected_last
docker-metrics.service.collection_duration_avg_seconds
docker-metrics.service.collection_duration_last_seconds
docker-metrics.service.collector.{collector_name}.success_total
docker-metrics.service.collector.{collector_name}.errors_total
docker-metrics.service.exports_success_total
docker-metrics.service.exports_errors_total
docker-metrics.service.memory_rss_bytes
docker-metrics.service.memory_vms_bytes
docker-metrics.service.memory_rss_mb
docker-metrics.service.cpu_percent
docker-metrics.service.threads_count

📊 Grafana Queries

Powerful queries to visualize your Docker metrics:

Container Performance

  • Top 10 CPU consumers: aliasByNode(highestMax(docker-metrics.containers.*.cpu_percent, 10), 2)
  • Top 10 memory users: aliasByNode(highestMax(docker-metrics.containers.*.memory_bytes, 10), 2)
  • Average CPU across all containers: averageSeries(docker-metrics.containers.*.cpu_percent)
  • Total memory used by all containers: sumSeries(docker-metrics.containers.*.memory_bytes)
  • Container health status: aliasByNode(docker-metrics.containers.*.health, 2) (2 = healthy, 1 = starting, 0 = unhealthy, -1 = not available)

Network Monitoring

  • Total network traffic (RX + TX): sumSeries(docker-metrics.containers.*.network.{rx,tx}_bytes)
  • Top 5 network receivers: aliasByNode(highestMax(docker-metrics.containers.*.network.rx_bytes, 5), 2)
  • Top 5 network transmitters: aliasByNode(highestMax(docker-metrics.containers.*.network.tx_bytes, 5), 2)
  • Network packets per second: derivative(sumSeries(docker-metrics.containers.*.network.{rx,tx}_packets))

Storage & Disk I/O

  • Total Docker storage usage: sumSeries(docker-metrics.system.{images,containers,volumes}.total_size_bytes)
  • Storage by category: aliasByNode(docker-metrics.system.*.total_size_bytes, 2)
  • Top 10 containers by disk usage: aliasByNode(highestMax(docker-metrics.containers.*.disk_usage_bytes, 10), 2)
  • Total disk usage across all containers: sumSeries(docker-metrics.containers.*.disk_usage_bytes)
  • Container disk usage over time: aliasByNode(docker-metrics.containers.*.disk_usage_bytes, 2)
  • Top 5 disk readers: aliasByNode(highestMax(docker-metrics.containers.*.blkio.read_bytes, 5), 2)
  • Top 5 disk writers: aliasByNode(highestMax(docker-metrics.containers.*.blkio.write_bytes, 5), 2)
  • Total I/O operations rate: derivative(sumSeries(docker-metrics.containers.*.blkio.{read,write}_bytes))

System Overview

  • Container utilization %: docker-metrics.aggregated.system.container_utilization_percent
  • Running vs total containers: aliasByNode(docker-metrics.system.containers.{running,total}, 3)
  • Container states breakdown: aliasByNode(docker-metrics.system.containers.*, 3)
  • Unused volumes: docker-metrics.aggregated.volumes.unused_count
  • Volume usage ratio: divideSeries(docker-metrics.aggregated.volumes.in_use_count, docker-metrics.aggregated.volumes.total_count)

Container Lifecycle

  • Containers by state: aliasByNode(docker-metrics.containers.*.state, 2) (2 = running, 1 = paused, 0 = stopped)
  • Restart count trends: aliasByNode(docker-metrics.containers.*.restart_count, 2)
  • Containers restarted recently: aliasByNode(highestCurrent(docker-metrics.containers.*.restart_count, 5), 2)

Self-Monitoring (Service Health)

  • Service uptime: docker-metrics.service.uptime_seconds
  • Collection performance: aliasByNode(docker-metrics.service.collection_duration_{avg,last}_seconds, 3)
  • Metrics collected per iteration: docker-metrics.service.metrics_collected_last
  • Total metrics collected: docker-metrics.service.metrics_collected_total
  • Service memory usage (MB): docker-metrics.service.memory_rss_mb
  • Service CPU usage: docker-metrics.service.cpu_percent
  • Collector success rates: aliasByNode(docker-metrics.service.collector.*.success_total, 3)
  • Collector error counts: aliasByNode(docker-metrics.service.collector.*.errors_total, 3)
  • Export success vs errors: aliasByNode(docker-metrics.service.exports_{success,errors}_total, 2)
  • Service health score: divideSeries(docker-metrics.service.exports_success_total, sumSeries(docker-metrics.service.exports_{success,errors}_total))

Advanced Queries

  • Memory usage % across containers: aliasByNode(docker-metrics.containers.*.memory_percent, 2)
  • Containers using most network bandwidth: aliasByNode(highestMax(sumSeriesWithWildcards(docker-metrics.containers.*.network.{rx,tx}_bytes, 3), 10), 2)
  • I/O per container (read + write): aliasByNode(sumSeriesWithWildcards(docker-metrics.containers.*.blkio.{read,write}_bytes, 3), 2)
  • Active images ratio: divideSeries(docker-metrics.system.images.active_count, docker-metrics.system.images.total)

🛠️ Development

Running Locally

```shell
cd src
pip install -r requirements.txt

export GRAPHITE_ENDPOINT=http://localhost:2003
export DEBUG=true

python main.py
```

Adding Custom Collectors

Extend BaseCollector to create new metric collectors:

```python
import time

from collectors.base import BaseCollector

class MyCollector(BaseCollector):
    def get_name(self) -> str:
        return "mycollector"

    def collect(self) -> list:
        # Each metric is a dict with a name, value, and unix timestamp.
        return [{'name': 'my.metric', 'value': 42, 'timestamp': time.time()}]
```
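
The main loop then calls each registered collector and prefixes the returned metric names. Purely to illustrate that contract, here is a self-contained sketch with `BaseCollector` stubbed out (an assumption; the real class in collectors/base.py may carry more behavior):

```python
import time

# Stub standing in for collectors.base.BaseCollector -- an assumption
# made so this sketch runs on its own.
class BaseCollector:
    def get_name(self) -> str:
        raise NotImplementedError

    def collect(self) -> list:
        raise NotImplementedError

class MyCollector(BaseCollector):
    def get_name(self) -> str:
        return "mycollector"

    def collect(self) -> list:
        return [{'name': 'my.metric', 'value': 42, 'timestamp': time.time()}]

# Roughly what the main loop does with each registered collector:
for collector in [MyCollector()]:
    for metric in collector.collect():
        print(collector.get_name(), metric['name'], metric['value'])
```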

Performance

  • Memory: ~50-100MB
  • CPU: <1% (during collection)
  • Collection time: 1-5 seconds
  • Network: Minimal (Graphite plaintext protocol)

Why This Tool?

This tool brings comprehensive Docker monitoring to Graphite with:

  • Modular design - easy to extend and customize
  • Lightweight - minimal resource usage
  • Comprehensive - 50+ metrics out of the box
  • Production-ready - runs in a container, non-root, read-only socket access

Perfect for monitoring Docker hosts without complex setups.