Modern DevOps is built on one principle: if you do it twice, automate it. This project turns a standard Xubuntu laptop into a fully monitored infrastructure node — deploying Prometheus, Grafana, Loki, and Alertmanager end-to-end using Vagrant, Docker Compose, and Ansible. No manual steps, no clicking through UIs — one command and the entire stack is live.
The Problem With Manual Monitoring Setup
Every team eventually builds a monitoring stack. Most build it manually: install Prometheus here, configure Grafana there, wire Alertmanager separately, then document none of it. The result is a stack nobody can reproduce, nobody can debug, and nobody trusts. When the monitoring server dies, the monitoring setup dies with it.
This project solves that. Every component is defined as code. The entire stack is reproducible in minutes on any machine.
Architecture — Three-Tier Observability
Each component runs as a Docker container. Docker Compose defines their relationships, volumes, and restart policies. Ansible provisions the host, installs dependencies, copies configs, and fires up the stack — idempotently.
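As a rough illustration of what that looks like in a Compose file, here is a minimal sketch. The service names, image tags, and host ports are taken from the playbook shown later in this post. The `grafana-data` named volume, the `unless-stopped` restart policy, and the read-only mounts are assumptions made for the example, not the project's actual file.

```yaml
# docker-compose.yml (illustrative sketch, not the project's actual file)
services:
  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    ports:
      - "9100:9100"

  prometheus:
    image: prom/prometheus:latest
    restart: unless-stopped
    depends_on:
      - node-exporter            # controls start order only, not readiness
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml:ro
    ports:
      - "9091:9090"              # host port 9091 -> Prometheus on 9090

  grafana:
    image: grafana/grafana:latest
    restart: unless-stopped
    depends_on:
      - prometheus
    volumes:
      - grafana-data:/var/lib/grafana   # assumed: keeps dashboards across restarts
    ports:
      - "3000:3000"

volumes:
  grafana-data:
```

Relationships live in `depends_on`, persistence in `volumes`, and recovery behaviour in `restart`, so the whole topology is readable in one file.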
The Networking Challenge
The biggest technical hurdle: when Prometheus runs inside Docker, localhost refers to the container itself, not the host machine. Node Exporter's metrics are exposed on the host's port 9100, but the Prometheus container can't reach them via localhost:9100.
The fix: use the Docker bridge gateway IP (172.17.0.1 on the default bridge network) as the scrape target in prometheus.yml. The container can then reach the host through the bridge interface. UFW also needed an explicit allow rule for traffic to port 9100 from the Docker bridge subnet:
ufw allow from 172.17.0.0/16 to any port 9100
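The scrape target described above might look roughly like this in prometheus.yml. This is a sketch: the job names and the 15-second interval are assumptions, and 172.17.0.1 is only the gateway of Docker's default bridge network (running `docker network inspect bridge` shows the actual value on a given machine).

```yaml
# prometheus.yml (sketch; job names and scrape interval are assumed)
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping itself: inside the container, localhost IS the container
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Node Exporter on the host, reached through the Docker bridge gateway
  - job_name: node
    static_configs:
      - targets: ["172.17.0.1:9100"]
```

The two jobs show the contrast directly: localhost works for anything living in the same container, while anything on the host has to be addressed through the gateway IP.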
The Ansible Playbook
The full stack — Node Exporter, Prometheus, Grafana, Loki, Alertmanager — is deployed through a single Ansible playbook. This ensures idempotency: run it once, run it ten times, the result is always the same.
- name: Deploy Monitoring Stack
  hosts: localhost
  connection: local
  become: true
  tasks:
    # Host metrics, published on host port 9100
    - name: Run Node Exporter
      community.docker.docker_container:
        name: node-exporter
        image: prom/node-exporter:latest
        state: started
        restart_policy: always
        ports: ["9100:9100"]

    # Prometheus with its scrape config and alert rules mounted in
    - name: Run Prometheus
      community.docker.docker_container:
        name: prometheus
        image: prom/prometheus:latest
        state: started
        recreate: true  # forces re-creation on every run
        volumes:
          - "./prometheus.yml:/etc/prometheus/prometheus.yml"
          - "./alert_rules.yml:/etc/prometheus/alert_rules.yml"
        ports: ["9091:9090"]  # host port 9091 -> container port 9090

    - name: Run Grafana
      community.docker.docker_container:
        name: grafana
        image: grafana/grafana:latest
        state: started
        ports: ["3000:3000"]
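The tasks above cover Node Exporter, Prometheus, and Grafana. Tasks for Alertmanager and Loki would follow the same pattern; the sketch below is an assumption about how they could look, not the project's actual tasks. The images are the official ones and 9093 and 3100 are the tools' default ports, but the container names and the alertmanager.yml mount are placeholders.

```yaml
# Additional entries for the same tasks: list (sketch only)
- name: Run Alertmanager
  community.docker.docker_container:
    name: alertmanager
    image: prom/alertmanager:latest
    state: started
    restart_policy: always
    volumes:
      # assumed: alertmanager.yml is kept alongside the other config files
      - "./alertmanager.yml:/etc/alertmanager/alertmanager.yml"
    ports: ["9093:9093"]  # Alertmanager's default port

- name: Run Loki
  community.docker.docker_container:
    name: loki
    image: grafana/loki:latest
    state: started
    restart_policy: always
    ports: ["3100:3100"]  # Loki's default HTTP port
```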
Proactive Alerting
Monitoring without alerting is just a dashboard you have to stare at. I added a custom Prometheus alerting rule and wired it to Alertmanager: if average CPU usage stays above 85% for 2 minutes, a critical alert fires. This moves the stack from passive observation to active incident response.
# CPU busy % = 100 - idle %, with the idle rate averaged over a 5-minute window
- alert: HighCPUUsage
  expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "CPU usage above 85% for 2 minutes"
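For the alert to reach anyone, Alertmanager needs a route that matches it. A minimal alertmanager.yml sketch, assuming a generic webhook receiver: the receiver names and the URL are placeholders, and the grouping and repeat settings are illustrative rather than the project's real values.

```yaml
# alertmanager.yml (sketch; receiver names and webhook URL are placeholders)
route:
  receiver: default              # fallback for alerts not matched below
  group_by: ["alertname"]
  routes:
    - matchers:
        - severity="critical"    # matches the label set by HighCPUUsage
      receiver: critical-webhook
      repeat_interval: 1h        # re-notify hourly while the alert keeps firing

receivers:
  - name: default                # no integrations configured: nothing is sent
  - name: critical-webhook
    webhook_configs:
      - url: "http://172.17.0.1:5001/alerts"   # hypothetical endpoint
```

Prometheus also needs an `alerting.alertmanagers` entry in prometheus.yml pointing at Alertmanager's address, otherwise fired alerts never leave Prometheus.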
Vagrant — Reproducible Test Environment
The stack runs inside a Vagrant-managed VM, giving a clean, reproducible environment that mirrors a real server. Spin it up, test the stack, destroy it, spin it up again — zero residue on your host machine.
vagrant up # provision the VM
ansible-playbook monitoring.yml # deploy the stack
# → Grafana live at http://localhost:3000
Key Takeaways
- Infrastructure is code. Never manually configure what you can define in a playbook.
- Docker networking has gotchas. Always check gateway IPs when containers need to reach the host.
- Firewalls matter. A missing UFW rule cost me the first 30 minutes of debugging, so check the firewall early.
- You don't need a cloud budget. This entire stack runs on a Dell Latitude with 8 GB of RAM.
What's Next
- Loki + Promtail for centralized log aggregation
- Slack webhook integration for alert delivery
- Terraform provisioning for cloud deployment
- Multi-node scraping across a homelab cluster