Distributed Deployment¶
Deploy Machineuse across multiple nodes for scalability and high availability.
Architecture¶
┌─────────────────────────────────────────────────────────────┐
│                        Control Plane                        │
│    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    │
│    │  Scheduler  │    │ API Server  │    │ PostgreSQL  │    │
│    └─────────────┘    └─────────────┘    └─────────────┘    │
└─────────────────────────────────────────────────────────────┘
                               │ NNG (tcp://5555)
            ┌──────────────────┼──────────────────┐
            │                  │                  │
            ▼                  ▼                  ▼
    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
    │   Worker 1    │  │   Worker 2    │  │   Worker N    │
    │ systemd-nspawn│  │ systemd-nspawn│  │ systemd-nspawn│
    │  containers   │  │  containers   │  │  containers   │
    └───────────────┘  └───────────────┘  └───────────────┘
Prerequisites¶
Control Plane Node¶
- Ubuntu 20.04+ / Debian 11+
- Python 3.11+
- PostgreSQL 13+
- 2+ CPU cores, 4+ GB RAM
Worker Nodes¶
- Ubuntu 20.04+ / Debian 11+
- Python 3.11+
- systemd-nspawn support
- 4+ CPU cores, 8+ GB RAM
- Root/sudo access
Network¶
- TCP connectivity between all nodes
- Port 5555 on the control plane for NNG messaging from workers
- Port 8000 for the API (control plane)
- Port 8001 for the local API on each worker
Control Plane Setup¶
1. Install PostgreSQL¶
sudo apt install postgresql postgresql-contrib
# Create database
sudo -u postgres createuser machineuse
sudo -u postgres createdb -O machineuse machineuse
sudo -u postgres psql -c "ALTER USER machineuse PASSWORD 'your-secure-password';"
2. Install Machineuse¶
git clone https://github.com/dotcommoners/machineuse.git
cd machineuse/machineuse-api
poetry install
3. Configure Control Plane¶
Create /etc/machineuse/config.json:
{
"deployment_mode": "control_plane",
"node_id": "control-plane-1",
"api": {
"host": "0.0.0.0",
"port": 8000
},
"storage": {
"backend": "postgresql",
"database_url": "postgresql://machineuse:your-secure-password@localhost:5432/machineuse"
},
"messaging": {
"bind_address": "tcp://0.0.0.0:5555",
"heartbeat_interval_seconds": 10
},
"scheduling": {
"algorithm": "weighted_score",
"weights": {
"cpu": 0.3,
"memory": 0.4,
"disk": 0.2,
"network": 0.1
}
}
}
4. Start Control Plane¶
./scripts/start-control-plane.sh
# Or manually
poetry run python -m machineuse.nodes.control_plane --bind tcp://0.0.0.0:5555
Worker Node Setup¶
1. Install Machineuse¶
On each worker node:
git clone https://github.com/dotcommoners/machineuse.git
cd machineuse/machineuse-api
poetry install
# Install system dependencies
sudo ./scripts/install.sh --mode worker_node
2. Configure Worker¶
Create /etc/machineuse/config.json:
{
"deployment_mode": "worker_node",
"node_id": "worker-1",
"api": {
"host": "0.0.0.0",
"port": 8001
},
"storage": {
"backend": "sqlite",
"database_path": "/var/lib/machineuse/worker.db"
},
"messaging": {
"control_plane_address": "tcp://control-plane-ip:5555"
},
"containers": {
"max_instances": 50,
"default_memory_mb": 2048
}
}
3. Start Worker¶
./scripts/start-worker-node.sh
# Or manually
poetry run python -m machineuse.nodes.agent worker-1 tcp://control-plane-ip:5555
Adding More Workers¶
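- Set up a new server with the worker prerequisites and dependencies.
- Copy the worker configuration and give it a unique node_id.
- Start the worker agent.
- Verify that the new worker has registered with the control plane, for example by listing nodes with machineuse-cli nodes list (the command used under Troubleshooting).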
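The configuration step above (copy the worker configuration, change node_id) can be scripted. The following is a minimal sketch, not part of Machineuse: the helper name make_worker_config and the existing_ids argument are hypothetical, and the template simply mirrors the worker configuration shown earlier.

```python
import copy
import json

# Template mirroring the worker configuration shown in "Configure Worker".
WORKER_TEMPLATE = {
    "deployment_mode": "worker_node",
    "node_id": "worker-1",
    "api": {"host": "0.0.0.0", "port": 8001},
    "storage": {"backend": "sqlite", "database_path": "/var/lib/machineuse/worker.db"},
    "messaging": {"control_plane_address": "tcp://control-plane-ip:5555"},
    "containers": {"max_instances": 50, "default_memory_mb": 2048},
}


def make_worker_config(node_id: str, existing_ids: set[str]) -> dict:
    """Copy the template and set a node_id that no other worker uses (hypothetical helper)."""
    if node_id in existing_ids:
        raise ValueError(f"node_id {node_id!r} is already used by another worker")
    config = copy.deepcopy(WORKER_TEMPLATE)  # deep copy so nested dicts are not shared
    config["node_id"] = node_id
    return config


config = make_worker_config("worker-4", existing_ids={"worker-1", "worker-2", "worker-3"})
# The printed JSON is what would be saved as /etc/machineuse/config.json on the new node.
print(json.dumps(config, indent=2))
```

Rejecting a duplicate node_id up front is a design choice of this sketch; how the control plane itself reacts to two workers with the same ID is not described here.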
Load Balancing¶
Scheduling Algorithm¶
The scheduler uses weighted scoring:
| Factor | Weight | Description |
|---|---|---|
| CPU | 0.3 | Available CPU capacity |
| Memory | 0.4 | Available memory |
| Disk | 0.2 | Available disk space |
| Network | 0.1 | Network capacity |
Instances are placed on the node with the highest score.
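To make the weighting concrete, here is a minimal sketch of a weighted score. It assumes the score is a plain weighted sum of each node's free capacity expressed as a fraction between 0 and 1; the actual Machineuse formula may differ. NodeCapacity, score, and pick_node are hypothetical names, and the disk and network figures are invented for illustration.

```python
from dataclasses import dataclass

# Weights from the "scheduling" block of the control plane configuration.
WEIGHTS = {"cpu": 0.3, "memory": 0.4, "disk": 0.2, "network": 0.1}


@dataclass
class NodeCapacity:
    """Free capacity per factor as a fraction in [0, 1] (hypothetical shape)."""
    node_id: str
    cpu: float
    memory: float
    disk: float
    network: float


def score(node: NodeCapacity, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the node's free-capacity fractions (assumed formula)."""
    return sum(weight * getattr(node, factor) for factor, weight in weights.items())


def pick_node(nodes: list[NodeCapacity]) -> NodeCapacity:
    """Return the node with the highest score."""
    return max(nodes, key=score)


# CPU and memory follow the Node Health sample (45%/62% and 67%/78% used);
# disk and network values are made up.
nodes = [
    NodeCapacity("worker-1", cpu=0.55, memory=0.38, disk=0.70, network=0.90),
    NodeCapacity("worker-2", cpu=0.33, memory=0.22, disk=0.80, network=0.90),
]
best = pick_node(nodes)
print(best.node_id, round(score(best), 3))
```

With these figures worker-1 scores higher, since memory carries the largest weight and worker-1 has more of it free.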
Affinity Rules¶
# Prefer specific node
machineuse-cli create --node-preference worker-1
# Anti-affinity (spread instances)
machineuse-cli create --anti-affinity-group "web-servers"
Rebalancing¶
The scheduler automatically rebalances when:
- Node utilization exceeds threshold
- Node goes offline
- Manual trigger
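The three triggers can be read as a simple check over the current node states. This is an illustrative sketch only: the threshold value, the single utilization figure per node, and the names NodeState and rebalance_reasons are all assumptions, not Machineuse internals.

```python
from dataclasses import dataclass

UTILIZATION_THRESHOLD = 0.85  # hypothetical value; the real threshold is not stated here


@dataclass
class NodeState:
    """Minimal view of a node for the rebalance check (hypothetical shape)."""
    node_id: str
    online: bool
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def rebalance_reasons(nodes: list[NodeState], manual: bool = False) -> list[str]:
    """Return the reasons a rebalance should run; an empty list means none apply."""
    reasons = []
    if manual:
        reasons.append("manual trigger")
    for node in nodes:
        if not node.online:
            reasons.append(f"{node.node_id} is offline")
        elif node.utilization > UTILIZATION_THRESHOLD:
            reasons.append(f"{node.node_id} utilization above threshold")
    return reasons


nodes = [
    NodeState("worker-1", online=True, utilization=0.45),
    NodeState("worker-2", online=True, utilization=0.90),
    NodeState("worker-3", online=False, utilization=0.0),
]
print(rebalance_reasons(nodes))
```

Returning the reasons rather than a bare boolean makes it easier to log why a rebalance ran.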
High Availability¶
Control Plane HA¶
For production, run multiple control plane instances:
┌──────────────┐    ┌──────────────┐
│ Control Plane│    │ Control Plane│
│   (Active)   │◄──►│  (Standby)   │
└──────────────┘    └──────────────┘
                 │
                 ▼
          ┌──────────────┐
          │  PostgreSQL  │
          │   (Shared)   │
          └──────────────┘
Worker Resilience¶
If a worker fails:
1. Control plane detects missing heartbeat
2. Instances are marked for migration
3. Scheduler places instances on healthy workers
4. Dormant instances are revived on new nodes
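The failure-handling sequence can be sketched as two small functions: one that flags workers whose heartbeat is overdue, and one that plans where their instances go. This is a sketch under stated assumptions: the 10-second interval comes from the control plane configuration, but the tolerance of three missed heartbeats is invented, and round-robin placement stands in for the scheduler's weighted scoring. All function names are hypothetical.

```python
HEARTBEAT_INTERVAL_SECONDS = 10  # heartbeat_interval_seconds from the control plane config
MISSED_HEARTBEATS_ALLOWED = 3    # hypothetical tolerance before a worker counts as failed


def offline_workers(last_heartbeat: dict[str, float], now: float) -> set[str]:
    """Return workers whose last heartbeat is older than the allowed window."""
    timeout = HEARTBEAT_INTERVAL_SECONDS * MISSED_HEARTBEATS_ALLOWED
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def plan_migration(
    placements: dict[str, str], failed: set[str], healthy: list[str]
) -> dict[str, str]:
    """Map each instance on a failed worker to a healthy worker.

    Round-robin is used here only to keep the sketch short; it is not the
    scheduler's weighted scoring.
    """
    if not healthy:
        raise RuntimeError("no healthy workers available for migration")
    stranded = sorted(inst for inst, node in placements.items() if node in failed)
    return {inst: healthy[i % len(healthy)] for i, inst in enumerate(stranded)}


now = 1000.0
last_heartbeat = {"worker-1": 998.0, "worker-2": 999.0, "worker-3": 700.0}
placements = {"inst-a": "worker-3", "inst-b": "worker-1", "inst-c": "worker-3"}

failed = offline_workers(last_heartbeat, now)
healthy = sorted(set(last_heartbeat) - failed)
moves = plan_migration(placements, failed, healthy)
print(failed, moves)
```

Passing the current time in as an argument keeps the detection logic deterministic and easy to test.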
Monitoring¶
Cluster Status¶
Node Health¶
Output:
ID        Status   Instances  CPU   Memory  Last Heartbeat
───────────────────────────────────────────────────────────────────
worker-1  online   15/50      45%   62%     2s ago
worker-2  online   23/50      67%   78%     1s ago
worker-3  offline  0/50       -     -       5m ago
Metrics¶
# Cluster-wide metrics
machineuse-cli metrics
# Per-node metrics
machineuse-cli metrics --node worker-1
Security¶
Network Isolation¶
- Use a private network for NNG communication
- Expose only the API port (8000) publicly
- Restrict the remaining ports with firewall rules:
# Control plane (replace 10.0.0.0/8 with your private subnet)
sudo ufw allow from 10.0.0.0/8 to any port 5555 proto tcp # NNG from workers
sudo ufw allow 8000/tcp # API
# Workers
sudo ufw allow from 10.0.0.0/8 to any port 8001 proto tcp # Local API
TLS for NNG¶
Configure TLS for inter-node communication:
{
"messaging": {
"tls": {
"enabled": true,
"cert_file": "/etc/machineuse/server.crt",
"key_file": "/etc/machineuse/server.key",
"ca_file": "/etc/machineuse/ca.crt"
}
}
}
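Before restarting a node with TLS enabled, it can help to confirm that the referenced files are actually present. The following is a small pre-flight sketch and not a Machineuse feature: tls_problems is a hypothetical helper that only checks that each configured path exists, not that the certificates are valid.

```python
import os


def tls_problems(config: dict) -> list[str]:
    """Return a list of problems with the messaging.tls section; empty if none."""
    tls = config.get("messaging", {}).get("tls", {})
    if not tls.get("enabled"):
        return []  # TLS is off, nothing to check
    problems = []
    for key in ("cert_file", "key_file", "ca_file"):
        path = tls.get(key)
        if not path:
            problems.append(f"messaging.tls.{key} is not set")
        elif not os.path.isfile(path):
            problems.append(f"messaging.tls.{key}: {path} does not exist")
    return problems


example = {
    "messaging": {
        "tls": {
            "enabled": True,
            "cert_file": "/etc/machineuse/server.crt",
            "key_file": "/etc/machineuse/server.key",
            "ca_file": "/etc/machineuse/ca.crt",
        }
    }
}
for problem in tls_problems(example):
    print(problem)
```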
Troubleshooting¶
Worker Not Connecting¶
# Check connectivity
nc -zv control-plane-ip 5555
# Check worker logs
journalctl -u machineuse-agent -f
# Verify configuration
machineuse-cli config validate
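On hosts where nc is not installed, the same connectivity check can be done from Python with the standard library. This is a minimal sketch; can_connect is a hypothetical helper, and control-plane-ip is the same placeholder used in the commands above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Replace the placeholder with the control plane's real address.
    reachable = can_connect("control-plane-ip", 5555)
    print("NNG port reachable" if reachable else "cannot reach control plane on port 5555")
```

A successful connection only shows that the port is open; it does not confirm that the NNG handshake or TLS setup succeeds.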
Scheduling Failures¶
# Check available capacity
machineuse-cli nodes list
# View scheduler logs
journalctl -u machineuse | grep scheduler