Distributed Deployment

Deploy Machineuse across multiple nodes for scalability and high availability.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Control Plane                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │  Scheduler  │  │  API Server │  │  PostgreSQL │         │
│  └─────────────┘  └─────────────┘  └─────────────┘         │
└─────────────────────────────────────────────────────────────┘
                           │ NNG (TCP port 5555)
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────┐
│   Worker 1    │  │   Worker 2    │  │   Worker N    │
│ systemd-nspawn│  │ systemd-nspawn│  │ systemd-nspawn│
│  containers   │  │  containers   │  │  containers   │
└───────────────┘  └───────────────┘  └───────────────┘

Prerequisites

Control Plane Node

  • Ubuntu 20.04+ / Debian 11+
  • Python 3.11+
  • PostgreSQL 13+
  • 2+ CPU cores, 4+ GB RAM

Worker Nodes

  • Ubuntu 20.04+ / Debian 11+
  • Python 3.11+
  • systemd-nspawn support
  • 4+ CPU cores, 8+ GB RAM
  • Root/sudo access

Network

  • TCP connectivity between all nodes
  • Port 5555 for NNG messaging (control plane)
  • Port 8000 for the API (control plane)
  • Port 8001 for the local API (worker nodes)

Control Plane Setup

1. Install PostgreSQL

sudo apt install postgresql postgresql-contrib

# Create database
sudo -u postgres createuser machineuse
sudo -u postgres createdb -O machineuse machineuse
sudo -u postgres psql -c "ALTER USER machineuse PASSWORD 'your-secure-password';"

2. Install Machineuse

git clone https://github.com/dotcommoners/machineuse.git
cd machineuse/machineuse-api
poetry install

3. Configure Control Plane

Create /etc/machineuse/config.json:

{
  "deployment_mode": "control_plane",
  "node_id": "control-plane-1",
  "api": {
    "host": "0.0.0.0",
    "port": 8000
  },
  "storage": {
    "backend": "postgresql",
    "database_url": "postgresql://machineuse:your-secure-password@localhost:5432/machineuse"
  },
  "messaging": {
    "bind_address": "tcp://0.0.0.0:5555",
    "heartbeat_interval_seconds": 10
  },
  "scheduling": {
    "algorithm": "weighted_score",
    "weights": {
      "cpu": 0.3,
      "memory": 0.4,
      "disk": 0.2,
      "network": 0.1
    }
  }
}
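
Before starting the control plane, it can help to sanity-check the file. The following is a minimal sketch, not part of Machineuse: the list of required keys and the rule that scheduling weights sum to 1.0 are assumptions inferred from the example above, and check_config is a hypothetical helper name.

```python
import json
import math

# Hypothetical list of required top-level keys, taken from the example above.
REQUIRED_KEYS = ("deployment_mode", "node_id", "api", "storage", "messaging")


def check_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    port = config.get("api", {}).get("port")
    if port is not None and not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append(f"api.port is not a valid TCP port: {port!r}")

    # Assumption: the scheduler expects weights that sum to 1.0, as in the example.
    weights = config.get("scheduling", {}).get("weights", {})
    if weights and not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        problems.append(f"scheduling.weights sum to {sum(weights.values()):.3f}, not 1.0")

    return problems


# A trimmed copy of the control plane configuration shown above.
EXAMPLE = json.loads("""
{
  "deployment_mode": "control_plane",
  "node_id": "control-plane-1",
  "api": {"host": "0.0.0.0", "port": 8000},
  "storage": {"backend": "postgresql"},
  "messaging": {"bind_address": "tcp://0.0.0.0:5555"},
  "scheduling": {"weights": {"cpu": 0.3, "memory": 0.4, "disk": 0.2, "network": 0.1}}
}
""")

print(check_config(EXAMPLE) or "no problems found")
```

To check the real file, load /etc/machineuse/config.json with json.load and pass the result to check_config. The weights are compared with math.isclose because the four example values do not add up to exactly 1.0 in floating point.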

4. Start Control Plane

./scripts/start-control-plane.sh

# Or manually
poetry run python -m machineuse.nodes.control_plane --bind tcp://0.0.0.0:5555

Worker Node Setup

1. Install Machineuse

On each worker node:

git clone https://github.com/dotcommoners/machineuse.git
cd machineuse/machineuse-api
poetry install

# Install system dependencies
sudo ./scripts/install.sh --mode worker_node

2. Configure Worker

Create /etc/machineuse/config.json:

{
  "deployment_mode": "worker_node",
  "node_id": "worker-1",
  "api": {
    "host": "0.0.0.0",
    "port": 8001
  },
  "storage": {
    "backend": "sqlite",
    "database_path": "/var/lib/machineuse/worker.db"
  },
  "messaging": {
    "control_plane_address": "tcp://control-plane-ip:5555"
  },
  "containers": {
    "max_instances": 50,
    "default_memory_mb": 2048
  }
}

3. Start Worker

./scripts/start-worker-node.sh

# Or manually
poetry run python -m machineuse.nodes.agent worker-1 tcp://control-plane-ip:5555

Adding More Workers

  1. Set up a new server that meets the worker prerequisites and install Machineuse on it
  2. Copy the worker configuration and give it a unique node_id
  3. Start the worker agent
  4. Verify registration:
# On control plane
machineuse-cli nodes list

Load Balancing

Scheduling Algorithm

The scheduler uses weighted scoring:

Factor     Weight    Description
───────────────────────────────────────────
CPU        0.3       Available CPU capacity
Memory     0.4       Available memory
Disk       0.2       Available disk space
Network    0.1       Network capacity

Instances are placed on the node with the highest score.
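
As an illustration of weighted scoring, the sketch below computes a score per node and picks the highest. It assumes each factor is expressed as the available fraction of capacity between 0.0 and 1.0; how the real scheduler normalizes these values is not documented here. NodeCapacity, score, and pick_node are hypothetical names, and the disk and network figures are made up for the example.

```python
from dataclasses import dataclass

# Weights from the table above.
WEIGHTS = {"cpu": 0.3, "memory": 0.4, "disk": 0.2, "network": 0.1}


@dataclass
class NodeCapacity:
    """Available capacity per factor, as a fraction from 0.0 to 1.0 (assumed)."""
    node_id: str
    cpu: float
    memory: float
    disk: float
    network: float


def score(node: NodeCapacity, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the node's available capacity."""
    return sum(weight * getattr(node, factor) for factor, weight in weights.items())


def pick_node(nodes: list[NodeCapacity]) -> NodeCapacity:
    """Return the node with the highest score."""
    return max(nodes, key=score)


nodes = [
    # 45% CPU and 62% memory in use leaves 0.55 and 0.38 available.
    NodeCapacity("worker-1", cpu=0.55, memory=0.38, disk=0.5, network=0.5),
    # 67% CPU and 78% memory in use leaves 0.33 and 0.22 available.
    NodeCapacity("worker-2", cpu=0.33, memory=0.22, disk=0.5, network=0.5),
]

for node in nodes:
    print(f"{node.node_id}: {score(node):.3f}")
print("selected:", pick_node(nodes).node_id)
```

With these inputs worker-1 scores higher because memory carries the largest weight and worker-1 has more of it free.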

Affinity Rules

# Prefer specific node
machineuse-cli create --node-preference worker-1

# Anti-affinity (spread instances)
machineuse-cli create --anti-affinity-group "web-servers"
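
One way to picture anti-affinity is as a tie-breaker that prefers the node already hosting the fewest members of the group. The sketch below shows that idea only; it is an assumption about the behavior, not the scheduler's actual logic, and pick_with_anti_affinity is a hypothetical name.

```python
from collections import Counter


def pick_with_anti_affinity(
    candidates: list[str],
    placements: list[tuple[str, str]],
    group: str,
) -> str:
    """Pick the candidate node hosting the fewest instances of `group`.

    `placements` holds (node_id, group) pairs for existing instances.
    min() keeps the first candidate on ties, so pass candidates already
    ordered by scheduling score, best first.
    """
    group_counts = Counter(node for node, member_of in placements if member_of == group)
    return min(candidates, key=lambda node: group_counts[node])


placements = [
    ("worker-1", "web-servers"),
    ("worker-2", "web-servers"),
    ("worker-1", "databases"),
]
print(pick_with_anti_affinity(["worker-1", "worker-2", "worker-3"], placements, "web-servers"))
```

Here worker-3 is chosen because it hosts no instance from the "web-servers" group yet.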

Rebalancing

The scheduler rebalances when:

  • Node utilization exceeds a threshold
  • A node goes offline
  • A rebalance is triggered manually

# Trigger rebalancing
machineuse-cli cluster rebalance
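
The rebalancing triggers can be sketched as a simple check over node state. The 0.85 threshold, the use of the larger of CPU and memory utilization, and the name rebalance_reasons are all assumptions made for illustration; the actual threshold and metric are not stated in this guide.

```python
def rebalance_reasons(
    nodes: list[dict],
    threshold: float = 0.85,  # hypothetical utilization threshold
    manual: bool = False,
) -> list[str]:
    """Return the reasons a rebalance would start; empty means no rebalance."""
    reasons = []
    if manual:
        reasons.append("manual trigger")
    for node in nodes:
        if node["status"] == "offline":
            reasons.append(f"{node['id']} is offline")
        elif max(node["cpu"], node["memory"]) > threshold:
            reasons.append(f"{node['id']} exceeds {threshold:.0%} utilization")
    return reasons


nodes = [
    {"id": "worker-1", "status": "online", "cpu": 0.45, "memory": 0.62},
    {"id": "worker-2", "status": "online", "cpu": 0.67, "memory": 0.78},
    {"id": "worker-3", "status": "offline", "cpu": 0.0, "memory": 0.0},
]
print(rebalance_reasons(nodes))
```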

High Availability

Control Plane HA

For production, run multiple control plane instances:

┌──────────────┐    ┌──────────────┐
│ Control Plane│    │ Control Plane│
│   (Active)   │◄──►│  (Standby)   │
└──────────────┘    └──────────────┘
┌──────────────┐
│  PostgreSQL  │
│   (Shared)   │
└──────────────┘

Worker Resilience

If a worker fails:

  1. Control plane detects missing heartbeat
  2. Instances are marked for migration
  3. Scheduler places instances on healthy workers
  4. Dormant instances are revived on new nodes
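
The failure-handling sequence can be sketched as two small functions: one that spots workers whose heartbeat is overdue, and one that plans where their instances go. The 10-second interval comes from heartbeat_interval_seconds in the control plane configuration; the allowance of three missed heartbeats and the round-robin placement are assumptions standing in for the real timeout and the scheduler's scoring.

```python
HEARTBEAT_INTERVAL_SECONDS = 10  # messaging.heartbeat_interval_seconds
MISSED_HEARTBEATS_ALLOWED = 3    # hypothetical tolerance before a worker counts as failed


def find_failed_workers(last_heartbeat: dict[str, float], now: float) -> list[str]:
    """Return workers whose last heartbeat is older than the timeout."""
    timeout = HEARTBEAT_INTERVAL_SECONDS * MISSED_HEARTBEATS_ALLOWED
    return sorted(node for node, seen in last_heartbeat.items() if now - seen > timeout)


def plan_migrations(
    instances: dict[str, str],
    failed: list[str],
    healthy: list[str],
) -> dict[str, str]:
    """Map each instance on a failed worker to a healthy worker.

    Round-robin is used here for brevity; the real scheduler would score nodes.
    """
    if not healthy:
        return {}  # nowhere to migrate to; instances stay pending
    stranded = sorted(inst for inst, node in instances.items() if node in failed)
    return {inst: healthy[index % len(healthy)] for index, inst in enumerate(stranded)}


now = 1000.0  # timestamps in seconds; any monotonic clock works
last_heartbeat = {"worker-1": 998.0, "worker-2": 999.0, "worker-3": 700.0}
instances = {"abc123": "worker-3", "def456": "worker-1"}

failed = find_failed_workers(last_heartbeat, now)
healthy = [node for node in sorted(last_heartbeat) if node not in failed]
print("failed:", failed)
print("plan:", plan_migrations(instances, failed, healthy))
```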

Monitoring

Cluster Status

machineuse-cli cluster status

Node Health

machineuse-cli nodes list

Output:

ID          Status    Instances    CPU      Memory    Last Heartbeat
───────────────────────────────────────────────────────────────────
worker-1    online    15/50        45%      62%       2s ago
worker-2    online    23/50        67%      78%       1s ago
worker-3    offline   0/50         -        -         5m ago

Metrics

# Cluster-wide metrics
machineuse-cli metrics

# Per-node metrics
machineuse-cli metrics --node worker-1

Security

Network Isolation

  • Use a private network for NNG communication
  • Expose only the API port (8000) publicly
  • Restrict access with firewall rules:
# Control plane
ufw allow from 10.0.0.0/8 to any port 5555  # NNG from workers
ufw allow 8000/tcp                            # API

# Workers
ufw allow from 10.0.0.0/8 to any port 8001   # Local API

TLS for NNG

Configure TLS for inter-node communication:

{
  "messaging": {
    "tls": {
      "enabled": true,
      "cert_file": "/etc/machineuse/server.crt",
      "key_file": "/etc/machineuse/server.key",
      "ca_file": "/etc/machineuse/ca.crt"
    }
  }
}

Troubleshooting

Worker Not Connecting

# Check connectivity
nc -zv control-plane-ip 5555

# Check worker logs
journalctl -u machineuse-agent -f

# Verify configuration
machineuse-cli config validate
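
If nc is not installed on the worker, the same reachability check can be done with the Python standard library. This is a minimal sketch; can_connect is a hypothetical helper, and a successful TCP connection only shows that the port is reachable, not that the NNG or TLS handshake will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, can_connect("control-plane-ip", 5555) returns False when a firewall drops the traffic or the control plane is not listening.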

Scheduling Failures

# Check available capacity
machineuse-cli nodes list

# View scheduler logs
journalctl -u machineuse | grep scheduler

Instance Migration Issues

# Check pending migrations
machineuse-cli migrations list

# Force migration
machineuse-cli instance migrate abc123 --target worker-2