Scheduling

The Machineuse scheduler places new instances on worker nodes by scoring each node's available resources.

Overview

The scheduler runs on the control plane and makes placement decisions based on:

  • Node resource availability
  • User preferences (affinity)
  • Load balancing goals
  • Historical performance

Scheduling Algorithm

Weighted Score Calculation

Each node receives a score based on available resources:

Score = Σ (weight_i × normalized_resource_i)

Where:
- weight_i = configured weight for resource type
- normalized_resource_i = available / total (0-1)

Default Weights

Resource   Weight   Rationale
--------   ------   ---------
Memory     0.40     Browser instances are memory-intensive
CPU        0.30     Rendering and JavaScript execution
Disk       0.20     Snapshots and temporary files
Network    0.10     Stream bandwidth

Example Calculation

Node: worker-1
- CPU: 60% available → 0.60
- Memory: 40% available → 0.40
- Disk: 80% available → 0.80
- Network: 90% available → 0.90

Score = (0.30 × 0.60) + (0.40 × 0.40) + (0.20 × 0.80) + (0.10 × 0.90)
      = 0.18 + 0.16 + 0.16 + 0.09
      = 0.59
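
The same arithmetic can be checked in a few lines of Python. This is only a sketch: `DEFAULT_WEIGHTS`, `weighted_score`, and `worker_1` are illustrative names, not part of the Machineuse API.

```python
# Default weights from the table above.
DEFAULT_WEIGHTS = {"memory": 0.40, "cpu": 0.30, "disk": 0.20, "network": 0.10}


def weighted_score(available_fraction, weights=DEFAULT_WEIGHTS):
    """Return the sum of weight_i * normalized_resource_i over all resources."""
    return sum(weights[name] * available_fraction[name] for name in weights)


# worker-1 from the example: fraction of each resource still available (0-1).
worker_1 = {"cpu": 0.60, "memory": 0.40, "disk": 0.80, "network": 0.90}

print(round(weighted_score(worker_1), 2))  # 0.59
```

Because the default weights sum to 1.0 and every normalized resource lies between 0 and 1, the score also stays between 0 and 1.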

Placement Process

┌──────────────┐
│ API Request  │
│ Create Inst. │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Validate   │
│   Request    │
└──────┬───────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐
│  Filter      │────►│ Insufficient │
│  Candidates  │     │  Resources   │
└──────┬───────┘     └──────────────┘
       │
       ▼
┌──────────────┐
│    Score     │
│    Nodes     │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Select     │
│  Best Node   │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   Dispatch   │
│   to Worker  │
└──────────────┘

Step 1: Filter Candidates

Eliminate nodes that cannot host the instance:

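def filter_candidates(nodes, request):
    """Return the nodes that are able to host the requested instance."""
    candidates = []
    for node in nodes:
        # Skip nodes that are not accepting work
        if node.status != "online":
            continue
        # Skip nodes that have reached their instance limit
        if node.instances_count >= node.max_instances:
            continue
        # Skip nodes without enough free memory or CPU for the request
        if node.available_memory < request.memory_mb:
            continue
        if node.available_cpu < request.cpu_cores:
            continue
        candidates.append(node)
    return candidates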

Step 2: Score Candidates

Calculate weighted score for each candidate:

def score_node(node, weights):
    """Return the weighted sum of the node's available/total resource ratios."""
    # Normalize each resource to the fraction still available (0-1)
    cpu_score = node.available_cpu / node.total_cpu
    mem_score = node.available_memory / node.total_memory
    disk_score = node.available_disk / node.total_disk
    net_score = node.available_network / node.total_network

    return (
        weights['cpu'] * cpu_score +
        weights['memory'] * mem_score +
        weights['disk'] * disk_score +
        weights['network'] * net_score
    )

Step 3: Select Best

Choose the node with the highest score, applying any tiebreakers:

import math

def select_best(candidates, scores):
    """Return the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None

    best_score = max(scores)

    # Tiebreaker: among nodes whose score matches the best (within
    # floating-point tolerance), prefer the node with fewer instances
    ties = [
        n for n, s in zip(candidates, scores)
        if math.isclose(s, best_score, abs_tol=1e-9)
    ]

    return min(ties, key=lambda n: n.instances_count)
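
The three steps can be chained into a single placement function. The sketch below is self-contained for illustration only: the `Node` and `Request` dataclasses, the helper names, and the sample values are assumptions, with field names taken from the snippets above and memory assumed to be in MB. It is not the actual scheduler implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node record; field names mirror the snippets above."""
    name: str
    status: str
    instances_count: int
    max_instances: int
    available_cpu: float
    total_cpu: float
    available_memory: float  # assumed to be MB
    total_memory: float
    available_disk: float
    total_disk: float
    available_network: float
    total_network: float


@dataclass
class Request:
    """Hypothetical create-instance request."""
    memory_mb: float
    cpu_cores: float


WEIGHTS = {"cpu": 0.30, "memory": 0.40, "disk": 0.20, "network": 0.10}


def can_host(node, request):
    # Step 1: the same four checks as filter_candidates.
    return (
        node.status == "online"
        and node.instances_count < node.max_instances
        and node.available_memory >= request.memory_mb
        and node.available_cpu >= request.cpu_cores
    )


def score(node, weights=WEIGHTS):
    # Step 2: weighted sum of available/total ratios.
    return (
        weights["cpu"] * node.available_cpu / node.total_cpu
        + weights["memory"] * node.available_memory / node.total_memory
        + weights["disk"] * node.available_disk / node.total_disk
        + weights["network"] * node.available_network / node.total_network
    )


def place(nodes, request):
    candidates = [n for n in nodes if can_host(n, request)]
    if not candidates:
        return None  # the "Insufficient Resources" branch of the diagram
    # Step 3: highest score wins; on an exact tie, fewer instances wins.
    return max(candidates, key=lambda n: (score(n), -n.instances_count))


nodes = [
    Node("worker-1", "online", 15, 50, 4.8, 8, 6400, 16000, 80, 100, 900, 1000),
    Node("worker-2", "online", 23, 50, 2.0, 8, 12000, 16000, 50, 100, 500, 1000),
    Node("worker-3", "offline", 0, 50, 8.0, 8, 16000, 16000, 100, 100, 1000, 1000),
]

chosen = place(nodes, Request(memory_mb=2048, cpu_cores=2))
print(chosen.name if chosen else "insufficient resources")  # worker-1
```

Here worker-3 is filtered out because it is offline, and worker-1 (score 0.59) beats worker-2 (score 0.525).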

Affinity and Anti-Affinity

Node Affinity

Request placement on specific nodes:

# CLI
machineuse-cli create --node-preference worker-1

# API
{
    "node_preference": "worker-1",
    "node_preference_strength": "required"
}

Strength levels:

  • required: Fail if node unavailable
  • preferred: Use if available, otherwise best alternative
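
The difference between the two strength levels can be sketched as follows. `PlacementError`, `apply_node_preference`, and the `pick_best` callback are hypothetical names used only for illustration; the real scheduler may report failures differently.

```python
class PlacementError(Exception):
    """Raised when no acceptable node can be found (hypothetical)."""


def apply_node_preference(candidates, preferred, strength, pick_best):
    """Resolve a node preference against the filtered candidate names.

    candidates: names of nodes that passed filtering
    preferred:  the requested node name
    strength:   "required" or "preferred"
    pick_best:  callable that returns the best name from a non-empty list
    """
    if strength not in ("required", "preferred"):
        raise ValueError(f"unknown strength: {strength!r}")
    if preferred in candidates:
        return preferred
    if strength == "required":
        # required: fail if the node is unavailable
        raise PlacementError(f"required node {preferred!r} is unavailable")
    if not candidates:
        raise PlacementError("no candidate nodes available")
    # preferred: fall back to the best alternative
    return pick_best(candidates)
```

For example, with candidates `["worker-2", "worker-3"]` and a preference for `worker-1`, a `preferred` request falls back to whatever `pick_best` returns, while a `required` request raises `PlacementError`.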

Anti-Affinity

Spread instances across nodes:

machineuse-cli create --anti-affinity-group "web-servers"

Instances in the same anti-affinity group are placed on different nodes when possible.
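
One way to read "when possible" is as a soft constraint applied before scoring. The sketch below assumes a mapping from node name to the number of group members already running there. The function name and the fallback rule (keep the nodes hosting the fewest members) are assumptions, not documented scheduler behavior.

```python
def anti_affinity_candidates(candidates, group_members_by_node):
    """Narrow candidates to the nodes that best spread an anti-affinity group.

    candidates:            names of nodes that passed filtering
    group_members_by_node: {node name: group members already on that node}
    """
    if not candidates:
        return []

    def count(name):
        return group_members_by_node.get(name, 0)

    # Preferred case: nodes that host no member of the group yet.
    free = [n for n in candidates if count(n) == 0]
    if free:
        return free

    # Soft fallback (assumed): every node already hosts the group,
    # so keep the nodes with the fewest members.
    fewest = min(count(n) for n in candidates)
    return [n for n in candidates if count(n) == fewest]


print(anti_affinity_candidates(
    ["worker-1", "worker-2", "worker-3"],
    {"worker-1": 1, "worker-2": 2},
))  # ['worker-3']
```

The narrowed list would then go through the normal scoring step.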

Load Balancing

Even Distribution

The scheduler balances load as a side effect of resource-based scoring: nodes with more available resources receive higher scores, so new instances are placed on them first.

Rebalancing

Rebalancing runs when:

  • Node utilization exceeds the threshold (default: 80%)
  • A node goes offline
  • It is triggered manually

# Trigger rebalancing
machineuse-cli cluster rebalance

# Rebalance specific node
machineuse-cli cluster rebalance --source worker-1

Migration

Move instances between nodes:

# Migrate specific instance
machineuse-cli instance migrate abc123 --target worker-2

# Migrate all from a node
machineuse-cli node drain worker-1

Configuration

Scheduling Weights

{
  "scheduling": {
    "algorithm": "weighted_score",
    "weights": {
      "cpu": 0.30,
      "memory": 0.40,
      "disk": 0.20,
      "network": 0.10
    }
  }
}
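
When overriding the defaults, it can help to validate the weights before applying them. The source does not state whether the weights must sum to 1.0 (the defaults do), so the helper below is an assumption-laden sketch: `normalize_weights` is a hypothetical function that rescales custom weights so scores stay in the 0-1 range.

```python
EXPECTED_RESOURCES = ("cpu", "memory", "disk", "network")


def normalize_weights(weights):
    """Validate custom weights and rescale them to sum to 1.0."""
    missing = [name for name in EXPECTED_RESOURCES if name not in weights]
    if missing:
        raise ValueError(f"missing weights: {missing}")
    if any(weights[name] < 0 for name in EXPECTED_RESOURCES):
        raise ValueError("weights must be non-negative")
    total = sum(weights[name] for name in EXPECTED_RESOURCES)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: weights[name] / total for name in EXPECTED_RESOURCES}


# Relative weights 3:4:2:1 rescale to the documented defaults.
print(normalize_weights({"cpu": 3, "memory": 4, "disk": 2, "network": 1}))
# {'cpu': 0.3, 'memory': 0.4, 'disk': 0.2, 'network': 0.1}
```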

Rebalancing Thresholds

{
  "scheduling": {
    "rebalance_threshold": 0.80,
    "rebalance_interval_seconds": 300,
    "min_instances_for_rebalance": 5
  }
}
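
As a rough illustration of how these settings might interact, the sketch below flags nodes for rebalancing. The exact meaning of `min_instances_for_rebalance` is not documented here. The sketch assumes it is the minimum number of instances a node must host before it is considered, and `nodes_to_rebalance` is a hypothetical helper.

```python
import json

# Same keys as the configuration above.
CONFIG = json.loads("""
{
  "scheduling": {
    "rebalance_threshold": 0.80,
    "rebalance_interval_seconds": 300,
    "min_instances_for_rebalance": 5
  }
}
""")


def nodes_to_rebalance(nodes, scheduling):
    """Return the names of nodes whose utilization exceeds the threshold.

    nodes: {node name: (utilization as 0-1, instance count)}
    """
    return [
        name
        for name, (utilization, instance_count) in nodes.items()
        if utilization > scheduling["rebalance_threshold"]
        # Assumed interpretation: ignore nodes with too few instances to move.
        and instance_count >= scheduling["min_instances_for_rebalance"]
    ]


cluster = {
    "worker-1": (0.45, 15),
    "worker-2": (0.85, 23),
    "worker-3": (0.90, 3),
}
print(nodes_to_rebalance(cluster, CONFIG["scheduling"]))  # ['worker-2']
```

Under that assumption, worker-3 is over the threshold but hosts too few instances to be considered.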

Placement Constraints

{
  "scheduling": {
    "constraints": {
      "max_instances_per_node": 50,
      "min_memory_headroom_mb": 1024,
      "min_cpu_headroom_percent": 10
    }
  }
}
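
These constraints can be read as extra filter conditions. The sketch below assumes that headroom means the capacity left on the node after the new instance is placed. That interpretation, the dictionary field names, and `satisfies_constraints` are illustrative assumptions rather than documented behavior.

```python
CONSTRAINTS = {
    "max_instances_per_node": 50,
    "min_memory_headroom_mb": 1024,
    "min_cpu_headroom_percent": 10,
}


def satisfies_constraints(node, request, constraints=CONSTRAINTS):
    """Check whether placing the request would keep the node within limits."""
    # Memory that would remain free after placement (MB).
    memory_left_mb = node["available_memory_mb"] - request["memory_mb"]
    # CPU that would remain free after placement, as a percentage of total.
    cpu_left_percent = (
        100 * (node["available_cpu"] - request["cpu_cores"]) / node["total_cpu"]
    )
    return (
        node["instances_count"] < constraints["max_instances_per_node"]
        and memory_left_mb >= constraints["min_memory_headroom_mb"]
        and cpu_left_percent >= constraints["min_cpu_headroom_percent"]
    )


node = {
    "instances_count": 15,
    "available_memory_mb": 4096,
    "available_cpu": 4,
    "total_cpu": 8,
}
print(satisfies_constraints(node, {"memory_mb": 2048, "cpu_cores": 2}))  # True
print(satisfies_constraints(node, {"memory_mb": 3500, "cpu_cores": 2}))  # False
```

The second request is rejected because it would leave only 596 MB free, below the 1024 MB headroom.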

Monitoring

Scheduler Metrics

machineuse-cli scheduler status

Output:

Scheduler Status: active
Pending Requests: 0
Average Placement Time: 45ms
Placements (24h): 150
Failed Placements: 2

Node Distribution:
  worker-1: 15 instances (45% load)
  worker-2: 23 instances (67% load)
  worker-3: 12 instances (35% load)

Placement History

machineuse-cli scheduler history --limit 10

Best Practices

  1. Balance weights based on your workload characteristics
  2. Set appropriate thresholds to prevent resource exhaustion
  3. Use anti-affinity for high-availability deployments
  4. Monitor scheduler metrics to identify bottlenecks
  5. Plan capacity to ensure headroom for scheduling flexibility