Storage Architecture¶
Machineuse uses a multi-tier storage architecture optimized for different workloads.
Storage Tiers¶
┌─────────────────────────────────────────────────────────────┐
│ Storage Architecture │
├─────────────────┬─────────────────┬─────────────────────────┤
│ Operational │ Analytics │ File System │
│ Database │ Database │ │
├─────────────────┼─────────────────┼─────────────────────────┤
│ SQLite/Postgres │ DuckDB │ Local Filesystem │
├─────────────────┼─────────────────┼─────────────────────────┤
│ • Instance meta │ • Time-series │ • Container images │
│ • Node registry │ • Metrics │ • Snapshots │
│ • Configuration │ • Usage stats │ • Logs │
│ • Sessions │ • Trends │ • Temporary files │
└─────────────────┴─────────────────┴─────────────────────────┘
Operational Storage¶
SQLite (Single Node / Workers)¶
Used for local state on worker nodes and single-node deployments.
Location: /var/lib/machineuse/machineuse.db
Schema:
-- Instances table
CREATE TABLE instances (
id TEXT PRIMARY KEY,
node_id TEXT NOT NULL,
status TEXT NOT NULL,
image TEXT NOT NULL,
config JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Snapshots table
CREATE TABLE snapshots (
id TEXT PRIMARY KEY,
instance_id TEXT NOT NULL,
path TEXT NOT NULL,
size_bytes INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (instance_id) REFERENCES instances(id)
);
PostgreSQL (Control Plane)¶
Used for distributed state in multi-node deployments.
Configuration:
{
"storage": {
"backend": "postgresql",
"database_url": "postgresql://user:pass@localhost:5432/machineuse"
}
}
Additional Features: - Connection pooling - Transactions for consistency - Full-text search - JSON operators for configuration
Analytics Storage¶
DuckDB¶
Optimized for time-series metrics and analytical queries.
Location: /var/lib/machineuse/analytics.duckdb
Use Cases: - Resource utilization trends - Instance lifecycle analytics - Capacity planning data - Billing calculations
Schema:
-- Metrics table (time-series)
CREATE TABLE metrics (
timestamp TIMESTAMP,
node_id TEXT,
instance_id TEXT,
metric_name TEXT,
metric_value DOUBLE,
PRIMARY KEY (timestamp, node_id, metric_name)
);
-- Aggregated daily stats
CREATE TABLE daily_stats (
date DATE,
node_id TEXT,
total_instances INTEGER,
avg_cpu_percent DOUBLE,
avg_memory_percent DOUBLE,
PRIMARY KEY (date, node_id)
);
Query Examples:
-- CPU trend over 24 hours
SELECT
date_trunc('hour', timestamp) as hour,
avg(metric_value) as avg_cpu
FROM metrics
WHERE metric_name = 'cpu_percent'
AND timestamp > now() - interval '24 hours'
GROUP BY 1
ORDER BY 1;
-- Instance lifecycle summary
SELECT
date_trunc('day', created_at) as day,
count(*) as instances_created,
avg(extract(epoch from (deleted_at - created_at))) as avg_lifetime_seconds
FROM instance_events
GROUP BY 1;
Retention Policy¶
Configure data retention:
{
"analytics": {
"backend": "duckdb",
"retention_days": 30,
"aggregation": {
"hourly_retention_days": 7,
"daily_retention_days": 90
}
}
}
File System Storage¶
Directory Structure¶
/var/lib/machineuse/
├── containers/ # Running container data
│ └── instance-xyz/
├── snapshots/ # Dormant instance snapshots
│ └── snap-abc123.gz
├── logs/ # Application logs
│ ├── api.log
│ └── agent.log
├── machineuse.db # SQLite database
└── analytics.duckdb # Analytics database
Container Storage¶
Each container uses an overlayfs mount:
/var/lib/machines/instance-xyz/
├── lower/ # Read-only base image
├── upper/ # Writable changes
├── work/ # Overlayfs work directory
└── merged/ # Final merged view
Snapshot Storage¶
Snapshots are compressed filesystem archives:
# Snapshot structure
/var/lib/machineuse/snapshots/
├── snap-abc123.gz # Compressed snapshot
├── snap-abc123.meta.json # Metadata
└── snap-abc123.manifest # File listing
Metadata format:
{
"snapshot_id": "snap-abc123",
"instance_id": "instance-xyz",
"created_at": "2026-03-19T10:30:00Z",
"size_bytes": 1073741824,
"compression": "gzip",
"checksum": "sha256:abc123..."
}
Storage Provider Interface¶
Abstract Interface¶
from abc import ABC, abstractmethod
from typing import Dict, List, Optional
class StorageProvider(ABC):
"""Abstract storage provider interface."""
@abstractmethod
async def create_instance(self, data: Dict) -> str:
"""Create instance record, return instance_id."""
pass
@abstractmethod
async def get_instance(self, instance_id: str) -> Optional[Dict]:
"""Get instance by ID."""
pass
@abstractmethod
async def update_instance(self, instance_id: str, data: Dict) -> bool:
"""Update instance record."""
pass
@abstractmethod
async def delete_instance(self, instance_id: str) -> bool:
"""Delete instance record."""
pass
@abstractmethod
async def list_instances(self, filters: Dict = None) -> List[Dict]:
"""List instances with optional filters."""
pass
Analytics Interface¶
class AnalyticsProvider(ABC):
"""Abstract analytics provider interface."""
@abstractmethod
async def record_metric(
self,
metric_name: str,
value: float,
node_id: str = None,
instance_id: str = None
) -> None:
"""Record a metric value."""
pass
@abstractmethod
async def query_metrics(
self,
metric_name: str,
start_time: datetime,
end_time: datetime,
aggregation: str = "avg"
) -> List[Dict]:
"""Query metrics with aggregation."""
pass
Storage Factory¶
from machineuse.core.storage import StorageFactory
# Create provider based on configuration
provider = StorageFactory.create(config)
# Available backends
StorageFactory._providers = {
"sqlite": SQLiteProvider,
"postgresql": PostgreSQLProvider,
}
StorageFactory._analytics_providers = {
"duckdb": DuckDBAnalyticsProvider,
}
Configuration¶
SQLite Configuration¶
{
"storage": {
"backend": "sqlite",
"database_path": "/var/lib/machineuse/machineuse.db",
"journal_mode": "WAL",
"synchronous": "NORMAL"
}
}
PostgreSQL Configuration¶
{
"storage": {
"backend": "postgresql",
"database_url": "postgresql://user:pass@localhost:5432/machineuse",
"pool_size": 10,
"max_overflow": 20
}
}
Analytics Configuration¶
{
"analytics": {
"backend": "duckdb",
"database_path": "/var/lib/machineuse/analytics.duckdb",
"retention_days": 30,
"metrics_interval_seconds": 30
}
}
Backup and Recovery¶
Database Backup¶
# SQLite
sqlite3 /var/lib/machineuse/machineuse.db ".backup /backup/machineuse.db"
# PostgreSQL
pg_dump -U machineuse machineuse > /backup/machineuse.sql
# DuckDB
duckdb /var/lib/machineuse/analytics.duckdb "EXPORT DATABASE '/backup/analytics';"
Snapshot Backup¶
Recovery¶
# Restore SQLite
cp /backup/machineuse.db /var/lib/machineuse/machineuse.db
# Restore PostgreSQL
psql -U machineuse machineuse < /backup/machineuse.sql
# Restore snapshots
rsync -av /backup/snapshots/ /var/lib/machineuse/snapshots/
Performance Tuning¶
SQLite Optimization¶
{
"storage": {
"backend": "sqlite",
"pragma": {
"journal_mode": "WAL",
"synchronous": "NORMAL",
"cache_size": -64000,
"mmap_size": 268435456
}
}
}
PostgreSQL Optimization¶
- Use connection pooling (PgBouncer)
- Configure appropriate
shared_buffers - Enable query plan caching
- Use indexes for common queries
DuckDB Optimization¶
- Partition by time for large datasets
- Use appropriate aggregation intervals
- Configure memory limits for queries