About Our Infrastructure
An overview of the engineering decisions behind MoldPlan — container orchestration, automated deployments, observability, and multi-tenant isolation.
Why We Built Our Own Platform
MoldPlan runs on-premise at each customer's facility — factory data stays in the factory. Every environment we manage is provisioned, deployed, and monitored through the same automated pipeline.
This post walks through some of the engineering decisions that make that possible.
Container Orchestration
We use HashiCorp Nomad for container orchestration.
MoldPlan deployments typically run on a single powerful node per customer. Nomad gives us container orchestration, health checking, rolling updates, and service discovery with low operational overhead. One binary, one config file.
Our services — APIs, workers, databases, message brokers, identity providers — all run as Nomad jobs with clearly defined resource constraints, health checks, and restart policies.
service {
  name = "moldplan-api"
  port = "http"

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
Simple, readable, and it just works.
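For context, a fuller (hypothetical) job definition might look like the sketch below — the image name and resource numbers are placeholders, not our actual values, but it shows where the resource constraints and restart policies mentioned above live:

```hcl
job "moldplan-api" {
  type = "service"

  group "api" {
    task "api" {
      driver = "docker"

      config {
        # Placeholder image reference
        image = "registry.example.com/moldplan-api:1.0.0"
      }

      # Explicit resource constraints per task
      resources {
        cpu    = 500 # MHz
        memory = 512 # MB
      }

      # Restart policy: retry a few times, then mark the allocation failed
      restart {
        attempts = 3
        interval = "5m"
        delay    = "15s"
        mode     = "fail"
      }
    }
  }
}
```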
Infrastructure as Code — All 40+ Roles of It
Every piece of our infrastructure is defined in Ansible — DNS, reverse proxies, databases, monitoring stacks, certificate management, firewall rules, identity providers.
We maintain over 40 reusable Ansible roles, organized into a shared library that works across customer environments, development setups, and our own internal infrastructure. A single deployment command can provision an entire environment from bare metal to production-ready:
./deploy.sh -e production -c customer-name -a all
That all expands into the correct deployment order — databases first, then identity, then gateway, then application services. Each role handles its own health validation before the next one begins.
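Conceptually, that expansion looks like an ordered list of roles in a playbook. This is a simplified sketch, not our actual playbook — role names are illustrative:

```yaml
# site.yml — illustrative expansion of "-a all"
- hosts: "{{ customer_env }}"
  roles:
    - role: postgresql          # databases first
    - role: keycloak            # then identity
    - role: gateway             # then the reverse proxy / gateway
    - role: moldplan_services   # application services last
```

Because Ansible runs roles strictly in order and each role ends with its own health validation, a failure stops the run before dependent services are touched.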
Observability
Every environment we manage runs a full observability stack:
- Prometheus collects metrics from every service, node, and Nomad job
- Grafana provides dashboards for operations and customer-facing health views
- Loki aggregates logs from all containers with structured labels
- Alertmanager routes alerts to the right people based on severity and time
Monitoring is part of the Ansible deployment pipeline. When a new service deploys, its metrics endpoint is automatically discovered, its dashboards are provisioned, and its alert rules are active. Zero manual configuration.
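One way to get that automatic discovery, assuming Prometheus is pointed at Consul's service catalog (which we use for service discovery), is a Consul-backed scrape config — a sketch, with the server address as a placeholder:

```yaml
scrape_configs:
  - job_name: "nomad-services"
    consul_sd_configs:
      - server: "localhost:8500"   # local Consul agent
    relabel_configs:
      # Carry the Consul service name through as a "service" label
      - source_labels: [__meta_consul_service]
        target_label: service
```

With this in place, any new service registered in Consul is scraped without touching the Prometheus config.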
Multi-Tenancy That Actually Isolates
Each customer gets:
- Their own Nomad cluster running on dedicated hardware inside their facility
- Isolated networking via Tailscale mesh VPN with independent connectivity per environment
- Independent databases — SQL Server, MongoDB, and PostgreSQL per environment
- Separate identity providers — each customer has their own Keycloak realm with OAuth2/OIDC
- Customer-specific certificates — automated renewal with multiple modes (download, generate, or full lifecycle management)
We can manage dozens of independent customer environments from a central control plane without any of them knowing about each other.
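The network-level isolation can be expressed directly in a Tailscale ACL policy. The sketch below is hypothetical (tag and group names are made up), but it shows the shape: each customer tag can only reach its own environment, while the ops group reaches everything:

```json
{
  "tagOwners": {
    "tag:customer-a": ["group:ops"],
    "tag:customer-b": ["group:ops"]
  },
  "acls": [
    // Each customer's nodes talk only to nodes with the same tag
    { "action": "accept", "src": ["tag:customer-a"], "dst": ["tag:customer-a:*"] },
    { "action": "accept", "src": ["tag:customer-b"], "dst": ["tag:customer-b:*"] },
    // The ops group can reach every environment for management
    { "action": "accept", "src": ["group:ops"], "dst": ["*:*"] }
  ]
}
```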
Deployment Pipeline
Our CI/CD runs on GitLab with a modular component architecture. When code is pushed:
- Change detection determines which services are affected
- Docker images are built and pushed to our container registry
- Ansible playbooks deploy the updated services to the target environment
- Post-deployment checks verify health, connectivity, and service availability
A typical deployment from commit to production takes minutes. Every deployment is codified, so rollbacks are straightforward — point to the previous image tag and re-run.
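A minimal sketch of what one of those pipeline components could look like in GitLab CI — paths, job names, and the playbook are illustrative, not our actual configuration:

```yaml
deploy-api:
  stage: deploy
  rules:
    # Change detection: only run when the API service's files changed
    - changes:
        - services/api/**/*
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/api:$CI_COMMIT_SHORT_SHA" services/api
    - docker push "$CI_REGISTRY_IMAGE/api:$CI_COMMIT_SHORT_SHA"
    # Ansible deploys the freshly built image tag to the target environment
    - ansible-playbook deploy.yml -e "image_tag=$CI_COMMIT_SHORT_SHA"
```

Rolling back is the same job with a previous `image_tag`.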
The Networking Layer
Connecting on-premise customer environments to our management plane requires care. We use Tailscale to create an encrypted mesh network that spans every environment we manage.
On top of that, Caddy serves as our reverse proxy with automatic TLS certificate management. Consul provides service discovery, so services find each other by name. CoreDNS handles internal DNS resolution.
Our developers can securely reach any service in any customer environment by name, over the mesh.
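The reverse proxy side of this is pleasantly small. A Caddyfile sketch (hostname and upstream are placeholders) — Caddy handles TLS issuance and renewal for the site on its own:

```text
api.customer-a.example.com {
    # Automatic TLS: Caddy obtains and renews the certificate itself
    reverse_proxy moldplan-api:8080
}
```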
Certificate Lifecycle
Our cert-renewal service fully automates TLS certificate management across three modes:
- Download mode — fetches wildcard certificates from Azure Blob Storage
- Generate mode — creates certificates locally with a customer-specific CA
- Renew mode — full lifecycle management with renewal and redistribution
Certificates are rotated automatically. The reverse proxy hot-reloads new certificates. Expiry monitoring alerts fire 30 days before any certificate expires.
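The 30-day expiry alert can be expressed as a standard Prometheus rule. This sketch assumes certificate endpoints are probed with the blackbox exporter, which exposes `probe_ssl_earliest_cert_expiry`:

```yaml
groups:
  - name: certificates
    rules:
      - alert: CertificateExpiringSoon
        # Days until the earliest cert in the chain expires
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 30 days"
```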
Secret Management
We use Ansible Vault for infrastructure secrets and HashiCorp tools for service-level configuration. Every sensitive value is encrypted at rest and decrypted only at deployment time.
Customer credentials, database passwords, API keys, OAuth client secrets — all managed through a unified workflow.
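In practice, "encrypted at rest" with Ansible Vault means secrets live in version control as vaulted values. A sketch of what a per-customer vars file might contain (the path and variable name are illustrative, and the ciphertext is a placeholder):

```yaml
# group_vars/customer-a/vault.yml — encrypted with ansible-vault
db_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  3931336561...   # placeholder ciphertext
```

Ansible decrypts these transparently at deployment time, so playbooks reference `db_password` like any other variable.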
What We Get From All This
Build infrastructure like this and the day-to-day becomes surprisingly calm:
- New customer onboarding is a single playbook run — about an hour from bare hardware to production
- Updates ship to any environment with one command, during business hours, with zero downtime
- Incidents are usually detected by monitoring before a customer notices them
- Debugging happens remotely through secure tunnels, with full access to logs, metrics, and dashboards
Manufacturing software reliability directly impacts factory output. The quality of our infrastructure defines the level of service we can deliver.
Looking Ahead
We're continuously evolving this platform. Current areas of focus:
- AI-powered diagnostics — automated root cause analysis for production issues
- Edge computing — running ML models closer to the factory floor for real-time predictions
- Object storage migration — moving from SMB file shares to S3-compatible storage
The infrastructure is the foundation that lets us move fast on product features.
Have questions about our infrastructure? We enjoy talking about this stuff. Get in touch.