About Our Infrastructure
An overview of the engineering decisions behind MoldPlan — container orchestration, automated deployments, observability, and multi-tenant isolation.
Why We Built Our Own Platform
MoldPlan runs on-premise at each customer's facility — factory data stays in the factory. Every environment we manage is provisioned, deployed, and monitored through the same automated pipeline.
This post walks through some of the engineering decisions that make that possible.
Container Orchestration
We use HashiCorp Nomad for container orchestration.
MoldPlan deployments typically run on a single powerful node per customer. Nomad gives us container orchestration, health checking, rolling updates, and service discovery with low operational overhead. One binary, one config file.
Our services — APIs, workers, databases, message brokers, identity providers — all run as Nomad jobs with clearly defined resource constraints, health checks, and restart policies.
service {
  name = "moldplan-api"
  port = "http"

  check {
    type     = "http"
    path     = "/health"
    interval = "10s"
    timeout  = "2s"
  }
}
Simple, readable, and it just works.
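For context, a fuller (hypothetical) job definition might look like the sketch below — the image name and resource numbers are placeholders, not our actual values, but it shows where the resource constraints and restart policies mentioned above live:

```hcl
job "moldplan-api" {
  type = "service"

  group "api" {
    task "api" {
      driver = "docker"

      config {
        # Placeholder image reference
        image = "registry.example.com/moldplan-api:1.0.0"
      }

      # Explicit resource constraints per task
      resources {
        cpu    = 500 # MHz
        memory = 512 # MB
      }

      # Restart policy: retry a few times, then mark the allocation failed
      restart {
        attempts = 3
        interval = "5m"
        delay    = "15s"
        mode     = "fail"
      }
    }
  }
}
```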
Infrastructure as Code — All 40+ Roles of It
Every piece of our infrastructure is defined in Ansible — DNS, reverse proxies, databases, monitoring stacks, certificate management, firewall rules, identity providers.
We maintain over 40 reusable Ansible roles, organized into a shared library that works across customer environments, development setups, and our own internal infrastructure. A single deployment command can provision an entire environment from bare metal to production-ready:
./deploy.sh -e production -c customer-name -a all
That all expands into the correct deployment order — databases first, then identity, then gateway, then application services. Each role handles its own health validation before the next one begins.
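Conceptually, that expansion looks like an ordered list of roles in a playbook. This is a simplified sketch, not our actual playbook — role names are illustrative:

```yaml
# site.yml — illustrative expansion of "-a all"
- hosts: "{{ customer_env }}"
  roles:
    - role: postgresql          # databases first
    - role: keycloak            # then identity
    - role: gateway             # then the reverse proxy / gateway
    - role: moldplan_services   # application services last
```

Because Ansible runs roles strictly in order and each role ends with its own health validation, a failure stops the run before dependent services are touched.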
Observability
Every environment we manage runs a full observability stack:
- Prometheus collects metrics from every service, node, and Nomad job
- Grafana provides dashboards for operations and customer-facing health views
- Loki aggregates logs from all containers with structured labels
- Alertmanager routes alerts to the right people based on severity and time
Monitoring is part of the Ansible deployment pipeline. When a new service deploys, its metrics endpoint is automatically discovered, its dashboards are provisioned, and its alert rules are active. Zero manual configuration.
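One way to get that automatic discovery, assuming Prometheus is pointed at Consul's service catalog (which we use for service discovery), is a Consul-backed scrape config — a sketch, with the server address as a placeholder:

```yaml
scrape_configs:
  - job_name: "nomad-services"
    consul_sd_configs:
      - server: "localhost:8500"   # local Consul agent
    relabel_configs:
      # Carry the Consul service name through as a "service" label
      - source_labels: [__meta_consul_service]
        target_label: service
```

With this in place, any new service registered in Consul is scraped without touching the Prometheus config.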
Multi-Tenancy That Actually Isolates
Each customer gets:
- Their own Nomad cluster running on dedicated hardware inside their facility
- Isolated networking via Tailscale mesh VPN with independent connectivity per environment
- Independent databases — SQL Server, MongoDB, and PostgreSQL per environment
- Separate identity providers — each customer has their own Keycloak realm with OAuth2/OIDC
- Customer-specific certificates — automated renewal with multiple modes (download, generate, or full lifecycle management)
We can manage dozens of independent customer environments from a central control plane without any of them knowing about each other.
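The network-level isolation can be expressed directly in a Tailscale ACL policy. The sketch below is hypothetical (tag and group names are made up), but it shows the shape: each customer tag can only reach its own environment, while the ops group reaches everything:

```json
{
  "tagOwners": {
    "tag:customer-a": ["group:ops"],
    "tag:customer-b": ["group:ops"]
  },
  "acls": [
    // Each customer's nodes talk only to nodes with the same tag
    { "action": "accept", "src": ["tag:customer-a"], "dst": ["tag:customer-a:*"] },
    { "action": "accept", "src": ["tag:customer-b"], "dst": ["tag:customer-b:*"] },
    // The ops group can reach every environment for management
    { "action": "accept", "src": ["group:ops"], "dst": ["*:*"] }
  ]
}
```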
Deployment Pipeline
Our CI/CD runs on GitLab with a modular component architecture. When code is pushed:
- Change detection determines which services are affected
- Docker images are built and pushed to our container registry
- Ansible playbooks deploy the updated services to the target environment
- Post-deployment checks verify health, connectivity, and service availability
A typical deployment from commit to production takes minutes. Every deployment is codified, so rollbacks are straightforward — point to the previous image tag and re-run.
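A minimal sketch of what one of those pipeline components could look like in GitLab CI — paths, job names, and the playbook are illustrative, not our actual configuration:

```yaml
deploy-api:
  stage: deploy
  rules:
    # Change detection: only run when the API service's files changed
    - changes:
        - services/api/**/*
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/api:$CI_COMMIT_SHORT_SHA" services/api
    - docker push "$CI_REGISTRY_IMAGE/api:$CI_COMMIT_SHORT_SHA"
    # Ansible deploys the freshly built image tag to the target environment
    - ansible-playbook deploy.yml -e "image_tag=$CI_COMMIT_SHORT_SHA"
```

Rolling back is the same job with a previous `image_tag`.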
The Networking Layer
Connecting on-premise customer environments to our management plane requires care. We use Tailscale to create an encrypted mesh network that spans every environment we manage.
On top of that, Caddy serves as our reverse proxy with automatic TLS certificate management. Consul provides service discovery, so services find each other by name. CoreDNS handles internal DNS resolution.
Our developers can securely reach any service in any customer environment by name, over the mesh.
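The reverse proxy side of this is pleasantly small. A Caddyfile sketch (hostname and upstream are placeholders) — Caddy handles TLS issuance and renewal for the site on its own:

```text
api.customer-a.example.com {
    # Automatic TLS: Caddy obtains and renews the certificate itself
    reverse_proxy moldplan-api:8080
}
```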
Certificate Lifecycle
Our cert-renewal service fully automates TLS certificate management across three modes:
- Download mode — fetches wildcard certificates from Azure Blob Storage
- Generate mode — creates certificates locally with a customer-specific CA
- Renew mode — full lifecycle management with renewal and redistribution
Certificates are rotated automatically. The reverse proxy hot-reloads new certificates. Expiry monitoring alerts fire 30 days before any certificate expires.
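The 30-day expiry alert can be expressed as a standard Prometheus rule. This sketch assumes certificate endpoints are probed with the blackbox exporter, which exposes `probe_ssl_earliest_cert_expiry`:

```yaml
groups:
  - name: certificates
    rules:
      - alert: CertificateExpiringSoon
        # Days until the earliest cert in the chain expires
        expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 30 days"
```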
Secret Management
We use Ansible Vault for infrastructure secrets and HashiCorp tools for service-level configuration. Every sensitive value is encrypted at rest and decrypted only at deployment time.
Customer credentials, database passwords, API keys, OAuth client secrets — all managed through a unified workflow.
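In practice, "encrypted at rest" with Ansible Vault means secrets live in version control as vaulted values. A sketch of what a per-customer vars file might contain (the path and variable name are illustrative, and the ciphertext is a placeholder):

```yaml
# group_vars/customer-a/vault.yml — encrypted with ansible-vault
db_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  3931336561...   # placeholder ciphertext
```

Ansible decrypts these transparently at deployment time, so playbooks reference `db_password` like any other variable.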
What We Get From All This
Build infrastructure like this and the day-to-day becomes surprisingly calm:
- New customer onboarding is a single playbook run — about an hour from bare hardware to production
- Updates ship to any environment with one command, during business hours, with zero downtime
- Incidents are usually detected by monitoring before a customer notices them
- Debugging happens remotely through secure tunnels, with full access to logs, metrics, and dashboards
Manufacturing software reliability directly impacts factory output. The quality of our infrastructure defines the level of service we can deliver.
Looking Ahead
We're continuously evolving this platform. Current areas of focus:
- AI-powered diagnostics — automated root cause analysis for production issues
- Edge computing — running ML models closer to the factory floor for real-time predictions
- Object storage migration — moving from SMB file shares to S3-compatible storage
The infrastructure is the foundation that lets us move fast on product features.
Have questions about our infrastructure? We enjoy talking about this stuff. Get in touch.