Case Study - Cloud Exit: From €200K/yr Cloud Bills to Owned Infrastructure

How we helped a European hosting company reclaim control of its infrastructure — moving off cloud managed services onto a self-operated, on-premises Kubernetes cluster with full data sovereignty.

Client: Medium-Size Hosting Provider
Year:
Service: Infrastructure Architecture & Cloud Repatriation

The Problem with "Just Use the Cloud"

The cloud is genuinely convenient — until the bill arrives. For this client, a profitable European hosting provider running a mix of SaaS products and managed infrastructure services, the cloud had become a cost centre nobody could explain. Managed Kubernetes on two providers. Object storage priced by the egress byte. Databases as a service with opaque pricing tiers. The total was pushing €200,000 a year and trending up.

Beyond the cost, there were structural concerns. Customer data was distributed across multiple cloud regions with inconsistent residency guarantees. GDPR compliance was technically met on paper, but the operational reality — data moving across provider regions, limited audit trails, shared tenancy — made the compliance team uncomfortable. And there was a harder question underneath: if a cloud provider changed their pricing or deprecated a service, what would they do?

The CTO had read Basecamp's writing on their cloud exit. It resonated. But Basecamp is an engineering company that can operate its own infrastructure. This client needed help getting there.

What We Did

Capacity Planning

Before touching any hardware, we mapped existing workloads: what ran where, actual resource consumption versus what they were paying for, and realistic growth over three years. Cloud bills are a poor proxy for compute needs — managed services carry significant overhead. The actual workload fit comfortably within a cluster costing a fraction of the managed-service spend.
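As a rough sketch of that mapping, with entirely hypothetical workload names and figures (only the three-year horizon comes from the engagement):

```python
# Hypothetical workload inventory: provisioned (paid-for) capacity versus
# observed peak usage. All names and numbers are illustrative.
workloads = {
    "saas-api":   {"paid_vcpu": 64, "peak_vcpu": 22},
    "billing-db": {"paid_vcpu": 32, "peak_vcpu": 11},
    "batch-jobs": {"paid_vcpu": 48, "peak_vcpu": 15},
}

GROWTH = 1.3 ** 3    # assumed 30% yearly growth, compounded over three years
HEADROOM = 1.5       # assumed burst/failover headroom on top of projected peak

paid_vcpu = sum(w["paid_vcpu"] for w in workloads.values())
peak_vcpu = sum(w["peak_vcpu"] for w in workloads.values())
target_vcpu = peak_vcpu * GROWTH * HEADROOM

print(f"paid: {paid_vcpu} vCPU, peak: {peak_vcpu} vCPU, "
      f"3-yr target: {target_vcpu:.0f} vCPU")
# → paid: 144 vCPU, peak: 48 vCPU, 3-yr target: 158 vCPU
```

The gap between the paid and peak columns is where the managed-service overhead lives — sizing against measured peaks plus headroom, rather than against the cloud bill, is what made the smaller cluster credible.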

We sized a 5-node cluster around peak load with room to grow. EPYC processors, high-density RAM, and NVMe drives per node feeding Ceph directly. 51TB usable storage with 3-way replication. Enough for current needs, expandable without re-architecting.
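The storage arithmetic behind those numbers, with a hypothetical per-node raw figure (only the 5 nodes, the 3-way replication, and the 51TB usable are from the deployment):

```python
# Ceph capacity arithmetic: usable space is raw space divided by the
# replication factor. The per-node raw NVMe figure is an assumption.
NODES = 5
RAW_NVME_PER_NODE_TB = 30.6   # hypothetical raw NVMe per node
REPLICATION = 3               # every object stored on three separate nodes

raw_tb = NODES * RAW_NVME_PER_NODE_TB
usable_tb = raw_tb / REPLICATION
print(f"raw: {raw_tb:.0f} TB, usable: {usable_tb:.0f} TB")
# → raw: 153 TB, usable: 51 TB
```

Expanding means adding drives or nodes and letting Ceph rebalance — the usable figure grows without re-architecting.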

Hardware Procurement & Deployment

The hardware decision was about reliability, not specs. Enterprise servers with full vendor support coverage — next-business-day onsite. Hardware failure at 3am is not a software problem; it needs a truck roll, not a Slack message.

We specified redundant power, out-of-band management, and dedicated network separation: management, VM traffic, Ceph storage, and live migration each on their own isolated paths. No single point of failure at the network layer. The hardware was racked and cabled by the colocation partner. We validated the physical layer before touching a config file.

Proxmox Cluster

Proxmox VE was the virtualisation layer. Not because it's fashionable — it isn't — but because it's operationally honest. No vendor lock-in, solid live migration, Ceph integration that works, and a community that documents failure modes rather than hiding them. The client's team would need to operate this for years. They needed something they could understand and fix.

Five nodes, Ceph across all of them, VLAN separation throughout. The cluster was live and stable before Kubernetes entered the picture.

Kubernetes on Owned Hardware

Kubernetes ran in VMs on top of Proxmox, not on bare metal directly. This was a deliberate choice: it preserves live migration for maintenance without disrupting the cluster, and keeps infrastructure concerns separate from workload concerns. The platform team manages Kubernetes. The infra team manages Proxmox. Neither needs deep knowledge of the other's domain.

MetalLB for load balancer IPs. Cilium as CNI — chosen for its observability and network policy capabilities. ArgoCD for GitOps delivery. Ceph RBD and CephFS storage classes for persistent volumes.

Migration

The migration followed a blue-green pattern: the new cluster ran in parallel with the cloud, and workloads moved service by service with validation gates between each move. No big-bang cutover. DNS TTLs were pre-staged weeks in advance. The final cutover was a non-event.
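The gate pattern can be sketched as follows; every helper and gate below is a hypothetical stand-in for the client's real deployment, smoke-test, and DNS tooling:

```python
# Sketch of the service-by-service migration loop with validation gates.
# All helpers are illustrative stubs, not the actual tooling used.
def deploy_to_new_cluster(service):
    # stand-in: sync the service's manifests to the new cluster
    print(f"deployed {service} to new cluster")

def rollback_to_cloud(service):
    # stand-in: keep traffic on the cloud deployment, tear down the copy
    print(f"rolled back {service}")

def cut_dns(service):
    # stand-in: flip DNS to the new cluster (TTLs pre-staged low)
    print(f"cutover done for {service}")

def migrate(service, gates):
    """Move one service; any failed gate aborts the move and rolls back."""
    deploy_to_new_cluster(service)
    for gate in gates:
        if not gate(service):
            rollback_to_cloud(service)
            return False
    cut_dns(service)
    return True

# Gates might check health endpoints, data parity, and latency budgets;
# here they are stubbed to always pass.
gates = [lambda s: True, lambda s: True]
migrate("saas-api", gates)
```

Because each service passes or fails its gates independently, a problem stops one move rather than the whole migration — which is what makes the final cutover boring.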

Security & Compliance

Network segmentation from day one: management traffic never touches workload traffic. VPN for all remote access. No public SSH. Certificates managed internally with short rotation cycles.

Data residency became concrete: all customer data in a single, known physical location. The compliance team got an actual data processing register they could stand behind — not a spreadsheet of cloud regions with asterisks.

Monitoring

Prometheus and Grafana across the full stack — Proxmox nodes, Kubernetes, storage, network. Alert routing via PagerDuty. The client now knows when a disk is degrading before Ceph marks it failed, and when a node is memory-pressured before a pod gets OOM-killed. With managed cloud, they had dashboards with no context. Now they have signal.
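The "signal before failure" idea in miniature: flag a disk whose media error count is climbing well before Ceph would mark the OSD out. Thresholds and sample data here are hypothetical, not the client's actual alert rules.

```python
# Trend-based early warning: alert on the rate of new errors, not on an
# absolute failure state. Thresholds and samples are illustrative.
def disk_degrading(error_counts, window=3, max_new_errors=5):
    """True if more than max_new_errors accumulated over the last samples."""
    recent = error_counts[-window:]
    return (recent[-1] - recent[0]) > max_new_errors

healthy = [0, 0, 1, 1, 1, 2]     # slow background error accumulation
failing = [0, 2, 9, 20, 41, 77]  # accelerating error growth
print(disk_degrading(healthy), disk_degrading(failing))
# → False True
```

In production the same logic lives in Prometheus alert rules over per-device metrics; the point is alerting on trends, not on the moment something is already dead.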

The Result

  • Infrastructure cost reduction: ~50%
  • Compute: 5 nodes, 120 cores / 2.5TB RAM
  • Usable distributed storage (Ceph, 3-way replication): 51TB
  • Data in a known physical location: 100%

The OpEx-to-CapEx shift changed the financial conversation. Cloud spend was a recurring line item that grew with usage and resisted forecasting. Owned hardware is a capital investment that amortises over five years, with predictable costs for colocation, power, and maintenance. The total cost of ownership is substantially lower — and the client owns the asset at the end.
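A back-of-the-envelope version of that comparison. Only the ~€200K/yr cloud spend, the five-year amortisation, and the ~50% reduction come from the case study; the hardware, colocation, and support figures are hypothetical.

```python
# Illustrative OpEx vs CapEx comparison. Hardware, colocation, and
# support numbers are assumptions chosen for the sketch.
CLOUD_PER_YEAR = 200_000       # € recurring cloud spend, trending up

HARDWARE_CAPEX = 250_000       # € assumed one-off for five enterprise nodes
AMORTISATION_YEARS = 5
COLO_POWER_PER_YEAR = 40_000   # € assumed colocation and power
SUPPORT_PER_YEAR = 10_000      # € assumed vendor support contracts

owned_per_year = (HARDWARE_CAPEX / AMORTISATION_YEARS
                  + COLO_POWER_PER_YEAR + SUPPORT_PER_YEAR)
saving = 1 - owned_per_year / CLOUD_PER_YEAR
print(f"cloud: €{CLOUD_PER_YEAR:,}/yr, owned: €{owned_per_year:,.0f}/yr, "
      f"saving: {saving:.0%}")
# → cloud: €200,000/yr, owned: €100,000/yr, saving: 50%
```

The structural difference matters as much as the total: the cloud line scales with usage, while the owned line is dominated by a fixed, amortised capital cost that is easy to forecast.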

More practically: the team operates infrastructure they understand. When something breaks, they know how to fix it. That operational confidence is worth more than any single cost line.

What We Used

  • Proxmox VE
  • Ceph
  • Kubernetes
  • Cilium
  • MetalLB
  • ArgoCD
  • Prometheus
  • Grafana

