Bonded Tenant Data Path
- bond0 — 2 NICs in LACP/MLAG across two Arista ToRs
- Carries VM data, mgmt, cluster ring 0, Ceph Public VLAN
- MTU 9000 jumbo frames end-to-end
- Hitless single-NIC failure, switch-level fault tolerance
Two compute tiers. Two storage models. One design principle: performance where it matters, affordability where it fits, and predictable recovery backed by a strict 50% maximum-provisioning ceiling.
Cloud Propeller is a purpose-built enterprise cloud that favors predictability over scale.
What runs beneath the VMs matters. This page walks through the layers of Cloud Propeller’s platform architecture: Proxmox VE at the foundation, the host architecture with data, control, and storage planes distributed across four NICs for layered redundancy, and the compute and storage tiers that deliver two distinct price/performance options under one shared design philosophy.
For Cloud Propeller’s first nine years, every host we operated ran VMware ESXi. In mid-2024, just a few months after Broadcom’s acquisition of VMware, we set ESXi aside and re-tested every serious hypervisor platform on the market. What followed was a twelve-month evaluation across raw performance, operational behavior under load, licensing economics, and day-to-day operator experience — the kind of test cycle most providers never run unless they are forced to find a replacement.
Proxmox VE emerged as the clear winner — not just in our testing, but in the real-world test workloads our clients helped validate.
One of Cloud Propeller’s core differentiators has always been our focus on high-frequency CPU performance over hyperscale-style density. When we built our previous-generation platform (Gen3), now serving as our General Purpose Compute (GPC) tier, we chose Intel® Xeon® Gold 6246R processors because they delivered an unusually high 3.4 GHz base clock for a 16-core server CPU — exactly the kind of per-core performance profile we wanted, while still keeping core counts practical under VMware’s core-based licensing model.
The move to Proxmox VE came at exactly the right time. As we were designing our next-generation Mission Critical Compute (MCC) platform, Intel introduced the kind of processor we had been waiting for: the Intel® Xeon® 6745P, a 32-core, all-performance-core CPU capable of running at a 3.6 GHz base clock in Intel® SST-PP compute-optimized mode. In the server world, that combination is exceptional: high core count without giving up high base frequency. VMware’s per-core licensing would have penalized that choice precisely because it delivered more cores. Proxmox VE turned that equation around, letting us choose the CPU architecture we actually wanted.
That combination — Proxmox VE’s licensing model and the 6745P’s performance profile — lets our MCC platform deliver exceptional per-core performance, more total compute headroom, and better economics without falling back to slower, density-first CPU choices designed around provider consolidation instead of client performance.
On top of that foundation, our Cloud Manager portal, based on MultiPortal, gives clients a cleaner single pane of glass for everyday cloud operations: faster provisioning, simpler VM lifecycle management, clearer resource visibility, and a more agile operating experience than the legacy VMware Cloud Director model allowed.
Every Cloud Propeller host has four NICs — deliberately wired so the busiest, most critical traffic gets the most bandwidth and fault-tolerance, and any failure that does occur stays small, bounded, and predictable by design.
Both of our platforms, Mission Critical Compute (MCC) and General Purpose Compute (GPC), are wired identically. What differs between them (besides compute) is NIC speed (100 Gbps vs. 10 Gbps) and primary storage (Ceph vs. iSCSI).
The first two NICs are bonded into bond0, an LACP/MLAG across two Arista DCS-7060CX2-32S top-of-rack switches. Bond0 carries the most critical and busiest traffic on the host — VM tenant data, the Management VLAN and Cluster Heartbeat Ring 0, and (on MCC) the Ceph Public VLAN — all on the same high-bandwidth, dual-NIC fault-tolerant path.
The third and fourth NICs are unbonded. NIC 3 carries a second Management VLAN, Cluster Heartbeat Ring 1, and iSCSI Fabric A; NIC 4 carries Cluster Heartbeat Ring 2 and iSCSI Fabric B.
On MCC, Ceph is the production storage. The Ceph Public VLAN rides bond0; the Ceph Private VLAN, used for OSD↔OSD replication, rides NIC 4. If NIC 4 drops, replication and backfill on that node stop and its OSDs are marked degraded until the link returns; production I/O on bond0, however, keeps running because Ceph Public traffic is unaffected. iSCSI Fabrics A and B are wired and available, but reserved as tier-2 (optional).
On GPC, iSCSI is the active and only storage path — multipathed across both fabrics to the HPE Alletra SAN. Ceph isn’t deployed; NIC 4 simply carries Cluster Heartbeat Ring 2 and iSCSI Fabric B.
Conceptually, we have three intentional traffic planes sharing NICs by design: a data plane on bond0, a control plane spread across all four NICs, and a storage plane (bond0 for Ceph; NICs 3 and 4 for iSCSI). Our clusters are designed to tolerate the loss of any physical NIC (both ports), any optic, or even an entire top-of-rack switch — and continue to function as if nothing had happened.
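For readers who want to picture the bond0 side of this layout, here is a minimal sketch in the `/etc/network/interfaces` (ifupdown2) format that Proxmox VE uses natively. Interface names, VLAN ranges, and settings below are illustrative assumptions, not Cloud Propeller's actual configuration.

```
# Illustrative sketch only -- interface names and VLAN tags are assumptions.
auto eno1
iface eno1 inet manual
    mtu 9000

auto eno2
iface eno2 inet manual
    mtu 9000

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad              # LACP toward the MLAG pair of ToR switches
    bond-xmit-hash-policy layer3+4
    mtu 9000

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-vlan-aware yes          # tenant, management, and Ceph Public VLANs ride here
    bridge-vids 2-4094
    mtu 9000
```

With this shape, a single NIC, optic, or switch failure collapses the LAG to one member while the bridge and its VLANs stay up.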
MCC and GPC are powered by the same hypervisor, have similar host architecture, and ride over the same network fabric. Silicon, memory, storage architecture, and host networking are the levers that differentiate them.
| Specification | Gen 4 MCC Mission Critical Compute | Gen 3 GPC General Purpose Compute |
|---|---|---|
| Platform generation | 4th Generation Cloud Propeller Architecture<br>6th-gen Intel® Xeon® Family | 3rd Generation Cloud Propeller Architecture<br>2nd-gen Intel® Xeon® Scalable Family |
| Hypervisor | Proxmox VE 9.1.x (KVM + LXC, native Linux) | |
| CPU | Intel® Xeon® 6745P (Granite Rapids, “P”-variant high-clock, all performance cores — no efficiency or low-priority cores) — Two CPUs per host | Intel® Xeon® Gold 6246R (Cascade Lake Refresh, all performance cores — no efficiency or low-priority cores) — Two CPUs per host |
| CPU speed | 32 cores @ 3.6 GHz base (running in SST-PP compute-optimized mode, 4.1 GHz max turbo) | 16 cores @ 3.4 GHz base (4.1 GHz max turbo) |
| Memory | DDR5 ECC · 6400 MT/s<br>2.3 TB (96 GB × 24 DIMMs) per host | DDR4 ECC · 2933 MT/s<br>1 TB (64 GB × 16 DIMMs) per host |
| Storage architecture | All-flash NVMe Ceph, triple-mirror replication<br>180 TB (12 × 15 TB SSDs) per host | HPE Alletra NVMe over iSCSI, triple+ parity RAID<br>Dual controllers, 8 × 25 Gbps per controller |
| Host networking | 4 × 100 Gbps NICs (2 bonded into 200G LAG, 2 dedicated to storage + cluster paths) | 4 × 10 Gbps NICs (2 bonded into 20G LAG, 2 dedicated to storage + cluster paths) |
| Top-of-Rack | Arista DCS-7060CX2-32S Switches (redundant) | |
| Uplink to core | 200G ToR-to-core uplink | |
| L3 Core | Extreme Networks MLXe-8 Routers (redundant) | |
| Provisioning ceiling | 50% (hard-cap, no over-provisioning) | |
| Uptime SLA | 99.9999% (six nines) | 99.99% (four nines) |
| Billing models | Pay-As-You-Go (5-min granularity) + Dedicated Capacity | |
| Recommended for | Mission-critical production, high-throughput, low-latency workloads | Cost-sensitive, general-purpose enterprise workloads, dev/test, batch |
| Starting price | $180 /month | $90 /month |
Storage defines how a cloud platform behaves under pressure. We take two different paths between the tiers — both fast under load, both engineered to keep tenant I/O running through hardware failures.
Our Mission Critical Compute (MCC) platform runs all-flash NVMe Ceph with triple-mirror replication across the cluster. There is no central storage controller to fail and no RAID rebuild window to wait through — if a disk or even an entire host disappears, Ceph re-replicates the missing copies onto remaining capacity in the background while tenant I/O keeps running.
Capacity, throughput, and parallel client count all scale with cluster size — adding NVMe hosts grows the three together, with no central controller to bottleneck later. Each of our MCC hosts contains 12 × 15 TB Kioxia CM-7 enterprise NVMe SSDs and contributes 180 TB raw to the underlying cluster. Public I/O (tenant data) rides the 200 Gbps MLAG, while private I/O (back-end OSD↔OSD replication) runs on its own dedicated 100 Gbps NIC — isolating tenant traffic from replication storms so neither can crowd the other.
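To make the capacity model concrete, here is a small sketch of how triple-mirror replication and the platform's 50% provisioning ceiling translate raw disk into provisionable capacity. The per-host figures come from the MCC spec above; the cluster size in the example is an illustrative assumption.

```python
# Sketch: provisionable capacity of a triple-replicated Ceph cluster
# under a 50% provisioning ceiling. The 5-host cluster size is a
# hypothetical example, not a statement about actual cluster sizes.

REPLICATION = 3            # triple-mirror: every object lives on 3 OSD hosts
CEILING = 0.50             # hard cap: never provision past 50% of design capacity
RAW_PER_HOST_TB = 12 * 15  # 12 x 15 TB NVMe SSDs = 180 TB raw per host

def provisionable_tb(hosts: int) -> float:
    raw = hosts * RAW_PER_HOST_TB
    usable = raw / REPLICATION   # logical capacity after replication
    return usable * CEILING      # what the 50% ceiling allows to be provisioned

print(provisionable_tb(5))  # 5-host example cluster -> 150.0 TB
```

Adding a host grows raw, usable, and provisionable capacity together, which is the "no central controller to bottleneck later" property in numbers.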
Our General Purpose Compute (GPC) platform runs on HPE Alletra (formerly HPE Nimble Storage) NVMe SAN over iSCSI — mature enterprise storage with redundant controllers and triple+ parity protection. The recovery model is different (controller failover plus parity reconstruction) but the operational behavior is the same predictable, well-understood pattern enterprise storage teams already know and trust.
Alletra’s controller pair runs active/passive — one controller serves I/O at the full 200 Gbps front-end (8 × 25 Gbps) while the other stands by, ready to take over. Any active-controller fault triggers automated failover in under 18 seconds, with multipath drivers transparently rerouting I/O to the standby. HPE’s InfoSight platform watches the array’s behavior in production and surfaces the failure modes that matter before they impact tenant I/O.
Every cluster — MCC and GPC, Pay As You Go and Dedicated — is capped at 50% of its design capacity. Not as an aspirational target. As an actual line we do not cross.
Hyperscalers can get away with aggressive over-provisioning because, at their scale, spare capacity and noisy neighbors get averaged out statistically across enormous fleets. But that does not mean every individual workload gets a clean host, consistent neighbors, or predictable performance. Anyone who has rebooted an instance hoping to land on better underlying hardware understands the difference.
We do not operate at that scale, and we do not want to. Instead, we buy headroom into the design: a 50% ceiling means a host losing a peer does not tip anyone into contention, a maintenance window does not concentrate load, and an unexpected burst has room to breathe.
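The arithmetic behind that claim is simple enough to sketch. The cluster sizes below are hypothetical illustrations:

```python
# Why a 50% ceiling absorbs a host failure: if N hosts each run at no more
# than 50% load and one host fails, its load spreads across the (N-1) peers.
# Cluster sizes here are hypothetical examples.

def load_after_host_loss(hosts: int, utilization: float = 0.50) -> float:
    """Per-host utilization after one host fails, assuming an even spread."""
    return hosts * utilization / (hosts - 1)

for n in (4, 8, 16):
    print(n, round(load_after_host_loss(n), 3))
# Even a small 4-host cluster lands around 67% -- well below saturation.
```

The same margin covers maintenance windows: draining one host for patching is mathematically identical to losing it.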
That headroom is what makes Cloud Propeller feel hyperscale to our own clients. A workload can double overnight — and in real cases, we have had clients grow many times beyond their original footprint within a span of just a few days — without our platform running into a capacity wall of any kind. The room is already built in.
It is more expensive to operate infrastructure with this much headroom, but it also means the platform behaves the way it was architected — under load, during maintenance, and in the worst five minutes of the worst day of the year.
Stand up a Pay As You Go VDC and evaluate Cloud Propeller under your own workloads — no long-term commitment, no setup fees.
Uptime SLAs are expressed as the percentage of total time the service is available. The “nines” shorthand counts the consecutive 9s at the start of that percentage. Below: the maximum allowed downtime for each tier, by time window.
| Period | Six Nines (99.9999%) | Four Nines (99.99%) |
|---|---|---|
| Daily | 0.086s | 8.6s |
| Weekly | 0.6s | 1m 0.48s |
| Monthly | 2.6s | 4m 23s |
| Quarterly | 7.9s | 13m 8.9s |
| Yearly | 32s | 52m 36s |
Reference data: uptime.is
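The table values can be reproduced with a few lines of Python. Period lengths below use a 365.25-day year (month = year/12, quarter = year/4), so the last digit may round slightly differently than the table.

```python
# Convert an availability SLA percentage into the maximum allowed
# downtime (in seconds) for a given period.

YEAR = 365.25 * 86_400  # seconds in an average year

SECONDS = {
    "daily": 86_400,
    "weekly": 604_800,
    "monthly": YEAR / 12,
    "quarterly": YEAR / 4,
    "yearly": YEAR,
}

def max_downtime(sla_percent: float, period: str) -> float:
    """Seconds of downtime allowed per period at a given SLA percentage."""
    return SECONDS[period] * (1 - sla_percent / 100)

print(round(max_downtime(99.9999, "daily"), 4))   # six nines, daily
print(round(max_downtime(99.99, "weekly"), 2))    # four nines, weekly
```

For example, six nines over a year allows roughly 31.6 seconds of downtime, and four nines over a month allows roughly 263 seconds (about 4m 23s).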