Bonded Tenant Data Path
- bond0 — 2 NICs in LACP/MLAG across two Arista ToRs
- Carries VM data, mgmt, cluster ring 0, Ceph Public VLAN
- MTU 9000 jumbo frames end-to-end
- Hitless single-NIC failure, switch-level fault tolerance
Two compute tiers. Two storage models. One design principle: performance where it matters, affordability where it fits, and predictable recovery backed by a strict 50% maximum-provisioning ceiling.
Cloud Propeller is a purpose-built enterprise cloud that favors predictability over scale.
What runs beneath the VMs matters. This page walks through the layers of Cloud Propeller’s platform architecture: Proxmox VE at the foundation, the host architecture with data, control, and storage planes distributed across four NICs for layered redundancy, and the compute and storage tiers that deliver two distinct price/performance options under one shared design philosophy.
For Cloud Propeller’s first nine years, every host we operated ran VMware ESXi. In mid-2024, just a few months after Broadcom’s acquisition of VMware, we set ESXi aside and re-tested every serious hypervisor platform on the market. What followed was a twelve-month evaluation across raw performance, operational behavior under load, licensing economics, and day-to-day operator experience — the kind of test cycle most providers never run unless they are forced to find a replacement.
Proxmox VE emerged as the clear winner — not just in our testing, but in the real-world test workloads our clients helped validate.
One of Cloud Propeller’s core differentiators has always been our focus on high-frequency CPU performance over hyperscale-style density. When we built our previous-generation platform (Gen3), now serving as our General Purpose Compute (GPC) tier, we chose Intel® Xeon® Gold 6246R processors because they delivered an unusually high 3.4 GHz base clock for a 16-core server CPU — exactly the kind of per-core performance profile we wanted, while still keeping core counts practical under VMware’s core-based licensing model.
The move to Proxmox VE came at exactly the right time. As we were designing our next-generation Mission Critical Compute (MCC) platform, Intel introduced the kind of processor we had been waiting for: the Intel® Xeon® 6745P, a 32-core, all-performance-core CPU capable of running at a 3.6 GHz base clock in Intel® SST-PP compute-optimized mode. In the server world, that combination is exceptional: high core count without giving up high base frequency. VMware’s per-core licensing would have penalized that choice precisely because it delivered more cores. Proxmox VE turned that equation around, letting us choose the CPU architecture we actually wanted.
That combination — Proxmox VE’s licensing model and the 6745P’s performance profile — lets our MCC platform deliver exceptional per-core performance, more total compute headroom, and better economics without falling back to slower, density-first CPU choices designed around provider consolidation instead of client performance.
On top of that foundation, our Cloud Manager portal, based on MultiPortal, gives clients a cleaner single pane of glass for everyday cloud operations: faster provisioning, simpler VM lifecycle management, clearer resource visibility, and a more agile operating experience than the legacy VMware Cloud Director model allowed.
Every Cloud Propeller host has four NICs — deliberately wired so the busiest, most critical traffic gets the most bandwidth and fault-tolerance, and any failure that does occur stays small, bounded, and predictable by design.
Both of our platforms, Mission Critical Compute (MCC) and General Purpose Compute (GPC), are wired identically. What differs between them (besides compute) is NIC speed (100 Gbps vs. 10 Gbps) and primary storage (Ceph vs. iSCSI).
The first two NICs are bonded into bond0, an LACP/MLAG across two Arista DCS-7060CX2-32S top-of-rack switches. Bond0 carries the most critical and busiest traffic on the host — VM tenant data, the Management VLAN and Cluster Heartbeat Ring 0, and (on MCC) the Ceph Public VLAN — all on the same high-bandwidth, dual-NIC fault-tolerant path.
The third and fourth NICs are unbonded. NIC 3 carries a second Management VLAN, Cluster Heartbeat Ring 1, and iSCSI Fabric A; NIC 4 carries Cluster Heartbeat Ring 2 and iSCSI Fabric B.
On MCC, Ceph is the production storage. The Ceph Public VLAN rides bond0; the Ceph Private VLAN, used for OSD↔OSD replication, rides NIC 4. If NIC 4 drops, replication and backfill on that node stop and its OSDs are marked degraded until the link returns; production I/O on bond0, however, keeps running because Ceph Public traffic is unaffected. iSCSI Fabrics A and B are wired and available, but reserved as tier-2 (optional).
On GPC, iSCSI is the active and only storage path — multipathed across both fabrics to the HPE Alletra SAN. Ceph isn’t deployed; NIC 4 simply carries Cluster Heartbeat Ring 2 and iSCSI Fabric B.
Conceptually, we have three intentional traffic planes sharing NICs by design: a data plane on bond0, a control plane spread across all four NICs, and a storage plane (bond0 for Ceph; NICs 3 and 4 for iSCSI). Our clusters are designed to tolerate the loss of any physical NIC (both ports), any optic, or even an entire top-of-rack switch — and continue to function as if nothing had happened.
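For readers who want to picture the bond0 side of this layout, here is a minimal sketch in the `/etc/network/interfaces` (ifupdown2) format that Proxmox VE uses natively. Interface names, VLAN ranges, and settings below are illustrative assumptions, not Cloud Propeller's actual configuration.

```
# Illustrative sketch only -- interface names and VLAN tags are assumptions.
auto eno1
iface eno1 inet manual
    mtu 9000

auto eno2
iface eno2 inet manual
    mtu 9000

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad              # LACP toward the MLAG pair of ToR switches
    bond-xmit-hash-policy layer3+4
    mtu 9000

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-vlan-aware yes          # tenant, management, and Ceph Public VLANs ride here
    bridge-vids 2-4094
    mtu 9000
```

With this shape, a single NIC, optic, or switch failure collapses the LAG to one member while the bridge and its VLANs stay up.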
MCC and GPC are powered by the same hypervisor, have similar host architecture, and ride over the same network fabric. Silicon, memory, storage architecture, and host networking are the levers that differentiate them.
| Specification | Gen 4 MCC Mission Critical Compute | Gen 3 GPC General Purpose Compute |
|---|---|---|
| Platform generation | 4th Generation Cloud Propeller Architecture<br>6th-gen Intel® Xeon® Family | 3rd Generation Cloud Propeller Architecture<br>2nd-gen Intel® Xeon® Scalable Family |
| Hypervisor | Proxmox VE 9.1.x (KVM + LXC, native Linux) | |
| CPU | Intel® Xeon® 6745P (Granite Rapids, “P”-variant high-clock, all performance cores — no efficiency or low-priority cores) — Two CPUs per host | Intel® Xeon® Gold 6246R (Cascade Lake Refresh, all performance cores — no efficiency or low-priority cores) — Two CPUs per host |
| CPU speed | 32 cores @ 3.6 GHz base (running in SST-PP compute-optimized mode, 4.1 GHz max turbo) | 16 cores @ 3.4 GHz base (4.1 GHz max turbo) |
| Memory | DDR5 ECC · 6400 MT/s<br>2.3 TB (96 GB × 24 DIMMs) per host | DDR4 ECC · 2933 MT/s<br>1 TB (64 GB × 16 DIMMs) per host |
| Storage architecture | All-flash NVMe Ceph, triple-mirror replication<br>180 TB (12 × 15 TB SSDs) per host | HPE Alletra NVMe over iSCSI, triple+ parity RAID<br>Dual controllers, 8 × 25 Gbps per controller |
| Host networking | 4 × 100 Gbps NICs (2 bonded into 200G LAG, 2 dedicated to storage + cluster paths) | 4 × 10 Gbps NICs (2 bonded into 20G LAG, 2 dedicated to storage + cluster paths) |
| Top-of-Rack | Arista DCS-7060CX2-32S Switches (redundant) | |
| Uplink to core | 200G ToR-to-core uplink | |
| L3 Core | Extreme Networks MLXe-8 Routers (redundant) | |
| Provisioning ceiling | 50% (hard-cap, no over-provisioning) | |
| Uptime SLA | 99.9999% (six nines) | 99.99% (four nines) |
| Billing models | Pay-As-You-Go (5-min granularity) + Dedicated Capacity | |
| Recommended for | Mission-critical production, high-throughput, low-latency workloads | Cost-sensitive, general-purpose enterprise workloads, dev/test, batch |
| Starting price | $180 /month | $90 /month |
Storage defines how a cloud platform behaves under pressure. We take two different paths between the tiers — both fast under load, both engineered to keep tenant I/O running through hardware failures.
Our Mission Critical Compute (MCC) platform runs all-flash NVMe Ceph with triple-mirror replication across the cluster. There is no central storage controller to fail and no RAID rebuild window to wait through — if a disk or even an entire host disappears, Ceph re-replicates the missing copies onto remaining capacity in the background while tenant I/O keeps running.
Capacity, throughput, and parallel client count all scale with cluster size — adding NVMe hosts grows the three together, with no central controller to bottleneck later. Each of our MCC hosts contains 12 × 15 TB Kioxia CM-7 enterprise NVMe SSDs and contributes 180 TB raw to the underlying cluster. Public I/O (tenant data) rides the 200 Gbps MLAG, while private I/O (back-end OSD↔OSD replication) runs on its own dedicated 100 Gbps NIC — isolating tenant traffic from replication storms so neither can crowd the other.
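To make the capacity model concrete, here is a small sketch of how triple-mirror replication and the platform's 50% provisioning ceiling translate raw disk into provisionable capacity. The per-host figures come from the MCC spec above; the cluster size in the example is an illustrative assumption.

```python
# Sketch: provisionable capacity of a triple-replicated Ceph cluster
# under a 50% provisioning ceiling. The 5-host cluster size is a
# hypothetical example, not a statement about actual cluster sizes.

REPLICATION = 3            # triple-mirror: every object lives on 3 OSD hosts
CEILING = 0.50             # hard cap: never provision past 50% of design capacity
RAW_PER_HOST_TB = 12 * 15  # 12 x 15 TB NVMe SSDs = 180 TB raw per host

def provisionable_tb(hosts: int) -> float:
    raw = hosts * RAW_PER_HOST_TB
    usable = raw / REPLICATION   # logical capacity after replication
    return usable * CEILING      # what the 50% ceiling allows to be provisioned

print(provisionable_tb(5))  # 5-host example cluster -> 150.0 TB
```

Adding a host grows raw, usable, and provisionable capacity together, which is the "no central controller to bottleneck later" property in numbers.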
Our General Purpose Compute (GPC) platform runs on HPE Alletra (formerly HPE Nimble Storage) NVMe SAN over iSCSI — mature enterprise storage with redundant controllers and triple+ parity protection. The recovery model is different (controller failover plus parity reconstruction) but the operational behavior is the same predictable, well-understood pattern enterprise storage teams already know and trust.
Alletra’s controller pair runs active/passive — one controller serves I/O at the full 200 Gbps front-end (8 × 25 Gbps) while the other stands by, ready to take over. Any active-controller fault triggers automated failover in under 18 seconds, with multipath drivers transparently rerouting I/O to the standby. HPE’s InfoSight platform watches the array’s behavior in production and surfaces the failure modes that matter before they impact tenant I/O.
Every cluster — MCC and GPC, Pay As You Go and Dedicated — is capped at 50% of its design capacity. Not as an aspirational target. As an actual line we do not cross.
Hyperscalers can get away with aggressive over-provisioning because, at their scale, spare capacity and noisy neighbors get averaged out statistically across enormous fleets. But that does not mean every individual workload gets a clean host, consistent neighbors, or predictable performance. Anyone who has rebooted an instance hoping to land on better underlying hardware understands the difference.
We do not operate at that scale, and we do not want to. Instead, we buy headroom into the design: a 50% ceiling means a host losing a peer does not tip anyone into contention, a maintenance window does not concentrate load, and an unexpected burst has room to breathe.
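The arithmetic behind that claim is simple enough to sketch. The cluster sizes below are hypothetical illustrations:

```python
# Why a 50% ceiling absorbs a host failure: if N hosts each run at no more
# than 50% load and one host fails, its load spreads across the (N-1) peers.
# Cluster sizes here are hypothetical examples.

def load_after_host_loss(hosts: int, utilization: float = 0.50) -> float:
    """Per-host utilization after one host fails, assuming an even spread."""
    return hosts * utilization / (hosts - 1)

for n in (4, 8, 16):
    print(n, round(load_after_host_loss(n), 3))
# Even a small 4-host cluster lands around 67% -- well below saturation.
```

The same margin covers maintenance windows: draining one host for patching is mathematically identical to losing it.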
That headroom is what makes Cloud Propeller feel hyperscale to our own clients. A workload can double overnight — and in real cases, we have had clients grow many times beyond their original footprint within a span of just a few days — without our platform running into a capacity wall of any kind. The room is already built in.
It is more expensive to operate infrastructure with this much headroom, but it also means the platform behaves the way it was architected — under load, during maintenance, and in the worst five minutes of the worst day of the year.
Stand up a Pay As You Go VDC and evaluate Cloud Propeller under your own workloads — no long-term commitment, no setup fees.
Uptime SLAs are expressed as the percentage of total time the service is available. The “nines” shorthand counts the consecutive 9s at the start of that percentage. Below: the maximum allowed downtime for each tier, by time window.
| Period | Six Nines (99.9999%) | Four Nines (99.99%) |
|---|---|---|
| Daily | 0.086s | 8.6s |
| Weekly | 0.6s | 1m 0.48s |
| Monthly | 2.6s | 4m 23s |
| Quarterly | 7.9s | 13m 8.9s |
| Yearly | 32s | 52m 36s |
Reference data: uptime.is
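The table values can be reproduced with a few lines of Python. Period lengths below use a 365.25-day year (month = year/12, quarter = year/4), so the last digit may round slightly differently than the table.

```python
# Convert an availability SLA percentage into the maximum allowed
# downtime (in seconds) for a given period.

YEAR = 365.25 * 86_400  # seconds in an average year

SECONDS = {
    "daily": 86_400,
    "weekly": 604_800,
    "monthly": YEAR / 12,
    "quarterly": YEAR / 4,
    "yearly": YEAR,
}

def max_downtime(sla_percent: float, period: str) -> float:
    """Seconds of downtime allowed per period at a given SLA percentage."""
    return SECONDS[period] * (1 - sla_percent / 100)

print(round(max_downtime(99.9999, "daily"), 4))   # six nines, daily
print(round(max_downtime(99.99, "weekly"), 2))    # four nines, weekly
```

For example, six nines over a year allows roughly 31.6 seconds of downtime, and four nines over a month allows roughly 263 seconds (about 4m 23s).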