How Kilimo built a hybrid cloud network without splitting the platform

Case Study

When credits land in a second cloud, you have a choice: rebuild everything, or build the network that makes both clouds feel like home. Here's what we did.

Why build hybrid instead of rebuild

Many founders assume multi-cloud means redesigning the platform. It doesn't — and it shouldn't. Hybrid architectures are common in mature companies; they're rare in startups because it's easier to teach a Series A startup how to optimize for one cloud than to manage two. But when you've already optimized for one, and new capital (in credits) is contingent on using another, the math changes. You preserve what works, extend what you've built, and move forward. At Renaiss, we help founders make that call technically sound — and profitable.

1. IP allocation

Get this right before anything else; it's the cheapest mistake to avoid and the most expensive to fix later.

  • No overlapping address space anywhere. Across both clouds, on-prem, and — critically — the Kubernetes pod and service ranges, every CIDR must be globally unique. Overlap forces you into NAT or renumbering, and both are miserable.
  • Allocate in summarizable blocks. Carve a large supernet and hand out aggregatable ranges per cloud (e.g. a /12 for AWS, a separate /12 for Azure) so route tables and BGP advertisements stay compact. Govern it from real IPAM tooling, not a spreadsheet.
  • Watch the Kubernetes IP appetite. EKS with the VPC CNI gives every pod a routable VPC IP and can consume tens of thousands of addresses; AKS forces an early CNI choice (Azure CNI puts pods on the VNet, so they are routable cross-cloud, while overlay or kubenet hides pod IPs behind the node addresses, so they are not).
  • Make pod CIDRs routable and non-overlapping if pods talk across clouds. Direct pod-to-pod reachability is only possible when both sides use routable, conflict-free pod ranges.
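The overlap and aggregation rules above are easy to check mechanically before any range is handed out. The following is a minimal sketch using Python's standard `ipaddress` module; the plan names and every CIDR in it are illustrative assumptions, not Kilimo's actual allocation.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: every range here is an illustrative assumption.
PLAN = {
    "aws-supernet": "10.0.0.0/12",
    "azure-supernet": "10.16.0.0/12",
    "on-prem": "192.168.0.0/16",
}


def find_overlaps(plan):
    """Return sorted (name, name) pairs whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )


def carve(supernet, new_prefix, count):
    """Hand out the first `count` subnets of length `new_prefix` from a supernet."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    return [str(next(subnets)) for _ in range(count)]


if __name__ == "__main__":
    print(find_overlaps(PLAN))                  # an empty list means no conflicts
    print(carve(PLAN["aws-supernet"], 16, 3))   # aggregatable /16s inside the /12
```

Adding the pod and service ranges of each cluster to the same plan keeps the Kubernetes address space under the same check as the VPC and VNet ranges.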

2. Connectivity fabric and appliances

Two layered decisions: how the two clouds physically connect, and what sits in the path to route and inspect traffic.

  • How the clouds connect. IPsec VPN over the internet is cheap and fast to stand up but has per-tunnel ceilings and public-internet variability; dedicated Direct Connect + ExpressRoute joined at a cloud exchange (Megaport, Equinix) buys predictable latency and throughput at higher cost and lead time.
  • What sits in the path. Cloud-native fabric (Transit Gateway + Virtual WAN) auto-scales with minimal ops but is L3-only; managed firewalls (AWS Network Firewall, Azure Firewall) add inspection; third-party NVAs (Palo Alto, Fortinet) give consistent L7 inspection and one policy model across both clouds, at the cost of licensing and owning HA, scaling, and patching yourself.
  • Route dynamically and symmetrically. Use BGP for failover rather than static routes, and design symmetric paths — asymmetric routing through a stateful appliance is the classic silent hybrid outage.

For Kilimo we stayed fully cloud-native: an IPsec VPN between AWS Transit Gateway and Azure Virtual WAN, with no third-party appliances. Their traffic volumes and inspection requirements didn't justify NVAs or a dedicated circuit, and keeping the interconnect on managed services meant no HA, patching, or licensing burden to carry; the cloud providers own that.

3. Hybrid DNS for private zones

A workload in EKS must resolve an Azure private name (and vice versa) without leaking the query to public DNS.

  • Conditional forwarding in both directions. Use Route 53 private hosted zones with Resolver inbound/outbound endpoints on the AWS side and Azure Private DNS with the DNS Private Resolver on the other, each forwarding the peer cloud's subdomain to the peer's inbound endpoint.
  • Give each cloud a distinct subdomain. Separate namespaces keep forwarding rules clean and avoid collisions.
  • Make resolver endpoints HA across AZs. A single resolver endpoint is a single point of failure for all cross-cloud name resolution.
  • Don't forget the cluster layer. CoreDNS in each cluster needs forward rules so workloads inside EKS and AKS actually reach the right resolver.
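Conditional forwarding boils down to one rule: send a query to the resolver that owns the most specific matching zone, and fall through to the default resolver otherwise. The sketch below models that selection logic only; the zone names and resolver IPs are hypothetical, and the real behaviour lives in the Route 53 Resolver rules, the Azure DNS forwarding ruleset, and the CoreDNS configuration.

```python
# Hypothetical forwarding rules: zone suffix -> the peer cloud's inbound
# resolver endpoint IPs (two per zone, standing in for one endpoint per AZ).
FORWARD_RULES = {
    "azure.example.internal": ["10.16.0.4", "10.16.1.4"],
    "aws.example.internal": ["10.0.0.10", "10.0.1.10"],
}


def pick_forwarders(qname, rules):
    """Return resolver IPs for the most specific zone matching qname, else None."""
    labels = qname.rstrip(".").lower().split(".")
    # Walk from the full name towards the TLD so the longest suffix wins,
    # and match on whole labels only ("notazure..." must not hit "azure...").
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in rules:
            return rules[zone]
    return None  # no rule matched: use the cloud's default resolver


if __name__ == "__main__":
    print(pick_forwarders("orders.azure.example.internal.", FORWARD_RULES))
    print(pick_forwarders("example.com", FORWARD_RULES))
```

Giving each cloud its own subdomain, as recommended above, is what keeps this rule table to one entry per cloud.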

4. Private PKI with a shared trust root

In a cross-cloud microservices platform you'll often want mTLS or internal TLS — which means one coherent trust hierarchy, not a pile of self-signed certs.

  • One root, per-cloud intermediates. Distribute the same root bundle to both clouds so an EKS workload trusts a certificate issued in AKS.
  • Prefer a cloud-neutral issuer. HashiCorp Vault gives one consistent issuance and rotation model rather than stitching together ACM PCA and an Azure CA.
  • Automate distribution and rotation. Point cert-manager at the shared issuer in each cluster so certificates are short-lived and self-renewing.

For Kilimo this didn't apply — their cross-cloud requirements didn't call for service-to-service mTLS, so we deliberately kept a private PKI out of scope rather than build something unused. We include it here because most cross-cloud platforms reach for it eventually, and it's far cheaper to plan the trust hierarchy early than to retrofit it.

5. Cluster-to-cluster communication

This is the question that ties the whole design together: how does a service in EKS actually reach a service in AKS? There are three broad answers, in ascending order of capability and operational weight.

  • Direct pod-to-pod (routable pods) — the lowest-latency path and the most coupled. It demands routable, non-overlapping pod CIDRs on both sides and ties each cluster to the other's internal addressing, so a change on one side can ripple across the interconnect.
  • Internal load balancers — each cluster fronts its cross-cloud services with an internal LB (an internal NLB in AWS, an internal Standard Load Balancer in Azure) reachable over the private interconnect. The calling cluster targets a stable LB endpoint rather than individual pods, so the two clusters stay decoupled from each other's pod-level details.
  • Multi-cluster service mesh — Istio, Cilium Cluster Mesh, or Linkerd. The richest option (mTLS, traffic shifting, locality-aware routing) and the heaviest to run.

For Kilimo we chose internal load balancers. Only a handful of services needed to talk across clouds, and the internal-LB model gave us the cleanest balance of simplicity and control: the clusters stay decoupled, we don't need to make pod CIDRs mutually routable, and the LB endpoints pair naturally with the conditional DNS forwarding from point 3 — a service simply resolves the peer's private name and gets a stable address on the other cloud. The trade-offs are an extra network hop and owning LB health and endpoint configuration ourselves, both of which were easy prices to pay versus standing up and operating a cross-cloud mesh for a small set of flows.
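As a rough illustration of the internal-LB pattern, the sketch below builds a Kubernetes Service manifest of type LoadBalancer for either cloud and prints it as JSON. It is not Kilimo's configuration: the service name, selector label, and port are hypothetical, the AWS annotations assume the AWS Load Balancer Controller is installed in the cluster, and annotation names should be checked against the controller and cloud-provider versions actually in use.

```python
import json

# Annotations that request an internal (private) load balancer.
# AWS entries assume the AWS Load Balancer Controller is installed.
INTERNAL_LB_ANNOTATIONS = {
    "aws": {
        "service.beta.kubernetes.io/aws-load-balancer-type": "external",
        "service.beta.kubernetes.io/aws-load-balancer-scheme": "internal",
    },
    "azure": {
        "service.beta.kubernetes.io/azure-load-balancer-internal": "true",
    },
}


def internal_lb_service(name, cloud, port=443):
    """Build a Service manifest fronted by an internal load balancer."""
    if cloud not in INTERNAL_LB_ANNOTATIONS:
        raise ValueError(f"unsupported cloud: {cloud}")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": dict(INTERNAL_LB_ANNOTATIONS[cloud]),
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": name},  # hypothetical label convention
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # "orders" is a made-up service name used only for illustration.
    print(json.dumps(internal_lb_service("orders", "azure"), indent=2))
```

The calling cluster then only needs the private DNS name that points at the resulting load balancer address, which is what keeps pod-level details out of the cross-cloud contract.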

What we built for Kilimo

The topology: a managed hub in each cloud, an IPsec interconnect between Transit Gateway and Virtual WAN carrying BGP, conditional DNS forwarding between the two resolvers, and internal load balancers as the cross-cloud service entry points. No third-party appliances, no private PKI.

A cross-cloud call follows the dotted and solid lines together: a pod in EKS resolves the Azure service name (conditional forwarding hands it the internal LB address), the traffic crosses the IPsec link via Transit Gateway and Virtual WAN, and lands on the Azure internal load balancer fronting AKS — and symmetrically in reverse.

What Kilimo learned

Kilimo now orchestrates water management intelligence across AWS and Azure without platform fragmentation. Latency is predictable, failure modes are understood, and the cost per byte of data moved across the interconnect is known and acceptable. The Azure credits, which arrived as an obligation, became a strategic asset.

The lesson is broader: hybrid doesn't mean compromise. It means deliberate choices made early. It means knowing your IP space before you need it, routing traffic symmetrically before you have an outage, and thinking about mTLS before you're in a compliance audit.

If you're building climate tech, fintech, or any infrastructure-heavy startup and you're facing the same decision — credits in one cloud, architecture in another, a mission that won't wait — let's talk about how to scale without splitting.

IP ALLOCATION

Plan your address space before you connect: no overlaps, aggregatable CIDRs, and know Kubernetes' IP appetite. The cheapest mistake to avoid, the most expensive to fix later. Kilimo learned this the hard way.

CONNECTIVITY FABRIC AND APPLIANCES

IPsec, dedicated circuits, or fully managed cloud fabric. Decide what sits in the path before traffic arrives. For Kilimo: cloud-native. Transit Gateway and Virtual WAN, no third-party appliances, no licensing burden.

HYBRID DNS FOR PRIVATE ZONES

A workload in EKS resolves Azure service names without leaking queries to public DNS. Conditional forwarding in both directions, resolver endpoints in HA across AZs, CoreDNS rules in every cluster.

PRIVATE PKI WITH A SHARED TRUST ROOT

One certificate hierarchy that crosses clouds: a shared root, per-cloud intermediates. Kilimo didn't need mTLS, so we left this out. But it's cheaper to plan early than retrofit later.

CLUSTER-TO-CLUSTER COMMUNICATION

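How a service in EKS reaches a service in AKS: direct pod-to-pod routing, internal load balancers, or a multi-cluster service mesh. For Kilimo: internal load balancers. The clusters stay decoupled, pod CIDRs don't need to be mutually routable, and each service gets a stable private endpoint.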

The five decisions that made Kilimo's hybrid network possible

INFRASTRUCTURE MAPPING

We audited Kilimo's existing AWS architecture, mapped the workloads that would cross clouds, and calculated how much IP space each cloud needed. The goal was simple: know exactly what you're building before you build it. Most teams skip this. We don't.

CONNECTIVITY DESIGN

We modeled three scenarios: IPsec VPN over the internet, dedicated circuits via Equinix, and cloud-native managed fabric. For Kilimo's traffic volume and inspection needs, IPsec between Transit Gateway and Virtual WAN was the answer. No third-party appliances, no licensing overhead.

DNS AND NAME RESOLUTION

We built conditional forwarding so an EKS pod can resolve an Azure service name and vice versa, without leaking queries to public DNS. Route 53 private hosted zones on one side, Azure Private DNS on the other, with resolver endpoints in HA across availability zones.

LOAD BALANCER ORCHESTRATION

We fronted the cross-cloud services with internal load balancers — an NLB in AWS, a Standard LB in Azure. The clusters stay decoupled from each other's pod-level details. A service resolves the peer's private name and gets a stable endpoint. Simple, predictable, operationally sound.

TESTING AND HANDOFF

We validated latency, failover, and symmetric routing before handing over to Kilimo's ops team. Documented the BGP topology, the DNS rules, and the cost per byte crossing the interconnect. The platform was live, the team understood it, and it stayed live.
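One simple way to put numbers on cross-cloud latency is to time TCP handshakes against an internal load balancer endpoint from a pod on the other side. The sketch below is an assumption about how such a probe could look, not the tooling used for Kilimo; the hostname in the usage comment is hypothetical.

```python
import socket
import statistics
import time


def connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Time `samples` TCP handshakes to host:port; return latencies in ms."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the TCP handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def summarize(latencies):
    """Reduce a list of latencies to min, median, and max."""
    return {
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }


# Example usage from a pod in one cloud (hypothetical peer name):
#   print(summarize(connect_latency_ms("orders.azure.example.internal", 443)))
```

Running the same probe while a tunnel is deliberately taken down gives a rough view of how long BGP failover takes, since connection attempts raise an error until the alternate path converges.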
