Our infrastructure is 3x redundant, self-checking, self-healing and distributed over several availability zones to ensure high availability.
We use a Kubernetes cluster, that:
- Stretches over two different Google Cloud availability zones
- Runs every production deployment on at least one instance per zone
- Features a synchronized master in each zone.
Our cluster has automatic health checks on both hard- and software. This ensures the appropriate distribution of load across all healthy machines and automatic discovery, isolation, and replacement of faulty components resulting in very high reliability.
Our database servers have a standby copy in a second availability zone with automatic handover in the case of catastrophic failure and a separate backup mechanism allowing us to go back to any point in time within the last 7 days.