Cloud computing has well and truly made its mark on the business world, with most business leaders now accepting it as a part of the modern workplace. In fact, people are becoming so used to the cloud as a way of life that, when it comes to the resilience of their cloud environment, they risk becoming complacent.
Increasingly, business owners believe that once data is in the cloud, it’s safe. This is a myth perhaps created by the vast array of marketing messages pushed out by cloud providers. And while it may well hold true for some of the enterprise public cloud providers, it’s important to give serious consideration to resilience when exploring other cloud avenues.
Cloud solutions are capable of protecting against and dealing with a potential catastrophe, but it’s important to know exactly what plans are in place and how the measures taken will cope. Cloud security, data management and cloud availability must be taken into account as part of a modern cloud strategy.
No system is free of failures. No matter how good the process, or the measures taken to address the risks, the worst can still happen, which is why resilience is so important in a cloud hosting environment.
Resiliency is the ability to handle failures gracefully, limiting damage and ensuring business continuity. And given that the cloud is increasingly being used as a disaster recovery tool, and not just for “anytime/anywhere” access and data storage, the need for a full understanding of your cloud resilience is even greater.
Cloud hosting resiliency is actually a huge challenge. Components compete for resources and depend on other internal or external components or services that may fail, so planning how those failures will be detected, logged, fixed and recovered from forms an integral part of a cloud strategy.
Although it largely depends on the size and scope of the production workloads to be protected, there are three basic techniques that should be used to increase the resiliency of a cloud environment.
Checking and monitoring
Continuous review of the system to detect failures and make sure minimum specifications are met.
Checkpoint and restart
Process of restoration to the latest correct checkpoint and system recovery following a failure.
Replication
A system is replicated from a primary to a secondary location, ideally geographically separate, using additional resources to ensure it’s available at any time.
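To make checkpoint and restart more concrete, here is a minimal sketch in Python. It is an illustration only, not a production design: the file-based checkpoint, the function names and the `fail_at` parameter (used to simulate a crash) are all assumptions made for this example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename is atomic on the same filesystem, so a crash mid-write
    cannot corrupt the last good checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh starting state if none exists."""
    if not os.path.exists(path):
        return {"next_item": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def process(items, path, fail_at=None):
    """Sum items, checkpointing after each one.

    fail_at is a hypothetical hook that simulates a failure at that index.
    """
    state = load_checkpoint(path)
    for index in range(state["next_item"], len(items)):
        if index == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += items[index]
        state["next_item"] = index + 1
        save_checkpoint(path, state)
    return state["total"]
```

If `process([1, 2, 3, 4], path, fail_at=2)` is interrupted, calling `process([1, 2, 3, 4], path)` again resumes from the third item rather than starting over, because the restart reads the latest correct checkpoint first.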
As ever, it’s a balance between risk and cost, and finding an acceptable level of resilience to achieve the best results possible within a budget. Multiple machines used as replicas in the same cluster, or cluster replication in the same data centre, would provide some resilience, but the data centre remains your single point of failure. A more reliable scenario is the replication of systems in different data centres. This way, a system can stay available even when an entire site suffers a large outage.
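The difference between the two scenarios can be sketched with some simple availability arithmetic. The figures below are made up purely for illustration, and the calculation assumes that failures are independent, which real outages often are not.

```python
def series(*availabilities):
    """Availability of a chain: it is up only if every component is up."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def parallel(*availabilities):
    """Availability of redundant components: up if at least one is up.

    Assumes the components fail independently of each other.
    """
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1.0 - availability
    return 1.0 - all_down


# Hypothetical figures, chosen only to illustrate the comparison.
MACHINE = 0.99
DATA_CENTRE = 0.999

# Two replicas inside one data centre: the site is in series with the pair.
same_site = series(DATA_CENTRE, parallel(MACHINE, MACHINE))

# One replica in each of two data centres: two independent site+machine chains.
two_sites = parallel(
    series(DATA_CENTRE, MACHINE),
    series(DATA_CENTRE, MACHINE),
)

if __name__ == "__main__":
    print(f"same data centre:       {same_site:.4%}")
    print(f"different data centres: {two_sites:.4%}")
```

Under these assumed numbers, the same-site design can never be more available than the data centre itself, while the two-site design comes out higher because no single component sits in every path.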
Active-active cloud is one of the most robust ways to replicate. Continuous, real-time replication to a secondary cloud platform through active-active technology will provide a seamless continuation of service in the event of a failure. Remember though, in this scenario mistakes and deletions are also replicated to the secondary platform, so a backup is still important here. Don’t assume that a replicated cloud platform replaces your backup: it doesn’t.
So what about the data centres themselves? Well, data centre tiers are a good way of helping you decide whether a particular provider offers a suitable level of resilience for your needs, with Tier 1 being the lowest and Tier 4 offering the greatest redundancy. It’s all about assessing the business need. While a Tier 1 data centre might leave you open to some risk, a Tier 4, for most businesses, can be an over-investment.
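The point about replicated deletions can be shown with a toy model. This is a simplified sketch, not how any particular replication product works: the `ReplicatedStore` class and its method names are invented for this example, and two in-memory dictionaries stand in for the two platforms.

```python
import copy


class ReplicatedStore:
    """Toy active-active pair: every write and delete is applied to both sites."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.backups = []  # point-in-time snapshots, held apart from both sites

    def put(self, key, value):
        self.primary[key] = value
        self.secondary[key] = value

    def delete(self, key):
        # A mistaken delete is replicated just as faithfully as a valid write.
        self.primary.pop(key, None)
        self.secondary.pop(key, None)

    def take_backup(self):
        self.backups.append(copy.deepcopy(self.primary))

    def restore(self, key, snapshot_index=-1):
        """Recover one key from a snapshot and write it back to both sites."""
        value = self.backups[snapshot_index][key]
        self.put(key, value)
        return value


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("invoice-42", "paid")
    store.take_backup()
    store.delete("invoice-42")  # accidental deletion
    print("invoice-42" in store.secondary)  # the replica has lost it too
    print(store.restore("invoice-42"))  # only the backup can bring it back
```

After the accidental delete, neither site holds the record. Only the separate point-in-time snapshot allows it to be recovered.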
Tier 1 = Non-redundant capacity components (single uplink and servers).
Tier 2 = Tier 1 + Redundant capacity components.
Tier 3 = Tier 1 + Tier 2 + Dual-powered equipment and multiple uplinks.
Tier 4 = Tier 1 + Tier 2 + Tier 3 + all components are fully fault-tolerant including uplinks, storage, chillers, HVAC systems, servers etc. Everything is dual-powered.
But a Tier rating isn’t always the whole picture, especially as two or more data centres within the same Tier classification level can vary wildly in terms of actual protection and strength of design.
For example, one Tier 3 data centre may offer dual rack power from a single N+1 UPS bank on a single power distribution feed, whereas another may offer ‘real’ dual power to the rack with each feed delivered from diverse UPS banks, for far better levels of resilience and protection. So Tier ratings aside, you need to be asking the right questions...
Does the data centre power itself through separate utility feeds? Is there a redundant power supply, generator or UPS that will keep your data centre running through an outage? How long will it run for? How is the data centre cooled? What fire suppression system is in place? How is the facility managed on a day-to-day basis, and what are its maintenance practices and policies? Despite the advent of AI and bots, good data centres don’t run themselves, so ask the right questions, look at reputation, talk to existing customers and seek out testimonials or referrals.
And if you’re replicating to a secondary site for failover, does the secondary site offer the same level of resilience?
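One way to reason about the dual-power comparison made earlier is to list the components on each power path to the rack and look for anything that appears on every path. The sketch below does exactly that. The component names are illustrative, and treating an N+1 UPS bank as a single component is a deliberate simplification, since such a bank does have some internal redundancy.

```python
def single_points_of_failure(power_paths):
    """Return the components that appear on every path.

    Losing any one of these would cut all power feeds at once.
    """
    paths = [set(path) for path in power_paths]
    if not paths:
        return set()
    return set.intersection(*paths)


# Illustrative layouts only; the names are assumptions for this example.
shared_ups = [
    ["distribution feed 1", "UPS bank 1", "rack feed A"],
    ["distribution feed 1", "UPS bank 1", "rack feed B"],
]
diverse_ups = [
    ["distribution feed 1", "UPS bank 1", "rack feed A"],
    ["distribution feed 2", "UPS bank 2", "rack feed B"],
]

if __name__ == "__main__":
    print(sorted(single_points_of_failure(shared_ups)))
    print(sorted(single_points_of_failure(diverse_ups)))
```

In the first layout both rack feeds share a distribution feed and a UPS bank, so both are flagged. In the second, no component is common to the two paths.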
Think also about your workloads. While it makes sense to plan for a worst-case scenario, you could increase cost efficiencies by moving less critical workloads to a lower-redundancy scenario, whilst giving high priority to business-critical data, assets and apps through an active-active cloud replication infrastructure.
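As a rough sketch of that split, the snippet below sorts workloads into two placements according to whether they are business-critical. The `Workload` type, the placement labels and the example workload names are all hypothetical. A real assessment would weigh far more than a single yes/no flag.

```python
from dataclasses import dataclass

# Hypothetical placement labels for this example.
ACTIVE_ACTIVE = "active-active replication"
LOWER_REDUNDANCY = "lower-redundancy hosting"


@dataclass(frozen=True)
class Workload:
    name: str
    business_critical: bool


def plan_placement(workloads):
    """Group workload names by the level of redundancy they justify."""
    plan = {ACTIVE_ACTIVE: [], LOWER_REDUNDANCY: []}
    for workload in workloads:
        placement = ACTIVE_ACTIVE if workload.business_critical else LOWER_REDUNDANCY
        plan[placement].append(workload.name)
    return plan


if __name__ == "__main__":
    estate = [
        Workload("order processing", business_critical=True),
        Workload("internal wiki", business_critical=False),
        Workload("customer database", business_critical=True),
    ]
    for placement, names in plan_placement(estate).items():
        print(f"{placement}: {', '.join(names)}")
```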
Finally, consider connectivity. If you choose a data centre with an array of network providers and connectivity options on site, you’ll benefit from lower latency and higher performance services across a wider geographical area. What’s more, you’ll also benefit from greater resilience as a result of the diversity of connections to and from the facility.
While cloud computing can undoubtedly offer unprecedented levels of resilience, it isn’t simply the case that if you’re in the cloud, you’re in safe hands. There is more than one cloud, and it’s crucial that businesses seek out a cloud provider that offers the right levels of resilience and continuity for their particular needs, weighing up all the options and asking the right questions.