As we migrate applications and/or data to public cloud storage, a big question needs to be asked: how is my data protected?
Cloud providers are singularly focused on making the infrastructure that holds your data highly available via extremely resilient design. They do not give the same priority to your data, however.
A glance at any of the cloud providers’ terms will reveal a “shared responsibility” statement. This outlines the responsibilities you and the provider have when it comes to their services. Usually, they are responsible for the infrastructure, but the data is your responsibility.
What does this mean for the protection of cloud-hosted data?
We need to assess the data and ask these questions: How important is it? What is the impact of loss? How quickly do we need to recover it? Knowing the answers to these questions allows us to start to identify the right strategy.
So, what options can we expect from the major public clouds?
Cloud providers are no different from the established enterprise storage suppliers in the methods they employ.
Core to cloud data protection capabilities are snapshots and backups which, although different technology approaches, both deliver the same result – a point-in-time data copy that can be restored in the event of something impacting our production dataset.
To choose the correct mix of technologies to protect our data, we need to understand how these technologies differ and the restoration options they provide.
Snapshots are about speed. They allow an almost instantaneous copy of a dataset to be taken in a live environment. This copy can then be made available for recovery, but also to other systems – often as a clone copy – for testing and development, for example.
Snapshots are standard in most enterprise storage systems, and are often seamlessly integrated with technology such as the Windows Volume Shadow Copy Service (VSS) to ensure we get rapid and consistent data copies.
But it is not quite this straightforward when it comes to cloud-based snapshots.
Firstly, cloud snapshots are not as simple as those we may be used to on-premise.
They often require some coding to automate their creation. We must also be aware that applications may be writing to disk when we create a snapshot. Whereas in the enterprise there will often be an integration between the host operating system (OS) and storage to ensure consistency, this is not the case in the cloud. That puts the onus on us to manage our data consistency.
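As a rough illustration of what managing consistency ourselves can involve, the sketch below pauses writes, takes a snapshot and then resumes writes, even if the snapshot fails. The `freeze`, `take_snapshot` and `thaw` callables are hypothetical placeholders supplied by the caller, not any provider's API. In practice they might wrap a filesystem freeze command and a call to the provider's SDK.

```python
from typing import Callable


def consistent_snapshot(
    freeze: Callable[[], None],
    take_snapshot: Callable[[], str],
    thaw: Callable[[], None],
) -> str:
    """Pause writes, take a snapshot, then resume writes.

    All three callables are hypothetical hooks provided by the caller.
    Writes are resumed whether or not the snapshot succeeds.
    """
    freeze()  # e.g. flush buffers and block new writes
    try:
        return take_snapshot()  # returns an identifier for the new snapshot
    finally:
        thaw()  # always resume writes, on success or failure


if __name__ == "__main__":
    events = []
    snapshot_id = consistent_snapshot(
        freeze=lambda: events.append("freeze"),
        take_snapshot=lambda: "snap-example",
        thaw=lambda: events.append("thaw"),
    )
    print(snapshot_id, events)
```

The `try`/`finally` is the important design choice here: an application left frozen after a failed snapshot is usually worse than a missed snapshot.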
We must also be aware of the cost. Snapshots consume cloud capacity, which includes any replications of snapshots to other zones and regions.
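The capacity point can be made concrete with some simple arithmetic. The sketch below assumes every replicated copy stores the same amount of data and is billed at the same rate, which will not hold for every provider or region. The function name and the figures in the example are illustrative only, not real prices.

```python
def monthly_snapshot_cost(
    stored_gb: float,
    price_per_gb_month: float,
    extra_copies: int = 0,
) -> float:
    """Estimate the monthly cost of a snapshot set.

    Assumes each replicated copy (another zone or region) holds the same
    amount of data at the same price. Real pricing may differ per region.
    """
    if stored_gb < 0 or price_per_gb_month < 0 or extra_copies < 0:
        raise ValueError("inputs must not be negative")
    total_copies = 1 + extra_copies  # the original plus its replicas
    return stored_gb * price_per_gb_month * total_copies


if __name__ == "__main__":
    # Illustrative figures only: 500GB of snapshots replicated to two regions.
    print(monthly_snapshot_cost(500, 0.05, extra_copies=2))
```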
Immediacy of recovery is where snapshots excel. The ability to quickly re-present our data to an environment is hugely valuable, especially when faced with tight recovery time objectives. This speed is possible because, as with most snapshot technology, the copies exist and can be used only inside the environment in which they are created. For example, an Elastic Block Store (EBS) snapshot in Amazon Web Services (AWS) can only be recovered to an EBS environment, not restored to an alternative service.
This inability to hold snapshots in alternative environments also leads to the much-debated question about whether snapshots are truly interchangeable with backups.
That is because snapshots alone cannot satisfy the 3-2-1 rule, which says best practice is to keep at least three copies of your data, on at least two different types of storage media, with at least one copy located off-site.
Given the resilience of the major cloud providers, this may not be a practical concern, but it should be considered.
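The 3-2-1 rule described above can be expressed as a small check. This is a minimal sketch: the `DataCopy` type and its fields are invented for illustration, and how you classify "media" and "off-site" for cloud services is a judgement call that the code does not make for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset (hypothetical model for illustration)."""

    name: str
    media: str      # e.g. "block", "object", "tape"
    offsite: bool   # held outside the production environment?


def meets_3_2_1(copies: list[DataCopy]) -> bool:
    """Return True if the copies satisfy the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    # Production volume plus two snapshots held in the same environment.
    snapshots_only = [
        DataCopy("production", "block", offsite=False),
        DataCopy("snapshot-1", "block", offsite=False),
        DataCopy("snapshot-2", "block", offsite=False),
    ]
    print(meets_3_2_1(snapshots_only))
```

Modelled this way, a production volume protected only by snapshots in the same environment has three copies but fails on both the media and off-site conditions.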
Backups are an alternative to snapshots and are often the preferred data protection method for cloud providers.
Unlike snapshots, backups are usually an external service that “pulls” data from production and copies it to a backup repository. Backups can often utilise snapshots to provide faster performance and ensure application consistency, but ultimately they must move data from a source to a backup target.
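To illustrate the "move data from a source to a backup target" step, here is a minimal sketch that copies a file into a repository directory and verifies the copy with a checksum. Local directories stand in for production storage and the backup repository, and the function names are invented for the example. A real backup service would add scheduling, cataloguing, encryption and much more.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(source: Path, repository: Path) -> Path:
    """Copy source into the repository and verify the copy by checksum."""
    repository.mkdir(parents=True, exist_ok=True)
    target = repository / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} failed verification")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        production = Path(tmp) / "production.db"
        production.write_bytes(b"example data")
        copy = back_up_file(production, Path(tmp) / "repository")
        print(copy.name, sha256_of(copy) == sha256_of(production))
```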
Each of the major cloud providers offers a native backup service, but their capabilities differ hugely.
AWS provides tools to back up almost all services, while Microsoft can protect Azure virtual machines (VMs) and is currently previewing a service to protect Azure Files. There is no current service for Azure Blob object storage.
Google’s service seems much more limited, with capabilities to back up MySQL databases and some persistent disks.
AWS and Azure also provide a hybrid backup option with which you can protect your cloud and traditional on-premise workloads and consolidate all backup data to the same cloud-based repository.
If we have decided to protect our cloud-based data, we must then decide whether the cloud providers’ backups are up to the job.
To do this, we must fully define our data protection requirements. These can include recovery times, recovery points and retention periods, but also a decision on whether holding backups inside a cloud repository alongside production data is acceptable.
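One way to make these requirements explicit is to write them down as data and compare them with what a backup service offers. The sketch below is only an illustration: the `Requirements` and `Offering` types, their field names and the example values are all assumptions, and it treats the backup interval as the worst-case data loss window.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Requirements:
    rpo: timedelta            # recovery point: maximum tolerable data loss
    rto: timedelta            # recovery time: maximum tolerable restore time
    retention: timedelta      # how long backups must be kept
    separate_location: bool   # must backups sit outside the production cloud?


@dataclass(frozen=True)
class Offering:
    backup_interval: timedelta
    restore_time: timedelta
    max_retention: timedelta
    stores_outside_production: bool


def find_gaps(req: Requirements, offer: Offering) -> list[str]:
    """List the requirements that the offering does not meet."""
    gaps = []
    if offer.backup_interval > req.rpo:
        gaps.append("rpo")
    if offer.restore_time > req.rto:
        gaps.append("rto")
    if offer.max_retention < req.retention:
        gaps.append("retention")
    if req.separate_location and not offer.stores_outside_production:
        gaps.append("location")
    return gaps


if __name__ == "__main__":
    required = Requirements(
        rpo=timedelta(hours=4),
        rto=timedelta(hours=1),
        retention=timedelta(days=90),
        separate_location=True,
    )
    native_tool = Offering(  # illustrative values, not a real service
        backup_interval=timedelta(hours=24),
        restore_time=timedelta(minutes=30),
        max_retention=timedelta(days=365),
        stores_outside_production=False,
    )
    print(find_gaps(required, native_tool))
```

An empty list means the offering meets every stated requirement. Anything else names the areas that need a different tool or a revised requirement.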
The native backup tools are useful, but the main service providers also have a network of partners that integrate with their cloud offerings. These often extend traditional enterprise solutions to protect cloud-based data and services.
The use of third-party data protection tools is driven by the wider demands of your infrastructure.
If you run a hybrid datacentre and have services in multiple clouds, third-party options may offer a single platform that will protect workloads regardless of location and provide portability and the ability to restore from any location.
Although many native cloud tools are functional, they can lack some of the finesse of third-party – cloud-to-cloud – backup tools. Also, if you have already invested in an on-premise backup system, then the ability to extend this to the cloud can be attractive.
Perhaps the most appealing driver for third-party backups is the desire to keep backup data outside the cloud that holds production data. Backing up to an alternative cloud or your own datacentre may not only be desirable, but a non-negotiable requirement.
Selecting the right strategy and technology to protect your data in the cloud begins with understanding your data protection needs as well as the capabilities of the technology on offer. Without taking the time to do this, the risk of an ineffective solution is significant.
It is also crucial to remember the shared responsibility model and realise that the data you have in the cloud is your data and your responsibility to protect.
Understanding these elements is the basis for making the right choices for cloud data protection.