Western Digital has linked up with StorReduce in a move that will allow it to bundle the startup’s inline data deduplication software into its object storage appliances.
Key use cases targeted by the hardware maker will be backup and data protection. By offering StorReduce’s deduplication software with its storage arrays, Western Digital aims to offer an attractive alternative to the backup appliances sold by Dell EMC, HPE and Quantum.
According to Stefan Vervaet, director for strategic alliances and development in the storage systems division of Western Digital, a backup product that couples StorReduce and ActiveScale will give average savings of 45% compared with traditional backup appliances.
It is easy to see why StorReduce’s technology is attractive – the software can be deployed on a physical server or as a virtual machine (VM) in an enterprise datacentre or in a public cloud.
Each appliance acts as a data deduplication path between servers and the object storage target. The appliances are stateless, which permits deployment in clusters, up to a maximum of 31.
Communication between servers and appliances uses the S3 protocol, which has emerged as a de facto standard for accessing back-end storage, although Microsoft Azure and Google Cloud Platform are also supported.
Data ingested by the appliances is first processed using an inline deduplication algorithm, then compressed to obtain the highest possible rate of data reduction. According to StorReduce, this adds a maximum of 50ms of latency between servers and storage.
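The order of operations described above, deduplicate first and then compress, can be illustrated with a minimal Python sketch. It is not StorReduce's algorithm, which the article does not detail: the fixed 4KB chunk size, the SHA-256 fingerprints, the zlib compression and the `ingest` and `rehydrate` names are all assumptions made for illustration.

```python
import hashlib
import zlib

# Hypothetical fixed chunk size; real products may use variable-size chunks.
CHUNK_SIZE = 4096


def ingest(data: bytes, index: dict, store: dict) -> list:
    """Deduplicate data inline, compress unseen chunks, return the chunk recipe."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:
            # Only chunks not seen before are compressed and written out.
            store[digest] = zlib.compress(chunk)
            index[digest] = len(chunk)
        # Duplicates cost only a reference in the recipe.
        recipe.append(digest)
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its recipe of chunk fingerprints."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)
```

In this sketch the `index` dictionary plays the role of the deduplication index that, in the product, is held on a flash layer, and `store` stands in for the object storage target.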
To function, each appliance needs a fast flash layer to store the deduplication index and metadata. When several appliances are connected in a cluster, this information is distributed across the cluster to protect against node failure.
A complete log of transactions is also sent to object storage as the transactions are written, which allows the index to be reconstructed if an outage affects all nodes. Because reconstructing the index takes time, the index is also snapshotted periodically. This allows rebuilding to start from the most recent snapshot rather than from the beginning of the log.
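The log-plus-snapshot approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not StorReduce's implementation: the `DedupIndex` class is hypothetical, an in-memory list stands in for the transaction log held in object storage, and the replay of log entries written after the snapshot is an assumed detail the article does not spell out.

```python
import copy


class DedupIndex:
    """Toy index with a write-ahead log and periodic snapshots (hypothetical)."""

    def __init__(self):
        self.entries = {}     # fingerprint -> object location
        self.log = []         # stand-in for the transaction log in object storage
        self.snapshot = None  # (log position, copy of entries) or None

    def put(self, digest, location):
        # Record the transaction in the log, then update the index.
        self.log.append((digest, location))
        self.entries[digest] = location

    def take_snapshot(self):
        # Remember how much of the log this snapshot already covers.
        self.snapshot = (len(self.log), copy.deepcopy(self.entries))

    def rebuild(self):
        """Rebuild the index; return how many log entries had to be replayed."""
        position, entries = self.snapshot if self.snapshot else (0, {})
        entries = dict(entries)
        # Assumption: only entries logged after the snapshot are replayed.
        for digest, location in self.log[position:]:
            entries[digest] = location
        self.entries = entries
        return len(self.log) - position
```

Without a snapshot, `rebuild` has to replay the whole log, which is the slow path the periodic snapshots are meant to avoid.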
Within a cluster, ingestion and data access performance increase with the number of nodes. A cluster of StorReduce appliances forms a single deduplicated global namespace and can scale to several hundred PB of data, with each appliance capable of a maximum of 80PB of deduplicated data.
According to StorReduce, each appliance can provide deduplication throughput of around 2GBps (approximately 7.2TB per hour) during ingestion and rehydration of data. That makes 60GBps (216TB per hour) for a cluster. To sustain these processing rates, you will need storage and networking capacity to match.
By comparison, the best-performing hardware in Dell EMC’s Data Domain family can manage a maximum of 50PB of backup data, with claimed ingestion rates of 68TB per hour when deduplication work is carried out at the source with DD Boost, and 31TB per hour without DD Boost.
According to Western Digital, the ActiveScale/StorReduce product is certified with a number of backup software products, including Veritas NetBackup, Backup Exec, Commvault Simpana, Veeam and EMC NetWorker.
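The per-hour figures follow directly from the per-second rates, taking 1TB as 1,000GB. A short Python check, with a hypothetical helper name, makes the arithmetic explicit:

```python
def gbps_to_tb_per_hour(gb_per_second: float) -> float:
    """Convert gigabytes per second to terabytes per hour (1TB = 1,000GB)."""
    return gb_per_second * 3600 / 1000


per_appliance = gbps_to_tb_per_hour(2)   # 2GBps quoted per appliance
per_cluster = gbps_to_tb_per_hour(60)    # 60GBps quoted per cluster

# Number of appliances implied by the two quoted rates.
implied_appliances = 60 / 2
```

The quoted cluster rate corresponds to 30 appliances each running at the quoted per-appliance rate.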
Western Digital points to several of its customers that already use StorReduce with its ActiveScale hardware for nearline data or as back-end storage for data in Hadoop clusters.
Large-scale deduplication on an ActiveScale deployment allows, for example, a considerable reduction in the cost of storage and in the physical space occupied, without a major impact on performance.
Elsewhere, Western Digital has contributed to the Apache Hadoop community several enhancements to the S3A client, which allows a Hadoop cluster to access data stored on an S3-based object storage system. The firm also works with Microsoft to give the same access to a Hadoop cluster in Windows.
More generally, said Vervaet, deduplicated object storage is potentially interesting for customers who have hitherto used tape or disk for big data workloads.
The use of StorReduce on an ActiveScale appliance adds a software cost of around 7 cents per gigabyte, which would decrease at greater volume. Minimum storage capacity on an ActiveScale system is 480TB.