Veeam and EMC Data Domain Integration
For several years now, Veeam has been producing a wonderful product to protect VMware and Hyper-V virtualized environments. Already one of the easiest backup products to set up and manage, Veeam Backup and Replication (B&R) has gotten better with each version. Today, with version 8 of the Availability Suite or the B&R product, EMC Data Domain’s DD Boost capability is fully supported. For those of you who aren’t familiar, EMC Data Domain is an award-winning, purpose-built backup appliance that features in-line deduplication, data invulnerability, and a fully implemented Open Storage Technology (OST) interface called DD Boost. We’ll get into the details of Data Domain in a moment, but suffice it to say that Veeam’s support for writing directly to Data Domain is a really good thing.
In the summer of 2009, EMC announced its agreement to fully acquire Data Domain. This acquisition brought some really excellent technology in the backup target space to EMC’s Data Protection Solutions portfolio. The idea behind Data Domain’s technology is to provide a disk-based target for backup software, eliminating tape and increasing backup performance, and doing so reliably and efficiently. The key benefit of Data Domain is its in-line deduplication which, depending on data types, can achieve compression factors of 10-30x. The significance of this point alone is that in a relatively small form factor (as small as 2 standard rack units), usable storage on Data Domain is multiplied many times over into a massive logical capacity.
Further, Data Domain doesn’t need fast disk to achieve excellent backup performance, since the deduplication is done in-line, completely in memory and on CPU. Because it uses large-capacity SATA disks, the cost of storage is kept very low, while the ever-growing capacity of such drives and ever-increasing CPU speeds make Data Domain a very scalable solution. Other disk-based storage solutions struggle to compete: the quantity of hardware and the consumption of precious data center resources just don’t make sense when compared to the much more efficient Data Domain.
Capacity, however, is only part of the problem. Tapes can have great capacity, and legacy storage systems can look appealing because they are already on hand. The problem with tapes is two-fold: stream count is limited (typically one stream per drive in non-multiplexed operations), and the data on the tape is only known to be good when a recovery succeeds. If the recovery fails, of course, it’s too late.
Legacy storage systems may also offer good performance and capacity. However, as we know, old hard drives are smaller, and old systems use space, power and cooling in the least efficient manner. Further, if the legacy system is end of support, then we’re talking about 3rd party support options, additional maintenance load on the team and risk to the data. Lastly, there’s no way to be any more sure that the data on the legacy disk filesystem is readable than if it were on tape. If you can’t read the data you backed up when you need to recover, what value is that? Suddenly tape and legacy disk solutions are no longer data protection assets, but are actually liabilities to the business.
Data Domain addresses all of this by using space more efficiently with its in-line deduplication capabilities. Two standard rack units could easily represent 200 TB or more of logical disk space. Add to that the fact that the entry-level Data Domain supports up to 60 simultaneous backup/recovery streams. It would take 60 tape drives to accomplish this!
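To make the capacity math concrete, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted above (10-30x reduction, 200 TB logical); the function names are my own, and real-world ratios depend heavily on data type and retention, so treat the output as illustrative rather than as a sizing tool.

```python
def logical_capacity_tb(usable_tb: float, dedup_ratio: float) -> float:
    """Logical (pre-deduplication) data that fits in a given usable capacity."""
    return usable_tb * dedup_ratio


def usable_needed_tb(logical_tb: float, dedup_ratio: float) -> float:
    """Physical usable capacity needed to hold a given amount of logical data."""
    return logical_tb / dedup_ratio


# How much physical disk would 200 TB of logical backups need
# across the 10-30x range quoted in this post?
for ratio in (10, 20, 30):
    needed = usable_needed_tb(200, ratio)
    print(f"{ratio}x reduction: about {needed:.1f} TB usable for 200 TB logical")
```

The takeaway is simply that the better the data deduplicates, the less physical disk the same logical capacity requires.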
The next key feature, and the one that makes the integration with Veeam both possible and so interesting, is Data Domain DD Boost. DD Boost is the EMC implementation of the OST library functionality on Data Domain. The intention of the OST library is to give backup applications awareness of the intelligence built into the disk-based solutions they may be backing up to. In the case of Data Domain, DD Boost enables several interesting things within the backup environment, such as:
1. Distributed Segment Processing (DSP) – Gives the backup software the power to preprocess some of the data for deduplication and ship it to Data Domain already compressed, thereby limiting the impact on the network and increasing the speed of the backup. This is what enables certain backup products to write to the Data Domain DD Boost target directly over a WAN (yes, backup over the WAN).
2. Simplified Disaster Recovery – Data Domain supports replication natively; however, DD Boost works with Data Domain replication to inform the backup software of duplicated copies. This keeps the indexes up to date and allows for duplicated backup images without having to read all the data back through the backup server.
3. Enterprise Application Backup Control – DD Boost integrates directly with applications such as Oracle, SAP and MS SQL Server. DBAs can now leverage the power and efficiency of Data Domain while retaining 100% of the control over their database backup strategies. The business benefits from having data protected on a shared resource, thereby reducing infrastructure costs.
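The idea behind DSP in item 1 can be sketched in a few lines of Python. To be clear, this is a toy model and not the DD Boost protocol or API: the `DedupTarget` class, the fixed 8-byte segments and the SHA-256 fingerprints are all assumptions made for illustration (a real appliance uses its own segmenting and fingerprinting, and also compresses what it sends). What it shows is the general technique: the sender fingerprints its data, asks the target which segments it is missing, and ships only those.

```python
import hashlib

SEGMENT_SIZE = 8  # bytes; tiny on purpose so the example is easy to follow


class DedupTarget:
    """Hypothetical stand-in for the appliance: stores each unique segment once."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Return the fingerprints the target does not have yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, fingerprint, data):
        self.segments[fingerprint] = data


def split_segments(data: bytes):
    """Cut the data into fixed-size segments (an assumption of this sketch)."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


def backup(data: bytes, target: DedupTarget):
    """Send only segments the target lacks; return (recipe, bytes_sent)."""
    segments = split_segments(data)
    recipe = [fingerprint(s) for s in segments]  # ordered list describing the backup
    needed = target.missing(recipe)
    sent = 0
    for fp, seg in zip(recipe, segments):
        if fp in needed:
            target.store(fp, seg)
            needed.discard(fp)  # a segment repeated within one backup is sent once
            sent += len(seg)
    return recipe, sent


target = DedupTarget()
first = b"AAAAAAAABBBBBBBBCCCCCCCC"
second = b"AAAAAAAABBBBBBBBDDDDDDDD"  # only the last segment differs
_, sent_first = backup(first, target)
_, sent_second = backup(second, target)
print(sent_first, sent_second)  # the second backup sends far fewer bytes
```

In this toy run the second backup only has to transmit the one segment that changed, which is the effect that makes backing up over a constrained link practical.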
Veeam Availability Suite and B&R v8 can leverage DSP, an advanced source-side deduplication that avoids sending data already stored on Data Domain storage. They can also create new full backup files without physically moving data into the file, instead synthesizing them from the existing data. As you can see below, tests from Veeam Software show off the impressive capabilities of Veeam with DD Boost. The VM on the left had its synthetic full produced with EMC Data Domain Boost (11 minutes and 27 seconds) and the one on the right had it produced without (2 hours and 54 minutes).
In the example above, Veeam shows that the deduplication ratio was very low, which is due to the 100% change rate baked into the test. Even so, with the synthetic full backup capability, Data Domain is able to identify data already on disk and reuse those segments, constructing the new full as a mapping to existing backup data rather than writing new blocks. The graphic below illustrates how the connection between Veeam and Data Domain functions and where the deduplication operations take place.
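The segment reuse behind a synthetic full can also be sketched in code. This is a simplified model under my own assumptions, not how Veeam or Data Domain lay out backup files: a backup is represented as an ordered list of segment fingerprints, incrementals as `{block_index: fingerprint}` dictionaries, and the block count is assumed not to change. The names `synthesize_full`, `stored`, `full` and `incrementals` are all hypothetical.

```python
def synthesize_full(previous_full, incrementals):
    """Build a new full backup as pointers to segments that are already stored.

    previous_full: list of fingerprints, one per block position.
    incrementals:  list of {block_index: fingerprint} dicts, oldest first.
    No segment data is read or copied; only the mapping changes.
    """
    new_full = list(previous_full)
    for inc in incrementals:
        for index, fp in inc.items():
            new_full[index] = fp  # point this block at the newer segment
    return new_full


# Fingerprints of segments already on the target from earlier backups.
stored = {"a1", "b1", "c1", "b2", "c3"}

full = ["a1", "b1", "c1"]              # last real full backup
incrementals = [{1: "b2"}, {2: "c3"}]  # two later incrementals

new_full = synthesize_full(full, incrementals)
print(new_full)

# Every block of the synthetic full refers to a segment that already exists,
# so nothing new had to be written to build it.
assert set(new_full) <= stored
```

Because the new full is only a fresh set of pointers, the work is bookkeeping rather than data movement, which fits the large time difference Veeam reported in its test.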
In keeping with Veeam’s easy-to-use approach to backup, the configuration of Data Domain within the backup server is very simple. Simply point the server at the Data Domain FQDN, enter the DD Boost user credentials and browse to the DD Boost appliance target (logical entity) on the Data Domain. At that point you’re ready to configure your Veeam jobs to use Data Domain as a repository for Veeam backups!
There you go: a ready-made backup solution for your VMware or Hyper-V environment that is completely integrated with a best-of-breed backup storage appliance, EMC Data Domain. If you’re looking to bring enterprise-grade functionality into your Veeam installation, or you need a better way to store your backups, this is one very good way to go.
Please feel free to contact me directly with any questions or concerns about the information presented in this document.