Why disk beat tape in the backup wars | Virtual Reality
Any backup expert worth their salt switched to disk as the primary target for backups many years ago. Tape still reigns in long-term archival, for the reasons laid out here. But tape is quite problematic when it comes to day-to-day operational backup and recovery.
That is why disk has essentially replaced tape for operational backup and recovery. It comes down to two things: random access and the ability to handle multiple simultaneous operations.
Disk is about random access
When comparing disk and tape, the one thing disk will always have over tape is random access. Tape, by contrast, is a serial-access device. At first this would seem to be an advantage for tape, since backup typically generates a stream of data, and a stream should suit a serial-access device very well. The problem is that a typical backup stream is too slow to keep a modern streaming tape drive happy. A random-access device can easily handle a slower stream because it can write the data it has, pause for a moment, and then write the next chunk of data, all without suffering any performance loss. Random access brings recovery advantages as well (see below).
Multiple simultaneous operations
A tape drive writes a single sequential stream at a time. A disk system can service many streams at once: it has multiple read/write heads and, because it is random-access, it can switch between streams whenever it needs to. The result is that disk can handle dozens of simultaneous write operations of various speeds without sacrificing performance on any of them. It’s like those plate-spinning performers who keep dozens of plates spinning even though they are only one person. That’s essentially what a disk system is doing: it receives dozens of simultaneous backups and writes each of them quickly enough that no individual backup notices it is only one of many operations the disk is servicing.
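The plate-spinning idea can be sketched as a toy simulation. This is only an illustration under stated assumptions, not how any real disk or backup product schedules I/O: the function name `service_streams`, the stream names, and the per-tick "rate" are all hypothetical. Each stream delivers data at its own pace, and the simulated disk visits every stream in turn, writing whatever has arrived to that stream's own region.

```python
def service_streams(streams):
    """Service several backup streams of different speeds in turn.

    `streams` maps a name to (data, rate), where rate is the number of
    bytes that stream delivers per tick (a hypothetical stand-in for speed).
    """
    disk = {name: bytearray() for name in streams}   # one region per stream
    offsets = {name: 0 for name in streams}          # bytes received so far
    write_log = []                                   # order in which writes happened

    # Keep ticking until every stream has been written in full.
    while any(offsets[name] < len(data) for name, (data, _) in streams.items()):
        for name, (data, rate) in streams.items():
            start = offsets[name]
            chunk = data[start:start + rate]
            if not chunk:            # this stream is finished; nothing to write
                continue
            disk[name] += chunk      # "seek" to this stream's region and write
            offsets[name] = start + len(chunk)
            write_log.append(name)
    return disk, write_log


# Three hypothetical backups arriving at different speeds.
streams = {
    "db": (b"A" * 12, 6),
    "mail": (b"B" * 12, 3),
    "files": (b"C" * 12, 2),
}
disk, write_log = service_streams(streams)
print(write_log)
```

Every stream ends up intact in its own region even though the writes were interleaved, and the slow streams never held up the fast one. A serial device would have had to take the streams one after another or mix them into a single sequence.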
Disk can do things tape cannot
Deduplication is an interesting technology that is both required by disk and enabled by it. It became necessary for disk-based backups because disk was so much more expensive than tape. At the same time, deduplication is far more practical on disk, because restoring deduplicated data means reading chunks scattered across the backup store, which is exactly what random access is good at. So you can also say that disk enabled deduplication.
But deduplication does far more than reduce the effective cost of disk. Without deduplication, the only way for most customers to get their backups offsite is to hand a box of tapes to a man in a van. By reducing the amount of data that must be written to disk after any particular backup, deduplication enables customers to send their backups across a WAN connection. Some customers do this by backing up to a deduplication-based backup target that then replicates to a similar backup target offsite. Others use source-based deduplication, which can send backups directly to a service provider over the Internet. Either way, the main point is that deduplication enables automated electronic transfer of backups.
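To make the data reduction concrete, here is a minimal sketch of chunk-level deduplication. It assumes fixed-size chunks and SHA-256 fingerprints purely for simplicity; commercial products often use variable-size chunking and their own formats. The names `backup`, `restore`, `CHUNK_SIZE`, and the sample data are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


def backup(data, store):
    """Add `data` to the chunk store.

    Returns (recipe, new_bytes): the ordered list of chunk fingerprints
    needed to rebuild the data, and how many bytes were actually new and
    therefore had to be stored (or sent across the WAN).
    """
    recipe = []
    new_bytes = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:      # only chunks never seen before are stored
            store[digest] = chunk
            new_bytes += len(chunk)
        recipe.append(digest)
    return recipe, new_bytes


def restore(recipe, store):
    """Rebuild a backup by looking up each chunk; this is random access."""
    return b"".join(store[digest] for digest in recipe)


# Hypothetical data: 100 distinct chunks on Monday, one chunk changed by Tuesday.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))
tuesday = monday[:7 * CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + monday[8 * CHUNK_SIZE:]

store = {}
monday_recipe, monday_sent = backup(monday, store)
tuesday_recipe, tuesday_sent = backup(tuesday, store)
print(monday_sent, tuesday_sent)  # prints "409600 4096"
```

In this toy example the second backup is the same size as the first, but only one chunk has to be stored or transferred. That difference is what makes sending backups over a WAN realistic.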
If you can deduplicate your backups and transfer them to a physically separate location without having to handle something like tape, you can also use those backups to automatically create a copy of your environment that can be activated in case of disaster. Without disk, deduplication, and electronic transfer of backups, it simply isn’t possible to support a cloud-based DR system that could take over for your data center.