Backup
Some thoughts regarding backups, what kinds exist and what one should consider.
Backups can be written to disks or tapes, and the two media have different requirements. Moreover, backups can be organized in several ways.
Before planning a backup strategy, one should plan the eventual restore. How much of the reinstallation will be done from other media? Are the disk partitioning and the installation of the operating system done before the files are restored? Should the backup system be able to start from scratch with a blank hard disk?
Necessity #
Unfortunately, one often becomes convinced that a backup is necessary only on the day one needs it. Most of the time, a backup is created and never used. This seems like a waste of time, which explains why many people do not bother to create backups. Backups should therefore be highly automated, so that they do not depend on this lacking motivation.
Where to start the restore? #
If the requirement is that servers have to be restored from scratch, meaning from a blank hard disk, then the backup system must support booting via the network (PXE) and store information about the partitions and the network setup. Moreover, it needs to supply the restored machine with the required programs.
If a basic operating system is installed before the restore starts, then the restore plan should not require any more advanced programs than those available at that moment. If the backup is on tapes, are tape drivers available? If the backup is on a remote backup server, can the client programs be downloaded? If the backup is on remote NFS drives, have the permissions been granted? The same question arises if the files are accessed via SSH.
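One way to catch part of this early is to check that every program the restore plan relies on is present on a freshly installed base system. The following Python sketch does that with `shutil.which`. The tool list `REQUIRED_TOOLS` is a hypothetical example and has to be replaced by whatever the actual plan uses. Missing tape drivers or NFS permission problems will not show up this way.

```python
import shutil

# Hypothetical example list: the programs a restore plan might rely on.
REQUIRED_TOOLS = ["tar", "ssh", "rsync", "mount.nfs"]

def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Restore would fail, missing:", ", ".join(missing))
    else:
        print("All required tools are available.")
```

It is best run right after the base installation, before the restore itself is attempted.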
Requirements #
In principle, if a file has been lost due to some mishap, one backup is enough to restore it. But there is more to it: backups are a kind of insurance against disasters, which led to the 3-2-1 rule.
3-2-1 rule #
Basically:
- There should be 3 copies of data
- On 2 different media
- With 1 copy being off site
The 3 copies are the one in use, a local copy and a remote copy for geo-redundancy. The 2 media rule is a bit outdated, but geo-redundancy is a protection against fires and similar losses.
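The rule is easy to check mechanically. The following minimal Python sketch assumes that each copy is described by a hypothetical `Copy` record holding a medium name and an off-site flag.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    name: str      # e.g. "production", "local backup"
    medium: str    # e.g. "disk", "tape"
    offsite: bool  # True if the copy is kept at a remote location

def satisfies_3_2_1(copies, min_media=2):
    """Return True if the copies follow the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                             # 3 copies of data
    enough_media = len({c.medium for c in copies}) >= min_media  # on 2 media
    has_offsite = any(c.offsite for c in copies)                 # 1 copy off site
    return enough_copies and enough_media and has_offsite

plan = [
    Copy("production", "disk", offsite=False),
    Copy("local backup", "tape", offsite=False),
    Copy("remote backup", "tape", offsite=True),
]
print(satisfies_3_2_1(plan))  # True
```

Passing `min_media=1` drops the two-media requirement, which the next section considers outdated.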
Improvements #
The 2 media rule is outdated: one could keep all 3 copies on hard disks, for example on a system hard disk for production and on removable hard disks for backup. Geo-redundancy is still relevant in case of accidents like fire. Since media are transported to a remote location, all data should be encrypted to keep it safe. Protection against ransomware requires some kind of air gap that prevents the backups from being accessed, and thus encrypted or deleted, from the production system.
Updated requirements:
- 3 copies: production, local backup, remote backup
- 1 copy in remote location
- encryption
- air gap
- easy access on demand for restore
History #
There are “backups” that maintain a perfect copy of the hard disk somewhere else. The command rsync, for example, can create an exact copy in a remote location. However, one might want to restore an older version of the system. Imagine that ransomware has started to corrupt files on the system and the copy was synchronized before the corruption was noticed. Then the copy is useless, since it no longer contains uncompromised files.
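A common way to keep a history without storing every file again each day is to create dated snapshot directories in which unchanged files are hard links into the previous snapshot. This is what rsync's `--link-dest` option does. The Python sketch below imitates the idea using only the standard library. The function name `snapshot` and the directory layout are assumptions for illustration. The sketch copies regular files only and ignores empty directories, ownership and special files.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional

def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy the regular files below `source` into the new directory `dest`.

    Files whose content is identical to the same path in `previous` are
    hard-linked instead of copied, so each snapshot looks like a full copy
    while unchanged files occupy disk space only once.
    """
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # directories are created on demand below
        rel = src_file.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, target)            # unchanged: share data with the old snapshot
        else:
            shutil.copy2(src_file, target)  # new or changed: store a fresh copy
```

Because snapshots share data through hard links, they must be treated as read-only. Modifying a file in place inside one snapshot would change it in every snapshot that links to it. Deleting an old snapshot, on the other hand, does not affect the others.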
RAID is not a replacement for backup #
A RAID array might protect against a hard disk failure, but otherwise offers no protection. For example, suppose one takes daily backups with a history of 30 days and stores the work in progress on a RAID disk array. After one mistake, a file is deleted from the RAID and is no longer accessible there. However, there are still up to 30 versions of it in the backup.
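Keeping such a history means regularly deleting the snapshots that fall out of the retention window. The following minimal Python sketch assumes one snapshot per day, identified by its date. The helper name `expired` is hypothetical.

```python
from datetime import date, timedelta

def expired(snapshot_dates, today, keep_days=30):
    """Return the snapshot dates that fall outside the last `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in snapshot_dates if d <= cutoff)

today = date(2024, 3, 31)
daily = [today - timedelta(days=i) for i in range(40)]  # 40 daily snapshots
old = expired(daily, today)
print(len(old))  # 10: the 30 most recent snapshots are kept
```

Backup tools often support more elaborate schemes, for example daily snapshots for a month and weekly ones for longer. The sketch shows only the simplest rule.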
Application backups #
Databases in production are notoriously difficult to back up. After a restore, the database should be in a consistent state, and taking a backup should not interfere with production. Hence, one cannot stop the whole system just to take a backup; a different solution is required. MariaDB and similar systems can dump a snapshot of the database, sometimes requiring a lock on all tables, which would stop production. If possible, one should have a dedicated replica (slave) instance that can be stopped and used to take the data dump. The data files of the database should be kept out of the regular file backup, since copying them while the server is running does not yield a consistent state.
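For MariaDB and MySQL, `mysqldump --single-transaction` dumps InnoDB tables from one consistent snapshot without locking them for the whole dump. Non-transactional tables such as MyISAM are not protected by it. The Python sketch below only builds and runs such a command against a replica. The host name, the user `backup` and the output path are hypothetical, and credentials are assumed to come from a MySQL option file rather than the command line. Recent MariaDB versions also provide the tool under the name `mariadb-dump`.

```python
import subprocess

def dump_command(host, outfile, user="backup"):
    """Build a mysqldump call that writes all databases to `outfile`.

    --single-transaction reads InnoDB tables from one consistent
    snapshot instead of locking all tables.
    """
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",
        "--all-databases",
        f"--result-file={outfile}",
    ]

def dump_replica(host, outfile):
    """Run the dump; raises CalledProcessError if mysqldump fails."""
    subprocess.run(dump_command(host, outfile), check=True)

# Hypothetical replica and target path:
# dump_replica("db-replica.example", "/var/backups/all-databases.sql")
```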
Tapes or disks #
Tapes have long been the preferred medium for backups, as commands like tar and rmt bear witness. Bigger organizations employ tape robots to manage the required number of tapes. Disk space has become cheap, but disks are a different medium than tapes, and that should be reflected in the organization of the backup.
Furthermore, the production system might come with additional requirements, for example allowing users to retrieve copies of backed-up files at will. Then an automated system, either with a tape robot or fully disk based, is appropriate.
Tapes #
Tapes have been quite cheap compared to hard disks. They are sequential-access media without any file system structure, which is why tar has been used to write several files to one tape. Databases were not that common, hence the modification date of files has been used as the selection criterion.
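Selecting files by modification date and packing them into one archive can be sketched with Python's `tarfile` module. Here a regular file stands in for the tape device. The function name `archive_changed` is hypothetical. The archive should be written outside of `root`, otherwise it could include itself.

```python
import tarfile
from pathlib import Path

def archive_changed(root: Path, archive: Path, since: float) -> list:
    """Write every file under `root` modified after `since` (a Unix
    timestamp) into one tar archive, as tar would write them to one tape.
    Returns the archived names relative to `root`."""
    added = []
    with tarfile.open(str(archive), "w") as tar:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.stat().st_mtime > since:
                arcname = str(path.relative_to(root))
                tar.add(str(path), arcname=arcname)  # store the path relative to root
                added.append(arcname)
    return added
```

Passing the time of the previous run as `since` produces the kind of date-based selection that the strategies below build on.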
Common tape based backup strategies #
One differentiates between:
- full backup
- differential backup
- incremental backup
Other schemes are possible, but not well known.
The full backup can be used alone or in combination with one of the others. Basically, all files in certain locations that are not excluded by some pattern are backed up. This requires a bigger tape volume for each backup. When restoring, only one volume needs to be applied.
Differential backups create smaller volumes: only those files that changed since the last full backup are included. This means that the size of the differential backup grows until the next full backup. One full backup and the latest differential backup are needed for a restore.
Incremental backups save the files that changed since the last backup of any kind, full or incremental. Those volumes vary in size, but a full backup has to be taken from time to time. One full backup and all incremental backups since that full backup are needed for a restore. This scheme requires the least amount of tape, but also the most time for a restore.
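Which files go into a volume, and which volumes a restore needs, both follow directly from these definitions. The following minimal Python sketch assumes a hypothetical `Volume` record per tape volume. It also assumes that full backups are combined either with differentials or with incrementals, but not with both.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Volume:
    kind: str  # "full", "differential" or "incremental"
    time: int  # when the volume was written, e.g. a day number

def selection_cutoff(kind, last_full, last_backup):
    """Files modified after the returned time go into the next volume;
    None means every file (full backup)."""
    return {"full": None, "differential": last_full, "incremental": last_backup}[kind]

def restore_chain(volumes):
    """Return the volumes needed to restore the latest state, in the
    order in which they have to be applied."""
    ordered = sorted(volumes, key=lambda v: v.time)
    fulls = [v for v in ordered if v.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available")
    full = fulls[-1]                                  # latest full backup
    later = [v for v in ordered if v.time > full.time]
    kinds = {v.kind for v in later}
    if kinds == {"differential"}:
        return [full, later[-1]]                      # full + latest differential
    if kinds <= {"incremental"}:
        return [full] + later                         # full + every incremental
    raise ValueError("mixed differential and incremental volumes")
```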
The disadvantage of tape-based systems is the missing coordination of backups between servers: each server is backed up on its own, hence files common to several servers are stored several times.
Disks #
Disks are not tapes. Except for unusual cases, hard disks allow random access. Disks are also more expensive than tapes, so their space should be utilized more effectively. A different organization of the backup is needed.
The requirements for a good disk based backup system are:
- geo-redundancy
- restricted access functioning like an air gap to prevent corruption by ransomware
- deduplication, compression
- optional encryption
- low cost
- easy access for restore after authentication
Specialized disk-based backup systems often require proprietary client software. Installing that client software should be easy and hassle-free, since time is often scarce when a restore is needed.
Geo-redundancy is obvious, since using disks does not eliminate the risk of a local disaster. Compared with tapes, disks are easily accessible, and this is an attack vector for ransomware. The best protection against ransomware is a backup system with an API through which the saved files cannot be corrupted. If possible, the disk backup system should offer only limited access, and read access only to authenticated systems when a restore is needed. For frequent restore needs, some other system must be in place.
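An API that protects saved files can be illustrated with a small in-memory Python class. This is a sketch under assumptions, not any real product's interface. The class name, the single shared `restore_token` and the retention handling are all hypothetical. Clients may add objects but never overwrite them, and reads require the restore token. Only the server side removes objects, and only after the retention period.

```python
import hmac
import time

class WriteOnceStore:
    """In-memory sketch of a backup store that clients cannot corrupt."""

    def __init__(self, retention_seconds, restore_token, clock=time.time):
        self._objects = {}              # key -> (data, time stored)
        self._retention = retention_seconds
        self._token = restore_token     # hypothetical shared secret for restores
        self._clock = clock

    def __contains__(self, key):
        return key in self._objects

    def put(self, key, data: bytes):
        """Add a new object; existing objects can never be overwritten."""
        if key in self._objects:
            raise PermissionError(f"{key!r} exists; overwriting is not allowed")
        self._objects[key] = (bytes(data), self._clock())

    def get(self, key, token: str) -> bytes:
        """Read access only for authenticated restores."""
        if not hmac.compare_digest(token, self._token):
            raise PermissionError("restore token rejected")
        return self._objects[key][0]

    def expire(self):
        """Server-side cleanup: drop objects older than the retention period."""
        now = self._clock()
        for key in [k for k, (_, t) in self._objects.items() if now - t >= self._retention]:
            del self._objects[key]
```

The missing client-side delete method is deliberate. A compromised client can add data, but it cannot destroy versions that are already stored.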
When employing a dedicated disk-based backup system, one most likely backs up several computer systems. Many hosts then contain files with the same content, of which one only needs 2 remote copies. The advantage of disk-based backup is random access, hence files can be stored independently, for example once for all attached systems. Moreover, disks allow old files to be deleted when disk space is needed. This allows keeping a longer history at the beginning of the service, while disk space is still plentiful.
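Storing each distinct content only once can be sketched as a content-addressed store. Files are identified by a hash of their content, and each host only keeps an index that maps paths to hashes. The following Python sketch is hypothetical and simplified: it deduplicates whole files (real systems often split files into chunks) and compresses each stored blob with `zlib`. Replicating the blob store to a second site would provide the second remote copy.

```python
import hashlib
import zlib
from collections import defaultdict

class DedupStore:
    """Whole-file deduplication: identical content is stored only once."""

    def __init__(self):
        self.blobs = {}                 # SHA-256 hex digest -> compressed content
        self.index = defaultdict(dict)  # host -> {path: digest}

    def backup(self, host, path, content: bytes):
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:    # new content: compress and store it
            self.blobs[digest] = zlib.compress(content)
        self.index[host][path] = digest # every host only records a reference

    def restore(self, host, path) -> bytes:
        return zlib.decompress(self.blobs[self.index[host][path]])

store = DedupStore()
shared = b"identical library file" * 1000
for host in ("web1", "web2", "db1"):
    store.backup(host, "/usr/lib/libexample.so", shared)
store.backup("db1", "/etc/hostname", b"db1\n")
print(len(store.blobs))  # 2 distinct contents for 4 backed-up files
```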
Trial runs #
A trial run of restoring a backup should be part of any acceptance test. If an operating system is installed before the files are restored, the trial will reveal:
- missing information, such as the partitioning scheme
- subscription status for repositories
- necessary client software for restore
- access permissions
If possible, restore trials should be conducted on spare systems.
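A trial restore is only convincing if the restored files are checked against what was backed up. The following minimal Python sketch assumes that a manifest of SHA-256 checksums is recorded at backup time and later compared with one computed from the restored tree. The function names are hypothetical. `read_bytes` loads each file completely into memory, which is acceptable only for a sketch.

```python
import hashlib
from pathlib import Path

def manifest(root: Path) -> dict:
    """Map each file's path relative to `root` to the SHA-256 of its content."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result

def compare(expected: dict, restored: dict) -> dict:
    """Report files that are missing, unexpected or changed after a restore."""
    common = expected.keys() & restored.keys()
    return {
        "missing": sorted(expected.keys() - restored.keys()),
        "unexpected": sorted(restored.keys() - expected.keys()),
        "changed": sorted(k for k in common if expected[k] != restored[k]),
    }
```

An empty result in all three categories means the trial restore reproduced every file exactly.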