BACKING UP FILES
TOPICS BELOW
What Are We Backing Up, How/Where to Backup, An Example,
True Backup vs. Synchronization,
Image Backups, Immutable Backups,
Your Backups
This topic is different from the rest of this site in that it deals with concepts rather than a checklist of details.
Having only one copy of a file is the equivalent of drunk driving. You can lose the file, in an instant, for many reasons.
WHAT ARE WE BACKING UP
Files, of course, but it's complicated. There are many ways to slice and dice the assorted files that need to be backed up.
Importance is a great place to begin. Some files/folders are very important, most are somewhat important and a few are just nice to have but no big whoop.
If a file is somewhat important, then you should have two copies of it and each copy should reside in a different physical location. If a file is very important, then you need to have three copies of it in at least two locations. Three locations would be better. And, if possible, each copy should be managed by different companies or with different software to avoid having all your eggs in one basket.
Next up is sensitivity. Some files are sensitive in nature, some are not. Sensitive files require different care and feeding than non-sensitive ones. For example, sensitive files should not be stored on Google Drive, Microsoft OneDrive or Dropbox. These systems, and many others, are not appropriate because the companies running these systems can read your files. Until recently, Apple could read all the files stored in the iCloud system; now (August 2024) it is more complicated and you can block Apple from reading some files but not all.
There is a page on this site listing some Secure File Storage providers. That is, companies that can not read your files. Even this is not simple, as some providers, such as Backblaze and iDrive, offer one service where they can read your files and another service where they can not. This puts a burden on you to ensure you use the correct options/service. And, while secure file storage is great for privacy, the burden is on you not to lose the password(s). Lost passwords (often called "keys") mean lost access to the secure files.
FYI: Marketing works against you when it comes to secure file storage. Even companies that can read your files will brag about their encryption.
Still another way to segregate your files is the issue of updates. Some files, like photos of last year's vacation, never change. Other files, like spreadsheets and Word documents, are updated. Any file that is updated has different versions. If it is often updated, there is a version from Monday, a different version from Tuesday and still a different version from Thursday. Some file storage services will keep the Monday, Tuesday and Thursday versions of the file and let you restore any of them. This is called versioning. A file storage service that does not support versioning will destroy the Monday version when the Tuesday version is backed up and then destroy the Tuesday version when the Thursday version is backed up.
And, yes, there are different versions of versioning. Not being a smart ass here, different file storage companies handle file versioning differently. Perhaps they will keep the last three versions of each file. Or, they may keep multiple versions but delete anything over 30 days old. Or, the number of versions they keep may vary depending on storage limits.
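To make this concrete, below is a minimal sketch (in Python) of one hypothetical pruning policy: keep the last three versions of a file and throw away anything over 30 days old. The folder layout and the limits are my inventions, not how any particular company does it.

```python
import os
import time

# Hypothetical limits; real services each pick their own.
MAX_VERSIONS = 3
MAX_AGE_DAYS = 30

def prune_versions(version_dir):
    # Each saved version of a file lives in version_dir; sort newest first.
    paths = sorted(
        (os.path.join(version_dir, name) for name in os.listdir(version_dir)),
        key=os.path.getmtime,
        reverse=True,
    )
    cutoff = time.time() - MAX_AGE_DAYS * 24 * 60 * 60
    for i, path in enumerate(paths):
        # Delete a version if it exceeds the count limit or the age limit.
        if i >= MAX_VERSIONS or os.path.getmtime(path) < cutoff:
            os.remove(path)
```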
Ready to give up yet?
File size is still another way to segregate your files. Very large files (mostly videos) are limited in how they can be backed up - limited by the time it takes to make the backup, the storage space required for the backup, or both.
Perfection
I made the above distinctions because different types of files, in all likelihood, require different backup schemes: different software, different schedules and different target destinations. And, while it is easy for an amateur or a smart ass to propose a perfect scheme, where no data is lost, this takes a lot of planning, effort, computing horsepower, money and oversight. Some people or companies are up to the task of making and babysitting perfect backups, many are not. Most of us have to think about how much data loss we are willing to tolerate. In a simplistic example, if you use a desktop computer and make backups every night, then you are willing to lose a day's work.
Or, consider the photos we all take with our phones. A perfect backup scheme would send a copy of each picture off to the cloud as soon as it was taken. But, this consumes 4G/5G bandwidth which, for many of us, is limited. If you want every photo backed up immediately, you will likely have to pay more for a plan with more/unlimited bandwidth. If you prefer to wait until the phone has a Wi-Fi connection to backup the photos, then you are willing to lose the most recently taken pictures, if the phone is lost, stolen or broken before the pictures are backed up. Every backup scheme involves trade-offs. More on this in the Example section below.
Finally, there are times when people want to backup an entire computer. Not just the files people normally deal with directly, but also the operating system, the application software, and all the millions of configuration settings for all the software. All of it, every last bit (literally). More on this below in the Image Backups topic.
HOW/WHERE TO BACKUP
After the issue of what to backup, come the questions of How and Where. But, in-between, is still another variation on what to backup:
ALL the files
-or-
Just the NEW ones (with "new" including existing files that were updated)
In the old days, backing up all the files was called a "full" backup; backing up just the new ones was called an "incremental" backup.
While walking around with a phone taking pictures, we obviously only want to backup the newly taken photos. But, if working on a project that involves multiple Office-class files (spreadsheets, PowerPoints, etc.) there are times when all the related files for that project should be backed up and times when only the files that changed today need to be copied.
Whenever a scheme is backing up only the new files, I always ask myself: how does it know which files are new? To the best of my knowledge, there are two answers. One approach scans all the files and looks at the meta information about each file for some type of flag that indicates the file is new. Rather than a specific flag, the backup software might just look at the file creation date/time or the last update date/time. The other approach involves an intimate relationship between the backup software and the operating system. In this scheme, the operating system knows about the backup software and taps it on the shoulder whenever a file is created or updated. This requires the backup software to be running all the time. The latter is more efficient, but it has always scared me. Just my opinion.
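Here is a rough sketch of the first approach, the scan. It walks a folder and flags any file whose last-modified time is newer than the previous backup run. The function names are mine; no actual backup product is being quoted.

```python
import os

def files_changed_since(folder, last_backup_time):
    # Scan approach: walk the folder tree and flag files whose
    # last-modified time is newer than the previous backup run.
    changed = []
    for root, _dirs, names in os.walk(folder):
        for name in names:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return changed
```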
As to Where to store backups, the obvious choice is local vs. the cloud (aka "off site"). Local backups reside in your home (or office, if we are talking about a company); cloud backups can reside in most any country on the planet.
Local backups normally exist on an external hard drive or a NAS (Network Attached Storage), which is one or more hard drives that live on your LAN (Local Area Network). Typically, local backups have more storage capacity than cloud backups, are cheaper and run faster. The fatal flaw with local backups is that one disaster (fire, flood, hurricane, etc.) can destroy both the original files and the backups. Cloud backups solve that potential problem.
That said, cloud backups have their issues too.
To me, the most important issue is whether the cloud provider can read your files. I have a whole page devoted to Secure File Storage on this site which lists some cloud storage providers that can read your files and some that can not.
Off site cloud storage companies have the potential problems that come with any account: the company could stop offering the service (this happened with Amazon), or it could just fold up shop and go out of business (this happened to me). Or, your account could get screwed up or hacked and you could be locked out of your own files. This happened to people who had their iPhone stolen: when the bad guys got into their Apple account, the backed up photos were gone. And, if you use a free service, there is no tech support.
A cloud is just a data center and any data center can also suffer from a natural disaster. You are best off when your cloud files are far away from you, but not all companies offer a choice of where your files are stored. If a single data center suffers an outage, are the files stored there lost, or does the cloud provider store two copies of your files? Good luck getting an answer to that question.
One cloud provider that will answer that question is rsync.net. When you sign up with them, you get to choose which of their data centers your files are kept in. And, they also offer a choice about how many copies of your files they keep. Pay less and they keep one copy. Pay more and they will duplicate your files at a second data center of theirs. This duplication is done by them, not by you.
All things considered, any competent computer nerd would recommend a combination of both local and cloud/remote backups.
Cloud/offsite storage is typically thought of as residing with a corporation, but it can also be a friend's or relative's house. One way to do this is to simply give your trusted person a copy of your files every now and then, be it on an external hard drive or a USB flash drive. Techies might set up an on-line link between the two locations and perhaps have a NAS in the original location directly talk to a NAS in the trusted person's location.
A quick point on backup software: in the server and desktop OS world (not so much on mobile) some backup software makes copies that only it can restore. This strikes me as a bad idea; I prefer backups that can be restored without the software that created them. More than once, Windows users found out that a new version of Windows no longer included the backup software from the previous version.
Yet another aspect of HOW to make backups is whether the backup software is run by you manually, or whether things are automated and the backups run either on a set schedule or constantly. Automated backups obviously sound great but, trust me, when automated backups break, the problem is often not detected. Any regularly scheduled backup needs to tell you when it has worked ("I backed up 24 files today"). This is the best way to know when it stops working. Consider the earlier example of automatically backing up every new picture taken on a phone. Would you know if those backups stopped happening?
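Something as simple as the sketch below would do. After every scheduled run, append one line to a log file; if the log stops growing, the automated backup has silently broken. The log format is my own invention.

```python
import datetime

def report_backup_result(files_backed_up, log_path="backup.log"):
    # Append a one-line success report ("backed up 24 files") after
    # every scheduled run. A log that stops growing is the tip-off
    # that the automated backup has silently broken.
    stamp = datetime.date.today().isoformat()
    with open(log_path, "a") as log:
        log.write(f"{stamp}: backed up {files_backed_up} files\n")
```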
As detailed below, I personally make end-of-day backups, and I rarely forget to do so. For my other periodic backups, the calendar app on my phone reminds me when to do them.
AN EXAMPLE
As an example of the trade-offs involved with backups, take me. My most important files reside in one folder on one computer. Once a month, I combine them into one big compressed and encrypted file. They reside on a Windows computer and I use 7-Zip for both the compression and encryption. Then, a copy is kept locally (in my home) on a USB flash drive plugged into the computer, and on a NAS box. In addition, the file is uploaded to two different cloud storage providers.
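For the curious, the monthly step could be scripted as below. This is only a sketch of the general idea, not my actual script; the paths are placeholders and, in real life, passing a password on the command line is something to think twice about.

```python
import subprocess

# Assumed install location of the 7-Zip command-line program on Windows.
SEVEN_ZIP = r"C:\Program Files\7-Zip\7z.exe"

def make_monthly_archive(folder, archive, password):
    # Compress and encrypt one folder into a single .7z file.
    subprocess.run(
        [
            SEVEN_ZIP,
            "a",             # add files to an archive
            "-t7z",          # use the 7z format
            "-mhe=on",       # encrypt file names too, not just contents
            f"-p{password}", # caution: visible to other local processes
            archive,
            folder,
        ],
        check=True,
    )
```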
This simple scheme leaves me vulnerable to losing 30 days' worth of updates. So, at the end of my day, I copy the files updated that day to a USB flash drive plugged into the computer. The end-of-day backups are not encrypted. If the computer dies, the flash drive can be moved anywhere. This still leaves me vulnerable to losing a whole day's worth of updates. And, if my home goes up in flames, I can lose a month's worth of updates. On the other hand, if I'm home before the place goes up in flames, I just need to grab one flash drive to have all my files as of yesterday. Trade-offs.
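The end-of-day routine amounts to something like this sketch: copy every file modified since midnight to the flash drive. It assumes one flat folder and is illustrative only.

```python
import datetime
import os
import shutil

def end_of_day_backup(source_dir, flash_drive_dir):
    # Copy every file in source_dir modified since midnight today
    # to the flash drive. Assumes a flat folder, no subfolders.
    midnight = datetime.datetime.combine(
        datetime.date.today(), datetime.time.min
    ).timestamp()
    os.makedirs(flash_drive_dir, exist_ok=True)
    for name in os.listdir(source_dir):
        path = os.path.join(source_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) >= midnight:
            shutil.copy2(path, flash_drive_dir)
```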
To capture file updates as they happen in real time would require software running on the computer all the time. I prefer not to do that; reasonable people can disagree. This decision depends on both how important the updated files are and your ability to handle the technical care and feeding of any real-time backup system. That is: how do you verify the system is still doing what it should be doing, how do you ensure it does not run out of space, and how do you ensure that the backups can really be restored? It can be a lot. In my case, if I have a busy day and make a lot of updates, I can run my end-of-day backup routine in the middle of the day. This is not a perfect system, as I can still lose a few hours of data/updates, but it is good enough for me. Again, trade-offs.
One flaw in my system is that there is no versioning. If I update a file on Monday, Tuesday and Wednesday, when Friday rolls around, my only backup is the Wednesday version of the file. That said, my end-of-day backup routine keeps 5 full copies of my important folder, one for each week of the month. As soon as week 1 ends, it starts making backups to a week 2 folder. Not real file versioning, but better than nothing.
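The week-of-month rotation is trivial to compute. A sketch, with folder names that are my own:

```python
import datetime

def weekly_backup_folder(base_dir="EndOfDayBackups"):
    # Days 1-7 land in week1, days 8-14 in week2, ... days 29-31 in week5,
    # cycling through five folders so five full copies are kept.
    today = datetime.date.today()
    week = (today.day - 1) // 7 + 1
    return f"{base_dir}/week{week}"
```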
My point here is not that this scheme is the best for you, let alone the best for me. Rather, this illustrates the decisions and trade-offs involved with any approach to backing up your files. But, as the next topic shows, there is still more to be aware of.
FYI: Me being me, I also keep the NAS drive powered off and unplugged most of the time. This should extend the life of the mechanical hard disks (yes, I use RAID) and protect the box from power surges.
TRUE BACKUP vs. SYNCHRONIZATION
Yet another issue is the most basic of all: what does backup mean? I say this because "synchronization" is often confused with, or mistaken for, "backup", and they are different.
Synchronization (aka "sync" or "replication") refers to keeping the files in two folders in sync. Changes made to one folder are detected and made to the other folder. Frequently one folder is in your home/office and the other folder is in the cloud (this is not a requirement). If a new file is added to the local folder, it will be copied to the remote folder. If a file in the local folder is updated, the updated version will be sent to the remote folder. Microsoft OneDrive, Google Drive and Dropbox all do synchronization.
Sounds like a backup, but not quite. Deleted and renamed files illustrate the difference.
A file deleted in the local folder will be deleted by a synchronization system in the cloud too. If it was deleted on purpose, fine. If it was deleted by accident, tough luck. A backup program/app/system will keep the cloud copy of the locally deleted file. This is the crucial difference: a true backup system never deletes anything. Renamed files differ too: with synchronization, a file renamed in the local folder is only available under the new name. With a true backup system, the cloud backup will store the file under both names and manage each independently of the other.
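The sketch below contrasts the two behaviors in miniature. It assumes flat folders and skips error handling; the point is only that the sync routine propagates deletions while the backup routine never deletes anything.

```python
import os
import shutil

def mirror_sync(local, remote):
    # Synchronization: make remote match local, INCLUDING deletions.
    for name in os.listdir(remote):
        if not os.path.exists(os.path.join(local, name)):
            os.remove(os.path.join(remote, name))  # deletion propagates
    for name in os.listdir(local):
        shutil.copy2(os.path.join(local, name), remote)

def true_backup(local, remote):
    # Backup: copy new and updated files; never delete anything remote.
    for name in os.listdir(local):
        shutil.copy2(os.path.join(local, name), remote)
```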
I often use renamed files to make a second backup, since my personal backup scheme does not do versioning. For example, before I make a big update to file24.doc, I might first copy it to file24.asof.Aug9.2024.doc so that I always have that version to refer back to.
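That habit is easy to automate. A sketch, assuming simple .doc file names:

```python
import datetime
import shutil

def copy_before_big_edit(path):
    # file24.doc -> file24.asof.Aug9.2024.doc, then edit the original.
    today = datetime.date.today()
    stamp = f"{today:%b}{today.day}.{today.year}"  # e.g. Aug9.2024
    backup = path.replace(".doc", f".asof.{stamp}.doc")
    shutil.copy2(path, backup)
    return backup
```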
Both a true backup system and a synchronization system can either be run continuously in the background or on-demand. Or both.
IMAGE BACKUPS
Yet another concept here is file vs. image backup. Everything up to this point has been about backing up files. Image backups exist because even with your files backed up, things can still go wrong. Lots of things.
For starters: you might not really be backing up all your important files. Perhaps the way you work or store files changes and you forgot to change the backup software. Perhaps you forgot about some files from the get-go and are not backing up the location where they are stored at all. Perhaps there is a bug in the backup software or it is mis-configured. Image backups protect from these issues and many more.
An image backup is huge, as it is a backup of ALL THE FILES on a computer. The name comes from taking a picture of the hard disk and everything on it. Needless to say, this backs up files you don't care about. But there is a big upside to this nonetheless.
If the hard drive in a computer fails, it can be replaced with a new one and the image backup restored. Part of the image backup software should be available as a bootable USB flash drive or CD or DVD. If a computer is infected with malicious software that either deletes all the files it can or encrypts them for ransom, then again, the image backup can be restored and the malicious software and the harm it caused, both wiped away. If a computer dies from a hardware failure other than a bad hard disk, an image backup can be restored to a similar model computer. And, if you use the right image backup software and prepare it correctly, the image backup should be able to be restored to a different computer. An image backup of a Dell laptop might be loaded onto an HP desktop.
Most likely the software used for an image backup will be different than that for file backups. It's a specialty.
While image backups are big, there are two obvious up-front tricks that reduce their size: ignoring unused space and file compression. Thus, the backup of a 900GB hard drive with 500GB of files on it should be less than 500GB.
Like file backups, image backups can be run either on-demand or continuously in the background. Unlike file backups, they can also run outside of the operating system, from a bootable USB flash drive, CD or DVD. When I was making regular image backups, I always preferred to make them when the operating system was not running. My opinion was: the less software running during the backup, the better.
Clearly image backups should not be anyone's only type of backup, but anyone serious about backups is making image copies.
IMMUTABLE BACKUPS
This is our last and most advanced topic :-)
Files are stored in a computer using a File System. It is the file system that determines how long a file name can be and which characters are allowed in file names. It also limits how big any one file can be and the total storage space that it can manage.
Windows has used the NTFS file system since you were born. USB flash drives use FAT or FAT32 or exFAT. CDs and DVDs have their own file systems and they use different ones depending on whether they are read-only or updateable. The mainframe computers that process all your credit cards use the VSAM file system. Linux is particularly flexible in that it supports a whole bunch of file systems such as EXT3, EXT4, Reiser4 and two others described below.
The file systems mentioned so far are stupid. They do what the operating system tells them to do. But there are also a couple (maybe more?) of smart file systems that can think on their own. And these systems (ZFS and Btrfs) can make checkpoints. For example, the file system can be told to make a checkpoint at 5 minutes after the hour and to keep 10 such checkpoints. So, if anything goes wrong, the computer can be restored to the last checkpoint.
Better yet, these checkpoints are immutable. That is, the file system does not let anything change them. Malicious software may delete or encrypt all the files it can find, but no malicious software can modify the files as they were before the last checkpoint. Malware can see the files, and think it changed them, but it did not change them. New copies of updated files are created; the old copies remain unchanged.
The official term for this is CoW - Copy on Write. Software that thinks it overwrote a file has been tricked. You can think of a CoW file system as being like the file versioning feature that was discussed earlier in regard to file backups. When maintaining 10 checkpoints (for example), there can be as many as 10 versions of the same file being maintained by the file system.
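Here is a toy illustration of the idea, in Python. It only models the concept; real ZFS and Btrfs do this at the disk-block level, not per file.

```python
class CowStore:
    # Toy Copy on Write store: a "write" never touches the stored copy,
    # it appends a new version, so every older version survives untouched.
    def __init__(self):
        self._versions = {}  # path -> list of immutable version contents

    def write(self, path, contents):
        # The caller thinks it overwrote the file; really we append.
        self._versions.setdefault(path, []).append(contents)

    def read(self, path, checkpoint=-1):
        # checkpoint=-1 reads the latest; older indexes still exist.
        return self._versions[path][checkpoint]
```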
If (when?) something bad happens, the entire computer can be restored to its state as of a checkpoint. Any checkpoint. In this case, the good guys have 10 hours (10 checkpoints once an hour) to figure out that something bad happened and restore the system to a good state.
Won't changes be lost? Yes. But since the state of things at each checkpoint never changes, you should be able to view the state at each checkpoint. That said, this is over my head. I have not used a system running ZFS.
For hands-on experience with Btrfs you can try a Synology NAS. Cloud storage provider Rsync.net offers ZFS based file storage. What I have been calling an immutable checkpoint, they refer to as point-in-time snapshots.
- - - -
Off topic: Wouldn't it be nice if Windows ran with a file system that supported checkpoints? Virtual Machine software can do this. Windows does have Restore Points, but they are a poor man's version of immutable snapshots. There are many Virtual Machine products, such as VMware, VirtualBox, Parallels Desktop, QEMU, Xen and Hyper-V. Some are free, some are expensive. Some need a host operating system, some are the host operating system.
YOUR BACKUPS
Any backup scheme involves lots of choices and decisions. And, we all have different needs and different technical abilities, which is what makes any suggested scheme all but useless. Here, I tried to explain the basic concepts and illustrate the necessary decisions involved in any backup scheme.
If your backup scheme is faulty, you are not alone. Ransomware would not be so popular and profitable if companies had better backup systems in place. The next time you hear about a company getting hacked and paying the ransom, realize that their backup scheme was lousy.