Backing Up Your Website Part II

Yesterday, I discussed backing up your website; more specifically, I talked about the piece on the web-host side. You may recall that the result of yesterday's talk is two files: one with all of your web files and another containing the SQL code to recreate your database, that is, your data.

In addition, I gave you guidance on setting up the cron job to run the backup script. The piece left out, though, is the part that pulls the backup files from the web-host onto your home computer. Part of the rationale for backing up is to protect your data from loss in the event of system malfunction or malfeasance. Malfunctions may be brought on by human error or by what are often referred to as Acts of God: a fire at the host's data center, a flood, or just plain old failure of the storage devices. Malfeasance might be a cracker (i.e. an unauthorized person who breaks into your system) who wipes out your data and files, or it could result from a plain old Oh $#!+ moment on your part; eventually you will have one of those.

Today we will talk about automating the backup file download.

To handle the automatic download of my backup files, I wrote a Perl script that runs on my home tower workstation/server. The script does a few things:

  1. It logs on via FTP to the particular web-host account.
  2. It gets the current date.
  3. It looks in the backup directory for any files with the current date (recall the file form is FILENAME_YYYYMMDD.sql or .tar) and downloads those files to a location on the local system.
  4. It then deletes backup files that are older than a specified age in days, both on the web-host and on the local system.

The last point is important, as no one has infinite storage, and from what I hear, web-hosts can get grumpy about scads of daily website snapshots left on their system. Mine is good and doesn't complain, but once I run out of storage space I have to pay for more, and I want that space for publicly accessible web files, not for backups.

Another reason we need to download the files is redundancy. If your web-host has a serious malfunction, there is no guarantee you will be able to access your backup files, and then you are SOL. If someone cracks into your account, again, all they need to do is remove the files: same result. No, you need to pull the backup files completely off of your web-host system (don't think moving them to another account you have accomplishes this; if both accounts are hosted by the same company, they could very well be on the same computer system) into a place under YOUR control.



use strict;
use warnings;
use Net::FTP;
use Date::Calc qw(Delta_Days);

# Expect five arguments: host, FTP user, FTP password, remote archive dir, local archive dir
if (scalar(@ARGV) < 5) {
    print("Call like this: REMOTE_HOST FTP_ID FTP_PWD Remote_Archive_Dir Local_Archive_Dir\n");
    exit 1;
}
my $RemoteHost          = $ARGV[0];
my $RemoteID            = $ARGV[1];
my $RemotePwd           = $ARGV[2];
my $RemoteArchiveSubDir = $ARGV[3];
my $LocalArchiveSubDir  = $ARGV[4];

# Build today's date in the _YYYYMMDD form used in the backup file names
my ($second, $minute, $hour, $dayOfMonth, $month, $yearOffset) = localtime();
my $CurrentYear  = 1900 + $yearOffset;
my $CurrentMonth = 1 + $month;
my $CurrentDay   = $dayOfMonth;
my $CurrentDateString = sprintf("_%04d%02d%02d", $CurrentYear, $CurrentMonth, $CurrentDay);
print("$CurrentDateString\n");
print("$RemoteHost\n");

my $FTPObject = Net::FTP->new($RemoteHost);
if ($FTPObject) {
    $FTPObject->login($RemoteID, $RemotePwd);
    $FTPObject->cwd($RemoteArchiveSubDir);   # look in the remote backup directory
    $FTPObject->binary();                    # the tar files are binary

    my @ArchiveFileList = $FTPObject->ls();
    foreach my $File (@ArchiveFileList) {
        if ($File =~ /$CurrentDateString/) {
            # Today's backup: pull it down to the local archive directory
            print("file found: $File\n");
            my $DestFile = sprintf("%s/%s", $LocalArchiveSubDir, $File);
            $FTPObject->get($File, $DestFile);
        }
        else {
            print("Found, but not downloading file: $File\n");
            my $OldFileRegEx = qr/_(\d{4})(\d{2})(\d{2})/;
            if ($File =~ $OldFileRegEx) {
                my $DayDifference = Delta_Days($1, $2, $3, $CurrentYear, $CurrentMonth, $CurrentDay);
                printf("The file %s was created %d days ago.\n", $File, $DayDifference);
                if ($DayDifference > 7) {
                    # Older than a week: delete the remote copy, then the local copy
                    $FTPObject->delete($File);
                    my $LocalFileToDelete = sprintf("%s/%s", $LocalArchiveSubDir, $File);
                    unlink($LocalFileToDelete);
                }
                # else: file is not yet old enough for deletion
            }
            # else: file does not match the expected _YYYYMMDD naming pattern
        }
    }
    $FTPObject->quit();
}
else {
    print("Can not create FTP object. Reason: $@\n");
    exit -1;
}


I am not going to dive into the nitty-gritty detail of what is going on in this script. Be aware, though, that you need a Perl interpreter on your system to run it. What is that? You run Windows and do not have a Perl interpreter? Lucky you: you can obtain a Perl interpreter for free, and then you can run the above script (you may have to obtain a module or two) on your Windows system.
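If you want to check whether those modules are already in place, here is a minimal sketch of a little check script (hypothetical, and assuming only that Net::FTP and Date::Calc are the modules in question, since those are what the script above uses):

# Hypothetical helper: report whether the modules used by the backup script are installed
foreach my $Module ('Net::FTP', 'Date::Calc') {
    if (eval "use $Module; 1") {
        print("$Module is available\n");
    }
    else {
        print("$Module is missing - install it from CPAN\n");
    }
}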

A couple of things about the script. You may recall that yesterday's files were named WEBSITENAME_YYYYMMDD.*. The important things for the script are file location and date. So as long as the script is looking at the correct directory on the web-host, it will find and download all files with today's date (e.g. SOMENAME_20100513.*). That date matching is important: if the backups are created at 11:00 pm and the pull script runs at 2:00 am, the filename will be SOMENAME_20100512.* and the pull script will not download the file.
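If your two schedules do straddle midnight like that, one possible workaround is to also build a date string for yesterday and accept either one; a minimal sketch (the $YesterdayDateString name is mine, not part of the script above):

# Sketch: also build yesterday's _YYYYMMDD string so a pull that runs after midnight
# still matches files stamped the night before (illustrative only)
my ($d, $m, $y) = (localtime(time - 24 * 60 * 60))[3, 4, 5];
my $YesterdayDateString = sprintf("_%04d%02d%02d", 1900 + $y, 1 + $m, $d);
# Then match either date in the download check:
# if ($File =~ /$CurrentDateString/ or $File =~ /$YesterdayDateString/) { ... }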

How do you run the backup retrieval script? Since I run Linux on my home systems I create another script with the following line:


eval "/home/mark/PERL/ userid password /Archives /home/mark/WebsiteArchives/somedomain"

The script is called with a number of parameters. The first is the address of the website, the second is the userid, the third is the password, the next one (/Archives) is the location of the backup files on the web-host, and the last one is the local location of the backup files (i.e. where you want them on your home computer).
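That wrapper script can itself be run from cron on the home machine so the pull happens without you thinking about it. A hypothetical crontab entry (the wrapper path here is made up) that runs it at 11:45 pm, safely after the 11:00 pm host-side backup in the example above, might look like this:

# Hypothetical crontab entry on the home machine: pull the backups nightly at 11:45 pm
45 23 * * * /home/mark/bin/pull_website_backups.sh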


I have a number of improvements I would like to make. First is to alter the script so it deletes old files first and then synchronizes the files on both systems; that is, if the web-host has new files, it brings those files to the home system regardless of when the files were created and when the script runs. Secondly, I would like to add another parameter: the number of days to keep files. Another set of improvements I would like to make is to add logging and improve error handling. Oh well, more things to ignore.
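The retention-days parameter is the easy one. A sketch of how it might look (the sixth argument and the $DaysToKeep name are my own, not in the script above):

# Sketch: read an optional sixth argument for the retention period, defaulting to 7 days
my $DaysToKeep = defined $ARGV[5] ? $ARGV[5] : 7;
# ... and use it in place of the hard-coded 7 in the age check:
# if ($DayDifference > $DaysToKeep) { ... delete the remote and local copies ... }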

In the next installment of this series I will discuss what to do with those files once you have them in your home!

Again, feel free to crib the script and offer your suggestions.

Good Stuff!
