ABA – Developing Effective Back Ups

I worry about back ups. I have them in place but I still worry. This is me putting my thoughts about my back up mechanism down on paper and sharing them. I hope that someone will say either:

  • A) what you’re doing is fine, carry on
  • B) what you’re doing is wrong, do this instead

Yolk’s hosting infrastructure runs on multiple ServerGrove VPS servers. ServerGrove have a daily back up mechanism that targets files only, and it isn’t clear whether they charge to retrieve that data from them, although I think they might give it to you for free. There is another downside besides cost: in the event of a disaster you would need to wait to get that data back before you were able to bring any web site online. This isn’t a criticism of ServerGrove, they’re a great host, but a lot of hosting companies operate like this. They have back ups, but those back ups are not available to you instantly.

I therefore thought it a good idea to put in place my own back up mechanism. When doing this I had one key point in mind:

A back up is only effective if it can be taken from a production environment, into storage and then back into a production environment.

I will call this process ABA – going from A to B and back to A. It is very easy to set up a back up process that takes data from a production environment and places it into storage. But without reversing this process successfully the backed up data is useless. Going from a stored back up into production can be extremely problematic if you are doing it for the first time.

That first run, and it should be a dry run in a non-disastrous situation, will have a steep learning curve. Having done this dry run, these are the aspects I hadn’t accounted for:

  • Data size
  • Missing data
  • Corrupt data
  • No new host in place
  • Original back up mechanism failed

This is now going to get progressively more technical, so feel free to zone out now. If you stick with me, well done!

Implementation

This I suspect is pretty standard. A cron runs daily; this in turn runs a shell script that backs up files and database information. The database is stored as a gzipped MySQL dump and the files are stored in a tar ball.
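
For context, the cron entries that kick this off might look something like the following (a hypothetical crontab sketch; the script names, paths and timings are placeholders rather than the exact set up on the Yolk servers):

# Hypothetical crontab entries running the back up scripts daily at 3am
0 3 * * * /var/backups/scripts/mysql_backup.sh
15 3 * * * /var/backups/scripts/file_backup.sh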

MySQL back up – shell

#! /bin/sh
mysqldump -u database_user -pdatabase_password database_name | gzip > /var/backups/archives/mysql/filename_$(date +\%d-\%m-\%Y_\%T).sql.gz

This is fairly simple. The only noteworthy item is that the password flag has no space between the -p flag and the password. If you add the space then the script will prompt for the MySQL password, and seeing as this is being run via a cron you won’t be present to input the password! The username flag of -u, on the other hand, is fine with a space after it. The date format is very useful for quickly identifying specific dates and for finding the latest back up. Without it you would need to rely on the modified time of the file, which is not as easy to read at a glance.
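
As a quick illustration of the naming, running the date format on its own shows the pattern the archive names end up with (the timestamp below is just an example):

date +%d-%m-%Y_%T
# e.g. 15-01-2013_03:00:01
# giving a file such as filename_15-01-2013_03:00:01.sql.gz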

File back up – shell

#! /bin/sh
tar -zcf /var/backups/archives/files/example.co.uk_$(date +\%d-\%m-\%Y_\%T).tar.gz --exclude='cache/*' /var/www/vhosts/example.co.uk

Another simple script that backs up all files within a given directory. The same date format is also used here. I exclude the cache directory as it contains temporary data and it can be extremely large.

Off Site

As you can see the back up files are placed in /var/backups/archives/. While this is a valid back up, I did not deem it sufficient. I wanted an off site back up system where the files were taken to another system entirely, just in case the whole production server blew up. I use Amazon S3 for this off site back up, along with S3Tools for Linux. I have found S3Tools to be a really useful way of getting off site back ups in place quickly and cost effectively. S3Tools will sync a folder from your server into an S3 bucket, which takes the headaches out of managing the relationship between your production server and S3.

The following code is used at the end of the shell script to sync all the archives up to S3. These files can then be cleaned to free up space again on your production environment.

S3Tools code – shell

#! /bin/sh
s3cmd put --recursive /var/backups/archives/ s3://bucketname/archives/
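
To reclaim space on the production server once the archives are safely in S3, something along these lines could be appended to the same script (a sketch; the seven day local retention is an arbitrary choice, pick whatever suits you):

# Remove local archives older than seven days, they already live in S3
find /var/backups/archives/mysql/ -type f -mtime +7 -delete
find /var/backups/archives/files/ -type f -mtime +7 -delete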

Extra paranoia

I worry about these back ups, because if they ever fail and ServerGrove are unable to provide me with a back up then a web site could be lost. I therefore have a second script, written in bash, which runs a few hours after the cron that generates the back ups and moves them to S3.

Checking script – bash

#! /bin/bash

# Today's date, without dashes, for comparison
today=$(date +%Y%m%d)

# Date of the most recently created back up file in the S3 bucket
lastDate=$(s3cmd ls s3://bucketname/archives/files/* | sort -n | tail -n 1 | awk '{print $1}')

# s3cmd lists dates as YYYY-MM-DD, so strip the dashes before comparing
lastDateReplaced=$(echo $lastDate | sed -e "s/-//g")

echo "Last back up '$lastDateReplaced'"
echo "Today '$today'"

if [ "$lastDateReplaced" == "$today" ]
then
    echo "Dates match"
else
    echo "Dates do not match"
    SUBJECT="S3 Backup Failed"
    EMAIL="jake@example.com"
    EMAILMESSAGE="/tmp/emailmessage.txt"
    echo "S3 backup dates do not match" > $EMAILMESSAGE
    /usr/bin/mail -s "$SUBJECT" "$EMAIL" < $EMAILMESSAGE
fi

This script checks the date of the last created back up file against today’s date. If they do not match an email is sent notifying me there is an issue. If the dates do match then nothing happens.
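
For reference, the awk '{print $1}' picks out the date because s3cmd ls lists the date of each object first, along the lines of this made up listing:

s3cmd ls s3://bucketname/archives/files/*
2013-01-14 03:15  52428800   s3://bucketname/archives/files/example.co.uk_14-01-2013_03:15:01.tar.gz
2013-01-15 03:15  52430112   s3://bucketname/archives/files/example.co.uk_15-01-2013_03:15:01.tar.gz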

Back to A

I have now covered how we take live data to our S3 storage solution. Going back the other way is not something I will run through step by step, beyond the rough sketch below, because:

  • A) it is different for everyone
  • B) it is slow and boring
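
That sketch, for what it is worth, might look something like this (a minimal, hypothetical example; the archive names, paths and database credentials are placeholders and your own restore will differ):

# Pull the latest archives back down from S3
s3cmd get s3://bucketname/archives/files/example.co.uk_15-01-2013_03:15:01.tar.gz /tmp/example.co.uk_15-01-2013_03:15:01.tar.gz
s3cmd get s3://bucketname/archives/mysql/filename_15-01-2013_03:00:01.sql.gz /tmp/filename_15-01-2013_03:00:01.sql.gz

# Unpack the files back into place (tar stripped the leading / when creating, so extract from /)
tar -zxf /tmp/example.co.uk_15-01-2013_03:15:01.tar.gz -C /

# Recreate the database from the gzipped dump
gunzip < /tmp/filename_15-01-2013_03:00:01.sql.gz | mysql -u database_user -pdatabase_password database_name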

It is worth going over the issues mentioned above and how best to mitigate their negative effects.

Data size

If the size of your back up is very large it could take a long time to transfer it from a storage solution back into a live environment. I am sure the transfer will eventually complete, especially if you’re able to move the data directly using wget, but if it takes 12 hours then that is potentially a lot of lost traffic.

Try to reduce the size by removing non-essential files and folders, such as the cache directory. If there is a particularly large directory that you can logically split or segment, then tar ball it individually, or back those files up in a separate tar ball, as in the sketch below. Using this method you can bring the majority of a web site back online quickly and then move the less important, larger files later.
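
A hypothetical split, assuming an uploads directory makes up the bulk of the site, might look like this:

# Core site files, excluding the cache and the heavyweight uploads directory
tar -zcf /var/backups/archives/files/example.co.uk_core_$(date +\%d-\%m-\%Y_\%T).tar.gz --exclude='cache/*' --exclude='uploads/*' /var/www/vhosts/example.co.uk

# The large uploads directory on its own, to be restored separately and later if need be
tar -zcf /var/backups/archives/files/example.co.uk_uploads_$(date +\%d-\%m-\%Y_\%T).tar.gz /var/www/vhosts/example.co.uk/uploads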

Missing data

It is important to do dry runs and check your back ups. If you miss anything out then a web site could be useless, because you forgot to back up an all important library that wasn’t in the web site’s web root. By doing dry runs you can identify these miscellaneous libraries and ensure you add them to your nightly back up process.

Corrupt data

Again, dry runs for the win! Corrupt data could happen to anyone, and only by doing dry runs and checking your back ups can you minimise the risk of corrupt data occurring. Unfortunately I don’t have a solution to this other than to check your back ups and keep them for at least 30 days. Then, if the latest back up is corrupt, the back up from the day before should be valid and usable. This will minimise the loss of data from a corrupt back up.
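
A quick, rough way to check an archive without extracting anything is to use the standard test flags (the filenames below are just examples):

# Test the gzipped MySQL dump for corruption
gunzip -t /var/backups/archives/mysql/filename_15-01-2013_03:00:01.sql.gz && echo "SQL archive OK"

# List the tar ball's contents; this fails loudly if the archive is damaged
tar -tzf /var/backups/archives/files/example.co.uk_15-01-2013_03:15:01.tar.gz > /dev/null && echo "File archive OK"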

No new host in place

You should identify at least one other hosting provider, just in case your primary host disappears for business reasons or their infrastructure is so badly crippled that they are unable to provide you with the service you need. You will then have a host in place and ready to go if something goes horribly wrong. Even better, do a dry run on their infrastructure so you are familiar with their systems and you’re more capable of bringing a web site online on their servers.

Original back up mechanism failed

The email check I have in place helps with this. By checking each day whether there is a back up present from today, I can ensure that the back ups are working and are in place.