Shit happens. Keep these two simple words in mind every time you write the next line of code. Even if the product, service or application you are working on right now has never failed before, believe me, it will fail in the future. To survive such a failure and restore your code and data, you have to be prepared in advance. So grab a cup of coffee and a cookie, and get ready to dive into the world of backups.
Over the last eight years I have seen many different approaches to backing up systems and databases. To be honest, system backups are easy today: every IaaS provider offers a one-click solution to back up the whole server. But be careful, this type of backup does not guarantee the consistency of your database on restore! Why? The answer is simple: it is a low-level hot backup that knows nothing about your files. Imagine that your database uses a huge data file and changes it constantly. The file cannot be copied in a single instant, so it will change while it is being copied. As a result, the copy mixes old and new parts of the file, and data consistency is lost.
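To make the problem concrete, here is a minimal sketch of a block-by-block copy racing with a write. Everything in it is a made-up model rather than a real database: the "file" is an array of two blocks holding balances that must always sum to 100, and the hypothetical `hot_copy` helper lets a write happen between two blocks.

```ruby
# Simulates a "hot" block-by-block copy of a data file that changes mid-copy.
# Hypothetical model: two blocks hold account balances that must always sum to 100.

# Copies `source` block by block; `on_block_copied` fires after each block,
# standing in for the database writing to the file while the copy is running.
def hot_copy(source, &on_block_copied)
  copy = []
  source.each_index do |i|
    copy << source[i]
    on_block_copied&.call(i)
  end
  copy
end

live = [60, 40] # consistent state: 60 + 40 = 100

snapshot = hot_copy(live) do |i|
  next unless i.zero?
  # A transfer of 30 lands after block 0 was copied, but before block 1 is.
  live[0] -= 30
  live[1] += 30
end

puts "live file: #{live.inspect} (sum #{live.sum})"
puts "backup:    #{snapshot.inspect} (sum #{snapshot.sum})"
```

The live file stays consistent, while the backup contains the old first block next to the new second block, a state that never existed in the database.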
To preserve consistency, the database backup should be separated from the system backup. Ideally it is handled by an automated, highly configurable tool with a large community. I tried a dozen different tools and recommend the backup gem (https://github.com/backup/backup). Just take a look at the supported features:
- Archives
- Databases (MongoDB, MySQL, OpenLDAP, PostgreSQL, Redis, Riak, SQLite)
- Compressors (Bzip2, Custom, Gzip)
- Encryptors (OpenSSL, GPG)
- Storages (CloudFiles, Dropbox, FTP, Local, NineFold, SCP, SFTP, RSync, S3)
- Syncers (CloudFiles, RSync, S3)
- Notifiers (AWS SES, Command, Campfire, DataDog, Flowdock, HipChat, HttpPost, Mail, Nagios, Pagerduty, Prowl, Pushover, Slack, Twitter, Zabbix)
Let me guide you through the main parts of the setup and configuration process. Let's use the following project requirements as an example:
- We use PostgreSQL
- We want to back up to S3
- We want to store:
  - 12 daily backups
  - 3 weekly backups
  - 6 monthly backups
- We want to bzip2 our backup
- We want automatic recurrent backups
- We want to be notified on any backup errors via email
General configuration
First and foremost, install the backup gem:
[ruby]gem install backup[/ruby]
You SHOULD NOT include the backup gem in your Gemfile. The main reason is its large number of runtime dependencies (40+ gems): all of them would be loaded into RAM together with your application if you added the gem to the Gemfile.
Now that the gem is installed, it is time to configure it. The backup gem ships with a generator that helps with the initial configuration, so all you have to do is run:
[ruby]
backup generate:model --trigger my_full_backup --archives \
  --databases='postgresql' --storages='s3' --compressor='bzip2' \
  --notifiers='mail' --config-file='config/backup/config'
[/ruby]
You will get a configuration skeleton inside the config/backup folder. It is very well commented and easy to extend. Since the configuration is plain Ruby, you can use Ruby functions and expressions anywhere in it.
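Because the configuration is plain Ruby, values such as credentials do not have to be hard-coded. The sketch below is only an illustration: `backup_setting` and the `BACKUP_*` variable names are hypothetical and not part of the backup gem, and the sample hash stands in for the real process environment.

```ruby
# Hypothetical helper for a Ruby-based config file: read a setting from the
# environment, fall back to a default, and fail loudly when neither exists.
def backup_setting(name, default = nil, env: ENV)
  value = env[name]
  return value unless value.nil? || value.empty?
  return default unless default.nil?

  raise KeyError, "missing required backup setting #{name}"
end

# Example values only; in a real run these would come from the process environment.
sample_env = { "BACKUP_DB_NAME" => "my_db" }

db_name = backup_setting("BACKUP_DB_NAME", env: sample_env)
db_host = backup_setting("BACKUP_DB_HOST", "localhost", env: sample_env)
puts "database #{db_name} on #{db_host}"
```

Failing on a missing setting is a deliberate choice here: a backup job that silently runs with an empty password or bucket name is harder to notice than one that stops immediately.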
Source database
The database configuration is very similar to the config/database.yml file. You just have to provide values for the user, password, database name and so on. Here is an example:
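[ruby]
database PostgreSQL do |db|
  db.name = "my_db"
  db.username = "user"
  db.password = "password"
  db.host = "localhost"
  db.port = 5432
  # Extra pg_dump flags: -x skips privileges, -c drops objects before
  # recreating them, --encoding sets the encoding of the dump.
  db.additional_options = ["-xc", "--encoding=utf8"]
end
[/ruby]
The additional options are passed to pg_dump: they force UTF-8 for the dump (--encoding=utf8, or -E utf8 in the short form), drop database objects before recreating them (-c) and skip dumping privileges (-x).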
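Since the values mirror config/database.yml, one option is to read them from that file instead of repeating them. The following is a rough sketch under assumptions: the YAML is inlined so the example is self-contained, its values are invented, and `db_settings` is a hypothetical helper, not something the backup gem provides.

```ruby
require "yaml"

# Inline stand-in for config/database.yml; the values are made up for illustration.
DATABASE_YML = <<~YAML
  production:
    adapter: postgresql
    database: my_db
    username: user
    password: password
    host: localhost
    port: 5432
YAML

# Returns the connection settings for one environment with symbol keys,
# filling in defaults for host and port when the YAML omits them.
def db_settings(yaml_text, environment)
  config = YAML.safe_load(yaml_text).fetch(environment)
  {
    name: config.fetch("database"),
    username: config.fetch("username"),
    password: config["password"],
    host: config.fetch("host", "localhost"),
    port: config.fetch("port", 5432)
  }
end

settings = db_settings(DATABASE_YML, "production")
puts settings.inspect
```

Inside the database block the hash could then be used as `db.name = settings[:name]`, `db.host = settings[:host]` and so on. Note that a real database.yml often contains ERB tags, which this sketch does not evaluate.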
Backup to S3
The backup gem supports cycling of stored backups and lets you build the configuration dynamically. Thus we can easily accomplish our complex task with monthly, weekly and daily backups:
[ruby]
time = Time.now
if time.day == 1 # first day of the month
  storage_id = :monthly
  keep = 6
elsif time.sunday?
  storage_id = :weekly
  keep = 3
else
  storage_id = :daily
  keep = 12
end

store_with S3, storage_id do |s3|
  # AWS Credentials
  s3.access_key_id = "key"
  s3.secret_access_key = "secret"
  s3.region = "us-east-1"
  s3.bucket = "my-backups"
  s3.path = "backups/#{ storage_id }"
  s3.keep = keep
end
[/ruby]
Here we calculate storage_id and the keep amount on the fly, so we do not have to create a separate configuration for each case. The backups are easy to tell apart, since each type is placed in a separate folder and, on top of that, storage_id is attached to the backup filename. I was impressed by this approach.
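To check which bucket a given date falls into, the same rules can be pulled into a small function and tried on a few dates. This is only a sketch: `retention_for` is a hypothetical name, and the dates are arbitrary examples.

```ruby
# Returns [storage_id, keep] for a given time, mirroring the rules above:
# the first day of the month wins over Sunday, everything else is a daily backup.
def retention_for(time)
  if time.day == 1
    [:monthly, 6]
  elsif time.sunday?
    [:weekly, 3]
  else
    [:daily, 12]
  end
end

[
  Time.utc(2024, 9, 1),  # a Sunday that is also the 1st
  Time.utc(2024, 9, 8),  # a regular Sunday
  Time.utc(2024, 9, 10)  # a Tuesday
].each do |t|
  storage_id, keep = retention_for(t)
  puts "#{t.strftime('%Y-%m-%d')}: #{storage_id}, keep #{keep}"
end
```

One consequence of this ordering is worth knowing: when the 1st falls on a Sunday, only the monthly backup is written that day, so that week has no weekly backup.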
Compression
We can enable compression with a single line:
[ruby]
compress_with Bzip2
[/ruby]
If we need a custom configuration, we can pass a configuration block:
[ruby]
compress_with Bzip2 do |compression|
  compression.level = 9
end
[/ruby]
Keep in mind that you should first check how redundant the data in your database is. With low redundancy you might save only 10% of the space and lose 10-100 times more time on each backup. So be careful and check the compression efficiency of your first backup manually.
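A quick way to get a feeling for this is to compress a sample of the data and compare the sizes. The sketch below makes a few assumptions: it uses Zlib (gzip-style deflate) as a stand-in because it ships with Ruby while bzip2 does not, `compression_ratio` is a hypothetical helper, and both samples are synthetic, so the numbers only show the contrast between redundant and non-redundant data.

```ruby
require "zlib"

# Returns compressed size divided by original size (lower is better).
# Zlib (gzip-style deflate) stands in for bzip2 here because it ships with Ruby.
def compression_ratio(data, level = Zlib::BEST_COMPRESSION)
  return 1.0 if data.empty?

  Zlib::Deflate.deflate(data, level).bytesize.to_f / data.bytesize
end

# Two made-up samples: highly repetitive rows versus pseudo-random bytes.
redundant = "2024-01-01,user,active\n" * 10_000
random    = Random.new(42).bytes(200_000)

{ "redundant" => redundant, "random" => random }.each do |label, data|
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ratio   = compression_ratio(data)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format("%-9s ratio %.3f in %.4fs", label, ratio, elapsed)
end
```

For a real decision, run the same comparison on an actual dump of your database with the compressor and level you plan to use.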
Recurrent backups
It is recommended to use the backup gem in conjunction with the whenever gem, which exposes a Ruby DSL for defining recurrent tasks and exporting them to crontab. Here is an example of a schedule.rb file:
[ruby]
every 1.day, :at => '4:30 am' do
  command "backup perform -t my_full_backup"
end
[/ruby]
Nothing complex, right?
Email notifications
As before, the backup gem has all the functionality built in. All you need to do is configure the feature:
[ruby]
notify_by Mail do |mail|
  mail.on_success = false
  mail.on_warning = true
  mail.on_failure = true

  mail.from = "from"
  mail.to = "to"
  mail.reply_to = "[email protected]"
  mail.address = "smtp.sendgrid.net"
  mail.port = 587
  mail.domain = "test.com"
  mail.user_name = "user"
  mail.password = "password"
  mail.authentication = "plain"
  mail.encryption = :starttls
end
[/ruby]
I am sure you are familiar with all the configuration options above.
Conclusion
Take a look at your current backup solution and compare it with the one described above. Feel free to use the backup gem and help the community expand its functionality. If you have found another amazing solution that works even better, do not wait, ping me in the comments below!
P.S. Full configuration is way too large for the article, so take a look at the gist.