
I don’t know if the expression “Digital Data Hypochondriac” is clear enough. These are people who are constantly worried about losing all their digital data. Every time any component of their digital ecosystem doesn’t work as expected, they get scared.

I’m one of them. In 2000, when I was 15, my huge 14 GB hard disk contained almost all of my digital life. One day the HD controller burned out (literally, because of a short circuit) and I lost everything. EVERYTHING 🙁


Since that day I have been a “hypochondriac” about backups. Now, working on multiple huge projects and living as a digital commuter, I need a safe backup strategy able to handle almost any problem. Here is what I do.

My digital ecosystem is made of different “objects”: MacBook, iPhone, iPad and several online services (iCloud, Google Documents, Gmail, Dropbox, Evernote, Todoist, 1Password, …).

For mobile devices and online services, an online backup is available directly from the provider (iCloud for Apple devices and applications, Google Drive for Google services, …). Still, to be safe, I take a snapshot of these services and devices twice a month.

My aggregated digital footprint is about 550 GB.

Every single byte is on my MacBook. Obviously, this makes my notebook a fucking single point of failure, and that is not a good choice. My notebook backup strategy works on three different levels.

Apple Time Machine


Every day, while I work, I run Time Machine to back up data to a WD My Passport Ultra 1TB. I chose this disk because it uses a single platter, so hardware failures are less frequent. Backups happen incrementally about every hour, and you can restore a single file from the versioned snapshots.

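If you prefer the command line, macOS ships with tmutil, which can point Time Machine at the external disk and trigger a backup on demand. A minimal sketch, assuming the disk is mounted at /Volumes/My Passport (the volume name is an assumption):

# Set the external drive as the Time Machine destination (volume name is an example)
sudo tmutil setdestination "/Volumes/My Passport"

# Start an incremental backup right now and wait until it finishes
sudo tmutil startbackup --block

# List the versioned snapshots available for restore
tmutil listbackups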

Backblaze


Unfortunately, while external hard disks are safe, they are a piece of hardware I usually take with me while commuting, and anything could happen: I could lose my bag, someone could steal it, or I could break the disk by hitting something. Online backup is a great complement, and Backblaze is a great provider. You just need to install their agent and generate an encryption key, and the backup starts. The backup is encrypted client side, so your data isn’t vulnerable to man-in-the-middle attacks and is safe on their US-based servers. If you lose your data you can download it as a zip or buy a hard drive, shipped to you, containing your snapshot. 30 days of history of your files are available.

Carbon Copy Cloner

You might assume you’re safe with Time Machine and Backblaze, but they are both incremental backups: after 30 days, or when the hard disk is full, old data is deleted. This is fine almost every time, except when you absolutely need that old data. To be safe, it’s better to take a monthly snapshot using Carbon Copy Cloner. It creates a mountable snapshot of the disk that you can easily access. A 5TB hard drive can retain 9 or 10 months of historical data, and you can archive the snapshots in an easy way.

Amazon Glacier

Three levels of backup are definitely enough. However, some kinds of data are really important to me: for my personal photos and the documents related to my family, health and house, it’s better to have a long-term copy. Amazon Glacier is the right place for that. I pack these files by month, TAR them, calculate the checksum, then upload them to a given bucket on S3 and configure a lifecycle rule to archive them to Glacier after 1 day. Pricing is between $0.70 and $1.10 a month for 100 GB.
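
A minimal sketch of that monthly routine using the AWS CLI; the bucket name, prefix and file names are assumptions:

# Pack one month of files and keep a checksum next to the archive
tar -cf photos-2016-01.tar photos/2016-01/
shasum -a 256 photos-2016-01.tar > photos-2016-01.tar.sha256

# Upload archive and checksum to the S3 bucket (bucket and prefix are examples)
aws s3 cp photos-2016-01.tar s3://my-archive-bucket/archive/
aws s3 cp photos-2016-01.tar.sha256 s3://my-archive-bucket/archive/

# Lifecycle rule: move everything under archive/ to Glacier after 1 day
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-to-glacier",
      "Filter": { "Prefix": "archive/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 1, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-archive-bucket --lifecycle-configuration file://lifecycle.json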

About a month ago I wrote about how cool it was to migrate to HHVM on OpenShift. The custom cartridge for Nginx + HHVM and MariaDB was running fast and I was really excited about the new stack. I was landing in a new, beautiful world.

About a week later I faced some problems because of disk space. My files took only 560MB out of 1GB, but the OpenShift shell gave me an error for 100% disk usage (and a nice blank page on the home page because the cache couldn’t be written). I wasn’t able to understand why it was giving me that error. It probably depended on log files written by the custom cartridge somewhere else in the filesystem. No idea. Anyway, I had no time to dig deeper, so I bought 1 more GB of storage.
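
In hindsight, a quick way to hunt down that kind of mystery usage is to compare what df reports with a per-directory du. A minimal sketch (the path is an assumption):

# Overall usage per filesystem
df -h

# Largest directories under the application root (path is an example)
du -xh --max-depth=2 ~/app-root 2>/dev/null | sort -h | tail -20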

The day after I bought the storage, the blog’s speed went down. It was almost impossible to open the blog, and CloudFlare gave me a timeout for half of the requests. Blog visits started to fall and I had no idea how to fix that. Some weeks later I discovered some troubles with My Corderwall Badges and Simple Sharer Button Adder but, in the OpenShift environment, I had no external caching system to handle this kind of problem.

I didn’t want to go back to MySQL and Apache, but trashing all my articles wasn’t fun either, so I chose something I had rejected 3 years ago: I took a standalone server.


The first choice was Scaleway. It’s trendy and it’s bare metal: 3.5€ for a 4-core ARM server (a very hipster choice) with 2 GB RAM and a 50 GB SSD. The new interface is cool, better than Linode and DigitalOcean, and servers and resources are managed easily. Unfortunately HHVM is still experimental on ARM, and the SSDs are on a SAN and aren’t that fast (100 MB/s).

The next choice was OVH. The new 2016 VPS SSD (available in Canadian datacenters) is cheap enough ($3.5) and offers a virtual Xeon core with 2 GB RAM and a 10 GB SSD. Multicore performance is lower and you get a lot less storage, but it’s an x86-64 architecture and the SSD is faster (250 MB/s). I took this one!
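
Those throughput figures are easy to sanity-check; a minimal sketch of a sequential write test with dd (the test file path and size are assumptions):

# Write 1 GB sequentially and flush to disk before dd reports the rate
dd if=/dev/zero of=/tmp/dd-test bs=1M count=1024 conv=fdatasync
rm /tmp/dd-test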

Unfortunately my preferences haven’t changed since my first post: I’m still a developer, not a sysadmin. I’m not a master of Linux configuration, my stack has several moving parts, and my blog was still unavailable. My beautiful migration to cutting-edge technologies became an emergency landing.

Luckily I found several online tutorials which explain how to master the WordPress stack. Over the next few days I completed the migration, and my new stack now runs on Pound, Varnish, Nginx, HHVM, PHP-FPM and MariaDB. I hope to have enough time in the coming days to publish all the useful stuff I used for the configuration.

For the moment, I’m proud to share the average response time of the home page: 342ms 🙂
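
For what it’s worth, a simple way to measure that kind of number is curl’s timing variables; a minimal sketch (the URL and the number of requests are assumptions):

# Average total time over 20 requests to the home page (URL is an example)
for i in $(seq 1 20); do
  curl -o /dev/null -s -w '%{time_total}\n' https://example.com/
done | awk '{ sum += $1 } END { printf "avg: %.0f ms\n", sum / NR * 1000 }'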

Yesterday I had to re-deploy the WordPress installation of PrimeGap.net on a new server and, looking for some tips about configuration, I found a strange new buzzword: the LEMP stack.

We all know the LAMP stack, and we all know it’s old, slow and hard to scale. It includes any distribution of Linux, Apache with PHP as a module, and MySQL 5.x.

A LEMP stack is a bit different. First of all it uses nginx (pronounced “engine x”), which explains the “E”. Then you can replace MySQL with any of its forks: I personally use MariaDB 10.0, and many people also use Percona.

You can also replace PHP with another language such as Python or Ruby, but if you stick with PHP, choose PHP-FPM.
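
On a Debian/Ubuntu box, getting the basic LEMP pieces in place is just a handful of packages; a minimal sketch (package names are assumptions and vary by distribution and release):

# nginx, MariaDB and PHP-FPM (exact package names depend on your release)
sudo apt-get update
sudo apt-get install nginx mariadb-server php-fpm php-mysql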

Many hosting providers publish useful guides to set up your server:

Linode is a bit different and uses PHP-FastCGI. Both use MySQL. If you, like me, prefer MariaDB, the following guides should help you:

The current version of WordPress is easy to run on this stack. The WordPress Codex provides a custom configuration for nginx, and there are many optimizations you can apply. This Gist seems well done: https://gist.github.com/tjstein/902803

Welcome to the next-gen 🙂

When your WordPress blog gets bigger and traffic grows, you have to handle caching. No excuses: it is a fast, cheap and effective way to deal with traffic problems.

W3 Total Cache and WP Super Cache are the most popular plugins for handling cache on WordPress. They work well, but I don’t like them: they are a good choice only if you don’t have many customizations or you have some hardware constraints. Recently I had many problems with both of them, because of other caching systems and some problems on the server.

Now, if I have to activate caching on a production stack made of Apache + PHP-CGI + WordPress, I certainly choose nginx.

Using nginx you can put a fully customized cache layer in front of your website. Unfortunately WordPress doesn’t support nginx caching natively; you need a plugin to deal with it: the WordPress Nginx proxy cache integrator. The code is below:

function add_xaccel_header() {
    // Set the X-Accel-Expires header to never cache the page
    // if it looks like the page needs to be tailored
    // for a user.
    $user_cookie_there = false;
    foreach ($_COOKIE as $key => $value) {
        if (preg_match('/wordpress_(?!test_cookie)|comment_author|wp-postpass/', $key)) {
            $user_cookie_there = true;
        }
    }
    if ($user_cookie_there) {
        header("X-Accel-Expires: 0");
    }
}
add_action('init', 'add_xaccel_header');

The plugin looks at the WordPress cookies and sets the X-Accel-Expires header so nginx can discriminate between logged-in and guest users and avoid caching pages for authors. The default behavior for nginx is to not cache any page if a cookie is set.

After installing this plugin, you can put nginx in front of Apache. The plugin’s authors suggest an optimal configuration, but I prefer a simpler one written by my friend @dani_viga on his blog.

[UPDATE 2013-03-05] Look at the post (finally public :))
http://vfamilyserver.org/blog/2013/02/wordpress-caching-with-nginx/
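
For reference, the rough shape of such a front-end cache looks like the sketch below. This is not the configuration from the post above: the upstream port, server name, cache path and zone name are all assumptions.

sudo tee /etc/nginx/conf.d/wp-proxy-cache.conf > /dev/null <<'EOF'
# Cache zone on disk (path and sizes are examples)
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=wpcache:10m max_size=512m inactive=60m;

server {
    listen 80;
    server_name example.com;

    location / {
        # Apache is assumed to listen on 127.0.0.1:8080
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Cache successful responses; the X-Accel-Expires: 0 header set by
        # the plugin above overrides this and skips the cache for logged-in users.
        proxy_cache wpcache;
        proxy_cache_valid 200 10m;
    }
}
EOF

sudo nginx -t && sudo service nginx reload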

In addition, to speed up IO (the server’s NAS has some latency problems), we use a ramdrive to store the cache files. We also had to disable mod_deflate to solve some problems that show up when gzip compression is enabled before nginx.
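
A ramdrive for the cache directory can be as simple as a tmpfs mount; a minimal sketch (mount point and size are assumptions, and remember the cache is lost on reboot):

# Mount a 256 MB tmpfs over the nginx cache directory (size and path are examples)
sudo mount -t tmpfs -o size=256m tmpfs /var/cache/nginx

# Or make it permanent via /etc/fstab
echo 'tmpfs /var/cache/nginx tmpfs size=256m 0 0' | sudo tee -a /etc/fstab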

Now the average response time is under 100 ms, static content is served without touching Apache, and there is no potentially problematic plugin to manage. And purging the cache takes less than 3 seconds. 😀

Many high-traffic web applications take advantage of caching systems. HTML caching is easy and powerful. IMHO the best solution is to serve it using something like nginx or Varnish, but many people use custom solutions (for example a WordPress plugin) which produce an HTML snapshot of the page and save it to disk.

If your website is quite large, in a flash you end up with a huge amount of small files stored on your disk. Cleaning the cache is not as easy as you would hope.

The first solution is to use:

rm -Rf [path]

But this is dangerous, because the IO wait is terrible (even with an SSD) and the load average of your machine rises up to 50 in a minute.

find can help you:

find [path] -type f -print -delete

Unfortunately, after a while, the load average rises again. The rise is slower, but it is still dangerous.

The solution is to use ionice. It lets you limit the IO priority of your process and avoid the load average rising.

ionice -c 3 find [path] -type f -print -delete

Thanks to @dani_viga for the tip! He saved my day 🙂

More about ionice