How it works: YouPorn

YouPorn is one of the most visited porn site on the web. I applied for a job as developer in 2009 because of a post who talk about its technological stack.

I heard again about its infrastructure because, about an year ago, @ErikPickupYP spoke about the great switchover at CooFoo and @antirez tweet some details regarding datastore. The development team rewrote the entire site using Redis as primary database.

Original stack was based on Perl and Catalyst and powered the site from 2006 to 2011. After acquisition they rewrote the site using a well designed LAMP stack.

The chosen framework is Symfony2 (which uses Doctrine as ORM) running over nginx with PHP-FPM helped by Varnish (speed up requests, manage cache and check servers status) and HAProxy (load balance and health check of servers). Syslog-ng handle logs. They maintain two pools of servers: a write pool with a fail-over to backup-Master and a read pool will servers except the master.

Datastore is the most interesting part. Initially they used MySQL but more than 200 million of pageviews and 300K query per second are too much to be handled using only MySQL. First try was to add ActiveMQ to enqueue writes but a separate Java infrastructure is too expensive to be maintained. Finally they add Redis in front of MySQL and use it as main datastore.

Now all reads come from Redis. MySQL is used to allow the building new sorted sets as requirements change and it’s highly normalized because it’s not used directly for the site. After the switchover additional Redis nodes were added, not because Redis was overworked, but because the network cards couldn’t keep up with Redis 😀

Lists are stored in a sorted set and MySQL is used as source to rebuild them when needed. Pipelining allows Redis to be faster and Append-only-file (AOF) is an efficient strategy to easily backup data.

In the end YouPorn uses a LAMP stack “on-steroids” which smartly uses Redis and other modern middlewares.

Sources:

Virtual memory and Redis cluster: to scale Redis is hard

Redis is a RAM based key-value store. RAM is expensive. Hard disks (even SSD) are slow. It’s the truth, we know.

A few months ago people tried to use Redis instead of MySQL (or similar SQL DBs) as main datastore. When you do, is easy to clash against the memory limit. As we learned from operative systems, first solution is to use your disk space to “enlarge” your RAM. Redis versions from 2.0 to 2.6 offer a Virtual Memory implementation.

Virtual Memory seems to be really useful in many cases. If only a small part of your keys get the vast majority of accesses you can efficiently keep only that part of keys into RAM and leave the remaining part on disk.

To enable Virtual Memory you can switch it on using vm-enabled yes and set the memory limit using vm-max-memory. Additionally you can fine tune the configuration using vm-pages and vm-page-size for swap file and vm-max-threads for concurrency.

Anyway since version 2.4 Virtual Memory is deprecated. This is the official note about it:

Redis VM is now deprecated. Redis 2.4 will be the latest Redis version featuring Virtual Memory (but it also warns you that Virtual Memory usage is discouraged). We found that using VM has several disadvantages and problems. In the future of Redis we want to simply provide the best in-memory database (but persistent on disk as usual) ever, without considering at least for now the support for databases bigger than RAM. Our future efforts are focused into providing scripting, cluster, and better persistence.

The alternative is Redis cluster. It will be a “distributed and fault tolerant implementation of a subset of the features available in the Redis stand alone server”. At the moment is a work in progress. There are some client-side implementation (for Node.jsfor Ruby and more) but not yet an official, standalone version.

Virtual memory deprecation and Redis cluster long developing time make me think about a simple idea:

Redis is not ready to be the main datastore for a huge dataset, not yet. 

More about Redis scaling

[2013-03-09 UPDATE] @olinicola advises me a post by @antirez about to use Redis in memory and to swap on SSD. His response is the same:

TL;DR: the outcome of this test was expected and Redis is an in-memory system 🙂

Intersect a SET with a ZSET

Redis‘s SET and ZSET (sorted sets) are a really powerful structure. The only limits are about set operation you can perform. Using Redis you can’t obtain the intersection (or the union) between two sorted set or between a SET and a ZSET. You can use SINTER to intersect a group of SET or SUNION for union. Unfortunately there is no direct way for ZSET.

In our use case, we had to intersect a ZSET (a sorted rank) and a SET (a group of categorized items) to find the rank of the element inside selected category.

After a successful search on Google I found a way on StackOverflow (view below link): use ZINTERSTORE. It’s really simple: act like SINTER but store results into a new ZSET. It has a quite expensive memory footprint but is ok if you frequently reuse the result (is like a cache and you can set expire time using EXPIRE).

Source
http://stackoverflow.com/questions/10500695/redis-how-to-intersect-a-normal-set-with-a-sorted-set