YouPorn is one of the most visited porn sites on the web. I applied for a job there as a developer in 2009 because of a post that talked about its technology stack.
I heard about its infrastructure again because, about a year ago, @ErikPickupYP spoke about the great switchover at ConFoo and @antirez tweeted some details about the datastore. The development team rewrote the entire site using Redis as the primary database.
The original stack was based on Perl and Catalyst and powered the site from 2006 to 2011. After the acquisition they rewrote the site on a well-designed LAMP stack.
The chosen framework is Symfony2 (which uses Doctrine as its ORM) running on nginx with PHP-FPM, helped by Varnish (to speed up requests, manage the cache and check server status) and HAProxy (for load balancing and server health checks). Syslog-ng handles the logs. They maintain two pools of servers: a write pool with fail-over to a backup master, and a read pool with all servers except the master.
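The two-pool rule can be sketched in a few lines. This is only a model of the routing logic under my own assumptions: in the real setup HAProxy does this job, and the `Server` struct, the `PoolRouter` class and the server names are hypothetical.

```ruby
# Hypothetical model of the two server pools; names and health flags are
# assumptions for illustration, not the real configuration.
Server = Struct.new(:name, :healthy)

class PoolRouter
  def initialize(master:, backup_master:, replicas:)
    @write_pool = [master, backup_master]     # master first, backup as fail-over
    @read_pool  = [backup_master] + replicas  # every server except the master
    @next_read  = 0
  end

  # Writes go to the first healthy server in the write pool.
  def server_for_write
    @write_pool.find(&:healthy) || raise("no writable server available")
  end

  # Reads rotate round-robin over the healthy servers of the read pool.
  def server_for_read
    candidates = @read_pool.select(&:healthy)
    raise "no readable server available" if candidates.empty?
    server = candidates[@next_read % candidates.size]
    @next_read += 1
    server
  end
end

master  = Server.new("master", true)
backup  = Server.new("backup-master", true)
replica = Server.new("replica-1", true)
router  = PoolRouter.new(master: master, backup_master: backup, replicas: [replica])

puts router.server_for_write.name  # the master while it is healthy
master.healthy = false             # simulate a failed health check
puts router.server_for_write.name  # now the backup master
```

Keeping the master out of the read pool means read traffic never competes with writes on the same machine.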
The datastore is the most interesting part. Initially they used MySQL, but more than 200 million pageviews and 300K queries per second are too much to handle with MySQL alone. The first attempt was to add ActiveMQ to enqueue writes, but a separate Java infrastructure was too expensive to maintain. Finally they added Redis in front of MySQL and used it as the main datastore.
Now all reads come from Redis. MySQL is kept so that new sorted sets can be built as requirements change, and it is highly normalized because the site never queries it directly. After the switchover additional Redis nodes were added, not because Redis was overworked, but because the network cards couldn’t keep up with Redis 😀
Lists are stored in sorted sets, and MySQL is the source used to rebuild them when needed. Pipelining makes Redis faster by sending several commands in one round trip, and the append-only file (AOF) is an efficient way to back up data easily.
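To make the "rebuild from MySQL" idea concrete, here is a minimal sketch that uses in-memory stand-ins instead of real servers. The `ROWS` array plays the normalized MySQL table and a Hash of member => score plays a Redis sorted set; the column names and helper methods are my own assumptions.

```ruby
# In-memory stand-ins: ROWS plays the normalized MySQL table, a Hash of
# member => score plays a Redis sorted set. All names are hypothetical.
ROWS = [
  { id: 101, views: 5_000, rating: 4.1 },
  { id: 102, views: 9_500, rating: 3.7 },
  { id: 103, views: 1_200, rating: 4.8 },
]

# Rebuild a sorted set from the source rows, scoring by the given column.
def build_sorted_set(rows, score_column)
  rows.each_with_object({}) { |row, zset| zset[row[:id]] = row[score_column] }
end

# Rough equivalent of ZREVRANGE key 0 count-1: highest scores first.
def top(zset, count)
  zset.sort_by { |_member, score| -score }.first(count).map(&:first)
end

most_viewed = build_sorted_set(ROWS, :views)
top_rated   = build_sorted_set(ROWS, :rating)  # new requirement: just rebuild

puts top(most_viewed, 2).inspect  # [102, 101]
puts top(top_rated, 2).inspect    # [103, 101]
```

A new listing only needs another pass over the source rows, which is why the relational copy can stay normalized and off the hot path.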
In the end, YouPorn uses a LAMP stack “on steroids” that makes smart use of Redis and other modern middleware.
Recently I needed to pick the best hosted service for a few datastores to use in a large and complex project. Starting from Heroku and AppFog’s add-ons, I found many free plans that are useful for testing a service and/or for production use if your app is small enough (for example, this blog runs on Heroku PostgreSQL’s Dev plan). Here is the list:
- MySQL – Xeround (Starter plan): 5 connections and 10 MB of storage
- MySQL – ClearDB (Ignite plan): 10 connections and 5 MB of storage
- MongoDB – MongoHQ (Sandbox): 50 MB of memory, 512 MB of data
- MongoDB – MongoLab (Starter plan): 496 MB of storage
- Redis – RedisToGo (Nano plan): 5 MB, 1 DB, 10 connections and no backups
- Redis – RedisCloud by Garantia Data: 20 MB, 1 DB, 10 connections and no backups
- Redis – MyRedis (Gratis plan): 5 MB, 1 DB, 3 connections and no backups
- CouchDB – IrisCouch (up to $5): no limits, usage fees for HTTP requests and storage
- CouchDB – Cloudant (Oxygen plan): 150,000 requests, 250 MB of storage
- PostgreSQL – Heroku PostgreSQL (Dev plan): 20 connections, 10,000 rows of data
- Cassandra – Cassandra.io (Beta on Heroku): 500 MB and 50 transactions per second
- Riak – RiakOn! (Sandbox): 512 MB of memory
- Hadoop – Treasure Data (Nano plan): 100 MB (compressed), data retention for 90 days
- Neo4j – Heroku Neo4j (Heroku AddOn beta): 256 MB of memory and 512 MB of data
- OrientDB – NuvolaBase (Free): 100 MB of storage and 100,000 records
- TempoDB – TempoDB Hosted (Development plan): 50,000,000 data points, 50 series
- JustOneDB – Heroku JustOneDB (Lambda plan): 50 MB of data
In the beginning there was RediSQL, a “Hybrid Relational-Database/NOSQL-Datastore written in ANSI C”. After a while they changed the name to AlchemyDB.
Everything is built on top of Redis, adding a Lua interpreter to implement a really interesting technique they call Datastore-Side-Scripting. It is like using stored procedures: the logic is put into the datastore. They can achieve many different goals with this technique:
- Implement a SQL-like language using Lua to decode SQL requests
- Implement many datatypes not supported by Redis, using Lua to fit the new structures into common Redis types
- Serve content (like web pages or JSON data) directly from the datastore using a REST API.
- Implement a GraphDB using SQL for Index and Lua for graph-traversal logic.
- Implement a document-oriented model using Lua
- Implement an ObjectDB using Lua
Last year Citrusleaf acquired AlchemyDB, and Russ Sullivan (the guy behind AlchemyDB) is incrementally porting its functionality to run on top of Citrusleaf’s proven distributed, high-availability, linearly scalable key-value store: Aerospike. It is a distributed NoSQL database, the first solution to claim ACID support, with an extremely fast architecture optimized to run on SSDs.
I haven’t tested it yet, but as far as I can see they provide an SDK for the most popular programming languages. The Ruby one requires a native library. To start, you need to add a node:
c = Citrusleaf.new
c.add_node "10.1.1.6", 3000
Write, read and delete operations are done as follows:
# Writing Values
c.put 'namespace', 'myset', 'mykey', 'bin_name', value
# Reading Values
rv, gen, value = c.get 'namespace', 'myset', 'mykey', 'binx'
# Deleting Values
c.delete 'namespace', 'myset', 'mykey'
The documentation isn’t useful yet. The only way to understand whether it is cool or not is to test it. That’s what I’ll do.
Redis is widely used in the projects I work on every day at @thefool_it. My knowledge of it is really poor, so I decided to bring my experience up to a PRO level. I understand the basic Redis concepts because I worked with memcached in the past, and the differences were clearly explained in “Seven Databases in Seven Weeks”. My weaknesses are in everyday use: setup, administration, querying 🙁
Introduction and setup.
Installation is really easy because you can compile from source. On OS X you also have port (MacPorts) with an up-to-date package. Updating isn’t so easy: the standard way is to start the updated version on another port and migrate the data.
Data types are: Strings, Lists (ordered lists of strings), Hashes, Sets (no duplicate values) and Sorted Sets (sets ordered by a score attached to each member).
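As a rough mental model, and not as a description of how Redis stores anything, the five types map onto plain Ruby structures like this. The variable names and sample values are made up for illustration.

```ruby
require 'set'

# Plain-Ruby analogies for the five Redis data types (illustrative only).
greeting = "hello"                          # String: a single value
queue    = ["a", "b", "a"]                  # List: ordered, duplicates allowed
fields   = { "name" => "redis" }            # Hash: field => value
tags     = Set.new(["a", "b", "a"])         # Set: duplicates are dropped
scores   = { "alice" => 30, "bob" => 10 }   # Sorted Set: member => score

# A sorted set is always read back ordered by score, lowest first.
ranking = scores.sort_by { |_member, score| score }.map(&:first)

puts queue.size       # 3, the list keeps the duplicate
puts tags.size        # 2, the set drops it
puts ranking.inspect  # ["bob", "alice"]
```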
The standard distribution comes with a command line interface: redis-cli. There are client libraries for the most common environments and programming languages, such as Node.js (node_redis), Python (redis-py), Ruby (redis-rb) and more.
In the coming weeks I’m going to practice commands and admin techniques using the following resources.
by Tiago Macedo, Fred Oliveira
Other interesting sources
Redis’s SET and ZSET (sorted set) are really powerful structures. The only limits are in the set operations you can perform. Redis has no command that directly returns the intersection (or the union) of two sorted sets, or of a SET and a ZSET. You can use SINTER to intersect a group of SETs and SUNION for their union. Unfortunately there is no direct equivalent for ZSETs.
In our use case, we had to intersect a ZSET (a sorted rank) with a SET (a group of categorized items) to find the rank of an element inside the selected category.
After a successful search on Google I found a way on StackOverflow (see the link below): use ZINTERSTORE. It’s really simple: it acts like SINTER but stores the result in a new ZSET. It has a fairly expensive memory footprint, but that is fine if you reuse the result frequently (it works like a cache, and you can set an expiry time using EXPIRE).
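The sketch below models that intersection in pure Ruby so it runs without a server. It is not a Redis client: a Hash of member => score stands in for the ZSET, a `Set` for the SET, and the key names and helper methods are hypothetical. It assumes, as far as I know, that ZINTERSTORE treats members of a plain SET as having score 1, which is why a weight of 0 on the SET keeps only the original ranking score.

```ruby
require 'set'

# Pure-Ruby model of the idea, not a Redis client. Key names are hypothetical.
ranking  = { "item:1" => 50.0, "item:2" => 80.0, "item:3" => 20.0, "item:4" => 65.0 }
category = Set.new(["item:1", "item:3", "item:4", "item:9"])

# Mimics: ZINTERSTORE dest 2 ranking category WEIGHTS 1 0 (default AGGREGATE SUM).
# SET members are assumed to count with score 1, so weight 0 on the SET
# leaves only the original ranking score in the result.
def zinterstore(zset, set, zset_weight: 1, set_weight: 0)
  zset.each_with_object({}) do |(member, score), dest|
    next unless set.include?(member)
    dest[member] = score * zset_weight + 1 * set_weight
  end
end

# Mimics ZREVRANK: 0-based position when ordered by descending score.
def zrevrank(zset, member)
  zset.sort_by { |_m, score| -score }.map(&:first).index(member)
end

category_ranking = zinterstore(ranking, category)
puts category_ranking.inspect               # only items 1, 3 and 4 survive
puts zrevrank(category_ranking, "item:1")   # 1 (second place in the category)
```

With a real server the stored result would be a new key, which is where the memory cost and the EXPIRE-based cleanup come in.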