Information about VKontakte, the largest European social network, and its infrastructure is scarce and fragmented. The only recent insight in English into its technology is a BTI press release about VK's migration to their infrastructure. Everything was top secret.
Only in 2011, at Moscow HighLoad++, did Pavel Durov and Oleg Illarionov reveal something about the architecture of the social network; those insights are collected in this post (in Russian).
VK does not seem different from any other popular social network: it runs on a LAMP stack and uses many other open source technologies.
- Debian is the base for their custom Linux distro.
- nginx manages load balancing in front of Apache, which runs PHP using mod_php with XCache as the opcode cache.
- MySQL is the main datastore, but a custom DBMS (written in C and based on the memcached protocol) is used for some magic. memcached also helps with page caching.
- XMPP is used for messages and chats and runs on node.js. Availability is ensured by HAProxy, which handles node's fragility.
- Multimedia files are stored on XFS and media encoding is done with ffmpeg.
- Everything is distributed over more than 4 datacenters.
The main difference between VK and other social networks is in server roles: VK servers are multifunctional. There is no clear distinction between database servers and file servers; they are used in several roles simultaneously.
Load balancing between servers happens in layers: balancing at the DNS level, plus routing of requests within the system, where different servers are used for different types of requests.
For example, microblogging works on a tricky scheme that uses the memcached protocol's ability to request data for a large number of keys in parallel. If the data is not in the cache, the same request is sent to the storage system, and the results are sorted, filtered, and trimmed of excess at the PHP level.
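To make that flow concrete, here is a minimal sketch in Python (VK does this in PHP, and none of the names below come from the talk). The `wall:<id>` key format, the `FakeCache` stand-in for a memcached client, the post fields and the `load_from_storage` callback are all assumptions made for illustration.

```python
class FakeCache:
    """Hypothetical stand-in for a memcached client with a multi-key get."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def get_multi(self, keys):
        # Like a memcached multi-get: only the keys that are present come back.
        return {k: self._data[k] for k in keys if k in self._data}


def build_feed(friend_ids, cache, load_from_storage, limit=20):
    """Collect friends' posts with one multi-key cache request and a storage fallback."""
    keys = [f"wall:{fid}" for fid in friend_ids]  # assumed key format

    # 1. Ask the cache for every key in a single request.
    found = cache.get_multi(keys)

    # 2. Send the same request to storage for the keys the cache did not have.
    missing = [k for k in keys if k not in found]
    if missing:
        found.update(load_from_storage(missing))

    # 3. Filter, sort and discard the excess in application code.
    posts = [post for k in keys for post in found.get(k, [])]
    visible = [post for post in posts if not post["hidden"]]
    visible.sort(key=lambda post: post["ts"], reverse=True)
    return visible[:limit]
```

Only the keys that miss the cache reach storage, and all the ordering and trimming happens in application code, not in the datastore.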
The custom database is still a secret and is widely used at VKontakte. Many services use it: private messages, wall posts, statuses, search, privacy, friend lists and probably more. It uses a non-relational data model, and most operations are performed in memory. The access interface is an extended memcached protocol: specially composed keys return the results of complex queries. They said it was developed by the “best minds” of Russia.
I wasn’t able to find any other insights about VK's infrastructure after this talk. They are like the KGB 😀
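Since the real engine and its key syntax were never published, the following is only a guess at what "keys that return the results of complex queries" could look like. The `InMemoryEngine` class and the `friends:<id>` and `mutual:<a>:<b>` key formats are invented for this sketch; the only idea taken from the talk is that a memcached-style `get` on an in-memory, non-relational store can answer a query encoded in the key.

```python
class InMemoryEngine:
    """Hypothetical in-memory store whose GET keys encode queries."""

    def __init__(self):
        self._friends = {}  # user id -> set of friend ids

    def add_friendship(self, a, b):
        self._friends.setdefault(a, set()).add(b)
        self._friends.setdefault(b, set()).add(a)

    def get(self, key):
        """memcached-style GET: parse the key and compute the answer in memory."""
        op, *args = key.split(":")
        if op == "friends" and len(args) == 1:
            return sorted(self._friends.get(int(args[0]), set()))
        if op == "mutual" and len(args) == 2:
            a, b = (int(x) for x in args)
            return sorted(self._friends.get(a, set()) & self._friends.get(b, set()))
        return None  # unknown key: behaves like a cache miss


engine = InMemoryEngine()
engine.add_friendship(1, 2)
engine.add_friendship(1, 3)
engine.add_friendship(2, 3)
engine.add_friendship(2, 4)
mutual = engine.get("mutual:1:2")  # friends shared by users 1 and 2
```

The appeal of such a design is that any existing memcached client can talk to the engine without a new wire protocol.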