In previous post I analyzed Facebook main infrastructure. Now I’m going deeper into services.
Facebook is the biggest photo sharing service in the world and grows by several millions of images every week. Pre-2009 infrastructure uses three NFS tier. Also with some optimization this solution can’t easily scale over a few billions of images.
So in 2009 Facebook develop Haystack, an HTTP based photo server. It is composed by 5 layers: HTTP server, Photo Store, Haystack Object Store, Filesystem and Storage.
Storage is made on storage blades using a RAID-6 configuration who provides adequate redundancy and excellent read performance. The poor write performance is partially mitigated by the RAID controller NVRAM write-back cache. Filesystem used is XFS and manage only storage-blade-local files, no NFS is used.
Haystack Object Store is a simple log structured (append-only) object store containing needles representing the stored objects. A Haystack consists of two files:
the actual haystack store file containing the needles
plus an index file
Photo Store server is responsible for accepting HTTP requests and translating them to the corresponding Haystack store operations. It keeps an in-memory index of all photo offsets in the haystack store file. The HTTP framework we use is the simple evhttp server provided with the open source libevent library.
Insights and Sources
Facebook Messages and Chat
Facebook messaging system is powered by a system called Cell. The entire messaging system (email, SMS, Facebook Chat, and the Facebook Inbox) is divided into cells, and each cell contains only a subset of users. Every Cell is composed by a cluster of application server (where business logic is defined) monitored by different ZooKeeper instances.
Application servers use a data acces layer to communicate with metadata storage, an HBase based system (old messaging infrastructure relied on Cassandra) which contains all the informations related to messages and users.
Cells are the “core” of the system. To connect them to the frontend there are different “entry points”. An MTA proxy parses mail and redirect data to the correct application. Emails are stored in the same structure than photos: Haystack. There are also discovering service to map user-to-Cell (based on hashing) and service-to-Cell (based on ZooKeeper notifications) and everything expose an API.
There is a “dirty” cache based on Memcached to serve messages (from a local cache of datacenter) and social information about the users (like social indexes).
The search engine for messages is built using an inverted index stored in HBase.
Chat is based on an Epoll server developed in Erlang and accessed using Thrift and there is also a subsystem for logging chat messages (in C++). Both subsystems are clustered and partitioned for reliability and efficient failover.
Real-time presence notification is the most resource-intensive operation performed (not sending messages): keeping each online user aware of the online-idle-offline states of their friends. Real-time messaging is done using a variation of Comet, specifically XHR long polling, and/or BOSH.
Insights and Sources
Original Facebook search engine simply searches into cached users informations: friends list, like lists and so on. Typeahead search (the search box on the top of Facebook frontend) came on 2009. It try to suggest you most interesting results. Performances are really important and results must be available within 100ms. It has to be fast and scalable and the structure of the system is built as follow:
First attempts are still on browser cache where are stored informations about user (friends, like, pages). If cache misses starts and AJAX request.
Many Leaf services search for results inside theirs indexes (stored into an inverted index called Unicorn). When results references are fetched from the indexes, they are merged and loaded from global datastore. An aggregator provide a single channel to send data to client. Obviously query are cached.
On 2012 Facebook starts from the core part of typeahead search to build a new search tool. Unicorn is the core part of the new Graph Search. Formally is an in-memory inverted index which maps Facebook contents as a graph database would do and you can query it as graph traversing tool. To be used for Graph Search, Unicorn was updated to be more than a traversing tool. Now supports nested queries, scoring and support for different kind of resources. Results are aggregated on different levels.
Query lifecycle is usually made by 2 steps: the Suggestion Phase and the Search Phase.
The Suggestion Phase works like “autocomplete” and Is powered by a Natural Language Processing (NLP) module attempts to parse text based on a grammar. It identifies parts of the query as potential entities and passes these parts down to Unicorn to search for them.
The Search Phase begins when the searcher has made a selection from the suggestions. The parse tree, along with the fbids of the matched entities, is sent back to the Top Aggregator. A user readable version of this query is displayed as part of the URL.
Currently Graph Search is still in beta.
Insights and Sources
Resources and insights
This post and the previous one are based and inspired by the answer of Michaël Figuière to the following question on Quora: What is Facebook’s architecture?