I always like The Setup. Discover what kind of technologies, hardware and softwares other skilled people are using is extremely useful and really fun for me. This time I’d like to share some tips from the complete reboot I did to my personal ecosystem after switch to my new Macbook.

macbook_pro_13_retina

From the hardware side is a simple high-end 2015 Macbook Pro 13″ Retina with Intel Core i7 Haswell dual-core at 3,4GHz, 16GB of RAM and 1TB of SSD PCI Express 3.0. Is fast, solid, lightweight and flexible. The only required accessory is the Be.eZ LArobe Second Skin.

From the software side I decided to avoid Time Machine restore in order to setup a completely new environment. I started on a OS X 10.10 Yosemite fresh installation.

As polyglot developer I usually deal with a lot of different applications, programming languages and tools. In order to decide what top install, a list of what I had on the previous machine and what I need more was really useful.

Here is a list of useful software and some tips about the installation process.

Applications

paid_apps

Paid softwares worth having: Evernote (with Premium subscription and Skitch) and Todoist (with Premium subscription) both available on the Mac App Store. 1Password, Fantastical 2, OmniGraffle, Carbon Copy Cloner, Backblaze and Expandrive available on their own websites.

Free software worth having: Google Chrome and Mozilla Firefox as browser, Apache OpenOffice, Skype and Slack as chat, VLC for multimedia and Transmission for torrents.

app_from_suites

Suites or part of: Adobe Photoshop CC, Adobe Illustrator CC and Adobe Acrobat Pro DC are part of the Adobe Creative Cloud. Microsoft Word 2016, and Microsoft Excel 2016 are part of Microsoft Office 2016 for Mac (now in free preview). Apple Pages, and Apple Keynote are preinstalled as Apple iWork suite as well as Apple Calendar and Apple Contacts.

Development tools

Utilities for Power Users: Caffeine, Growl and HardwareGrowler, iStat Menu Pro, Disk Inventory X, Tor Browser and TrueCrypt 7.1a (you need to fix a little installation bug on OS X 10.10), Kinematic and Boot2Docker for Docker, Sublime Text 3 (with some additions like: Spacegray Theme, Soda Theme, a new icon, Source Code Pro font), Tower, Visual Studio Code, Android SDK (for Android emulator) and XCode (for iOS emulator), VirtualBox (with some useful Linux virtual images), iTerm 2.

CLI: OhMyZSH, Homebrew, GPG (installed using brew), XCode Command Line Tools (from Apple Developers website), Git (with git-flow installed using brew), AWS CLI (install via pip), PhantomJS, s3cmd and faster s4cmd, Heroku toolbelt and Openshift Client Tools (install via gem).

daemons

Servers: MariaDB 10.0 (brew), MongoDB 3.0 (brew), Redis 3.0 (brew), Elasticsearch 1.6 (brew), Nginx 1.8.0 (brew), PostgreSQL 9.4.2 (via Postgres.app), Hadoop 2.7.0 (brew), Spark 1.4 (download from official website), Neo4j 2.2 (brew), Accumulo 1.7.0 (download from official website), Crate 0.49 (download from official website), Mesos 0.22 (download from official website), Riak 2.1.1 (brew), Storm 0.9.5 (download from official website), Zookeeper 3.4.6 (brew), Sphinx 2.2 (brew), Cassandra 2.1.5 (brew).

languages

Programming languages: RVM, Ruby (MRI 2.2, 2.1, 2.0, 1.9.3, 1.8.7, REE 2012.02, JRuby 1.7.19 installed using RVM), PHP 5.6 with PHP-FPM (installed using brew), HHVM 3.7.2 (installed using brew with adding additional repo, has some issues on 10.10), Python 2.7 (brew python) and Python 3.4 (brew python3), Pip 7.1 (shipped with Python), NVM, Node.js 0.12 and IO.js 2.3 (both installed using NVM), Go 1.4.2 (from Golang website), Java 8 JVM (from Oracle website), Java 8 SE JDK (from Oracle website), Scala 2.11 (from Scala website), Clojure 1.6 (from Clojure website), Erlang 17.0 (brew), Haskell GHC 7.10 (brew), Haskell Cabal 1.22 (brew), OCaml 4.02.1 (brew), R 3.2.1 (from R for Mac OS X website), .NET Core and ASP.NET (brew using DNVM), GPU Ocelot (compiled with a lot of libraries).

Full reboot takes about 2 days. Some software are still missing but I was able to restart my work almost completely. I hope this list would be helpful for anyone 🙂

From the home page

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. […] Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.”

Introduction

Storm enables you to define a Topology (an abstraction of cluster computation) in order to describe how to handle data flow. In a topology you can define some Spouts (entry point for your data with basic preprocessing) and some Bolts (a single step of data manipulation). This simple strategy enable you to define a complex processing of streams of data.

Storm nodes are of two kinds: master and worker. Master node runs Nimbus, it is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures. Worker nodes run Supervisor. The Supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it. Everything is done through ZooKeeper.

Libraries

Resources

Books

In previous post I analyzed Facebook main infrastructure. Now I’m going deeper into services.

Facebook Images

Facebook is the biggest photo sharing service in the world and grows by several millions of images every week. Pre-2009 infrastructure uses three NFS tier. Also with some optimization this solution can’t easily scale over a few billions of images.

So in 2009 Facebook develop Haystack, an HTTP based photo server. It is composed by 5 layers: HTTP server, Photo Store, Haystack Object Store, Filesystem and Storage.

Storage is made on storage blades using a RAID-6 configuration who provides adequate redundancy and excellent read performance. The poor write performance is partially mitigated by the RAID controller NVRAM write-back cache. Filesystem used is XFS and manage only storage-blade-local files, no NFS is used.

Haystack Object Store is a simple log structured (append-only) object store containing needles representing the stored objects. A Haystack consists of two files:

the actual haystack store file containing the needles

haystack_content

plus an index file

haystack_index

Photo Store server is responsible for accepting HTTP requests and translating them to the corresponding Haystack store operations. It keeps an in-memory index of all photo offsets in the haystack store file. The HTTP framework we use is the simple evhttp server provided with the open source libevent library.

Insights and Sources

Facebook Messages and Chat

Facebook messaging system is powered by a system called Cell. The entire messaging system (email, SMS, Facebook Chat, and the Facebook Inbox) is divided into cells, and each cell contains only a subset of users. Every Cell is composed by a cluster of application server (where business logic is defined) monitored by different ZooKeeper instances.

Application servers use a data acces layer to communicate with metadata storage, an HBase based system (old messaging infrastructure relied on Cassandra) which contains all the informations related to messages and users.

Cells are the “core” of the system. To connect them to the frontend there are different “entry points”. An MTA proxy parses mail and redirect data to the correct application. Emails are stored in the same structure than photos: Haystack. There are also discovering service to map user-to-Cell (based on hashing) and service-to-Cell (based on ZooKeeper notifications) and everything expose an API.

There is a “dirty” cache based on Memcached to serve messages (from a local cache of datacenter) and social information about the users (like social indexes).

facebook_messages_architecture

The search engine for messages is built using an inverted index stored in HBase.

Chat is based on an Epoll server developed in Erlang and accessed using Thrift and there is also a subsystem for logging chat messages (in C++). Both subsystems are clustered and partitioned for reliability and efficient failover.

Real-time presence notification is the most resource-intensive operation performed (not sending messages): keeping each online user aware of the online-idle-offline states of their friends. Real-time messaging is done using a variation of Comet, specifically XHR long polling, and/or BOSH.

Insights and Sources

Facebook Search

Original Facebook search engine simply searches into cached users informations: friends list, like lists and so on. Typeahead search (the search box on the top of Facebook frontend) came on 2009. It try to suggest you most interesting results. Performances are really important and results must be available within 100ms. It has to be fast and scalable and the structure of the system is built as follow:

typeahead_search

First attempts are still on browser cache where are stored informations about user (friends, like, pages). If cache misses starts and AJAX request.

Many Leaf services search for results inside theirs indexes (stored into an inverted index called Unicorn). When results references are fetched from the indexes, they are merged and loaded from global datastore. An aggregator provide a single channel to send data to client. Obviously query are cached.

On 2012 Facebook starts from the core part of typeahead search to build a new search tool. Unicorn is the core part of the new Graph Search. Formally is an in-memory inverted index which maps Facebook contents as a graph database would do and you can query it as graph traversing tool. To be used for Graph SearchUnicorn was updated to be more than a traversing tool. Now supports nested queries, scoring and support for different kind of resources. Results are aggregated on different levels.

unicorn

Query lifecycle is usually made by 2 steps: the Suggestion Phase and the Search Phase.

The Suggestion Phase works like “autocomplete” and Is powered by a Natural Language Processing (NLP) module attempts to parse text based on a grammar. It identifies parts of the query as potential entities and passes these parts down to Unicorn to search for them.

The Search Phase begins when the searcher has made a selection from the suggestions. The parse tree, along with the fbids of the matched entities, is sent back to the Top Aggregator. A user readable version of this query is displayed as part of the URL.

Currently Graph Search is still in beta.

Insights and Sources

Resources and insights

This post and the previous one are based and inspired by the answer of Michaël Figuière to the following question on Quora: What is Facebook’s architecture?

Additional stuff:

Facebook has one of the biggest hardware infrastructure in the world with more than 60.000 servers. But servers are worthless if you can’t efficiently use them. So how does Facebook works?

Facebook born in 2004 on a LAMP stack. A PHP website with MySQL datastore. Now is powered by a complex set of interconnected systems grown over the original stack. Main components are:

  1. Frontend
  2. Main persistence layer
  3. Data warehouse layer
  4. Facebook Images
  5. Facebook Messages and Chat
  6. Facebook Search

1. Frontend

hiphopThe frontend is written in PHP and compiled using HipHop (open sourced on github) and g++. HipHop converts PHP in C++ you can compile. Facebook frontend is a single 1,5 GB binary file deployed using BitTorrent. It provide logic and template system. Some parts are external and use HipHop Interpreter (to develop and debug) and HipHop Virtual Machine to run HipHop “ByteCode”.

Is not clear which web server they are using. Apache was used in the beginning and probably is still used for main fronted. In 2009 they acquired (and open sourced) Tornado from FriendFeed and are using it for Real-Time updates feature of the frontend. Static contents (such as fronted images) are served through Akamai.

Everything not in the main binary (or in the ByteCode source) is served using using Thrift protocol. Java services are served without Tomcat or Jetty. Overhead required by these application server is worthless for Facebook architecture. Varnish is used to accelerate responses to HTTP requests.

To speed up frontend rendering they use BigPipe, a sort of mod_pagespeed. After a HTTP request servers fetch data and build HTML structure. HTML is sent before data retrieval and is rendered by the browser. After render, Javascript retrieve and display data (already available) in asynchronous way.

memcachedThe Memcached infrastructure is based on more than 800 servers with 28TB of memory. They use a modded version of Memcached which reimplement the connection pool buffer (and rely on some mods on the network layer of Linux) to better manage hundred of thousand of connections.

Insights and Sources:

2. Main Persistence layer

Facebook rely on MySQL as main datastore. I know, I can’t believe it too…

mysql_clusterThey have one of the largest MySQL Cluster deploy in the world and use the standard InnoDB engine. As many other people they designd the system to easìly handle MySQL failure, they simply enforce backup. Recently Facebook Engineers published a post about their 3-level backup stack:

  • Stage 1: each node (all the replica set) of cluster has 2 rack backup. One of these backup binary log (to speed up slave promotion) and one with mysqldump snapshot taken every night (to handle node failure).
  • Stage 2: each binlog and backup are copied in a larger Hadoop cluster mapped using HDFS. This layer provide a fast and distributed source of data useful to restore nodes also in differenti locations.
  • Stage 3: provide a long-term storage copying data from Hadoop cluster to a discrete storage in a separate region.

Facebook collect several statistics about failure, traffic and backups. These statistics contribute to build a “score” of a node. If a node fails or has a score too low an automatic system provide to restore it from Stage 1 or Stage 2. They don’t try to avoid problem, they simply handle them as fast as possible.

Insights and Sources:

3. Data warehouse layer

hadoopMySQL data is also moved to a “cold” storage to be analyzed. This storage is based on Hadoop HDFS (which leverage on HBase) and is queried using Hive. Data such as logging, clicks and feeds transit using Scribe and are aggregating and stored in Hadoop HDFS using Scribe-HDFS.

Standard Hadoop HDFS implementation has some limitations. Facebook adds a different kind of NameNodes called AvatarNodes which use Apache ZooKeeper to handle node failure. They also adds RaidNode that use XOR-parity files to decrease storage usage.

hadoop_cluster

Analysis are made using MapReduce. Anyway analyze several petabytes of data is hard and standard Hadoop MapReduce framework became a limit on 2011. So they developed Corona, a new scheduling framework that separates cluster resource management from job coordination. Results are stored into Oracle RAC (Real Application Cluster).

Insights and Sources:

In the next post I’m going to analyze the other Facebook services: Images, Message and Search.