I have always liked The Setup. Discovering what kinds of technology, hardware and software other skilled people are using is extremely useful and really fun for me. This time I’d like to share some tips from the complete reboot of my personal ecosystem that I did after switching to my new MacBook.


On the hardware side, it is a simple high-end 2015 MacBook Pro 13″ Retina with an Intel Core i7 Haswell dual-core at 3.4GHz, 16GB of RAM and a 1TB PCI Express 3.0 SSD. It is fast, solid, lightweight and flexible. The only required accessory is the Be.eZ LArobe Second Skin.

On the software side, I decided to skip the Time Machine restore in order to set up a completely new environment. I started from a fresh installation of OS X 10.10 Yosemite.

As a polyglot developer, I usually deal with a lot of different applications, programming languages and tools. To decide what to install, it was really useful to have a list of what I had on the previous machine and what else I needed.

Here is a list of useful software and some tips about the installation process.

Applications


Paid software worth having: Evernote (with Premium subscription and Skitch) and Todoist (with Premium subscription), both available on the Mac App Store; 1Password, Fantastical 2, OmniGraffle, Carbon Copy Cloner, Backblaze and ExpanDrive, available on their own websites.

Free software worth having: Google Chrome and Mozilla Firefox as browsers, Apache OpenOffice, Skype and Slack for chat, VLC for multimedia and Transmission for torrents.


Suites, or parts of them: Adobe Photoshop CC, Adobe Illustrator CC and Adobe Acrobat Pro DC are part of Adobe Creative Cloud. Microsoft Word 2016 and Microsoft Excel 2016 are part of Microsoft Office 2016 for Mac (now in free preview). Apple Pages and Apple Keynote come preinstalled as part of the Apple iWork suite, as do Apple Calendar and Apple Contacts.

Development tools

Utilities for power users: Caffeine, Growl and HardwareGrowler, iStat Menu Pro, Disk Inventory X, Tor Browser and TrueCrypt 7.1a (you need to fix a little installation bug on OS X 10.10), Kitematic and Boot2Docker for Docker, Sublime Text 3 (with some additions like the Spacegray theme, the Soda theme, a new icon and the Source Code Pro font), Tower, Visual Studio Code, Android SDK (for the Android emulator) and Xcode (for the iOS emulator), VirtualBox (with some useful Linux virtual images), iTerm2.

CLI: Oh My Zsh, Homebrew, GPG (installed using brew), Xcode Command Line Tools (from the Apple Developer website), Git (with git-flow, installed using brew), AWS CLI (installed via pip), PhantomJS, s3cmd and the faster s4cmd, Heroku Toolbelt and OpenShift Client Tools (installed via gem).

daemons

Servers: MariaDB 10.0 (brew), MongoDB 3.0 (brew), Redis 3.0 (brew), Elasticsearch 1.6 (brew), Nginx 1.8.0 (brew), PostgreSQL 9.4.2 (via Postgres.app), Hadoop 2.7.0 (brew), Spark 1.4 (download from official website), Neo4j 2.2 (brew), Accumulo 1.7.0 (download from official website), Crate 0.49 (download from official website), Mesos 0.22 (download from official website), Riak 2.1.1 (brew), Storm 0.9.5 (download from official website), Zookeeper 3.4.6 (brew), Sphinx 2.2 (brew), Cassandra 2.1.5 (brew).

languages

Programming languages: RVM, Ruby (MRI 2.2, 2.1, 2.0, 1.9.3, 1.8.7, REE 2012.02 and JRuby 1.7.19, all installed using RVM), PHP 5.6 with PHP-FPM (installed using brew), HHVM 3.7.2 (installed using brew after adding an additional repo; it has some issues on 10.10), Python 2.7 (brew python) and Python 3.4 (brew python3), Pip 7.1 (shipped with Python), NVM, Node.js 0.12 and io.js 2.3 (both installed using NVM), Go 1.4.2 (from the Golang website), Java 8 JVM (from the Oracle website), Java 8 SE JDK (from the Oracle website), Scala 2.11 (from the Scala website), Clojure 1.6 (from the Clojure website), Erlang 17.0 (brew), Haskell GHC 7.10 (brew), Haskell Cabal 1.22 (brew), OCaml 4.02.1 (brew), R 3.2.1 (from the R for Mac OS X website), .NET Core and ASP.NET (brew using DNVM), GPU Ocelot (compiled with a lot of libraries).

The full reboot took about 2 days. Some software is still missing, but I was able to resume my work almost completely. I hope this list will be helpful to someone :)

lucene

In the beginning was Apache Lucene. Written in 1999, Lucene is an “information retrieval software library” built to index documents containing fields of text. This flexibility allows Lucene’s API to be independent of the file format. Almost everything can be indexed as long as its textual information can be extracted.


Formally Lucene is an inverted full-text index. The core elements of such an index are segments, documents, fields, and terms. Every index consists of one or more segments. Each segment contains one or more documents. Each document has one or more fields, and each field contains one or more terms. Each term is a pair of Strings representing a field name and a value. A segment consists of a series of files.
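To make the structure more concrete, here is a toy inverted index in plain Ruby. It is only a sketch of the idea, not Lucene's actual implementation: the class name `TinyInvertedIndex`, the naive tokenizer and the sample documents are all made up for illustration, and segments are not modelled at all.

```ruby
# Minimal in-memory inverted index (illustrative only, not Lucene's real layout).
# A term is modelled as a [field_name, value] pair of Strings.
class TinyInvertedIndex
  def initialize
    # Maps [field, token] => list of ids of the documents containing it.
    @postings = Hash.new { |hash, term| hash[term] = [] }
    @next_doc_id = 0
  end

  # Adds a document given as a Hash of field name => text; returns its id.
  def add(fields)
    doc_id = @next_doc_id
    @next_doc_id += 1
    fields.each do |field, text|
      tokenize(text).uniq.each { |token| @postings[[field.to_s, token]] << doc_id }
    end
    doc_id
  end

  # Returns the ids of documents whose field contains every word of the query.
  def search(field, query)
    lists = tokenize(query).map { |token| @postings.fetch([field.to_s, token], []) }
    return [] if lists.empty?
    lists.reduce { |common, list| common & list }
  end

  private

  # Naive analyzer: lowercase, then split on non-alphanumeric characters.
  def tokenize(text)
    text.to_s.downcase.scan(/[a-z0-9]+/)
  end
end

index = TinyInvertedIndex.new
index.add(title: 'Best pizza in town', body: 'Thin crust and fresh basil')
index.add(title: 'Pizza dough basics', body: 'Flour, water, yeast')
index.add(title: 'Pasta night',        body: 'Best served hot')

pizza_titles = index.search(:title, 'pizza')      # documents 0 and 1
best_pizza   = index.search(:title, 'best pizza') # only document 0
```

The lookup goes from term to documents, which is why it is called "inverted": a query never has to scan the documents themselves.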

Scaling is done by distributing indexes across multiple servers. One server ‘shard’ will get a query request and then search itself, as well as the other shards in the configuration, and return the combined results from each shard.
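The scatter-and-merge step can be sketched in a few lines of Ruby. Everything here is hypothetical (the `Shard` class, the word-count scoring and the sample data); real shards are separate servers and use far better ranking, but the flow is the same: ask every shard, then merge the partial results.

```ruby
# Scatter-gather sketch: each shard holds part of the documents and scores
# its own matches; the node that received the query merges the partial results.
Hit = Struct.new(:doc_id, :score)

class Shard
  def initialize(docs)
    @docs = docs # Hash of doc_id => text
  end

  # Score = number of occurrences of the word in the document text.
  def search(word)
    hits = @docs.map do |id, text|
      Hit.new(id, text.downcase.scan(/\w+/).count(word.downcase))
    end
    hits.select { |hit| hit.score > 0 }
  end
end

# Searches every shard and returns the combined hits, best score first.
def distributed_search(shards, word, limit = 10)
  all_hits = shards.flat_map { |shard| shard.search(word) }
  all_hits.sort_by { |hit| [-hit.score, hit.doc_id] }.first(limit)
end

shards = [
  Shard.new({ 1 => 'pizza pizza pizza', 2 => 'pasta' }),
  Shard.new({ 3 => 'pizza and pasta',   4 => 'pizza, more pizza' })
]
top_hits = distributed_search(shards, 'pizza')
```

Document 2 never shows up because it does not match, and the hits from both shards end up in a single ranked list.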

solr

Apache Solr is a search platform, part of the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document handling. It provides a REST-like API supporting XML and JSON formats. It’s used by many notable sites to index their content; here is the public list.
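To give an idea of what the REST-like API looks like, here is a small sketch that only builds the URL of a query against Solr's `/select` handler, without sending it. The host, port, core name (`posts`) and field names are assumptions for illustration; `q`, `wt`, `rows`, `hl`, `hl.fl`, `facet` and `facet.field` are standard Solr query parameters.

```ruby
require 'uri'

# Builds a URL for Solr's /select handler.
# Host, port, core name and field names are placeholders.
def solr_select_url(query, core: 'posts', base: 'http://localhost:8983/solr')
  params = {
    'q'           => query,          # the query itself
    'wt'          => 'json',         # response format (XML is the alternative)
    'rows'        => 15,             # page size
    'hl'          => 'true',         # enable hit highlighting
    'hl.fl'       => 'body',         # field to highlight
    'facet'       => 'true',         # enable faceted search
    'facet.field' => 'category_ids'  # field to facet on
  }
  "#{base}/#{core}/select?#{URI.encode_www_form(params)}"
end

url = solr_select_url('title:pizza')
```

A plain HTTP GET on that URL is all a client needs, which is why libraries for Solr exist in almost every language.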

There are many well-tested ways to interact with Solr. If you use Ruby, Sunspot can be a good choice. Here is a small example (from the official website). Indexing is defined within a model:

class Post < ActiveRecord::Base
  searchable do
    text :title, :body
    text :comments do
      comments.map { |comment| comment.body }
    end
    integer :blog_id
    integer :author_id
    integer :category_ids, :multiple => true
    time :published_at
    string :sort_title do
      title.downcase.gsub(/^(an?|the)\b/, '')
    end
  end
end

When you search, you can specify many different conditions:

Post.search do
  fulltext 'best pizza'
  with :blog_id, 1
  with(:published_at).less_than Time.now
  order_by :published_at, :desc
  paginate :page => 2, :per_page => 15
  facet :category_ids, :author_id
end

Solr 4.0 introduced support for high availability and sharding through SolrCloud, a way to shard and scale indexes. Shards and replicas are distributed across nodes, and the nodes are monitored by ZooKeeper. Any node can receive a query request and propagate it to the correct place. The image on the side (from an interesting blog post about SolrCloud) describes an example setup.

elasticsearch

ElasticSearch is a search platform (written by Shay Banon, the creator of Compass, another search platform). It provides a JSON API and supports almost every feature of Solr.
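As a rough idea of what talking to the JSON API means, the sketch below builds a request body for the `_search` endpoint using only Ruby's standard library. The `search_body` helper, the `articles` index and the `title` field are assumptions for illustration; `query_string`, `sort` and `size` are part of ElasticSearch's query DSL.

```ruby
require 'json'

# Request body for ElasticSearch's _search endpoint, built as a plain Hash.
# The field names are illustrative.
def search_body(query_string, size: 10)
  {
    query: { query_string: { query: query_string } }, # Lucene-style query syntax
    sort:  [{ title: 'desc' }],                       # sort by title, descending
    size:  size                                       # number of hits to return
  }
end

body = JSON.generate(search_body('title:T*'))
# This string would be POSTed to something like
# http://localhost:9200/articles/_search (host, port and index name assumed).
```

Since requests and responses are both plain JSON, a client library is mostly a convenience layer on top of an HTTP client.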

There are many ways to use it, several of them from Ruby. Tire seems a good choice. Here is a small example (from the GitHub page). Define which attributes to index, then index some documents:

Tire.index 'articles' do
  delete
  create :mappings => {
    :article => {
      :properties => {
        :id       => { :type => 'string', :index => 'not_analyzed', :include_in_all => false },
        :title    => { :type => 'string', :boost => 2.0,            :analyzer => 'snowball'  },
        :tags     => { :type => 'string', :analyzer => 'keyword'                             },
        :content  => { :type => 'string', :analyzer => 'snowball'                            }
      }
    }
  }
  store :title => 'One',   :tags => ['ruby']
  store :title => 'Two',   :tags => ['ruby', 'python']
  store :title => 'Three', :tags => ['java']
  store :title => 'Four',  :tags => ['ruby', 'php']
  refresh
end

Then search them:

s = Tire.search 'articles' do
  query do
    string 'title:T*'
  end
  filter :terms, :tags => ['ruby']
  sort { by :title, 'desc' }
  facet 'global-tags', :global => true do
    terms :tags
  end
  facet 'current-tags' do
    terms :tags
  end
end

sphinx

Sphinx is the only real alternative to Lucene. Unlike Lucene, Sphinx is designed to index content coming from a database. It supports the native protocols of MySQL, MariaDB and PostgreSQL, or the standard ODBC protocol. You can also run Sphinx as a standalone server and communicate with it using the SphinxAPI.

Sphinx also offers a storage engine called SphinxSE. It’s compatible with MySQL and integrated into MariaDB. Querying is possible using SphinxQL, a subset of SQL.
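To show what a SphinxQL query looks like, here is a small sketch that only builds the query string. The `sphinxql_search` helper and the `articles` index name are made up for illustration; `MATCH()` and the MySQL-style `LIMIT offset, count` are SphinxQL syntax. Only string-literal escaping is handled, and the index name is interpolated as-is, so it must come from trusted code.

```ruby
# Builds a SphinxQL full-text query string (it does not send it anywhere).
def sphinxql_search(index, text, page: 1, per_page: 15)
  # Escape backslashes and single quotes so the text is a valid string literal.
  escaped = text.gsub(/[\\']/) { |char| "\\#{char}" }
  offset  = (page - 1) * per_page
  "SELECT * FROM #{index} WHERE MATCH('#{escaped}') LIMIT #{offset}, #{per_page}"
end

sql = sphinxql_search('articles', 'best pizza', page: 2)
```

Because Sphinx speaks the MySQL wire protocol for SphinxQL, a string like this can be sent with an ordinary MySQL client pointed at the Sphinx daemon instead of the database.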

To use it in Ruby, the official gem is Thinking Sphinx. Below are some usage examples taken directly from the GitHub page. Defining indexes:

ThinkingSphinx::Index.define :article, :with => :active_record do
  indexes title, content
  indexes user.name, :as => :user
  indexes user.articles.title, :as => :related_titles
  has published
end

And querying:

ThinkingSphinx.search(
  select: '@weight * 10 + document_boost as custom_weight',
  order: :custom_weight
)

Other libraries

There are many other software packages and libraries designed to index and search content.

  • Amazon CloudSearch is a fully managed search service in the cloud. It’s part of AWS and should be “fast and highly scalable”, as Amazon says.
  • Lemur Project is a kind of information retrieval framework. It integrates the Indri search engine, a C and C++ library which can easily index HTML and XML content and be distributed across a cluster’s nodes.
  • Xapian is a probabilistic information retrieval library. It is written in C++ and can be used with many popular languages. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.

Sources: