Deep Learning is a trending buzzword in the Machine Learning community. All the major players in Silicon Valley are heavily investing in these topics and US universities are improving their course offerings.

I’m really interested in artificial intelligence, both for fun and for work, and over the last weeks I spent a few hours searching for the best MOOCs on this topic. I found only a few courses, but they are taught by some of the most notable figures in the Deep Learning and Neural Networks field.

Machine Learning
Stanford University on Coursera, Andrew Ng

Andrew Ng has been Chief Scientist at Baidu Research since 2015; he is a co-founder of Coursera and a Machine Learning lecturer at Stanford University. He also founded the Google Brain project in 2011. His Machine Learning course at Stanford (CS229a) is almost legendary and, obviously, it was my starting point.

Machine Learning, Coursera

Neural Networks for Machine Learning
University of Toronto on Coursera, Geoffrey Hinton

Geoffrey Hinton has been working at Google (probably on Google Brain) since 2013, when Google acquired his company DNNresearch Inc. He is a cognitive psychologist best known for his work on artificial neural networks. His Coursera course on Neural Networks dates back to 2012, but it still seems to be one of the best resources on these topics.

Neural Networks for Machine Learning, Coursera

Deep Learning (2015)
New York University on TechTalks, Yann LeCun (videos on techtalks.tv)

In 2013 LeCun became the first director of Facebook AI Research. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs), and he is a founding father of convolutional nets. The 2015 Deep Learning course at NYU is the most recent course he has held on this topic.

Yann LeCun. CIFAR NCAP pre-NIPS’ Workshop. Photo: Josh Valcarcel/WIRED

Big Data, Large Scale Machine Learning
New York University on TechTalks, John Langford and Yann LeCun

Another interesting course about Machine Learning, held by LeCun together with John Langford, a researcher who has worked at Yahoo Research, Microsoft Research and IBM’s Watson Research Center.

John Langford, NYU

Deep Learning Courses
NVIDIA Accelerated Computing

This is not a college course. NVIDIA was one of the most important graphics card manufacturers in the early 2000s and now, building on its experience with massively parallel computing on GPUs, it is heavily investing in Deep Learning. This course focuses on using GPUs with the most common deep learning frameworks: DIGITS, Caffe, Theano and Torch.

Deep Learning Courses, NVIDIA
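
To give an idea of what these frameworks look like in code, here is a minimal Theano sketch of a single softmax layer. It is my own example, not material from the course, and it targets the GPU only if run with THEANO_FLAGS=device=gpu,floatX=float32.

import numpy as np
import theano
import theano.tensor as T

# a single softmax layer: y = softmax(x . W)
x = T.matrix("x")
W = theano.shared(np.random.randn(784, 10).astype(theano.config.floatX), name="W")
y = T.nnet.softmax(T.dot(x, W))

# compile the graph into a callable function (GPU-backed if Theano is configured for it)
predict = theano.function([x], y)
print(predict(np.random.randn(2, 784).astype(theano.config.floatX)).shape)  # (2, 10)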

Mastering Apache Spark
Mike Frampton, Packt Publishing

Last summer I had the opportunity to collaborate on the review of this title. The chapter about MLlib contains a useful introduction to Artificial Neural Networks on Spark. The implementation still seems young, but it is already possible to distribute a network over a Spark cluster.

Mastering Apache Spark
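
Just to give an idea of what distributing a network over Spark looks like, here is a minimal sketch of mine using the MultilayerPerceptronClassifier that is now part of spark.ml on a recent Spark (2.x). It is not necessarily the API covered in the book, and it uses the sample dataset shipped with the Spark distribution.

from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("ann-on-spark").getOrCreate()

# sample dataset shipped with Spark: 4 features, 3 classes
data = spark.read.format("libsvm").load("data/mllib/sample_multiclass_classification_data.txt")
train, test = data.randomSplit([0.8, 0.2], seed=42)

# layers: 4 inputs, two small hidden layers, 3 output classes
mlp = MultilayerPerceptronClassifier(layers=[4, 5, 4, 3], maxIter=100, seed=42)
model = mlp.fit(train)

accuracy = MulticlassClassificationEvaluator(metricName="accuracy").evaluate(model.transform(test))
print("accuracy = %.3f" % accuracy)

spark.stop()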

[UPDATE 2016-01-31]

Deep Learning 
Vincent Vanhoucke, Google, Udacity

A few days ago Google released on Udacity a Deep Learning course focused on TensorFlow, its deep learning tool. It’s the first course officially sponsored by a big company, it is free and it seems a great introduction. Thanks to Piotr Chromiec for the pointer 🙂
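
Out of curiosity, here is what a few lines of TensorFlow look like: a single linear neuron built as a graph and evaluated in a session. This is my own minimal sketch, assuming the 1.x-style graph API of that period, not material from the course.

import tensorflow as tf

# a single linear neuron y = x*W + b, built as a graph and run in a session
x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
W = tf.Variable(tf.random_normal([3, 1]), name="W")
b = tf.Variable(tf.zeros([1]), name="b")
y = tf.matmul(x, W) + b

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))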

I have always liked The Setup. Discovering what kinds of technologies, hardware and software other skilled people are using is extremely useful and really fun for me. This time I’d like to share some tips from the complete reboot of my personal ecosystem I did after switching to my new MacBook.

On the hardware side it is simply a high-end 2015 MacBook Pro 13″ Retina with a dual-core Intel Core i7 Haswell at 3.4GHz, 16GB of RAM and a 1TB PCI Express 3.0 SSD. It is fast, solid, lightweight and flexible. The only required accessory is the Be.eZ LArobe Second Skin.

On the software side I decided to avoid a Time Machine restore in order to set up a completely new environment. I started from a fresh OS X 10.10 Yosemite installation.

As a polyglot developer I usually deal with a lot of different applications, programming languages and tools. In order to decide what to install, a list of what I had on the previous machine and what else I needed was really useful.

Here is a list of useful software and some tips about the installation process.

Applications

Paid software worth having: Evernote (with a Premium subscription and Skitch) and Todoist (with a Premium subscription), both available on the Mac App Store. 1Password, Fantastical 2, OmniGraffle, Carbon Copy Cloner, Backblaze and Expandrive are available on their own websites.

Free software worth having: Google Chrome and Mozilla Firefox as browsers, Apache OpenOffice, Skype and Slack for chat, VLC for multimedia and Transmission for torrents.

Suites, or parts of them: Adobe Photoshop CC, Adobe Illustrator CC and Adobe Acrobat Pro DC are part of the Adobe Creative Cloud. Microsoft Word 2016 and Microsoft Excel 2016 are part of Microsoft Office 2016 for Mac (now in free preview). Apple Pages and Apple Keynote come preinstalled as part of the Apple iWork suite, as do Apple Calendar and Apple Contacts.

Development tools

Utilities for Power Users: Caffeine, Growl and HardwareGrowler, iStat Menu Pro, Disk Inventory X, Tor Browser and TrueCrypt 7.1a (you need to fix a little installation bug on OS X 10.10), Kitematic and Boot2Docker for Docker, Sublime Text 3 (with some additions: the Spacegray theme, the Soda theme, a new icon and the Source Code Pro font), Tower, Visual Studio Code, Android SDK (for the Android emulator) and Xcode (for the iOS simulator), VirtualBox (with some useful Linux virtual images) and iTerm 2.

CLI: Oh My Zsh, Homebrew, GPG (installed using brew), Xcode Command Line Tools (from the Apple Developers website), Git with git-flow (installed using brew), AWS CLI (installed via pip), PhantomJS, s3cmd and the faster s4cmd, Heroku Toolbelt and OpenShift Client Tools (installed via gem).

Servers: MariaDB 10.0 (brew), MongoDB 3.0 (brew), Redis 3.0 (brew), Elasticsearch 1.6 (brew), Nginx 1.8.0 (brew), PostgreSQL 9.4.2 (via Postgres.app), Hadoop 2.7.0 (brew), Spark 1.4 (download from official website), Neo4j 2.2 (brew), Accumulo 1.7.0 (download from official website), Crate 0.49 (download from official website), Mesos 0.22 (download from official website), Riak 2.1.1 (brew), Storm 0.9.5 (download from official website), Zookeeper 3.4.6 (brew), Sphinx 2.2 (brew), Cassandra 2.1.5 (brew).

Programming languages: RVM, Ruby (MRI 2.2, 2.1, 2.0, 1.9.3, 1.8.7, REE 2012.02, JRuby 1.7.19 installed using RVM), PHP 5.6 with PHP-FPM (installed using brew), HHVM 3.7.2 (installed using brew with adding additional repo, has some issues on 10.10), Python 2.7 (brew python) and Python 3.4 (brew python3), Pip 7.1 (shipped with Python), NVM, Node.js 0.12 and IO.js 2.3 (both installed using NVM), Go 1.4.2 (from Golang website), Java 8 JVM (from Oracle website), Java 8 SE JDK (from Oracle website), Scala 2.11 (from Scala website), Clojure 1.6 (from Clojure website), Erlang 17.0 (brew), Haskell GHC 7.10 (brew), Haskell Cabal 1.22 (brew), OCaml 4.02.1 (brew), R 3.2.1 (from R for Mac OS X website), .NET Core and ASP.NET (brew using DNVM), GPU Ocelot (compiled with a lot of libraries).

The full reboot took about 2 days. Some software is still missing, but I was able to resume my work almost completely. I hope this list will be helpful to someone 🙂

Yesterday, after a beautiful (and gluttonous) Christmas, I decided to start learning the basics of some new tech stuff. My first choice was the Python programming language because it seems quite similar to Ruby, several friends of mine are skilled in it, and it is widely used at Google and for Spark, so it is probably one of the best languages to learn at the moment.

I’m quite fluent in Ruby, PHP and JavaScript but I have no Python skills. I chose Codecademy for a basic introduction and I’m at about 50% of the course. Here is what I have learned so far.

Python is dynamically typed (you don’t have to declare the type of a variable) and the types are almost the same as in Ruby and PHP:

my_int = 42
my_float = 108.0
my_string = "The answer to life, the universe and everything"

There are no curly braces and no begin/end structure: everything is based on indentation (I absolutely love it). Functions are defined as follows:

def my_function(argument):  # the colon starts a new indentation level
    return argument * 2

Control structures are pretty much the same as in other languages. Conditionals:

if condition1:
    return 1
# elif is like elsif in Ruby and elseif in PHP
elif condition2:
    return 2
else:
    return 3

Loops:

for item in my_list:
    print item

Instead of Hashes and Arrays, Python uses Dictionaries and Lists, but they are almost the same:

my_list = ["daniele", "luca", "michele"]
my_dictionary = {"marco": 1, "matteo": 7, "michele": 4}
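
Accessing and iterating over them also feels familiar coming from Ruby. These few extra lines are mine, not from the course (still Python 2, like the snippets above):

print my_list[0]               # "daniele"
print my_dictionary["matteo"]  # 7

# iterating over a dictionary, similar to Ruby's each
for name, value in my_dictionary.items():
    print name, value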

I started a few hours ago and I’m just a newbie. I hope the rest of the course on Codecademy will teach me more complex topics and, as for real-world usage, I probably need to learn how to handle version management and to discover the most powerful libraries. Anyway, I can already say that it is pretty cool to program in Python 🙂

The big-data environment at the moment is really “collaborative”. Each project is ready to run on almost every available platform, and this is good. However, two factions have recently been forming: people who use the Hadoop 2.0 stack and people who use the BDAS.

Hadoop 2.0 Stack

The most important difference between Hadoop 2.0 and previous versions is YARN, the new cluster resource manager and next generation MapReduce. It can run almost every kind of big-data project:

  • Traditional MapReduce (the new version is backward compatible), with Hive, Pig or Cascading for querying.
  • Interactive near real-time Map Reduce (using Tez)
  • HBase and Accumulo
  • Storm and S4 for stream processing
  • Giraph for graph processing
  • OpenMPI for message passing
  • Spark as in-memory MapReduce (see the sketch right after this list)
  • HDFS as distributed filesystem
  • more…
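
To make the Spark point concrete, this is roughly what an in-memory MapReduce job looks like when submitted to a YARN cluster: a classic word count. A minimal sketch of mine; the input path and cluster settings are placeholders.

# wordcount.py - a classic word count in PySpark
# submit to YARN with something like:
#   spark-submit --master yarn --deploy-mode cluster wordcount.py hdfs:///path/to/input
import sys
from pyspark import SparkContext

sc = SparkContext(appName="wordcount-on-yarn")

counts = (sc.textFile(sys.argv[1])                  # read input from HDFS
            .flatMap(lambda line: line.split())     # split lines into words
            .map(lambda word: (word, 1))            # map phase
            .reduceByKey(lambda a, b: a + b))       # reduce phase

for word, count in counts.take(10):
    print("%s: %d" % (word, count))

sc.stop()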

The most interesting companies here are Intel, Cloudera, MapR and Hortonworks.

BDAS (Berkeley Data Analytics Stack)

In the BDAS everything is built around Mesos, the cluster resource manager. It’s a relatively new project but it is already widely used. Traditional HDFS is accelerated by Tachyon (an in-memory file system). The main integration is around Spark, which is the base for the higher-level components (SQL queries, streaming, machine learning and graph processing).

Mesos can also run a traditional Hadoop environment and other projects (such as Storm and OpenMPI). You can also run traditional applications (even Rails apps) using Marathon.
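
As for Marathon, launching a traditional application boils down to POSTing an app definition to its REST API. A minimal sketch, assuming the requests library and a hypothetical Marathon address:

# launch a traditional app on Mesos through Marathon's REST API
# pip install requests; marathon.example.com and the app itself are placeholders
import requests

app = {
    "id": "/hello-rails",
    "cmd": "cd /opt/hello && bundle exec rails server -p $PORT",
    "cpus": 0.5,
    "mem": 512.0,
    "instances": 2,
}

resp = requests.post("http://marathon.example.com:8080/v2/apps", json=app)
print(resp.status_code, resp.json())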

The most interesting companies here are Databricks and Mesosphere.

Who will win? 😀