It has been a long time since I last wrote on this blog, and many things have changed in my life since then. My journey at Curcuma wasn't as happy as I had hoped, so after 6 months of hard work I left and joined the amazing team at Ernest.ai.

Ernest: Your financial coach

We are building a smart chatbot that helps people manage their personal finances. We are currently in closed beta (here you can sign up to the waiting list). The team is distributed between London and Milan. Here is a beautiful photo taken during our last meeting in London a couple of months ago.

The Ernest team, WeWork Old Street, London, December 2016.

During the last year, the career switch and parenting took all my time and my chances to write vanished. Still, these experiences allowed me to learn a lot about Machine Learning, Artificial Intelligence, Conversational Interfaces, Chatbots, Functional and Reactive Programming and many other exciting topics, and now, at the beginning of 2017, it could be the right time to start giving back to the community again.

See you on this feed 😉

Every time I attend a tech conference I meet interesting people and find awesome new technologies. It's great. This year I attended 4 conferences in a row (36 talks in 10 days), and I had started a new job just a few days earlier. During May 2016 I discovered dozens of new technologies, and I'd like to do my part and "give back to the community" by talking about them in a series of posts.

Here is what I discovered attending JSDay 2016 in Verona.


Progressive Web Apps

Progressive Web Apps "take advantage of new technologies to bring the best of mobile sites and native applications to users. They're reliable, fast, and engaging," Google says. They are web apps with a great offline experience. Here is a great list of examples.

Service Workers and Web Workers are the modern way to run background services in the browser using JavaScript. Web Workers have better support, Service Workers have better features.
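
To give an idea, here is a minimal Service Worker sketch: registration on the page plus a cache-first `sw.js`. The file name and the cached asset list are my own placeholders, not something from the talks.

```js
// On the page: register the service worker (the /sw.js path is a placeholder).
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js')
    .then(reg => console.log('Service Worker registered with scope:', reg.scope))
    .catch(err => console.error('Service Worker registration failed:', err));
}

// In sw.js: pre-cache a few assets, then answer requests cache-first.
self.addEventListener('install', event => {
  event.waitUntil(
    caches.open('v1').then(cache => cache.addAll(['/', '/index.html']))
  );
});

self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request).then(cached => cached || fetch(event.request))
  );
});
```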

Physical Web is about the ability of the browser to interact with beacons and other external devices without requiring a native app. The Web Bluetooth API gives browsers even more flexibility (the speaker drove a robot using JavaScript in the browser).
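
I haven't driven robots myself, but a minimal Web Bluetooth sketch looks like this (it must run in response to a user gesture; I use the standard battery service just as an example):

```js
// Ask the user to pick a nearby device exposing the standard battery service,
// connect to its GATT server and read the battery level.
navigator.bluetooth.requestDevice({ filters: [{ services: ['battery_service'] }] })
  .then(device => device.gatt.connect())
  .then(server => server.getPrimaryService('battery_service'))
  .then(service => service.getCharacteristic('battery_level'))
  .then(characteristic => characteristic.readValue())
  .then(value => console.log('Battery level:', value.getUint8(0), '%'))
  .catch(err => console.error('Bluetooth request failed:', err));
```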

UpUp helps you make your progressive app "Offline First". The idea is to make sure your users can always access your site's content, even when they're on a plane, in an elevator, or 20,000 leagues under the sea.
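
If I read the docs correctly, the whole setup is a single call; the file names below are placeholders:

```js
// Tell UpUp what to serve when the user is offline
// (offline.html and the asset paths are placeholders).
UpUp.start({
  'content-url': 'offline.html',
  assets: ['css/style.css', 'img/logo.png']
});
```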


Reactive Programming

The Reactive Programming paradigm is trendy right now. Cycle.js and ReactiveX bring the Observer and Iterator patterns into functional programming, and speakers talked about them a lot.
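
The core idea is treating events as streams you can transform with functional operators. A minimal sketch, assuming RxJS 5:

```js
import * as Rx from 'rxjs/Rx';

// A stream of clicks, throttled and folded into a running count.
Rx.Observable.fromEvent(document, 'click')
  .throttleTime(500)
  .scan(count => count + 1, 0)
  .subscribe(count => console.log(`Clicked ${count} times`));
```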

Traditional frameworks are also going reactive, thanks to extensions like Ngrx.

Model View Intent is (probably) going to replace the previous MVC, MVP, MVVM, …

While JSX gains traction thanks to React.js, other solutions, like Hyperscript, are springing up.
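
A minimal sketch of hyperscript (assuming the hyperscript npm package), building the same kind of tree JSX would describe, with plain function calls:

```js
import h from 'hyperscript';

// h(tag, ...children) returns a real DOM node.
const widget = h('div#page',
  h('h1', 'Hello'),
  h('ul', ['JSX', 'Hyperscript'].map(name => h('li', name)))
);
document.body.appendChild(widget);
```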


JavaScript for cross-platform apps

Electron, NW.js and many other platforms make it possible to use JavaScript to build cross-platform apps like Slack, Atom and Visual Studio Code.
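
A minimal Electron main process gives the idea (index.html is a placeholder):

```js
const { app, BrowserWindow } = require('electron');

// Open a browser window on a local page once Electron is ready.
app.on('ready', () => {
  const win = new BrowserWindow({ width: 800, height: 600 });
  win.loadURL(`file://${__dirname}/index.html`);
});
```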

Async JavaScript

Asynchronous programming is hard, and JavaScript never made it easy, but now things are getting better and better thanks to many new libraries. Fluorine can simply be thought of as an abstraction, or a DSL, that gives you a code structure in which you can manage complex asynchronous code with ease. Co does almost the same thing on Node.js.
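
A minimal co sketch; fetchUser and fetchPosts are hypothetical promise-returning helpers:

```js
const co = require('co');

// Yielding promises inside a generator gives synchronous-looking async code.
co(function* () {
  const user = yield fetchUser(42);
  const posts = yield fetchPosts(user.id);
  return posts.length;
}).then(
  count => console.log(`Fetched ${count} posts`),
  err => console.error(err)
);
```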

Task.js takes the concepts of Generators and Promises to another level by defining the concept of a Task.

ES7 will close the circle with async and await, with the pleasure of a native implementation.
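
The same flow as the co sketch above, with native syntax and no library (same hypothetical helpers):

```js
async function countPosts() {
  const user = await fetchUser(42);
  const posts = await fetchPosts(user.id);
  return posts.length;
}

countPosts().then(count => console.log(`Fetched ${count} posts`));
```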


Debugging in Chrome Canary

The bleeding-edge version of Chrome, Canary, offers several beautiful beta features, like an integrated layout editor that can edit SASS and works with Rails, powerful animation inspection, and the ability to emulate network connectivity and speed.

Most of these features are available inside Chrome Workspaces.

Node.js code can also be debugged using the Chrome engine thanks to iron-node; node-inspector is another tool worth a look.

Misc

I also discovered Modernizr and its stunning new website; Can I Use tells you how widely a technology is supported across browsers; ZeroClipboard and Clipboard.js make copy and paste easier; Hapi.js is an interesting framework and Highland a powerful stream library for Node.js. I also discovered that I can use WebGL for data processing in the browser.
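
As a taste, a minimal Clipboard.js (v1 API) sketch; the .btn selector and the data attribute are assumptions based on its docs:

```js
import Clipboard from 'clipboard';

// Any element matching '.btn' with a data-clipboard-text attribute
// becomes a copy button, e.g. <button class="btn" data-clipboard-text="hi">.
const clipboard = new Clipboard('.btn');
clipboard.on('success', e => console.log('Copied:', e.text));
clipboard.on('error', () => console.error('Copy failed'));
```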

In the end, a lot of new discoveries at JSDay 2016 😉

The answer to the question "what's the best thing to do now?" has always been really important to me.

I have a really busy life, and prioritisation is critical in order to accomplish everything and save some free time. Handling the huge amount of personal data (mails, messages, chats, documents, ideas, …) I receive every day and converting it into useful and usable information is essential but hard. Choosing the right tools is a great starting point. Here are mine and why I'm using them.


Why

Many people think of personal and work data as two different environments. I don't.

I don't think separate environments are a good idea, because everything we do can be modeled as a task: working on software is a task, but so are playing with your son, sleeping, going to work, going out with your partner and reading a book.

Each of these tasks requires a timeframe (scheduled or not) and some data. For instance, you need a date, an hour and a restaurant name to go out with your partner. You can decide to allocate a given timeframe to personal life and another to work, but you are simply scheduling: the activity is always the same. Use data and accomplish tasks. This data usually arrives from someone else or from another task.

When you receive any communication (verbal or by message/chat) or find some new information while doing something else, you can sort what you get into three simple categories:

  • Useless: it isn't useful (spam mails, TV spots, boring messages, boring people…), so you can ignore or trash it.
  • Now: it is useful and can be managed within 2 minutes (a mail with a simple response, a message asking about something you already know, a colleague asking you something face to face), so you do it right away.
  • Later: it is useful but you don't need it at the moment (phone numbers, interesting information, ideas) or it can't be managed within 2 minutes (tasks, projects, structured questions, …), so you need to store it.

Storing this information is critical because, if you store it in the right way, you will be faster in task execution and better at prioritization.

Faster execution means less time per task, more tasks accomplished and more free time for you. Optimal prioritization means you accomplish the right tasks at the right time, within deadlines.

If you think about how many tasks you do every day, you can understand why doing this the right way is a good idea.


Starting from this idea, I split the information to store into 3 categories:

  • To-do: something you can do
  • Appointment: something you can do at a specific date and time
  • Information: a piece of information useful for you now or in the future

Each of these categories can be handled using the best tool for the job.

Tools

To-do list

A place to store the list of tasks you need to do. When the list is small, a paper note or a text file is enough. When you realize your life is really busy, something with projects, prioritization, notes, reminders and deadlines becomes useful.

In the past I used Things for many years, but now I use Todoist and I won't go back.


Tons of products are available, but Todoist is the best. It has great features and is available everywhere (web, OS X, Windows, iPhone, Android, Chrome, Firefox and more). It handles synchronization and backups gracefully, and I love its interface.

Calendar

A place to store your scheduled appointments. A few years ago, a paper personal organizer inside your backpack was everything you needed. Now we usually use the tools offered by our favorite OS provider (Apple iCloud, Google Apps, Microsoft Outlook). Each of them can integrate calendars from other providers, and I used Apple Calendar for years. A few days ago I switched to Fantastical2.


Fantastical2 (not to be confused with Fantastical, the previous version with fewer features) is really similar to Apple Calendar but has a few relevant additions that are worth the price:

  • An appointments recap in the left sidebar.
  • Adjustable font size and a flexible height for hour slots in the weekly view (as opposed to all-day events).
  • Calendar sets.

Before the switch I tested Sunrise Calendar, which has a great online interface but doesn't offer anything really better than Apple Calendar.

Notes

A place to store every piece of information you could need now or in the future. Evernote is my choice for everyday use but, in the past, I experienced several problems: sync was slow and conflicts were frequent, the GUI wasn't easy to use and the web interface was a mess. Now, after a couple of years of active development, everything seems better.

The ability to integrate documents and edit them in place is still limited, so I also use Google Docs for the documents and PDFs I want to store.

More

Todoist, Fantastical2 and Evernote help me accomplish almost all the management work required by my everyday life. Anyway, a couple of other pieces of software are really useful in addition to these:

1Password is the best place to store your passwords and related information, and it can be synced across any of your devices (OS X, iOS, Android, Windows).

Pocket can act as a funnel for every interesting article found on Feedly or social networks, and it has great text-to-speech functionality.

 


Deep Learning is a trending buzzword in the Machine Learning world. All the major players in Silicon Valley are investing heavily in these topics, and US universities are improving their course offerings.

I'm really interested in artificial intelligence, both for fun and for work, and over the last few weeks I spent a few hours searching for the best MOOCs on the topic. I found only a few courses, but they are from the most notable figures in the Deep Learning and Neural Networks field.

Machine Learning
Stanford University on Coursera, Andrew Ng

Andrew Ng has been Chief Scientist at Baidu Research since 2015; he is a co-founder of Coursera and a Machine Learning lecturer at Stanford University. He also founded the Google Brain project in 2011. His Machine Learning (CS229a) course at Stanford is quite legendary and, obviously, was my starting point.


Machine Learning, Coursera

Neural Networks for Machine Learning
University of Toronto on Coursera, Geoffrey Hinton

Geoffrey Hinton has been working at Google (probably on Google Brain) since 2013, when Google acquired his company DNNresearch Inc. He is a cognitive psychologist best known for his work on artificial neural networks. His Coursera course on Neural Networks dates back to 2012 but seems to be one of the best resources on these topics.


Neural Networks for Machine Learning, Coursera

Deep Learning (2015)
New York University on TechTalks, Yann LeCun (videos on techtalks.tv)

In 2013 LeCun became the first director of Facebook AI Research. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs), and he is a founding father of convolutional nets. The 2015 Deep Learning course at NYU is the latest course he has held on this topic.


Yann LeCun. CIFAR NCAP pre-NIPS’ Workshop. Photo: Josh Valcarcel/WIRED

Big Data, Large Scale Machine Learning
New York University on TechTalks, John Langford and Yann LeCun

Another interesting course about Machine Learning, held by LeCun and John Langford, a researcher who has worked at Yahoo! Research, Microsoft Research and IBM's Watson Research Center.


John Langford, NYU

Deep Learning Courses
NVIDIA Accelerated Computing

This is not a college course. NVIDIA was one of the most important graphics board manufacturers in the early 2000s and now, with its experience in massively parallel computing on GPUs, is investing heavily in Deep Learning. This course focuses on the use of GPUs with the most common deep learning frameworks: DIGITS, Caffe, Theano and Torch.


Deep Learning Courses, NVIDIA

Mastering Apache Spark
Mike Frampton, Packt Publishing

Last summer I had the opportunity to collaborate on the review of this title. The chapter about MLlib contains a useful introduction to Artificial Neural Networks on Spark. The implementation still seems young, but it is already possible to distribute a network over a Spark cluster.


Mastering Apache Spark

[UPDATE 2016-01-31]

Deep Learning 
Vincent Vanhoucke, Google, Udacity

A few days ago, Google released a Deep Learning course on Udacity focused on TensorFlow, its deep learning tool. It's the first course officially sponsored by a big company; it is free and seems like a great introduction. Thanks to Piotr Chromiec for the pointer 🙂


I referred to myself as a "PHP Developer" for a long time, more or less from the beginning of my career until 2011, when I moved to Ruby as my main language. The last time I really worked with PHP, the tools were still uncomfortable and the community was huge but still messy. 4 years later everything seems to have changed. Several important companies (including Facebook) rely on it, and both the core language and the most popular tools have improved a lot.

Things like HHVM, Hack, FIG, Composer, PHP7 and many more are rapidly evolving the PHP landscape, so I decided to attend PHPDay 2015 to meet the Italian community and refresh my knowledge.

It was a really interesting event. I had the opportunity to meet several awesome people and chat with them about almost everything (quite often the chats happened on the grass of the beautiful venue in Verona 🙂).


Davey Shafik (@dshafik), a funny guy from Engine Yard, talked about PHP7 and HHVM as major improvements to the core of PHP. Enrico Zimuel (@ezimuel), core developer of Zend Framework, talked about ZF3. Steve Maraspin (@maraspin) talked about his experience with async PHP and parallel processing. Bernhard Schussek (@webmozart), core developer of Symfony, talked about Puli: a framework-agnostic tool for resource sharing.

I strongly believe PHP has changed. It has incorporated many "good parts" from other languages and is now ready to become, with Facebook's endorsement, a first-class language.

See you next year at PHPDay 2016 😀

A few weeks ago I had the privilege of attending the C2 Spark conference in Milan. It is a "business conference somewhere between genius and insanity by Sid Lee, Cirque du Soleil, Fast Company and Microsoft" that tries to mix Commerce and Creativity. It started in Montreal a few years ago, and this was its first time in Europe (Zurich and Milan).

The talks were awesome, and the people were awesome too. It's quite unusual for me to meet non-tech people, and I really enjoyed the day. I learned a lot of things attending the conference. Here are the most fascinating ones.

Technology will change everything, again.

David Rose, a scientist at the MIT Media Lab and author of "Enchanted Objects", talked about the way we imagined the future. The Internet of Things will be a huge opportunity, and a lot of products are already here. Here is the "periodic table" of Enchanted Objects that David showed us.


Microsoft has enough money to reboot its business

Ten years ago Microsoft made billions on Windows and Office. Now Windows and Office are worth much less, and Microsoft is trying to reboot its business with cloud (Windows Azure), mobile (Nokia and Surface) and wearables (Microsoft Band). Carlo Purassanta, CEO of Microsoft Italia, was clear: Microsoft is changing. He showed up at the conference wearing a Microsoft Band.


Non-tech people are awesome

I never had great respect for non-technical people. "I can change the world, they can't": so much for modesty and respect. I grew up inside a really closed environment of nerds and geeks, and I always thought non-technical people had nothing to give me. I was wrong. I was absolutely wrong. At the event people came from different fields: diplomacy, medicine, sales, biology, advertising, marketing, law and more. Each of them enriched me in some way. Non-tech people are awesome 🙂


Creativity could be an analytical process

Sid Lee, the creative firm behind C2 Spark, described the process behind its most successful advertising campaigns. A lot of myths about creativity are just that, myths, and, with the right process, anyone can express their creativity.


Jumping on tables to move them is definitely cool

The time between talks and workshops, when staff move the tables around, is usually boring. At C2 Spark the tables were moved by a parkour crew jumping and dancing around the room. It was absolutely useless but definitely cool! 😀


[UPDATE 2014-12-27 21:10 CET]

It seems a lot of people at Microsoft liked my article 🙂 Carlo Purassanta (CEO at Microsoft Italia), Carlo Rinaldi (Digital Marketing Group Leader at Microsoft Italia) and Chiara Mizzi (CMO at Microsoft Italia) shared it:

A few days ago I was at Codemotion in Milan, and I had the opportunity to gain some insights into the technologies used by two of our main competitors in Italy: BlogMeter and Datalytics. It's quite interesting because, even if the technical challenges are almost the same, each company uses a different approach with a different stack.


Datalytics is a relatively new company, founded 4 months ago. They had a desk at Codemotion to show their products and recruit new people. I chatted with Marco Caruso, the CTO (who probably didn't know who I am; sorry Marco, I just wanted to avoid hostility 😉), about the technologies they use and the developer profile they were looking for. The required skills were:

Their tech team is composed of 4 developers (including the CTO), and their main products are Datalytics Monitoring™ (a sort of statistics dashboard that shows buzz stats in real time) and Datalytics Engage™ (a real-time analytics dashboard for live events). I have no technical insight into how their systems work, but I can guess some details by inferring them from the buzzwords they use.

Supported sources are Twitter, Facebook (public data only), Instagram, YouTube and Vine (their logos are on the website), and probably Pinterest.

They use DataSift as a data source in addition to the standard APIs. I suppose their processing pipeline uses Storm to manage the streaming input, maybe with an importing layer in front of it. Data is crunched using Hadoop and Java, and results are stored in MongoDB (Massimo Brignoli, the Italian MongoDB evangelist, advertised their company during his presentation, so I suppose they use it heavily).

Node.js is probably used for the frontend. It is fast enough for near-real-time applications (also using WebSockets) and plays really well with both Angular.js and MongoDB (the MEAN stack). D3.js is obviously the only choice for complex dynamic charts.

I'm never too happy to discover a new competitor in our market segment. Competition gets harder, and that is not fun. Anyway, the guys at Datalytics seem smart (and nice), so competing with them will be a pleasure and will push me to do my best.

Now I'm curious to know whether Datalytics is monitoring the buzz around its own company name on the web. I'm going to tweet about this article using the #Datalytics hashtag. If you find this article, please tweet me "Yes, we found it bwahaha" 😛

[UPDATE 2014-12-27 21:18 CET]

@DatalyticsIT favorited my tweet on December 1st. This probably means they found my article but they didn't read it! 😀

This is the first public paper, published by NASA, that mentions the term "Big Data". It's not actually about data processing in the modern sense, but it marks the beginning of this funny buzzword 🙂



Title: Application-Controlled Demand Paging for Out-of-Core Visualization (PDF), 1997
Authors: Michael Cox, David Ellsworth

Abstract

In the area of scientific visualization, input data sets are often very large. In visualization of Computational Fluid Dynamics (CFD) in particular, input data sets today can surpass 100 Gbytes, and are expected to scale with the ability of supercomputers to generate them. Some visualization tools already partition large data sets into segments, and load appropriate segments as they are needed.

However, this does not remove the problem for two reasons: 1) there are data sets for which even the individual segments are too large for the largest graphics workstations, 2) many practitioners do not have access to workstations with the memory capacity required to load even a segment, especially since the state-of-the-art visualization tools tend to be developed by researchers with much more powerful machines. When the size of the data that must be accessed is larger than the size of memory, some form of virtual memory is simply required. This may be by segmentation, paging, or by paged segments.

In this paper we demonstrate that complete reliance on operating system virtual memory for out-of-core visualization leads to egregious performance. We then describe a paged segment system that we have implemented, and explore the principles of memory management that can be employed by the application for out-of-core visualization.

We show that application control over some of these can significantly improve performance. We show that sparse traversal can be exploited by loading only those data actually required. We show also that application control over data loading can be exploited by 1) loading data from alternative storage format (in particular 3-dimensional data stored in subcubes), 2) controlling the page size.

Both of these techniques effectively reduce the total memory required by visualization at run-time. We also describe experiments we have done on remote out-of-core visualization (when pages are read by demand from remote disk) whose results are promising.


Check out the list of interesting papers and projects (Github).

After GFS and MapReduce, Google once again solved big data problems by designing Bigtable: a compressed, high-performance, proprietary data storage system that forms the basis for many of its projects. HBase and Cassandra are inspired by it.
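
The paper describes the data model as a sparse, distributed, persistent, multi-dimensional sorted map indexed by (row key, column key, timestamp). A toy in-memory sketch of that model, purely illustrative and nothing like the real implementation:

```js
// Values are uninterpreted byte strings indexed by row, column and timestamp.
const table = new Map();

function put(row, column, timestamp, value) {
  table.set(`${row}|${column}|${timestamp}`, value);
}

function get(row, column, timestamp) {
  return table.get(`${row}|${column}|${timestamp}`);
}

// The paper's own example: rows are reversed URLs, and the "contents:" column
// stores the page body at several timestamps.
put('com.cnn.www', 'contents:', 1, '<html>...</html>');
console.log(get('com.cnn.www', 'contents:', 1));
```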



Title: Bigtable: A Distributed Storage System for Structured Data (PDF), November 2006
Authors: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.


Check out the list of interesting papers and projects (Github).

For several years I thought MapReduce was the only paradigm for distributed data processing. Only a few months ago, watching "Clash of the Titans Beyond MapReduce" (a really interesting talk at The Hive meetup), Matei Zaharia (@matei_zaharia), CTO of Databricks and co-creator of Spark, cited Dryad, a programming paradigm developed by Microsoft. I don't know of any open project that implements it, but it was widely used by Microsoft Research.



Title: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (PDF), March 2007
Authors: Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, Dennis Fetterly

Abstract

Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph.

Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through files, TCP pipes, and shared-memory FIFOs. The vertices provided by the application developer are quite simple and are usually written as sequential programs with no thread creation or locking. Concurrency arises from Dryad scheduling vertices to run simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources.

Dryad is designed to scale from powerful multi-core single computers, through small clusters of computers, to data centers with thousands of computers. The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.


Check out the list of interesting papers and projects (Github).