message-in-a-bottle

When I was young (yes, now I’m old 😀) the only messaging feature available on my phone was SMS texting: 160 characters per message and room for 10 messages on my SIM card.

15 years later, I’m impressed by the number of messaging apps I use on a daily basis. Below is the list of messaging apps I used at least once in the last week.

Skype
Old but gold. It’s now owned by Microsoft and has more problems than it used to, but it’s still the most used video call app. Most of my customers ask for my Skype handle for chatting, and some of my friends also use it at work.

Skype Screenshot

WhatsApp
Most of my non-geek friends still use this app. Facebook paid $22B to acquire the company. Only half of them moved to Telegram when WhatsApp became a paid app for Android users. The web interface, now also available for iOS, OS X and as a Chrome extension, is really useful.

whatsapp_screenshot

Telegram
Most of my geek friends use this app. It’s really similar to WhatsApp but uses an open-source protocol, and its encryption is claimed to be better than competitors’. You can also use it server side. Meh…

telegram_screenshot

 

Slack
My boss wanted it to replace Skype last spring. I don’t really like the UI and the price is excessive but, after some months of testing, it works quite well.

slack_screenshot

HipChat
One of our external collaborators created a private HipChat channel because he didn’t have an internal email address and couldn’t join the company’s Slack. After a few days of use, it seems really similar to Slack, but it’s almost free. Crew.co tested both and chose Slack.

hipchat_screenshot

Google Talk/Hangouts
The best alternative to Skype for video calls. I actually use it only to set up video calls.

google_hangout

Facebook Messages
For people I contact only on Facebook. Really useful in combination with other Facebook features (events, birthdays, …), and it also supports a lot of external integrations (with more on the way). Probably one of the strongest players in the coming years.

facebook_messages_screenshot

 

Twitter Direct Messages
Limited to 140 characters until June 2015, they are now a valid alternative to Facebook Messages (Twitter extended the limit a few weeks ago). I use them rarely, and only with a couple of contacts I’m not connected with on Facebook.

twitter_direct_messages_screenshot

Linkedin Messages
With the recent update (rolled out to my account last week) they are closer to a messaging app than to an email client. Now recruiters want to chat with you, and this is quite noisy.

linekdin_messages_screenshot

Apple Messages
The modern alternative to SMS. Only for Apple users.

apple_messages_screenshot

Which one is the best? I have no idea! Anyway, there is a funny thing to notice: I can use all of them easily and seamlessly on both my smartphone and my notebook, even better than I could use SMS 15 years ago.

Many people are worried about using several different channels to communicate. “NO NO, don’t add me on Telegram, we’re already connected on WhatsApp! I can’t handle this!” they say.

IMHO there is no pain in using a lot of different channels, because the user experience has improved and now we really use only one channel: our “digital identity”, logged in on every device. Designers hide the complexity.

This is the interconnected world: it’s here, and we are already part of it.

infrastructure-update

Everything started on Heroku in October 2012, on their dynos with Heroku Postgres, and continued on OpenShift in August 2013 on a LAMP stack based on Apache 2.4, PHP 5.3 and MySQL 5.1.

Now it’s time to move my little blog to a modern stack. The best offer on OpenShift is a variation of the standard LEMP stack (we can call it LEMP-HH) with HHVM 3.8 and MariaDB 5.5 behind NGINX 1.7.

lemphh-stack

Actually, the biggest performance improvement came from adding a good cache plugin a few months ago. I had always used W3 Total Cache and WP Super Cache but, in this specific case, both are complex to set up because of the structure of the OpenShift stack. The best solution I found is the WP Fastest Cache plugin, one of the latest cache plugins I tested. Here is the stunning header of their website, showing two beautiful cheetahs (are they cheetahs?).

wp-fastes-cache

Anyway, coming back to the new stack: there is no official bundle yet, but you can create a new application using tengyifei’s HHVM 3.8 cartridge and add the OpenShift MariaDB 5.5 cartridge. I wasn’t able to run them on separate gears (with the scaling option activated and HAProxy), but it seems fast enough on a single gear.

The filesystem structure is similar to the standard PHP bundle, except that the application directory is named www/ instead of php/. I used the latest UpdraftPlus backup to migrate the database to MariaDB. On non-scalable applications you need port forwarding to access the DB from your local machine. The rhc command is:

rhc port-forward -a application-name

Source: Getting Started with Port Forwarding on OpenShift

Moving to NGINX also breaks permalinks, because NGINX ignores .htaccess. The Nginx Helper plugin fixes the problem, but you can simply add a few lines to the NGINX configuration located in /config/nginx.d/default.conf.erb.

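# Handle any other URI: serve the file or directory if it exists,
# otherwise hand the request to WordPress, keeping the query string
location / {
    try_files $uri $uri/ /index.php?$args;
}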

Discussion on WordPress support forum: WordPress Permalinks on NGINX
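To make the fallback order of try_files concrete, here is a small Python model of how that `location /` block resolves a request. It is only a sketch, not nginx itself: the FILES and DIRS sets are hypothetical stand-ins for the document root, and the query-string part of the fallback is left out for simplicity.

```python
# Hypothetical model of nginx's `try_files $uri $uri/ <fallback>` for the
# WordPress `location /` block. nginx checks each candidate in order and,
# if none exists, internally redirects to the last argument.


def try_files(uri: str, files: set, dirs: set) -> str:
    """Return what nginx would serve for uri.

    files and dirs are hypothetical stand-ins for the document root;
    the fallback's query-string part is omitted for simplicity.
    """
    if uri in files:                  # $uri: an existing static file
        return uri
    path = uri.rstrip("/") or "/"
    if path in dirs:                  # $uri/: an existing directory
        return path if path == "/" else path + "/"
    return "/index.php"               # fallback: WordPress routes the request


FILES = {"/index.php", "/wp-content/themes/twentyfifteen/style.css"}
DIRS = {"/", "/wp-admin"}

for uri in ["/wp-content/themes/twentyfifteen/style.css", "/wp-admin",
            "/2015/09/06/infrastructure-update/"]:
    print(uri, "->", try_files(uri, FILES, DIRS))
```

The last URL matches neither a file nor a directory, so it falls through to /index.php: that is the pretty-permalink case the rule is there for.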

Refactoring the previous filesystem, migrating the database, and fixing the permalinks and other stuff took about 2 hours and, in the end, everything seems to work fine. I’m quite confident this is a future-proof solution, but I’m going to keep testing it until the next major update 🙂

[UPDATE 2015-09-06 21:56 CEST]

After the migration, sitemap_index.xml and robots.txt weren’t reachable because some rewrite rules were missing. I took the opportunity to switch to Yoast SEO for the sitemap, Facebook Open Graph and Twitter cards. The following rules fix the SEO problems.

# Rewrites for WordPress SEO XML Sitemap
rewrite ^/sitemap_index\.xml$ /index.php?sitemap=1 last;
rewrite ^/([^/]+?)-sitemap([0-9]+)?\.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
# Rewrites for robots.txt
rewrite ^/robots\.txt$ /index.php?robots=1 last;
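Before reloading NGINX, the sitemap and robots regexes can be sanity-checked offline. The sketch below ports them to Python’s `re` module (with the dots escaped as `\.`), which should behave like NGINX’s PCRE for patterns this simple. The sample URLs are just examples.

```python
import re

# The three rewrite rules, in the order nginx evaluates them.
# Each target is a format string filled with the captured groups.
RULES = [
    (re.compile(r"^/sitemap_index\.xml$"), "/index.php?sitemap=1"),
    (re.compile(r"^/([^/]+?)-sitemap([0-9]+)?\.xml$"),
     "/index.php?sitemap={0}&sitemap_n={1}"),
    (re.compile(r"^/robots\.txt$"), "/index.php?robots=1"),
]


def rewrite(uri: str) -> str:
    """Return the URI after the first matching rule (as with `last`), else uri."""
    for pattern, target in RULES:
        m = pattern.match(uri)
        if m:
            # An unmatched optional capture ($2) becomes an empty string.
            return target.format(*(g or "" for g in m.groups()))
    return uri


for uri in ["/sitemap_index.xml", "/post-sitemap.xml",
            "/post-sitemap2.xml", "/robots.txt", "/about/"]:
    print(uri, "->", rewrite(uri))
```

For example, /post-sitemap2.xml becomes /index.php?sitemap=post&sitemap_n=2, while /about/ matches no rule and is left alone.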

c2_logo

A few weeks ago I had the privilege of attending the C2 Spark conference in Milan. It is a “business conference somewhere between genius and insanity by Sid Lee, Cirque du Soleil, Fast Company and Microsoft” that tries to mix Commerce and Creativity. It started in Montreal a few years ago, and this was its first time in Europe (Zurich and Milan).

The talks were awesome and the people were awesome too. It’s quite unusual for me to meet non-tech people, and I really enjoyed the day. I learned a lot attending the conference. Here are the most fascinating things.

Technology will change everything, again.

David Rose, scientist at the MIT Media Lab and author of “Enchanted Objects”, talked about the way we imagined the future. The Internet of Things will be a huge opportunity, and a lot of products are already here. Here is the “periodic table” of Enchanted Objects that David showed us.

enchanted_objects_poster

Microsoft has enough money to reboot its business

Ten years ago Microsoft made billions on Windows and Office. Now Windows and Office are worth much less, and Microsoft is trying to reboot its business with cloud (Windows Azure), mobile (Nokia and Surface) and wearables (Microsoft Band). Carlo Purassanta, CEO of Microsoft Italia, was clear: Microsoft is changing. He first ran for the conference wearing a Microsoft Band.

carlo_purassanta

Non tech people are awesome

I never had great respect for non-technical people. “I can change the world, they can’t”: a lot of modesty and respect. I grew up in a really closed environment of nerds and geeks, and I always thought non-technical people had nothing to give me. I was wrong. I was absolutely wrong. At the event, people came from different fields: diplomacy, medicine, sales, biology, advertising, marketing, law and more. Each of them enriched me in some way. Non-tech people are awesome 🙂

non-tech-people

Creativity could be an analytical process

Sid Lee, the creative firm behind C2 Spark, described the process behind its most successful advertising campaigns. A lot of beliefs about creativity are just myths and, with the right process, anyone can express their creativity.

creativity_process

Jumping on tables to move them is definitely cool

The break between talks and workshops, when staff move the tables, is usually boring. At C2 Spark the tables were moved by a parkour crew jumping and dancing around the room. It was absolutely useless but definitely cool! 😀

c2_parkour-2 c2_parkour-4 c2_parkour-1 c2_parkour-3

[UPDATE 2014-12-27 21:10 CET]

It seems a lot of people at Microsoft liked my article 🙂 Carlo Purassanta (CEO at Microsoft Italia), Carlo Rinaldi (Digital Marketing Group Leader at Microsoft Italia) and Chiara Mizzi (CMO at Microsoft Italia) shared it.

A few days ago I was at Codemotion in Milan and had the opportunity to discover some insights about the technologies used by two of our main competitors in Italy: BlogMeter and Datalytics. It’s quite interesting because, even though the technical challenges are almost the same, each company uses a different approach with a different stack.

datalytics_logo

Datalytics is a relatively new company, founded 4 months ago. They had a desk at Codemotion to show their products and recruit new people. I chatted with Marco Caruso, the CTO (who probably didn’t know who I was; sorry Marco, I just wanted to avoid hostility 😉), about the technologies they use and the developer profile they were looking for. The required skills were:

Their tech team is made up of 4 developers (including the CTO), and their main products are Datalytics Monitoring™ (a sort of statistical dashboard that shows buzz stats in real time) and Datalytics Engage™ (a real-time analytics dashboard for live events). I have no technical insight into how their systems work, but I can guess some details from the buzzwords they use.

Supported sources are Twitter, Facebook (public data only), Instagram, YouTube, Vine (the logos are on their website) and probably Pinterest.

They use DataSift as a data source in addition to the standard APIs. I suppose their processing pipeline uses Storm to manage the streaming input, maybe with an import layer in front of it. Data is crunched using Hadoop and Java, and results are stored in MongoDB (Massimo Brignoli, the Italian MongoDB evangelist, mentioned their company during his presentation, so I suppose they use it heavily).

Node.js is probably used for the frontend. It’s fast enough for near-real-time applications (also using websockets) and plays really well with both Angular.js and MongoDB (the MEAN stack). D3.js is obviously the only choice for complex dynamic charts.

I’m not so happy when I discover a new competitor in our market segment: competition gets harder, and that’s not fun. Anyway, the guys at Datalytics seem smart (and nice), and competing with them will be a pleasure and will push me to do my best.

Now I’m curious to know whether Datalytics is monitoring the buzz around its company name on the web. I’m going to tweet about this article using the #Datalytics hashtag. If you find this article, please tweet me “Yes, we found it bwahaha” 😛

[UPDATE 2014-12-27 21:18 CET]

@DatalyticsIT favorited my tweet on December 1st. This probably means they found my article but didn’t read it! 😀

data_science

During the last year I refined my RSS collection about big data, data science and analytics. I check it almost every day to discover a ton of cool new technologies and to have fun. Here is the updated list.

Bloggers

News about emerging technologies, scalability and data

Data companies, social networks and search engines

Companies supporting and distributing big-data processing products

Recently I discovered the awesome data science list, which contains a list of interesting bloggers I haven’t had time to check yet. You can surely find something more in it. I’ll try to publish an update once I’ve checked it.

[UPDATE 2014-09-22 11:35]

Thanks to @onurakpolat for correcting my link to the awesome data science list. The previous link pointed to his fork; the original repo is https://github.com/okulbilisim/awesome-datascience by @okulbilisim

hiphop_logo

HipHop was one of the most notable things to come out of Facebook’s labs for PHP development. PHP is slow and limited, but Facebook couldn’t rewrite its entire codebase, so they decided to make PHP better. HipHop is simply a PHP-to-C++ compiler (HPHPc): the converted code is compiled into a binary, and performance improvements are about 6x.

Unfortunately HipHop has several downsides. For all the performance gains that HPHPc provided, the curve for further performance improvements had flattened. HPHPc did not fully support the PHP language, including the create_function() and eval() constructs. HPHPc also required a very different push process: a binary larger than 1 GB had to be compiled and distributed to many machines in short order.

hhvm_logo

To overcome these problems, starting in early 2010 Facebook developed HHVM, a PHP virtual machine. HHVM builds on top of HPHPc, using the same runtime and extension function implementations. HHVM converts PHP code into a high-level bytecode, which a just-in-time (JIT) compiler then translates into x64 machine code at runtime, similarly to C#/CLR or Java/JVM.

hack_logo

Facebook also released Hack, a programming language for HHVM that can be seen as a new version of PHP, which allows programmers to use both dynamic typing and static typing.

HHVM supports major open-source PHP projects like WordPress, and running WordPress on it seems really easy. A small modification used to be needed, but the latest version (3.9) no longer requires it. HHVM can also run on Heroku using a custom buildpack available here: https://github.com/hhvm/heroku-buildpack-hhvm.

My first experiment was to run WordPress on Heroku using HHVM. The first step is to create a Heroku app using the HHVM buildpack:

heroku create --buildpack https://github.com/hhvm/heroku-buildpack-hhvm

Then you can deploy a standard WordPress installation, adding the following config.hdf (the HHVM configuration file):

Server {
  DefaultDocument = index.php
}
Eval {
  Jit = true
}
VirtualHost {
  * {
    Pattern = .*
    RewriteRules {
      dirindex {
        pattern = ^/(.*)/$
        to = $1/index.php
        qsa = true
      }
    }
  }
}
StaticFile {
  FilesMatch {
    * {
      pattern = .*\.(dll|exe)
      headers {
        * = Content-Disposition: attachment
      }
    }
  }
  Extensions {
    css = text/css
    gif = image/gif
    html = text/html
    jpe = image/jpeg
    jpeg = image/jpeg
    jpg = image/jpeg
    png = image/png
    tif = image/tiff
    tiff = image/tiff
    txt = text/plain
  }
}

Warning: don’t forget the newline character at the end of the last line, or the linter will fail and you are going to hate this project 😉
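The dirindex rule in that config is what makes directory URLs such as /wp-admin/ reach their index.php. As a rough illustration, here is a Python model of the mapping it describes. It was written for this post and is not HHVM code. It reads `qsa = true` as “query string append”, as with Apache’s QSA flag, which is an assumption on my part.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical Python model of the `dirindex` rule in config.hdf:
#   pattern = ^/(.*)/$    to = $1/index.php    qsa = true
DIRINDEX = re.compile(r"^/(.*)/$")


def apply_dirindex(url: str) -> Optional[str]:
    """Return the rewritten target for url, or None if the rule doesn't match."""
    parts = urlsplit(url)
    m = DIRINDEX.match(parts.path)
    if m is None:
        # e.g. "/" alone or "/wp-login.php"; "/" is presumably served
        # through Server.DefaultDocument instead.
        return None
    target = m.group(1) + "/index.php"
    # qsa = true, read as "query string append": keep the original query.
    if parts.query:
        target += "?" + parts.query
    return target


for url in ["/wp-admin/", "/wp-admin/?page=2", "/", "/wp-login.php"]:
    print(url, "->", apply_dirindex(url))
```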

Everything works fine. You can add your favorite hosted MySQL service and run the famous WordPress 5-minute installation. Almost every plugin seems 100% compatible: I tested the most popular ones with no problems. Performance is better, and you also have the opportunity to use Hack to develop new custom plugins.

Now I’m curious about how HHVM can improve my production WordPress installations. For this, I’m looking for an OpenShift cartridge for HHVM, or for someone who wants to collaborate on a new one (the only one I found on GitHub seems “young”). Anyone interested? Let me know!

vkontakte_logo

Information about VKontakte, the largest European social network, and its infrastructure is scarce and fragmented. The only recent insight in English into its technology is a BTI press release about VK’s migration to BTI’s infrastructure. Everything was top secret.

Only in 2011, at HighLoad++ in Moscow, did Pavel Durov and Oleg Illarionov reveal something about the architecture of the social network; the insights are collected in this post (in Russian).

VK doesn’t seem different from any other popular social network: it runs on a LAMP stack and uses many other open-source technologies.

  • Debian is the base for their custom Linux distro.
  • nginx manages load balancing in front of Apache, which runs PHP using mod_php with XCache as the opcode cache.
  • MySQL is the main datastore, but a custom DBMS (written in C and based on the memcached protocol) is used for some of the magic. memcached also handles page caching.
  • XMPP is used for messages and chats and runs on node.js. Availability is ensured by HAProxy, which compensates for node’s fragility.
  • Multimedia files are stored on XFS, and media encoding is done with ffmpeg.
  • Everything is distributed across more than 4 datacenters.

vk_logo

The main difference between VK and other social networks is in server roles: VK servers are multifunctional. There is no clear distinction between database servers and file servers; the same machines are used simultaneously in several roles.

Load balancing between servers happens in layers, from DNS-level balancing down to request routing inside the system, where different servers handle different types of requests.

For example, microblogging works through a tricky scheme that relies on the memcached protocol’s ability to request data for a large number of keys in parallel. If the data is missing from the cache, the same request is sent to the storage system, and the results are then sorted, filtered and trimmed of the excess in PHP code.
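The microblogging read path above is essentially a batched cache-aside lookup followed by post-processing in application code. Here is a minimal Python sketch of that pattern, under loud assumptions. Plain dicts stand in for memcached and for VK’s storage system. The key names, the ts/hidden fields and the limit parameter are all hypothetical, since VK never published this code.

```python
from typing import Dict, List


def get_multi(cache: Dict[str, dict], keys: List[str]) -> Dict[str, dict]:
    """Stand-in for a memcached multi-get: one request for many keys."""
    return {k: cache[k] for k in keys if k in cache}


def load_feed(keys: List[str], cache: Dict[str, dict],
              storage: Dict[str, dict], limit: int = 3) -> List[dict]:
    found = get_multi(cache, keys)
    missing = [k for k in keys if k not in found]
    if missing:
        # Cache miss: send the same batched request to the storage system
        # and warm the cache with whatever comes back.
        from_storage = {k: storage[k] for k in missing if k in storage}
        cache.update(from_storage)
        found.update(from_storage)
    # Post-processing in application code (PHP at VK):
    # sort newest first, drop hidden entries, discard the excess.
    posts = sorted(found.values(), key=lambda p: p["ts"], reverse=True)
    visible = [p for p in posts if not p.get("hidden")]
    return visible[:limit]


cache = {"post:1": {"id": 1, "ts": 100}, "post:3": {"id": 3, "ts": 300}}
storage = {"post:2": {"id": 2, "ts": 200},
           "post:4": {"id": 4, "ts": 400, "hidden": True},
           "post:5": {"id": 5, "ts": 50}}
feed = load_feed(["post:1", "post:2", "post:3", "post:4", "post:5"], cache, storage)
print([p["id"] for p in feed])
```

Two keys come from the cache and three from storage. After sorting, filtering out the hidden post and trimming to three entries, the feed holds posts 3, 2 and 1.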

The custom database is still a secret and is widely used at VKontakte. Many services use it: private messages, wall posts, statuses, search, privacy settings, friend lists and probably more. It uses a non-relational data model, and most operations are performed in memory. The access interface is an extended memcached protocol: specially constructed keys return the results of complex queries. They said it was developed by the “best minds” of Russia.

I wasn’t able to find any other insights about VK’s infrastructure after this talk. They are like the KGB 😀

Recently I had to analyze interactions on a Facebook page: I needed to fetch all the content from the stream and analyze user actions. Retrieving the interaction count for each post can be hard, because the Facebook APIs are hell: they change very fast, return a lot of errors, have hard-to-understand limits and give you many headaches.

Anyway, after a lot of attempts I found a way to fetch quantitative information about posts and photos on the stream. First of all, you need the content.

Get the contents

The Graph endpoint is https://graph.facebook.com/. You can fetch page data (I use the BBCNews page as an example) at:

https://graph.facebook.com/bbcnews/posts?access_token=your_access_token

There are different ways to get a valid access token: Facebook lets you choose among many kinds of access tokens, each with a different rate limit.

The returned data is a JSON array of elements. Each element has a lot of properties describing an item on the timeline, but the element included in the stream has just a subset of them (latest comments, latest likes, some counters). Here you can find text content, pictures and links. To get more data you need three more properties: id, type and object_id.

Status updates are identified by type “status”, photos by type “photo” and videos by type “video”. The id field identifies the entry on the stream, while object_id identifies the object inside the Facebook graph.
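To make these fields concrete, here is a small Python sketch that builds the posts URL and picks id, type and object_id out of each element. It assumes the usual Graph API envelope, with the elements under a data field. It works on a made-up sample response rather than a live call, since tokens and API versions keep changing. The helper names and the status post id are hypothetical.

```python
import json
from urllib.parse import urlencode

GRAPH = "https://graph.facebook.com/"


def posts_url(page: str, access_token: str) -> str:
    """URL of a page's posts, e.g. posts_url("bbcnews", "your_access_token")."""
    return GRAPH + page + "/posts?" + urlencode({"access_token": access_token})


def summarize_posts(raw: str) -> list:
    """Keep only the fields needed for the follow-up calls (None if absent)."""
    items = json.loads(raw).get("data", [])
    return [
        {"id": p.get("id"), "type": p.get("type"), "object_id": p.get("object_id")}
        for p in items
    ]


# Made-up sample response: one photo post and one status update.
SAMPLE = json.dumps({"data": [
    {"id": "228735667216_10151700273382217", "type": "photo",
     "object_id": "10151700273362217"},
    {"id": "228735667216_123", "type": "status"},
]})

print(posts_url("bbcnews", "your_access_token"))
for post in summarize_posts(SAMPLE):
    print(post)
```

Status updates usually come without an object_id, which is why the sketch returns None for missing fields instead of failing.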

Actions: comments, likes and shares

Comments are paginated, and sometimes the API doesn’t return the entire list. To get the total count you need to specify the parameter summary=true.

https://graph.facebook.com/228735667216_10151700273382217/comments?summary=true&access_token=your_access_token

At the end of the response you can find additional information about the comments feed; total_count holds the count.

"summary": {
  "order": "ranked",
  "total_count": 100
}

Likes work like comments: they have similar limitations and a similar endpoint, which accepts the same summary=true parameter.

https://graph.facebook.com/228735667216_10151700273382217/likes?summary=true&access_token=your_access_token

This time the summary shows only the total count:

"summary": {
  "total_count": 949
}

The share count can be found in the object details.

https://graph.facebook.com/228735667216_10151700273382217/?access_token=your_access_token

After the created and updated dates you will find the shares property:

"shares": {
  "count": 238
}
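Putting the three counters together, a sketch like the following builds the three URLs for a post id and reads the counts from responses shaped like the fragments shown in this section. The function names are hypothetical. Missing fields fall back to 0 because, as noted, the API does not always return everything.

```python
import json
from urllib.parse import urlencode

GRAPH = "https://graph.facebook.com/"


def counter_urls(post_id: str, access_token: str) -> dict:
    """URLs for the comments, likes and object-detail requests of one post."""
    with_summary = urlencode({"summary": "true", "access_token": access_token})
    plain = urlencode({"access_token": access_token})
    return {
        "comments": f"{GRAPH}{post_id}/comments?{with_summary}",
        "likes": f"{GRAPH}{post_id}/likes?{with_summary}",
        "object": f"{GRAPH}{post_id}/?{plain}",
    }


def interaction_counts(comments_raw: str, likes_raw: str, object_raw: str) -> dict:
    """Read summary.total_count (comments, likes) and shares.count (object)."""
    comments = json.loads(comments_raw).get("summary", {}).get("total_count", 0)
    likes = json.loads(likes_raw).get("summary", {}).get("total_count", 0)
    shares = json.loads(object_raw).get("shares", {}).get("count", 0)
    return {"comments": comments, "likes": likes, "shares": shares}


urls = counter_urls("228735667216_10151700273382217", "your_access_token")
print(urls["comments"])
print(interaction_counts('{"summary": {"order": "ranked", "total_count": 100}}',
                         '{"summary": {"total_count": 949}}',
                         '{"shares": {"count": 238}}'))
```

The JSON shown in this section consists of fragments of larger responses. Wrapped in braces as above, they produce {'comments': 100, 'likes': 949, 'shares': 238}.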

Convert the object_id

Depending on your data feed, sometimes the id is not available and you have to work with the object_id. To use the previous methods you need to query the Facebook database with FQL, looking for the page_story_id.

SELECT page_story_id
FROM photo
WHERE object_id = '10151700273362217'

https://api.facebook.com/method/fql.query?format=json&access_token=your_access_token&query=SELECT%20page_story_id%20FROM%20photo%20WHERE%20object_id%20%3D%20%2710151700273362217%27

The result is the object’s page_story_id (the id of the post on the feed):

"data": [
  {
    "page_story_id": "228735667216_10151700273382217"
  }
]

Now you can use this to retrieve counters and data.
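The percent-encoded FQL URL is easy to get wrong by hand. Here is a minimal Python sketch that builds it from the plain FQL string and reads page_story_id back out of a response. FQL was deprecated in later Graph API versions, so treat this as a record of how the call was shaped rather than something guaranteed to still work. The helper names are hypothetical. The parser accepts either a bare JSON array or an object with a data field, since I’m not certain which envelope this endpoint returns.

```python
import json
from urllib.parse import quote

FQL_ENDPOINT = "https://api.facebook.com/method/fql.query"


def fql_url(object_id: str, access_token: str) -> str:
    """Build the fql.query URL that maps a photo object_id to its page_story_id."""
    query = f"SELECT page_story_id FROM photo WHERE object_id = '{object_id}'"
    # quote(..., safe="") percent-encodes spaces (%20), "=" (%3D) and "'" (%27).
    return (f"{FQL_ENDPOINT}?format=json&access_token={quote(access_token, safe='')}"
            f"&query={quote(query, safe='')}")


def page_story_id(raw: str):
    """Return the first page_story_id in the response, or None if there is none."""
    parsed = json.loads(raw)
    rows = parsed if isinstance(parsed, list) else parsed.get("data", [])
    return rows[0].get("page_story_id") if rows else None


print(fql_url("10151700273362217", "your_access_token"))
print(page_story_id('{"data": [{"page_story_id": "228735667216_10151700273382217"}]}'))
```

The returned id can then be plugged into the comments, likes and object-detail URLs from the previous section.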