After almost 4 years as CTO at The Fool, it's time for me to look for new adventures. Starting from May 1st 2016 I'll join the Curcuma team. Below, a picture of the new office:

curcuma

Just joking 😛 It will be a big challenge for me because I'll move from a specific field, web and social network analysis, to general-purpose development where projects are really varied: custom CMSes, integrations with IoT devices, mobile applications and many others. In the end, a good challenge!

eCommerce solutions are quite popular in Curcuma's portfolio and my last experience with them was in 2008, with an early version of Magento. I have worked on similar products, but I'm quite "rusty" on this topic. Sticking to the Ruby ecosystem, the default at Curcuma, only two realistic options are available: Spree (acquired by First Data and no longer supported) and Solidus (a young but already interesting fork of Spree).

I searched for tutorials about Solidus, but version 1.0.0 only shipped last August (it is based on Spree 2.4) and the community is still young. I found only beginner's tutorials, so I decided to follow the GitHub README instructions on the master branch.

Install

Start with a fresh installation of Rails 4.2 (the Rails 5.0 beta doesn't seem to be supported yet), add the following gems to your Gemfile and run bundle install:

gem 'solidus'
gem 'solidus_auth_devise'

Inspecting Gemfile.lock you can find the solidus dependencies:

solidus (1.2.2)
  solidus_api (= 1.2.2)
  solidus_backend (= 1.2.2)
  solidus_core (= 1.2.2)
  solidus_frontend (= 1.2.2)
  solidus_sample (= 1.2.2)
solidus_auth_devise (1.3.0)

The solidus package seems to be a container for these modules. I really like this approach: it's clean, encourages isolation and masks complexity. The gemspec is also the cleanest I've seen yet.

# encoding: UTF-8
require_relative 'core/lib/spree/core/version.rb'

Gem::Specification.new do |s|
  s.platform    = Gem::Platform::RUBY
  s.name        = 'solidus'
  s.version     = Spree.solidus_version
  s.summary     = 'Full-stack e-commerce framework for Ruby on Rails.'
  s.description = 'Solidus is an open source e-commerce framework for Ruby on Rails.'

  s.files        = Dir['README.md', 'lib/**/*']
  s.require_path = 'lib'
  s.requirements << 'none'

  s.required_ruby_version     = '>= 2.1.0'
  s.required_rubygems_version = '>= 1.8.23'

  s.author       = 'Solidus Team'
  s.email        = 'contact@solidus.io'
  s.homepage     = 'http://solidus.io'
  s.license      = 'BSD-3'

  s.add_dependency 'solidus_core', s.version
  s.add_dependency 'solidus_api', s.version
  s.add_dependency 'solidus_backend', s.version
  s.add_dependency 'solidus_frontend', s.version
  s.add_dependency 'solidus_sample', s.version
end

Setup and config

Anyway, the next step in the README is to run the following commands:

bundle exec rails g spree:install
bundle exec rake railties:install:migrations

The first one gives me a warning:

[WARNING] You are not setting Devise.secret_key within your application!
You must set this in config/initializers/devise.rb. Here's an example:
Devise.secret_key = "7eaa914b11299876c503eca74af..."

It then fires some actions related to assets, migrations and seeds, and asks me for a username and password. Standard install.

About the warning, I found another post that recommends running this generator:

rails g solidus:auth:install

It's not clear to me what it does, but it seems to work: after running it, the warning is gone.
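
If you prefer to set the key explicitly, here is a minimal sketch of the initializer the warning refers to (the value is a placeholder; you can generate a real key with rake secret):

# config/initializers/devise.rb
# Placeholder key: generate a real one with `rake secret`
Devise.secret_key = "7eaa914b11299876c503eca74af..."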

The migration task (bundle exec rake railties:install:migrations) gives no output. I suppose the migrations were already installed by the first step. No idea.

Anyway, the last step listed in the README is to run the migrations (bundle exec rake db:migrate); it gives no output either, so everything seems ok.

Now we can fire up rails s and enjoy our brand new store 😀

solidus-screen

A bit more control

These steps are cool but they do a lot of things we probably don't want, like installing demo products and demo users. Following the README, the installation can be run without any automatic step:

rails g spree:install --migrate=false --sample=false --seed=false

and then you are free to run any of the available steps with your own customizations:

bundle exec rake railties:install:migrations
bundle exec rake db:migrate
bundle exec rake db:seed
bundle exec rake spree_sample:load
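
Beyond choosing which steps to run, store-level settings end up in the initializer the generator creates (config/initializers/spree.rb on a standard install). A minimal sketch, just to give the idea (the currency value is only an example):

# config/initializers/spree.rb
Spree.config do |config|
  # Example preference: sell in euros instead of the default USD
  config.currency = 'EUR'
end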

Now our new store is ready. It's time to dig deeper into the Solidus structure. See ya in another post, bro 😉

data-backup

I don't know if the expression "Digital Data Hypochondriac" is clear enough. These are people who are always worried about losing all their digital data. Every time any component of their digital ecosystem doesn't work as expected, they're scared.

I'm one of them. In 2000, when I was 15, my huge 14 GB hard disk contained almost all my digital life. One day the HD controller burned out (literally, because of a short circuit) and I lost everything. EVERYTHING 🙁

backup

Since that day I have been a "hypochondriac" about backups. Now, working on multiple huge projects and living as a digital commuter, I need a safe backup strategy able to handle almost any problem. Here is what I do.

My digital ecosystem is made of different "objects": MacBook, iPhone, iPad and several online services (iCloud, Google Documents, Gmail, Dropbox, Evernote, Todoist, 1Password, …).

For mobile devices and online services an online backup is available directly from the provider (iCloud for Apple devices and applications, Google Drive for Google services, …). Anyway, to be safe I take a snapshot of these services and devices twice a month.

My aggregated digital footprint is about 550 GB.

Every single byte is on my MacBook. Obviously this makes my notebook a fucking single point of failure, which is not a good choice. My notebook backup strategy works on three different levels.

Apple Time Machine

time_machine

Every day, during my usual work, I run Time Machine to back up data onto a WD My Passport Ultra 1TB. I chose this disk because it uses a single platter, so hardware failures are less frequent. Backups happen incrementally about every hour and you can restore a single file from versioned snapshots.

my-passport-ultra

Backblaze

backblaze-storage-pod

Unfortunately an external hard disk, safe as it is, is still a piece of hardware I usually take with me while commuting, and anything could happen: I could lose my bag, someone could steal it, or I could break the disk by hitting something. Online backup is a great complement and Backblaze is a great provider. You just need to install their agent and generate an encryption key, and the backup starts. The backup is encrypted client side, so your data isn't vulnerable to man-in-the-middle attacks and is safe on their US-based servers. If you lose your data you can download it as a zip or buy a hard drive, shipped to you, containing your snapshot. 30 days of history of your files are available.

Carbon Copy Cloner

You might assume you're safe with Time Machine and Backblaze, but they are both incremental backups: after 30 days, or when the hard disk is full, old data is deleted. This is ok almost every time, except when you absolutely need that old data. To be safe it's better to take a monthly snapshot using Carbon Copy Cloner. It creates a mountable snapshot of the disk that you can easily access. A 5TB hard drive can retain 9 or 10 months of historical data and you can archive the snapshots easily.

Amazon Glacier

Three levels of backup are definitely enough. However, some kinds of data are really important to me: for personal photos and documents related to my family, health and house it's better to have a long-term copy. Amazon Glacier is the right place for that. I pack these files by month, TAR them, calculate the checksum, then upload them to a given bucket on S3 and configure the lifecycle rules to archive to Glacier after 1 day. Pricing is between $0.70 and $1.10 a month for 100 GB.
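
A minimal Ruby sketch of that monthly routine (bucket name, region and paths are placeholders, the aws-sdk gem and AWS credentials in the environment are assumed, and the Glacier transition is left to a lifecycle rule configured on the bucket):

require 'digest'
require 'aws-sdk' # v2

month   = Time.now.strftime('%Y-%m')
archive = "/tmp/family-docs-#{month}.tar"

# Pack the month's files and keep a checksum next to them
system('tar', '-cf', archive, '/Users/me/Documents/family') or abort('tar failed')
puts "SHA256: #{Digest::SHA256.file(archive).hexdigest}"

# Upload to S3; the bucket's lifecycle rule moves it to Glacier after 1 day
s3 = Aws::S3::Resource.new(region: 'eu-west-1')
s3.bucket('my-backup-bucket').object("archives/#{File.basename(archive)}").upload_file(archive)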

newborn-star

This blog started in October 2012 as a place to write about cool tech stuff I found online. Now, after 3+ years of tech problems, a son, 3 infrastructure migrations (Heroku, OpenShift, OVH) and a lot of articles, I have decided to put all this stuff under a new domain: UsefulStuff.io

New server configuration (HHVM, PageSpeed, SPDY, Varnish and a ton of new technologies), new theme, new topics, all new 🙂

Learn it, then use it (again!)

 

About a month ago I wrote about how cool it was to migrate to HHVM on OpenShift. The custom cartridge for Nginx + HHVM and MariaDB was running fast and I was really excited about the new stack. I was landing in a new, beautiful world.

About a week later I faced some problems with disk space. My files took only 560MB out of 1GB, but the OpenShift shell gave me an error about 100% disk usage (and a nice blank page on the home page because the cache couldn't be written). I wasn't able to understand why it was giving me that error. It probably depended on log files written by the custom cartridge somewhere else in the filesystem. No idea. Anyway, I had no time to dig deeper, so I bought 1 more GB of storage.

The day after I bought the storage, the blog's speed went down. It was almost impossible to open the blog and CloudFlare gave me timeouts for half of the requests. Blog visits started to fall and I had no idea how to fix that. Some weeks later I discovered some trouble with My Corderwall Badges and Simple Sharer Button Adder but, in the OpenShift environment, I had no external caching system to handle this kind of problem.

I didn't want to go back to MySQL and Apache, but trashing all my articles wasn't fun either, so I chose something I had rejected 3 years ago: I took a standalone server.

server-in-datacenter

My first choice was Scaleway. It's trendy and it's bare metal: 3.5€ for a 4-core ARM server (a very hipster choice) with 2 GB RAM and a 50 GB SSD. The new interface is cool, better than Linode's and DigitalOcean's, and servers and resources are managed easily. Unfortunately HHVM is still experimental on ARM, and the SSDs are on a SAN and aren't so fast (100MB/s).

The next choice was OVH. The new 2016 VPS SSD instances (available in Canadian datacenters) are cheap enough ($3.5) and offer a virtual Xeon core with 2 GB RAM and a 10 GB SSD. Multicore performance is lower and you get a lot less storage, but it's an x86-64 architecture and the SSD is faster (250 MB/s). I took this one!

Unfortunately my preferences haven't changed since my first post. I'm still a developer, not a sysadmin. I'm not a master of Linux configuration, my stack has several moving parts, and my blog was still unavailable. My beautiful migration to cutting-edge technologies became an emergency landing.

Luckily I found several online tutorials which explain how to master the WordPress stack. Over the next days I completed the migration and my new stack now runs on Pound, Varnish, Nginx, HHVM, PHP-FPM and MariaDB. I hope to have enough time in the coming days to publish all the useful stuff I used for the configuration.

For the moment I'm proud to share the average response time of the home page: 342ms 🙂

infrastructure-update

Everything started on Heroku in October 2012 over their dynos with Heroku Postgres and continued on OpenShift in August 2013 over a LAMP stack based on Apache 2.4, PHP 5.3 and MySQL 5.1.

Now it's time to move my little blog to a modern stack. The best offer on OpenShift is a variation of the standard LEMP stack (we can call it LEMP-HH) with HHVM 3.8 and MariaDB 5.5 over NGINX 1.7.

lemphh-stack

Actually, the biggest performance improvement was achieved by adding a good cache plugin a few months ago. I have always used W3 Total Cache and WP Super Cache but, in this specific case, they are both complex to use because of the structure of the OpenShift stack. The best solution I found is the WP Fastest Cache plugin, one of the latest cache plugins I tested. Here is the stunning header of their website showing two beautiful cheetahs (are they cheetahs?).

wp-fastes-cache

Anyway, coming back to the new stack: there is no official bundle yet, but you can create a new application using tengyifei's HHVM 3.8 cartridge and adding the OpenShift MariaDB 5.5 cartridge. I wasn't able to run them on different gears (with the scaling option activated and HAProxy), but it seems fast enough on a single gear.

The filesystem structure is similar to the standard PHP bundle except for the application dir, which is named www/ instead of php/. I used the last backup from UpdraftPlus to migrate the database to MariaDB. On non-scalable applications you need to forward ports in order to access the DB from your local machine. The RHC command is:

rhc port-forward -a application-name

Source here: Getting Started with Port Forwarding on OpenShift
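
Once the ports are forwarded, any local client pointed at the forwarded endpoint works. A quick sanity check in Ruby with the mysql2 gem (host, port and credentials are placeholders; use whatever rhc port-forward prints):

require 'mysql2'

# Placeholder connection data: use the local address/port reported by `rhc port-forward`
client = Mysql2::Client.new(host: '127.0.0.1', port: 3306,
                            username: 'adminUser', password: 'secret',
                            database: 'wordpress')
puts client.query('SHOW TABLES').to_a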

Moving to NGINX also causes problems with permalinks because .htaccess doesn't work anymore. The Nginx Helper plugin fixes the problem, but you can simply add a couple of rows to the NGINX configuration located in /config/nginx.d/default.conf.erb.

# Handle any other URI
location / {
    try_files $uri $uri/ /index.php?q=$request_uri;
}

Discussion on WordPress support forum: WordPress Permalinks on NGINX

Refactoring the previous filesystem, migrating the database and fixing permalinks and other stuff took about 2 hours and, in the end, everything seems to work fine. I'm quite confident this is a future-proof solution, but I'm going to keep testing it until the next major update 🙂

[UPDATE 2015-09-06 21:56 CEST]

After the migration, sitemap_index.xml and robots.txt weren't reachable: some rules were missing. I took the opportunity to switch to Yoast SEO for the sitemap, Facebook Open Graph and Twitter cards. These rules fix the SEO problems:

# Rewrites for WordPress SEO XML Sitemap
rewrite ^/sitemap_index.xml$ /index.php?sitemap=1 last;
rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;
# Rewrites for robots.txt
rewrite ^/robots\.txt$ /index.php?robots=1 last;

During the last couple of weeks I had to work on a PHP project with a custom WordPress stack I had never used before: Bedrock.

The home page says: "Bedrock is a modern WordPress stack that gets you started with the best development tools, practices, and project structure."

What Bedrock really is

It is a regular WordPress installation with a different folder structure, integrated with Composer for dependency management and Capistrano for deployment. The structure is reminiscent of Rails or similar frameworks, but it contains the usual WordPress components and runs on the same web stack.

├── composer.json
├── config
│   ├── application.php
│   └── environments
│       ├── development.php
│       ├── staging.php
│       └── production.php
├── vendor
└── web
    ├── app
    │   ├── mu-plugins
    │   ├── plugins
    │   ├── themes
    │   └── uploads
    ├── wp-config.php
    ├── index.php
    └── wp

Server configuration

The project used to run on Apache with mod_php, but I personally don't like this stack. I'd like to test it on HHVM, but for the moment I preferred to run it on nginx with PHP-FPM. Starting with an empty Ubuntu 14.04 installation, I set up a LEMP stack with memcached and Redis using apt-get:

apt-get update
apt-get install build-essential tcl8.5 curl screen bootchart git mailutils munin-node vim nmap tcpdump nginx mysql-server mysql-client memcached redis-server php5-fpm php5-curl php5-mysql php5-mcrypt php5-memcache php5-redis php5-gd

Everything works fine except the Redis extension (used for custom functions unrelated to WordPress). I don't know why, but its config file wasn't copied into the configuration directory /etc/php5/fpm/conf.d/. You can find it among the available mods in /etc/php5/mods-available/ (enabling it with php5enmod redis, or symlinking the .ini into conf.d/ by hand, should do the trick).

PHP-FPM uses a standard configuration placed in /etc/php5/fpm/pool.d/example.conf. It listens on 127.0.0.1:9000 or on a unix socket at /var/run/php5-fpm-example.sock (assuming the configured pool name is "example").

Memcached is meant to be used for session sharing among multiple servers. To activate it, edit the php.ini configuration file and set the following parameters in /etc/php5/fpm/php.ini:

session.save_handler = memcache
session.save_path = 'tcp://192.168.0.1:11211,tcp://192.168.0.2:11211'

The nginx configuration is placed in /etc/nginx/sites-available/ and linked into /etc/nginx/sites-enabled/ as usual, and forwards requests to PHP-FPM for PHP files.

server {
    listen 80 default deferred;
    root /var/www/example/htdocs/current/web/;
    index index.html index.htm index.php;

    server_name www.example.com;

    access_log /var/www/example/logs/access.log;
    error_log  /var/www/example/logs/error.log;

    location / {
        try_files $uri $uri/ /index.php?q=$uri&$args;
    }

    location ~ \.php$ {
        try_files $uri =404;
        # fastcgi_pass 127.0.0.1:9000;
        fastcgi_pass unix:/var/run/php5-fpm-example.sock;
        fastcgi_index index.php;
        fastcgi_buffers 16 16k;
        fastcgi_buffer_size 32k;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }
}

The root directory will be Bedrock's web/ prefixed with current/ to support the Capistrano directory structure displayed below.

├── current -> /var/www/example/htdocs/releases/20150120114500/
├── releases
│   ├── 20150080072500
│   ├── 20150090083000
│   ├── 20150100093500
│   ├── 20150110104000
│   └── 20150120114500
├── repo
│   └── <VCS related data>
├── revisions.log
└── shared
    └── <linked_files and linked_dirs>

Local configuration

I'm quite familiar with Capistrano because of my recent Ruby background. You need a Ruby version greater than 1.9.3 to run it (RVM helps). The first step is to download the dependencies; Ruby uses Bundler for that.

# run it to install bundler gem the first time
gem install bundler
# run it to install dependencies
bundle install

Bundler reads the Gemfile (and Gemfile.lock) and downloads all the required gems (Ruby libraries).

Now the technological stack is ready locally and on the server 🙂
I'll probably describe how to run a LEMP stack on OS X in a future post. For the moment I'm assuming you are able to run it locally. Here are useful guides by Jonas Friedmann and rtCamp.

Anyway, Bedrock can run on any LAMP/LEMP stack. The only "special" feature is the Composer integration. Composer for PHP is like Bundler for Ruby: it helps developers manage dependencies in the project. Here it is used to manage plugins, themes and WordPress core updates.

You can run composer install to install the libraries. If you update the libraries' configuration or you want to force them to be downloaded again (maybe after a fresh install), run composer update.

Deploy

Capistrano lets you set up different deploy environments. A global configuration is defined and you only need to specify the custom configuration for each environment. An example of /config/deploy/production.rb:

set :application, 'example'
set :stage, :production
set :branch, "master"
server '192.168.0.1', user: 'user', roles: %w{web app db}

Everything else is inherited from the global config, where all the other deploy properties are defined. It's important to say that the Capistrano deploy script on Bedrock only downloads the source code from the Git repo and runs composer install for the main project. If you need to run it for any plugin, you need to define a custom Capistrano task and run it at the end of the deploy. For instance, you can add the following lines to the global configuration in order to install the dependencies of a specific plugin:

namespace :deploy do
  desc 'Rebuild Plugin Libraries'
  task :updateplugin do
    on roles(:app), in: :sequence, wait: 5 do
      execute "cd /var/www/#{fetch(:application)}/htdocs/current/web/app/plugins/anything/ && composer install"
    end
  end
end

after 'deploy:publishing', 'deploy:updateplugin'
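
For reference, the global config the environment files inherit from is config/deploy.rb. A rough sketch based on Bedrock's defaults (repo URL and paths are placeholders):

# config/deploy.rb -- settings shared by all environments
set :application, 'example'
set :repo_url, 'git@github.com:example/example.git'
set :deploy_to, -> { "/var/www/#{fetch(:application)}/htdocs" }
# Keep uploads outside the release dirs so they survive deploys
set :linked_dirs, fetch(:linked_dirs, []).push('web/app/uploads')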

Now you are ready to deploy your Bedrock install to the server!
Simply run cap production deploy, restart PHP-FPM (service php5-fpm restart) and enjoy it 😀

Many thanks to Giuseppe, a great sysadmin and friend, for his support during the development and deployment of this @#@?!?@# application.

crate_logo

I usually don't trust cutting-edge datastores. They promise a lot of stunning features (and use a lot of superlatives to describe them), but almost every time they are too young and have so many problems in production that they are useless. I thought the same about Crate Data.

“Massively scalable data store. It requires zero administration”

The first time I read these words (taken from the home page of Crate Data) I wasn't impressed. I simply didn't think it was true. Some months later I read some articles and the overview of the project, and I found something more interesting:

It includes solid established open source components (Presto, Elasticsearch, Lucene, Netty)

I have used both Lucene and Elasticsearch in production for several years and I really like Presto. Combining production-ready components can definitely be a smart way to create something great. I decided to give it a try.

They offer a quick way to test it:

bash -c "$(curl -L try.crate.io)"

But I don't like self-install scripts, so I decided to download it and run it from bin. It simply requires a JVM. I unpacked it on my desktop on OS X and launched ./bin/crate. The process binds port 4200 (or the first available between 4200 and 4300) and if you go to http://127.0.0.1:4200/admin you find the admin interface (there is no authentication). You also have a command line interface: ./bin/crash. It is similar to the MySQL client, and if you are familiar with any other SQL client you will be familiar with crash too.

I created a simple table with semi-standard SQL code (the data types are a bit different):

create table items (id integer, title string)

Then I searched for a Ruby client and found crate_ruby, the official Ruby client. I started to fill the table using a Ruby script with a million-record CSV as input. Inserts went at about 5K per second and in the meantime I ran some aggregation queries on the database using standard SQL (GROUP BY, ORDER BY and so on) to test performance; responses were quite fast.

require 'csv'
require 'crate_ruby'

client = CrateRuby::Client.new # defaults to localhost:4200
CSV.foreach("data.csv", col_sep: ";") do |row|
  client.execute("INSERT INTO items (id, title) VALUES (\$1, \$2)", [row[0], row[9]])
end
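
The aggregation queries for the quick benchmark were nothing fancy; something along these lines, again through crate_ruby (the column names match the toy items table above):

# Top 10 most frequent titles
result = client.execute("SELECT title, count(*) AS total FROM items GROUP BY title ORDER BY total DESC LIMIT 10")
result.each { |row| puts row.inspect }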

Finally I decided to inspect the cluster features by running another process on the same machine. After a couple of seconds the admin interface showed a new node, and after a dozen more it informed me that the data was fully replicated. I also tried to shut down both processes to see what happened, and the data seems ok. I was impressed.

crate_admin

I still have many doubts about Crate. I don't know how to manage users and privileges, I don't know how to create a custom topology for a cluster, and I don't know how difficult it is to use the advanced features (like full-text search or blob upload). But at the moment I'm impressed, because administration seems really easy and scalability seems easy too.

The next step will be to test it in production under a Rails application (I found an interesting activerecord-crate-adapter) and to try the advanced features to implement a real-time search. I don't know if I'll use it, but the beginning looks very good.

Next week O'Reilly will host a webcast about Crate. I'm really looking forward to discovering more about the project.

A couple of weeks ago I was playing with Hack and looking for online resources. On YouTube I found the playlist of the Hack Dev Day 2014, where Hack was officially presented to the world.

The introduction to the language by Julien Verlaguet is really interesting: it shows the advantages of static typing and how HHVM is able to preserve the rapid development cycle of PHP.

The talk by Josh Watzman is also interesting. He explains how to convert PHP code to Hack code, and the years of experience at Facebook are extremely useful.

The conference also covers how to run HHVM on Heroku, gives an overview of libraries and common use cases of Hack, and talks about HHVM's strong optimizations.

If you are playing with Hack I absolutely recommend these videos.

HipHop was one of the most notable things to come from the Facebook labs regarding PHP development. PHP is slow and limited. They couldn't rewrite their entire codebase, so they decided to make PHP better. HipHop is simply a PHP-to-C++ compiler (HPHPc). The converted code is compiled into a binary, and the performance improvement is about 6x.

Unfortunately HipHop had several downsides. For all the performance gains that HPHPc provided, the curve for further performance improvements had flattened. HPHPc did not fully support the PHP language, including the create_function() and eval() constructs. And HPHPc required a very different push process: a binary bigger than 1 GB had to be compiled and distributed to many machines in short order.

To overcome these problems, starting in early 2010 Facebook developed HHVM: a PHP virtual machine. HHVM builds on top of HPHPc, using the same runtime and extension function implementations. HHVM converts PHP code into a high-level bytecode, which is then translated into x64 machine code dynamically at runtime by a just-in-time (JIT) compiler, similarly to C#/CLR or Java/JVM.

Facebook also released Hack, a programming language for HHVM that can be seen as a new version of PHP which allows programmers to use both dynamic typing and static typing.

HHVM supports major PHP open source projects like WordPress. Running this project on it seems really easy. A little modification used to be needed, but the latest version (3.9) no longer needs it. HHVM can also run on Heroku using a custom buildpack available here: https://github.com/hhvm/heroku-buildpack-hhvm.

My first experiment was to run WordPress on Heroku using HHVM. The first step is to create a Heroku app using the HHVM buildpack:

heroku create --buildpack https://github.com/hhvm/heroku-buildpack-hhvm

Then you can deploy a standard WordPress installation, adding the following config.hdf (the HHVM configuration file):

Server {
  DefaultDocument = index.php
}
Eval {
  Jit = true
}
VirtualHost {
  * {
    Pattern = .*
    RewriteRules {
      dirindex {
        pattern = ^/(.*)/$
        to = $1/index.php
        qsa = true
      }
    }
  }
}
StaticFile {
  FilesMatch {
    * {
      pattern = .*.(dll|exe)
      headers {
        * = Content-Disposition: attachment
      }
    }
  }
  Extensions {
    css = text/css
    gif = image/gif
    html = text/html
    jpe = image/jpeg
    jpeg = image/jpeg
    jpg = image/jpeg
    png = image/png
    tif = image/tiff
    tiff = image/tiff
    txt = text/plain
  }
}

Warning: don't miss the newline character on the last line, or the linter will fail and you are going to hate this project 😉

Everything works fine. You can add your favorite hosted MySQL service and run your WordPress 5-minute installation. Almost every plugin seems 100% compatible; I tested the most popular ones with no problems. Performance is better and you also have the opportunity to use Hack to develop new custom plugins.

Now I'm curious about how much HHVM can improve my production installations of WordPress. For this I'm looking for an OpenShift cartridge for HHVM, or for someone who wants to collaborate on creating a new one (the only one I found on GitHub seems "young"). Anyone interested? Let me know!

A few hours after I posted about the DataSift architecture, @choult, one of the roughly 25 ninjas who develop the DataSift platform, tweeted me.

The following SlideShare presentation by @stuherbert, another ninja, talks about the use of PHP at DataSift. Contrary to what you may think, PHP is widely used in data processing.

datasift_repo_languges

The system can be decomposed into three major data pipelines:

  • Data Archiving (Adds new data to Historic Archive)
  • Filtering Pipeline (Filtering and delivery data in realtime)
  • Playback Pipeline (Filtering and delivery data from Historic Archive)

And PHP is used for many parts of these.

datasift_php

They use a custom build of PHP 5.3.latest with several optimizations and compiled-in extensions (ZeroMQ, APC, XHProf, Redis, XDebug). They also developed some internal components:

  • Frink, TweetMeme's framework
  • Stone: the foundation of their in-house test tools, Hornet and Storyteller (they have probably open-sourced a fork named Storyplayer).

Unfortunately I wasn’t able to find more details about these. Anyway, here is the presentation: