crate_logo

I usually don’t trust cutting edge datastore. They promise a lot of stunning features (and use a lot of superlatives to describe them) but almost every time they are too young and have so much problems to run in production to be useless. I thought the same also about Crate Data.

“Massively scalable data store. It requires zero administration”

First time I read these words (take from the home page of Crate Data) I wasn’t impressed. I simply didn’t think was true. Some months later I read some articles and the overview of the project and I found something more interesting:

It includes solid established open source components (Presto, Elasticsearch, Lucene, Netty)

I used both Lucene and Elasticsearch in production for several years and I really like Presto. Combine some production-ready components can definitely be a smart way to create something great. I decided to give it a try.

They offer a quick way to test it:

bash -c "$(curl -L try.crate.io)"

But I don’t like self install scripts so I decided to download it a run from bin. It simply require JVM. I unpacked it on my desktop on OS X and I launched ./bin/crate. The process bind the port 4200 (or first available between 4200 and 4300) and if you go to http://127.0.0.1:4200/admin you found the admin interface (there is no authentication). You also had a command line interface: ./bin/crash. Is similar to MySQL client and you are familiar with any other SQL client you will be familiar with crash too.

I created a simple table with semi-standard SQL code (data types are a bit different)

create table items (id integer, title string)

Then I search for a Ruby client and I found crate_ruby, the official Ruby client. I started to fill the table using a Ruby script and a million record CSV as input. Inserts go by 5K per second and the meantime I did some aggregation query on database using standard SQL (GROUP BY, ORDER BY and so on) to test performances and response was quite fast.

CSV.foreach("data.csv", col_sep: ";").each do |row|
client.execute("INSERT INTO items (id, title) VALUES (\$1, \$2)", [row[0], row[9]])
end

Finally I decided to inspect cluster features by running another process on the same machine. After a couple of seconds the admin interface shows a new node and after a dozen informs me data was fully replicated. I also tried to shut down both process to see what happen and data seems ok. I was impressed.

crate_admin

I still have many doubts about Crate. I don’t know how to manage users and privileges, I don’t know how to create a custom topology for a cluster and I don’t know how difficult is to use advanced features (like full text search or blob upload). But at the moment I’m impressed because administration seems really easy and scalability seems easy too.

Next step will be test it in production under a Rails application (I found an interesting activerecord-crate-adapter) and test advanced features to implement a real time search. I don’t know if I’ll use it but beginning looks very good.

Next week O’Reilly will host a webcast about Crate. I’m really looking forward to discover more about the project.

Serialized fields in Rails are a really useful feature to store structured data related to a single element of your application. Performance usually aren’t so stunning because they are stored in a text field.

Recently to overcome this limit hstore on PostgreSQL and similar structure on other DBMS have gained popularity.

Anyway editing data using a form still require a lot of code. Last week a was working on a form to edit options of an elements stored into serialized field and I found this question on StackOverflow. It seems a really interesting solution. For a serialized field called properties

class Element < ActiveRecord::Base
serialize :properties
end

I can dynamically define accessor method for any field I need.

class Element < ActiveRecord::Base
serialize :properties
def self.serialized_attr_accessor(*args)
args.each do |method_name|
eval "
def #{method_name}
(self.properties || {})[:#{method_name}]
end
def #{method_name}=(value)
self.properties ||= {}
self.properties[:#{method_name}] = value
end
attr_accessible :#{method_name}
"
end
end
serialized_attr_accessor :field1, :field2, :field3
end

And then you can easies access fields in a view

#haml
- form_for @element do |f|
= f.text_field :field1
= f.text_field :field2
= f.text_field :field3

IMHO it’s a really clean way to improve quality of accessor code.

I have a problem: I need to store a huge set of data and access it from many different projects located in different locations. I know I’ll never have a schema and probably I have to use more than a DBMS to persist and serve this data. I have no idea about what to use but I need to be up and running asap (as usual 🙂 ).

Best choice seems to build an API between softwares and database. I can access to resources using HTTP and interact in a not-so-complex way. Unfortunately I’ll not be able to use an ORM… Or not?

Actually there are a few projects which try to implement ORM for RESTful resources. Most used solutions has its own component: ActiveRecord include ActiveResource and DataMapper has an adapter called dm-rest-adapter. IMHO Her seems to be the most promising.

It uses Faraday, a well-know, flexible and powerful HTTP adapter and supports its middlewares. Integrating into an existing model is really easy:

class User
include Her::Model
has_many :comments
has_one :role
belongs_to :company
end
@user = User.find(1)
@user.comments # list of comments
@user.role # user role
@user.company # user company
@user.fullname = "Bar"
@user.save
User.create(fullname: "Foo")

You can embed related items into JSON response. If you don’t, Her makes the request only when you try to access to the property.

If REST actions are not enough for you, is possible to define custom actions with generic params

class User
include Her::Model
custom_get :admin
custom_post :search
end
User.admin
User.search(name: "Foo")

or make HTTP request directly

User.get_collection(:admin) # expects a list of items as response
User.get_resource(:admin) # expects a single item as response
User.get_raw(:admin) # returns HTTP response
User.get(:admin) # auto detect get_collection or get_resource

Similar methods are available also for POST, PUT and DELETE. You can also add before and after hooks to your models that are triggered on specific actions (save, update, create, destroy).

I really like this project. Maybe is not supported like competitors and still has a lot of issues but is simple enough to grow.

Recently I have to build a JSON API to wrap the connection to persistence layer in order to be able to change (or add) DBMS later and define more structured logic (authentication, selective caching, …). I didn’t know which DBMS to use but I had to start develop the other components which relay on this persistence layer.

To avoid delay while choosing DBMS setup we decided to build a prototype using Rails and MySQL in order to start defining API’s methods. Rails is really useful when you had to create a MVC application but includes too much stuff if you only need to build an API. This is why usually we use Sinatra.

This time we tried Rails::API,  a subset of a normal Rails application. It’s a bit faster and lightweight and you can use your existing Rails app.

To use it you only need to add gem to Gemfile:

gem 'rails-api'

change the ancestor of ApplicationController

class ApplicationController < ActionController::API
# [...]
end

and comment out the protect_from_forgery call if you are using it.

Everything seems to be ok. Created four models, a couple of controllers with usual REST action and everything is done: my prototype is up and running.

Unfortunately there are some new Rails feature that aren’t well supported by this gem. The most important is the wrap_parameters support IMHO. The ActionController::ParamsWrapper should automatically make a copy of request’s parameters into an hash named as the element you are sending.

For example, if you send to /users:

{"name": "Andrea"}

controller should receive:

{"name" => "Andrea", "user" => {"name" => "Andrea"}}

It is very convenient if you use standard ActiveRecord-based scaffold but Rails::API doesn’t support well this initializer: https://github.com/rails-api/rails-api/issues/33. You have to setup it manually.

Into each controller you must define how to wrap params:

class UsersController < ApplicationController
wrap_parameters :person, include: [:name], format: :json
# [...]
end

Is the only issue i found but took me a lot of time to be solved. I choose Rails because is easy and I can build a prototype in a flash but I think is still to early to use it to build an API, maybe Rails 4. At the moment I still prefer to use Sinatra.

Thanks to @olinicola, he built the prototype and found the solution to the issue.

ActiveRecord is an incredibly powerful tool but the Rails Guides doesn’t cover every possible situation and the ActiveRecord’s official documentation is huge. Find something you are looking for can be hard. If you have to do something strange and you have no time to search you have to hope someone had got the same problem and posted it on StackOverflow or on its own blog.

Recently I have to modelize a relation where a resource belongs to an entity and contemporary is related to N other entities.

The One-to-Many relation is easy: use belongs_to and has_many. The other part is harder because you need to use a connection table (HABTM doesn’t work) and you need to rename relation because its name is already taken.

You can use a connection table using through attribute:

has_many :connection_table
has_many :items, through: :connection_table

and rename a has_many through relation using source attribute:

has_many :related_items, through: :connection_table, source: :items

Problem solved:

class Resource < ActiveRecord::Base
belongs_to :entity
has_many :connections
has_many :related_entities,
through: :connections, source: :entity
end

 

class Entity < ActiveRecord::Base
has_many :resources
has_many :connections
has_many :related_resources,
through: :connections, source: :resource
end

 

class Connection < ActiveRecord::Base
belongs_to :entity
belongs_to :resource
end

Thanks to @olinicola for the advises 🙂