In the beginning was RediSQL, a “Hybrid Relational-Database/NOSQL-Datastore written in ANSI C”. After a while they changed the name to AlchemyDB.

Everything is built on top of Redis, with an embedded Lua interpreter implementing a really interesting technique they call Datastore-Side-Scripting. It is like using stored procedures, putting logic into the datastore. They achieve many different goals with this technique:

  • Implement a SQL-like language, using Lua to decode SQL requests
  • Implement many datatypes not supported by Redis, using Lua to map the new structures onto common Redis types
  • Serve content (like web pages or JSON data) directly from the datastore through a REST API
  • Implement a GraphDB, using SQL for indexes and Lua for the graph-traversal logic
  • Implement a document-oriented model using Lua
  • Implement an ObjectDB using Lua

Last year Citrusleaf acquired AlchemyDB, and Russ Sullivan (the guy behind AlchemyDB) is incrementally porting functionality to run on top of Citrusleaf’s proven distributed, high-availability, linearly-scalable key-value store: Aerospike. It is a distributed NoSQL database, the first solution to claim ACID support, with an extremely fast architecture optimized to run on SSDs.

I haven’t tested it yet, but as far as I can see they provide an SDK for the most popular programming languages. The Ruby one requires a native library. To start you need to add a node:

require "Citrusleaf"
c = Citrusleaf.new
c.add_node "10.1.1.6", 3000

Set, get and delete operations are done as follows:

# Writing Values
c.put 'namespace', 'myset', 'mykey', 'bin_name', value
# Reading Values
rv, gen, value = c.get 'namespace', 'myset', 'mykey', 'binx'
# Deleting Values
c.delete 'namespace', 'myset', 'mykey'

Documentation isn’t very useful yet. The only way to find out whether it is cool or not is to test it. That’s what I’ll do.

This week my problem was to model a semi-relational structure. We decided to use MongoDB because (someone says) it is fast, scalable and schema-less. Unfortunately I’m not a good MongoDB designer yet. Data modeling was mostly easy because I could copy the relational part of the schema. The biggest data modeling problem was the many-to-many relations: should the relation keys be embedded into the documents or not? To make the right choice I decided to test different design solutions.

Foreign keys embedded:

class A
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :bs
end

class B
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :as
end

def direct(small, large)
  small.times do |i|
    a = A.new
    a.name = "A#{i}"
    large.times do |j|
      b = B.create(name: "B#{j}")
      a.bs << b
    end
    a.save
  end
end

Foreign keys into an external document:

class C
  include Mongoid::Document
  field :name, type: String
  has_many :rels
end

class D
  include Mongoid::Document
  field :name, type: String
  has_many :rels
end

class Rel
  include Mongoid::Document
  belongs_to :c
  belongs_to :d
end

def with_rel(small, large)
  small.times do |i|
    c = C.new
    c.name = "C#{i}"
    large.times do |j|
      d = D.create(name: "D#{j}")
      Rel.create(c: c, d: d)
    end
    c.save # persist C as well, mirroring the embedded version
  end
end

I tested insert time for a database with 10 objects, each related to a growing number of other objects (from 100 to 5000 per iteration).

def measure(message, &block)
  cleanup # drop the collections between runs (definition omitted)
  start = Time.now.to_f
  yield
  finish = Time.now.to_f - start
  puts "#{message}: #{"%0.3f" % finish}"
end

(1..50).each do |e|
  measure "10 A embeds #{e*100} B each one" do
    direct(10, e*100)
  end
  measure "10 A linked to #{e*100} B with external relation" do
    with_rel(10, e*100)
  end
end

Results are really interesting:

Relations per element   Insert time, embedded keys (s)   Insert time, external relation (s)
                  100                            0.693                               1.021
                  200                            1.435                               2.006
                  300                            1.959                               2.720
                  400                            2.711                               3.587
                  500                            3.477                               4.531
                  600                            4.295                               5.414
                  700                            5.106                               6.369
                  800                            5.985                               7.305
                  900                            6.941                               8.221
                 1000                            7.822                               8.970
                 1200                           12.350                              13.946
                 1400                           14.820                              15.532
                 1600                           15.806                              17.344
                 1800                           18.722                              18.372
                 2000                           21.552                              20.732
                 3000                           36.151                              29.818
                 4000                           56.060                              38.154
                 5000                           82.996                              47.658

As you can see, once the number of embedded relation keys goes over roughly 2000, the insert time grows much faster than linearly and the external-relation design becomes the cheaper one. This is probably because the relation keys live in a growing array inside each A document, so every save rewrites an ever-larger document.

I know, this is not a real-world test, so we can’t conclude that embedded relations are worse than external ones. Anyway, it is really interesting to observe that the limits are the same in both the SQL and NoSQL worlds: when you hit a memory limit and need to go to disk, performance degrades.

In a coming post I’m going to analyze read performance.

Two important updates were recently released in the Ruby world (informally nicknamed ROR24):

  1. Ruby 2.0.0-p0
    http://www.ruby-lang.org/en/news/2013/02/24/ruby-2-0-0-p0-is-released/
  2. Rails 4.0.beta1
    http://weblog.rubyonrails.org/2013/2/25/Rails-4-0-beta1/

Following these releases, PragProg has published updates for two of the most popular books about these topics.

Programming Ruby (the pickaxe book)
by Dave Thomas, with Chad Fowler and Andy Hunt


Agile Web Development with Rails
by Sam Ruby, Dave Thomas and David Heinemeier Hansson


I bought them yesterday. At first look the updates seem cool, even if they are only minor ones. In the coming days I’m going to practice with all this new stuff and write some posts about it 😉

I’m developing a new project which requires a data structure that is not yet well defined. We are evaluating different solutions for persistence, and Amazon AWS is one of the candidates we are considering. I’m trying to recap the solutions it offers.

Amazon Relational Database Service (RDS)

A relational database service similar to MySQL and PostgreSQL. It offers three different engines (with different costs), and each one should be fully compatible with the protocol of the corresponding DBMS: Oracle, MySQL or Microsoft SQL Server.

You can easily use it with ActiveRecord (with the MySQL adapter) on Rails or Sinatra. Simply replace your database.yml with the given parameters:

production:
  adapter: mysql2
  host: myprojectname.somestuff.amazonaws.com
  database: myprojectname
  username: myusername
  password: mypass

Amazon DynamoDB

A key/value store similar to Riak and Cassandra. It is still in beta, but Amazon released a paper (PDF) about its structure a few years ago which inspired many other products.

You can access it using Ruby and the aws-sdk gem. I’m not an expert, but this code should work for basic interaction (not tested yet).

require "aws"
# set connection parameters
AWS.config(
access_key_id: ENV["AWS_KEY"],
secret_access_key: ENV["AWS_SECRET"]
)
# open connection to DB
DB = AWS::DynamoDB.new
# create a table
TABLES["your_table_name"] = DB.tables["your_table_name"].load_schema
rescue AWS::DynamoDB::Errors::ResourceNotFoundException
table = DB.tables.create("your_table_name", 10, 5, schema)
# it takes time to be created
sleep 1 while table.status == :creating
TABLES["your_table_name"] = table.load_schema
end
end

After that you can interact with the table:

# Create a new element
record = TABLES["your_table_name"].items.create(id: "andrea-mostosi")
record.attributes.add(name: ["Andrea"])
record.attributes.add(surname: ["Mostosi"])

# Search for value "andrea-mostosi" inside table
TABLES["your_table_name"].items.query(
  hash_value: "andrea-mostosi"
)

Amazon Redshift

A relational DBMS based on PostgreSQL, structured for petabyte-scale amounts of data (data warehousing). It was opened to the public only a few days ago and the SDK isn’t well documented yet. It seems very interesting for big-data processing on a relational structure.
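
Since Redshift speaks the PostgreSQL wire protocol, the plain pg gem should be enough to poke at it from Ruby. A minimal untested sketch (cluster endpoint, credentials and table name are placeholders; 5439 is Redshift’s default port):

require "pg"

# connect to the cluster like a normal PostgreSQL server
conn = PG.connect(
  host: "myprojectname.somestuff.redshift.amazonaws.com",
  port: 5439,
  dbname: "myprojectname",
  user: "myusername",
  password: "mypass"
)

# run an aggregate query, the typical data-warehousing workload
puts conn.exec("SELECT COUNT(*) FROM events").getvalue(0, 0)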

Amazon ElastiCache

An in-RAM caching system based on the Memcached protocol. It can be used to cache any kind of object, just like Memcached. It is different from (and IMHO worse than) Redis because it doesn’t offer persistence. I prefer a different kind of caching, but it may be a good choice if your application already uses Memcached.
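
If you want to try it from Ruby, any Memcached client should do. A quick sketch using the dalli gem (the endpoint is a placeholder for your cluster node):

require "dalli"

# an ElastiCache node speaks the plain Memcached protocol
cache = Dalli::Client.new("myproject.abc123.cfg.use1.cache.amazonaws.com:11211")
cache.set("greeting", "hello")
cache.get("greeting") # => "hello"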

Amazon SimpleDB

A RESTful key/value store using only strings as data types. You can use it with any REST ORM like ActiveResource, dm-rest-adapter or, my favorite, Her (see the previous article). If you prefer, you can use it with any HTTP client like Faraday or HTTParty.
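
If you would rather go through the official SDK than a REST ORM, an untested sketch based on the aws-sdk gem’s SimpleDB interface should look roughly like this (domain and item names are made up):

require "aws-sdk"

sdb = AWS::SimpleDB.new
domain = sdb.domains.create("people") # a domain is roughly a table

# every attribute value is a string
domain.items["andrea-mostosi"].attributes.set(
  "name"    => "Andrea",
  "surname" => "Mostosi"
)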

[UPDATE 2013-02-19] SimpleDB isn’t listed in the “Database” menu anymore and seems to be no longer available for activation.

Other DBMSs on the marketplace

Many companies offer supported versions of their software deployable on EC2 instances. Engines include MongoDB, CouchDB, MySQL, PostgreSQL, Couchbase Server, DB2, Riak, Memcached and Redis.


I have a problem: I need to store a huge set of data and access it from many different projects located in different places. I know I’ll never have a fixed schema and I’ll probably have to use more than one DBMS to persist and serve this data. I have no idea what to use, but I need to be up and running ASAP (as usual 🙂).

The best choice seems to be an API layer between the applications and the database: I can access resources over HTTP and interact in a not-so-complex way. Unfortunately I won’t be able to use an ORM… or will I?

Actually, there are a few projects which try to implement an ORM for RESTful resources. The most-used solutions ship their own components: ActiveRecord includes ActiveResource, and DataMapper has an adapter called dm-rest-adapter. IMHO Her seems to be the most promising.

It uses Faraday, a well-known, flexible and powerful HTTP client library, and supports its middlewares. Integrating it into an existing model is really easy:

class User
  include Her::Model
  has_many :comments
  has_one :role
  belongs_to :company
end

@user = User.find(1)
@user.comments # list of comments
@user.role     # user role
@user.company  # user company
@user.fullname = "Bar"
@user.save
User.create(fullname: "Foo")
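
For completeness: before a model can make requests, Her needs a global API connection, usually configured in an initializer. This mirrors the setup from Her’s documentation (the URL is a placeholder):

Her::API.setup url: "https://api.example.com" do |c|
  c.use Faraday::Request::UrlEncoded      # encode request bodies
  c.use Her::Middleware::DefaultParseJSON # parse JSON responses
  c.use Faraday::Adapter::NetHttp         # plain Net::HTTP backend
end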

You can embed related items into the JSON response. If you don’t, Her makes the request only when you try to access the property (lazy loading).

If the REST actions are not enough for you, it is possible to define custom actions with generic params:

class User
  include Her::Model
  custom_get :admin
  custom_post :search
end

User.admin
User.search(name: "Foo")

or make HTTP requests directly:

User.get_collection(:admin) # expects a list of items as response
User.get_resource(:admin) # expects a single item as response
User.get_raw(:admin) # returns HTTP response
User.get(:admin) # auto detect get_collection or get_resource

Similar methods are available also for POST, PUT and DELETE. You can also add before and after hooks to your models that are triggered on specific actions (save, update, create, destroy).

I really like this project. Maybe it is not as well supported as its competitors and it still has a lot of issues, but it is simple enough to grow.

Recently I had to build a JSON API to wrap the connection to the persistence layer, in order to be able to change (or add) DBMSs later and to define more structured logic (authentication, selective caching, …). I didn’t know which DBMS to use, but I had to start developing the other components which rely on this persistence layer.

To avoid delays while choosing the DBMS setup, we decided to build a prototype using Rails and MySQL in order to start defining the API’s methods. Rails is really useful when you have to create an MVC application, but it includes too much stuff if you only need to build an API. This is why we usually use Sinatra.
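
For comparison, a typical endpoint in Sinatra is tiny; a hypothetical sketch:

require "sinatra"
require "json"

get "/users/:id" do
  content_type :json
  # a stub response; a real app would fetch the record from the DBMS
  { id: params[:id].to_i, name: "Andrea" }.to_json
end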

This time we tried Rails::API, a subset of a normal Rails application. It’s a bit faster and more lightweight, and you can use it in your existing Rails app.

To use it you only need to add the gem to your Gemfile:

gem 'rails-api'

change the ancestor of ApplicationController

class ApplicationController < ActionController::API
  # [...]
end

and comment out the protect_from_forgery call if you are using it.

Everything seemed fine. We created four models and a couple of controllers with the usual REST actions, and everything was done: my prototype was up and running.
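
To give an idea, one of those controllers could look roughly like this (a hypothetical sketch, not the actual prototype code; this predates Rails 4 strong parameters):

class UsersController < ApplicationController
  # GET /users
  def index
    render json: User.all
  end

  # POST /users
  def create
    user = User.new(params[:user])
    if user.save
      render json: user, status: :created
    else
      render json: user.errors, status: :unprocessable_entity
    end
  end
end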

Unfortunately, some newer Rails features aren’t well supported by this gem. The most important one, IMHO, is wrap_parameters support. ActionController::ParamsWrapper should automatically copy the request’s parameters into a hash named after the element you are sending.

For example, if you send to /users:

{"name": "Andrea"}

the controller should receive:

{"name" => "Andrea", "user" => {"name" => "Andrea"}}

It is very convenient if you use the standard ActiveRecord-based scaffold, but Rails::API doesn’t support this initializer well: https://github.com/rails-api/rails-api/issues/33. You have to set it up manually.

In each controller you must define how to wrap the params:

class UsersController < ApplicationController
  # wrap incoming JSON params into a "user" hash, as the scaffold expects
  wrap_parameters :user, include: [:name], format: :json
  # [...]
end

It is the only issue I found, but it took me a lot of time to solve. I chose Rails because it is easy and lets me build a prototype in a flash, but I think it is still too early to use it for building an API; maybe with Rails 4. At the moment I still prefer Sinatra.

Thanks to @olinicola, who built the prototype and found the solution to the issue.

ActiveRecord is an incredibly powerful tool, but the Rails Guides don’t cover every possible situation and ActiveRecord’s official documentation is huge: finding what you are looking for can be hard. If you have to do something unusual and have no time to search, you can only hope someone has had the same problem and posted it on StackOverflow or on their own blog.

Recently I had to model a relation where a resource belongs to one entity and at the same time is related to N other entities.

The one-to-many relation is easy: use belongs_to and has_many. The other part is harder, because you need a connection table (HABTM doesn’t work here) and you need to rename the relation, since its natural name is already taken.

You can go through a connection table using the through attribute:

has_many :connection_table
has_many :items, through: :connection_table

and rename a has_many :through relation using the source attribute:

has_many :related_items, through: :connection_table, source: :items

Problem solved:

class Resource < ActiveRecord::Base
  belongs_to :entity
  has_many :connections
  has_many :related_entities,
           through: :connections, source: :entity
end

class Entity < ActiveRecord::Base
  has_many :resources
  has_many :connections
  has_many :related_resources,
           through: :connections, source: :resource
end

class Connection < ActiveRecord::Base
  belongs_to :entity
  belongs_to :resource
end
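
A quick (hypothetical) usage example of the models above:

resource = Resource.first
resource.entity           # the entity it belongs to
resource.related_entities # the N other entities, through the connection table

entity = Entity.first
entity.resources          # resources owned directly
entity.related_resources  # resources reachable through connections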

Thanks to @olinicola for the advice 🙂

If you run a commercial webapp, you probably have to track accesses.

CloudFlare helps you handle more connections, but it hides a lot of information about the client from you. If you try to log the IP address, you always get CloudFlare’s addresses.

The common headers nginx uses to forward the original IP (X-Forwarded-For and X-Real-IP) contain CloudFlare’s IP. The correct header to look at is CF-Connecting-IP, which reaches your application as HTTP_CF_CONNECTING_IP:

/* PHP */
$_SERVER['HTTP_CF_CONNECTING_IP']

# Rack
request.headers["HTTP_CF_CONNECTING_IP"]
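
If you want to fix this once for every Rack application, a small middleware can promote the CloudFlare header to REMOTE_ADDR. A hypothetical sketch:

# promotes CF-Connecting-IP to REMOTE_ADDR so existing logging keeps working
class CloudflareRealIp
  def initialize(app)
    @app = app
  end

  def call(env)
    cf_ip = env["HTTP_CF_CONNECTING_IP"]
    env["REMOTE_ADDR"] = cf_ip if cf_ip
    @app.call(env)
  end
end

Enable it in config.ru with use CloudflareRealIp before running the app.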