This week my problem was to model a semi-relational structure. We decided to use MongoDB because, so people say, it is fast, scalable and schema-less. Unfortunately I'm not a good MongoDB designer yet. Data modeling was mostly easy because I could copy the relational part of the schema. The biggest data modeling problem concerns many-to-many relations: should the relation keys be embedded into the documents or not? To make the right choice I decided to test different design solutions.

Foreign keys embedded:

# Embedded keys: has_and_belongs_to_many stores an array of
# foreign keys on both sides of the relation.
class A
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :bs
end

class B
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :as
end

# Create `small` A documents, each related to `large` B documents.
def direct(small, large)
  small.times do |i|
    a = A.new
    a.name = "A#{i}"
    large.times do |j|
      b = B.create(name: "B#{j}")
      a.bs << b   # append b's key; the array is written when a is saved
    end
    a.save
  end
end
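
To make the two designs concrete, this is roughly the document shape Mongoid produces for has_and_belongs_to_many (the attribute names are the Mongoid defaults; the ids are obviously illustrative):

# Sketch of an A document: the foreign keys of all related Bs
# live inside the document itself, in the b_ids array.
a = A.first
puts a.attributes
# { "_id"   => BSON::ObjectId('...'),
#   "name"  => "A0",
#   "b_ids" => [BSON::ObjectId('...'), BSON::ObjectId('...'), ...] }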

Foreign keys in an external relation document:

# External keys: each C-D pair is stored as a separate Rel document
# holding the two foreign keys (c_id and d_id).
class C
  include Mongoid::Document
  field :name, type: String
  has_many :rels
end

class D
  include Mongoid::Document
  field :name, type: String
  has_many :rels
end

class Rel
  include Mongoid::Document
  belongs_to :c
  belongs_to :d
end

# Create `small` C documents, each related to `large` D documents
# through Rel join documents.
def with_rel(small, large)
  small.times do |i|
    c = C.new
    c.name = "C#{i}"
    large.times do |j|
      d = D.create(name: "D#{j}")
      Rel.create(c: c, d: d)
    end
    c.save   # persist the C document itself
  end
end
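
For comparison, each Rel join document stays small and fixed-size, something like this sketch:

# Sketch of a Rel document: the relation lives in its own document,
# holding just the two foreign keys.
puts Rel.first.attributes
# { "_id"  => BSON::ObjectId('...'),
#   "c_id" => BSON::ObjectId('...'),
#   "d_id" => BSON::ObjectId('...') }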

I tested insert time for a database with 10 objects, each related to a growing number of other objects (from 100 up to 5000, increasing at each iteration).

# Measure wall-clock time for a block, starting from an empty database.
def measure(message, &block)
  cleanup   # empty the database before each run (see the sketch below)
  start = Time.now.to_f
  yield
  finish = Time.now.to_f - start
  puts "#{message}: #{"%0.3f" % finish}"
end

(1..50).each do |e|
  measure "10 A embed #{e*100} B each" do
    direct(10, e*100)
  end
  measure "10 A linked to #{e*100} B with external relation" do
    with_rel(10, e*100)
  end
end
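
The cleanup helper isn't defined in the snippet above; a minimal sketch, assuming its only job is to empty the database before each measurement, could be:

# Hypothetical cleanup: remove every document used by the test so
# each measurement starts from an empty database.
def cleanup
  [A, B, C, D, Rel].each(&:delete_all)
end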

The results (insert time in seconds) are really interesting:

Relations per element   Embedded keys (s)   External relation (s)
                  100               0.693                   1.021
                  200               1.435                   2.006
                  300               1.959                   2.720
                  400               2.711                   3.587
                  500               3.477                   4.531
                  600               4.295                   5.414
                  700               5.106                   6.369
                  800               5.985                   7.305
                  900               6.941                   8.221
                 1000               7.822                   8.970
                 1200              12.350                  13.946
                 1400              14.820                  15.532
                 1600              15.806                  17.344
                 1800              18.722                  18.372
                 2000              21.552                  20.732
                 3000              36.151                  29.818
                 4000              56.060                  38.154
                 5000              82.996                  47.658

As you can see, once the number of embedded relation keys goes over 2000, insert time for the embedded design starts to grow much faster than linearly, while the external relation keeps growing roughly in proportion to the number of relations.
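
A quick back-of-the-envelope check on the last rows of the table makes the trend explicit:

# Going from 2000 to 5000 relations per element (2.5x more data):
82.996 / 21.552   # => ~3.85  embedded keys: much worse than linear
47.658 / 20.732   # => ~2.30  external relation: close to linear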

I know, this is not a real-world test, so we can't say that embedding relation keys is worse than using an external relation. Anyway, it's really interesting to observe that the limits are the same in both the SQL and NoSQL worlds: when you hit a memory limit and need to go to disk, performance degrades.

In a coming post I'm going to analyze read performance.