The Importance of Being String

A few weeks ago I faced a performance issue with a lookup query on a 5GB MySQL database: about 10% of these requests took 100x longer than the others. The query was really simple:

SELECT  `authors`.* FROM `authors`  WHERE `authors`.`remote_id` = 415418856 LIMIT 1;

Actually it wasn’t a real “slow query”. The table is quite large (3M+ records), so a response time of 300ms seemed reasonable. Unfortunately my application runs this query thousands of times each minute, and performance was degrading. remote_id is a standard varchar field with a unique hash index, and I couldn’t understand why these queries were taking so much time.
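Presumably (given the title) the culprit is the type of the literal: MySQL cannot use an index on a varchar column when it is compared with a number, because every stored value has to be cast for the comparison. A sketch of the slow and the fast variant of the same lookup:

-- numeric literal: the varchar index cannot be used
SELECT `authors`.* FROM `authors` WHERE `authors`.`remote_id` = 415418856 LIMIT 1;

-- string literal: a normal index lookup
SELECT `authors`.* FROM `authors` WHERE `authors`.`remote_id` = '415418856' LIMIT 1;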


Performance-driven data modeling using MongoDB – Part 1

This week my problem was modeling a semi-relational structure. We decided to use MongoDB because (someone says) it’s fast, scalable and schema-less. Unfortunately I’m not a good MongoDB designer yet. Data modeling was mostly easy, because I could copy the relational part of the schema. The biggest problem concerns many-to-many relations: how do you decide whether or not to embed the relation keys into the documents? To make the right choice I decided to test different design solutions.

Foreign keys embedded:

class A
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :bs
end

class B
  include Mongoid::Document
  field :name, type: String
  has_and_belongs_to_many :as
end

# Creates `small` A documents, each related to `large` B documents,
# pushing the relation keys straight into the documents.
def direct(small, large)
  small.times do |i|
    a = A.new
    a.name = "A#{i}"
    large.times do |j|
      b = B.create(name: "B#{j}")
      a.bs << b   # appends b's id to a.b_ids (and a's id to b.a_ids)
    end
    a.save
  end
end
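Just to make the difference between the two designs concrete: with the embedded keys, every A document carries the whole list of related ids (b_ids is the default Mongoid key name for the relation above), so it grows with every relation. Roughly:

# Illustrative shape of a stored A document with this design:
A.first.as_document
# => { "_id"   => BSON::ObjectId('...'),
#      "name"  => "A0",
#      "b_ids" => [BSON::ObjectId('...'), BSON::ObjectId('...'), ...] }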

Foreign keys in an external document:

class C
  include Mongoid::Document
  field :name, type: String
  has_many :rels
end

class D
  include Mongoid::Document
  field :name, type: String
  has_many :rels
end

class Rel
  include Mongoid::Document
  belongs_to :c
  belongs_to :d
end

# Creates `small` C documents, each related to `large` D documents,
# storing every relation key in a separate Rel document.
def with_rel(small, large)
  small.times do |i|
    c = C.new
    c.name = "C#{i}"
    c.save   # persist C so the Rel documents point to a stored document
    large.times do |j|
      d = D.create(name: "D#{j}")
      Rel.create(c: c, d: d)   # one extra document per relation
    end
  end
end
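With the external design, on the other hand, C and D documents stay small no matter how many relations they have; each relation becomes a separate two-reference Rel document. Roughly:

# Illustrative shape of the documents with this design:
C.first.as_document
# => { "_id" => BSON::ObjectId('...'), "name" => "C0" }
Rel.first.as_document
# => { "_id"  => BSON::ObjectId('...'),
#      "c_id" => BSON::ObjectId('...'),
#      "d_id" => BSON::ObjectId('...') }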

I tested the insert time for a database with 10 objects, each related to a growing number of other objects (from 100 up to 5000 per iteration).

def measure(message)
  cleanup   # not defined in the post; see the sketch below
  start = Time.now.to_f
  yield
  elapsed = Time.now.to_f - start
  puts "#{message}: #{'%0.3f' % elapsed}"
end
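The cleanup helper isn’t shown here; a minimal sketch of what it could look like, assuming it simply empties all the collections so every run starts from scratch:

# Hypothetical helper: wipe every collection between runs.
def cleanup
  [A, B, C, D, Rel].each(&:delete_all)
end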

(1..50).each do |e|
  measure "10 A embedding #{e*100} B each" do
    direct(10, e*100)
  end
  measure "10 A linked to #{e*100} B each via an external relation" do
    with_rel(10, e*100)
  end
end

The results are really interesting:

Relations per element   Embedded keys (s)   External relation (s)
100                     0.693               1.021
200                     1.435               2.006
300                     1.959               2.720
400                     2.711               3.587
500                     3.477               4.531
600                     4.295               5.414
700                     5.106               6.369
800                     5.985               7.305
900                     6.941               8.221
1000                    7.822               8.970
1200                    12.350              13.946
1400                    14.820              15.532
1600                    15.806              17.344
1800                    18.722              18.372
2000                    21.552              20.732
3000                    36.151              29.818
4000                    56.060              38.154
5000                    82.996              47.658

As you can see, once the number of embedded relation keys goes above 2000, the insert time starts to grow geometrically.
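Part of the explanation is plain document size: every relation key is a 12-byte ObjectId stored inside the A document, so at 5000 relations each A document is dragging around roughly 60 KB of keys, while with the external design every document stays a few dozen bytes. A quick way to check (a sketch; Hash#to_bson needs a reasonably recent bson gem):

puts A.first.as_document.to_bson.length     # tens of kilobytes with thousands of relations
puts C.first.as_document.to_bson.length     # a few dozen bytes, regardless of relations
puts Rel.first.as_document.to_bson.length   # small and constant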

I know, this is not a real-world test, so we can’t say that embedding the relation keys is worse than using an external relation. Anyway, it’s really interesting to observe that the limits are the same in both the SQL and the NoSQL world: when you hit a memory limit and have to go to disk, performance degrades.

In a coming post I’m going to analyze read performance.