A few weeks ago I faced a performance issue around a lookup query on a 5GB MySQL database. About 10% of these request take 100x more time than the others. Syntax was really simple:

SELECT  `authors`.* FROM `authors`  WHERE `authors`.`remote_id` = 415418856 LIMIT 1;

Actually it wasn’t a real “slow query”. Table is quite large (3M+ record) and a response time of 300ms was reasonable. Unfortunately my application run this query thousands times each minute and performances were degraded. The remote_id is a standard varchar field indexed by a unique Hash index and I couldn’t understand why these queries take so much time.

After a long search I found a note in the reference of MySQL related to type conversion:
MySQL 5.7 Reference Manual > 12.2 Type Conversion in Expression Evaluation


For comparisons of a string column with a number, MySQL cannot use an index on the column to look up the value quickly. If str_col is an indexed string column, the index cannot be used when performing the lookup in the following statement:

SELECT * FROM tbl_name WHERE str_col=1;

Everything was clear: my Ruby application doesn’t implicit cast to string the remote_id I use when this value is an integer. This is a quite strange behavior because ActiveRecord, the application ORM, usually operates cast transparently. Usually.

The fixed version of the query runs in 3ms:

SELECT  `authors`.* FROM `authors`  WHERE `authors`.`remote_id` = '415418856' LIMIT 1;

Small changes, big headaches, great results 🙂

This week my problem was to modelize a semi-relational structure. We decided to use MongoDB because (someone says) is fast, scalable and schema-less. Unfortunately I’m not a good MongoDB designer yet. Data modeling was mostly easy because I can copy the relational part of the schema. The biggest data modeling problem is about m-to-m relations. How to decide if embed m-to-m relations keys into documents or not? To make the right choice I decided to test different design solutions.

Foreign keys emdedded:

class A
include Mongoid::Document
field :name, type: String
has_and_belongs_to_many :bs
end
class B
include Mongoid::Document
field :name, type: String
has_and_belongs_to_many :as
end
def direct(small, large)
small.times do |i|
a = A.new
a.name = "A#{i}"
large.times do |j|
b = B.create(name: "B#{j}")
a.bs << b
end
a.save
end
end

Foreign keys into an external document:

class C
include Mongoid::Document
field :name, type: String
has_many :rels
end
class D
include Mongoid::Document
field :name, type: String
has_many :rels
end
class Rel
include Mongoid::Document
belongs_to :c
belongs_to :d
end
def with_rel(small, large)
small.times do |i|
c = C.new
c.name = "C#{i}"
large.times do |j|
d = D.create(name: "D#{j}")
Rel.create(c: c, d: d)
end
end
end

I tested insert time for a database with 10 objects related to a growing number of other objects each iteration (from 100 to 5000).

def measure(message, &block)
cleanup
start = Time.now.to_f
yield
finish = (Time.now.to_f - start).to_f
puts "#{message}: #{"%0.3f" % finish}"
end
(1..50).each do |e|
measure "10 A embeds #{e*100} B each one" do
direct(10, e*100)
end
measure "10 A linked to #{e*100} B with extenal relation" do
with_rel(10, e*100)
end
end

Results are really interesting:

Number of relation for each element Insert time embedding relation key Insert time with external relation
100 0.693 1.021
200 1.435 2.006
300 1.959 2.720
400 2.711 3.587
500 3.477 4.531
600 4.295 5.414
700 5.106 6.369
800 5.985 7.305
900 6.941 8.221
1000 7.822 8.970
1200 12.350 13.946
1400 14.820 15.532
1600 15.806 17.344
1800 18.722 18.372
2000 21.552 20.732
3000 36.151 29.818
4000 56.060 38.154
5000 82.996 47.658

As you can see when number of embedded relation keys go over 2000, the time grow geometrically.

I know, this is not a real case test so we can’t say that using embedded relation is worse than using external. Anyway is really interesting observe that limits are always the same in both SQL and NoSQL world: when you hit a memory limit and need to go to disk, performances degrade.

In coming post I’m going to analyze reading performances.