Virtual memory and Redis cluster: to scale Redis is hard

Redis is a RAM based key-value store. RAM is expensive. Hard disks (even SSD) are slow. It’s the truth, we know.

A few months ago people tried to use Redis instead of MySQL (or similar SQL DBs) as main datastore. When you do, is easy to clash against the memory limit. As we learned from operative systems, first solution is to use your disk space to “enlarge” your RAM. Redis versions from 2.0 to 2.6 offer a Virtual Memory implementation.

Virtual Memory seems to be really useful in many cases. If only a small part of your keys get the vast majority of accesses you can efficiently keep only that part of keys into RAM and leave the remaining part on disk.

To enable Virtual Memory you can switch it on using vm-enabled yes and set the memory limit using vm-max-memory. Additionally you can fine tune the configuration using vm-pages and vm-page-size for swap file and vm-max-threads for concurrency.

Anyway since version 2.4 Virtual Memory is deprecated. This is the official note about it:

Redis VM is now deprecated. Redis 2.4 will be the latest Redis version featuring Virtual Memory (but it also warns you that Virtual Memory usage is discouraged). We found that using VM has several disadvantages and problems. In the future of Redis we want to simply provide the best in-memory database (but persistent on disk as usual) ever, without considering at least for now the support for databases bigger than RAM. Our future efforts are focused into providing scripting, cluster, and better persistence.

The alternative is Redis cluster. It will be a “distributed and fault tolerant implementation of a subset of the features available in the Redis stand alone server”. At the moment is a work in progress. There are some client-side implementation (for Node.jsfor Ruby and more) but not yet an official, standalone version.

Virtual memory deprecation and Redis cluster long developing time make me think about a simple idea:

Redis is not ready to be the main datastore for a huge dataset, not yet. 

More about Redis scaling

[2013-03-09 UPDATE] @olinicola advises me a post by @antirez about to use Redis in memory and to swap on SSD. His response is the same:

TL;DR: the outcome of this test was expected and Redis is an in-memory system 🙂

Persistence in the Amazon AWS Cloud

I’m developing a new project which require a data structure not yet well defined. We are evaluating different solutions for persistence and Amazon AWS is one of the partners we are considering. I’m trying to recap solutions which it offers.

Amazon Relational Database Service (RDS)

Relational Database similar to MySQL and PostgreSQL. It offers 3 different engines (with different costs) and each one should be fully compatible with the protocol of the corresponding DBMS: Oracle, MySQL and Microsoft SQL Server.

You can use it with ActiveRecord (with MySQL adapter) on Rails or Sinatra easily. Simply replace you database.yml with given parameters:

production:
  adapter: mysql2
  host: myprojectname.somestuff.amazonaws.com
  database: myprojectname
  username: myusername
  password: mypass

Amazon DynamoDB

Key/Value Store similar to Riak and Cassandra. It is still in beta but Amazon released a paper (PDF) about its structure a few year ago which inspire many other products.

You can access it using Ruby and aws-sdk gem. I’m not an expert but this code should works for basic interaction (not tested yet).

require "aws"

# set connection parameters
AWS.config(
  access_key_id: ENV["AWS_KEY"],
  secret_access_key: ENV["AWS_SECRET"]
)

# open connection to DB
DB = AWS::DynamoDB.new

# create a table
TABLES["your_table_name"] = DB.tables["your_table_name"].load_schema
  rescue AWS::DynamoDB::Errors::ResourceNotFoundException
    table = DB.tables.create("your_table_name", 10, 5, schema)
    # it takes time to be created
    sleep 1 while table.status == :creating
    TABLES["your_table_name"] = table.load_schema
  end
end

After that you can interact with table:

# Create a new element
record = TABLES["your_table_name"].items.create(id: "andrea-mostosi")
record.attributes.add(name: ["Andrea"])
record.attributes.add(surname: ["Mostosi"])

# Search for value "andrea-mostosi" inside table
TABLES["your_table_name"].items.query(
  hash_value: "andrea-mostosi",
)

Amazon Redshift

Relational DBMS based on PostgreSQL structured for a petabyte-scale amount of data (for data-warehousing). It was released to public in the last days and SDK isn’t well documented yet. Seems to be very interesting for big-data processing on a relational structure.

Amazon ElastiCache

In-RAM caching system based on Memcached protocol. It should be used to cache any kind of object like Memcached. Is different (and worse IMHO) than Redis because doesn’t offer persistence. I prefer a different kind of caching but may be a good choice if your application already use Memcached.

Amazon SimpleDB

RESTFul Key/Value Store using only strings as data types. You can use it with any REST ORM like ActiveResource, dm-rest-adapter or, my favorite, Her (read previous article). If you prefer you can use with any HTTP client like Faraday or HTTParty.

[UPDATE 2013-02-19] SimpleDB isn’t listed into “Database” menu anymore and it seems no longer available for activation.

Other DBMS on markerplace

Many companies offer support to theirs software deployed on EC2 instance. Engines include MongoDB, CouchDB, MySQL, PostgreSQL, Couchbase Server, DB2, Riak, Memcache and Redis.

Sources

Intersect a SET with a ZSET

Redis‘s SET and ZSET (sorted sets) are a really powerful structure. The only limits are about set operation you can perform. Using Redis you can’t obtain the intersection (or the union) between two sorted set or between a SET and a ZSET. You can use SINTER to intersect a group of SET or SUNION for union. Unfortunately there is no direct way for ZSET.

In our use case, we had to intersect a ZSET (a sorted rank) and a SET (a group of categorized items) to find the rank of the element inside selected category.

After a successful search on Google I found a way on StackOverflow (view below link): use ZINTERSTORE. It’s really simple: act like SINTER but store results into a new ZSET. It has a quite expensive memory footprint but is ok if you frequently reuse the result (is like a cache and you can set expire time using EXPIRE).

Source
http://stackoverflow.com/questions/10500695/redis-how-to-intersect-a-normal-set-with-a-sorted-set