I’m developing a new project whose data structure isn’t well defined yet. We are evaluating different solutions for persistence, and Amazon AWS is one of the providers we are considering. I’m trying to recap the solutions it offers.

Amazon Relational Database Service (RDS)

A managed relational database service. It offers 3 different engines (with different costs), and each one should be fully compatible with the protocol of the corresponding DBMS: Oracle, MySQL and Microsoft SQL Server.

You can easily use it with ActiveRecord (with the MySQL adapter) on Rails or Sinatra. Simply replace your database.yml with the given parameters:

production:
  adapter: mysql2
  host: myprojectname.somestuff.amazonaws.com
  database: myprojectname
  username: myusername
  password: mypass
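
For a non-Rails app, a minimal Sinatra sketch could look like this (same placeholder endpoint and credentials as above; the Project model and its table are hypothetical):

require "sinatra"
require "active_record"

# same placeholder parameters as database.yml above
ActiveRecord::Base.establish_connection(
  adapter:  "mysql2",
  host:     "myprojectname.somestuff.amazonaws.com",
  database: "myprojectname",
  username: "myusername",
  password: "mypass"
)

# hypothetical model mapped to a "projects" table
class Project < ActiveRecord::Base; end

get "/projects" do
  Project.all.map(&:name).join(", ")
end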

Amazon DynamoDB

Key/value store similar to Riak and Cassandra. It is still in beta, but Amazon released a paper (PDF) about its structure a few years ago which inspired many other products.

You can access it using Ruby and the aws-sdk gem. I’m not an expert, but this code should work for basic interaction (not tested yet).

require "aws"
# set connection parameters
AWS.config(
access_key_id: ENV["AWS_KEY"],
secret_access_key: ENV["AWS_SECRET"]
)
# open connection to DB
DB = AWS::DynamoDB.new
# create a table
TABLES["your_table_name"] = DB.tables["your_table_name"].load_schema
rescue AWS::DynamoDB::Errors::ResourceNotFoundException
table = DB.tables.create("your_table_name", 10, 5, schema)
# it takes time to be created
sleep 1 while table.status == :creating
TABLES["your_table_name"] = table.load_schema
end
end

After that you can interact with the table:

# create a new item
record = TABLES["your_table_name"].items.create(id: "andrea-mostosi")
record.attributes.add(name: ["Andrea"])
record.attributes.add(surname: ["Mostosi"])

# search for the item with hash key "andrea-mostosi"
TABLES["your_table_name"].items.query(
  hash_value: "andrea-mostosi"
)

Amazon Redshift

Relational DBMS based on PostgreSQL, structured for petabyte-scale amounts of data (data warehousing). It was released to the public a few days ago and the SDK isn’t well documented yet. It seems very interesting for big-data processing on a relational structure.
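
Since it is based on PostgreSQL, any standard PostgreSQL client should be able to connect to it. A minimal sketch with the pg gem, assuming a hypothetical cluster endpoint and events table (Redshift listens on port 5439 by default):

require "pg"

# hypothetical cluster endpoint and credentials
conn = PG.connect(
  host: "myclustername.somestuff.redshift.amazonaws.com",
  port: 5439, # default Redshift port
  dbname: "myprojectname",
  user: "myusername",
  password: "mypass"
)

# hypothetical table, just to show it speaks plain SQL
res = conn.exec("SELECT COUNT(*) FROM events")
puts res.getvalue(0, 0)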

Amazon ElastiCache

In-RAM caching system based on the Memcached protocol. It can be used to cache any kind of object, just like Memcached. It is different from (and IMHO worse than) Redis because it doesn’t offer persistence. I prefer a different kind of caching, but it may be a good choice if your application already uses Memcached.
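
Because it speaks the plain Memcached protocol, any Memcached client works. A minimal sketch with the dalli gem, assuming a hypothetical cluster endpoint:

require "dalli"

# hypothetical ElastiCache node endpoint (standard Memcached port)
cache = Dalli::Client.new("myclustername.somestuff.cache.amazonaws.com:11211")

cache.set("greeting", "hello") # store a value
cache.get("greeting")          # => "hello"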

Amazon SimpleDB

RESTful key/value store using only strings as data types. You can use it with any REST ORM like ActiveResource, dm-rest-adapter or, my favorite, Her (read the previous article). If you prefer, you can use it with any HTTP client like Faraday or HTTParty.
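
The aws-sdk gem also ships a SimpleDB interface, which handles request signing for you. A minimal sketch, assuming the v1 API and a hypothetical domain name:

require "aws-sdk"

AWS.config(
  access_key_id: ENV["AWS_KEY"],
  secret_access_key: ENV["AWS_SECRET"]
)

sdb = AWS::SimpleDB.new

# hypothetical domain and item; all values are stored as strings
domain = sdb.domains.create("your_domain_name")
domain.items.create("andrea-mostosi", "name" => "Andrea", "surname" => "Mostosi")
puts domain.items["andrea-mostosi"].attributes["name"].values.first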

[UPDATE 2013-02-19] SimpleDB isn’t listed in the “Database” menu anymore and seems to be no longer available for activation.

Other DBMS on the marketplace

Many companies offer support for their software deployed on EC2 instances. Engines include MongoDB, CouchDB, MySQL, PostgreSQL, Couchbase Server, DB2, Riak, Memcached and Redis.

Amazon S3

A few weeks ago we had to put a few hundred GB of sensitive data (mostly large video files) in a shared space, in order to keep a backup copy and deliver it to recipients. We were looking for a lot of encrypted space, accessible from everywhere, with strong access control and able to log every operation. Amazon S3 seemed very interesting, so we decided to try it.

When you create a new bucket you can choose to activate logging on it. Logging: OK!
Data encryption is possible during upload (see the sketch below). Encryption: OK!
Access control on buckets isn’t so easy: you must use policies.
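
As a minimal sketch of the encrypted upload, with the aws-sdk gem (v1 API; bucket name and file path are hypothetical):

require "aws-sdk"

AWS.config(
  access_key_id: ENV["AWS_KEY"],
  secret_access_key: ENV["AWS_SECRET"]
)

s3 = AWS::S3.new
bucket = s3.buckets["mybucket"] # hypothetical bucket

# upload a file with server-side (AES-256) encryption
bucket.objects["videos/backup.mp4"].write(
  file: "backup.mp4",
  server_side_encryption: :aes256
)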

To define permissions for a user with AWS IAM (Identity and Access Management) you have to create a group, create one or more users, assign them to the group, and then attach a policy to the group.
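
The same steps can be scripted with the aws-sdk gem. A sketch, assuming the v1 API and hypothetical group, user and policy names:

require "aws-sdk"

iam = AWS::IAM.new

# create a group and a user, then add the user to the group
group = iam.groups.create("mybucket-users")
user = iam.users.create("myusername")
user.groups.add(group)

# attach a policy document: a JSON string like the one shown below
policy_json = File.read("policy.json")
group.policies["mybucket-rw"] = policy_json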

Policies aren’t easy to define and the complete documentation is huge. They are a set of statements which define what you can and can’t do on AWS properties. Using policy below you can setup mybucket for read/write access.

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": "arn:aws:s3:::mybucket",
      "Condition": {}
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:DeleteObjectVersion",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:GetObjectVersion",
        "s3:GetObjectVersionAcl",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:PutObjectVersionAcl"
      ],
      "Resource": "arn:aws:s3:::mybucket/*",
      "Condition": {}
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*",
      "Condition": {}
    }
  ]
}

The problem with the policy above is the last statement.

If you want to access your S3 space using a generic client (like a modern FTP client) you have to allow listing all your buckets, because the first operation a generic client performs is a directory listing. This is bad news: if you have a publicly accessible bucket, people can see all its contents. Probably you don’t use S3 just to share files. Probably you use S3 as a CDN (I do), and probably you don’t want to show ALL your contents to others. So far I haven’t been able to find any way to fix this behavior…