occasionally useful ruby, ubuntu, etc

28Feb/111

Ye Olde Key-Value Store: BerkeleyDB

This weekend I explored the possibility of implementing something resembling Redis but on top of an existing, mature database engine.  It's a key-value store, it has transactions, it supports locks and concurrent writes, replication, etc.  While it ultimately didn't end up panning out (trying to get sets to work efficiently just wasn't working out), it was an interesting learning experience for me familiarizing myself with the Java DPL (Direct Persistence Layer) in BerkeleyDB.  (Let that be your warning that there's Java in this post!)

The DPL is basically the new(ish) Java 5 generics-and-annotations-happy way of storing objects into BerkeleyDB (aka BDB from here on out).  Here's an example:

import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.PrimaryKey;
 
@Entity
public class BString {
	@PrimaryKey
	private String key;
	private String value;
 
	public String getKey() {
		return key;
	}
	public void setKey(String key) {
		this.key = key;
	}
	public String getValue() {
		return value;
	}
	public void setValue(String value) {
		this.value = value;
	}
}

It's basically a bean with a couple annotations -- @Entity to mark it as storable, and @PrimaryKey to say, well, what the primary key is. It's required and has to be unique. Nothing special so far. Getting and setting is simple, too:

// Don't worry about `store`, it's an EntityStore object
// First parameter is the type of the primary key, and the second is the actual bean class
PrimaryIndex<String, BString> pIdx = store.getPrimaryIndex(String.class, BString.class);
 
public void set(String key, String value) throws DatabaseException {
	BString bStr = new BString();
	bStr.setKey(key);
	bStr.setValue(value);
	pIdx.put(bStr);
}
public String get(String key) throws DatabaseException {
	BString bStr = pIdx.get(key);
	return bStr == null ? null : bStr.getValue();
}

Suppose you want to increment the field atomically and safely; you can use transactions for that:

public long incr(String key) throws DatabaseException {
	Transaction t = env.beginTransaction(null, null);
	try {
		BString bStr = pIdx.get(t, key, LockMode.RMW); // this is a "Read-Modify-Write" lock
		long value = Long.parseLong(bStr.getValue());
		value += 1;
		bStr.setValue(Long.toString(value));
		pIdx.put(t, bStr);
		t.commit();
		t = null;
		return value;
	} finally {
		abortQuietly(t); // this just rolls back the transaction if it's not null
	}
}

And most of what you need to know about basic retrieval, creation, and transactions is right there. However, you can do more interesting things in what BDB documentation describes as joins. Let's take a look at the Redis-"hash"-like structure:

import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.PrimaryKey;
import com.sleepycat.persist.model.Relationship;
import com.sleepycat.persist.model.SecondaryKey;
 
@Entity
public class BHash {
	// think auto_increment from MySQL, or, well, sequences in Oracle
	@PrimaryKey(sequence="hash_sequences")
	private long seq;
	@SecondaryKey(relate=Relationship.MANY_TO_ONE)
	private String key;
	@SecondaryKey(relate=Relationship.MANY_TO_ONE)
	private String field;
 
	private String value;
 
	public String getKey() {
		return key;
	}
 
	public void setKey(String key) {
		this.key = key;
	}
 
	public String getField() {
		return field;
	}
 
	public void setField(String field) {
		this.field = field;
	}
 
	public String getValue() {
		return value;
	}
 
	public void setValue(String value) {
		this.value = value;
	}
 
	public long getSeq() {
		return seq;
	}
 
	public void setSeq(long seq) {
		this.seq = seq;
	}
}

Many-to-one here means "many instances can have the same value here, but this class only has one". If it was *-to-many, it would have to be a collection. One-to-one guarantees uniqueness. Here, rather than storing an actual hash map into the BDB (which we could!) we're just going to put in key-field-value tuples. The reason for this is so you can have a hash of enormous size (say tens of thousands of keys) and not need to deserialize the entire object when you only want to access one key. So, on to joins, or "How do I get the value for the entry with a particular key and field?":

PrimaryIndex<Long, BHash> pIdx = store.getPrimaryIndex(Long.class, BHash.class);
// String.class is the type of the key, and "key" is the name of the getter/setter for that key.
SecondaryIndex<String, Long, BHash> keyIdx = store.getSecondaryIndex(pIdx, String.class, "key");
SecondaryIndex<String, Long, BHash> fieldIdx = store.getSecondaryIndex(pIdx, String.class, "field");
 
public boolean set(String key, String field, String value) throws DatabaseException {
	// Same as before, basically
	BHash hash = new BHash();
	hash.setKey(key);
	hash.setField(field);
	hash.setValue(value);
	return pIdx.put(hash) == null;
}
 
public String get(String key, String field) throws DatabaseException {
	// Here we're instantiating a helper class for the join
	EntityJoin<Long, BHash> join = new EntityJoin<Long, BHash>(pIdx);
	join.addCondition(keyIdx, key);
	join.addCondition(fieldIdx, field);
	ForwardCursor<BHash> cursor = join.entities(); // Yay cursors?
	try {
		BHash hash = cursor.next(); // bit of a hack but there's not a great way to get the first element from ForwardCursor
		return hash == null ? null : hash.getValue();
	} finally {
		cursor.close();
	}
}

Between the concepts covered here, you can sort of almost build a Redis-clone on top of BDB. Strings, hashes, and lists are all doable, but it starts to break down when you try to implement sets or sorted sets -- there's not a way to dictate unique compound keys as far as I can tell (each key-member pair in a set would be unique). If one could do that, I think sorted sets would be simple, as sorting keys is supported.

Performance was also important, and running within the scope of a unit test (not the best place for the JVM to warm up, admittedly) performance was about 1/3 that of Redis for that which I did implement (except for increment; those locks are brutal). Not bad for a Sunday's worth of work!

Anyway, it gave me some valuable experience into how other key-value stores work, and how key-value stores should work on other platforms (i.e. Java).

And now, back to our regularly scheduled Ruby-programming.

Filed under: java Leave a comment
Comments (1) Trackbacks (0)
  1. You had me at “Direct Persistence Layer”..I do try to tract this….


Leave a comment


No trackbacks yet.