occasionally useful ruby, ubuntu, etc

2 Jan 2011

Javascript Serialization

Note: RSS subscribers, I've (finally) enabled summary-only feed entries, so you will have to click through to see the full post.


Have you ever wanted to serialize entire Javascript objects? By "objects" I don't mean the simple hashes you get from {foo: "bar"}; I mean instances of your own prototype-based classes: User, Product, Query, and so on. JSON.stringify gets you part of the way there, but notice that this doesn't work:

function Person(){}
Person.prototype.greet = function(){ alert("Hi!"); };

var p = new Person();

JSON.parse(JSON.stringify(p)).greet(); // throws Object #<an Object> has no method 'greet' in Chrome

Inconvenient, to be sure. To make matters worse, there don't appear to be any libraries out there that handle intelligent serialization of Javascript objects. So that's what I'm writing now. It's not ready yet, but look for it soon!

Other problems

Object references aren't preserved:

var a = {"foo": "bar"};
var b = [a, a];
b[0] == b[1] // true
var c = JSON.parse(JSON.stringify(b));
c[0] == c[1] // false

Lastly, JSON has no representation for functions (JSON.stringify silently drops function-valued properties), but... I don't think that's really a problem. It's not terribly common to attach functions directly to object instances anyway.
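To make "not supported" concrete, here is what standard JSON.stringify does with functions. Nothing here goes beyond a standard JSON implementation (any modern browser console or Node.js):

```javascript
// A plain object carrying both data and a function-valued property.
var withFn = {
  name: "Bob",
  shout: function () { return "HI!"; }
};

// In an object, function-valued properties are omitted entirely.
var jsonText = JSON.stringify(withFn);
console.log(jsonText); // {"name":"Bob"}

// In an array, a function can't simply vanish, so it becomes null.
console.log(JSON.stringify([1, function () {}])); // [1,null]
```

Either way, the function is gone after a round trip.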

Solutions, people

Prototype preservation

So what can we do about the first problem? In every browser except IE (of course), all objects have a special property called __proto__ [1], a (writable!) pointer to the object's prototype. Changing an instance's prototype, then, is simple:

// Step 1: setup
function Person(){}
Person.prototype.greet = function(){ alert("Hi!"); };

var p = new Person();
p.name = "Bob";

var object = JSON.parse(JSON.stringify(p));
try {
  object.greet(); // still throws exception
} catch(e){
  console.log("error calling #greet, %o", e);
}

// Step 2: change __proto__
object.__proto__ = Person.prototype;

// Step 3: verify
object.greet(); // now it works!

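IE (IE6 through IE9) doesn't expose __proto__, so there it's a little messier: create an empty object whose prototype is Person.prototype, then copy the parsed properties onto it: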

// Step 1: same as before

// Step 2: copy properties onto a fresh object whose prototype is Person.prototype
var tmp = function(){};
tmp.prototype = Person.prototype;
var t = new tmp();
for (var k in object) {
  if (object.hasOwnProperty(k)) {
    t[k] = object[k];
  }
}
object = t;
// Not that it matters, but object.constructor == Person now (inherited from
// Person.prototype). A good thing, so you can reserialize easily.

// Step 3: verify
object.greet(); // now it works!
object instanceof Person // this is also true
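The two recipes above can be folded into one helper. This is only a sketch, not part of any library: the name reviveAs is hypothetical, and it prefers Object.create (ES5, available in IE9 and other modern engines), falling back to the empty-constructor trick only where Object.create is missing. greet returns a string instead of alerting so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: give a plain, JSON-parsed object the given prototype
// by copying its own properties onto a fresh object that inherits from proto.
function reviveAs(plain, proto) {
  var target;
  if (typeof Object.create === "function") {
    target = Object.create(proto);   // ES5 engines (IE9 included)
  } else {
    var Tmp = function () {};        // pre-ES5 fallback, same as the IE recipe
    Tmp.prototype = proto;
    target = new Tmp();
  }
  for (var k in plain) {
    if (Object.prototype.hasOwnProperty.call(plain, k)) {
      target[k] = plain[k];
    }
  }
  return target;
}

function Person() {}
Person.prototype.greet = function () { return "Hi, I'm " + this.name; };

var bob = new Person();
bob.name = "Bob";

var revived = reviveAs(JSON.parse(JSON.stringify(bob)), Person.prototype);
console.log(revived.greet());                // Hi, I'm Bob
console.log(revived instanceof Person);      // true
console.log(revived.constructor === Person); // true
```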

Prototype resolution

JSON.stringify doesn't preserve anything about the objects it stringifies except their properties [2]. So how do we know the prototype's name? That is, how do we know that the serialized object should get Person.prototype when it's deserialized?

There's really only one solution, and it's somewhat limited: #constructor. If I have a Person instance p, p.constructor gives me the function that built p, and p.constructor.name gives me that function's name. Only one problem: p.constructor is writable. I could write p.constructor = "cat" if I wanted to. No, it doesn't make any sense, but you can reassign the constructor to whatever you want. So, big caveat: don't write to #constructor if you want serialization to be possible! At least p.constructor.name isn't writable.
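A minimal sketch of tagging each serialized object with its constructor's name. Everything here is illustrative: the __type and data keys, the registry object, and the helpers register, toTagged and fromTagged are hypothetical. It assumes constructors are named function declarations (an anonymous function has no useful name in older engines), relies on Function#name, which old IE lacks, and uses Object.create (ES5) for brevity.

```javascript
// Hypothetical registry: constructor name -> constructor function.
var registry = {};
function register(ctor) {
  registry[ctor.name] = ctor;
}

// Serialize: record the constructor's name next to the own properties.
function toTagged(obj) {
  var out = { __type: obj.constructor.name, data: {} };
  for (var k in obj) {
    // Skips inherited (prototype) properties such as greet.
    if (Object.prototype.hasOwnProperty.call(obj, k)) {
      out.data[k] = obj[k];
    }
  }
  return JSON.stringify(out);
}

// Deserialize: look the name up and rebuild an object with that prototype.
function fromTagged(text) {
  var parsed = JSON.parse(text);
  var ctor = registry[parsed.__type];
  if (!ctor) {
    throw new Error("Unknown type: " + parsed.__type);
  }
  var obj = Object.create(ctor.prototype);
  for (var k in parsed.data) {
    if (Object.prototype.hasOwnProperty.call(parsed.data, k)) {
      obj[k] = parsed.data[k];
    }
  }
  return obj;
}

function Person() {}
Person.prototype.greet = function () { return "Hi, I'm " + this.name; };
register(Person);

var p = new Person();
p.name = "Bob";
var text = toTagged(p);    // {"__type":"Person","data":{"name":"Bob"}}
var copy = fromTagged(text);
console.log(copy.greet()); // Hi, I'm Bob
```

The registry is what ties a bare string back to a live prototype; writing to p.constructor would break exactly this lookup.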

Object reference preservation

This has to be handled in the serialization and deserialization steps. When serializing an object, you walk the entire graph of objects reachable from it and serialize those into the output as well. If you encounter an object you've already serialized, insert a reference into the data stream instead of the object itself: if the object was given an id property (say __id: 1) when it was first written out, each later occurrence can be written as {"__ref": 1}. Obviously the deserializer has to look for these markers (JSON.parse doesn't care about __ref or __id), and it can't resolve references until it has traversed the entire parsed structure once; otherwise the object with __id: 1 might not have been seen yet.
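A sketch of the deserialization side, assuming the serializer wrote each object's first occurrence with an __id property and every later occurrence as {"__ref": id} (the marker names follow the text; the function name resolveRefs is hypothetical). The first pass indexes objects by __id; the second swaps markers for the real objects, so it doesn't matter whether a reference appears before or after the object it points to.

```javascript
function resolveRefs(root) {
  var byId = {};

  function isRef(v) {
    return v !== null && typeof v === "object" &&
      Object.prototype.hasOwnProperty.call(v, "__ref");
  }

  // Pass 1: remember every object that carries an __id.
  function index(node) {
    if (node === null || typeof node !== "object" || isRef(node)) return;
    if (Object.prototype.hasOwnProperty.call(node, "__id")) {
      byId[node.__id] = node;
    }
    for (var k in node) {
      if (Object.prototype.hasOwnProperty.call(node, k)) index(node[k]);
    }
  }

  // Pass 2: replace each {"__ref": n} with the object indexed as n.
  // Replaced values aren't descended into, so cycles can't loop forever.
  function link(node) {
    if (node === null || typeof node !== "object") return;
    for (var k in node) {
      if (!Object.prototype.hasOwnProperty.call(node, k)) continue;
      var v = node[k];
      if (isRef(v)) {
        node[k] = byId[v.__ref];
      } else {
        link(v);
      }
    }
  }

  index(root);
  link(root);
  // Drop the bookkeeping ids so the result looks like the original data.
  for (var id in byId) delete byId[id].__id;
  return root;
}

// The reference comes *before* the object it points to.
var stream = '[{"__ref":1},{"__id":1,"foo":"bar"}]';
var arr = resolveRefs(JSON.parse(stream));
console.log(arr[0] === arr[1]); // true
```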

This also solves one other little problem for us: circular references. If A has a reference to B, B has a reference to A, and you serialize A, then B's reference to A gets written as {"__ref": 1} instead of serializing A again (and again, forever).
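For contrast, plain JSON.stringify can't cope with that A/B cycle at all: the ES5 spec requires it to throw a TypeError when it detects a cyclic structure. A quick check using nothing but standard JSON:

```javascript
// Build A <-> B, the circular case described above.
var nodeA = { name: "A" };
var nodeB = { name: "B", partner: nodeA };
nodeA.partner = nodeB;

var threw = false;
try {
  JSON.stringify(nodeA);
} catch (e) {
  // The error type is a TypeError; the message wording varies by engine.
  threw = e instanceof TypeError;
}
console.log(threw); // true
```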

Deeper discussion: how do you know whether an object was already serialized (and what its id is)? I can think of a few ways: keep a list, keep a map, or modify the input.

Map

Keeping a map (from an object's hash to the object) is good, but it requires you to implement a #hash() method on every object you intend to serialize, which is not easy for client code.

List

Keeping a list of all the objects you've serialized is good because you don't have to do anything special (unlike with the hash); however, each lookup is O(n), and you have to do O(n) lookups, so the overall algorithm is O(n²). That's BAD: serialization should be O(n). This leaves us with one option: modifying the input.
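To make the cost concrete, a sketch of the list approach (makeListTracker and idFor are hypothetical names). indexOf compares by identity, which is exactly what we want, but it scans the list linearly on every call, so visiting n objects can take on the order of n² comparisons in total.

```javascript
// The "keep a list" approach: seen[i] is the object that was given id i + 1.
function makeListTracker() {
  var seen = [];
  return {
    // Returns { id, isNew } for obj, assigning a new id on first sight.
    idFor: function (obj) {
      var i = seen.indexOf(obj);   // O(n) scan on every lookup
      if (i !== -1) return { id: i + 1, isNew: false };
      seen.push(obj);
      return { id: seen.length, isNew: true };
    }
  };
}

var tracker = makeListTracker();
var shared = { foo: "bar" };
var first = tracker.idFor(shared);  // { id: 1, isNew: true }
var again = tracker.idFor(shared);  // { id: 1, isNew: false }
var other = tracker.idFor({});      // { id: 2, isNew: true }
```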

Modify the input

So, we can add a property, say __id, to each object being serialized, both to identify it and to mark it as already serialized. One caveat: when programmers serialize an object, they don't expect the original object to be modified. Easily fixed: push every object you mark onto a list, and when serialization is done, iterate over the list and delete the properties you added. Everything is still O(n). The only drawback is the possibility of a property name collision (if __id was already defined); in practice, I wouldn't use __id, but something like __my_serialization_library_id.
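A minimal sketch of the "modify the input" strategy, assuming the marker name __my_serialization_library_id from the text and the __id/__ref stream format described earlier (serializeGraph is a hypothetical name). Each object is stamped on first visit, so the "seen before?" check is a constant-time property lookup, and a finally block strips the stamps even if serialization throws. To stay short it handles only plain data: prototype tagging and functions are left out, and arrays are always written inline, so a cycle made only of arrays isn't handled.

```javascript
var MARK = "__my_serialization_library_id"; // marker property from the text

function serializeGraph(root) {
  var marked = [];  // every object we stamped, for cleanup afterwards
  var nextId = 1;

  function encode(value) {
    if (value === null || typeof value !== "object") return value;
    if (Array.isArray(value)) return value.map(encode); // arrays inline
    if (Object.prototype.hasOwnProperty.call(value, MARK)) {
      return { __ref: value[MARK] };   // seen before: emit a reference
    }
    value[MARK] = nextId++;            // stamp on first visit
    marked.push(value);
    var out = { __id: value[MARK] };
    for (var k in value) {
      if (k !== MARK && Object.prototype.hasOwnProperty.call(value, k)) {
        out[k] = encode(value[k]);
      }
    }
    return out;
  }

  try {
    return JSON.stringify(encode(root));
  } finally {
    // Undo our changes so the caller's objects look untouched.
    for (var i = 0; i < marked.length; i++) delete marked[i][MARK];
  }
}

var a1 = { foo: "bar" };
var encoded = serializeGraph([a1, a1]);
console.log(encoded);         // [{"__id":1,"foo":"bar"},{"__ref":1}]
console.log(Object.keys(a1)); // [ 'foo' ]  (marker removed again)
```

The same stamping also terminates on object cycles, since the second visit to an object hits the marker and emits a reference.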

Winner: Modify the Input

Schema changes

Perhaps schema isn't the right word, but... what happens when you serialize a Person object and deserialize it a month later, after the Person class has changed in a backwards-incompatible way? If your original Person object had, say, a "name" property, but the new Person class expects "firstName" and "lastName" properties, what do you do? This is actually less complicated than the other problems in serialization: store a version number with your Person object (in a way that actually gets saved to the data stream), and when you deserialize it, give developers a mechanism for migrating between old and new Person objects. Something along the lines of: "this serialized Person was constructed using Person v1, but the Person class in memory is now v3; run migrations to get the object from v1 to v2 to v3." This won't be needed often, but backwards compatibility is, of course, important.
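A sketch of the version-and-migrate idea. All names are hypothetical: the __version field, Person.version, Person.migrations and migrate. Migration n turns a version-n record into a version-(n+1) record, and migrations run in sequence until the record matches the current class. Splitting "name" on the first space is only an illustrative v1-to-v2 rule, and the v2-to-v3 "nickname" field is invented purely to show a second step.

```javascript
function Person() {}
Person.version = 3; // hypothetical: current schema version of the class

// Hypothetical migrations, keyed by the version they upgrade *from*.
Person.migrations = {
  1: function (data) {   // v1 -> v2: split "name" into two fields
    var parts = String(data.name || "").split(" ");
    data.firstName = parts[0];
    data.lastName = parts.slice(1).join(" ");
    delete data.name;
    return data;
  },
  2: function (data) {   // v2 -> v3: illustrative new field with a default
    if (data.nickname === undefined) data.nickname = data.firstName;
    return data;
  }
};

// Bring a parsed record up to ctor.version, one step at a time.
function migrate(data, ctor) {
  var v = data.__version || 1; // assume unversioned records are v1
  while (v < ctor.version) {
    var step = ctor.migrations[v];
    if (!step) throw new Error("No migration from v" + v);
    data = step(data);
    v += 1;
  }
  data.__version = v;
  return data;
}

var old = JSON.parse('{"__version":1,"name":"Ada Lovelace"}');
var upgraded = migrate(old, Person);
// upgraded: { __version: 3, firstName: "Ada", lastName: "Lovelace", nickname: "Ada" }
```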

Conclusion

It's possible and feasible to perform good[3] serialization in Javascript, with the following (minor) limitations:

  • Can't write to an object's #constructor property.
  • Can't serialize functions.

The first is plausible to work around, and the second hopefully won't arise terribly often. When it does, you can hopefully use pre-serialization and post-serialization callbacks to save the function in a form that is supported (e.g. as a string naming a method).
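A sketch of that callback workaround. The hook names beforeSerialize and afterDeserialize, and the Button class, are hypothetical and not from any library: a function-valued property is saved as the name of a matching prototype method and looked up again once the prototype has been restored. It only works when the function is one of the prototype's own methods; an arbitrary closure would find no matching name and still be dropped by JSON.stringify.

```javascript
function Button() {}
Button.prototype.sayHi = function () { return "hi"; };
Button.prototype.sayBye = function () { return "bye"; };

// Hypothetical pre-serialization hook: copy own properties, swapping the
// onClick function for the name of the prototype method it points to.
Button.prototype.beforeSerialize = function () {
  var copy = {};
  for (var k in this) {
    if (Object.prototype.hasOwnProperty.call(this, k)) copy[k] = this[k];
  }
  for (var name in Button.prototype) {
    if (Button.prototype[name] === this.onClick) copy.onClick = name;
  }
  return copy;
};

// Hypothetical post-deserialization hook: turn the name back into the method.
Button.prototype.afterDeserialize = function () {
  if (typeof this.onClick === "string") {
    this.onClick = Button.prototype[this.onClick];
  }
  return this;
};

var btn = new Button();
btn.label = "OK";
btn.onClick = btn.sayBye;

var json = JSON.stringify(btn.beforeSerialize()); // {"label":"OK","onClick":"sayBye"}

var restored = Object.create(Button.prototype);   // restore the prototype (ES5)
var data = JSON.parse(json);
for (var key in data) restored[key] = data[key];
restored.afterDeserialize();
console.log(restored.onClick()); // bye
```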

Thanks for reading! If you DO know of any cool JS serialization libraries out there, or if I missed something, let me know in the comments.

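  1. [1] Strictly speaking, what the Javascript standard defines is an internal [[Prototype]] link, which every object has, in IE too. __proto__ was not part of the ES5 standard; it's an engine extension that exposes that link, and IE6-IE9 don't provide it, so there it may as well not exist.
  2. [2] More precisely, its own enumerable properties (the ones for which hasOwnProperty returns true), and even then, function-valued and undefined-valued properties are skipped.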
  3. [3] For lack of a better term...