ReReplay: Replay production traffic
Ever wonder if your change is going to slow down the website? "Performance impact will probably be negligible", you might say. Well, now you can measure it instead of guessing. First, take a slice of your production logs and transform it into the input format described below. Then run ReReplay with this snapshot to generate a baseline. Next, apply your change, rerun ReReplay (sorry, too many re's?) and measure the difference!
Unfortunately, this library leaves a number of blanks for you to fill in. For one, it knows nothing about log processing and the like -- you have to fetch and parse the logs yourself and feed the result into ReReplay, in a format like so: [[time, http method, url], ...] where time is the offset in seconds from the start of the run (i.e. the first entry should start at 0), http method is either :get or :head, and url is a full URL, including the domain name.
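As a rough sketch of that prep step, here is one way to turn already-parsed log entries into the input array. Everything in it is an assumption for illustration: the `to_rereplay_input` helper, the hash keys (`:time`, `:method`, `:path`) and the `base_url` argument are not part of ReReplay, and how you parse your own log format is still up to you.

```ruby
# Hypothetical helper, not part of ReReplay: converts parsed log entries
# into the [[time, http method, url], ...] array described above.
def to_rereplay_input(entries, base_url)
  # Only :get and :head are described as supported, so drop everything else.
  supported = { "GET" => :get, "HEAD" => :head }
  usable = entries.select { |e| supported.key?(e[:method]) }
                  .sort_by { |e| e[:time] }
  return [] if usable.empty?

  # Offsets are measured from the first usable request, so the run starts at 0.
  start = usable.first[:time]
  usable.map do |e|
    [e[:time] - start, supported[e[:method]], base_url + e[:path]]
  end
end

# Made-up sample entries standing in for a parsed production log.
t0 = Time.utc(2010, 5, 1, 12, 0, 0)
entries = [
  { time: t0,       method: "GET",  path: "/" },
  { time: t0 + 0.5, method: "HEAD", path: "/about" },
  { time: t0 + 1,   method: "POST", path: "/login" } # skipped: not :get or :head
]

input = to_rereplay_input(entries, "http://www.example.com")
```

The resulting `input` has the same shape as the array passed to the runner in the example further down.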
For another, what does it mean for a site to be "slower"? Generally speaking, this means the length of time that the page takes to download has increased. But what do you care about -- the average page load speed? The slowest 5% of pages? That's up to you.
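To make the choice concrete, here is a small sketch that computes two of the candidate metrics, the mean and the 95th percentile, from a list of request durations in seconds. The `summarize` helper and the sample numbers are made up for illustration; nothing here comes from ReReplay itself.

```ruby
# Hypothetical helper, not part of ReReplay: summarizes a list of
# request durations (in seconds) as a mean and a 95th percentile.
def summarize(durations)
  raise ArgumentError, "no durations given" if durations.empty?

  sorted = durations.sort
  mean = sorted.sum / sorted.size.to_f

  # Nearest-rank percentile: the smallest value with at least 95% of
  # the samples at or below it.
  rank = (sorted.size * 95 / 100.0).ceil
  p95 = sorted[rank - 1]

  { mean: mean, p95: p95 }
end

# Made-up durations for a baseline run and a run with the change applied.
baseline  = summarize([0.10, 0.12, 0.11, 0.50, 0.13])
candidate = summarize([0.11, 0.14, 0.12, 0.90, 0.15])

puts "mean: #{baseline[:mean]} -> #{candidate[:mean]}"
puts "p95:  #{baseline[:p95]} -> #{candidate[:p95]}"
```

Note how the two metrics can tell different stories: a single slow outlier moves the 95th percentile much more than it moves the mean.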
You track this sort of thing using monitors -- request monitors, to be precise. These run before and/or after every request and allow you to track things like when the request was scheduled to start, when it actually started, when it finished, what the HTTP response code was, etc. All you need is a class with a #start or #finish method that takes a single request parameter. Here's a short example:
```ruby
require "rereplay"
require "rereplay/monitors" # this is for the built-in monitors
input = [[0.0, :get, "http://www.google.com/"], [0.5, :get, "http://www.amazon.com/"]]
mon = ReReplay::RequestTimeMonitor.new
r = ReReplay::Runner.new(input)
r.request_monitors << mon
r.run
# now wait a second
puts mon.results.inspect
```
And that's how we get information about our run. There are also periodic monitors that simply run periodically, independent of any requests (for monitoring memory if you're executing against localhost, for example).
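If the built-in request monitors don't cover what you need, a custom one might look roughly like the sketch below. It relies only on the contract described above (a class with #start and/or #finish taking a single request parameter). The class name `WallClockMonitor` is made up, and the sketch assumes the same request object is passed to both methods; it deliberately avoids calling anything on the request, since its attributes aren't covered here.

```ruby
# Hypothetical custom request monitor, not part of ReReplay.
# Assumption: the same request object is handed to #start and #finish.
class WallClockMonitor
  attr_reader :results

  def initialize
    @started = {} # request object_id => monotonic start time
    @results = [] # elapsed seconds, one entry per finished request
  end

  # Called before a request is made.
  def start(request)
    @started[request.object_id] = now
  end

  # Called after a request has finished.
  def finish(request)
    began = @started.delete(request.object_id)
    @results << (now - began) if began
  end

  private

  # A monotonic clock is not affected by system clock adjustments.
  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

Assuming that contract holds, it would be attached the same way as the built-in monitor, with `r.request_monitors << WallClockMonitor.new`.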
ReReplay deliberately takes a hands-off approach -- you will have to write some code to prep your log files for entry, as well as for extracting meaningful data. However, I will continue working on additional monitors as examples and for general use.
Check out the code!
Let me know what you think, or if you have any ideas I should implement. I do want to write a script that will do an actual statistical comparison of two runs, but based on the limited Ruby stats libraries, it looks like it'll have to hook into R. Not sure if that's a bug or a feature, though.