Dealing with program recordings

[I will be at GUADEC from tomorrow evening. See you in Manchester!]

In the previous post I talked about the technology behind UndoDB; in this last post of the series I will talk about saving and replaying program recordings.

Debugging programs with the ability to go backwards is really useful, but what about automated tests? And what about bugs that only happen for a specific user and that you cannot reproduce?

Saving recordings helps here. Using Live Recorder you can save the complete state and execution history of a process (including debug symbols) and debug it with UndoDB on another machine with the same architecture.
Live Recorder can be used as a standalone program (live-record -o recording.undo my-program) or as a library.
Using it as a library allows the programmer to start and stop recording whenever they want: for instance, recording only the execution of a test but not the test setup.

The API is quite simple to use. For instance, to record a program until its execution ends, you can just do:

// Start recording.
undolr_start(NULL);
// Automatically save when the program exits.
// This also saves the recording if the program crashes
// or terminates due to an uncaught signal.
undolr_save_on_termination("/foo/bar/recording.undo");

This is especially useful for tests, in particular tests which fail intermittently because of a rarely occurring bug.
You can record your test execution and, if the test fails, save the recording for later debugging. Otherwise, you can just discard the recording.
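The "record every run, keep only the failures" pattern can be sketched as follows. The post only shows `undolr_start` and `undolr_save_on_termination`, so this sketch deliberately does not call Live Recorder: `recorder_start`, `recorder_save`, `recorder_discard` and `run_recorded_test` are hypothetical stand-ins that only track state, so the control flow can run on its own. In a real test you would swap in the corresponding Live Recorder calls from its documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a recording library. They only track state
 * so the control flow can be exercised; they do NOT call Live Recorder. */
static bool recording_active = false;
static char last_saved[256] = "";

static void recorder_start(void) { recording_active = true; }

static void recorder_save(const char *path)
{
    /* A real recorder would write the execution history to `path`. */
    snprintf(last_saved, sizeof last_saved, "%s", path);
    recording_active = false;
}

static void recorder_discard(void) { recording_active = false; }

typedef bool (*test_fn)(int);

/* Record only the test body; keep the recording only if the test fails. */
static bool run_recorded_test(test_fn test, int input, const char *path)
{
    /* Any test setup would go before this call, so it is not recorded. */
    recorder_start();
    bool passed = test(input);
    if (passed)
        recorder_discard();   /* nothing to debug: throw the history away */
    else
        recorder_save(path);  /* keep the history of the failing run */
    return passed;
}

/* A toy test that fails only for one rare input. */
static bool is_not_42(int n) { return n != 42; }
```

Passing runs leave nothing on disk, while the rare failing run leaves a recording that can be opened later in the debugger.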

See the Undo website if you want to try UndoDB and Live Recorder.

How UndoDB works

In the previous post I described what UndoDB is; now I will describe how the technology works.

The naïve approach to recording the execution of a program is to record everything that happens, that is, the effects of every single machine instruction. This is what gdb does to offer reversible debugging.
Unfortunately this is so slow that it’s unusable even for trivial programs (this is why most people don’t know gdb already has reversible debugging).

UndoDB takes a different approach. It distinguishes which operations are deterministic and which aren’t. For instance, “2+2” will always produce “4”, so there’s no need to save the result of this instruction.
On the other hand, a small proportion of what a program does is non-deterministic, so the effect of these operations must be saved in memory in what we call the event log.
Some non-deterministic operations are:

  • System calls. For instance, for a read we need to save the data that was read from the file, while for a write we only need to save the return code, as the content of the buffer is already in the program's memory.
  • Signals.
  • Thread switches.
  • Access to shared memory.
  • Non-deterministic assembly instructions. For instance, on x86, RDTSC returns the CPU's time stamp counter, which counts the number of cycles since reset.
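To make the split between deterministic and non-deterministic operations concrete, here is a toy event log in C. It is a sketch of the idea only, not how UndoDB does it: UndoDB works on unmodified machine code, while here the program calls the log explicitly. `event_log`, `logged_nondet_op`, `run_program` and the `outside_world` counter (standing in for a file, a clock or RDTSC) are hypothetical names. While recording, the wrapper runs the real operation and logs its result; while replaying, it returns the logged result without running the operation, and the deterministic arithmetic is simply executed again.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 64

/* Toy event log: results of non-deterministic operations, in order. */
typedef struct {
    long values[LOG_CAPACITY];
    size_t length;   /* number of entries recorded */
    size_t cursor;   /* next entry to hand out while replaying */
    int replaying;   /* 0 = recording, 1 = replaying */
} event_log;

/* Stand-in for a non-deterministic operation (think read() or RDTSC):
 * it returns a different value every time it is actually executed. */
static long outside_world = 1000;
static long real_nondet_op(void) { return outside_world++; }

/* Recording: run the real operation and append its result to the log.
 * Replaying: skip the operation and return the logged result instead. */
static long logged_nondet_op(event_log *ev)
{
    if (ev->replaying) {
        assert(ev->cursor < ev->length);
        return ev->values[ev->cursor++];
    }
    long v = real_nondet_op();
    assert(ev->length < LOG_CAPACITY);
    ev->values[ev->length++] = v;
    return v;
}

/* The "program": deterministic arithmetic mixed with non-deterministic
 * inputs. Only the inputs are logged; the arithmetic is just re-run. */
static long run_program(event_log *ev, int steps)
{
    long acc = 0;
    for (int i = 0; i < steps; i++)
        acc = acc * 3 + logged_nondet_op(ev) + 2 * i;
    return acc;
}
```

Replaying five steps needs only five logged values, not a record of every instruction's effects, and yields the same result even though the outside world has moved on.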

[Figure: snapshots of a program at different times in execution history.]

When you run a program under UndoDB, deterministic instructions are executed as normal, while non-deterministic ones are executed and their results are also saved in the event log.
Periodically, snapshots of the program are taken. These are complete copies of the current state of the program, but since they are created using the Linux copy-on-write mechanism, the impact on system resources is minimal.
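The post does not say exactly which system call UndoDB uses to take snapshots. `fork()` is the usual way to get a copy-on-write copy of a Linux process, so the sketch below assumes it for illustration. The child process plays the role of a snapshot: after the parent (the live program) changes `program_state`, the child still sees the old value, because the parent's write makes the kernel copy only the page it touched. `program_state` and `snapshot_keeps_old_state` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int program_state = 1;  /* hypothetical state of the running program */

/* Take a copy-on-write snapshot with fork(), change the state in the live
 * process, and return the value the snapshot still holds (-1 on error). */
static int snapshot_keeps_old_state(void)
{
    int go[2];  /* pipe used to tell the snapshot when the parent is done */
    if (pipe(go) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(go[0]);
        close(go[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the snapshot. Block until the parent has modified its state. */
        char c;
        close(go[1]);
        if (read(go[0], &c, 1) != 1)
            _exit(255);
        /* The parent's write caused a private copy of that page, so the
         * snapshot still sees the value from the moment of fork(). */
        _exit(program_state);
    }

    /* Parent: the live program keeps running and modifies its state. */
    close(go[0]);
    program_state = 2;
    ssize_t written = write(go[1], "x", 1);
    close(go[1]);  /* on a failed write the child reads EOF and exits 255 */

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || written != 1)
        return -1;
    return WEXITSTATUS(status);
}
```

Until one side writes to a page, parent and child share it, which is why an idle snapshot costs little memory.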

[Figure: moving backwards in execution history.]

Later, if you need to go back in time, you start replaying the application from a previous snapshot, re-executing only the deterministic operations. The results of non-deterministic operations are synthesised based on what is stored in the event log.
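Going backwards can then be modelled as "restore the most recent snapshot at or before the target point and replay forward to it". The toy below is a sketch under several assumptions: a plain struct copy stands in for a process snapshot, `rand()` stands in for a non-deterministic input, and `record`, `travel_to`, `step`, `SNAPSHOT_INTERVAL` and the other names are hypothetical. During replay, the inputs come from the event log, so the re-executed steps reach exactly the same states as the original run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STEPS 100
#define SNAPSHOT_INTERVAL 10  /* hypothetical: take a snapshot every 10 steps */

typedef struct {
    long step;   /* number of steps executed so far */
    long value;  /* the rest of the program's state */
} prog_state;

static long event_log[MAX_STEPS];  /* one non-deterministic input per step */
static prog_state snapshots[MAX_STEPS / SNAPSHOT_INTERVAL + 1];
static size_t n_snapshots = 0;
static long recorded_steps = 0;

/* One program step: deterministic once `input` is known. */
static void step(prog_state *s, long input)
{
    s->value = (s->value * 31 + input) % 1000003;
    s->step++;
}

/* Run `n` steps from the initial state, logging every non-deterministic
 * input and taking a snapshot at step 0 and every SNAPSHOT_INTERVAL steps. */
static prog_state record(long n)
{
    assert(n >= 0 && n <= MAX_STEPS);
    prog_state s = {0, 1};
    snapshots[0] = s;
    n_snapshots = 1;
    for (long i = 0; i < n; i++) {
        long input = rand() % 1000;  /* stand-in for a non-deterministic op */
        event_log[s.step] = input;   /* ...whose result goes in the log */
        step(&s, input);
        if (s.step % SNAPSHOT_INTERVAL == 0)
            snapshots[n_snapshots++] = s;  /* copy of the whole state */
    }
    recorded_steps = n;
    return s;
}

/* Jump to step `target` (0 <= target <= recorded_steps): restore the latest
 * snapshot at or before it, then re-execute with inputs from the log. */
static prog_state travel_to(long target)
{
    assert(target >= 0 && target <= recorded_steps);
    prog_state s = snapshots[target / SNAPSHOT_INTERVAL];
    while (s.step < target)
        step(&s, event_log[s.step]);  /* synthesised from the log, no rand() */
    return s;
}
```

In this toy, `travel_to(current.step - 1)` behaves like a single reverse step, and it never re-executes more than `SNAPSHOT_INTERVAL` steps, which is why taking snapshots periodically matters.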

In the next post I will talk about saving program recordings to replay them later.

By the way, we are hiring software engineers in Cambridge (UK). If you are interested, contact me.

What I do at Undo

In October, I started working for Undo and, now that I understand our technology better, it’s time to explain what I do.

Undo produces a (closed-source) technology which allows you to record, rewind and replay Linux programs (on x86 and ARM).
One of our products using this technology is UndoDB, a debugger built on top of gdb which allows you to do everything you do with gdb, but also to go back in time.

[Figure: example of reverse commands in UndoDB.]

Before joining Undo, I mainly used printf calls or similar logging to debug my code. The main reason is that, when you read logs, it's easy to jump between different parts of the log and work backwards from the point where the bug became apparent to the point where it was caused.
On the other hand, with standard gdb, once the bug happens it’s not possible to know what was going on earlier.

UndoDB fixes this problem by allowing the user to go backwards. Every command which moves the program forward has an equivalent reverse command. For instance, next has reverse-next (or rn for short) which moves to the previous line of code, continue has reverse-continue (or rc) which executes backwards until a breakpoint is hit or the start of the program is reached, and so on.

I should point out that UndoDB is not just some kind of fancy logging. You can really jump around in execution history, explore variables and registers at different points in time, bookmark interesting points in history, etc. What you cannot do is change history.

Finally, UndoDB is also useful when gdb wouldn’t be able to show any information. Have you ever seen anything like this in gdb?

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(udb) backtrace
#0  0x0000000000000000 in ?? ()
#1  0x0000000000000000 in ?? ()

Not very useful, is it?
With UndoDB you can step backwards until you reach a point before your program messed up its state:

(udb) backtrace
#0  foo () at program.c:75
#1  0x0000000000400557 in bar (n=42) at program.c:120
#2  0x000000000040056a in main () at program.c:420

In the next post I will give some details on how the technology actually works.
