Using clang-format only on newly written code

At Undo, the company where I work, we have a fairly large C codebase whose style is not very consistent.
To improve things, we decided to use the clang-format tool (part of the LLVM project) to enforce a consistent style for new and refactored code.
We don’t want to reformat all the existing code, as that would be a massive and confusing change, and we don’t want spurious unrelated changes when somebody modifies a file.

To achieve this, I wrote a couple of scripts which, using clang-format and clang-format-diff, only modify the formatting of the code you are about to commit.

The most interesting part is, I think, the pre-commit hook which suggests fixes before your code is committed:

git pre-commit hook to apply clang-format

This code is now available in the clang-format-hooks repository on GitHub.

Karton 1.0

After more than a year using Karton regularly, I released version 1.0 with the last few features I was missing for my use case.

Karton is a tool which can transparently run Linux programs on a different Linux distribution, on macOS, or on a different architecture.
Using Docker, Karton manages semi-persistent containers with easy-to-use automatic folder sharing and lots of small details which make the experience smooth. You shouldn’t notice you are using command line programs from a different OS or distro.

Karton logo

If you are interested, check the Karton website.

Dealing with program recordings

[I will be at GUADEC from tomorrow evening. See you in Manchester!]

In the previous post I talked about the technology behind UndoDB. In this last post of the series I will talk about replaying recorded programs.

Debugging programs with the ability to go backwards is really useful, but what about automated tests? What about bugs that happen only for a specific user and that you cannot reproduce?

Saving recordings helps here. Using Live Recorder you can save the complete state and execution history of a process (including debug symbols) and debug it using UndoDB on another machine with the same architecture.
Live Recorder can be used as a standalone program (live-record -o recording.undo my-program) or as a library.
Using it as a library allows the programmer to start/stop recording when they want, for instance recording only the execution of a test but not the test set up.

The API is quite simple to use. For instance, to record a program until its execution ends, you can just do:

// Start recording.
undolr_start(NULL);
// Automatically save when the program exits.
// This also saves the recording if the program crashes
// or terminates due to an uncaught signal.
undolr_save_on_termination("/foo/bar/recording.undo");

This is particularly useful for tests, especially tests which fail due to a rarely occurring bug.
You can record your test execution and, if the test fails, save the recording for later debugging. Otherwise, you can just discard the recording.

See the Undo website if you want to try UndoDB and Live Recorder.

How UndoDB works

In the previous post I described what UndoDB is. Now I will describe how the technology works.

The naïve approach to record the execution of a program is to record everything that happens, that is, the effects of every single machine instruction. This is what gdb does to offer reversible debugging.
Unfortunately this is so slow that it’s unusable even for trivial programs (this is why most people don’t know gdb already has reversible debugging).

UndoDB takes a different approach. It distinguishes which operations are deterministic and which aren’t. For instance, “2+2” will always produce “4”, so there’s no need to save the result of this instruction.
On the other hand, a small proportion of what a program does is non-deterministic, so the effect of these operations must be saved in memory in what we call the event log.
Some non-deterministic operations are:

  • System calls. For instance, for a read we need to save what was read from the file; for a write we only need to save the return code, as the content of the buffer is already in the program’s memory.
  • Signals.
  • Thread switches.
  • Access to shared memory.
  • Non-deterministic assembly instructions. For instance, on x86, RDTSC returns the CPU’s time stamp counter which counts the number of cycles since reset.
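
To make the idea of an event log more concrete, here is a toy sketch in C. It is only an illustration of the principle, not how UndoDB is implemented: the names (event_log_t, logged_input, run_program, start_replay) are made up for this example, and rand() stands in for a real non-deterministic operation such as a system call or RDTSC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy event log: it stores only the results of non-deterministic
   operations. All names here are hypothetical, for illustration only. */
#define EVENT_LOG_CAPACITY 64

typedef struct {
    long   values[EVENT_LOG_CAPACITY];
    size_t count;     /* number of events recorded so far */
    size_t cursor;    /* next event to hand out during replay */
    int    replaying; /* 0 = record mode, 1 = replay mode */
} event_log_t;

/* Stand-in for a non-deterministic operation (system call, RDTSC, ...). */
static long nondeterministic_input(void)
{
    return (long)rand();
}

/* Record mode: run the operation and log its result.
   Replay mode: do not run it, return the logged result instead. */
static long logged_input(event_log_t *log)
{
    if (log->replaying) {
        assert(log->cursor < log->count);
        return log->values[log->cursor++];
    }
    assert(log->count < EVENT_LOG_CAPACITY);
    long value = nondeterministic_input();
    log->values[log->count++] = value;
    return value;
}

/* The "program": deterministic arithmetic mixed with logged inputs. */
static long run_program(event_log_t *log)
{
    long total = 2 + 2; /* deterministic, so nothing is logged */
    for (int i = 0; i < 3; i++)
        total += logged_input(log) % 100;
    return total;
}

/* Rewind the log so that the next run replays it from the beginning. */
static void start_replay(event_log_t *log)
{
    log->replaying = 1;
    log->cursor = 0;
}
```

Calling run_program() on a zero-initialised log records three events. After start_replay(), calling run_program() again returns the same total, even though rand() would have produced different numbers the second time.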

Snapshots
Snapshots of a program at different times in execution history.

When you run a program under UndoDB, the deterministic program instructions are executed as normal, while non-deterministic ones are executed but their result is also saved in the event log.
Periodically, snapshots of the program are taken. These are complete copies of the current state of the program, but since they are created using the Linux copy-on-write mechanism the impact on the system resources is minimal.
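
As a rough sketch of what a copy-on-write snapshot can look like, the C code below uses fork(), which on Linux gives the child a copy-on-write copy of the parent's memory. That fork() is the mechanism is my assumption for this example; the helper names (snapshot_t, take_snapshot, query_snapshot) and the single global counter are hypothetical and only serve to show that the snapshot keeps the old state while the original process carries on.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The whole state of our toy "program". */
static int counter = 0;

/* Hypothetical handle for a snapshot kept alive in a forked child. */
typedef struct {
    pid_t pid;
    int   wake_fd;  /* the parent writes here to wake the snapshot */
    int   reply_fd; /* the parent reads the snapshot's counter here */
} snapshot_t;

/* Fork a child that sleeps until woken. Its memory is a copy-on-write
   copy of ours, so pages are duplicated only when one side writes. */
static int take_snapshot(snapshot_t *snap)
{
    int wake[2], reply[2];
    if (pipe(wake) != 0)
        return -1;
    if (pipe(reply) != 0) {
        close(wake[0]); close(wake[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(wake[0]); close(wake[1]);
        close(reply[0]); close(reply[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait to be asked, then report the state it still sees. */
        char byte;
        close(wake[1]);
        close(reply[0]);
        if (read(wake[0], &byte, 1) == 1 &&
            write(reply[1], &counter, sizeof counter) != (ssize_t)sizeof counter)
            _exit(1);
        _exit(0);
    }
    close(wake[0]);
    close(reply[1]);
    snap->pid = pid;
    snap->wake_fd = wake[1];
    snap->reply_fd = reply[0];
    return 0;
}

/* Wake the snapshot, read the counter value it preserved, and reap it. */
static int query_snapshot(snapshot_t *snap, int *value)
{
    char byte = 1;
    int ok = write(snap->wake_fd, &byte, 1) == 1 &&
             read(snap->reply_fd, value, sizeof *value) == (ssize_t)sizeof *value;
    close(snap->wake_fd);
    close(snap->reply_fd);
    waitpid(snap->pid, NULL, 0);
    return ok ? 0 : -1;
}
```

If you set counter to 1, call take_snapshot(), and then change counter to 42, query_snapshot() still reports 1: the write in the parent triggered a private copy of the page, and the snapshot kept the original.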

Searching
Moving backwards in execution history.

Later, if you need to go back in time, you start replaying the application from a previous snapshot, re-executing only the deterministic operations. The results of non-deterministic operations are synthesised based on what is stored in the event log.
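
The "go back by replaying forward" idea can be sketched in a few lines of C. Again, this is a toy model under my own assumptions, not UndoDB's code: snapshots are plain struct copies rather than copy-on-write processes, the program state is a single number, and all the names (recording_t, record, go_to, run_step) are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_STEPS         32
#define SNAPSHOT_INTERVAL 8

/* The whole state of the toy program. */
typedef struct {
    size_t step;  /* number of steps executed so far */
    long   value; /* the program's only piece of data */
} state_t;

/* Hypothetical recording: an event log plus periodic snapshots. */
typedef struct {
    long    events[MAX_STEPS]; /* one logged input per step */
    state_t snapshots[MAX_STEPS / SNAPSHOT_INTERVAL + 1];
    size_t  snapshot_count;
    state_t end;               /* state when recording stopped */
} recording_t;

/* One step: a deterministic update that consumes one input. */
static void run_step(state_t *s, long input)
{
    s->value = (s->value * 3 + input) % 1000003;
    s->step++;
}

/* Run the program, logging each non-deterministic input (rand() here)
   and taking a snapshot every SNAPSHOT_INTERVAL steps. */
static void record(recording_t *rec, size_t steps)
{
    state_t s = {0, 0};
    assert(steps <= MAX_STEPS);
    rec->snapshot_count = 0;
    for (size_t i = 0; i < steps; i++) {
        if (i % SNAPSHOT_INTERVAL == 0)
            rec->snapshots[rec->snapshot_count++] = s;
        rec->events[i] = rand() % 100;
        run_step(&s, rec->events[i]);
    }
    rec->end = s;
}

/* "Go back in time" to the state after `target` steps: restore the
   latest snapshot at or before the target, then replay forward, taking
   inputs from the event log instead of the outside world. */
static state_t go_to(const recording_t *rec, size_t target)
{
    state_t s = {0, 0};
    assert(target <= rec->end.step);
    for (size_t i = 0; i < rec->snapshot_count; i++)
        if (rec->snapshots[i].step <= target)
            s = rec->snapshots[i];
    while (s.step < target)
        run_step(&s, rec->events[s.step]);
    return s;
}
```

After record(&rec, 20), calling go_to(&rec, 13) restores the snapshot taken at step 8 and replays only five steps. Nothing is ever executed in reverse; the distance between snapshots bounds how much work a jump backwards costs.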

In the next post I will talk about saving program recordings to replay them later.

By the way, we are hiring software engineers in Cambridge (UK). If you are interested, contact me.

What I do at Undo

In October, I started working for Undo and, now that I understand our technology better, it’s time to explain what I do.

Undo produces a (closed source) technology which allows you to record, rewind and replay Linux programs (on x86 and ARM).
One of our products using this technology is UndoDB, a debugger built on top of gdb which allows you to do everything you do with gdb, but also to go back in time.

Example of reverse commands in UndoDB

Before joining Undo, I mainly used printfs or similar to debug my code. The main reason is that, when you read logs, it’s easy to jump between different parts of the log and proceed backwards from the point where the bug became apparent to the point where the bug was caused.
On the other hand, with standard gdb, once the bug happens it’s not possible to know what was going on earlier.

UndoDB fixes this problem by allowing the user to go backwards. Every command which moves the program forward has an equivalent reverse command. For instance, next has reverse-next (or rn for short) which moves to the previous line of code, continue has reverse-continue (or rc) which executes backwards until a breakpoint is hit or the start of the program is reached, and so on.

I should point out that UndoDB is not just some kind of fancy logging. You can really jump around in execution history, explore variables and registers at different points in time, bookmark interesting points in history, etc. What you cannot do is change history.

Finally, UndoDB is also useful when gdb wouldn’t be able to show any information. Have you ever seen anything like this in gdb?

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(udb) backtrace
#0  0x0000000000000000 in ?? ()
#1  0x0000000000000000 in ?? ()

Not very useful, is it?
With UndoDB you can step backwards until you reach a point before your program messed up its state:

(udb) backtrace
#0  foo () at program.c:75
#1  0x0000000000400557 in bar (n=42) at program.c:120
#2  0x000000000040056a in main () at program.c:420

In the next post I will give some details on how the technology actually works.

By the way, we are hiring software engineers in Cambridge (UK). If you are interested, contact me.

reTrumplation, a Twitter bot experiment

A few years ago, somebody introduced me to Translation Party, a website which automatically translates a sentence back and forth until further translations produce the same English text. The results are mostly funny nonsense.

Recently at work we were talking about automatic translations, so I thought it would be fun to use the same principle for a Twitter bot which works on Donald Trump’s many tweets. The result is @reTrumplation.

@reTrumplation, first example

@reTrumplation, second example

Karton – running Linux programs on macOS, a different Linux distro, or a different architecture

At work I use Linux, but my personal laptop is a Mac (due to my previous job developing for macOS).

A few months ago, I decided I wanted to be able to do some work from home without carrying my work laptop home every day.
I considered using a VM, but I don’t like the experience of mixing two operating systems. On Mac I want to use the native key bindings and applications, not a confusing mix of Linux and Mac UI applications.

In the end, I wrote Karton, a program which, using Docker, manages semi-persistent containers with easy-to-use automatic folder sharing and lots of small details which make the experience smooth. You shouldn’t notice you are using command line programs from a different OS.

Karton logo

After defining which distro and packages you need (this is called an “image”), you can just execute Linux programs by prefixing them with karton run IMAGE-NAME LINUX-COMMAND. For example:

$ uname -a # Running on macOS.
Darwin my-hostname 16.4.0 Darwin Kernel Version 16.4.0 [...]

$ # Run the compiler in the Ubuntu image we use for work
$ # (which we called "ubuntu-work"):
$ karton run ubuntu-work gcc -o test_linux test.c

$ # Verify that the program is actually a Linux one.
$ # The files are shared and available both on your
$ # system and in the image:
$ file test_linux
test_linux: ELF 64-bit LSB executable, x86-64, [...]

Karton runs on Linux as well, so you can do development targeting a different distro or a different architecture (for instance ARMv7 while using an x86_64 computer).

For more examples, installation instructions, etc. see the Karton website.