Reversible Computers

From Dan Shearer CV
Revision as of 13:28, 23 January 2023 by Dan

Reversible Computing and Reversible Debugging are amazing and useful applications of Time Shifting, aiming at the massive problem of software unreliability.

It is possible to have a network of running computers - say, with Android, Windows or Linux - and to stop them all and reverse back to any point in time. For example, reversing to a point just before a catastrophic error occurred, so we can watch carefully. And repeat it if we want to, again and again, any number of times. This is so much fun!

Difficult problems of unreliable software and unexplainable crashes happen so often that they affect societies, because people's expectations and habits change in response to bad software. I believe reversibility could reduce this by making debugging much easier to do. While I personally use and recommend reversibility technologies, they are not ubiquitous, so most people are not convinced.

In 2023 the plain facts are:

  1. Full simulation reversibility at high speed is possible - including even running an operating system backwards, even unbooting. This is well-understood computer science.
  2. The ability to rewind and replay is very powerful for debugging, especially for complex stacks, and rare or non-deterministic bugs.
  3. Very few developers are interested in these features! Until recently they appeared to be commercially unviable, despite offering amazing visibility into the most complicated problems. And despite some implementations being very easy to use.
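To make the first two points concrete, here is a minimal sketch in Python of one common way to get rewind: keep periodic checkpoints of a deterministic machine, and "reverse" by restoring the nearest earlier checkpoint and replaying forward. ToyMachine, its two opcodes and the checkpoint interval are all invented for illustration; this is a toy under those assumptions, not a description of how Simics, rr or GDB are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Hypothetical deterministic machine: a program counter and one accumulator."""
    program: list                 # list of (opcode, operand) tuples
    pc: int = 0
    acc: int = 0
    steps: int = 0                # instructions executed so far
    checkpoint_every: int = 4     # assumed interval, chosen arbitrarily
    checkpoints: dict = field(default_factory=dict)

    def __post_init__(self):
        # Always keep a checkpoint of the initial state.
        self.checkpoints[0] = (self.pc, self.acc)

    def step(self):
        """Execute one instruction, saving a checkpoint at regular intervals."""
        if self.steps % self.checkpoint_every == 0:
            self.checkpoints[self.steps] = (self.pc, self.acc)
        op, arg = self.program[self.pc]
        if op == "add":
            self.acc += arg
        elif op == "mul":
            self.acc *= arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
        self.pc = (self.pc + 1) % len(self.program)
        self.steps += 1

    def run_to(self, target_step):
        """Run forward until the given step count is reached."""
        while self.steps < target_step:
            self.step()

    def reverse_to(self, target_step):
        """Rewind: restore the nearest earlier checkpoint, then replay forward."""
        if not 0 <= target_step <= self.steps:
            raise ValueError("can only reverse to a step already executed")
        base = max(s for s in self.checkpoints if s <= target_step)
        self.pc, self.acc = self.checkpoints[base]
        self.steps = base
        self.run_to(target_step)


if __name__ == "__main__":
    m = ToyMachine(program=[("add", 3), ("mul", 2), ("add", -1)])
    m.run_to(10)
    print(m.steps, m.acc)   # state after the "crash"
    m.reverse_to(6)
    print(m.steps, m.acc)   # state just before it, reached by replay
    m.run_to(10)
    print(m.steps, m.acc)   # same result again, as many times as we like
```

The sketch only works because the toy is fully deterministic: replaying from a checkpoint always reproduces the same states. A real system also has to record non-deterministic inputs (interrupts, network packets, timing) so that replay sees exactly what the original run saw.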

I believe my excited comments from 2005 still stand, when I wrote on the GDB developers list:

Reversibility is the biggest advance in debugging since source code debugging

The world did not agree with me. Wind River no longer advertises Simics' reverse execution feature so I presume it has been dropped, and VMware Player dropped their reversibility in 2011. My former Simics colleague Jakob Engblom maintains the Comprehensive Reversibility List, and it has very few entries for the whole period since the dawn of commercial source code debugging.

Exciting update in 2022: I missed the fact that Intel made a binary-only public release of Simics in May 2021. There is still no mention of reversibility, but I look forward to trying it. How on earth do people who perform advanced and vital work such as KASPER manage without reverse execution? The wealthy toolmaking companies of the world know reversibility of arbitrary electronics devices can be done at speed and scale, down to microcode resolution where necessary. Intel, I'm looking at you first.

But maybe the world is changing. In the same way that Privacy succeeded where Security had failed to make the pitch to decisionmakers, perhaps now arguments based on CyberSecurity will succeed where the argument about Complexity just isn't working.

I find hope in 2022 due to these well-maintained open source solutions:

  • Eclipse has support for driving reversible targets, including full-system targets via the GDB MI interface
  • There are several reversible user mode targets, notably rr
  • DMTCP is a multi-thread user mode trace/replay solution
  • GDB does not need to be driven: it has a well-tested implementation of Record/Replay and Reversible Debugging. GDB can drive a reversible target itself.

Besides these, closed-source Visual Studio has a feature called Time Travel, Simulics is a commercial emulator, and UndoDB is a user-space record/replay solution. Perhaps one of these will spark a revolution. If you're a student, try them out!

I have an ongoing research project to understand what is needed to encourage developers to use these tools for more deterministic debugging and system testing. I put a lot of effort into this topic. Even if the world never uses reversibility at scale, we still need better solutions for debugging complex stacks.