Reversible Computers

From Dan Shearer CV
 
 
Latest revision as of 18:08, 25 October 2024

Reversible Computing and Reversible Debugging are amazing and useful applications of Time Shifting via virtualisation, aiming at the massive problem of software unreliability. I believe my excited comments from 2005 still stand, when I wrote on the GDB developers list:

Reversibility is the biggest advance in debugging since source code debugging

In 2024, reversibility has come both a long way and not far at all. I am still very interested in it though.

What is reversibility?

It is possible to have a network of running computers - say, Android, Windows, Linux running on miscellaneous hardware - and then to stop them all and reverse back to any point in time. For example, reversing to a point just before a catastrophic error occurred, so we can watch carefully. And repeat it if we want to, again and again, any number of times. Imagine installing an operating system you know nothing about, starting an application... and then running the process in reverse as it unboots, scrolling up the screen until it switches off. This is true reversibility.

Unreliable software is increasing, and it affects whole societies. A large part of the problem is due to massive complexity in software, and I have seen reversibility reduce the problem by making debugging much easier to do.

In 2024 the plain facts are:

  1. Full simulation reversibility at high speed is possible - including even running an operating system backwards, even unbooting. This is well-understood computer science.
  2. The ability to rewind and replay is very powerful for debugging, especially for complex stacks, and rare or non-deterministic bugs.
  3. Very few developers seem interested in these features! Until recently they appeared to be commercially unviable, despite offering amazing visibility into very complicated problems. And despite some implementations being very easy to use.
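One common way to get the rewind described in points 1 and 2 is to take periodic snapshots of full state and, to go backwards, restore the nearest earlier snapshot and re-execute forwards to the wanted point. The sketch below is only a toy illustration of that idea, assuming a fully deterministic machine; `ToyMachine`, `ReversibleRunner` and `snapshot_interval` are hypothetical names invented for this example and are not taken from Simics, GDB or any other tool mentioned here.

```python
import copy


class ToyMachine:
    """Hypothetical deterministic machine: the next state depends only on the current state."""

    def __init__(self):
        self.state = {"pc": 0, "acc": 0}

    def step(self):
        # Deterministic transition, so re-running from a snapshot reproduces history exactly.
        self.state["acc"] += self.state["pc"] * 2 + 1
        self.state["pc"] += 1


class ReversibleRunner:
    """Runs a machine forwards, keeping periodic snapshots so it can also go backwards."""

    def __init__(self, machine, snapshot_interval=4):
        self.machine = machine
        self.interval = snapshot_interval
        self.steps = 0
        # Always keep a snapshot of the initial state (step 0).
        self.snapshots = {0: copy.deepcopy(machine.state)}

    def forward(self, n=1):
        for _ in range(n):
            self.machine.step()
            self.steps += 1
            if self.steps % self.interval == 0:
                self.snapshots[self.steps] = copy.deepcopy(self.machine.state)

    def goto(self, target):
        """Move to an absolute step number, either backwards or forwards."""
        if target < 0:
            raise ValueError("cannot go before step 0")
        if target >= self.steps:
            self.forward(target - self.steps)
            return
        # Restore the nearest snapshot at or before the target, then re-execute.
        base = max(s for s in self.snapshots if s <= target)
        self.machine.state = copy.deepcopy(self.snapshots[base])
        self.steps = base
        self.forward(target - base)

    def reverse(self, n=1):
        self.goto(self.steps - n)


machine = ToyMachine()
runner = ReversibleRunner(machine, snapshot_interval=4)
runner.forward(10)   # run to step 10
runner.reverse(3)    # "rewind" to step 7 via the snapshot taken at step 4
```

The snapshot interval is the usual trade-off: frequent snapshots cost memory but make each reverse step cheap, while sparse snapshots save memory but require more re-execution.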

Evidently, the world did not agree with me: Wind River no longer advertises Simics' reverse execution feature and I'm sure it has been dropped. VMware Player dropped their reversibility in 2011. Jakob Engblom's Comprehensive Reversibility List is maintained by my former Simics colleague, and it has very few entries since the dawn of commercial source code debugging.

But why?

How on earth do people who perform extremely difficult and vital work such as KASPER manage without reverse execution? The wealthy toolmaking companies of the world know reversibility of arbitrary electronic devices can be done at speed and scale, down to microcode resolution where necessary. Intel, I'm looking at you first because you killed Simics Hindsight.

Hope in 2024

Maybe the world is changing. In the same way that Privacy succeeded where Security had failed to make the pitch to decision-makers, perhaps now arguments based on CyberSecurity or, even more likely, AI will succeed where the argument about Complexity just isn't working.

I find some hope in 2024 due to these well-maintained open source solutions:

  • Eclipse has support for driving reversible targets, including full-system targets via the GDB MI interface
  • There are several reversible user mode targets, notably rr
  • DMTCP is a multi-thread user mode trace/replay solution
  • GDB does not need to be driven: it has a well-tested implementation of Record/Replay and Reversible Debugging. GDB can drive a reversible target itself.

Besides these, closed-source Visual Studio has a feature called Time Travel, Simulics is a commercial emulator, and UndoDB is a user-space record/replay solution. Perhaps one of these will spark a revolution. If you're a student, try them out!

AI and Reversibility

The combination of Boltzmann machine-style AI and reversibility is entirely new as far as I know, and not expected by anyone I have met. It's still early days, but here is the story as I understand it...

The risks of AI are regarded as difficult to estimate, but reversibility may be a way to address this. A 2022 mathematical paper on Reversing an Imperative Concurrent Programming Language from the University of Leicester demonstrates this difficult problem is solvable. The same authors wrote Reversible Execution for Robustness in Embodied AI and Industrial Robots. This paper says:

We thus demonstrate how a traditional AI-based planning approach is enriched by an underlying reversible execution model that relies on the embodiment of the robot system

Where next?

I have an ongoing research project to understand what is needed to encourage developers to use these tools for more deterministic debugging and system testing. I have put a lot of effort into this topic because I truly believe it is a partial answer to difficult problems. Even if the world never uses reversibility at scale, we still need better solutions for debugging complex stacks.

If I were starting to build practical reversibility for production use in 2024, I would probably start prototyping with the QEMU Replay System. This is accurate, if a bit clunky and slow, and it allows different techniques to be tried rapidly.
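For readers new to record/replay, the general principle can be sketched in a few lines: log every non-deterministic input during a recording run, then feed the same log back during replay so the run repeats exactly. This is a toy illustration of that principle only, not a description of how QEMU implements it; `Recorder`, `Replayer` and `program` are hypothetical names made up for the example.

```python
import random


class Recorder:
    """Passes inputs through from a non-deterministic source and logs each one."""

    def __init__(self, source):
        self.source = source
        self.log = []

    def next_input(self):
        value = self.source()
        self.log.append(value)
        return value


class Replayer:
    """Serves inputs from a previously recorded log instead of the real source."""

    def __init__(self, log):
        self._inputs = iter(log)

    def next_input(self):
        return next(self._inputs)


def program(env, steps=5):
    """Hypothetical workload whose result depends on external inputs."""
    total = 0
    trace = []
    for _ in range(steps):
        total = (total * 31 + env.next_input()) % 1000
        trace.append(total)
    return trace


# Recording run: inputs come from a random source and are logged.
recorder = Recorder(lambda: random.randint(0, 99))
original = program(recorder)

# Replay run: the same inputs are served from the log, so the trace is identical.
replayed = program(Replayer(recorder.log))
assert original == replayed
```

Once a run can be replayed exactly, it can be combined with snapshots to move backwards as well as forwards, which is why record/replay is a natural starting point for prototyping.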

And then, there is theoretical progress. Back in the day, this was all about engineering, but now there is some proper thought going into a theory of reversibility.

The annual Conference on Reversible Computation, most recently held in 2022 and 2023, demonstrates a much more general view. This conference does include practical reversible computing as I describe above, but also explores the mathematics of reversibility, including in the context of quantum computing. I can't even begin to guess how quantum reversibility works at high resolution but apparently it is regarded as feasible.

Perhaps the way reversibility will come about is by applying a mathematical rather than a technological approach first. If we can prove reversibility for very complicated cases such as machine learning/AI (and that is the level of complexity addressed by the paper from the University of Leicester I listed above) then that could be what the world wants. Perhaps then it will somehow be obvious that the same techniques should be applied to software development.