Fossil

From Dan Shearer CV

Revision as of 17:13, 8 November 2021

The Fossil source code management system is the only realistic alternative to Git, and has had 15 years of development and testing. I was forced to consider alternatives to Git and Github because the LumoSQL and Sweet Lies projects focus on security. Work needed to be done on Fossil because it was not then a commodity, off-the-shelf SCM, and that is what my projects needed.

I invested significantly in Fossil:

  • After being accepted as a code contributor, I have made 31 commits to the Fossil tree so far
  • I have made over 150 forum postings
  • I became a temporary packaging intermediary with the main distributions. This has been successful: recent operating systems all carry recent versions of Fossil, and this now appears to be self-sustaining. There was a lot of private community interaction to make this happen.
  • I played a small part in helping Stephan Beal's libfossil roar back into life. This is important because a library means the world can have multiple front-ends to the official Fossil app. It also enables automated toolkits, hosted source code repositories to replace Github, and other things that are not part of Fossil's design goals. I don't want my projects locked into Fossil any more than into Github, although I am perfectly happy with Fossil for now.
  • I completed a privacy review of Fossil, and debated it in public. Some of that involved discussion of privacy arcana.
  • I have had discussions with cybersecurity researchers in two universities about the EU Privacy Shield issue facing Github as a US cloud, and what best to replace Github with in Europe. This discussion is ongoing.

It may sound strange to some to use anything other than Git. For the record, here are some of my reasons:

  1. Fossil is an append-only, non-repudiable Merkle tree with strong cryptographic guarantees. Git is nothing of the sort.
  2. Git focusses on ancestors, not descendants. It is possible to find descendant commits by parsing git logs, but Git does not help you do this even though it is a very important feature for checking security issues (remember SolarWinds?!). LumoSQL is itself a modification of a massively forked-and-vendored codebase, and since LumoSQL combines this with many other codebases, locating descendants is important. An SCM is for managing a Directed Acyclic Graph (DAG) of checkins, and a DAG can be traversed in any direction. So this counts against Git, for my purposes.
  3. Git provides widely-used features to change history. One of the perceived benefits of using Git is that it encourages pull requests from people who have previously cloned your tree. It does not seem to be good design to accept cleaned-up trees from committers, because a tree has a reason for being the way it is. Git is like Toad of Toad Hall in the sense that it is about what we wish had happened in an ideal world. A security project would like to know what really happened. And should we ban our own tree users from using these tools too? It isn't the right mindset.
  4. All Gits Lead to Github. Github is currently the best index of open projects and code. If we run our own Git server we will still mirror to Github (or perhaps GitLab). And then that is where the issues will be raised, the pull requests made, and the authentication namespace kept, because while we are compatible with the Git protocol we are realistically not going to try to duplicate all the rest of the infrastructure. And Github has many problems. Not least that it is extraordinarily inaccessible to people with quite common eyesight problems, and I have spent a lot of time asking Github, in vain, to fix these problems.
  5. Git was not intended for ordinary projects. Git provides an excellent solution for the Linux kernel, and, after some scaling-up, the even more enormous internal Microsoft code repository for all their products. But these are dealing with tens and hundreds of millions of lines of code. Only a small handful of projects are anything like that big. For comparison, the major open source databases are around two million SLOC each (and SQLite is a slim 250 thousand SLOC). Git is not the right tool for this job.
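
Reasons 1 and 2 above can be illustrated with a toy model. The following is a minimal Python sketch, not Fossil's actual artifact format or algorithm: the names ToyRepo, checkin_id and descendants are hypothetical, and SHA-256 is an arbitrary choice for the illustration. Each checkin id is a hash over its content and its parent ids (the Merkle property), checkins are only ever added, and a reverse index makes "which checkins descend from this one?" a direct graph walk rather than a log-parsing exercise.

```python
import hashlib
from collections import defaultdict, deque


def checkin_id(parents, content):
    """Hypothetical id scheme: hash the parent ids together with the content.

    Because each id covers its parents' ids, altering any ancestor
    would change the id of every checkin that descends from it.
    """
    h = hashlib.sha256()
    for p in sorted(parents):
        h.update(p.encode())
    h.update(content.encode())
    return h.hexdigest()


class ToyRepo:
    """Append-only DAG of checkins (illustration only, not Fossil's storage)."""

    def __init__(self):
        self.parents = {}                  # checkin id -> tuple of parent ids
        self.children = defaultdict(set)   # reverse index: parent id -> child ids

    def commit(self, content, parents=()):
        for p in parents:
            if p not in self.parents:
                raise KeyError("unknown parent: " + p)
        cid = checkin_id(parents, content)
        # Existing entries are never overwritten or removed.
        self.parents.setdefault(cid, tuple(parents))
        for p in parents:
            self.children[p].add(cid)
        return cid

    def descendants(self, cid):
        """Breadth-first walk along child edges, the reverse of ancestry."""
        seen = set()
        queue = deque([cid])
        while queue:
            node = queue.popleft()
            for child in self.children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


# Usage: find everything that inherited a checkin known to be bad.
repo = ToyRepo()
root = repo.commit("initial import")
vuln = repo.commit("introduce bug", [root])
fix = repo.commit("fix on trunk", [vuln])
fork = repo.commit("vendored fork", [vuln])
other = repo.commit("unrelated branch", [root])
merge = repo.commit("merge fork into trunk", [fix, fork])

affected = repo.descendants(vuln)
```

In this sketch, affected contains fix, fork and merge but not other, which is the kind of question a security review asks after a compromised checkin is discovered. The reverse index costs one extra set entry per parent link, a small price for being able to walk the DAG in either direction.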