Difference between revisions of "Fossil"

From Dan Shearer CV

Revision as of 01:23, 19 November 2021

The Fossil source code management system is the only realistic alternative to Git, and has had 15 years of development and testing. I was forced to consider alternatives to Git and Github because the LumoSQL and Sweet Lies projects focus on security. Work needed to be done on Fossil because it was not then a commodity, off-the-shelf SCM, and that is what my projects needed.

I invested significantly in Fossil:

  • After being accepted as a code contributor, I have made 31 commits to the Fossil tree so far
  • I have made over 150 forum postings
  • I became a temporary packaging intermediary with the main distributions. This has been successful... recent operating systems all carry recent versions of Fossil, and this now appears to be self-sustaining. There was a lot of private community interaction to make this happen.
  • I somewhat assisted Stephan Beal's libfossil to roar back into life as a second, completely independent implementation of the Fossil data model. Multiple implementations are really important (something Git lacks), and being a library means the world can have multiple front-end alternatives to the official Fossil app. libfossil also enables automated toolkits, hosted source code repositories to replace Github, and other things that are not part of Fossil's design goals. I don't want my projects locked into Fossil any more than Github, although I am perfectly happy with Fossil for now.
  • I completed a privacy review of Fossil, and debated my proposal in public. Some of that involved discussion of privacy arcana.
  • I have had discussions with cybersecurity researchers in two universities about the EU Privacy Shield issue facing Github as a US cloud, and what best to replace Github with in Europe. This discussion is ongoing.

Not only is Fossil a better SCM for my projects LumoSQL and Sweet Lies, but it is also a very demanding test case for LumoSQL. Fossil is built on SQLite; in fact, Fossil and SQLite are symbiotic projects, and Fossil is the one SQLite application all SQLite developers are guaranteed to use. If Fossil can run on LumoSQL without a problem, and potentially even with some advantages, then LumoSQL will have passed a major milestone.

It may sound strange to some to use anything other than Git. The differences are thoroughly discussed in the Fossil versus Git comparison document, which I have contributed to. For the record, though, here are some of my specific reasons:

  1. Fossil is an append-only, non-repudiable Merkle tree with strong cryptographic guarantees. Many people imagine that Git also fits this description, but it is nothing of the sort.
  2. Git focusses on ancestors, not descendants. It is possible to find descendant commits by parsing Git logs, but Git does not help you do this even though it is a very important feature for checking security issues. LumoSQL is itself a modification of a massively forked-and-vendored codebase, and since LumoSQL combines this with many other codebases, locating descendants is important. An SCM is for managing a Directed Acyclic Graph (DAG) of checkins, and a DAG can be traversed in any direction. So this counts against Git.
  3. Git provides widely-used features to change history. One of the perceived benefits to using Git is that it encourages pull requests from people who have previously cloned your tree. It does not seem to be good design to accept clean trees from committers, because that tree has a reason for being the way it is. Git is like Toad of Toad Hall in the sense that it is about what we wished had happened in an ideal world. A security project would like to know what really happened, not what we wanted to have happened. It isn't the right mindset.
  4. All Gits Lead to Github. Github is currently the best index of open projects and code. If we run our own Git server we will still mirror to Github (or perhaps GitLab). That is then where issues will be raised and pull requests made, and where the authentication namespace will live, because while we are compatible with the Git protocol we are realistically not going to try to duplicate all the rest of the infrastructure. And Github has many problems, not least that it is extraordinarily inaccessible to people with quite common eyesight problems; I have spent a lot of time asking Github, in vain, to fix these problems.
  5. Git was not intended for ordinary projects. Git provides an excellent solution for the Linux kernel, and, after some scaling-up, the even more enormous internal Microsoft code repository for all their products. But these contain tens and hundreds of millions of lines of code. Only a small handful of projects are anything like that big... for comparison, the major open source databases are around two million SLOC each (and SQLite is a slim 250 thousand SLOC). Git is not sized correctly for most projects, nor focussed on providing more certainty for developers of ordinary-sized projects with fewer than a million or so lines of code.
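
To illustrate the point in reason 2 that a DAG of checkins can be traversed in either direction, here is a minimal sketch using Python's standard sqlite3 module. It is only an illustration: the table name `link`, its `parent` and `child` columns, and the single-letter checkin names are hypothetical and are not Fossil's actual schema. The idea is that once parent/child edges are stored as rows in an SQLite database, finding every descendant of a checkin (for example, everything that inherited a vulnerable change) is the same kind of query as finding its ancestors.

```python
import sqlite3


def build_example_graph():
    """Create an in-memory database holding a tiny checkin DAG.

    The `link` table and its columns are hypothetical, chosen for this sketch.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE link (parent TEXT NOT NULL, child TEXT NOT NULL)")
    edges = [
        ("a", "b"),
        ("b", "c"),
        ("b", "d"),  # b forks into c and d
        ("c", "e"),
        ("d", "e"),  # c and d merge into e
        ("a", "f"),  # an unrelated branch from a
    ]
    db.executemany("INSERT INTO link (parent, child) VALUES (?, ?)", edges)
    return db


def descendants(db, checkin):
    """Return every checkin reachable by following parent -> child edges."""
    query = """
        WITH RECURSIVE reach(id) AS (
            SELECT child FROM link WHERE parent = ?
            UNION
            SELECT link.child FROM link JOIN reach ON link.parent = reach.id
        )
        SELECT id FROM reach ORDER BY id
    """
    return [row[0] for row in db.execute(query, (checkin,))]


def ancestors(db, checkin):
    """Return every checkin reachable by following child -> parent edges."""
    query = """
        WITH RECURSIVE reach(id) AS (
            SELECT parent FROM link WHERE child = ?
            UNION
            SELECT link.parent FROM link JOIN reach ON link.child = reach.id
        )
        SELECT id FROM reach ORDER BY id
    """
    return [row[0] for row in db.execute(query, (checkin,))]


if __name__ == "__main__":
    db = build_example_graph()
    # Which checkins inherited whatever was introduced in "b"?
    print("descendants of b:", descendants(db, "b"))
    # Where did "e" come from?
    print("ancestors of e:", ancestors(db, "e"))
```

The two functions differ only in which column they start from and which they follow, which is the sense in which neither direction is privileged once the graph is stored relationally. `UNION` (rather than `UNION ALL`) removes duplicates, so a checkin reached through both sides of a merge is reported once.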