Difference between revisions of "Fossil"

From Dan Shearer CV

Revision as of 17:11, 18 January 2022

The Fossil source code management system is the only realistic alternative to Git, and has had 15 years of development and testing. I now use Fossil.

One-sentence Summary - Why Fossil?

21st century privacy and reproducibility require code to be in an append-only, non-repudiable Merkle tree with strong cryptographic guarantees, and that is what Fossil is by design.
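
To make the append-only idea concrete, here is a minimal sketch of a hash chain, the simplest form of a Merkle structure. It is an illustration only and not Fossil's actual artifact format: the function names `checkin_id` and `build_chain` are hypothetical, and SHA3-256 is chosen here purely as an example of a strong hash.

```python
import hashlib


def checkin_id(parent_id: str, content: bytes) -> str:
    """Hash the content together with the parent's ID.

    Because the parent ID is part of the input, each ID commits to
    the whole history that came before it.
    """
    h = hashlib.sha3_256()
    h.update(parent_id.encode("ascii"))
    h.update(b"\n")
    h.update(content)
    return h.hexdigest()


def build_chain(contents):
    """Return the list of IDs for a linear sequence of check-ins."""
    ids = []
    parent = ""  # the first check-in has no parent
    for content in contents:
        parent = checkin_id(parent, content)
        ids.append(parent)
    return ids


original = build_chain([b"first", b"second", b"third"])
tampered = build_chain([b"first", b"SECOND (edited)", b"third"])

# The untouched first check-in keeps its ID, but editing the second
# check-in changes its ID and the ID of everything after it.
print(original[0] == tampered[0])
print(original[1] == tampered[1])
print(original[2] == tampered[2])
```

In this sketch the third check-in has identical content in both chains, yet its ID differs, because the ID depends on its parent. That is why a rewritten history cannot pass for the original one.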

More Detail - Why Fossil?

✅ Fossil has a simple, small and written-down standard, so "people not yet born" will be able to read a Fossil repository.
✅ Fossil has two independent implementations of this written-down standard, ensuring consistency and interoperability.
The second implementation is a library, which is in turn used to create more Fossil-compatible apps.
✅ Fossil believes in preserving the code record and strongly avoids changing it. It is about as immutable as you would ever want.
✅ Fossil is made for projects of ordinary size and complexity, which means nearly 100% of all projects in the world. By my own ad-hoc measurements, this means < 8 million lines of code, < 800 developers and < 8 thousand checkin events per year since approximately the year 1990.

This article is about Fossil, but a reasonable person will ask in 2022 "Why not Git?".

Briefly:

❌ The only Git standard is the code of the one and only Git implementation, readable only by a few wizards.
❌ libgit2 provides a Git API by wrapping Git, which is like providing bread by soaking toast in water.
❌ Git is made for a tiny number of the biggest projects in the world, and all other users have to accept whatever features are good for those few projects.
❌ Git thinks it is a great idea to rewrite history, but the mathematics of privacy and reproducibility disagree.

These and many more points are covered in detail and with fair balance at Fossil v Git on the Fossil SCM web site.

But GitHub is So Successful!

We've been here before. Git was a major advance on CVS, and GitHub was a major advance on SourceForge. All the projects I am part of are mirrored on GitHub, and GitHub is a good place to search for existing projects. I feel that GitHub is losing relevance just like SourceForge did. We need something that sets out to meet 21st century challenges.

I wanted to find an alternative to Git and GitHub because:

  • I could not convince GitHub to fix visual accessibility problems, and I had multiple team members with visual impairments. I spoke to several very polite managers and developers at length. It turns out that despite their billions in the bank, it will be a years-long project for GitHub to implement years-old W3C accessibility standards. That is not acceptable, and in addition it shows GitHub has become an impossibly giant pile of code used to manage an impossibly large number of copies of code... over and over again, which is against reliability and reproducibility requirements.
  • Even if you only ever use the Git command line, Git comes with a lot of pain... even the most experienced software developers wrestle with Git and its complexities. Why should a development team need to worry about losing work? Why should they use an interface so complex that parody man pages look real?
  • Git encourages merging of privately-maintained trees, or the 'bazaar' development model, which seems to me to be delaying discussion until after code is written. I wanted my projects to instead be a tighter 'cathedral-style' development community, with discussion happening as code is developed and all branches visible to everyone.

In Addition: Security and Privacy Issues More Complex Than They Seem

LumoSQL and Sweet Lies are EU-domiciled open source projects which provide their users with critical security services. We need to be completely sure that the source code is exactly as the developers wrote it, and that the source code has not been interfered with, and that the developers have not had their own personal data misused.

Here is how Git and GitHub both bring potential security exposure:

  1. Git actively encourages users to break the Merkle tree. Rather than an inviolate historical record, Git users expect to produce a curated version of their local tree (especially with the 'git rebase' command used to squash commits).
  2. It is difficult to find the descendants of check-ins in Git. It is so difficult that neither native Git nor GitHub provides this capability, and you need to write code to crawl the commit log. This makes it hard to find what descendant code may have been affected by an upstream bug or deliberate code insertion.
  3. GitHub is closed source, and since it is also strongly focussed on third-party toolchain integration, that means we cannot know how secure the toolchain is. In April 2021 there was an example of GitHub giving credentials to a compromised toolchain partner.
  4. GitHub is a US-controlled company. The US has a history of actively working to insert vulnerabilities into encryption systems and believing that their fantasy NOBUS (Nobody But Us) policy can work. My projects are critical security systems, so this is not a risk I can accept. GitHub could be instructed not to inform me of any attack against my projects.
  5. GitHub has US Cloud issues, which strictly means it should not be used by EU developers, among others. While this is legitimately serious, and there are privacy issues uniquely associated with source code commits, this is common to all US cloud companies.
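
The "crawl the commit log" problem in point 2 can be sketched as follows. A check-in records only its parents, so finding descendants means first inverting those links and then walking forward. This is an illustration under stated assumptions, not a tool for any particular SCM: the `descendants` function and the hand-written `history` dictionary are hypothetical.

```python
from collections import defaultdict, deque


def descendants(parents_of, start):
    """Return every check-in that has `start` somewhere in its ancestry.

    `parents_of` maps each check-in ID to the list of its parent IDs,
    which is the only direction the history itself records.
    """
    # Invert the parent links into child links.
    children_of = defaultdict(list)
    for child, parents in parents_of.items():
        for parent in parents:
            children_of[parent].append(child)

    # Breadth-first walk forward from the starting check-in.
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children_of[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical history: "e" is a merge of "c" and "d"; "f" branches from "a".
history = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "e": ["c", "d"],
    "f": ["a"],
}

# If a bug or malicious change landed in "b", these check-ins inherit it.
affected = descendants(history, "b")
print(sorted(affected))
```

Note that the whole history has to be read before the first descendant can be reported, which is why this question is expensive to answer when only parent links are stored.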

Fossil may have many security issues too, but it does not have the entirely avoidable ones listed above.

Work Done on Fossil

Before I could use Fossil, I needed some changes:

  • Fossil was not then a commodity, off-the-shelf SCM, and I needed users to be able to just get it easily for their favourite operating system.
  • Fossil only had one implementation. That made me uncomfortable with Git too. Why would we have a vital standardised data format with only one set of very complex tools that can read it?

So I invested significantly in Fossil, and these problems were fixed:

  • I became a temporary packaging intermediary with the main distributions. This has been successful... recent operating systems all carry recent versions of Fossil, and this now appears to be self-sustaining. There was a lot of private community interaction to make this happen.
  • I somewhat assisted Stephan Beal's libfossil to roar back into life as a second, completely independent implementation of the Fossil data model. Multiple implementations are really important and being a library means the world can have multiple front-end alternatives to the official Fossil app. I don't want my projects locked into Fossil any more than GitHub, although I am perfectly happy with Fossil for now. libfossil is great insurance.
  • I completed a privacy review of Fossil, and debated my proposal in public. Some of that involved discussion of privacy arcana.
  • After being accepted as a code contributor, I have made 31 commits to the Fossil tree so far.
  • I have made over 150 forum postings.

Fossil as a LumoSQL Test Case

Not only is Fossil a better SCM for the needs of my projects, but it is also a very demanding test case for LumoSQL. Fossil is built on SQLite. In fact, Fossil and SQLite are symbiotic projects, and Fossil is the one SQLite application all SQLite developers are guaranteed to use. If Fossil can run on LumoSQL without a problem, and potentially even with some advantages, then it will have passed a major milestone.

Not GitLab Either

This was a lesser consideration, because once Git and GitHub were ruled out that also ruled out GitLab. But it is worth recording that GitLab has a different version of the same kinds of issues as GitHub:

  • GitLab is proprietary closed source (the core of the product is open source, but that excludes many vital features. You cannot host your own fully-functional GitLab instance.)
  • GitLab integrates with many of the same third party toolchain services as GitHub, and has been affected by similar security problems as GitHub.
  • GitLab is a Ukrainian company, and since Ukraine has no established privacy relationship with the EU, the 2020 Data Transfer Recommendations apply. That is a lot of work to do with many uncertainties, but nevertheless is the minimum requirement to meet EU privacy standards.
  • GitLab is an enormous global company (although less enormous than GitHub) trading on the US stock exchange and worth billions. It is therefore also in part subject to US law, which brings up some Privacy Shield issues.
  • GitLab does try to address common Git practices with their Git Flow process. This tries to get closer to the default Fossil way of doing things, but adds a lot of overhead to do so.