Fossil

From Dan Shearer CV
 
Latest revision as of 00:16, 25 October 2024

The Fossil source code management system is the only realistic alternative to Git, and has had 15 years of development and testing. After helping to make some changes to Fossil, I now use it for many projects. I also use Git on various software forges, and Mercurial if I need to work with code from the Mozilla project.

One-sentence Summary - Why Fossil?

21st century privacy and reproducibility require code to be in an append-only, non-repudiable Merkle tree with strong cryptographic guarantees, and that is what Fossil is by design.
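The phrase "append-only, non-repudiable Merkle tree" can be made concrete with a small sketch. This is not Fossil's (or Git's) real artifact format: the `Commit` class, the `add` and `verify` helpers and the SHA-256-over-content-and-parents ID scheme are hypothetical, chosen only to show the property that matters. Every ID commits to the whole history behind it, so an ancestor that is edited in place is detected as soon as the IDs are recomputed.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Commit:
    """Hypothetical check-in: some content plus the IDs of its parents."""
    content: str
    parents: Tuple[str, ...]

    @property
    def id(self) -> str:
        # The ID hashes the parent IDs as well as the content, so it
        # transitively covers every ancestor back to the root.
        h = hashlib.sha256()
        for parent in self.parents:
            h.update(b"parent " + parent.encode() + b"\n")
        h.update(b"\n" + self.content.encode())
        return h.hexdigest()


def add(store: Dict[str, Commit], content: str, *parents: str) -> str:
    """Append a new check-in; existing entries are never modified."""
    commit = Commit(content, tuple(parents))
    store[commit.id] = commit
    return commit.id


def verify(store: Dict[str, Commit], head: str) -> bool:
    """Recompute every ID reachable from head; any mismatch means tampering."""
    stack, seen = [head], set()
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        commit = store.get(cid)
        if commit is None or commit.id != cid:
            return False
        stack.extend(commit.parents)
    return True


store: Dict[str, Commit] = {}
root = add(store, "initial import")
fix = add(store, "fix bug", root)
head = add(store, "add feature", fix)
print(verify(store, head))  # True

# Quietly rewriting an ancestor in place is detected.
store[root] = Commit("initial import (edited)", ())
print(verify(store, head))  # False
```

The only thing a reader needs to trust is the hash function; the structure itself makes silent edits to the past visible.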

More Detail - Why Fossil?

✅ Fossil has a simple, small and written-down standard, so "people not yet born" will be able to read a Fossil repository. Fossil repositories are designed to last for at least 100 years.
✅ Fossil has two independent implementations of this written-down standard, ensuring consistency and interoperability. This is a very significant achievement, and important for long-term reliability.
✅ The second implementation is a library (libfossil), which is in turn used to create more Fossil-compatible apps. If I want to solve a source management problem Fossil does not address, the Fossil library gives me a big head start.
✅ Fossil believes in immutability and strongly avoids changing the code record; a Fossil repository is about as immutable as it reasonably can be.
✅ Fossil is designed for projects of ordinary size and complexity, which means nearly 100% of all projects in the world. By my own ad-hoc measurements, this means < 8 million lines of code, < 800 developers and < 8 thousand check-in events per year since approximately the year 1990. The unscientific measure I settled on was to import the Git repository for the GNU Compiler Collection, which is 7 million lines of code and quite usable with Fossil.
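A small, written-down format is easy to work with. The sketch below builds and checks a manifest loosely modelled on the card layout described in Fossil's file-format documentation: one card per line, a leading card letter, cards in sorted order, and a final Z card holding an MD5 checksum of everything before it. It is an illustration, not a conforming implementation. The `fossilize`, `build_manifest` and `z_card_ok` helpers are hypothetical, and the escaping rule, the date format and the SHA3-256 file hashes are simplifying assumptions to check against the specification.

```python
import hashlib
from typing import Dict, Optional


def fossilize(text: str) -> str:
    """Escape backslash, space and newline so a value fits in one card."""
    return text.replace("\\", "\\\\").replace(" ", "\\s").replace("\n", "\\n")


def build_manifest(comment: str, date: str, files: Dict[str, str],
                   parent: Optional[str], user: str) -> str:
    """Return manifest text; files maps file name -> artifact hash."""
    cards = [f"C {fossilize(comment)}", f"D {date}"]
    cards += [f"F {fossilize(name)} {h}" for name, h in sorted(files.items())]
    if parent:
        cards.append(f"P {parent}")
    cards.append(f"U {fossilize(user)}")
    body = "".join(card + "\n" for card in cards)
    # The Z card is a checksum over all text that precedes it.
    return body + f"Z {hashlib.md5(body.encode()).hexdigest()}\n"


def z_card_ok(manifest: str) -> bool:
    """Recompute the checksum and compare it with the Z card."""
    idx = manifest.rfind("\nZ ")
    if idx < 0:
        return False
    body, z_card = manifest[: idx + 1], manifest[idx + 1 :].rstrip("\n")
    return hashlib.md5(body.encode()).hexdigest() == z_card[2:]


files = {"README.md": hashlib.sha3_256(b"hello\n").hexdigest()}
manifest = build_manifest("initial check-in", "2024-10-25T00:16:00.000",
                          files, None, "dan")
print(manifest)
print(z_card_ok(manifest))                               # True
print(z_card_ok(manifest.replace("initial", "edited")))  # False
```

Everything here is plain text and standard hashes, which is the point: a reader with no Fossil code at all could write a checker from the document.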

This article is about Fossil, but a reasonable person will ask in 2024 "Why not Git?".

Briefly:

❌ The Git standard is the source code of Git, code that only a few programming wizards know how to read. Therefore there is no real Git standard, and the storage format is inaccessible to nearly everyone. That is a big weakness in the world's infrastructure. (There is some discussion in the documentation about the on-disk format, which, again, is not a standard.)
❌ Git was designed and is now maintained for a tiny number of the biggest projects in the world, and all other users have to accept whatever features are good for those few projects. The Linux kernel source tree is enormous, and the Microsoft internal source code repository is much larger again - but nearly all software projects are hundreds of times smaller than these two elephants. Even, say, the very large Postgres database project with 2 million lines of code is twenty times smaller than the Linux kernel. Fossil works great with the Postgres source tree.
❌ Git makes it easy to rewrite history by default, and that is very appealing to human psychology: "I will just squash my last twenty changes into a single change before committing where my whole team can see it." Unfortunately, that decision also means the Merkle tree is not complete. This and other privacy and reproducibility matters are covered in more detail further down this article.
❌ (downgraded from ✅) libgit2 is an independent implementation of Git, but it trails well behind Git itself and has lost development momentum as a result. See the topic Libification Goals and Progress at the 2023 Git conference, where the notes make it clear that GitHub, GitLab and Microsoft have all dropped libgit2. This point is still only a red 'X' rather than something worse, because libgit2 is a dependency-free C library, which is the correct design, even though a complete implementation does not seem to be possible.
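The squash example in the list above can be sketched in a few lines. The `commit_id` and `build_chain` helpers below are hypothetical and use a simplified hash-of-parent-plus-content ID scheme rather than Git's real object format. The point is only that once three check-ins are squashed into one, none of the three original IDs appears in the rewritten history, so anything that referred to them (a bug report, a build log) no longer points into the record.

```python
import hashlib
from typing import List


def commit_id(content: str, parent: str) -> str:
    """Hypothetical ID: a hash over the parent ID and this commit's content."""
    return hashlib.sha256(f"{parent}\n{content}".encode()).hexdigest()


def build_chain(base: str, changes: List[str]) -> List[str]:
    """Return the IDs of a linear chain of commits on top of base."""
    ids, parent = [], base
    for change in changes:
        parent = commit_id(change, parent)
        ids.append(parent)
    return ids


base = commit_id("initial import", "")
work = ["step 1", "step 2", "step 3"]

original = build_chain(base, work)               # history as it happened
squashed = build_chain(base, ["\n".join(work)])  # the same work, squashed

# Every ID from the original history is missing from the squashed one.
lost = set(original) - set(squashed)
print(f"{len(lost)} of {len(original)} original IDs are gone")
```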
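Not a Comparison!
This article is about why I chose Fossil, rather than comparing it with Git. For a very detailed and balanced comparison, see Fossil v Git on the Fossil SCM web site.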

But GitHub is So Successful!

Git is not GitHub. But yes, GitHub is successful, and historically played a big role in the movement to make code visible in the first decade of the 21st century. And 'Git' is in the name 'GitHub', although the company seems to focus mostly on the 'Hub' part of their name these days.

We've been here before. GitHub is a software forge which builds on Git, and Git was a major advance on Subversion, which was the replacement for CVS. SourceForge was the first software forge; it was originally based on CVS and then Subversion, but it was outclassed in features by GitHub and is now just not a place where people choose to put their new projects. Will GitHub similarly slide into obscurity over time? One thing is clear: GitHub will always be a US-based cloud company, and thus unable to offer various privacy guarantees that EU-based clouds can. GitHub is also a closed-source cloud offering a restricted level of service for free, which means it cannot compete with open source forges. GitHub is a good place to search for existing projects, but new projects have options, and alternative forges (all based on Git) are growing fast.

I looked for something that sets out to meet 21st century challenges including accessibility, and there are some good candidates.

I wanted to find an alternative to Git and GitHub because:

  • I could not convince GitHub to fix visual accessibility problems, and I had multiple team members with visual impairments. I spoke to several very polite managers and developers at length. It turns out that despite their billions in the bank, it will be a years-long project for GitHub to implement years-old W3C accessibility standards. That is not acceptable, and in addition it suggests GitHub has itself become a giant slow-moving codebase.
  • Even if you only ever use a git commandline, Git comes with a lot of pain... even the most experienced software developers wrestle with Git and its complexities. Why should a development team need to worry about losing work? Why should they use an interface so complex that parody man pages look real? GitHub may be a good solution for the closed-source needs of the very largest companies, but I am one of millions of developers who have totally different needs. In 2022, after 14 years, GitHub started offering a commandline tool that can access some of its features. This is not putting developers first.
  • Git encourages merging of private trees, or the 'Benevolent Dictator development model', which seems to me to delay discussion until after code is written. Git supports this well because it is the Linux model. There are many online resources, and entire training companies, dedicated to undoing the default Git/GitHub workflow practices. I prefer my projects to be a tighter 'cathedral-style' development community instead, with discussion happening as code is developed, all branches visible to everyone, and no long-lived active branches.
  • GitHub wants to be at the centre of CI/CD, with closed-source APIs and services as part of every toolchain. This means that GitHub becomes part of the reproducibility chain, but since GitHub is not transparent, these toolchains are not reproducible.
  • There is a common class of source tree management problems where Git fails, which no SCM can currently solve but which GitHub could address. This is the problem of non-diffable trees, some of which is addressed by the Not Forking project. GitHub is the biggest source tree management company in the world, but it does not appear to have thought about this problem, or if it has, it does not even give me the tools to explore solutions for myself.

There are two open source, EU-hosted alternatives to GitHub that feel to me like they could have a long and happy future: SourceHut and Codeberg. SourceHut can work with non-Git DVCSs, as proved by its Mercurial support, while Codeberg currently only works with Git.

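Other people discuss moving away from GitHub:

  • Ditching GitHub - 2024
  • Give up GitHub; the time has come - 2022, Software Freedom Conservancy
  • Why I won't use Github for any new projects - 2020
  • Why I picked Forgejo - 2024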

In Addition: Security and Privacy Issues More Complex Than They Seem

There are many well-established security projects on GitHub, but that does not mean GitHub is safe, only that these projects have ways (such as a lot of funding) to minimise the risks. The campaign Give Up GitHub presents a comprehensive view of why open source software developers should move elsewhere. From my personal point of view, the LumoSQL and Sweet Lies projects are two small examples of open source security projects which must be completely sure that the source code is exactly as the developers wrote it and has not been interfered with, and that the developers have not had their own personal data misused.

Following are ways that Git and GitHub would potentially cause security problems in my projects:

  1. Git actively encourages users to break the Merkle tree. Rather than an inviolate historical record, Git users expect to produce a curated version of their local tree (especially with the 'git rebase' command used to squash commits). This YCombinator thread discusses the pros and cons of squashing commits.
  2. It is difficult to find the descendants of check-ins in Git. It is so difficult that neither native Git nor GitHub provides this capability, and you need to write code to crawl the commit log. This makes it hard to find what descendant code may have been affected by an upstream bug or deliberate code insertion.
  3. GitHub is closed source, and since it is also strongly focussed on third-party toolchain integration, we cannot know how secure the toolchain is. In April 2021 there was an example of GitHub giving credentials to a compromised toolchain partner. In 2023, credentials were leaked into GitHub Actions logs through the Azure CLI.
  4. GitHub is a US-controlled company. The US has a history of actively working to insert vulnerabilities into encryption systems, and of believing that its fantasy NOBUS (Nobody But Us) policy can work. My projects are critical security systems, so this is not a risk I can accept. GitHub could be instructed by the US government not to inform me of any attack against my projects. Plenty of other countries have unpleasant laws on these topics of course, but the US is the one relevant to GitHub.
  5. GitHub has US Cloud issues, which correctly means it should not be used by EU developers, etc. While this is legitimately serious, it is common to all US cloud companies.
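The log crawl mentioned in point 2 can be sketched in a few lines of Python. It assumes the input is the output of `git log --all --format='%H %P'` (one commit hash followed by its parent hashes on each line); the `parse_log` and `descendants` helpers are hypothetical. Git commit objects record only their parents, so the sketch inverts those links into child links and then walks forward breadth-first.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set


def parse_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Invert '<commit> <parent>...' lines into a parent -> children map."""
    children: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        commit, *parents = line.split()
        for parent in parents:
            children[parent].append(commit)
    return children


def descendants(children: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first walk over child links; returns every descendant of start."""
    found: Set[str] = set()
    queue = deque([start])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


# Toy history, newest first as git log prints it; e merges d and c.
log = ["e d c", "d b", "c a", "b a", "a"]
children = parse_log(log)
print(sorted(descendants(children, "b")))  # ['d', 'e']
```

Against a real repository, the lines could come from `subprocess.run(["git", "log", "--all", "--format=%H %P"], capture_output=True, text=True).stdout.splitlines()`, and the whole log has to be read before any single descendant query can be answered.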

Fossil may have many security issues too, but it does not have the entirely avoidable ones listed above.

Work Done on Fossil

Before I could use Fossil, I needed some changes:

  • Fossil was not then a commodity, off-the-shelf SCM, and I needed users to be able to just get it easily for their favourite operating system.
  • Fossil only had one implementation, which is something I dislike about Git too. A vital standardised data format should have multiple tools that can read it. As an example of this, an IETF specification cannot advance to full Internet Standard unless there are at least two independent, interoperable implementations, because that is how the standard is tested.
  • I discovered some small but significant bugs in Fossil's Git compatibility.

So I invested significantly in Fossil, and these problems were fixed:

  • I became a temporary packaging intermediary with the main distributions. This has been successful... recent operating systems all carry recent versions of Fossil, and this now appears to be self-sustaining. There was a lot of private community interaction to make this happen.
  • I assisted Stephan Beal's libfossil to roar back into life as a second, completely independent implementation of the Fossil data model. Multiple implementations are really important, and being a library means the world can have multiple front-end alternatives to the official Fossil app. I don't want my projects locked into Fossil any more than GitHub, although I am perfectly happy with Fossil for now. libfossil is great insurance.
  • I completed a privacy review of Fossil, and debated my proposal in public. Some of that involved discussion of privacy arcanae.
  • After being accepted as a code contributor, I have made commits to the Fossil tree.
  • I participate in the Fossil forum, which is an efficient and friendly group discussion.

Fossil as a LumoSQL Test Case

Not only is Fossil a better SCM for the needs of my projects, but it is also a very demanding test case for LumoSQL. Fossil is built on SQLite, in fact Fossil and SQLite are symbiotic projects, and Fossil is the one SQLite application all SQLite developers are guaranteed to use. If Fossil can run on LumoSQL without a problem, and potentially even with some advantages, then LumoSQL will have passed a major milestone.
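Part of what makes Fossil a demanding SQLite workload is that repository history lives in ordinary SQL tables, so a question like "which check-ins descend from this one?" becomes a query. The sketch below is an illustration, not Fossil's code: it uses Python's built-in `sqlite3` module and a cut-down table named after Fossil's `plink` parent/child link table, assuming only `pid` (parent) and `cid` (child) columns, and answers the question with a recursive common table expression.

```python
import sqlite3
from typing import List

# Simplified link table; the real Fossil schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plink(pid INTEGER, cid INTEGER, UNIQUE(pid, cid))")
conn.executemany(
    "INSERT INTO plink(pid, cid) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (3, 5), (4, 5)],  # check-in 5 merges 3 and 4
)

DESCENDANTS_SQL = """
WITH RECURSIVE descendant(id) AS (
    SELECT cid FROM plink WHERE pid = :start
    UNION
    SELECT plink.cid FROM plink JOIN descendant ON plink.pid = descendant.id
)
SELECT id FROM descendant ORDER BY id
"""


def descendants(conn: sqlite3.Connection, start: int) -> List[int]:
    """Return every check-in reachable from start by following child links."""
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, {"start": start})]


print(descendants(conn, 2))  # [4, 5]
```

Using `UNION` rather than `UNION ALL` discards duplicate rows, so a merge check-in reachable along two paths, such as 5 here, is reported once and the recursion cannot revisit it.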

Not GitLab Either

This was a lesser consideration, because once Git and GitHub were ruled out that also ruled out GitLab. But it is worth recording that GitLab has a different version of the same kinds of issues as GitHub:

  • GitLab is proprietary closed source wrapped around an open source core. From my experience with GitLab instances I don't believe it is possible to host your own fully-functional GitLab - for example, with full text search. Perhaps it is possible to hack the GitLab source to add functionality back in, but I have not tried. I assume that will never be possible because of the threat to the GitLab business model, and so I moved on.
  • GitLab integrates with many of the same third-party toolchain services as GitHub, and has been affected by security problems similar to GitHub's.
  • GitLab does try to address the common inefficient Git practices with its GitLab Flow process. This tries to get closer to the default Fossil way of doing things, but adds a lot of overhead to do so.
  • GitLab is a Ukrainian company, and since Ukraine has no established privacy relationship with the EU, the 2020 Data Transfer Recommendations apply. That is a lot of work with many uncertainties, but it is nevertheless the minimum requirement to meet EU privacy standards. Update: since this paragraph was written Ukraine has been invaded, but I am not aware of this currently affecting decisions to use GitLab one way or another.
  • GitLab is also a large global company (although smaller than GitHub) trading on the US stock exchange and worth billions. It is therefore also in part subject to US law, which brings up US cloud issues just like GitHub (and other US cloud companies).