Difference between revisions of "LumoSQL"

From Dan Shearer CV
 
(29 intermediate revisions by the same user not shown)
Line 1: Line 1:
  +
[https://lumosql.org LumoSQL] protects data on mobile phones using a new data storage technology which is highly compatible with most existing devices. With LumoSQL, the device owner has the ultimate right to decide who can read or change their data, and this decision continues to be enforced even after the data has been copied off the phone to (for example) a bank or insurance company for processing with their in-house database software. In contrast, the situation at present is that device owners are rarely in control of the privacy of their own data, despite many laws relating to privacy.
LumoSQL modifies the [https://sqlite.org SQLite Open Source database software] to add performance, security and privacy features. LumoSQL matters because there are many copies of SQLite on every mobile phone in the world. LumoSQL is a technical product for software developers to use.
 
   
  +
If a criminal or government officer takes a phone away from its owner, LumoSQL data cannot be read without the consent of either the phone owner or someone(s) to whom the phone owner has granted access. This is fine-grained, meaning different levels of permission can be granted.
LumoSQL is a new category of software compatible with the original SQLite. LumoSQL is currently a working prototype, with the next goal being production testing. The NLnet Foundation funded [https://lumosql.org/src/lumosql/doc/trunk/README.md Phase I] of LumoSQL to running proof of concept, and is continuing to fund [https://lumosql.org/src/lumosql/doc/trunk/doc/LumoSQL-PhaseII-Announce.md Phase II].
 
  +
  +
<blockquote><big>The detailed strategy document [https://cv.shearer.org/x/images/3/30/LumoSQLMotivation-1.0.pdf LumoSQLMotivation-1.0.pdf] explains the social, business and technical pressures that mean LumoSQL brings radical change to data storage.</big></blockquote>
  +
  +
As the strategy document says, "LumoSQL assumes software development should never be relied on and
  +
is getting worse".
  +
  +
In summary:
  +
 
'''Technically:''' LumoSQL modifies the [https://sqlite.org SQLite Open Source database software] to add essential performance, security and privacy features. LumoSQL is a technical product for software developers to use. LumoSQL is currently a working prototype, and uses Predicate Based Encryption to manipulate Lumions.
  +
  +
'''Socially and Commercially:''' LumoSQL matters because virtually all data for all apps on mobile devices is stored using SQLite, making it by far the most-used database anywhere. Yet this database is vulnerable to snoopers and does not even come close to the gold-standard privacy requirements of the EU and various non-EU countries. LumoSQL is a new category of software, highly compatible with the original SQLite and protecting our most intimate data.
  +
 
'''Practically:''' LumoSQL exists due to the volunteer effort contributed by many skilled people. The [https://www.vub.be/ Vrije Universiteit Brussel] continues to fund valuable contributions in cryptography and mathematical analysis. The [https://nlnet.nl NLnet Foundation] funded [https://lumosql.org/src/lumosql/doc/trunk/README.md Phase I] of LumoSQL, and is continuing to fund [https://lumosql.org/src/lumosql/doc/trunk/doc/LumoSQL-PhaseII-Announce.md Phase II].
   
 
__TOC__
 
__TOC__
Line 9: Line 22:
 
The following facts often surprise people:
 
The following facts often surprise people:
   
* SQLite is the most-deployed software, by a factor of at least '''four zeros'''
+
* SQLite is very likely the most-deployed software, by a factor of at least '''four zeros'''. It is likely that SQLite is the only trillion-scale software in existence, although that statement needs some validation work to be sure.
* SQLite is a full-featured database, supporting the standard SQL language despite being tiny compared to all the other mainstream SQL databases
+
* SQLite is a full-featured database, supporting the standard SQL language despite being tiny compared to all the other mainstream SQL databases.
 
* a typical mobile phone stores all non-streaming data in several hundred SQLite databases.
 
* a typical mobile phone stores all non-streaming data in several hundred SQLite databases.
 
* SQLite is also used in web browsers, operating systems, vehicles of all kinds and so on.
 
* SQLite is also used in web browsers, operating systems, vehicles of all kinds and so on.
* SQLite is open source, exceptionally well-maintained (with many contributions) by around 8 people loosely connected in a small, unambitious company. (There are many more than 8 people who contribute occasionally to SQLite.)
+
* SQLite is open source, exceptionally well-maintained, mostly by just 3 people. Many more people contribute occasionally to SQLite, and the community of deeply technical users is very large.
  +
* SQLite version 3 is [https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml a standard data format], and relied on by companies such as Airbus whose products last many decades.
  +
* SQLite is exceptionally reliable, given the policy decisions not to change certain fundamentals.
   
The corollaries are significant:
+
The corollaries to these unusual facts are significant:
   
* This is uncharted territory for Computer Science: is SQLite's ultra-conservative compatibility commitment to its hundreds of billions of installations the right choice? Is SQLite's fast-moving support of formal database standards the best way forward?
+
* This is uncharted territory for Computer Science: is SQLite's ultra-conservative compatibility commitment to its (at least) hundreds of billions of installations the right choice? Is SQLite's fast-moving support of formal database standards the best way forward? It does certainly work very well.
* Why are there so many forks of SQLite, none with more than trivial (say, a few handsfuls of millions) of deployments? The nature of the project seems to guarantee its success and also constrain its future.
+
* Of the many forks of SQLite, none seem to have more than a relatively trivial deployed base (likely at most a few handfuls of millions.) This includes seemingly very useful forks. It appears the nature of the SQLite project ensures its success and discourages replacements, but also constrains its future in important ways.
* Why are the obvious strategic problems with SQLite not discussed more widely? The most-used software in the world is incompatible with security, privacy and other requirements of the 21st century, so why isn't this a hot topic?
+
* The obvious strategic problems with SQLite are not widely discussed. But why? It is evident the most-used software in the world is incompatible with security, privacy and other requirements of the 21st century. Perhaps because SQLite works so well and is so ubiquitous that people can't imagine an alternative. It is unacceptable to store personal data unencrypted, and features such as whole-device encryption do not really address the problem. The world needs a new option, compatible with SQLite. LumoSQL wants to be that option, and more.
   
 
== LumoSQL Phase I Completed ==
 
== LumoSQL Phase I Completed ==
   
This is a technical paragraph. For even more technical detail see [https://lumosql.org/src/lumosql the code development page].
+
This is a technical section non-technical readers can skip. For even more technical detail see [https://lumosql.org/src/lumosql the code development page].
   
 
LumoSQL is a modification (not a fork) of the SQLite embedded data storage library. LumoSQL offers multiple key-value backend storage systems selectable by the user. It offers features not found in any other mainstream database:
 
LumoSQL is a modification (not a fork) of the SQLite embedded data storage library. LumoSQL offers multiple key-value backend storage systems selectable by the user. It offers features not found in any other mainstream database:
Line 36: Line 51:
   
 
In Phase II LumoSQL is implementing [https://lumosql.org/src/lumosql/doc/trunk/doc/LumoSQL-PhaseII-Announce.md at-rest encryption and privacy] using the features developed in Phase I, and readying LumoSQL for more general testing.
 
In Phase II LumoSQL is implementing [https://lumosql.org/src/lumosql/doc/trunk/doc/LumoSQL-PhaseII-Announce.md at-rest encryption and privacy] using the features developed in Phase I, and readying LumoSQL for more general testing.
  +
  +
Notable outcomes from LumoSQL already include:
  +
  +
: ✅ The only mainstream database with swappable Key-Value stores, where all stores are peers rather than one store having special knowledge that gives it technical advantages. The two stores we have concentrated on so far are (1) the existing SQLite store with optional, binary-compatible modifications for encrypted rows and tables and the associated metadata and (2) the LMDB memory-based store which may have advantages when used with the most modern high-performance RAM-based storage hardware. We look forward to integrating other stores, and to prove the point we also supply the ancient Oracle BDB backend as an example third store.
  +
:✅ The only mainstream database optionally without a [[:wikipedia:Write-ahead Log|Write-ahead Log]], when using the LMDB storage backend.
  +
: ✅ [https://lumosql.org/dist/benchmarks-to-date/ preliminary Benchmarking results]
  +
  +
: ✅ The [[Not Forking]] tool, which avoids forks in both simple and complicated source code
  +
  +
Exciting things in progress:
  +
  +
: ⛅ Lumions are described in a draft [https://lumosql.org/src/lumosql/doc/tip/doc/rfc/README.md RFC for universal encrypted blobs with authentication.] For the first time, a piece of data can have all the security rights of a full database, even if it is called "mydata.txt" and attached to an email. LumoSQL uses Lumions as rows and tables in SQLite, but that is just one use case. As soon as the cryptographic design has settled we will update this RFC and consult even more widely.
  +
  +
: ⛅ Documented API for arbitrary key-value stores
  +
  +
: ⛅ Documented API for accessing the key-value stores via the SQLite library, instantly making the SQLite key-value store the most widely-distributed key-value store. Nothing calls the SQLite key-value store today except SQLite
  +
   
 
[[Category:Computer Science]]
 
[[Category:Computer Science]]

Latest revision as of 22:30, 14 April 2024

LumoSQL protects data on mobile phones using a new data storage technology which is highly compatible with most existing devices. With LumoSQL, the device owner has the ultimate right to decide who can read or change their data, and this decision continues to be enforced even after the data has been copied off the phone to (for example) a bank or insurance company for processing with their in-house database software. In contrast, the situation at present is that device owners are rarely in control of the privacy of their own data, despite many laws relating to privacy.

If a criminal or government officer takes a phone away from its owner, LumoSQL data cannot be read without the consent of either the phone owner or anyone to whom the phone owner has granted access. This is fine-grained, meaning different levels of permission can be granted.

The detailed strategy document LumoSQLMotivation-1.0.pdf explains the social, business and technical pressures that mean LumoSQL brings radical change to data storage.

As the strategy document says, "LumoSQL assumes software development should never be relied on and is getting worse".

In summary:

Technically: LumoSQL modifies the SQLite Open Source database software to add essential performance, security and privacy features. LumoSQL is a technical product for software developers to use. LumoSQL is currently a working prototype, and uses Predicate Based Encryption to manipulate Lumions.

Socially and Commercially: LumoSQL matters because virtually all data for all apps on mobile devices is stored using SQLite, making it by far the most-used database anywhere. Yet this database is vulnerable to snoopers and does not even come close to the gold-standard privacy requirements of the EU and various non-EU countries. LumoSQL is a new category of software, highly compatible with the original SQLite and protecting our most intimate data.

Practically: LumoSQL exists due to the volunteer effort contributed by many skilled people. The Vrije Universiteit Brussel continues to fund valuable contributions in cryptography and mathematical analysis. The NLnet Foundation funded Phase I of LumoSQL, and is continuing to fund Phase II.

Surprising Background

The following facts often surprise people:

  • SQLite is very likely the most-deployed software, by a factor of at least four zeros. It is likely that SQLite is the only trillion-scale software in existence, although that statement needs some validation work to be sure.
  • SQLite is a full-featured database, supporting the standard SQL language despite being tiny compared to all the other mainstream SQL databases.
  • A typical mobile phone stores all non-streaming data in several hundred SQLite databases.
  • SQLite is also used in web browsers, operating systems, vehicles of all kinds and so on.
  • SQLite is open source, exceptionally well-maintained, mostly by just 3 people. Many more people contribute occasionally to SQLite, and the community of deeply technical users is very large.
  • SQLite version 3 is a standard data format, and is relied on by companies such as Airbus, whose products last many decades.
  • SQLite is exceptionally reliable, given the policy decisions not to change certain fundamentals.

The corollaries to these unusual facts are significant:

  • This is uncharted territory for Computer Science: is SQLite's ultra-conservative compatibility commitment to its (at least) hundreds of billions of installations the right choice? Is SQLite's fast-moving support of formal database standards the best way forward? It does certainly work very well.
  • Of the many forks of SQLite, none seem to have more than a relatively trivial deployed base (likely at most a few handfuls of millions). This includes seemingly very useful forks. It appears the nature of the SQLite project ensures its success and discourages replacements, but also constrains its future in important ways.
  • The obvious strategic problems with SQLite are not widely discussed, even though it is evident that the most-used software in the world is incompatible with security, privacy and other requirements of the 21st century. But why? Perhaps it is because SQLite works so well and is so ubiquitous that people can't imagine an alternative. It is unacceptable to store personal data unencrypted, and features such as whole-device encryption do not really address the problem. The world needs a new option, compatible with SQLite. LumoSQL wants to be that option, and more.

LumoSQL Phase I Completed

This is a technical section that non-technical readers can skip. For even more technical detail see the code development page.

LumoSQL is a modification (not a fork) of the SQLite embedded data storage library. LumoSQL offers multiple key-value backend storage systems selectable by the user. It offers features not found in any other mainstream database:

  • ability to checksum every row on write and verify on read
  • ability to trigger arbitrary functions on per-row read and write
  • a general test suite for benchmarking precisely how LumoSQL (or SQLite) is performing and recording the full context of that benchmark run. For some reason database benchmarking is often done very poorly, including by the TPC consortium (publisher of the TPC-C benchmark), which was founded solely for that purpose.
  • a general build system able to mix and match multiple versions of the database with multiple versions of multiple backends. Never before has it been possible to compare the different strategies of various Key-Value stores with the same database frontend.

If you are an SQLite user familiar with C development who wants an easier way to benchmark and measure SQLite, or if you want features only available in other key-value storage engines, then you will find that LumoSQL offers new features even in its prototype stage.
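
The first feature in the list above, checksumming every row on write and verifying it on read, can be illustrated at the application level. The following is only a minimal sketch in Python against stock SQLite (via the standard `sqlite3` module); it is not LumoSQL's implementation, which offers this as a feature of the database itself. The table `items`, the column `digest` and the helper names `row_digest`, `open_store`, `put` and `get` are all assumptions made for this example.

```python
import hashlib
import sqlite3


def row_digest(key: str, value: bytes) -> str:
    """Return a SHA-256 hex digest covering both the key and the value."""
    key_bytes = key.encode("utf-8")
    h = hashlib.sha256()
    # Length-prefix the key so the key/value boundary is unambiguous.
    h.update(len(key_bytes).to_bytes(8, "big"))
    h.update(key_bytes)
    h.update(value)
    return h.hexdigest()


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'items' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "key TEXT PRIMARY KEY, value BLOB NOT NULL, digest TEXT NOT NULL)"
    )
    return conn


def put(conn: sqlite3.Connection, key: str, value: bytes) -> None:
    """Checksum on write: store the digest alongside the row."""
    conn.execute(
        "INSERT OR REPLACE INTO items (key, value, digest) VALUES (?, ?, ?)",
        (key, value, row_digest(key, value)),
    )


def get(conn: sqlite3.Connection, key: str) -> bytes:
    """Verify on read: recompute the digest and compare with the stored one."""
    row = conn.execute(
        "SELECT value, digest FROM items WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        raise KeyError(key)
    value, digest = bytes(row[0]), row[1]
    if row_digest(key, value) != digest:
        raise ValueError(f"checksum mismatch for row {key!r}")
    return value
```

In this sketch, changing a stored value behind the application's back (for example with a direct `UPDATE` on `items`) makes the next `get` raise `ValueError`, which is the kind of silent corruption a per-row checksum is meant to expose.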

LumoSQL Phase II Has Started

In Phase II LumoSQL is implementing at-rest encryption and privacy using the features developed in Phase I, and readying LumoSQL for more general testing.

Notable outcomes from LumoSQL already include:

✅ The only mainstream database with swappable Key-Value stores, where all stores are peers rather than one store having special knowledge that gives it technical advantages. The two stores we have concentrated on so far are (1) the existing SQLite store with optional, binary-compatible modifications for encrypted rows and tables and the associated metadata and (2) the LMDB memory-based store which may have advantages when used with the most modern high-performance RAM-based storage hardware. We look forward to integrating other stores, and to prove the point we also supply the ancient Oracle BDB backend as an example third store.
✅ The only mainstream database optionally without a Write-ahead Log, when using the LMDB storage backend.
✅ Preliminary benchmarking results
✅ The Not Forking tool, which avoids forks in both simple and complicated source code
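
The idea of swappable key-value stores that are all peers can be sketched as one interface with interchangeable implementations. The Python below illustrates that design idea only; it is not LumoSQL's backend API. `KVStore`, `DictStore`, `SqliteStore`, `BACKENDS` and `open_backend` are hypothetical names, and neither example backend corresponds to LMDB or Oracle BDB.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class KVStore(ABC):
    """Hypothetical minimal interface that every backend implements."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class DictStore(KVStore):
    """Backend that keeps everything in a Python dict."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class SqliteStore(KVStore):
    """Backend that keeps key-value pairs in a two-column SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k BLOB PRIMARY KEY, v BLOB NOT NULL)"
        )

    def put(self, key: bytes, value: bytes) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )

    def get(self, key: bytes) -> Optional[bytes]:
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def delete(self, key: bytes) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM kv WHERE k = ?", (key,))


# Backends are selected by name; none of them is privileged over the others.
BACKENDS: Dict[str, Type[KVStore]] = {"dict": DictStore, "sqlite": SqliteStore}


def open_backend(name: str) -> KVStore:
    """Instantiate the backend registered under the given name."""
    return BACKENDS[name]()
```

Because the calling code only sees `KVStore`, the same workload can be run against each registered backend in turn, which is the property that makes like-for-like comparison of storage strategies possible.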

Exciting things in progress:

⛅ Lumions are described in a draft RFC for universal encrypted blobs with authentication. For the first time, a piece of data can have all the security rights of a full database, even if it is called "mydata.txt" and attached to an email. LumoSQL uses Lumions as rows and tables in SQLite, but that is just one use case. As soon as the cryptographic design has settled we will update this RFC and consult even more widely.
⛅ Documented API for arbitrary key-value stores
⛅ Documented API for accessing the key-value stores via the SQLite library, instantly making the SQLite key-value store the most widely-distributed key-value store. Nothing calls the SQLite key-value store today except SQLite