Author here. We're using LiteFS to replicate SQLite databases in real time so th...

marcinzm · on Sept 14, 2023

Looks interesting, although personally I don't see those as compelling use cases. I may very well be missing something, though.

> 1. Internal tooling: you're able to manage the contract better since you have control of the source & destination applications.

This has not been my experience anywhere but tiny companies or companies with very strict mono-repo processes. One big point of separate teams is to minimize communication overhead as your organization grows (otherwise the number of communication paths scales roughly as N squared). That means you do not want a lot of inter-department dependencies due to the internal tooling and APIs they leverage.
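
For reference, the usual back-of-the-envelope count (Brooks's pairwise-channels estimate from The Mythical Man-Month) of communication paths among N teams grows quadratically in N:

```latex
C(N) = \binom{N}{2} = \frac{N(N-1)}{2}, \qquad C(10) = 45, \qquad C(50) = 1225
```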

> 2. Reporting & analytics: these tend to need a lot of query flexibility & they tend to use a lot of resources. Offloading the query computation to the client makes it easier on the source application.

It depends on the resources the client has, versus the server, to devote to a single query. The resources are also high because of how much data is analyzed, and 8 GB seems tiny to me (i.e., the whole thing can be kept in memory by some DBs).
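
A minimal sketch of the "offload the query to the client" idea from the quoted point, assuming the client already holds a local copy of the database. Here a one-shot snapshot via Python's sqlite3 online backup API stands in for replication (LiteFS streams changes continuously instead), and the `orders` table and function names are hypothetical:

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema standing in for the source application's data.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders (region, amount) VALUES
    ('eu', 10.0), ('eu', 5.5), ('us', 20.0), ('us', 1.0), ('apac', 7.25);
"""


def snapshot_to_local(source: sqlite3.Connection, local_path: Path) -> None:
    """Copy the source DB to a client-side file using SQLite's online backup API.

    A one-shot stand-in for replication; it is not how LiteFS ships changes.
    """
    local = sqlite3.connect(local_path)
    try:
        source.backup(local)
    finally:
        local.close()


def revenue_by_region(local_path: Path) -> list[tuple[str, float]]:
    """Run the reporting query on the client's read-only copy, not on the source."""
    uri = local_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = sqlite3.connect(":memory:")  # pretend this lives on the server
        source.executescript(SCHEMA)
        local_path = Path(tmp) / "local-copy.db"
        snapshot_to_local(source, local_path)
        source.close()
        print(revenue_by_region(local_path))  # [('apac', 7.25), ('eu', 15.5), ('us', 21.0)]
```

The aggregation cost lands on the client's CPU and memory, which is exactly the trade-off being debated: it only helps if the client has the resources and the dataset fits comfortably on it.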

ryanrussell · on Sept 14, 2023

Ben, fan of your work. You guys have really moved the needle on SQLite.

Are there any plans for Corrosion to be published as OSS?

benbjohnson · on Sept 14, 2023

hey Ryan, thanks! As for Corrosion, I can't say anything publicly but something may be announced in the near future wink wink :)

LispSporks22 · on Sept 14, 2023

Curious if you've tested it with Jepsen. Anytime I come across some distributed system, my stomach ulcers start playing up while I wonder about all the weird failure modes. I kinda looked for a caveats page on the LiteFS web site, didn't really see one.

TheDong · on Sept 14, 2023

LiteFS doesn't try to be a correct distributed system, as you can see from: https://fly.io/docs/litefs/how-it-works/#cluster-management-...

Basically, the solution they have is:

1. There is a single writer. There's optional best-effort leader election for the writer.

2. If there's a network partition, split-brain, etc., availability is chosen over consistency.

Jepsen's testing is focused on databases that pick "consistency". Since LiteFS didn't pick consistency, there's really not any point in running Jepsen against it. Like, Jepsen would immediately find "you can lose acknowledged writes", and LiteFS would say "Yes! That's working exactly as intended!"
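
To make "you can lose acknowledged writes" concrete, here is a toy model of single-writer asynchronous replication with best-effort failover. It is a hypothetical illustration (the `Node` and `AsyncCluster` names are made up), not LiteFS's implementation: the primary acknowledges a write before any replica has it, so a crash in that window loses the write.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list[str] = field(default_factory=list)  # applied writes, in order


class AsyncCluster:
    """Toy single-writer cluster with asynchronous replication (hypothetical model)."""

    def __init__(self, names: list[str]) -> None:
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]
        self.unshipped: list[str] = []  # acknowledged, not yet on any replica

    def write(self, value: str) -> str:
        self.primary.log.append(value)
        self.unshipped.append(value)
        return "ack"  # the client hears "done" before any replica has the write

    def replicate(self) -> None:
        """Ship pending writes to every replica (happens 'later', in the background)."""
        for node in self.nodes:
            if node is not self.primary:
                node.log.extend(self.unshipped)
        self.unshipped.clear()

    def fail_primary(self) -> None:
        """Primary crashes; best-effort election promotes the first survivor."""
        self.nodes.remove(self.primary)
        self.unshipped.clear()  # whatever hadn't shipped dies with the primary
        self.primary = self.nodes[0]


if __name__ == "__main__":
    cluster = AsyncCluster(["a", "b", "c"])
    cluster.write("x1")
    cluster.replicate()
    reply = cluster.write("x2")  # acknowledged...
    cluster.fail_primary()       # ...but the primary dies before shipping it
    print(reply, cluster.primary.name, cluster.primary.log)  # ack b ['x1']
```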

However, another way of running LiteFS is with only a single writer ever (as in one app, one server, one SQLite database only), with all clients as read-only replicas that are never eligible to take writes. In that case you also don't have a proper distributed system, just read-only replicas, which are quite easy to get right and are mostly what this post is talking about.
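
From the application's point of view, a read-only replica is just a database you can query but not write to. The sketch below mimics that using SQLite's own `mode=ro` URI flag through Python's sqlite3 module; it does not involve LiteFS at all (LiteFS enforces read-only replicas in its own layer), and the `kv` table and helper names are hypothetical:

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file read-only via a URI filename (mode=ro)."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def try_write(conn: sqlite3.Connection) -> str:
    """Attempt a write; a read-only connection should refuse it."""
    try:
        conn.execute("INSERT INTO kv (k, v) VALUES ('b', '2')")
        conn.commit()
        return "written"
    except sqlite3.OperationalError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "app.db"

        writer = sqlite3.connect(path)  # the single writer ("primary")
        writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        writer.execute("INSERT INTO kv VALUES ('a', '1')")
        writer.commit()
        writer.close()

        replica = open_read_only(path)  # a read-only "replica" connection
        print(replica.execute("SELECT k, v FROM kv").fetchall())  # [('a', '1')]
        print(try_write(replica))  # a "rejected: ..." line from SQLite's read-only error
        replica.close()
```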

benbjohnson · on Sept 14, 2023

LiteFS works similarly to async replication you'd find in Postgres or MySQL so it doesn't try to be as strict as something running a distributed consensus protocol like Raft. The guarantees for async replication are fairly loose so I'm not sure Jepsen testing would be useful for that per se.
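
For contrast with async replication, a consensus protocol like Raft only acknowledges a write once a majority of nodes hold it. The toy sketch below shows just that acknowledgment rule; the function names are hypothetical and the election is heavily simplified (real Raft compares the term and index of each node's last log entry, not raw log length):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def quorum_write(logs: list[list[str]], value: str, reached: list[int]) -> bool:
    """Leader is logs[0]; `reached` are replica indices that received the write.

    Acknowledge only if a majority (leader included) now holds the value.
    """
    logs[0].append(value)
    for i in reached:
        logs[i].append(value)
    return 1 + len(reached) >= majority(len(logs))


def elect_after_leader_crash(logs: list[list[str]]) -> list[str]:
    """Simplified election: the surviving node with the longest log wins."""
    return max(logs[1:], key=len)


if __name__ == "__main__":
    logs: list[list[str]] = [[], [], []]         # leader + 2 replicas
    ok1 = quorum_write(logs, "x1", reached=[1])  # 2 of 3 nodes -> ack
    ok2 = quorum_write(logs, "x2", reached=[])   # 1 of 3 nodes -> no ack
    new_leader_log = elect_after_leader_crash(logs)
    print(ok1, ok2, new_leader_log)              # True False ['x1']
```

Here "x2" is also lost when the leader crashes, but the client was never told it succeeded, so no acknowledged write disappears. That is the guarantee Jepsen checks for and the one async replication deliberately trades away for lower latency.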

On the LiteFS Cloud side, it currently does streaming backups so it has similar guarantees but we are expanding its feature set and I could see running Jepsen testing on that in the future. We worked with Kyle Kingsbury in the past on some distributed systems challenges[1] and he was awesome to work with. Would definitely love to engage with him again.

[1]: https://fly.io/dist-sys/

pininja · on Sept 14, 2023

I could imagine this technique being useful for kepler.gl or other data visualization tools

rgavuliak · on Sept 14, 2023

Do you realize most reporting & analytics use cases don't use SQLite databases?

FridgeSeal · on Sept 14, 2023

Do you realise that there are many reporting and analytics cases where SQLite is a great fit?

rgavuliak · on Sept 19, 2023

I am looking at the data analytics industry as a whole and am involved in communities of data practitioners. Most of these people use cloud DBs (Snowflake, BigQuery & co.) since there's less dependence on DB-admin type of work.

Some might be using Postgres or whatever the engineering team provided them with, but I don't think I have really heard of a data person preferring to use SQLite.

Might still be a great fit, but as the other comment pointed out, it might not be a good fit for the target audience.

zaphirplane · on Sept 14, 2023

The parent is saying this isn’t a great idea because target users don’t use SQLite. Your reply is that it is a good idea if people changed how they do things to fit the idea.

I don’t have a horse in this race, but the reply isn’t very... well, I don’t know what to make of that.