Epistemic Status: seedling Sprouting


I’ve read a few posts lately, like “Optimizing SQLite for servers”1, about how SQLite, with the right settings, can be used in production. So, I decided to play around with it in Elixir and Phoenix to see how hard it would be to set up. Having done so, I think using it at the beginning can help you architect a better database layer for your application.

Obviously, if you’re building something that you know will need horizontal scaling early on, then by all means, don’t use SQLite, use PostgreSQL instead.

The Recommendations

You should definitely go read the post in full, but the short version of their recommendations are:

  • Set the journal_mode to use WAL (Write-Ahead Log)
  • Set the busy_timeout to at least 5 seconds (or 5,000 ms)
  • Set synchronous to NORMAL
  • Set a large cache_size, eg 1,000,000,000 or 1GB.
  • Set foreign_keys to true. This is mainly to have database level consistency checks.
  • Set temp_store to memory.
  • Use BEGIN IMMEDIATE transactions. This works around an issue with how SQLite handles the busy timeout inside transactions in its default mode.
  • Use a single connection for writes, and multiple for reads and split up access along those lines.
  • Use STRICT tables. I think this is another consistency thing and not related to performance.
  • Use Litestream for continuous backups to S3-compatible stores.

Phoenix uses the excellent Ecto library for its database access by default, and Ecto has support for SQLite, so it was just a matter of making sure everything was set right and that I was using the database in the ways that are suggested.

It turns out that the Ecto SQLite3 adapter, sets a lot of these defaults already. There are a few things it doesn’t set the same way so to fix that, in your Phoenix app’s config.exsconfig.exs module.

config/config.ex
config :my_app, MyApp.Repo,
  # ...
  busy_timeout: 5_000,
  cache_size: 1_000_000_000,
  # The documentation is ambiguous on whether this is set to `:default` or
  # if it's set to `memory`, so it's better to just set it to be sure.
  temp_store: :memory
config/config.ex
config :my_app, MyApp.Repo,
  # ...
  busy_timeout: 5_000,
  cache_size: 1_000_000_000,
  # The documentation is ambiguous on whether this is set to `:default` or
  # if it's set to `memory`, so it's better to just set it to be sure.
  temp_store: :memory

That said, you’ll need to do this in two places if you set up the Ecto Repos to have a read Repo and a write Repo to split those up.

Separating reads and writes

Most projects I’ve worked on during my career have eventually needed to split reads away from the primary database and onto read-only replicas. Using SQLite in the ways the author of that post recommended pushes you in that direction earlier in development even though you’re connecting to the same database file and not a replica.

To get it to work in Ecto, I started by creating a new Repo for handling writes. This means I have to be explicit about where I want to modify the database, and can’t just use the default Repo that Phoenix generates.

lib/my_app/write_repo.ex
defmodule MyApp.WriteRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.SQLite3
end
lib/my_app/write_repo.ex
defmodule MyApp.WriteRepo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.SQLite3
end

Then, I updated RepoRepo to be read-only and specify that migrations should happen on the WriteRepoWriteRepo. This prevents Ecto from trying to create a table on the read-only Repo to track migrations. Unfortunately, this doesn’t seem to affect where Ecto generates migration files2.

lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.SQLite3,
    read_only: true,
    migration_repo: MyApp.WriteRepo
end
lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.SQLite3,
    read_only: true,
    migration_repo: MyApp.WriteRepo
end

Then, I remove the read-only Repo from the list of repositories in the application configuration. Again, this keeps Ecto from trying to apply migrations to a read-only database.

config/config.exs
config :my_app,
  ecto_repos: [MyApp.WriteRepo],
  # ...
config/config.exs
config :my_app,
  ecto_repos: [MyApp.WriteRepo],
  # ...

By splitting up the repositories like this, whenever I eventually move to using PostgreSQL3, it means the code is already ready to read from a separate Repo that can be configured to use read replicas.

Use SQLite for more projects

Especially if you’re an open source project that you expect people to self-host, at least provide an SQLite option, since it makes deployment easier. I’m much more likely to look at a project that provides a SQLite option than one that doesn’t. If you build with it in mind, you’ll also be setting up for scaling later when you find you need to switch.

But you can probably get quite far, as Expensify reports in their post about scaling to 4 million queries per second. Granted, they’re not using bare SQLite, but it’s underlying their system.

Or not?

I might not end up using SQLite from the start on some of my side-projects I’m working on and going with Postgres instead, but just taking the time to look at it as a possibility has made me realize that I want to architect my projects in this fashion, separating the reads from the writes from the beginning.

It makes it more obvious where change is happening, and where change is definitely not happening, which can aid comprehension as a project grows in size and scope. And that seems to tie in well with Elixir’s immutability.

Footnotes

  1. I realize that the author posted what amounts to a retraction a few months later saying among other things, “I believe that the current push to make SQLite a backend database is nothing more than a marketing coup by companies selling edge compute infrastructure”. I’m not sure I agree with them. I think for the specific use case of a single instance service, which Elixir/Erlang can do for quite some time, using SQLite might be a decent option. Migrating later might be difficult though.

  2. I might be missing a configuration setting for this, so if anyone knows, feel free to let me know. Update: It looks like priv might be the option, but I haven’t tested it yet. I suspect you could set priv: "repo"priv: "repo" on the WriteRepo or priv: "write_repo"priv: "write_repo" on the Repo and it would work.

  3. Even if it ends up being just for initial development, and production ends up using PostgreSQL from the start, this approach sets up for a better architecture later. There are extra things that might need to be done, like tracking Log Sequence Numbers (LSNs) to make sure that the replica has seen the changes we need when we read but those are things that would need to happen eventually anyway. We did something like this when I was working on Bitbucket.