Installing SQLite Extensions with pip/npm/gem, sqlite-vss for vector search, and more!
Welcome to the 4th issue of this SQLite newsletter! Apologies for the 10-month long delay since the last issue, but I’ve open sourced many new projects and I’m excited to share with you all!
As a reminder — this is a newsletter created by Alex Garcia about SQLite. New extensions, tools, guides, best practices, and more. There will be new issue roughly every month.
This month: A recap of many new extensions and tooling I have released recently, including:
sqlite-ecosystem
, a collection of 12 open source SQLite extensions- Distributing extensions on
pip
/npm
/gem
/deno.land/x
for easy installing sqlite-loadable-rs
, a framework for writing SQLite extensions in Rustsqlite-vss
, an extension for vector search in SQLite
The sqlite-ecosystem
with a dozen open-sourced SQLite extensions
The sqlite-ecosystem
is a growing collection of 12 open sourced SQLite extensions that all independently provide new SQL scalar function, table functions, and virtual tables in SQLite. These include:
Extension | Description | Language |
---|---|---|
sqlite-http |
Make HTTP requests | Go |
sqlite-html |
parse HTML documents | Go |
sqlite-lines |
Read files/blob line-by-line | C |
sqlite-path |
Parse and generate filepaths | C |
sqlite-url |
Parse and generate URLs | C |
sqlite-xsv |
Query CSVs | Rust |
sqlite-regex |
Regular expression functions | Rust |
sqlite-ulid |
Work with ULIDs | Rust |
sqlite-jsonschema |
Validate JSON objects with JSON Schema | Rust |
sqlite-fastrand |
Generate fast numbers/blobs quickly | Rust |
sqlite-vss |
Vector search in SQLite | C++ |
These extensions are written in different languages and exist in different codebases. However, they are all distributed in the same way (all have pip/npm/gem support), and they all have pre-compiled extensions available.
Making Installing + Distributing SQLite Extensions Easier
Historically, installing SQLite extensions has been hard! They’ve been around for 16 years, but were usually copy+pasted C files that you had to compile yourself. Some extensions like spatialite found popularity, but were still difficult to install.
Then sqlean came out in 2021 with a different strategy: pre-compiled SQLite extensions for popular platforms (MacOS/Linux/Windows) built on Github Actions and distributed on Github Releases. Which is awesome! All the SQLite extensions in the `sqlite-ecosystem followed a similar approach.
But it can be even easier! Downloading a pre-compiled extension from a Github Release is easy for one-off scripts, but doing repeatedly in “production” environments can be tedious.
So, I figured out a way to distribute SQLite extension on a few package managers for popular languages, including:
- Python:
pip install
‘able SQLite Extensions for Python - Datasette:
datasette install
‘able SQLite Extensions for Datasette - Node.js:
npm install
‘able SQLite Extensions for Node.js - Deno:
deno.land/x
SQLite Extensions for Deno - Ruby:
gem install
‘able SQLite Extensions for Ruby
Each package manager is handled in a different way: For example, the Python packages offer different pre-built wheels for each platform (MacOS/Linux/windows). The Node.js packages use "optionalDependencies"
, and the Ruby gems have different compiled .gem
files for each platform.
sqlite-loadable-rs
for writing SQLite extension in Rust
sqlite-loadable-rs
is a Rust library that makes writing loadable SQLite extension in Rust much easier. The popular rusqlite crate currently doesn’t allow you to create loadable SQLite extensions, so sqlite-loadable-rs
is a hopefully-short-term solution!
This library lead to the following extensions:
sqlite-xsv
, for reading CSVs files, based on the csv cratesqlite-regex
, for working with regular expression , based on the regex cratesqlite-ulid
, for generating/parsing ULIDs, based on the ulid cratesqlite-jsonschema
, for validating JSON documents, based on the jsonschema cratesqlite-fastrand
, for generating random numbers/blobs quickly, based on the fastrand crate
A few reasons why I prefer writing SQLite extension in Rust whenever possible:
- Memory safety: While not 100% safe, it’s much easier to avoid memory errors with
sqlite-loadable-rs
than it is writing an extension in C/C++. - Rich ecosystem: Most of the extension listed above are just light wrappers around popular and well-tested Rust crates!
- Performance: In the extensions I’ve made,
sqlite-loadable-rs
is much faster than extensions written in Go, and either at-par with C extensions or a tad slower. But some extensions, likesqlite-regex
andsqlite-xsv
, are actually faster than their C counterparts! - Supports multiple platforms: For the most part, most Rust crates are compatible with MacOS/Linux/Windows out-of-the-box, which is not the case with many C/C++ libraries.
- Rust is just nice: The borrow checker, enums, and error handling in Rust just scratches an itch in my brain that I like. It’s not for everyone, and it was incredibly difficult to learn, but I’d much rather write and maintain SQLite extensions written in Rust rather than C, C++, or Go.
For further reading:
- Introducing sqlite-loadable-rs: A framework for building SQLite Extensions in Rust (December 2022)
- Introducing sqlite-xsv: The Fastest CSV Parser for SQLite ((January 2023)
- Introducing sqlite-regex: A fast Regular Expression Extension for SQLite (January 2023)
sqlite-vss
for Vector Search in SQLite
sqlite-vss
is a SQLite extension for vector search, based on Faiss. Vector search is an integral part of semantic search engines, recommendation systems, Q&A platforms, and more.
“Vector databases” have exploded in popularity recently, fueled by the AI/ChatGPT/embeddings storm of the last several months. Many of these vector databases require a lot of resources: dedicated servers, docker-compose.yaml
, complex dependencies, etc.
sqlite-vss
, on the other hand, is much more lightweight and simpler than many of these vector databases. Just like SQLite, it’s in-process, requires no extra servers, and saves data inside your SQLite databases.
sqlite-vss
works just like the FTS5 full-text search SQLite extension: a vss0
virtual table that you instantiate, insert data with normal INSERT INTO
commands, and query it with SELECT
statements.
-- 384 == number of dimensions for this example
create virtual table vss_articles using vss0(
headline_embedding(384),
description_embedding(384),
);
insert into vss_articles(rowid, headline_embedding)
select rowid, headline_embedding from articles;
-- KNN-style search: "find the 100 closest vectors to id=123
select
rowid,
distance
from vss_articles
where vss_search(
headline_embedding,
(select headline_embedding from articles where rowid = 123)
)
limit 100;
And just like all the other sqlite-ecosystem
extensions, you can pip install
, npm install
, or gem install
this extension into your applications!
sqlite-vss
is still young and has a number of disadvantages, but is a great first-choice for many simple AI applications.
See Introducing sqlite-vss: A SQLite Extension for Vector Search (February 2023) for a full introduction and a walk-through demo!
Coming Soon
Bindings for Elixir, Go, and Rust
Similar to the Python/Node.js/Deno bindings, all 12 of the extensions in sqlite-ecosystem
will also provide binding for Elixir. That way, Elixir developers can install these extensions with Hex and integrate them into their applications. This effort was spearheaded by Thomas Rodriguez for the experimental sqlite-vss Elixir bindings, and I’ll soon bring them to all the other extensions!
Additionally, bindings for Golang and Rust will come in the next few weeks. These are still a work-in-progress, but you can see examples of then in sqlite-vss
for Go and Rust. These bindings are unique - they statically link these extensions into Go and Rust binaries, so there’s no song-and-dance needed to load extensions manually or deal with dynamically-loaded files.
New Extensions!
sqlite-arrow
: Read parquet/csv/avro/orc into SQLite tables and queries, and write SQLite queries to parquet/arrow IPCsqlite-duckdb
: Query and insert data into DuckDB databasessqlite-odbc
: Query ODBC-compatible database connectionssqlite-img
: Query and manipulate images (cropping, thumbnails, rotation, etc.)sqlite-xml
: Query XML documents with XPath stringssqlite-assert
: Make assertions of your data at query-time
New Tools!
sqlite-package-manager
: Annpm
-like tool specifically for managing/installing SQLite extensionssqlite-docs
: a CLI tool and extension for documenting SQLite tables, columns, and extension functionssqlitex
: A modernsqlite3
alternative, a new CLI with Parquet/S3 support, a bigger stdlib, syntax highlighting, and more
Supporting my Work
If you enjoy my work on SQLite extensions and tooling, or if you company uses my projects often, please consider becoming a sponsor on Github or hiring me for part-time contract development work! These projects takes a lot of my time and energy, so any support is greatly appreciated.
See you next month!