How to Build a Vector Database with SQLite in Node.js


SQLite is everywhere. From massive systems to tiny embedded devices—even your web browser and Android phone rely on it. It's lightweight, fast, and battle-tested.

Now, think about where AI is headed. Just as computers shrank from room-sized machines to smartphones—thanks to Moore’s Law—Large Language Models (LLMs) are following a similar trajectory. With breakthroughs in research, clever optimizations, and techniques like quantization (which shrinks models by reducing bit precision), these models are becoming smaller and more efficient.

When LLMs start running locally on smaller devices, they’ll need storage that’s just as fast and lightweight as they are. Enter SQLite—already a go-to for many systems.

But here’s a twist: what if we used SQLite not just as a storage engine, but also for vector computations? In AI, LLMs thrive on vectors and embeddings—numerical representations that capture the essence of text. While PostgreSQL has pgvector for vector search, we can explore how to turn SQLite into a powerful vector database.

In this article, we’ll build a vector database with SQLite in Node.js for Retrieval-Augmented Generation (RAG) applications.


RAG in Node.js

First, clone the repo that contains the data folder. This folder includes chunks from white papers—perfect for testing embeddings.

git clone https://github.com/SfundoMhlungu/nodejs-vectordb.git

Note: The code is messy—I was just experimenting at first and didn’t plan on writing about it. But don’t worry, we’ll recreate the core parts in this article.

I suggest creating a clean folder, copying the data folder into it, and initializing a fresh package.json:

vectorDB/
│── data/
│── embed.js
│── index.js
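
One setup note: the scripts below use ES-module import syntax, so after running pnpm init, add "type": "module" to the generated package.json (or name the files with an .mjs extension):

{
  "type": "module"
}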

Seeding the Database

Make sure your Node.js version supports native addons: better-sqlite3 is a native module that bundles and compiles its own copy of SQLite, so you don't need a separate SQLite install.

pnpm add better-sqlite3 ollama uuid

We’ll use Ollama and the mxbai-embed-large model to generate embeddings, so make sure that’s installed too.
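
If you haven't pulled the embedding model yet, grab it with the Ollama CLI:

ollama pull mxbai-embed-large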

Now, open embed.js:

import { readFileSync, readdirSync } from "fs";
import ollama from "ollama";
import Database from "better-sqlite3";
import { v4 as uuidv4 } from "uuid";

const db = new Database("embedblob.db");

db.exec(`
    CREATE TABLE IF NOT EXISTS embeddings (
        id TEXT PRIMARY KEY,
        sessid TEXT,
        name TEXT,
        content TEXT,
        embeddings BLOB
    );

    PRAGMA journal_mode = WAL;  -- Improves write performance
`);

Nothing fancy—just setting up the database and an embeddings table. The important columns here are sessid and embeddings.

  • sessid lets us simulate a chat system. Think of how ChatGPT creates a "new chat"—documents embedded in that session aren’t accessible in others.
  • This limits the search space, improving performance.
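
To back this up at the database level, an optional addition (not in the original schema) is an index on sessid, so session filters don't scan the whole table:

db.exec(`CREATE INDEX IF NOT EXISTS idx_embeddings_sessid ON embeddings (sessid);`);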

The data folder simulates this behavior.
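
Concretely, the layout looks like this (file names are illustrative; each session folder holds its own text chunks):

data/
│── sess1/
│   │── 1.txt
│   │── 2.txt
│   │── 3.txt
│── sess2/
│   │── 1.txt
│   │── 2.txt
│   │── 3.txt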

Now, let’s read the data:

const sessions = readdirSync("./data");
let data = {};

// Example structure:
// {
//     sess1: ["1.txt", "2.txt", "3.txt"],
//     sess2: ["1.txt", "2.txt", "3.txt"]
// }

for (const session of sessions) {
    const files = readdirSync(`./data/${session}`);
    data[session] = files;
}

for (const [session, files] of Object.entries(data)) {
    for (const file of files) {
        const content = readFileSync(`./data/${session}/${file}`, "utf-8");
        // Await each call so errors surface and Ollama isn't flooded with
        // parallel requests (top-level await works because this is an ES module)
        await embed(content, { session, name: file });
    }
}

It’s just reading text files. Now, let’s generate embeddings:


Generating Embeddings

/**
 * Generates embeddings for the given text.
 * @param {string} content - The text content.
 * @param {Object} meta - Metadata (session, filename).
 */
async function embed(content, meta) {
    const res = await ollama.embed({
        model: "mxbai-embed-large",
        truncate: true,
        input: content,
    });

    meta.model = res.model;
    const embeddingArray = new Float32Array(res.embeddings.flat()); // Flatten into one Float32Array for BLOB storage

    await saveToDb(embeddingArray, meta, content);
}

Embeddings are stored as 32-bit binary data (a BLOB), which is also what the SQLite plugin we'll use later expects. In Node.js, we handle that with Float32Array, a typed array that views raw memory as 32-bit floats.
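
To make the conversion concrete, here's a minimal round-trip sketch (variable names are illustrative; depending on your better-sqlite3 version you may need to wrap the typed array in a Buffer before binding it as a BLOB):

const vec = new Float32Array([0.1, 0.2, 0.3]);   // an embedding
const blob = Buffer.from(vec.buffer);             // the raw bytes SQLite stores as a BLOB
const back = new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4); // bytes back to floats

console.log(back); // Float32Array(3) [ 0.1, 0.2, 0.3 ] (up to float32 precision)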

Now, let’s save the embeddings to the database:


Saving to the Database

async function saveToDb(embeddings, meta, content) {
    // better-sqlite3 transactions run synchronously; wrapping the insert keeps it atomic
    const transaction = db.transaction(() => {
        const stmt = db.prepare(`
            INSERT INTO embeddings
            VALUES (?, ?, ?, ?, ?)
        `);

        const id = uuidv4();
        stmt.run(id, meta.session, meta.name, content, embeddings);
    });

    transaction();
}
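
If you're seeding many chunks, batching them in a single transaction is much faster in better-sqlite3 than one transaction per row. Here's a minimal sketch (the insertMany helper is hypothetical, not part of the repo):

const insertStmt = db.prepare(`INSERT INTO embeddings VALUES (?, ?, ?, ?, ?)`);

// Hypothetical batch helper: one transaction covers the whole batch
const insertMany = db.transaction((rows) => {
    for (const r of rows) {
        insertStmt.run(uuidv4(), r.session, r.name, r.content, r.embeddings);
    }
});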

Running the Script

Execute:

node embed.js

Now we have embeddings in SQLite. Next, let’s perform a vector search!

Vector Search with Cosine Similarity

Querying a vector database is pretty straightforward. If you're new to vector search, check out this guide from DataStax on cosine similarity.

The basic process looks like this:

  1. Take a user query (chat message) and generate its embedding.
  2. Find the most similar documents in the database.
  3. Insert those documents into the prompt as context.

That's Retrieval-Augmented Generation (RAG) in a nutshell. Not too complicated—until you start adding tools, but that's a story for another day.

Setting Up the Code

Open index.js and start with:

import ollama from 'ollama'
import Database from 'better-sqlite3';

const db = new Database('embedblob.db');

async function EmbedUserQuery(query) {
    const res = await ollama.embed({
        model: "mxbai-embed-large",
        truncate: true,
        input: query,
    });

    return new Float32Array(res.embeddings.flat()); // Convert embedding to Float32 binary
}

Implementing Cosine Similarity

Now, let's define a function to compute cosine similarity between two vectors:

function cosineSimilarity(v1, v2) {
    v1 = new Float32Array(v1.buffer); // View the underlying bytes as 32-bit floats
    v2 = new Float32Array(v2.buffer); // (a no-op view if the input is already a Float32Array)

    if (v1.length !== v2.length) {
        throw new Error("Vectors must be of the same length.");
    }

    let dot = 0, norm1Sq = 0, norm2Sq = 0;
    for (let i = 0; i < v1.length; i++) {
        dot += v1[i] * v2[i];
        norm1Sq += v1[i] * v1[i];
        norm2Sq += v2[i] * v2[i];
    }

    return dot / (Math.sqrt(norm1Sq) * Math.sqrt(norm2Sq)); // Between -1 and 1; closer to 1 means more similar
}
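
A quick sanity check with toy vectors (hypothetical values, not from the repo) shows the expected behavior:

const a = new Float32Array([1, 0]);
const b = new Float32Array([0, 1]);

console.log(cosineSimilarity(a, a)); // 1 (identical direction)
console.log(cosineSimilarity(a, b)); // 0 (orthogonal)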

Now, we need a function to:

  • Receive a user query
  • Embed it
  • Find the most similar documents in the database
  • Return a formatted query

async function CheckSimilarity(query, sess) {
    const rows = db.prepare(`SELECT * FROM embeddings WHERE sessid = ?`).all(sess);

    let matches = ``;
    if (rows.length > 0) {
        const embedding = await EmbedUserQuery(query);

        for (const row of rows) {
            const e = new Float32Array(row.embeddings.buffer); // from SQLite BLOB to float32array
            const sim = cosineSimilarity(embedding, e); // Compute similarity

            console.log(`doc: ${row.name}, similarity: ${sim}, user query: ${query}`);

            if (sim > 0.6) { // If similarity is >60%, include in matches
                matches += row.content + "\n";
            }
        }
    }

    return matches ? `context: ${matches}\n\nuser query: ${query}` : query;
}
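
A fixed 0.6 threshold is simple but brittle. A common alternative, sketched here (not in the original code), is to rank all chunks and keep the top k; you could swap the loop inside CheckSimilarity for something like this:

const scored = rows.map((row) => ({
    content: row.content,
    sim: cosineSimilarity(embedding, new Float32Array(row.embeddings.buffer)),
}));

scored.sort((a, b) => b.sim - a.sim);          // highest similarity first
matches = scored.slice(0, 3).map((s) => s.content).join("\n"); // keep the 3 best chunks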

Chatting with the Model

Now, let’s use this to generate responses:

async function Chat(userQuery, sess) {
    const query = await CheckSimilarity(userQuery, sess);
    console.log("formatted query:", query);

    const message = { role: 'user', content: query };
    const response = await ollama.chat({ model: 'llama3.2', messages: [message], stream: true });

    for await (const part of response) {
        process.stdout.write(part.message.content);
    }
}

I’m using llama3.2, but you can swap it out for any model you prefer.

Testing It Out

At the bottom of the file, test the model:

Chat("hello model", "sess1");  // 0.39 similarity

// Chat("file system for large distributed data-intensive applications", "sess1"); // 0.81
// Chat("advertising, marketing, paid subscriptions", "sess2");  // 0.63

For example, this won’t match any documents:

Chat("hello model", "sess1");  // 0.39

But this has a document matching at 80%:

Chat("file system for large distributed data-intensive applications", "sess1");

Since our seed data includes parts of the Google File System white paper, try playing around with different queries—comment/uncomment lines and experiment.

Performance Insights

If you’re paying close attention, you’ll notice that vector search is really fast. That’s because:

  • TypedArrays are highly optimized for holding binary data.
  • V8, the C++ engine that runs JavaScript in Node.js, JIT-compiles hot numeric loops like this one very effectively.

Still don’t believe me? I built a C++ SQLite plugin that does the same thing but with multiple optimizations, including SIMD.

Native vs JavaScript Performance

JavaScript actually performs faster than the native plugin in some cases. SQLite itself is written in C, and crossing the boundary into a plugin has per-call overhead, while V8's JIT combined with TypedArrays keeps the pure-JavaScript path surprisingly fast.

However, native plugins are still useful, especially for languages like PHP, Go, C#, Python, etc., that can consume DLLs.

That’s why I packaged this into vector-sqlite3, which includes the DLL for Windows users. Other platforms can compile the C plugin from the repo.

Using vector-sqlite3

First, install it:

npm i vector-sqlite3

Then, update your code to use the optimized vector functions:

import ollama from 'ollama';
import Database from 'better-sqlite3';
import sqlitevector from 'vector-sqlite3';

const db = new Database('C:/path/to/your/db/embedblob.db');

// Enable native vector functions
sqlitevector(db, { useNative: ["cosine_similarity", "l2_distance", "dot_product"] });

Or, if you want to stick with JavaScript:

sqlitevector(db, { useNative: [] }); // Disable native functions (e.g. on non-Windows platforms, where there is no prebuilt DLL)

async function EmbedUserQuery(query) {
    const res = await ollama.embed({
        model: "mxbai-embed-large",
        truncate: true,
        input: query,
    });

    return new Float32Array(res.embeddings.flat());
}

Now, let's perform vector search using SQLite:

const f = await EmbedUserQuery("file system for large distributed data-intensive applications");

const rows = db.prepare(`
    SELECT *, cosine_similarity_unrolled(embeddings, ?) AS similarity
    FROM embeddings WHERE sessid = ?
`).all(f, "sess1");

console.log(rows[0]);
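
Since the similarity is computed inside SQL, you can also let SQLite do the ranking. A sketch using the same registered function:

const top = db.prepare(`
    SELECT name, cosine_similarity_unrolled(embeddings, ?) AS similarity
    FROM embeddings
    WHERE sessid = ?
    ORDER BY similarity DESC
    LIMIT 3
`).all(f, "sess1");

console.log(top);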

Optimized Cosine Similarity

This function inside vector-sqlite3 speeds things up by processing four elements at a time:

cosine_similarity_unrolled: (v1, v2) => {
    if (v1.length !== v2.length) {
        throw new Error("Vectors must be of the same length.");
    }

    v1 = new Float32Array(v1.buffer);
    v2 = new Float32Array(v2.buffer);

    let dot = 0, norm1Sq = 0, norm2Sq = 0;
    let i = 0, len = v1.length;

    // Process 4 elements at a time
    for (; i <= len - 4; i += 4) {
        dot += v1[i] * v2[i] + v1[i + 1] * v2[i + 1] +
               v1[i + 2] * v2[i + 2] + v1[i + 3] * v2[i + 3];

        norm1Sq += v1[i] * v1[i] + v1[i + 1] * v1[i + 1] +
                   v1[i + 2] * v1[i + 2] + v1[i + 3] * v1[i + 3];

        norm2Sq += v2[i] * v2[i] + v2[i + 1] * v2[i + 1] +
                   v2[i + 2] * v2[i + 2] + v2[i + 3] * v2[i + 3];
    }

    // Handle any leftover elements when the length isn't a multiple of 4
    for (; i < len; i++) {
        dot += v1[i] * v2[i];
        norm1Sq += v1[i] * v1[i];
        norm2Sq += v2[i] * v2[i];
    }

    return dot / (Math.sqrt(norm1Sq) * Math.sqrt(norm2Sq));
}

Final Thoughts

You can check out the repo for more details, including the DLL.

Supported functions in JavaScript:

cosine_similarity, cosine_similarity_unrolled, l2_distance, dot_product

In the DLL, everything is already optimized, so there's no need for cosine_similarity_unrolled:

const rows2 = db.prepare(`
    SELECT *, l2_distance(embeddings, ?) AS distance,
    dot_product(embeddings, ?) AS dot_product
    FROM embeddings WHERE sessid = ?
`).all(f, f, "sess1");

console.log(rows2[0]);