Introduction
There are many ways to build search functionality for your projects:
- SQL-based: LIKE queries and full-text search features in databases such as MySQL and PostgreSQL
- Solution-based: dedicated search engines such as Algolia, Meilisearch, TNTSearch
- Vector-embedding based: OpenAI embeddings, GTE embeddings, BAAI embeddings
We are going to use Cloudflare Workers AI, BAAI/bge embeddings, PostgreSQL, and pgvector to build search functionality that understands queries semantically.
This article assumes you know the basics of installing PostgreSQL extensions and setting up projects (i.e., developer stuff); it focuses on high-level understanding and pointers only.
Why use vector embeddings instead of other solutions?
Embedding-based search is the next logical step up from full-text search. Instead of matching operators, we compare semantics as analyzed by large language models, so it returns far more relevant results even when the keywords don't match exactly.
FYI, services like Algolia have AI-powered search too, but this article is about building your own AI-powered search, not about whether vector embeddings are better than other solutions or vice versa.
Test out my AI-powered search bar implementations built with Cloudflare Workers AI:
- What AI Can Do Today - 20k+ records, so the results are quite good
- Awesome AI Tools - fewer records, so it shows neighbouring results
My preferred stack
I'll be sharing how to use Cloudflare Workers AI to build a search bar for your platform. The stack in this article:
- Cloudflare Workers AI (obviously): To generate BAAI vector embeddings
- PostgreSQL: To store vector embeddings (you can use specialized vector databases like Milvus, Pinecone, Weaviate, etc. but I like to have 1 DB)
- pgvector extension: To enable vector embeddings for PostgreSQL
I use Laravel + PHP for my projects, but I'll be keeping my article writeup as agnostic as possible for everyone.
Crash Course on Vector Embeddings
A vector is a set of coordinates that represent a point in space. So, a 2-dimensional vector is basically (X,Y), and a 3-dimensional vector is (X,Y,Z). An embedding is a low-dimensional representation of a context (text, images, etc.) using vectors.
Of course, we can't represent much context with a 3-dimensional vector, which is why LLM embedding models can go up to 1024 dimensions, maybe even more!
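What makes these vectors useful is that semantically similar text ends up close together in vector space. As a toy illustration (TypeScript, with made-up 3-dimensional values), here is the same distance calculation pgvector will perform for us later in this article, just in 3 dimensions instead of 384:

```ts
// Euclidean distance between two vectors: the smaller the distance,
// the closer (and more semantically similar) the two points are.
function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Made-up 3-dimensional "embeddings" for illustration only.
const cat = [0.9, 0.1, 0.3];
const kitten = [0.85, 0.15, 0.35];
const car = [0.1, 0.9, 0.7];

console.log(euclideanDistance(cat, kitten)); // ~0.09 -> close, i.e. similar
console.log(euclideanDistance(cat, car));    // ~1.20 -> far apart
```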
Every vector embedding model tokenizes a given piece of text differently. As a rough rule, the word count comes out to about 70-80% of the token count, i.e., roughly 360-410 words per 512 tokens. The final token count will usually be returned in the model output.
Token sequence length is the maximum number of tokens the model can convert into a vector representation.
For more vector embedding models, browse Hugging Face; the bge model card is a good starting point: https://huggingface.co/BAAI/bge-large-en
We will be using the bge-small-en-v1.5 model (my favorite) because BAAI/bge models are highly ranked on the Massive Text Embedding Benchmark (MTEB) leaderboard, Cloudflare has the models ready to use, and the small variant needs only 384 dimensions, with decent performance and a smaller footprint than the base and large models.
In short, Cloudflare Workers AI will do the following for you:
"The quick brown fox jumps..." becomes a single 384-dimensional vector:
[1, 1.24, 1.2, ..., 1.23]
Setting up PostgreSQL
Install the pgvector extension into your PostgreSQL DB and enable it:
CREATE EXTENSION vector;
Create your new `searches` table with a 384-dimensional vector embedding column. I used `text` for the query column so you can store a sentence of up to 512 tokens:
CREATE TABLE searches (
id bigserial PRIMARY KEY,
query TEXT,
embedding vector(384)
);
If you have an input longer than 512 tokens (e.g., this article, or PDF document content), you can chunk the text into up to 512 tokens per row, depending on your final application; see the sketch below. This is how AI-powered PDF search is usually built. In my projects awesomeaitools.com and whataicandotoday.com, the query values are under 4-5 words per stored row, since I want highly relevant results within a small dataset.
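If you do need chunking, a naive word-based splitter is usually enough to stay under the limit. A rough sketch, using the words-per-token rule of thumb from earlier plus a safety margin (the exact numbers here are assumptions, not model guarantees):

```ts
// Naively chunk long text into pieces that should fit the 512-token
// sequence length. Assumes roughly 0.75 words per token with a safety
// margin; the model output reports the actual token count if needed.
const MAX_TOKENS = 512;
const WORDS_PER_TOKEN = 0.75;
const MAX_WORDS = Math.floor(MAX_TOKENS * WORDS_PER_TOKEN * 0.9); // ~345 words

function chunkText(text: string): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += MAX_WORDS) {
    chunks.push(words.slice(i, i + MAX_WORDS).join(" "));
  }
  return chunks;
}
```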
Generating vector embeddings and storing in PostgreSQL
Follow the steps to set up Cloudflare Workers AI: https://developers.cloudflare.com/workers-ai/
You can refer to my GitHub Gist script while setting up your Cloudflare Workers AI: https://gist.github.com/charlesteh/723f2daf51b041287e02b9f89c1e02c7
The Worker takes a text query, runs it through the embedding model, and returns the model's JSON response; you can call it as a simple GET REST request.
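For orientation, a minimal Worker along these lines might look like the sketch below. This is illustrative rather than the exact Gist above; it assumes a Workers AI binding named `AI` configured in your wrangler.toml:

```ts
// Sketch of a Worker that embeds ?q=<text> and returns JSON.
// Assumes a Workers AI binding named "AI" in wrangler.toml.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const q = new URL(request.url).searchParams.get("q");
    if (!q) return new Response("Missing ?q parameter", { status: 400 });

    // bge-small-en-v1.5 accepts an array of strings and returns one
    // 384-dimensional vector per input string.
    const result = (await env.AI.run("@cf/baai/bge-small-en-v1.5", {
      text: [q],
    })) as { shape: number[]; data: number[][] };

    return Response.json({ embedding: result.data[0] });
  },
};
```

Deploy it with `wrangler deploy` and call it with a GET request like `/?q=brown+fox`.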
Store your text input in the `query` column, and your 384-dimensional embedding in the `embedding` column:
INSERT INTO searches (query, embedding)
VALUES ('Your string of up to 512 tokens here', '[1, 2, 3]'); -- truncated; a real embedding has 384 values
It is highly preferable to lowercase all your text before calling the embedding API, because users generally do not type capitalized or uppercase text into a search bar. You can, however, preserve the original input in the `query` column.
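To tie the two steps together, here is a rough TypeScript sketch using the `pg` client. The Worker URL is a placeholder for wherever you deployed yours, and the lowercasing happens before the API call as suggested above:

```ts
import { Client } from "pg";

// Placeholder: replace with your deployed Worker's URL.
const WORKER_URL = "https://your-worker.example.workers.dev";

async function storeSearch(client: Client, query: string): Promise<void> {
  // Embed the lowercased text via the Worker's GET endpoint.
  const res = await fetch(
    `${WORKER_URL}/?q=${encodeURIComponent(query.toLowerCase())}`
  );
  const { embedding } = (await res.json()) as { embedding: number[] };

  // pgvector accepts the '[1,2,3]' text format, which JSON.stringify
  // happens to produce for a plain array of numbers.
  await client.query(
    "INSERT INTO searches (query, embedding) VALUES ($1, $2)",
    [query, JSON.stringify(embedding)] // original casing preserved in query
  );
}
```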
Getting semantically relevant results with PostgreSQL
Now that we have vectors stored in our database, we can embed the search input the same way and compare its distance to the vectors already in the database. The smallest distance is your most relevant result.
- Take the search input from your search bar; sanitize it and chunk it to no more than 512 tokens to prevent embedding failure.
- Call the Cloudflare Workers AI GET API to return the embedding.
- Run a SQL `SELECT` to get the closest embedding results:
SELECT * FROM searches
ORDER BY embedding <-> '[3, 1, 2]'
LIMIT 5;
This returns the 5 semantically closest results by Euclidean distance (the `<->` operator; pgvector also supports cosine distance via `<=>`).
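The lookup side is the same two calls in reverse order. A sketch under the same assumptions (placeholder Worker URL, connected `pg` client) as the insert sketch above:

```ts
import { Client } from "pg";

// Placeholder: same Worker endpoint as in the insert sketch.
const WORKER_URL = "https://your-worker.example.workers.dev";

async function semanticSearch(client: Client, input: string): Promise<string[]> {
  // 1. Embed the (lowercased) search input via the Worker.
  const res = await fetch(
    `${WORKER_URL}/?q=${encodeURIComponent(input.toLowerCase())}`
  );
  const { embedding } = (await res.json()) as { embedding: number[] };

  // 2. Order rows by distance to the input embedding; closest first.
  const { rows } = await client.query(
    "SELECT query FROM searches ORDER BY embedding <-> $1::vector LIMIT 5",
    [JSON.stringify(embedding)]
  );
  return rows.map((row) => row.query);
}
```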
There you go, AI-powered search!
Conclusion
This is one of many approaches to implement vector database search without breaking the bank. For high-performance solutions, you should always consider managed vector database services. I am, however, happy with my current setup, and it's sufficient for my use cases.
References
- pgvector: https://github.com/pgvector/pgvector
- Cloudflare Workers AI: https://developers.cloudflare.com/workers-ai/
- My actual script used in production: https://gist.github.com/charlesteh/723f2daf51b041287e02b9f89c1e02c7
- Hugging Face MTEB Leaderboard: https://huggingface.co/spaces/mteb/leaderboard