Stage 3 - going beyond keyword search

Why embedding-based retrieval beats keyword search for natural language queries, with an implementation using OpenAI, Pinecone and Chroma.

Ben Rees - 21 April 2025

When building search tools, intelligent assistants, or AI-driven Q&A systems, one of the most foundational decisions you’ll make is how to retrieve relevant content. Historically, most systems have used keyword-based search, which is great for basic use cases but easily confused by natural language or synonyms.

That’s where embedding-based retrieval comes in.

In this guide, I’ll break down:

The difference between keyword and embedding-based retrieval
Real-world pros and cons
A step-by-step implementation using OpenAI and Pinecone
An alternative local setup using Chroma

Keyword Search vs. Embedding Search

Keyword-Based Retrieval

How it works:
Searches for exact matches between your query and stored content. Works best when both use the same words.

Example:
Query: "What is vector search?"
Returns docs with the exact phrase "vector search".
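
To make the exact-match behaviour concrete, here is a minimal sketch of keyword matching in plain Python. It is a deliberate simplification: production engines such as Elasticsearch add inverted indexes, stemming and relevance scoring on top of this idea. The helper names and the two-document corpus are made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_search(query, documents):
    """Return the ids of documents that contain every token of the query."""
    query_tokens = set(tokenize(query))
    hits = []
    for doc_id, text in documents.items():
        if query_tokens <= set(tokenize(text)):
            hits.append(doc_id)
    return hits

# Hypothetical corpus, for illustration only.
docs = {
    "a": "Vector search finds nearest neighbours in embedding space.",
    "b": "Meaning-based retrieval ranks documents by semantic similarity.",
}

print(keyword_search("vector search", docs))    # ['a']
print(keyword_search("semantic search", docs))  # []
```

The second query finds nothing even though document "b" is clearly about the same topic, because no single document contains both query words.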

Pros:

Very fast and low-resource
Easy to explain why a match was returned
Great for structured and exact-match data

Cons:

Doesn’t understand synonyms or phrasing differences unless you configure them by hand
Misses relevant documents that use different words from the query

Embedding-Based Retrieval (Semantic Search)

How it works:
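Both queries and documents are converted into dense vectors using an embedding model (like OpenAI's text-embedding-ada-002). The system then ranks documents by how close their vectors are to the query vector, commonly measured with cosine similarity, so texts with similar meaning match even when they share few words.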

Example:
Query: "How does semantic search work?"
Returns docs about "meaning-based search" even if the words are different.
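
The comparison step can be sketched with cosine similarity, one common way to measure how close two vectors are. The three-dimensional vectors below are invented stand-ins for real embeddings, which have many more dimensions (1536 for text-embedding-ada-002).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for real embeddings.
query_vec = [0.9, 0.1, 0.0]      # "How does semantic search work?"
similar_doc = [0.8, 0.2, 0.1]    # a doc about meaning-based search
unrelated_doc = [0.0, 0.1, 0.9]  # a doc about something else

similar_score = cosine_similarity(query_vec, similar_doc)
unrelated_score = cosine_similarity(query_vec, unrelated_doc)
print(similar_score > unrelated_score)  # True
```

The related document scores close to 1 and the unrelated one close to 0, regardless of which words either text contains.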

Pros:

Understands intent, not just keywords
Great for unstructured content and natural queries
Can surface more relevant results even if phrasing is varied

Cons:

More computationally intensive
Results are harder to explain (based on vector math)
Requires pre-trained models and a vector database

Feature Comparison Table

Feature	Keyword-Based Retrieval	Embedding-Based Retrieval
Search Logic	Matches words exactly	Matches by meaning
Flexibility	Low	High
Speed	Fast	Slower
Resource Use	Low	Higher
Explainability	High	Low
Best For	Structured search	Chatbots, recommendation, unstructured data
Common Tools	Elasticsearch, Solr	Pinecone, Chroma, FAISS

Setting Up Embedding-Based Retrieval

Let’s build a basic semantic search system using:

OpenAI (text-embedding-ada-002)
Pinecone (hosted vector DB)
Chroma (optional local alternative)

1. Choose Your Tools

Embedding model:
OpenAI’s text-embedding-ada-002 or a local Hugging Face model.

Vector database:
Cloud: Pinecone (scalable, managed)
Local: Chroma (open-source, lightweight)

2. Install Required Libraries

pip install openai pinecone-client chromadb

3. Set API Keys

export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"

In Python:

import os
import openai

# Read the key exported above instead of hard-coding it in the source
openai.api_key = os.environ["OPENAI_API_KEY"]

4. Generate Embeddings

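import openai

def get_embedding(text):
    # openai>=1.0 exposes the endpoint as openai.embeddings.create
    # (the older openai.Embedding.create was removed in 1.0)
    response = openai.embeddings.create(
        input=text,
        model="text-embedding-ada-002",
    )
    return response.data[0].embedding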

documents = [
    {"id": "1", "text": "This is an introduction to embedding-based search."},
    {"id": "2", "text": "Embedding-based retrieval finds similar meanings."},
]

for doc in documents:
    doc["embedding"] = get_embedding(doc["text"])

5. Store in Pinecone

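import os
from pinecone import Pinecone, ServerlessSpec

# pinecone-client>=3.0 replaces pinecone.init() with a Pinecone client object
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# dimension must match the embedding model: ada-002 returns 1536-dimensional vectors.
# Serverless index in AWS us-east-1; change cloud/region to match your Pinecone project.
index_name = "embeddings-index"
pc.create_index(
    name=index_name,
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
index = pc.Index(index_name)

# Each vector is an (id, values, metadata) tuple
to_upsert = [(doc["id"], doc["embedding"], {"text": doc["text"]}) for doc in documents]
index.upsert(vectors=to_upsert)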

6. Perform a Semantic Search

query = "How does semantic search work?"
query_embedding = get_embedding(query)

# Pass the vector by keyword; recent Pinecone clients reject positional arguments
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)

for match in results["matches"]:
    print(f"ID: {match['id']} | Score: {match['score']}")
    print(f"Text: {match['metadata']['text']}\n")
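
Conceptually, a query like this scores the stored vectors against the query vector and returns the top_k best. The sketch below does that by brute force in plain Python. It is not how Pinecone works internally (managed vector stores typically use approximate nearest-neighbour indexes to avoid scanning everything), and the function names and two-dimensional vectors are invented for illustration.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def brute_force_query(query_vec, stored, top_k=5):
    """Score every stored vector against the query and keep the best top_k.

    `stored` maps an id to its vector. Returns (id, score) pairs, best first.
    """
    q = normalize(query_vec)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in stored.items()
    )
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])

# Made-up 2-dimensional vectors; real ada-002 embeddings have 1536 dimensions.
stored = {"1": [1.0, 0.0], "2": [0.6, 0.8], "3": [0.0, 1.0]}
matches = brute_force_query([1.0, 0.1], stored, top_k=2)
for doc_id, score in matches:
    print(f"ID: {doc_id} | Score: {score:.3f}")
```

Scanning every vector is fine for a few thousand documents; beyond that, the cost of the full scan is the main reason to reach for a dedicated vector database.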

Optional: Use Chroma for Local Embedding Search

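import chromadb

# In-memory client: data is lost when the process exits
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("documents")

collection.add(
    ids=[doc["id"] for doc in documents],
    documents=[doc["text"] for doc in documents],
    embeddings=[doc["embedding"] for doc in documents],
)

# Embed the query with the same model as the documents. Using query_texts here
# would fall back to Chroma's default embedding model, whose vectors have a
# different dimension from the stored ada-002 embeddings.
query_embedding = get_embedding("How does embedding retrieval work?")
query_result = collection.query(query_embeddings=[query_embedding], n_results=5)
print(query_result)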

Evaluate the Results

Once you’re set up:

Check result relevance
Tune your top_k or switch models if needed
Add keyword filtering for hybrid search
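
One way to make "check result relevance" measurable is recall@k over a small set of hand-labelled queries. This is a sketch under assumptions: the labelled data below is invented, and in practice the retrieved ids would come from your own search results.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear in the first k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top = set(retrieved_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)

# Hypothetical labelled queries: what the system returned, in rank order,
# versus what a human judged relevant.
labelled = [
    {"retrieved": ["2", "1", "7"], "relevant": ["1", "2"]},
    {"retrieved": ["5", "3", "9"], "relevant": ["3", "4"]},
]

for k in (1, 3):
    scores = [recall_at_k(q["retrieved"], q["relevant"], k) for q in labelled]
    print(f"recall@{k}: {sum(scores) / len(scores):.2f}")
```

Re-running the same labelled set after changing top_k or the embedding model gives a like-for-like comparison instead of a gut feeling.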

You now have a foundation for building:

Intelligent assistants
Internal knowledge base search
Chatbots that retrieve based on meaning

What’s Next?

You can scale this up to thousands or millions of documents. Consider:

Crawling blogs, docs, or Notion pages
Combining embeddings with filters or metadata
Using hybrid keyword + embedding pipelines for speed and precision
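
As a rough illustration of a hybrid pipeline, the sketch below blends a keyword-overlap score with a cosine similarity score. This is only one possible design: the weight alpha, the function names and the toy vectors are all assumptions, and real systems often combine a proper keyword ranker with vector search instead of simple token overlap.

```python
import math
import re

def keyword_score(query, text):
    """Fraction of query tokens that also occur in the text (0.0 to 1.0)."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    t = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(q & t) / len(q) if q else 0.0

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_rank(query, query_vec, docs, alpha=0.7):
    """Rank docs by alpha * semantic score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        semantic = cosine(query_vec, doc["embedding"])
        lexical = keyword_score(query, doc["text"])
        scored.append((doc["id"], alpha * semantic + (1 - alpha) * lexical))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up documents with 2-dimensional stand-in embeddings.
docs = [
    {"id": "1", "text": "Pricing for the search API", "embedding": [0.2, 0.9]},
    {"id": "2", "text": "How meaning-based retrieval works", "embedding": [0.9, 0.3]},
]
ranking = hybrid_rank("how does semantic search work", [1.0, 0.2], docs)
print([doc_id for doc_id, _ in ranking])  # ['2', '1']
```

Both documents share exactly one word with the query, so the keyword score alone cannot separate them; the semantic component is what puts document "2" first.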