Ben Rees
AI / GEO / AEO / Content Strategy

Stage 3 - going beyond keyword search

Ben Rees - 21 April 2025

When building search tools, intelligent assistants, or AI-driven Q&A systems, one of the most foundational decisions you’ll make is how to retrieve relevant content. Most systems have historically used keyword-based search, which is great for basic use cases but easily confused by natural language or synonyms.

That’s where embedding-based retrieval comes in.

In this guide, I’ll break down:

  • The difference between keyword and embedding-based retrieval
  • Real-world pros and cons
  • A step-by-step implementation using OpenAI and Pinecone
  • An alternative local setup using Chroma

Keyword Search vs. Embedding Search

Keyword-Based Retrieval

How it works:
Matches the words in your query against the words in stored content, usually through an inverted index and a ranking function such as BM25. Works best when both use the same words.

Example:
Query: "What is vector search?"
Returns docs containing the words "vector" and "search" (or the exact phrase "vector search", if you run a phrase query).

Pros:

  • Very fast and low-resource
  • Easy to explain why a match was returned
  • Great for structured and exact-match data

Cons:

  • Doesn’t understand synonyms or phrasing differences
  • Fails if the words aren’t an exact match
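To make the mechanics concrete, here is a minimal sketch of keyword retrieval in plain Python: an inverted index maps each word to the documents containing it, and documents are scored by how many distinct query words they share. Real engines such as Elasticsearch add stemming and BM25 ranking on top; this toy version (the `build_index` and `keyword_search` names are illustrative) only shows why synonyms fall through.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: word -> set of doc ids that contain it
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            inverted[term].add(doc_id)
    return inverted


def keyword_search(inverted, query):
    # Score each doc by the number of distinct query words it contains
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in inverted.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by doc id for stable output
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {
    "a": "Vector search compares embeddings.",
    "b": "Meaning-based retrieval finds similar ideas.",
}
inverted_index = build_index(docs)
print(keyword_search(inverted_index, "What is vector search?"))
print(keyword_search(inverted_index, "semantic lookup"))
```

The first query finds doc `a`; the second returns nothing, because neither "semantic" nor "lookup" appears in either document, even though doc `b` is about the same idea. That is the synonym problem from the cons list.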

Embedding-Based Retrieval (Semantic Search)

How it works:
Both queries and documents are converted into dense vectors (embeddings) by a machine learning model such as OpenAI's text-embedding-ada-002. The system then compares those vectors, typically with cosine similarity, so documents with a similar meaning score highly even when they share few words.

Example:
Query: "How does semantic search work?"
Returns docs about "meaning-based search" even if the words are different.
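Comparing vectors usually means cosine similarity: the dot product of two vectors divided by the product of their lengths, which approaches 1 when they point the same way. A minimal sketch with made-up 3-dimensional vectors standing in for real embeddings (text-embedding-ada-002 actually produces 1536 dimensions):

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only
query = [0.9, 0.1, 0.0]        # "How does semantic search work?"
doc_meaning = [0.8, 0.2, 0.1]  # a doc about "meaning-based search"
doc_cooking = [0.0, 0.1, 0.9]  # an unrelated doc

print(round(cosine_similarity(query, doc_meaning), 3))
print(round(cosine_similarity(query, doc_cooking), 3))
```

The query scores far higher against the "meaning-based search" vector than against the unrelated one, without any words being compared.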

Pros:

  • Understands intent, not just keywords
  • Great for unstructured content and natural queries
  • Can surface more relevant results even if phrasing is varied

Cons:

  • More computationally intensive
  • Results are harder to explain (based on vector math)
  • Requires pre-trained models and a vector database

Feature Comparison Table

| Feature | Keyword-Based Retrieval | Embedding-Based Retrieval |
|---|---|---|
| Search Logic | Matches words exactly | Matches by meaning |
| Flexibility | Low | High |
| Speed | Fast | Slower |
| Resource Use | Low | Higher |
| Explainability | High | Low |
| Best For | Structured search | Chatbots, recommendation, unstructured data |
| Common Tools | Elasticsearch, Solr | Pinecone, Chroma, FAISS |

Setting Up Embedding-Based Retrieval

Let’s build a basic semantic search system using:

  • OpenAI (text-embedding-ada-002)
  • Pinecone (hosted vector DB)
  • Chroma (optional local alternative)

1. Choose Your Tools

Embedding model:
OpenAI’s text-embedding-ada-002 or a local Hugging Face model.

Vector database:
Cloud: Pinecone (scalable, managed)
Local: Chroma (open-source, lightweight)

2. Install Required Libraries

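pip install openai pinecone chromadb

(Pinecone's Python SDK is now published as `pinecone`; the older `pinecone-client` package name is deprecated.)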

3. Set API Keys

export OPENAI_API_KEY="your-openai-key"
export PINECONE_API_KEY="your-pinecone-key"

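In Python, the OpenAI SDK (v1 and later) reads OPENAI_API_KEY from the environment, so the key does not need to be hardcoded:

from openai import OpenAI

openai_client = OpenAI()  # uses the OPENAI_API_KEY environment variable

4. Generate Embeddings

def get_embedding(text):
    # text-embedding-ada-002 returns a 1536-dimensional vector
    response = openai_client.embeddings.create(input=text, model="text-embedding-ada-002")
    return response.data[0].embedding

documents = [
    {"id": "1", "text": "This is an introduction to embedding-based search."},
    {"id": "2", "text": "Embedding-based retrieval finds similar meanings."},
]

# Attach an embedding to each document
for doc in documents:
    doc["embedding"] = get_embedding(doc["text"])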
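The example documents are single sentences, but crawled pages are usually much longer, and embedding models cap their input (8,191 tokens for text-embedding-ada-002). A common approach is to split long text into overlapping chunks and embed each chunk separately. A minimal sketch, assuming word counts as a rough stand-in for tokens; `chunk_words`, `to_documents`, and the 200/40 defaults are illustrative choices, not values from this guide:

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks of at most chunk_size words.

    Words approximate tokens here; a real tokenizer would be more precise.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def to_documents(doc_id, text, **chunk_kwargs):
    # One record per chunk, each with its own id derived from the source doc
    return [
        {"id": f"{doc_id}-{i}", "text": chunk}
        for i, chunk in enumerate(chunk_words(text, **chunk_kwargs))
    ]


long_text = "word " * 450
chunks = chunk_words(long_text)
print(len(chunks), [len(c.split()) for c in chunks])
```

Because `to_documents` returns dicts with `id` and `text` keys, its output has the same shape as the hand-written `documents` list and can go through the same `get_embedding` loop.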

5. Store in Pinecone

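The Pinecone SDK (v3 and later) works through a `Pinecone` client object, and serverless indexes are placed with a cloud provider and region:

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "embeddings-index"

# Create the index once; its dimension must match the model's output (1536 for ada-002)
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index(index_name)

# Each record is (id, vector, metadata); the text is kept as metadata for display
to_upsert = [(doc["id"], doc["embedding"], {"text": doc["text"]}) for doc in documents]
index.upsert(vectors=to_upsert)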
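Two documents fit comfortably in one `upsert` call, but Pinecone limits how much data a single request can carry, so larger collections are normally upserted in batches. A small helper for that, sketched under the assumption that around 100 vectors per batch suits your data (check Pinecone's current request limits); `batched` is a hypothetical helper name, and Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists:

```python
from itertools import islice


def batched(items, batch_size):
    # Yield successive lists of at most batch_size items from any iterable
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


print(list(batched(range(7), 3)))
```

With the helper in place, the single call becomes `for batch in batched(to_upsert, 100): index.upsert(vectors=batch)`.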

6. Perform a Semantic Search

query = "How does semantic search work?"
query_embedding = get_embedding(query)

# Pass the vector as a keyword argument; recent SDK versions reject positional arguments to query()
results = index.query(vector=query_embedding, top_k=5, include_metadata=True)

for match in results.matches:
    print(f"ID: {match.id} | Score: {match.score:.3f}")
    print(f"Text: {match.metadata['text']}\n")
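Conceptually, `index.query` scores stored vectors against the query vector and returns the `top_k` best. Vector databases such as Pinecone typically use approximate nearest-neighbour indexes to do this quickly at scale, but an exact brute-force version is a handy baseline for sanity-checking results on small collections. A minimal sketch over an in-memory `{id: vector}` dict (`brute_force_query` is an illustrative name, not a Pinecone API):

```python
import heapq
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_query(vectors, query_vector, top_k=5):
    """Exact nearest-neighbour search: score every vector, keep the top_k.

    Returns (id, score) pairs, highest cosine similarity first.
    """
    scored = ((doc_id, cosine(vec, query_vector)) for doc_id, vec in vectors.items())
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(brute_force_query(toy_vectors, [1.0, 0.1], top_k=2))
```

Called with `{doc["id"]: doc["embedding"] for doc in documents}` and `query_embedding`, it should return the same ordering as the Pinecone query for a collection this small.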

Optional: Use Chroma for Local Embedding Search

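Chroma runs in-process, so storage needs no server or API key (the embeddings below still come from OpenAI):

import chromadb

# In-memory client; chromadb.PersistentClient(path="...") keeps data on disk instead
chroma_client = chromadb.Client()
collection = chroma_client.get_or_create_collection("documents")

# Add all documents in one call, reusing the OpenAI embeddings computed earlier
collection.add(
    ids=[doc["id"] for doc in documents],
    documents=[doc["text"] for doc in documents],
    embeddings=[doc["embedding"] for doc in documents],
)

# Embed the query with the same model as the documents; passing query_texts instead
# would use Chroma's default embedding model, whose vectors have a different dimension
query_result = collection.query(
    query_embeddings=[get_embedding("How does embedding retrieval work?")],
    n_results=min(5, collection.count()),
)
print(query_result)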
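The printed `query_result` reports `distances` rather than similarity scores, so lower means closer. Chroma's default distance is squared Euclidean (L2), and OpenAI states that its embeddings are normalized to length 1. For unit vectors, ||a - b||² = ||a||² + ||b||² - 2(a · b) = 2 - 2·cos(a, b), so ranking by Chroma's L2 distance gives the same order as ranking by cosine similarity. A quick numerical check of that identity on random unit vectors (helper names are illustrative):

```python
import math
import random


def normalize(v):
    # Scale a vector to unit length
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


random.seed(0)
a = normalize([random.uniform(-1, 1) for _ in range(8)])
b = normalize([random.uniform(-1, 1) for _ in range(8)])

# For unit vectors the two printed numbers should agree (up to float rounding)
print(squared_l2(a, b), 2 - 2 * cosine(a, b))
```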


Evaluate the Results

Once you’re set up:

  • Check result relevance
  • Tune your top_k or switch models if needed
  • Add keyword filtering for hybrid search
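One way to make "check result relevance" measurable is to hand-label a few queries with the documents that should come back, then compute precision@k (the share of the top k results that are relevant) and recall@k (the share of relevant documents that made it into the top k) for several values of k. A minimal sketch; the labeled `runs` data is hypothetical:

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Share of the top-k retrieved ids that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of all relevant ids that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical labels: query -> (ids returned in rank order, ids judged relevant)
runs = {
    "How does semantic search work?": (["2", "1"], {"2"}),
    "What is embedding-based retrieval?": (["1", "2"], {"1", "2"}),
}

# Compare top_k values by averaging each metric over the labeled queries
for k in (1, 2):
    mean_p = sum(precision_at_k(ids, rel, k) for ids, rel in runs.values()) / len(runs)
    mean_r = sum(recall_at_k(ids, rel, k) for ids, rel in runs.values()) / len(runs)
    print(f"k={k}: precision@k={mean_p:.2f}, recall@k={mean_r:.2f}")
```

In this toy data, moving from k=1 to k=2 lowers mean precision (1.00 to 0.75) and raises mean recall (0.75 to 1.00), the usual trade-off to weigh when tuning `top_k`.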

You now have a foundation for building:

  • Intelligent assistants
  • Internal knowledge base search
  • Chatbots that retrieve based on meaning

What’s Next?

You can scale this up to thousands or millions of documents. Consider:

  • Crawling blogs, docs, or Notion pages
  • Combining embeddings with filters or metadata
  • Using hybrid keyword + embedding pipelines for speed and precision
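One widely used way to build a hybrid keyword + embedding pipeline is reciprocal rank fusion (RRF): run both searches separately, give each document 1 / (k + rank) from every result list it appears in, and sort by the total. Because it only uses ranks, the two systems' scores never have to be put on the same scale. A minimal sketch, assuming k = 60 (a commonly used default) and hypothetical document ids:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one fused ranking.

    Every list adds 1 / (k + rank) to each doc it contains (rank starts at 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_ranking = ["doc-a", "doc-c"]             # e.g. from a keyword index
embedding_ranking = ["doc-b", "doc-a", "doc-d"]  # e.g. from a vector index
print(reciprocal_rank_fusion([keyword_ranking, embedding_ranking]))
```

`doc-a` comes out on top because both searches returned it, which is the behaviour you want from a hybrid: agreement between keyword and semantic signals is rewarded.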