Vector Search & RAG
Embedding generation, vector storage backends, hybrid search with RRF fusion, auto-embedding via CDC, knowledge collections, and LLM-powered Q&A with source citations.
The atlas-vector crate provides embedding generation, vector storage, similarity search, and a full RAG (Retrieval-Augmented Generation) pipeline. It supports multiple embedding providers, pluggable vector backends, hybrid search combining vector and full-text results, automatic embedding on data changes, and an ask endpoint that answers questions using your data with source citations.
Overview
Every AltBase project can store vector embeddings alongside regular data and run similarity searches without external infrastructure. The default backend is pgvector, which runs inside your existing PostgreSQL database. For higher-scale workloads, you can switch to Redis, Azure AI Search, or Qdrant.
The RAG pipeline builds on top of vector search: upload documents to knowledge collections, chunk them with configurable overlap, embed each chunk, then query the collection with natural language. The ask endpoint retrieves relevant chunks, sends them to an LLM as context, and returns an answer with citations referencing the source documents.
Key Concepts
Embedding configs define how a table's data gets embedded. Each config specifies the source columns to concatenate, the target embedding column, the provider and model, and whether to auto-embed on changes. Configs are per-project and support different providers simultaneously.
Embedding providers generate vector representations of text. AltBase supports OpenAI (text-embedding-3-small at 1536 dimensions, text-embedding-3-large at 3072 dimensions), Azure OpenAI, Ollama for self-hosted models, and local ONNX models. Provider credentials are stored encrypted in the embedding config's provider_config field.
Batch embedding accepts up to 2048 text inputs per request. Inputs are sent to the provider in a single API call and the resulting vectors can optionally be stored directly into a table column.
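The 2048-input limit means large corpora must be split across requests. A minimal sketch of client-side batching, assuming the request payload shape shown in the code examples below (the `build_batches` helper is illustrative, not part of atlas-vector):

```python
# Split a large list of texts into request payloads that respect the
# 2048-inputs-per-request limit of /embeddings/generate.
MAX_BATCH = 2048

def build_batches(texts, provider="openai", model="text-embedding-3-small"):
    """Yield request payloads of at most MAX_BATCH inputs each."""
    for start in range(0, len(texts), MAX_BATCH):
        yield {
            "input": texts[start:start + MAX_BATCH],
            "provider": provider,
            "model": model,
        }

payloads = list(build_batches([f"doc {i}" for i in range(5000)]))
# 5000 inputs -> three payloads of 2048, 2048, and 904 inputs
```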
Vector backends determine where embeddings are stored and searched. pgvector is the default and requires no additional setup. Redis, Azure AI Search, and Qdrant are available for projects that need dedicated vector infrastructure. The backend is configured per-project.
Hybrid search combines vector similarity and PostgreSQL full-text search (`tsvector`/`ts_rank`). Results from both are merged using Reciprocal Rank Fusion (RRF), which interleaves rankings without requiring score normalization. Three search modes are available: vector (cosine similarity only), fulltext (PostgreSQL `ts_rank` only), and hybrid (both merged with RRF).
Auto-embedding uses the CDC (Change Data Capture) pipeline to detect inserts and updates on tables with auto_embed: true configs. When a relevant row changes, the source columns are concatenated, sent to the embedding provider, and the resulting vector is written back to the embedding column automatically.
Knowledge collections group uploaded documents for RAG. Documents are split into overlapping chunks (configurable chunk size and overlap), each chunk is embedded, and the collection becomes searchable. The search endpoint returns ranked chunks; the ask endpoint feeds those chunks to an LLM for synthesis.
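Chunking with overlap can be sketched as a sliding window over the document text. The character-based splitting and default sizes here are illustrative assumptions; the actual chunker may split on token or sentence boundaries:

```python
# Sketch: fixed-size chunking with configurable overlap, as used when a
# document is uploaded to a knowledge collection. Each chunk starts
# `overlap` characters before the previous chunk ends, so context at
# chunk boundaries is not lost.
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded and stored alongside its source document id so search results can cite their origin.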
LLM integration powers the ask endpoint. Supported LLM providers include OpenAI (GPT-4o, GPT-4o-mini), Azure OpenAI, and Ollama. Responses are streamed when the client accepts text/event-stream.
API Reference
Embedding Generation
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | `/v1/projects/{id}/embeddings/generate` | Generate embeddings for up to 2048 inputs | Service Key |
Embedding Configuration
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | `/v1/projects/{id}/embeddings/config` | Create embedding config for a table | Service Key |
| GET | `/v1/projects/{id}/embeddings/config` | List all embedding configs | Service Key |
| GET | `/v1/projects/{id}/embeddings/config/{config_id}` | Get a specific config | Service Key |
| DELETE | `/v1/projects/{id}/embeddings/config/{config_id}` | Soft-delete a config | Service Key |
Search
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | `/v1/projects/{id}/search` | Vector, fulltext, or hybrid search | API Key |
Vector Backend Settings
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | `/v1/projects/{id}/settings/vector` | Set vector backend for the project | Service Key |
| GET | `/v1/projects/{id}/settings/vector` | Get current vector backend config | Service Key |
Knowledge Collections (RAG)
| Method | Path | Description | Auth |
|---|---|---|---|
| POST | `/v1/projects/{id}/knowledge` | Create a knowledge collection | Service Key |
| GET | `/v1/projects/{id}/knowledge` | List knowledge collections | Service Key |
| POST | `/v1/projects/{id}/knowledge/{collection}/upload` | Upload and auto-chunk a document | Service Key |
| POST | `/v1/projects/{id}/knowledge/{collection}/search` | Semantic search over knowledge base | API Key |
| POST | `/v1/projects/{id}/knowledge/{collection}/ask` | Ask a question (LLM answer + citations) | API Key |
Code Examples
Configure embeddings for a table
```bash
curl -X POST http://localhost:3000/v1/projects/$PROJECT_ID/embeddings/config \
  -H "Authorization: Bearer $SERVICE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "table_name": "products",
    "source_columns": ["name", "description"],
    "embedding_column": "embedding",
    "provider": "openai",
    "model": "text-embedding-3-small",
    "dimensions": 1536,
    "api_key": "sk-...",
    "auto_embed": true
  }'
```
Generate embeddings and store them
```bash
curl -X POST http://localhost:3000/v1/projects/$PROJECT_ID/embeddings/generate \
  -H "Authorization: Bearer $SERVICE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["wireless noise-cancelling headphones", "mechanical keyboard RGB"],
    "provider": "openai",
    "model": "text-embedding-3-small",
    "store": {
      "table": "products",
      "column": "embedding",
      "ids": ["prod-1", "prod-2"]
    }
  }'
```
Hybrid search with RRF fusion
```bash
curl -X POST http://localhost:3000/v1/projects/$PROJECT_ID/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "table": "products",
    "query": "comfortable headphones for travel",
    "mode": "hybrid",
    "limit": 10
  }'
```
Upload a document to a knowledge collection
```bash
curl -X POST http://localhost:3000/v1/projects/$PROJECT_ID/knowledge/docs/upload \
  -H "Authorization: Bearer $SERVICE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "User Guide",
    "content": "Full text of your document...",
    "metadata": {"version": "2.0"}
  }'
```
Ask a question with RAG
```bash
curl -X POST http://localhost:3000/v1/projects/$PROJECT_ID/knowledge/docs/ask \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I reset my password?"}'
```
The response includes the LLM-generated answer and an array of source citations referencing the chunks used to produce the answer.
Switch vector backend to Qdrant
```bash
curl -X POST http://localhost:3000/v1/projects/$PROJECT_ID/settings/vector \
  -H "Authorization: Bearer $SERVICE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "backend": "qdrant",
    "backend_config": {
      "endpoint": "http://qdrant:6334",
      "api_key": "your-qdrant-key"
    }
  }'
```
Configuration
Environment Variables
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Default OpenAI API key for embedding generation |
| `ATLAS_OLLAMA_URL` | Ollama server URL for local embedding models |
Per-Project Provider Config
Provider credentials are stored per embedding config in the `provider_config` JSONB column. API keys are encrypted at rest and redacted (first 8 characters only) in GET responses.
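The redaction behavior described above can be illustrated with a tiny helper (the `redact_key` function is a sketch of the documented behavior, not the actual implementation):

```python
# Show only the first 8 characters of a stored API key, as GET
# responses do for provider_config.
def redact_key(api_key: str) -> str:
    return api_key[:8] + "..." if len(api_key) > 8 else api_key

redact_key("sk-abcdefghijkl")
# -> "sk-abcde..."
```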
Data Model: embedding_configs
| Column | Type | Description |
|---|---|---|
| `id` | UUID | Primary key |
| `project_id` | UUID | FK to projects |
| `table_name` | VARCHAR | Table to embed |
| `source_columns` | TEXT[] | Columns to concatenate for embedding input |
| `embedding_column` | VARCHAR | Column where vectors are stored |
| `provider` | VARCHAR | `openai`, `azure_openai`, `ollama`, `local` |
| `model` | VARCHAR | Model name (e.g., `text-embedding-3-small`) |
| `dimensions` | INT | Vector dimensions (default 1536) |
| `auto_embed` | BOOLEAN | Auto-embed on INSERT/UPDATE via CDC |
| `enabled` | BOOLEAN | Soft-delete flag |
| `provider_config` | JSONB | Encrypted API key, base_url, and provider options |
Data Model: project_vector_settings
| Column | Type | Description |
|---|---|---|
| `project_id` | UUID | Primary key |
| `backend` | VARCHAR | `pgvector`, `redis`, `azure_search`, `qdrant` |
| `backend_config` | JSONB | Backend-specific config (endpoint, api_key) |
| `updated_at` | TIMESTAMPTZ | Last updated |
Vector Backend Options
| Backend | Config Fields | Notes |
|---|---|---|
| `pgvector` | None required | Default. Uses the tenant PostgreSQL database. |
| `redis` | `endpoint`, `password` | Requires Redis Stack with the RediSearch module. |
| `azure_search` | `endpoint`, `api_key`, `index_name` | Azure AI Search service. |
| `qdrant` | `endpoint`, `api_key` | Self-hosted or Qdrant Cloud. |
How It Works
Auto-Embed Flow
1. An admin creates an embedding config with `auto_embed: true` for a table.
2. When a row is inserted or updated, the CDC pipeline detects the change.
3. The auto-embed module concatenates the configured source columns into a single text string.
4. The text is sent to the configured embedding provider (OpenAI, Azure, Ollama, or local ONNX).
5. The resulting vector is written to the `embedding_column` via a SQL `UPDATE` on that row.
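The flow above can be sketched as a single change handler. The event shape and the `embed`/`execute_sql` helpers are hypothetical stand-ins for the CDC event, the provider client, and the database connection; they are not the actual atlas-vector API:

```python
# Sketch of the auto-embed step reacting to one CDC change event.
def handle_change(event, config, embed, execute_sql):
    """Re-embed a row when it is inserted or updated."""
    if event["table"] != config["table_name"] or not config["auto_embed"]:
        return None
    # 1. Concatenate the configured source columns into one text string.
    row = event["row"]
    text = " ".join(str(row[col]) for col in config["source_columns"] if row.get(col))
    # 2. Send the text to the configured embedding provider.
    vector = embed(text, model=config["model"])
    # 3. Write the vector back to the embedding column for this row.
    execute_sql(
        f'UPDATE {config["table_name"]} SET {config["embedding_column"]} = %s WHERE id = %s',
        (vector, row["id"]),
    )
    return vector
```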
Hybrid Search Flow
1. The client sends a search request with `"mode": "hybrid"`.
2. The query text is embedded using the table's configured provider.
3. A vector similarity search runs using cosine distance (the `<=>` operator in pgvector).
4. A full-text search runs using PostgreSQL `tsvector` and `ts_rank`.
5. Both result sets are merged using Reciprocal Rank Fusion (RRF): `score = 1 / (k + rank)` for each result, summed across both lists.
6. The merged results are sorted by combined RRF score and returned up to the requested limit.
RAG Ask Flow
1. The client sends a question to `/knowledge/{collection}/ask`.
2. The question is embedded using the collection's configured embedding provider.
3. Top-K similar chunks are retrieved from the knowledge collection via vector search.
4. The retrieved chunks are formatted as context and sent to the configured LLM along with the original question.
5. The LLM generates a natural-language answer with citations referencing the source document chunks.
6. If the client accepts `text/event-stream`, the response is streamed token by token.
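Step 4 above amounts to assembling a prompt from the retrieved chunks. The template wording and `build_prompt` helper here are illustrative; the service's actual prompt is not documented:

```python
# Sketch: turn retrieved chunks into numbered context passages so the
# LLM can cite them as [1], [2], ... in its answer.
def build_prompt(question: str, chunks: list[dict]) -> str:
    context = "\n\n".join(
        f'[{i}] ({c["title"]}) {c["content"]}'
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The numbered markers are what allow the returned citations to be mapped back to specific source document chunks.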
Similarity Operators (pgvector)
| Operator | Function | Use Case |
|---|---|---|
| `<=>` | Cosine distance | Default for semantic similarity |
| `<->` | L2 (Euclidean) distance | Spatial and numeric data |
| `<#>` | Inner product (negative) | Pre-normalized vectors |
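For intuition, the three distances can be computed in plain Python for a pair of small vectors (this mirrors the pgvector operator definitions; it is a numeric illustration, not pgvector itself):

```python
import math

def cosine_distance(a, b):    # pgvector <=> : 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def l2_distance(a, b):        # pgvector <-> : Euclidean distance
    return math.hypot(*(x - y for x, y in zip(a, b)))

def neg_inner_product(a, b):  # pgvector <#> : negative inner product
    return -sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
# orthogonal vectors: cosine distance 1.0, L2 distance sqrt(2), <#> 0.0
```

All three operators return smaller values for closer vectors, which is why pgvector negates the inner product.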