Data Sources

Connect vector search backends for retrieval-augmented generation (RAG). Data sources provide semantic search over external knowledge bases.

Quick Start

builder.Services
    .AddCoreAIServices()
    .AddCoreAIOrchestration()

    // Add one or both backends:
    .AddCoreElasticsearchServices(
        builder.Configuration.GetSection("CrestApps:Elasticsearch"))
    .AddCoreAzureAISearchServices(
        builder.Configuration.GetSection("CrestApps:AzureAISearch"));

What is Vector Search?

If you are new to AI-powered search, here is a brief primer on the concepts that data sources rely on.

Embeddings

An embedding is a list of numbers (a vector) that represents the meaning of a piece of text. Two texts with similar meanings produce vectors that are close together in mathematical space — even if they use completely different words. For example, "How do I reset my password?" and "I forgot my login credentials" produce similar embeddings.

Embeddings are generated by specialized AI models (e.g., text-embedding-ada-002 from OpenAI or an equivalent model in Azure AI).

Similarity Search

Given a user's question, vector search converts it into an embedding and finds the documents whose embeddings are closest. This is fundamentally different from keyword search:

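Aspect	Keyword Search	Vector Search
Query: "password reset"	Matches documents containing "password" and "reset"	Matches documents about authentication help, even if those words don't appear
Handles synonyms?	Only with manually configured synonym lists	Yes
Handles typos?	Limited (requires fuzzy matching)	Often, since embeddings tolerate small spelling variations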
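
To make "closest embeddings" concrete, here is a minimal sketch of similarity search using cosine similarity and a brute-force scan. The class and method names are hypothetical and are not part of the framework; cosine similarity is one common metric and is assumed here for illustration. Production backends typically use approximate nearest-neighbor indexes rather than scoring every vector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not a framework type.
public static class SimilaritySketch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Ranges from -1 to 1; higher means the vectors point in more similar directions.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same dimension.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; treat it as having no similarity.
        if (normA == 0 || normB == 0)
        {
            return 0;
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Brute-force top-N: score every chunk against the query and keep the best N.
    public static IReadOnlyList<(string Id, double Score)> TopN(
        float[] query,
        IReadOnlyDictionary<string, float[]> chunks,
        int topN)
    {
        return chunks
            .Select(c => (Id: c.Key, Score: CosineSimilarity(query, c.Value)))
            .OrderByDescending(r => r.Score)
            .Take(topN)
            .ToList();
    }
}
```

Two vectors pointing in the same direction score 1 regardless of length, and orthogonal vectors score 0, which is why the ranking reflects direction (meaning) rather than magnitude.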

Retrieval-Augmented Generation (RAG)

RAG combines vector search with generative AI. Instead of asking the AI model to answer from its training data alone, you retrieve relevant documents and inject them into the prompt so the model can ground its response in your actual data. This reduces hallucinations, because the model can draw on retrieved text rather than recall alone, and lets it answer questions about private or recent information.
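
The "inject them into the prompt" step can be sketched as plain string assembly. This is a minimal illustration only: the class name, the `[doc:N]` label format, and the instruction wording are assumptions, and the framework's handlers may format the context differently.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not a framework type.
public static class RagPromptSketch
{
    // Appends retrieved chunks to the base system instructions so the model
    // can ground its answer in them. Each chunk gets a numbered label that
    // the model can cite in its response.
    public static string BuildSystemMessage(string baseInstructions, IReadOnlyList<string> chunks)
    {
        var sb = new StringBuilder(baseInstructions);

        // Nothing retrieved: leave the instructions untouched.
        if (chunks.Count == 0)
        {
            return sb.ToString();
        }

        sb.AppendLine();
        sb.AppendLine();
        sb.AppendLine("Answer using the context below. If the context is insufficient, say so.");

        for (var i = 0; i < chunks.Count; i++)
        {
            sb.Append("[doc:").Append(i + 1).Append("] ").AppendLine(chunks[i]);
        }

        return sb.ToString();
    }
}
```

Numbering the chunks is one simple way to let a response point back to its sources; any stable identifier would serve the same purpose.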

RAG Pipeline

Here is the end-to-end flow when a user sends a query to a profile with data sources configured:

1. User Query: "What is our refund policy?"
       │
       ▼
2. Embedding Generation
   └── The query is converted to a vector using an embedding model
       │
       ▼
3. Vector Search (IVectorSearchService)
   └── The vector is compared against indexed document chunks
   └── Top-N most similar chunks are returned (e.g., top 3)
       │
       ▼
4. Context Enrichment
   └── `DataSourceAICompletionContextBuilderHandler` selects the attached data source
   └── `DataSourceOrchestrationHandler` injects availability/tool guidance
   └── `DataSourcePreemptiveRagHandler` injects retrieved chunks as system context
       │
       ▼
5. AI Completion
   └── The model generates a response grounded in the retrieved documents
       │
       ▼
6. Response with Citations
   └── The response references the source documents

Tip: The number of document chunks retrieved (Top-N) is configurable per profile. A higher value provides more context but uses more tokens. The default is 3.

Architecture

Data sources integrate with the orchestration pipeline through three shared framework components:

DataSourceAICompletionContextBuilderHandler: copies the selected data source id into the completion context.
DataSourceOrchestrationHandler: injects data-source availability instructions and keeps the search tool in scope.
DataSourcePreemptiveRagHandler: performs preemptive retrieval and injects matching chunks into the system message.

The same shared framework layer also exposes IAIDataSourceIndexingService, which keeps knowledge-base indexes synchronized with their source indexes. MVC uses that service to rebuild data sources on demand, react to source-content changes, and run periodic background alignment.

User Query
    │
    ▼
DataSourceAICompletionContextBuilderHandler
    │ (attaches data source id)
    ▼
DataSourceOrchestrationHandler
    │ (adds data-source availability + tool guidance)
    ▼
DataSourcePreemptiveRagHandler
    │ (queries vector store)
    ▼
Completion Context (enriched with relevant documents)
    │
    ▼
AI Model (grounds response in retrieved data)

Common Services (Keyed by Provider Name)

Each data source backend registers these services, keyed by its provider name:

Service	Purpose
`IDataSourceContentManager`	Manages content in data source indices
`IDataSourceDocumentReader`	Reads documents from data source indices
`IAIDataSourceIndexingService`	Rebuilds and repairs knowledge-base indexes from source indexes
`IODataFilterTranslator`	Translates OData filters to backend-native queries
`ISearchIndexManager`	Creates, deletes, and manages search indices
`ISearchDocumentManager`	Indexes and removes documents in search indices
`IVectorSearchService`	Performs vector similarity search

Available Backends

Backend	Extension	Provider Name	Documentation
Elasticsearch	`AddCoreElasticsearchServices()`	`"Elasticsearch"`	Elasticsearch
Azure AI Search	`AddCoreAzureAISearchServices()`	`"AzureAISearch"`	Azure AI Search

Key Interfaces Deep Dive

`IVectorSearchService`

Performs vector similarity search against an index. This is the core search operation used during RAG.

public interface IVectorSearchService
{
    Task<IEnumerable<DocumentChunkSearchResult>> SearchAsync(
        IIndexProfileInfo indexProfile,
        float[] embedding,
        string referenceId,
        string referenceType,
        int topN,
        CancellationToken cancellationToken = default);
}

The embedding parameter is the vector representation of the user's query. The topN parameter controls how many chunks to return.

`ISearchDocumentManager`

Manages the lifecycle of documents within a search index — adding, updating, and removing documents.

public interface ISearchDocumentManager
{
    Task<bool> AddOrUpdateAsync(
        IIndexProfileInfo profile,
        IReadOnlyCollection<IndexDocument> documents,
        CancellationToken cancellationToken = default);

    Task DeleteAsync(
        IIndexProfileInfo profile,
        IEnumerable<string> documentIds,
        CancellationToken cancellationToken = default);

    Task DeleteAllAsync(
        IIndexProfileInfo profile,
        CancellationToken cancellationToken = default);
}

`ISearchIndexManager`

Creates and manages search indexes themselves (not the documents within them).

public interface ISearchIndexManager
{
    Task<bool> ExistsAsync(string indexFullName, CancellationToken cancellationToken = default);

    Task CreateAsync(
        IIndexProfileInfo profile,
        IReadOnlyCollection<SearchIndexField> fields,
        CancellationToken cancellationToken = default);

    Task DeleteAsync(string indexFullName, CancellationToken cancellationToken = default);
}

`IDataSourceContentManager`

A higher-level service that searches for document chunks with optional OData filtering and manages data source content.

public interface IDataSourceContentManager
{
    Task<IEnumerable<DataSourceSearchResult>> SearchAsync(
        IIndexProfileInfo indexProfile,
        float[] embedding,
        string dataSourceId,
        int topN,
        string filter = null,
        CancellationToken cancellationToken = default);

    Task<long> DeleteByDataSourceIdAsync(
        IIndexProfileInfo indexProfile,
        string dataSourceId,
        CancellationToken cancellationToken = default);
}

The optional filter parameter accepts an OData expression that is translated to the backend-native query language via IODataFilterTranslator.

`IAIDataSourceIndexingService`

Coordinates full and partial synchronization between a source index and its AI knowledge-base index.

public interface IAIDataSourceIndexingService
{
    Task SyncAllAsync(CancellationToken cancellationToken = default);
    Task SyncDataSourceAsync(AIDataSource dataSource, CancellationToken cancellationToken = default);
    Task SyncSourceDocumentsAsync(IEnumerable<string> documentIds, CancellationToken cancellationToken = default);
    Task RemoveSourceDocumentsAsync(IEnumerable<string> documentIds, CancellationToken cancellationToken = default);
    Task DeleteDataSourceDocumentsAsync(AIDataSource dataSource, CancellationToken cancellationToken = default);
}

SyncDataSourceAsync() performs a full rebuild for a data source by deleting that data source's existing chunk documents and re-reading the mapped source index through IDataSourceDocumentReader. SyncSourceDocumentsAsync() and RemoveSourceDocumentsAsync() are intended for source-level handlers so article or catalog updates can keep the knowledge-base index aligned without waiting for the next scheduled full sync.

`IDataSourceDocumentReader`

Reads raw documents from a source index, typically used during re-indexing or migration.

public interface IDataSourceDocumentReader
{
    IAsyncEnumerable<KeyValuePair<string, SourceDocument>> ReadAsync(
        IIndexProfileInfo indexProfile,
        string keyFieldName,
        string titleFieldName,
        string contentFieldName,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<KeyValuePair<string, SourceDocument>> ReadByIdsAsync(
        IIndexProfileInfo indexProfile,
        IEnumerable<string> documentIds,
        string keyFieldName,
        string titleFieldName,
        string contentFieldName,
        CancellationToken cancellationToken = default);
}
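
Because both reader methods return IAsyncEnumerable, callers consume documents as a stream rather than loading a whole index into memory. The sketch below shows one way to group such a stream into fixed-size batches before further processing. BatchAsync is a hypothetical helper, not a framework API, and the choice of batch size is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not a framework type.
public static class AsyncBatchSketch
{
    // Groups an async stream into lists of at most batchSize items.
    // The final batch may be smaller when the stream length is not a multiple of batchSize.
    public static async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(
        IAsyncEnumerable<T> source,
        int batchSize,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        var batch = new List<T>(batchSize);

        await foreach (var item in source.WithCancellation(cancellationToken))
        {
            batch.Add(item);

            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // Start a fresh list; the caller owns the old one.
            }
        }

        // Flush whatever is left over.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

The stream returned by ReadAsync could be passed as source, with each batch mapped to index documents before being handed to a document manager. That mapping depends on the framework's document types and is not shown here.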

`IODataFilterTranslator`

Translates OData $filter expressions into backend-native query syntax. Each backend (Elasticsearch, Azure AI Search) has its own implementation.

public interface IODataFilterTranslator
{
    string Translate(string odataFilter);
}

For example, the Elasticsearch translator converts category eq 'support' into an Elasticsearch query DSL filter targeting the filters.category field.
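
As a rough illustration of that translation, the sketch below handles only a single `field eq 'value'` clause and emits an Elasticsearch term query as JSON. The class name, the regular expression, and the choice of a term query are assumptions made for this example; the framework's translator may support more operators and produce a different query shape.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the framework's translator.
public static class ODataEqSketch
{
    // Matches a single clause of the form: <field> eq '<value>'
    private static readonly Regex EqPattern =
        new Regex(@"^\s*(\w+)\s+eq\s+'([^']*)'\s*$");

    public static string Translate(string odataFilter)
    {
        var match = EqPattern.Match(odataFilter ?? string.Empty);
        if (!match.Success)
        {
            throw new NotSupportedException($"Unsupported filter: {odataFilter}");
        }

        // Filterable fields are assumed to live under the "filters." prefix,
        // as in the category example above.
        var field = "filters." + match.Groups[1].Value;
        var value = match.Groups[2].Value;

        // Serialize through System.Text.Json so names and values are escaped correctly.
        var query = new Dictionary<string, object>
        {
            ["term"] = new Dictionary<string, string> { [field] = value },
        };

        return JsonSerializer.Serialize(query);
    }
}
```

With this sketch, `category eq 'support'` becomes `{"term":{"filters.category":"support"}}`, and any filter outside the single supported form is rejected rather than silently ignored.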

Adding a Custom Backend

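To add a custom vector store backend (e.g., Pinecone, Qdrant, Weaviate), implement and register the following six backend-specific keyed services. IAIDataSourceIndexingService is not in this list; as described under Architecture, it is exposed by the shared framework layer.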

const string providerName = "MyBackend";

builder.Services.AddKeyedScoped<IVectorSearchService, MyVectorSearchService>(providerName);
builder.Services.AddKeyedScoped<ISearchIndexManager, MySearchIndexManager>(providerName);
builder.Services.AddKeyedScoped<ISearchDocumentManager, MySearchDocumentManager>(providerName);
builder.Services.AddKeyedScoped<IDataSourceContentManager, MyDataSourceContentManager>(providerName);
builder.Services.AddKeyedScoped<IDataSourceDocumentReader, MyDataSourceDocumentReader>(providerName);
builder.Services.AddKeyedScoped<IODataFilterTranslator, MyODataFilterTranslator>(providerName);

Example: Custom Vector Search Implementation

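// Illustrative sketch: verify the Pinecone client types and method names
// against the SDK version you use.
// Register with AddKeyedScoped<IVectorSearchService, PineconeVectorSearchService>(providerName).
public sealed class PineconeVectorSearchService : IVectorSearchService
{
    private readonly PineconeClient _client;

    public PineconeVectorSearchService(PineconeClient client)
    {
        _client = client;
    }

    public async Task<IEnumerable<DocumentChunkSearchResult>> SearchAsync(
        IIndexProfileInfo indexProfile,
        float[] embedding,
        string referenceId,
        string referenceType,
        int topN,
        CancellationToken cancellationToken = default)
    {
        // Ask the vector store for the topN nearest neighbors of the query embedding.
        // Note: referenceId and referenceType are not applied in this example.
        var response = await _client.QueryAsync(new QueryRequest
        {
            Vector = embedding,
            TopK = topN,
            Namespace = indexProfile.IndexFullName,
            IncludeMetadata = true,
        }, cancellationToken);

        // Map each match to the framework's result type.
        return response.Matches.Select(match => new DocumentChunkSearchResult
        {
            DocumentId = match.Id,
            Score = match.Score,
            Content = match.Metadata["content"].ToString(),
        });
    }
}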

Warning: All six services must be registered with the same providerName key. The framework resolves them by key at runtime based on the data source configuration.
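
To show what resolving by key means in practice, here is a minimal sketch using the keyed-service APIs from Microsoft.Extensions.DependencyInjection (version 8.0 or later). The interface and classes are stand-ins invented for this example, not framework types, and the framework's own resolution code is not shown in this document.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

// Stand-in types for illustration only; not framework types.
public interface IStandInSearchService
{
    string Name { get; }
}

public sealed class MyBackendSearch : IStandInSearchService
{
    public string Name => "MyBackend";
}

public sealed class OtherBackendSearch : IStandInSearchService
{
    public string Name => "OtherBackend";
}

public static class KeyedResolutionSketch
{
    // Registers two implementations of the same interface under different keys,
    // then resolves the one whose key matches providerName.
    public static string ResolveName(string providerName)
    {
        var services = new ServiceCollection();
        services.AddKeyedScoped<IStandInSearchService, MyBackendSearch>("MyBackend");
        services.AddKeyedScoped<IStandInSearchService, OtherBackendSearch>("OtherBackend");

        using var provider = services.BuildServiceProvider();

        // Scoped services should be resolved from a scope, not the root provider.
        using var scope = provider.CreateScope();

        // Throws InvalidOperationException if nothing is registered under the key.
        var service = scope.ServiceProvider
            .GetRequiredKeyedService<IStandInSearchService>(providerName);

        return service.Name;
    }
}
```

A key with no registration fails at resolution time, which is why a service registered under a mismatched key tends to surface as a runtime error rather than at startup.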

Configuration Guide

Data source backends are configured in appsettings.json under the CrestApps section. Each backend has its own configuration section:

{
  "CrestApps": {
    "Elasticsearch": {
      "Url": "https://localhost:9200",
      "Username": "elastic",
      "Password": "your-password"
    },
    "AzureAISearch": {
      "Endpoint": "https://my-search.search.windows.net",
      "ApiKey": "your-admin-api-key"
    }
  }
}

The configuration section is passed to the provider registration extension method:

// Bind from configuration
builder.Services.AddCoreElasticsearchServices(
    builder.Configuration.GetSection("CrestApps:Elasticsearch"));

// Or bind Azure AI Search
builder.Services.AddCoreAzureAISearchServices(
    builder.Configuration.GetSection("CrestApps:AzureAISearch"));

See the individual backend pages for detailed configuration options: Elasticsearch | Azure AI Search
