Building Enterprise AI Agents with Semantic Kernel and RAG in .NET

1 The Modern Enterprise Knowledge Challenge

Enterprise teams are increasingly adding AI assistants to internal tools, but most quickly run into the limits of simple RAG implementations. A basic “vector search → LLM answer” pipeline works for demos and narrow use cases, yet it starts to fail once it is exposed to real enterprise conditions: cross-department questions, inconsistent data sources, strict security boundaries, and audit requirements. The problem is not retrieval alone. It is coordination, reasoning, and trust.

This is where AI agents become necessary. In this context, an AI agent is not just a prompt wrapped around a vector store. An enterprise AI agent is a system that can reason over a goal, decide which tools to use, execute multi-step retrieval and actions, and produce answers that remain grounded, permission-aware, and traceable. The goal of this guide is to show how to build such agents—ones that operate reliably across SharePoint, shared drives, and line-of-business systems using .NET and Semantic Kernel.

Before getting into architecture and implementation, it is important to understand why traditional RAG breaks down and what additional capabilities enterprise agents must provide.

1.1 Beyond Simple RAG: Why Agentic Retrieval Matters

A simple RAG workflow assumes the user query is already well-formed, maps cleanly to stored documents, and can be answered through a single retrieval step followed by text generation. In enterprise environments, those assumptions rarely hold. Real questions are often incomplete, cross-cutting, or expressed in business language that does not match document structure.

Consider the question:

“How do I request a contractor laptop when onboarding someone next month?”

A basic RAG system will embed the question and retrieve a handful of semantically similar documents. Sometimes that works. More often, it produces partial or misleading answers because the system does not understand the intent behind the question.

An enterprise AI agent handles this differently. It treats the query as a goal, not a search string. To answer correctly, the agent must:

  1. Recognize that the request spans multiple domains (HR onboarding and IT device provisioning).
  2. Detect missing constraints such as location, employment type, or lead time.
  3. Decide which tools to invoke (HR policy search, IT SOP lookup, procurement guidelines).
  4. Sequence those steps correctly and synthesize a single response.
  5. Attach citations that reflect where each part of the answer came from.

This behavior is what distinguishes an agent from simple RAG. The agent is autonomous within defined boundaries. It reasons, selects tools, and performs multi-step retrieval before generating an answer.
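To make this concrete, the goal decomposition above could be captured as a structured plan object that the agent executes step by step. The following is a minimal sketch; all type names and tool names are hypothetical, not a framework API:

```csharp
// Hypothetical shape of a plan an orchestrator might produce for the
// contractor-laptop question. Names are illustrative only.
public sealed record PlanStep(string Tool, string Query, string Domain);

public sealed record AgentPlan(
    string Goal,
    IReadOnlyList<string> MissingConstraints,   // e.g. location, employment type
    IReadOnlyList<PlanStep> Steps);

var plan = new AgentPlan(
    Goal: "Contractor laptop request during onboarding",
    MissingConstraints: new[] { "location", "employment type", "lead time" },
    Steps: new[]
    {
        new PlanStep("HrPolicySearch",        "contractor onboarding checklist", "HR"),
        new PlanStep("ItSopLookup",           "device provisioning SOP",         "IT"),
        new PlanStep("ProcurementGuidelines", "hardware request lead times",     "Procurement")
    });
```

Representing the plan as data, rather than leaving it implicit in a prompt, is what later makes routing, auditing, and citation tracking possible.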

At a high level, the enterprise agent you are building in this guide will support these core capabilities:

  • Goal-driven reasoning rather than single-step retrieval
  • Dynamic tool selection across SharePoint, shared drives, and APIs
  • Domain routing to avoid mixing HR, IT, and Operations content
  • Memory-backed retrieval with hybrid semantic and keyword search
  • Provenance tracking so every claim can be verified

Agentic retrieval introduces additional steps that adapt at runtime instead of following a fixed pipeline:

  • Query refactoring, where ambiguous questions are expanded or structured.
  • Query routing, where parts of the request are sent to the correct domain or plugin.
  • Multi-stage retrieval, combining keyword search, vector search, metadata filters, and parent-document reconstruction.

These steps allow the system to behave more like a senior employee who knows where to look and how to connect information, rather than a search box that returns loosely related text.

1.2 The Challenge of Departmental Silos: How Agents Work Across HR, IT, and Operations

Most enterprises do not have a single, clean knowledge store. Information is spread across systems that evolved independently, each with its own structure and conventions. HR policies live in tightly controlled SharePoint libraries. IT procedures are scattered across wikis, ticket exports, and network shares. Operations teams rely heavily on spreadsheets, diagrams, and vendor documentation.

This fragmentation is not just a data problem. It is an agent coordination problem. The agent must understand how to navigate these silos without blending incompatible sources or producing answers that violate domain boundaries.

1.2.1 Handling Heterogeneous Structures Through Dynamic Tool Selection

HR documents tend to be formal, versioned, and policy-driven. IT documentation is often procedural and updated frequently. Operations content may encode meaning in tables, formulas, or diagrams rather than prose. Treating all of this as plain text embeddings produces unreliable results.

An enterprise agent addresses this by selecting different tools and retrieval strategies based on the domain. For example, HR queries may prioritize SharePoint connectors with strong version metadata, while IT queries may rely more on shared-drive ingestion or live Graph searches. The agent does not assume a single representation fits all content types.

1.2.2 Resolving Divergent Taxonomies with Agent-Level Routing

Departments often describe the same concept using different terms. What IT calls “end-user computing,” HR might call “employee equipment,” and Operations might refer to “asset provisioning.” A keyword-based system struggles here.

An agent mitigates this through classification and routing. Before retrieval, it classifies the intent of the query and maps it to the correct domain vocabulary. This prevents the agent from pulling irrelevant documents simply because the wording happens to be similar. Routing also constrains which plugins and indexes the agent is allowed to use for a given request.

1.2.3 Staying Aligned with Asynchronous Update Cycles

Different systems change at different speeds. SharePoint content may update weekly after approvals, while shared drives change multiple times a day. If ingestion pipelines are not aligned, the agent will answer confidently using outdated information.

Agents reduce this risk by combining pre-indexed memory with real-time tools. Stable policy documents come from indexed stores, while time-sensitive information can be fetched live. The agent decides when freshness matters and adjusts its retrieval strategy accordingly.

The result is a system that presents a unified interface to the user while respecting the realities of how enterprise knowledge is actually maintained.

1.3 Addressing the “Trust Gap”: Citations, Provenance, and Verifiable Answers

Enterprise adoption depends on trust. Users will not rely on an AI agent if they cannot verify where answers come from. This is especially true for HR, legal, security, and compliance scenarios. A confident answer without evidence is worse than no answer at all.

Citations are necessary, but for agents they serve a deeper purpose than simple attribution. Agents must track provenance across multi-step reasoning paths, not just attach a document link at the end. When an answer is assembled from multiple sources, the agent needs to remember which step produced which fact.

Verifiable references address several concerns:

1.3.1 Transparency

Each statement in the response can be traced back to:

  • A specific document
  • A SharePoint item or file identifier
  • A page, section, or paragraph
  • A known version or timestamp

This makes it clear what the agent relied on, even when the reasoning involved multiple retrieval steps.

1.3.2 Accountability

When policies change, teams can verify whether the agent used the latest approved version or an older one. This matters in regulated environments where outdated guidance can create real risk.

1.3.3 Navigability

Deep links back to SharePoint or shared drives allow users to review the original context immediately. This turns the agent into a guided entry point to enterprise knowledge rather than a black box.

For this reason, enterprise RAG systems include a dedicated Reference Engine. Its job is to convert internal identifiers—such as SharePoint GUIDs or drive item IDs—into user-accessible links and maintain the mapping between retrieved chunks and generated answers. Without this layer, agent responses cannot be reliably audited or trusted.


2 The .NET Agent Technology Stack

Building enterprise AI agents in .NET is less about choosing a single framework and more about assembling a reliable agent runtime. That runtime needs to support planning, tool execution, memory, security, and observability—while remaining predictable enough for production use. This section focuses on the parts of the .NET ecosystem that directly enable agent behavior, not just generic AI calls.

At a high level, the stack breaks down into four layers:

  1. Agent orchestration (how decisions and steps are coordinated)
  2. Model execution (LLMs and embeddings)
  3. Memory and retrieval (vector stores, hybrid search)
  4. Enterprise integrations (SharePoint, OneDrive, identity)

Each component plays a specific role in enabling agents to reason, act, and explain their answers.

2.1 Semantic Kernel and AutoGen: When to Use Each (and When to Combine Them)

Most .NET-based agent systems today are built using either Semantic Kernel (SK) or AutoGen. Both are agent frameworks, but they optimize for very different styles of reasoning and control.

2.1.1 Semantic Kernel: Deterministic Agent Orchestration

Semantic Kernel is designed for structured, tool-driven agents. It gives you strong control over what the agent can do, how it does it, and in what order.

In practice, SK is used to:

  • Register tools (plugins) such as SharePoint search, SQL queries, or REST APIs
  • Enable function calling with clear input and output contracts
  • Attach memory stores for retrieval and grounding
  • Enforce execution boundaries that matter in enterprise systems

This makes SK a good fit when the agent must interact with internal systems that expect predictable behavior—HR systems, IT service management tools, or financial APIs.

A representative, production-aligned example of configuring SK with Azure OpenAI looks like this:

var builder = Kernel.CreateBuilder();

builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",
    endpoint: config.Endpoint,
    apiKey: config.ApiKey);

builder.Plugins.AddFromType<SharePointSearchPlugin>();

Kernel kernel = builder.Build();

Here, the kernel becomes the execution engine for the agent. Every decision—whether to call a plugin, retrieve memory, or respond directly—flows through this controlled surface.
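To show that controlled surface in action, a plugin can expose functions via [KernelFunction] attributes and be invoked with automatic function calling. A minimal sketch, with the plugin body left as a placeholder:

```csharp
public sealed class SharePointSearchPlugin
{
    [KernelFunction("search_sharepoint")]
    [Description("Searches indexed SharePoint content for relevant documents.")]
    public async Task<string> SearchAsync(string query, int top = 5)
    {
        // Placeholder: call your search index here and return serialized results.
        await Task.CompletedTask;
        return $"[no results for '{query}']";
    }
}

// Allow the model to decide when to call the plugin.
var settings = new OpenAIPromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

var answer = await kernel.InvokePromptAsync(
    "How do I request VPN access?",
    new KernelArguments(settings));
```

With FunctionChoiceBehavior.Auto(), the kernel mediates every tool call, which is what keeps agent behavior observable and bounded.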

2.1.2 AutoGen: Multi-Agent and Emergent Reasoning

AutoGen approaches the problem from the opposite direction. Instead of strict orchestration, it focuses on conversation between agents. Each agent has a role and communicates with others to refine ideas or solve complex problems.

Common AutoGen patterns include:

  • A retrieval-focused agent that gathers evidence
  • A reasoning agent that synthesizes conclusions
  • A critique or validation agent that checks assumptions

This is useful when problems are open-ended or benefit from exploration. However, AutoGen requires more guardrails in enterprise environments, especially when tools can mutate state or access sensitive systems.

2.1.3 Combining Semantic Kernel and AutoGen

In practice, many mature systems use both frameworks together. A common pattern looks like this:

  • Semantic Kernel acts as the primary orchestrator, handling routing, tool access, permissions, and memory.
  • AutoGen is invoked as a sub-agent when deeper reasoning is required, such as policy interpretation or multi-step troubleshooting.
  • The SK orchestrator treats AutoGen like any other tool, with bounded input and output.

This hybrid approach gives teams the best of both worlds: predictable system behavior with the ability to escalate complex reasoning when needed.

2.2 LLM Selection for Agents: Practical Trade-Offs

Model selection for agents is different from model selection for simple chatbots. Agents run loops: classify, retrieve, call tools, reason, and respond. That puts pressure on token limits, function calling reliability, and structured output quality.

2.2.1 GPT-4o

GPT-4o is often the default model for enterprise agents because it performs well across several agent-specific requirements:

  • Reliable function calling with low error rates
  • Strong structured JSON output for tool invocation
  • Good latency characteristics for multi-step agent loops
  • Native support for multimodal inputs such as images (useful for documents rendered as images)

It is well-suited for orchestrator roles where the model needs to decide what to do next, not just generate text.

2.2.2 OpenAI o1 Models

The o1 family is optimized for deep reasoning, which makes it useful when agents need to reason over policies, logs, or complex operational scenarios. These models handle long chains of thought well, but there are trade-offs:

  • Higher latency increases the cost of agent loops
  • Higher token usage requires stricter budgeting
  • Function calling is supported, but best used behind routing logic

For this reason, o1 models are typically invoked selectively, based on router-agent classification.

2.2.3 Claude 3.5 Sonnet, Opus, and Claude 4

Anthropic models are often chosen for agents that need:

  • Very large context windows for multi-document reasoning
  • Strong adherence to instructions and structured outputs
  • Conservative, safety-oriented behavior

Sonnet works well for everyday agent tasks at a lower cost. Opus and Claude 4 are better suited for long-running analytical agents, especially when output quality matters more than speed.

2.2.4 Agent-Oriented Model Considerations

When choosing models for agents, teams should evaluate:

  • Function calling stability across retries
  • Token ceilings for multi-step loops
  • JSON schema adherence for planning outputs
  • Latency under repeated calls

A typical production setup chains models together: smaller, faster models for routing and tool selection, and larger models for final synthesis.
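One way to express this chaining in Semantic Kernel is to register multiple chat services under distinct service IDs and pin a service per step. A sketch, with illustrative deployment names:

```csharp
var builder = Kernel.CreateBuilder();

// Small, fast model for routing and tool selection.
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o-mini",
    endpoint: config.Endpoint,
    apiKey: config.ApiKey,
    serviceId: "router");

// Larger model for final synthesis.
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",
    endpoint: config.Endpoint,
    apiKey: config.ApiKey,
    serviceId: "synthesis");

var kernel = builder.Build();

// Individual steps then select a service explicitly.
var routerSettings = new OpenAIPromptExecutionSettings { ServiceId = "router" };
```

This keeps cheap classification traffic off the expensive model while leaving both behind one kernel.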

2.3 Essential .NET Libraries for Agent Systems

Beyond frameworks and models, several libraries form the backbone of enterprise agent implementations in .NET.

2.3.1 Microsoft.Extensions.AI: Model and Telemetry Abstraction

Microsoft.Extensions.AI provides a consistent way to work with chat models, embeddings, and telemetry across providers. It is not an agent framework by itself, but it underpins agent runtimes by standardizing how models are called and observed.

A representative registration looks like this (extension-method names have shifted across recent Microsoft.Extensions.AI releases, so treat this as a sketch):

builder.Services.AddChatClient(
    new AzureOpenAIClient(
            new Uri(config.Endpoint),
            new ApiKeyCredential(config.ApiKey))
        .GetChatClient("gpt-4o")
        .AsIChatClient());

This abstraction enables:

  • Provider-agnostic model swapping
  • Consistent logging and metrics
  • Integration with OpenTelemetry pipelines

For agents, this matters because every tool call and reasoning step must be observable.

2.3.2 Microsoft.SemanticKernel.Agents: The Agent Abstraction Layer

The Microsoft.SemanticKernel.Agents namespace introduces first-class agent concepts on top of the kernel. It formalizes ideas such as agent roles, message history, and execution boundaries.

This layer is where planners, router agents, and role-specific agents are defined. Instead of treating the kernel as a raw execution engine, agents provide structure around intent, context, and lifecycle.

As systems grow, this abstraction becomes essential for managing multiple cooperating agents without turning orchestration logic into unmaintainable glue code.

2.3.3 Kernel Memory: Document Ingestion and Retrieval

Kernel Memory builds on Semantic Kernel to handle document ingestion, chunking, embeddings, and retrieval. It is designed specifically for RAG-style workloads where traceability and metadata matter.

It supports:

  • Multiple chunking strategies
  • Pluggable vector stores
  • Metadata enrichment for citations
  • Parent–child document reconstruction

A representative setup looks like:

var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextEmbeddingGeneration(new AzureOpenAIConfig
    {
        Deployment = "text-embedding-3-large",
        Endpoint = config.Endpoint,
        APIKey = config.ApiKey,
        Auth = AzureOpenAIConfig.AuthTypes.APIKey
    })
    .WithAzureAISearchMemoryDb(config.SearchEndpoint, config.SearchKey)
    .Build();

Kernel Memory acts as the long-term knowledge layer for agents, complementing real-time tool calls.

2.3.4 Microsoft Graph SDK: Enterprise Knowledge Access

The Graph SDK remains essential for accessing SharePoint and OneDrive securely. Agents use it for live queries, permission checks, and link resolution.

Typical access patterns include:

var stream = await graphClient
    .Drives[driveId]
    .Items[itemId]
    .Content
    .GetAsync();

Graph integration ensures that agents respect user permissions, retrieve current content, and generate verifiable citations—capabilities that are non-negotiable in enterprise environments.


3 Enterprise Architectural Blueprint

With the fundamentals in place, we can now define an enterprise-ready architectural blueprint that balances retrieval quality, system reliability, and user trust. This architecture reflects a proven pattern for SharePoint-based knowledge ecosystems built on .NET services and orchestrated through AI agents.

Figure 1 conceptually illustrates the complete agent architecture and how requests flow through the system:

User
  |
  v
Chat Interface
(Blazor / Web UI, Entra ID Auth)
  |
  v
Router / Gatekeeper Agent
  |  (classify intent, select domain, plan steps)
  v
Agent Orchestration Loop (Semantic Kernel)
(Reason → Act → Observe → Repeat)
  |
  v
Hybrid Retrieval & Tools Layer
  |-- Vector Search (semantic)
  |-- BM25 Search (keyword)
  |-- Live Tools / Plugins
      (SharePoint, File Shares, APIs)
  |
  v
Re-Ranker / Reasoning Model
  |
  v
Reference Engine
  |  (IDs → permission-aware URLs,
  |   versioning, provenance)
  v
Response + Verifiable Citations
  |
  v
Agent State & Memory
  |-- Short-term (conversation context)
  |-- Long-term (durable knowledge, preferences)

Figure 1 illustrates an enterprise AI agent architecture in which user requests are authenticated, classified, planned, and executed through a controlled agent loop. Retrieval is hybrid and tool-aware, reasoning is multi-step, and all responses are grounded with verifiable references and auditable state.

3.1 The Router Agent Pattern: Gatekeeper Query Classification

The Router Agent (also called a Gatekeeper Agent) is the first active decision-maker in the system. Rather than acting as a simple classifier, it performs lightweight agentic reasoning before any retrieval occurs. This reduces cost, improves answer quality, and prevents unnecessary calls to expensive models or tools.

3.1.1 Router Agent Decision Loop

At runtime, the Router Agent follows a consistent decision loop:

Observe → Classify → Plan → Execute → Evaluate → Respond

  • Observe – Inspect the user query, conversation history, and user context.
  • Classify – Determine the domain (HR, IT, Operations, Compliance, etc.).
  • Plan – Decide which retrieval strategies, tools, and models are required.
  • Execute – Invoke search, plugins, or downstream agents.
  • Evaluate – Validate whether sufficient evidence was retrieved.
  • Respond – Forward a structured request to the reasoning or answer agent.

This loop ensures the agent behaves predictably and transparently, even though the underlying models are probabilistic.

3.1.2 Responsibilities of the Router Agent

  • Identify the business domain (HR, IT, Operations, Compliance, Procurement, General).
  • Select the retrieval strategy (keyword, vector, or hybrid).
  • Choose the appropriate model (e.g., fast general model vs. deep reasoning model).
  • Determine whether external tools are required (SharePoint Search, ServiceNow, SQL).
  • Attach routing metadata for logging and observability.

Example routing prompt (simplified):

You are a routing agent.
Classify the user query into: HR, IT, Operations, Other.
Decide if multi-step reasoning is required.

Return JSON:
{
  "category": "IT",
  "reasoningRequired": true,
  "tools": ["SharePointSearch"]
}
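On the consuming side, the agent host can bind this JSON to a typed record before acting on it. A minimal sketch using System.Text.Json (RouteDecision is an assumed shape matching the prompt above):

```csharp
using System.Text.Json;

public sealed record RouteDecision(
    string Category,
    bool ReasoningRequired,
    string[] Tools);

var decision = JsonSerializer.Deserialize<RouteDecision>(
    modelOutput, // raw JSON string returned by the routing model
    new JsonSerializerOptions { PropertyNameCaseInsensitive = true });

if (decision is null || decision.Category == "Other")
{
    // Fall back to a general-purpose answer path.
}
```

Binding to a typed record early means malformed routing output fails fast instead of silently steering retrieval to the wrong domain.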

3.1.3 Benefits

  • Reduces hallucinations by constraining downstream reasoning to the correct domain.
  • Improves latency through early filtering and model selection.
  • Enables domain-level metrics, tracing, and auditability.

3.2 Hybrid Retrieval Strategy: Combining Vector and BM25 Search

Enterprise retrieval must support both conceptual questions and exact-match lookups. A hybrid strategy combines semantic vector search with traditional BM25 keyword search to achieve this balance.

3.2.1 Semantic Search (Vector-Based)

Vector search is used for natural language questions such as:

“How do I request VPN access?”

Embeddings capture meaning and intent rather than exact wording, making this approach resilient to phrasing differences.

3.2.2 Keyword Search (BM25)

Keyword search excels at precise identifiers and structured terms, such as:

  • “SOP-IT-0041”
  • “Form HR-12”
  • “ISO 27001 Annex A”

For numeric or code-like queries, BM25 consistently outperforms vector-only approaches.

3.2.3 Combined Ranking

Results from both channels can be merged using heuristic scoring:

combinedScore = (semanticScore * 0.7) + (keywordScore * 0.3)

Or through LLM-based re-ranking:

  1. Retrieve top N results from both searches.
  2. Pass them to a re-ranking model.
  3. Return a single ordered list with citation metadata.

Hybrid retrieval significantly improves recall while preserving precision, which is non-negotiable for policy and compliance scenarios.
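The heuristic scoring above can be applied per document after both channels are normalized to a common scale. A sketch, where the weights and the SearchHit shape are tunable assumptions:

```csharp
public sealed record SearchHit(string DocId, double Score);

public static IEnumerable<(string DocId, double Combined)> MergeResults(
    IEnumerable<SearchHit> semantic,
    IEnumerable<SearchHit> keyword,
    double semanticWeight = 0.7,
    double keywordWeight = 0.3)
{
    // Collect both normalized scores per document, defaulting to 0
    // when a document appears in only one channel.
    var byId = new Dictionary<string, (double Sem, double Kw)>();

    foreach (var hit in semantic)
        byId[hit.DocId] = (hit.Score, 0);

    foreach (var hit in keyword)
        byId[hit.DocId] = byId.TryGetValue(hit.DocId, out var s)
            ? (s.Sem, hit.Score)
            : (0, hit.Score);

    return byId
        .Select(kv => (kv.Key, kv.Value.Sem * semanticWeight + kv.Value.Kw * keywordWeight))
        .OrderByDescending(x => x.Item2);
}
```

Documents found by both channels naturally rise to the top, which is usually the desired behavior for policy lookups.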

3.3 Agent Memory and State Management

Enterprise agents must maintain continuity across turns and across sessions. This requires explicit memory and state management rather than relying solely on model context windows.

3.3.1 Short-Term vs. Long-Term Memory

  • Short-term memory – Maintains conversational context (current topic, follow-up questions, references). Typically stored in an in-memory cache or fast store (e.g., Redis) and scoped to a session.

  • Long-term memory – Stores durable context such as user preferences, prior interactions, and frequently accessed documents. Persisted in databases or vector stores and reused across sessions.

Agents can selectively write to long-term memory only when information is stable and valuable.

3.3.2 Agent State Persistence

Agent state captures where the agent is in a workflow:

  • Routing decisions
  • Retrieval results used
  • Tool calls executed
  • Partial reasoning steps

State is persisted between requests to support:

  • Resumable workflows
  • Human review and audit trails
  • Compliance and incident investigation

In practice, state is stored as structured JSON tied to a request or conversation ID, enabling deterministic replay of agent behavior.
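Such a state record might look like the following sketch (the shape and the stateStore helper are illustrative, not a framework API):

```csharp
using System.Text.Json;

public sealed record AgentState(
    string ConversationId,
    string RouteCategory,
    IReadOnlyList<string> ToolCalls,        // e.g. "SharePointSearch('vpn policy')"
    IReadOnlyList<string> EvidenceChunkIds, // chunks that grounded the answer
    DateTimeOffset UpdatedUtc);

// Persist as structured JSON keyed by conversation ID, so a reviewer
// can replay exactly which route, tools, and evidence produced an answer.
var json = JsonSerializer.Serialize(state,
    new JsonSerializerOptions { WriteIndented = true });
await stateStore.SaveAsync(state.ConversationId, json); // stateStore is a placeholder
```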

3.4 The Reference Engine: Converting Internal IDs to URLs

SharePoint uses internal GUID-based identifiers during ingestion, but end users require human-readable, permission-aware URLs.

The Reference Engine is responsible for:

  • Mapping driveItemId values to canonical URLs
  • Resolving list items, site paths, and library names
  • Creating deep links (PDF page anchors, Office links)
  • Applying On-Behalf-Of (OBO) tokens to enforce access control

Example transformation:

Input:  driveItemId = "01A3BC7DEF..."
Output: https://tenant.sharepoint.com/sites/IT/Shared%20Documents/Policies/VPNPolicy.pdf

This layer is essential for trust—citations must be verifiable and accessible to the user who receives them.
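A minimal resolution step can lean on Microsoft Graph, where each DriveItem exposes a WebUrl. A sketch with error handling omitted:

```csharp
public async Task<string?> ResolveCitationUrlAsync(
    GraphServiceClient graphClient, string driveId, string driveItemId)
{
    // The delegated (OBO) token behind graphClient enforces the caller's
    // permissions: if the user cannot see the item, this call fails
    // rather than leaking a working link.
    var item = await graphClient
        .Drives[driveId]
        .Items[driveItemId]
        .GetAsync();

    return item?.WebUrl;
}
```

A production Reference Engine would add caching and version pinning on top of this lookup, but the core mapping is just ID to WebUrl under the user's identity.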

3.5 Optional: When to Add Knowledge Graphs

Knowledge graphs are optional and should be introduced only when query patterns demand relationship reasoning. Typical triggers include:

  • Complex role or approval hierarchies
  • Multi-step business processes spanning systems
  • Regulatory mappings and impact analysis

Agents can use graph queries as tools, alongside search and SQL, to answer questions such as:

“Show all SOPs impacted when the Access Control Policy changes.”

While powerful, graphs require:

  • Strong governance and taxonomy alignment
  • Dedicated ingestion and maintenance pipelines

Most teams should start with vector and keyword retrieval, then incrementally introduce graphs once relationship-driven queries become common and well-defined.


4 Data Ingestion: Connecting SharePoint and Shared Drives

An enterprise AI agent is only as effective as the data it can reliably retrieve. While ingestion often looks like a pure data-engineering concern, an agent’s retrieval quality, grounding accuracy, and trustworthiness depend directly on how documents are indexed, segmented, and kept fresh. In practice, ingestion is the first “hidden dependency” of every agent decision loop.

This section explains how SharePoint content is indexed using the Azure AI Search SharePoint Connector, how legacy file shares are ingested using a .NET worker service, and how chunking strategies directly affect agent reasoning, context usage, and drill-down behavior. Together, these pipelines form the knowledge substrate that agents query through plugins and tools during runtime.

4.1 SharePoint Indexing: Implementing the Azure AI Search SharePoint Connector

The Azure AI Search SharePoint Connector provides a managed, M365-native way to ingest SharePoint Online content for agent retrieval. Instead of building custom crawlers, the connector handles Entra ID authentication, permission trimming, incremental updates, and metadata hydration automatically.

From an agent’s perspective, this connector creates a searchable knowledge surface that can be queried through a plugin or tool call at runtime. The agent does not “browse SharePoint”; it queries the indexed representation produced by this connector.

A typical setup includes:

  • A Data Source connected to a SharePoint site or library
  • A Skillset to enrich or split content
  • An Indexer that pushes documents into a vector or hybrid index

The connector normalizes diverse SharePoint artifacts—lists, libraries, and pages—into consistent fields such as title, content, path, author, ACLs, and timestamps like lastModified. This metadata is critical for routing, filtering, and freshness checks during agent responses.

Example data source configuration (simplified):

{
  "name": "sp-docs-source",
  "type": "sharepoint",
  "credentials": {
    "connectionString": "SharePointOnlineEndpoint=https://tenant.sharepoint.com/sites/IT;ApplicationId=<app-id>;TenantId=<tenant-id>"
  },
  "container": {
    "name": "useQuery",
    "query": "includeLibrary=https://tenant.sharepoint.com/sites/IT/Shared Documents"
  }
}

Once indexed, the agent queries this content through a search plugin. Conceptually, the agent interaction looks like:

var results = await sharePointSearchPlugin.SearchAsync(
    query: "VPN access policy",
    top: 5,
    filters: "department eq 'IT'"
);

In production environments, teams often configure separate connectors per department. This allows the router agent to select the correct index based on domain classification while still supporting hybrid retrieval across the enterprise.
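A per-department layout can be reduced to a simple mapping from the router's classification to an index name. A sketch; the index names are hypothetical:

```csharp
// Hypothetical mapping from router classification to a departmental index.
static string SelectIndex(string category) => category switch
{
    "HR"         => "hr-docs-index",
    "IT"         => "it-docs-index",
    "Operations" => "ops-docs-index",
    _            => "enterprise-general-index"
};
```

Keeping this mapping explicit makes it easy to audit which index answered a given request.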

4.2 Shared Drive Crawling: Building a .NET Worker Service for Legacy File Shares

Despite cloud adoption, many organizations still rely on SMB file shares for operational and historical content. To make this information accessible to AI agents, teams typically implement a .NET Worker Service that continuously ingests files into the same retrieval layer used by SharePoint.

From an agent standpoint, this worker extends the agent’s “world knowledge” rather than introducing a parallel system. Once ingested, file-share content is queried through the same plugins and ranking logic as SharePoint documents.

A common worker pattern involves:

  1. Scanning configured directories
  2. Detecting changes via hashing or timestamps
  3. Extracting text
  4. Chunking content
  5. Writing embeddings and metadata to the index

Example worker outline:

public class FileShareIngestionWorker : BackgroundService
{
    private readonly IEmbeddingStore _store;
    private readonly string _rootPath = @"\\fileserver01\operations";

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            foreach (var file in Directory.EnumerateFiles(_rootPath, "*.*", SearchOption.AllDirectories))
            {
                var hash = HashUtil.Compute(file);
                if (ChangeTracker.HasChanged(file, hash))
                {
                    var text = FileExtractor.Read(file);
                    var chunks = ChunkingEngine.CreateChunks(text);

                    foreach (var chunk in chunks)
                    {
                        await _store.UpsertAsync(new MemoryRecord
                        {
                            Id = $"{file}:{chunk.Offset}",
                            Text = chunk.Text,
                            Metadata = new()
                            {
                                ["path"] = file,
                                ["lastUpdated"] = File.GetLastWriteTimeUtc(file)
                            }
                        });
                    }
                }
            }

            await Task.Delay(TimeSpan.FromMinutes(30), stoppingToken);
        }
    }
}

At runtime, the agent invokes this ingested content through the same retrieval plugin:

var results = await sharedDrivePlugin.SearchAsync(
    query: "operations escalation procedure",
    top: 3
);

Including lastUpdated metadata allows agents to surface freshness indicators in responses (for example: “This document was last updated 18 months ago”), which is especially important for operational or compliance scenarios.
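A simple freshness check over that metadata might look like this (the staleness threshold is an assumption to tune per domain):

```csharp
static string? FreshnessNote(DateTime lastUpdatedUtc, int staleAfterDays = 365)
{
    var age = DateTime.UtcNow - lastUpdatedUtc;
    return age.TotalDays > staleAfterDays
        ? $"Note: this document was last updated {(int)(age.TotalDays / 30)} months ago."
        : null; // fresh enough; no caveat needed
}
```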

4.3 Smart Chunking Strategies: Designing for Agent Context and Drill-Down

Chunking is not just an ingestion concern—it directly affects how agents reason, cite sources, and manage context window budgets. Poor chunking leads to fragmented answers or irrelevant citations, even when retrieval technically “works.”

Two agent-specific considerations should guide chunking decisions:

  • Context window efficiency – Larger chunks consume more tokens during grounding.
  • Drill-down behavior – Agents often retrieve summaries first, then refine answers using smaller, more specific chunks.

Semantic chunking uses language models to split content along natural boundaries such as headings or topic changes. This improves grounding for procedural documents but increases ingestion cost.

Recursive chunking applies size limits hierarchically (section → paragraph → sentence), making it resilient to poorly formatted content like scanned PDFs.

Parent-document chunking introduces hierarchy. Each child chunk references a parent document summary, allowing agents to:

  • Answer high-level questions using summaries
  • Drill into specific sections when needed
  • Avoid stitching together unrelated fragments

This pattern aligns naturally with agent planning and evaluation loops.
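The hierarchy can be captured with a simple parent reference on each chunk. The types below are illustrative, not a library API:

```csharp
public sealed record ParentDocument(string Id, string Title, string Summary);

public sealed record ChildChunk(
    string Id,
    string ParentId, // links back to the parent summary
    string Text,
    int Offset);

// Drill-down: answer from the parent summary first, then fetch that
// parent's children only when the question needs section-level detail.
static IEnumerable<ChildChunk> DrillDown(
    ParentDocument parent, IEnumerable<ChildChunk> allChunks) =>
    allChunks.Where(c => c.ParentId == parent.Id).OrderBy(c => c.Offset);
```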

An illustrative sketch in the style of Semantic Kernel (SemanticTextSplitter and the UpsertAsync signature are simplified stand-ins; SK's built-in TextChunker offers comparable splitting):

var text = File.ReadAllText("VPNPolicy.pdf.txt");

var splitter = new SemanticTextSplitter(embeddingService);
var chunks = await splitter.SplitTextAsync(text);

foreach (var chunk in chunks)
{
    await memory.UpsertAsync("vpn-policy", chunk.Text, new Metadata
    {
        ["source"] = "SharePoint",
        ["parentDocument"] = "VPNPolicy",
        ["chunkId"] = chunk.ChunkId,
        ["lastUpdated"] = DateTime.UtcNow
    });
}

In practice, teams tune chunk sizes empirically. Smaller chunks (300–500 tokens) work well for FAQs and runbooks, while policy documents often require 700–1000 tokens to preserve clause continuity. Consistency across ingestion pipelines is more important than absolute size, ensuring that hybrid retrieval returns comparable context units for agent reasoning.


5 Security, Permissions, and Identity Passthrough

Unlike simple APIs, enterprise AI agents make autonomous decisions about which tools to invoke, which data to retrieve, and which actions to take. This autonomy is powerful, but it also introduces new security risks. As a result, agents must operate within stricter identity, permission, and authorization boundaries than traditional applications.

In an enterprise setting, retrieval results must always reflect the user’s permissions across SharePoint, OneDrive, and file shares. At the same time, agents must avoid leaking sensitive HR, financial, or personal data during reasoning or response generation. This section describes how identity flows through agent tool chains, how permission trimming and PII redaction are enforced, and how agent actions themselves are constrained.

5.1 On-Behalf-Of (OBO) Flow: Enforcing User-Centric Access Across Agent Tool Chains

The On-Behalf-Of (OBO) flow ensures that when a user interacts with an AI agent, every downstream operation executes under that user’s identity. Rather than granting the agent a high-privilege service account, the agent receives a delegated token representing the signed-in user. This prevents privilege escalation and simplifies auditing.

In agent-based systems, identity propagation often spans multiple steps and tools:

  1. The user signs in to the chat interface (for example, Blazor Server or WebAssembly).
  2. The front end sends a JWT access token to the backend agent host.
  3. The agent exchanges this token using OBO to obtain downstream tokens.
  4. The agent invokes Plugin A (e.g., SharePoint Search).
  5. Plugin A may call Plugin B (e.g., Microsoft Graph or a custom API), passing the delegated token forward.
  6. Every call enforces the same user permissions.

This chained identity propagation ensures that no intermediate plugin can “escape” the user’s security context.

An example OBO exchange using Microsoft.Identity.Client:

var app = ConfidentialClientApplicationBuilder
    .Create(clientId)
    .WithClientSecret(clientSecret)
    .WithAuthority(authority)
    .Build();

var userAssertion = new UserAssertion(accessToken);

var result = await app.AcquireTokenOnBehalfOf(
        new[] { "https://graph.microsoft.com/.default" },
        userAssertion)
    .ExecuteAsync();

That token is then passed through every agent tool invocation:

request.Headers.Authorization =
    new AuthenticationHeaderValue("Bearer", result.AccessToken);

With this model, the agent cannot retrieve documents, folders, or list items the user cannot access directly. This is especially critical for HR and IT scenarios involving compensation data, disciplinary records, or infrastructure diagrams.

Token lifetime management is an important operational detail. Short-lived caching reduces repeated OBO calls while still honoring revocations, conditional access policies, and user sign-out.
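A minimal sketch of such a cache, assuming tokens are keyed per user and scope (the type and the five-minute safety margin are illustrative, not part of MSAL):

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical short-lived OBO token cache. Entries are evicted a safety
// margin before expiry so revocations, conditional access changes, and
// sign-outs are picked up quickly on the next real OBO exchange.
public class ShortLivedTokenCache
{
    private readonly ConcurrentDictionary<string, (string Token, DateTimeOffset ExpiresOn)> _cache = new();
    private static readonly TimeSpan SafetyMargin = TimeSpan.FromMinutes(5);

    public bool TryGet(string userId, string scope, DateTimeOffset now, out string token)
    {
        token = null;
        if (_cache.TryGetValue(Key(userId, scope), out var entry) && entry.ExpiresOn - SafetyMargin > now)
        {
            token = entry.Token;
            return true;
        }
        return false; // Miss: caller performs a fresh AcquireTokenOnBehalfOf.
    }

    public void Set(string userId, string scope, string token, DateTimeOffset expiresOn)
        => _cache[Key(userId, scope)] = (token, expiresOn);

    private static string Key(string userId, string scope) => $"{userId}|{scope}";
}
```

Note that MSAL's ConfidentialClientApplication also has its own token cache; an application-level wrapper like this mainly helps when tokens must be shared across agent tool invocations within one request.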

5.2 Permission Trimming: Enforcing Access Control in Agent Retrieval

Even with OBO protecting live Graph calls, many agents rely on Azure AI Search indexes that aggregate content from multiple departments. Permission trimming ensures that retrieval results are filtered before the agent reasons over them.

Azure AI Search supports storing Access Control Lists (ACLs) alongside each indexed document. During ingestion, SharePoint permissions are normalized—typically mapped to Entra ID group IDs—and stored as filterable fields.

Example index field:

{
  "name": "allowedGroups",
  "type": "Collection(Edm.String)",
  "filterable": true
}

At query time, the agent constructs a security filter based on the current user's group memberships. Note that string collections are filtered with the search.in function, and the group ID list must be quoted:

var groupList = string.Join(",", userGroups);
var filter = $"allowedGroups/any(g: search.in(g, '{groupList}'))";

var options = new SearchOptions
{
    Filter = filter,
    Size = 10
};

var results = await searchClient.SearchAsync(query, options);

From an agent perspective, permission trimming has an additional requirement: graceful failure. Agents must handle permission-denied scenarios and explain them clearly to users. For example:

“I found a relevant document, but you don’t have access to it. You may need to request permission from the document owner.”

This behavior preserves trust and avoids the impression that the agent is “missing” information or malfunctioning.
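One way to detect the "exists but inaccessible" case is a second, service-side count query that omits the ACL filter and never returns content, compared against the trimmed result count. A pure-logic sketch of the resulting messaging (the function and its inputs are illustrative):

```csharp
// Decide what to tell the user based on trimmed vs. untrimmed hit counts.
// The untrimmed count comes from a service-side query that returns no content.
public static class AccessGapMessaging
{
    public static string DescribeAccessGap(long trimmedCount, long untrimmedCount)
    {
        if (trimmedCount > 0)
            return null; // Normal flow: the agent grounds on the trimmed results.

        if (untrimmedCount > 0)
            return "I found a relevant document, but you don't have access to it. " +
                   "You may need to request permission from the document owner.";

        return "I couldn't find any documents relevant to your question.";
    }
}
```

The unfiltered count query must run under a service identity and only ever surface a number, never titles or snippets, or it would itself become a data leak.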

5.3 PII Redaction: Deciding When and Where to Sanitize Data

Even when permissions are enforced, many organizations require proactive redaction of personally identifiable information (PII). Agents should never expose raw personal data in responses, and models should not be given unnecessary access to it during reasoning.

There are three common redaction points:

  1. At ingestion (most common). Content is sanitized before it is stored in the index or memory store. Pros: simplifies retrieval and minimizes risk. Cons: original data is permanently masked.

  2. At retrieval time. Raw content is stored but redacted just before being sent to the LLM. Pros: preserves originals for auditing. Cons: more complex and higher risk.

  3. At the agent response level. The agent performs a final PII scan on generated responses. Pros: last line of defense. Cons: should never be the only control.

Most enterprises combine ingestion-time redaction with agent-level response checks.

Presidio is commonly used to detect and redact names, phone numbers, SSNs, passport numbers, and birth dates. Example redaction service:

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

def redact(text):
    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()
    results = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=results).text

Calling this from a .NET ingestion pipeline:

// Assumes the redaction service returns plain text; if it returns JSON,
// deserialize the body instead of reading it raw.
var response = await http.PostAsJsonAsync("/redact", new { text = content });
var cleaned = await response.Content.ReadAsStringAsync();

await memory.UpsertAsync(id, cleaned, metadata);

Detection thresholds should favor false positives over false negatives. Many organizations also add custom recognizers for internal identifiers such as employee IDs or contractor numbers.
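Internal identifiers can also be caught with a supplemental regex pass in the .NET pipeline, before or after the Presidio call. The "EMP-######" employee-ID format below is a hypothetical example:

```csharp
using System.Text.RegularExpressions;

// Supplemental redaction pass for internal identifiers that generic PII
// detectors miss. The pattern is a hypothetical employee-ID format; real
// deployments would maintain a list of such organization-specific patterns.
public static class InternalIdRedactor
{
    private static readonly Regex EmployeeId = new(@"\bEMP-\d{6}\b", RegexOptions.Compiled);

    public static string Redact(string text)
        => EmployeeId.Replace(text, "<EMPLOYEE_ID>");
}
```

In Presidio itself, the equivalent is registering a custom PatternRecognizer so the identifier participates in the same scoring and anonymization pipeline as built-in entities.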

5.4 Agent Action Authorization: Controlling What Agents Are Allowed to Do

Data access is only one dimension of security. Agents may also be capable of taking actions—and not all users or agents should be allowed to perform all actions.

Common agent actions include:

  • Sending emails or Teams messages
  • Creating or updating ServiceNow tickets
  • Modifying SharePoint lists
  • Writing to databases
  • Triggering workflows

These capabilities should be explicitly authorized using a capability matrix, separate from data permissions.

Example capability model:

Action                  | Allowed Roles       | Approval Required
------------------------|---------------------|------------------
Read documents          | All users           | No
Create support ticket   | Employees           | No
Update SharePoint list  | Managers            | Yes
Send external email     | Disabled by default | Yes

At runtime, the agent checks both user permissions and agent capabilities before executing an action. If an action is disallowed, the agent should explain why and suggest an alternative.

This explicit authorization layer prevents “over-helpful” agents from performing unintended or risky operations and is essential for enterprise adoption.
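The runtime check can be a small pure function over the capability matrix. A sketch, with illustrative action ids and role names ("*" standing in for "all users"):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record Capability(string[] AllowedRoles, bool ApprovalRequired);

// Mirrors the capability table: data permissions are checked separately;
// this only answers "may this agent perform this action for this role?"
public static class CapabilityMatrix
{
    private static readonly Dictionary<string, Capability> Actions = new()
    {
        ["read_documents"]         = new(new[] { "*" }, false),
        ["create_support_ticket"]  = new(new[] { "Employee", "Manager" }, false),
        ["update_sharepoint_list"] = new(new[] { "Manager" }, true),
        ["send_external_email"]    = new(Array.Empty<string>(), true), // disabled by default
    };

    public static (bool Allowed, bool NeedsApproval) Check(string action, string role)
    {
        if (!Actions.TryGetValue(action, out var cap))
            return (false, false); // Unknown actions are denied by default.

        bool allowed = cap.AllowedRoles.Contains("*") || cap.AllowedRoles.Contains(role);
        return (allowed, allowed && cap.ApprovalRequired);
    }
}
```

Keeping this matrix in configuration rather than code lets security teams review and change agent capabilities without a redeploy.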


6 Implementation: Building the Agent in C#

The implementation phase brings together routing, retrieval, permissions, and reasoning into a cohesive enterprise AI agent. In most production systems, this agent is built as a C# application with Semantic Kernel acting as the orchestration layer between LLMs, plugins, memory stores, and security controls.

Unlike a simple chat application, an enterprise agent must reason about what to do next, which tools to call, and when to stop. This section focuses on assembling those pieces into a working agent that follows predictable patterns and guardrails. The examples assume hybrid retrieval and OBO enforcement are already configured.

6.1 Setting Up the Kernel and Creating the Agent

The kernel provides capabilities; the agent defines behavior. A production setup typically includes:

  • One or more chat completion models
  • An embeddings model for memory
  • Plugin registration
  • Explicit agent instructions

Kernel configuration remains mostly infrastructure-oriented:

var builder = Kernel.CreateBuilder();

builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",
    endpoint: config.OpenAiEndpoint,
    apiKey: config.OpenAiKey);

builder.AddAzureOpenAITextEmbeddingGeneration(
    deploymentName: "text-embedding-3-large",
    endpoint: config.OpenAiEndpoint,
    apiKey: config.OpenAiKey);

// Memory registration varies by connector version; this assumes a custom
// extension method that wires an application-provided vector store into the
// kernel's services.
builder.AddMemory(memory =>
{
    memory.UseCustomVectorStore(new VectorStore(config.VectorDbConnection));
});

builder.Plugins.AddFromType<SharePointSearchPlugin>();
builder.Plugins.AddFromType<SharedDrivePlugin>();
builder.Plugins.AddFromType<ReferenceEnginePlugin>();

var kernel = builder.Build();

Once the kernel is built, the agent is instantiated explicitly. This step is often missing in examples but is critical for enterprise clarity:

var agent = new ChatCompletionAgent
{
    Kernel = kernel,
    Name = "EnterpriseAgent",
    Instructions = """
    You are an enterprise AI agent.
    Always retrieve supporting evidence before answering.
    Use plugins when available.
    Cite sources using SharePoint URLs.
    Do not guess when information is missing.
    """
};

This instruction block is the system prompt. It defines the agent’s operating rules and is more important than any individual plugin. Teams usually version and review these prompts just like application code.

6.2 Defining Agent Instructions and Roles

Most enterprise systems use multiple agents, each with focused responsibilities.

Router Agent Instructions (Example)

You are a router agent.
Classify the user request into HR, IT, Operations, or General.
Decide whether retrieval or tool usage is required.
Return JSON only.

Domain Specialist Agent Instructions (Example)

You are an HR policy assistant.
Use only HR-approved sources.
Explain missing permissions clearly.
Always include document citations.

Citation Formatting Instructions (Example)

When answering, include citations as:
- Document Title (URL)
Do not fabricate links.

Clear role separation reduces hallucinations and keeps agents aligned with enterprise governance.

6.3 Orchestration Patterns: ReAct, Tool Selection, and Multi-Agent Flow

Enterprise agents rarely succeed in a single step. Instead, they follow an agent loop, commonly implemented using the ReAct pattern:

Reason → Act → Observe → Repeat

  1. Reason – Decide what information is missing.
  2. Act – Call a plugin or retrieval function.
  3. Observe – Inspect results.
  4. Repeat – Refine or stop.
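The loop itself is framework-agnostic. In this sketch, the Reason and Act steps are injected as delegates; in a real system, reason would be an LLM call and act a Semantic Kernel plugin invocation:

```csharp
using System;
using System.Collections.Generic;

// Minimal ReAct-style loop skeleton. The "ANSWER:" convention for signaling
// termination is an assumption of this sketch, not a Semantic Kernel API.
public static class ReActLoop
{
    public static string Run(
        string goal,
        Func<string, IReadOnlyList<string>, string> reason, // returns next action, or "ANSWER: ..."
        Func<string, string> act,                           // executes a tool call, returns the observation
        int maxSteps = 5)
    {
        var observations = new List<string>();

        for (int step = 0; step < maxSteps; step++)
        {
            var decision = reason(goal, observations);   // Reason
            if (decision.StartsWith("ANSWER:"))
                return decision.Substring("ANSWER:".Length).Trim();

            observations.Add(act(decision));             // Act + Observe, then repeat
        }

        return "I couldn't complete the request within safe limits.";
    }
}
```

The maxSteps bound is the same iteration guard discussed later under error handling: agents must always have a hard stop.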

Tool Selection Logic

Semantic Kernel enables models to choose tools, but the agent instructions guide that choice. For example:

var response = await agent.InvokeAsync(
    """
    User question: {{$input}}
    Decide which tools to call before answering.
    """,
    new() { ["input"] = userInput }
);

If the model determines that SharePoint data is required, it calls search_sharepoint. If results are insufficient, it may call the shared-drive plugin next. This is agentic reasoning—not simple classification.

Multi-Agent Coordination

Routing becomes essential as systems scale. A Router Agent hands off requests to specialized agents:

var routeJson = await routerAgent.InvokeAsync(prompt);
var route = JsonSerializer.Deserialize<RouteResult>(routeJson);

return route.Category switch
{
    "HR" => await hrAgent.InvokeAsync(userInput),
    "IT" => await itAgent.InvokeAsync(userInput),
    _ => await generalAgent.InvokeAsync(userInput)
};

Note: in these simplified examples, InvokeAsync is treated as returning a string; with the Semantic Kernel Agents API it yields ChatMessageContent items, whose Content you would concatenate before parsing. Structured output must be parsed explicitly, avoiding brittle dictionary access.

This separation prevents HR policies, IT runbooks, and operational procedures from being mixed during reasoning.

6.4 Error Handling, Timeouts, and Agent Guardrails

Agent systems must expect partial failure. A tool may fail mid-plan, a file share may time out, or a model may exceed token limits. Robust agents re-plan instead of crashing.

Key patterns include:

Tool Failure and Re-Planning

If a plugin fails, the agent can retry with an alternative strategy:

  • SharePoint search fails → fall back to vector memory
  • Shared drive unavailable → explain limitation to user

Iteration and Timeout Guards

Agents should never loop indefinitely. Common safeguards include:

  • Maximum reasoning iterations (e.g., 5 loops)
  • Timeouts on plugin calls
  • Token budget enforcement per request

Example wrapper:

for (int i = 0; i < 5; i++)
{
    try
    {
        return await agent.InvokeAsync(prompt);
    }
    catch (TimeoutException)
    {
        logger.LogWarning("Retrying agent step...");
        // Back off before retrying so a struggling dependency can recover.
        await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, i)));
    }
}

return "I couldn't complete the request within safe limits.";

Graceful Degradation

When all strategies fail, the agent should respond transparently:

“I wasn’t able to retrieve authoritative sources for this request. You may want to check SharePoint directly or refine your question.”

This preserves trust and avoids fabricated answers.


7 Building the Chat Interface with Citations and Agent Transparency

In an enterprise system, the chat interface is not just a messaging surface—it is the primary window into the agent’s decision-making. Unlike simple RAG chatbots, agents reason over multiple steps, invoke tools, and may change plans mid-execution. The UI must reflect this behavior in a way that builds trust rather than obscuring complexity.

A well-designed interface makes the agent’s actions visible, explains why certain information was used (or unavailable), and allows users to stay in control. Blazor integrates naturally with the .NET backend and Entra ID, making it a practical choice for building agent-aware interfaces with secure identity passthrough.

7.1 Blazor Front-End Options: Server vs. WebAssembly for Agent Experiences

Blazor Server is often the default choice for enterprise AI agents. Because execution happens server-side, it supports long-running agent workflows, tool invocation, and identity propagation without exposing sensitive data to the browser. It also enables agent streaming, where partial results are sent to the UI as the agent progresses through its plan.

Typical use cases for Blazor Server include:

  • Streaming intermediate agent steps (tool calls, retrieval progress)
  • Displaying partial answers while tools execute
  • Allowing users to interrupt or redirect the agent mid-run

Blazor WebAssembly runs client-side and is better suited for lightweight chat experiences. However, for agent-driven systems, WASM requires strict API boundaries. The browser should never receive raw document text, intermediate reasoning traces, or privileged tool outputs. As a result, most teams reserve WebAssembly for non-sensitive or read-only agent scenarios.

A simplified Blazor Server chat component with agent streaming support:

@inject ChatService Chat

<div class="chat-window">
    @foreach (var message in Messages)
    {
        <div class="msg">@message.Author: @message.Text</div>
    }
</div>

<input @bind="Current" @onkeydown="Send" />

@code {
    // Message is assumed to be a simple record: record Message(string Author, string Text);
    private List<Message> Messages = new();
    private string Current = "";

    private async Task Send(KeyboardEventArgs e)
    {
        if (e.Key != "Enter" || string.IsNullOrWhiteSpace(Current)) return;

        var question = Current;
        Current = "";
        Messages.Add(new("User", question));

        // Accumulate streamed chunks into a single agent message rather than
        // adding a new bubble per update, re-rendering as content arrives.
        var agentText = new System.Text.StringBuilder();
        Messages.Add(new("Agent", ""));

        await foreach (var update in Chat.StreamAsync(question))
        {
            agentText.Append(update);
            Messages[^1] = new("Agent", agentText.ToString());
            StateHasChanged();
        }
    }
}

This streaming pattern allows the UI to show progress such as “Searching SharePoint…” or “Analyzing policy updates…” while the agent is still executing.

7.2 Citation Rendering Across Multi-Step Agent Execution

Agents often retrieve information from multiple tools in a single response—vector memory, SharePoint search, and shared drives. The UI must aggregate and render citations across these steps in a coherent way.

Each citation should include:

  • Source title
  • Location (page, section, or paragraph)
  • Tool used (e.g., SharePoint, Shared Drive, Vector Memory)
  • Confidence score from re-ranking or retrieval
  • Direct link resolved by the Reference Engine

Example agent response payload:

{
  "text": "You can request a contractor laptop through the onboarding portal [1][2].",
  "citations": [
    {
      "id": "1",
      "title": "Contractor Onboarding SOP",
      "source": "SharePoint",
      "page": 4,
      "url": "https://tenant.sharepoint.com/.../sop.pdf",
      "confidence": 0.87
    },
    {
      "id": "2",
      "title": "IT Equipment Request Guide",
      "source": "SharedDrive",
      "section": "Hardware Requests",
      "url": "\\\\fileserver\\IT\\Equipment.docx",
      "confidence": 0.72
    }
  ]
}

From an agent perspective, confidence thresholds matter. If no citation exceeds a minimum confidence (for example, 0.6), the agent should either:

  • Refuse to answer and explain the gap, or
  • Provide a low-confidence response with a warning such as “This answer may be incomplete.”

This prevents agents from presenting weakly grounded answers as authoritative.
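The gating logic is a small pure function over the citation set. A sketch, using the 0.6 threshold from the text (the 0.75 "strong evidence" cutoff is an additional assumption):

```csharp
using System.Collections.Generic;
using System.Linq;

public record Citation(string Id, string Title, double Confidence);

// Gate the final answer on citation confidence before it reaches the user.
public static class ConfidenceGate
{
    public static string Apply(string answer, IReadOnlyList<Citation> citations, double threshold = 0.6)
    {
        // No citation clears the minimum: refuse and explain the gap.
        if (citations.All(c => c.Confidence < threshold))
            return "I wasn't able to find well-supported sources for this. " +
                   "You may want to refine your question.";

        // Usable but weak evidence: answer with an explicit warning.
        if (citations.Max(c => c.Confidence) < 0.75)
            return answer + "\n\nNote: this answer may be incomplete.";

        return answer;
    }
}
```

Thresholds should be tuned per corpus; re-ranker scores from different tools are not directly comparable without normalization.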

7.3 Agent Transparency: Showing Decisions, Tools, and Audit Trails

Enterprise users often ask “Why did the agent answer this way?” A transparency panel answers that question without exposing raw model reasoning.

Effective agent UIs show:

  • Which tools were invoked (SharePoint search, shared drive, memory)
  • High-level decisions (domain routing, retrieval strategy)
  • Execution timeline (step-by-step plan)
  • Permission constraints (what was skipped and why)

Example transparency view:

Agent Plan:
1. Classified query as IT-related
2. Queried SharePoint IT policies
3. Retrieved 3 relevant documents
4. Ranked results by relevance
5. Generated answer with citations

For compliance and audit scenarios, this information should also be logged server-side. Audit trails typically include:

  • User identity
  • Tools invoked
  • Documents accessed
  • Final response and citations
  • Timestamp and correlation ID

This allows security teams to reconstruct agent behavior during reviews or investigations.
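A server-side audit record can be a simple serializable type written to an append-only store. Field names below are illustrative; the correlation ID is what joins audit rows to traces and logs:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Illustrative per-run audit record covering the fields listed above.
public record AgentAuditRecord(
    string UserId,
    IReadOnlyList<string> ToolsInvoked,
    IReadOnlyList<string> DocumentsAccessed,
    string FinalResponse,
    IReadOnlyList<string> CitationUrls,
    DateTimeOffset Timestamp,
    string CorrelationId);

public static class AuditLog
{
    // Serialize for an append-only sink (e.g., blob storage or a log table).
    public static string ToJson(AgentAuditRecord record) => JsonSerializer.Serialize(record);
}
```

Audit records should be written even when the agent fails or refuses, since "what the agent declined to do" is often exactly what a review needs to see.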

Deep links allow users to independently verify agent responses. The Reference Engine resolves internal identifiers into user-accessible URLs, while the UI renders links that open documents at meaningful locations.

C# helper for generating deep links:

public class DeepLinkService
{
    public string ToPdfPage(string url, int page)
        => $"{url}#page={page}";

    public string ToOfficeDocument(string url)
        => $"{url}?web=1";
}

When users click a citation, the document opens exactly where the agent sourced its answer. This tight feedback loop is critical for trust, especially in policy, HR, and compliance-heavy environments.


8 Production Deployment and Operations

Running an enterprise AI agent in production is an ongoing operational discipline, not a one-time deployment. Unlike static RAG systems, agents evolve continuously: they learn new routing patterns, invoke tools in new ways, and reason over longer, multi-step workflows. As a result, production operations must measure not only answer quality, but also agent behavior.

This section covers how to evaluate agent decisions, observe multi-step execution, manage versions safely, involve humans when needed, and keep costs predictable as usage grows.

8.1 Evaluation Pipelines: Measuring Agent Decisions and Outcomes

Traditional RAG evaluation focuses on answer quality. For agents, evaluation must also assess how the answer was produced. Automated frameworks such as RAGAS and DeepEval still apply, but they should be extended with agent-specific metrics.

Common evaluation dimensions include:

  • Faithfulness – Does the answer align with retrieved evidence?
  • Answer relevance – Does the response address user intent?
  • Context precision – Did retrieval return the right chunks?
  • Citation completeness – Are claims supported by sources?

Agent-specific metrics add another layer:

  • Tool selection accuracy – Did the agent choose the correct plugin?
  • Plan efficiency – Did the agent complete the task in minimal steps?
  • Hallucinated tool calls – Did the agent attempt to invoke non-existent capabilities?

Example evaluation script:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

data = {
    "question": [...],
    "answer": [...],
    "contexts": [...]
}

# RAGAS expects a Hugging Face Dataset; the metric is named answer_relevancy.
results = evaluate(Dataset.from_dict(data), metrics=[faithfulness, answer_relevancy])
print(results)
print(results)

In production, enterprises often run nightly evaluations against recent agent conversations. Poor scores may indicate degraded embeddings, outdated chunking strategies, or misaligned agent instructions. Over time, this feedback loop becomes the primary mechanism for keeping agent behavior aligned with real-world usage.
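The agent-specific metrics above can be computed directly from logged conversations. A sketch for tool selection accuracy, assuming each evaluation episode records the expected tool and the agent's actual first tool call:

```csharp
using System.Collections.Generic;
using System.Linq;

// One labeled evaluation episode: which tool should have been called first,
// and which tool the agent actually called first.
public record Episode(string ExpectedTool, string ActualFirstTool);

public static class AgentMetrics
{
    // Fraction of episodes where the agent's first tool call was correct.
    public static double ToolSelectionAccuracy(IReadOnlyList<Episode> episodes)
        => episodes.Count == 0
            ? 0.0
            : episodes.Count(e => e.ExpectedTool == e.ActualFirstTool) / (double)episodes.Count;
}
```

Plan efficiency and hallucinated-tool-call rates can be computed the same way from step logs, which is one reason the observability traces in the next section are worth persisting.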

8.2 Observability: Tracing Full Agent Loops and Detecting Failures

Observability for agents goes beyond request-level logging. Teams must trace entire agent loops, including reasoning, tool calls, retries, and termination conditions.

OpenTelemetry integrates cleanly with .NET, Semantic Kernel, and HTTP clients, enabling end-to-end tracing of agent execution:

builder.Services.AddOpenTelemetry()
    .WithTracing(t =>
    {
        t.AddAspNetCoreInstrumentation()
         .AddHttpClientInstrumentation()
         // Semantic Kernel emits activities under sources prefixed with
         // "Microsoft.SemanticKernel"; subscribe with a wildcard.
         .AddSource("Microsoft.SemanticKernel*")
         .AddAzureMonitorTraceExporter(o =>
         {
             o.ConnectionString = config.AppInsightsConnection;
         });
    });

Typical spans include:

  • RouterAgent.Classify
  • Planner.GeneratePlan
  • SharePointSearchPlugin.Execute
  • SharedDrivePlugin.Execute
  • LLM.GenerateResponse

Agent-specific observability patterns include:

  • Loop detection – Alert when an agent exceeds a maximum number of steps.
  • Repeated failures – Detect tool calls that consistently error or time out.
  • Decision audit logs – Persist routing decisions, tool choices, and final citations.

These traces serve both operational debugging and compliance audits, allowing teams to reconstruct why an agent answered the way it did.

8.3 Agent Versioning, Rollback, and Human-in-the-Loop Controls

Agents are software artifacts. Their prompts, routing logic, and tool access rules must be versioned and deployed safely.

A common strategy includes:

  • Versioned agent instructions stored alongside code
  • Canary or A/B deployments comparing agent versions
  • Rollback mechanisms for prompt or planner regressions

For example, a new router prompt might be deployed to 10% of users while the previous version remains active for the rest. Evaluation metrics and user feedback determine whether the change is promoted or reverted.
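Deterministic bucketing makes such a rollout stable per user: a given user always sees the same agent version for the duration of the experiment. A sketch with hypothetical version names:

```csharp
// Deterministic canary bucketing: hash the user id into [0,100) and route a
// fixed percentage to the new agent/prompt version.
public static class CanaryRouter
{
    public static string SelectVersion(string userId, int canaryPercent,
        string stableVersion = "router-v1", string canaryVersion = "router-v2")
    {
        // FNV-1a hash; any stable hash works. string.GetHashCode is avoided
        // because it is randomized per process in .NET.
        uint hash = 2166136261;
        foreach (char c in userId)
        {
            hash ^= c;
            hash *= 16777619;
        }
        return (hash % 100) < canaryPercent ? canaryVersion : stableVersion;
    }
}
```

Rollback is then just setting canaryPercent to 0, with no redeploy of the stable path.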

Equally important are human-in-the-loop patterns. Agents should escalate to humans when:

  • Confidence scores fall below a defined threshold
  • The request involves approvals, financial impact, or legal risk
  • Required data is unavailable or permission-restricted

Escalation may take the form of:

  • Creating a support ticket
  • Handing off to a human operator
  • Asking the user for confirmation before proceeding

This balance allows agents to remain autonomous for routine tasks while deferring high-risk decisions.
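The escalation triggers and outcomes above can be combined into a small policy function. Thresholds and outcome names in this sketch are assumptions:

```csharp
// Inputs for one escalation decision. "InvolvesApprovalOrLegalRisk" would be
// derived from routing/classification; "Confidence" from citation scores.
public record EscalationInput(double Confidence, bool InvolvesApprovalOrLegalRisk, bool DataUnavailable);

public static class EscalationPolicy
{
    public static string Decide(EscalationInput input, double confidenceFloor = 0.5)
    {
        // High-risk requests always go to a human, regardless of confidence.
        if (input.InvolvesApprovalOrLegalRisk)
            return "handoff_to_human";

        // Missing or permission-restricted data: open a ticket instead of guessing.
        if (input.DataUnavailable)
            return "create_ticket";

        // Low confidence: ask the user to confirm before proceeding.
        if (input.Confidence < confidenceFloor)
            return "ask_user_confirmation";

        return "proceed";
    }
}
```

Ordering matters: risk checks come before confidence checks, so a confidently wrong agent still cannot bypass human review.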

8.4 Cost Management: Understanding Agent-Specific Cost Drivers

Agent systems incur costs differently than single-shot chatbots. Key cost drivers include:

  • Multi-turn conversations – Each reasoning loop adds tokens.
  • Tool call overhead – Function calls and structured outputs consume tokens.
  • Reasoning models – Models with “thinking” tokens (e.g., o1-class models) are significantly more expensive.
  • Long contexts – Larger plans and retrieval results increase prompt size.

Mitigation strategies include:

  • Using lightweight models for routing and classification
  • Limiting maximum agent iterations
  • Caching retrieval results and intermediate plans
  • Compressing or summarizing context before generation

Example cache wrapper:

var key = Hash(input);

if (_cache.TryGet(key, out var cached))
    return cached;

var response = await kernel.InvokePromptAsync(input);
_cache.Set(key, response, TimeSpan.FromMinutes(10));

return response;

Another effective approach is to separate thinking from answering—using expensive reasoning models only when planning is required, and cheaper models for final response synthesis.
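A simple way to implement this split is to select the model deployment per phase of the agent loop. Deployment names here are illustrative placeholders:

```csharp
// Model tiering sketch: cheap models for routing and synthesis, an expensive
// reasoning model only for the planning phase.
public static class ModelTiering
{
    public static string SelectDeployment(string phase) => phase switch
    {
        "route"      => "gpt-4o-mini", // lightweight classification
        "plan"       => "o1",          // multi-step reasoning, used sparingly
        "synthesize" => "gpt-4o",      // final grounded answer generation
        _            => "gpt-4o-mini"  // default to the cheap tier
    };
}
```

Because routing and synthesis dominate request volume while planning is comparatively rare, this split usually cuts cost far more than it adds latency.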

With disciplined routing, caching, and guardrails, enterprises can scale agent usage without unpredictable cost growth.
