The question I keep running into with production agents isn't whether the model is capable enough. It's whether the agent knows when to apply which expertise and whether anyone can verify that it did.
Longer system prompts don't solve this. Neither does cramming every policy document into context on every turn. What works is giving the agent structured, discoverable capability packages it can load on demand: instructions, reference materials, templates, and guardrails scoped to a specific responsibility.
That's exactly what Skills are in Microsoft Agent Framework. The agent discovers what's available, loads what a given task requires, reads the relevant references, and then responds. It doesn't carry everything at all times, and that constraint is the whole point.
Deliberate loading over blanket context is what separates a demo agent from one you'd trust with actual business decisions. I built a full end-to-end sample around this pattern: a supply chain incident command center with a .NET 10 backend, a React frontend, and two cooperating skills that handle disruption triage and stakeholder communications.
Progressive disclosure pattern
If you've read my previous posts on orchestration patterns, you'll recognise a recurring theme: the system's shape matters more than the model's intelligence. Skills follow that same principle.
Progressive disclosure means the agent doesn't start every turn loaded with every policy and every template. Instead, the system follows a clear lifecycle.

First, available skills are advertised to the model as part of the context. When the model recognises that a task requires domain expertise, it calls load_skill to fetch the full skill instructions. Then it calls read_skill_resource to pull specific references or templates it needs. Only then does it produce its response.
This is not a minor optimisation. It directly reduces context noise, improves response consistency and, most importantly, gives you a traceable audit trail. You can see exactly which skill was loaded, which references were read and in what order. When someone asks "why did the agent recommend this?", you point to the timeline instead of guessing.
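To make the lifecycle concrete, here is a minimal TypeScript sketch of the advertise → load → read → respond flow. The names (`SkillCatalog`, `TimelineEvent`, and so on) are my own illustrative stand-ins, not the Agent Framework API; the backend sample is C#, and this just models the same shape.

```typescript
// Hypothetical model of the progressive-disclosure lifecycle.
type TimelineEvent = { kind: "skill_loaded" | "resource_read"; detail: string };

interface Skill {
  name: string;
  instructions: string;              // full SKILL.md body
  resources: Record<string, string>; // path -> content
}

class SkillCatalog {
  private timeline: TimelineEvent[] = [];
  constructor(private skills: Map<string, Skill>) {}

  // Step 1: skill names are advertised to the model as part of the context.
  advertise(): string {
    return [...this.skills.keys()].map((n) => `- ${n}`).join("\n");
  }

  // Step 2: the model calls load_skill to fetch the full instructions.
  loadSkill(name: string): string {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`unknown skill: ${name}`);
    this.timeline.push({ kind: "skill_loaded", detail: name });
    return skill.instructions;
  }

  // Step 3: the model pulls only the specific references it needs.
  readResource(name: string, path: string): string {
    const content = this.skills.get(name)?.resources[path];
    if (content === undefined) throw new Error(`unknown resource: ${path}`);
    this.timeline.push({ kind: "resource_read", detail: `${name}/${path}` });
    return content;
  }

  // The audit trail: which skill, which references, in what order.
  auditTrail(): TimelineEvent[] {
    return [...this.timeline];
  }
}

const catalog = new SkillCatalog(
  new Map([
    ["incident-triage", {
      name: "incident-triage",
      instructions: "Classify severity, map probable cause, plan actions.",
      resources: { "references/SLA_POLICY.md": "Sev1: respond in 15 minutes." },
    }],
  ]),
);

catalog.loadSkill("incident-triage");
catalog.readResource("incident-triage", "references/SLA_POLICY.md");
const trail = catalog.auditTrail();
```

The timeline array is the whole answer to "why did the agent recommend this?": it is an ordered record of every capability the agent pulled in before responding.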
Supply Chain Incident Command Center: skill-driven triage and stakeholder communications
How skill invocation actually works
This is the part that gets interesting architecturally. The sample doesn't use hardcoded prompt injection for skills. Instead, it leverages Agent Framework's AIContextProvider pattern to expose skills as native tools the model can invoke on demand.
The NativeAgentSkillsContextProvider extends AIContextProvider and does two things during each invocation: it injects a set of instructions listing the available skills, and it registers two native tools, load_skill(skillName) and read_skill_resource(skillName, resourcePath), that the model can call through standard tool use.
When the model calls load_skill("incident-triage"), the provider resolves the skill from a FileAgentSkillsProvider that reads from disk, returns the full SKILL.md content, and records a timeline event. The same flow applies when the model follows up with read_skill_resource("incident-triage", "references/SLA_POLICY.md"): the file is read, its content is returned, and a trace event is logged.
A SkillRunContextAccessor backed by AsyncLocal ties every tool call to the current run ID so the timeline stays correlated even under concurrent requests. And a SkillExecutionGuard validates after each run that the expected skill was actually loaded. If the model skips the skill invocation (which can happen with prompt drift), the API returns a clear error instead of a potentially unreliable result.
Below is the architecture at a glance.
The key relationship to notice is that FileAgentSkillsProvider never talks to the model directly. The model talks to the native tools exposed by the context provider, which then delegates to the file-based catalog.
Skills are first-class assets on disk, not hidden prompt internals.
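The run-correlation and guard plumbing is worth sketching too. The sample uses .NET's AsyncLocal; the closest Node analogue is AsyncLocalStorage, so the sketch below models SkillRunContextAccessor and SkillExecutionGuard in TypeScript. All names here are illustrative stand-ins, not the sample's actual types.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Run-scoped state: every tool call is tied to the current run ID, so the
// timeline stays correlated even under concurrent requests.
const runContext = new AsyncLocalStorage<{ runId: string; loaded: Set<string> }>();

function recordSkillLoaded(skillName: string): void {
  const ctx = runContext.getStore();
  if (!ctx) throw new Error("tool call outside of a run");
  ctx.loaded.add(skillName);
}

// After the run, verify the expected skill was actually invoked. If the
// model skipped load_skill (prompt drift), fail loudly rather than return
// a potentially unreliable result.
function guardSkillWasLoaded(expected: string): void {
  const ctx = runContext.getStore();
  if (!ctx || !ctx.loaded.has(expected)) {
    throw new Error(`run did not load required skill '${expected}'`);
  }
}

function runAgent(runId: string, expectedSkill: string, body: () => void): string {
  return runContext.run({ runId, loaded: new Set<string>() }, () => {
    body(); // the model's tool calls happen in here
    guardSkillWasLoaded(expectedSkill);
    return "ok";
  });
}

// A compliant run loads the skill; a drifting run does not.
const good = runAgent("run-1", "incident-triage", () =>
  recordSkillLoaded("incident-triage"));

let guardFired = false;
try {
  runAgent("run-2", "incident-triage", () => { /* model skipped load_skill */ });
} catch {
  guardFired = true;
}
```

The design point is that the guard runs after the model, not inside the prompt: you don't ask the model to promise it used the skill, you check the trace.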
Skill packages as versionable units
Each skill lives in its own folder with a SKILL.md definition, a references/ directory for domain knowledge and an assets/ directory for templates.
backend/skills/
├── incident-triage/
│   ├── SKILL.md
│   ├── references/
│   │   ├── SLA_POLICY.md
│   │   ├── ESCALATION_MATRIX.md
│   │   └── VENDOR_CONSTRAINTS.md
│   └── assets/
│       ├── triage-report-template.md
│       └── executive-brief-template.md
└── incident-communications/
    ├── SKILL.md
    ├── references/
    │   └── tone-guidelines.md
    └── assets/
        ├── customer-update-template.md
        └── supplier-escalation-template.md
This is inspectable, diffable and reviewable by engineers. When a policy changes, you update one markdown file rather than hunting through system prompts.
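As a rough illustration, a SKILL.md for the triage skill might look like the fragment below. The headings and wording are my own sketch, not the sample's actual file; the point is that the definition, its references, and its output templates are plain reviewable text.

```markdown
# Skill: incident-triage

## Purpose
Classify supply chain disruptions by severity, map probable causes,
and produce a prioritised action plan.

## References
- references/SLA_POLICY.md — response-time commitments per severity
- references/ESCALATION_MATRIX.md — who gets paged, and when
- references/VENDOR_CONSTRAINTS.md — contractual limits per supplier

## Output
Fill assets/triage-report-template.md; for Sev1 incidents, also produce
assets/executive-brief-template.md.
```

A policy change becomes a one-file diff in a pull request, which is exactly the review surface engineers already know.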
The incident-triage skill handles severity classification, probable cause mapping and prioritised action plans. The incident-communications skill takes the triage output and drafts audience-specific updates (customer, supplier, or internal leadership) using tone guidelines and message templates. Two distinct responsibilities, two distinct boundaries.
A command center, not a chat box
I deliberately shaped the frontend as an operations surface rather than a generic chat shell. It has a KPI strip showing open incidents and at-risk order counts, an incident queue for selection, a triage workspace with an operator directive field, a live skill timeline that shows the progressive disclosure lifecycle as events come in, and an evidence drawer that lists every reference the agent actually read.
That last part changes how people evaluate the system. When the evidence drawer shows exactly which SLA policy and which vendor constraint document the agent consulted, trust builds quickly because the reasoning context is visible.
After triage completes, Phase 2 kicks in. The operator selects an audience, clicks "Draft Stakeholder Update," and the second skill (incident-communications) loads, reads its own references and produces the communication. No system redesign required. You add a new skill boundary and the agent picks it up through the same native invocation flow.
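The two-phase handoff can be sketched in a few lines. The types and function names below are hypothetical, not the sample's API; what matters is that the second skill goes through the same load-then-read lifecycle as the first, with the triage output passed in as ordinary data.

```typescript
type Audience = "customer" | "supplier" | "leadership";

interface TriageResult { severity: string; summary: string }

// The same native tool surface both skills invoke through.
interface SkillTools {
  loadSkill(name: string): string;
  readResource(name: string, path: string): string;
}

// Phase 2: the operator picks an audience; the agent loads
// incident-communications, reads its tone guidelines, then drafts.
function draftStakeholderUpdate(
  tools: SkillTools, triage: TriageResult, audience: Audience,
): string {
  tools.loadSkill("incident-communications");
  const tone = tools.readResource(
    "incident-communications", "references/tone-guidelines.md");
  return `[${audience}] (${tone}) ${triage.severity}: ${triage.summary}`;
}

// A stub catalog standing in for the real tool layer, recording calls.
const calls: string[] = [];
const stubTools: SkillTools = {
  loadSkill(name) { calls.push(`load:${name}`); return "instructions"; },
  readResource(name, path) { calls.push(`read:${name}/${path}`); return "calm, factual"; },
};

const update = draftStakeholderUpdate(
  stubTools,
  { severity: "Sev2", summary: "Port congestion delaying 14 orders" },
  "customer",
);
```

Because the communications skill only sees the tool surface and the triage result, adding a third skill later means adding a folder and a boundary, not redesigning the flow.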
I'm sure you'll want to check out the sample yourself, so here it is. And don't forget to give the repo a star; it genuinely helps.
Most of the pain I see with skill systems is not about model quality. It's about boundary quality.
If you dump all references into every turn, you lose the efficiency benefit of progressive disclosure entirely. If you skip event tracing, you lose the explainability that makes skills trustworthy in production. If you build five skills before your first one is solid, you're debugging capability boundaries instead of business logic.
Lastly, my recommendation is to make one flagship skill excellent. Get the lifecycle visible. Build confidence in the trace. Then add a cooperating skill once the first boundary is proven. That's exactly the phased approach this sample follows with incident-triage first and incident-communications second.
If your current agent mostly works but still feels fragile under real business conditions, Skills are one of the highest-leverage upgrades you can make. Not because they make models smarter but because they make expertise modular, explicit and observable. Once agent behavior becomes a business dependency those properties stop being optional.
Until next time.
