Software 2.0

Making AI Real at Scale

Solution architecture, implementation, and deployment of scaled autonomous solutions. From multi-agentic meshes to enterprise-wide enablement, transition your organization to the AI era with us.

See how we work
01 / Solutions

What we build


Solution architecture, implementation, and deployment of scaled autonomous systems — engineered for production reliability and integrated across the organization.

01
Customer Support Resolution Agents

Deploy reliable agents for high-ambiguity requests like returns, billing disputes, and account issues. We design systems integrating custom Model Context Protocol (MCP) tools to achieve 80%+ first-contact resolution while securely escalating to human agents.

MCP Escalation CX
02
Code Generation with Claude Code

Accelerate software development. We integrate Claude Code into development workflows with custom slash commands, CLAUDE.md configurations, and intelligent utilization of plan mode versus direct execution for refactoring, debugging, and documentation.

Claude Code DevEx
03
Multi-Agent Research Systems

Orchestrate complex workflows using coordinator-subagent patterns. We build systems where a coordinator delegates to specialized subagents for web search, document analysis, and data synthesis — producing comprehensive, highly accurate, and cited reports.

Orchestration Research
04
Developer Productivity Suites

Empower teams to navigate unfamiliar, massive codebases. We integrate built-in tools (Read, Write, Bash, Grep, Glob) alongside custom MCP servers to help developers understand legacy systems, generate boilerplate, and securely automate repetitive tasks.

MCP Codebase
05
CI/CD AI Integration

Seamlessly embed Claude into your continuous integration and deployment pipelines. We configure systems for automated code reviews, missing test case generation, and actionable pull request feedback — while mitigating false positives.

CI/CD Code review
06
Structured Data Extraction

Transform unstructured documentation into clean datasets. Our architectures leverage strict JSON schemas, validation retries, and explicit tool-use patterns to maintain near-perfect accuracy and handle edge cases gracefully.

JSON Schema ETL
02 / Capabilities

The architectural spine


A breakdown of foundational architecture expertise and the engineering methodologies we apply to ensure robust, production-ready AI deployments.

Agentic Architecture & Orchestration A.01

  • Design and implement agentic loops for autonomous, reliable task execution.
  • Orchestrate multi-agent systems using advanced coordinator-subagent patterns.
  • Configure secure subagent invocation, context passing, and dynamic spawning.
  • Implement multi-step workflows with strict enforcement and handoff patterns.
  • Apply SDK hooks for tool call interception, telemetry, and data normalization.
  • Design task decomposition strategies to solve highly complex, ambiguous workflows.
  • Manage session state, memory resumption, and intelligent thread forking.

Tool Design & MCP Integration A.02

  • Design highly effective tool interfaces with clear boundaries and descriptions.
  • Implement structured error responses and recovery loops for custom APIs.
  • Distribute tools appropriately across distinct agents to avoid hallucination.
  • Integrate custom Model Context Protocol (MCP) servers into enterprise workflows.
  • Select and apply native tools securely within constrained environments.

Context Management & Reliability A.03

  • Manage context decay to preserve critical information across long interactions.
  • Formulate strict escalation criteria for secure human-agent handoffs.
  • Implement error propagation and mitigation strategies across multi-agent meshes.
  • Maintain context limits through intelligent scratchpad files and state exports.
  • Design human-in-the-loop (HITL) review workflows based on statistical confidence calibration.

Prompt Engineering & Extraction A.04

  • Design metaprompts with explicit criteria to improve precision and reduce false positives.
  • Apply few-shot techniques to handle document variability and unstructured noise.
  • Enforce structured outputs using advanced tool-use configurations and JSON schemas.
  • Implement self-correction validations and schema error recovery loops.
  • Design highly efficient batch processing architectures for large datasets.

Development Workflows A.05

  • Configure internal tooling and repository configurations for scalable AI collaboration.
  • Isolate skill sets using strict context boundaries and allowed-tool matrices.
  • Apply path-specific rules for conditional, dynamic loading of coding conventions.
  • Utilize planning frameworks for large-scale migrations and architectural overhauls.
  • Apply iterative refinement techniques for progressive codebase improvement.
03 / Use Cases

Production scenarios


Real-world production contexts and architectural challenges derived from enterprise deployments. Each scenario reflects patterns we've shipped.

Customer Support Resolution Agent

Designed to handle high-ambiguity requests such as returns, billing disputes, and account issues using the Claude Agent SDK. The system leverages custom MCP tools like get_customer and process_refund to achieve an 80%+ first-contact resolution rate with integrated human escalation paths.

Code Generation with Claude Code

Accelerates software development through refactoring, debugging, and documentation workflows. Implementation utilizes custom slash commands, CLAUDE.md configurations, and strategic toggling between plan mode and direct execution.

Multi-Agent Research System

Employs a coordinator-subagent architecture where specialized agents handle web searches, document analysis, and synthesis. This orchestration allows the system to produce comprehensive, cited reports on complex topics.

Developer Productivity with Claude

Assists engineers in exploring unfamiliar codebases and understanding legacy systems. The agent utilizes built-in tools such as Bash, Grep, and Glob alongside custom MCP server integrations to automate repetitive tasks.

Claude Code for Continuous Integration

Embeds Claude into CI/CD pipelines to provide automated code reviews and test case generation. The focus is on designing prompts that offer actionable pull request feedback while minimizing false positives.

Structured Data Extraction

Extracts information from unstructured documents and validates the results against strict JSON schemas. This system is built to maintain high accuracy across edge cases and facilitate integration with downstream backend systems.

04 / Enablement

Training curriculum


Equip your personnel with hands-on, operational AI mastery. Comprehensive training across the entire official Anthropic course ecosystem.

AI Fluency & Fundamentals

Claude 101 – Use Claude for everyday work tasks, understand core features, and explore advanced topics.
AI Fluency: Framework & Foundations – Collaborate with AI systems effectively, efficiently, ethically, and safely.
AI Capabilities and Limitations – An introductory course breaking down how AI actually works under the hood.
Teaching AI Fluency – Empowers academic faculty and instructional designers to teach and assess AI Fluency.
AI Fluency for Educators – Apply AI Fluency to teaching practices and institutional strategy.
AI Fluency for Students – Develop skills that enhance learning, career planning, and academic success.
AI Fluency for Nonprofits – Increase organizational impact and efficiency while staying true to mission and values.

Claude Code & Agentic Workflows

Claude Code 101 – Use Claude Code effectively in your daily development workflow.
Claude Code in Action – Hands-on integration of Claude Code into real-world software development pipelines.
Introduction to Claude Cowork – Work alongside Claude on your real files. Covers task loops, plugins, and multi-step work.
Introduction to Agent Skills – Build, configure, and share reusable markdown instructions Claude applies contextually.
Introduction to Subagents – Manage context and delegate tasks by creating specialized subagents in Claude Code.

API & Cloud Integrations

Building with the Claude API – A comprehensive deep dive covering the full spectrum of building with the Claude API.
Claude with Amazon Bedrock – Follow the official accreditation program to deploy and configure Claude securely on AWS.
Claude with Google Cloud's Vertex AI – Master working with Anthropic models natively through GCP's Vertex AI infrastructure.

Model Context Protocol

Introduction to MCP – Build MCP servers and clients from scratch using Python. Master tools, resources, and prompts.
MCP: Advanced Topics – Advanced patterns including sampling, notifications, file system access, and transport.
05 / Knowledge

Practical
domain knowledge


Comprehensive task statements detailing the exact knowledge and skills required for architectural certification.

Agentic Architecture & Orchestration

Domain 01
Design and implement agentic loops for autonomous task execution

Knowledge of

  • The agentic loop lifecycle: sending requests to Claude, inspecting stop_reason ("tool_use" vs "end_turn"), executing requested tools, and returning results for the next iteration
  • How tool results are appended to conversation history so the model can reason about the next action
  • The distinction between model-driven decision-making (Claude reasons about which tool to call next based on context) and pre-configured decision trees or tool sequences

Skills in

  • Implementing agentic loop control flow that continues when stop_reason is "tool_use" and terminates when stop_reason is "end_turn"
  • Adding tool results to conversation context between iterations so the model can incorporate new information into its reasoning
  • Avoiding anti-patterns such as parsing natural language signals to determine loop termination, setting arbitrary iteration caps as the primary stopping mechanism, or checking for assistant text content as a completion indicator
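The control flow described above can be sketched as follows. To keep the sketch self-contained, the Anthropic client is replaced by a hypothetical stub that returns one tool call and then finishes, and `get_time` is an invented tool; in real code the `create` call would be `client.messages.create(...)` against the API.

```python
# Minimal agentic loop: continue while stop_reason is "tool_use",
# terminate on "end_turn" — never by parsing natural-language signals.

TOOLS = {"get_time": lambda _args: "2024-01-01T00:00:00Z"}  # hypothetical tool

class StubClient:
    """Stand-in for the Anthropic client: one tool_use turn, then end_turn."""
    def __init__(self):
        self._turn = 0

    def create(self, messages):
        self._turn += 1
        if self._turn == 1:
            return {"stop_reason": "tool_use",
                    "content": [{"type": "tool_use", "id": "tu_1",
                                 "name": "get_time", "input": {}}]}
        return {"stop_reason": "end_turn",
                "content": [{"type": "text", "text": "Done."}]}

def run_agent_loop(client, user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        response = client.create(messages)
        # Append the assistant turn so the model can see its own tool calls.
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return response, messages  # stop_reason is "end_turn"
        # Execute every requested tool and return results for the next iteration.
        results = []
        for block in response["content"]:
            if block["type"] == "tool_use":
                output = TOOLS[block["name"]](block["input"])
                results.append({"type": "tool_result",
                                "tool_use_id": block["id"],
                                "content": output})
        messages.append({"role": "user", "content": results})

final, history = run_agent_loop(StubClient(), "What time is it?")
```

Note that the loop terminates only on `stop_reason`, and the tool result is appended to the conversation history keyed by `tool_use_id` so the model can reason about it on the next iteration.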
Orchestrate multi-agent systems with coordinator-subagent patterns

Knowledge of

  • Hub-and-spoke architecture where a coordinator agent manages all inter-subagent communication, error handling, and information routing
  • How subagents operate with isolated context — they do not inherit the coordinator's conversation history automatically
  • The role of the coordinator in task decomposition, delegation, result aggregation, and deciding which subagents to invoke based on query complexity
  • Risks of overly narrow task decomposition by the coordinator, leading to incomplete coverage of broad research topics

Skills in

  • Designing coordinator agents that analyze query requirements and dynamically select which subagents to invoke rather than always routing through the full pipeline
  • Partitioning research scope across subagents to minimize duplication (e.g., assigning distinct subtopics or source types to each agent)
  • Implementing iterative refinement loops where the coordinator evaluates synthesis output for gaps, re-delegates to search and analysis subagents with targeted queries, and re-invokes synthesis until coverage is sufficient
  • Routing all subagent communication through the coordinator for observability, consistent error handling, and controlled information flow
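The routing logic above can be sketched with plain callables standing in for subagents. All names here are invented, and in practice the coordinator would be a model deciding which specialists a query needs rather than a boolean flag; the point is the hub-and-spoke shape.

```python
# Hub-and-spoke coordination sketch: the coordinator selects only the
# subagents a query requires, and all results flow back through it
# before synthesis.

SUBAGENTS = {
    "web_search": lambda q: f"search results for {q!r}",
    "doc_analysis": lambda q: f"document findings for {q!r}",
    "synthesis": lambda findings: " | ".join(findings),
}

def coordinate(query, needs_documents):
    # Dynamic selection instead of always running the full pipeline.
    findings = [SUBAGENTS["web_search"](query)]
    if needs_documents:
        findings.append(SUBAGENTS["doc_analysis"](query))
    # No subagent talks to another directly; the coordinator aggregates,
    # which gives one place for observability and error handling.
    return SUBAGENTS["synthesis"](findings)

report = coordinate("solar panel efficiency", needs_documents=False)
```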
Configure subagent invocation, context passing, and spawning

Knowledge of

  • The Task tool as the mechanism for spawning subagents, and the requirement that allowedTools must include "Task" for a coordinator to invoke subagents
  • That subagent context must be explicitly provided in the prompt — subagents do not automatically inherit parent context or share memory between invocations
  • The Agent Definition configuration including descriptions, system prompts, and tool restrictions for each subagent type
  • Fork-based session management for exploring divergent approaches from a shared analysis baseline

Skills in

  • Including complete findings from prior agents directly in the subagent's prompt (e.g., passing web search results and document analysis outputs to the synthesis subagent)
  • Using structured data formats to separate content from metadata (source URLs, document names, page numbers) when passing context between agents to preserve attribution
  • Spawning parallel subagents by emitting multiple Task tool calls in a single coordinator response rather than across separate turns
  • Designing coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions, to enable subagent adaptability
Implement multi-step workflows with enforcement and handoff patterns

Knowledge of

  • The difference between programmatic enforcement (hooks, prerequisite gates) and prompt-based guidance for workflow ordering
  • That when deterministic compliance is required (e.g., identity verification before financial operations), prompt instructions alone carry a non-zero failure rate
  • Structured handoff protocols for mid-process escalation that include customer details, root cause analysis, and recommended actions

Skills in

  • Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed (e.g., blocking process_refund until get_customer has returned a verified customer ID)
  • Decomposing multi-concern customer requests into distinct items, then investigating each in parallel using shared context before synthesizing a unified resolution
  • Compiling structured handoff summaries (customer ID, root cause, refund amount, recommended action) when escalating to human agents who lack access to the conversation transcript
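A programmatic prerequisite gate for the refund scenario above might look like this sketch. The tool names follow the scenario; the implementations are stand-ins for real CRM and payment calls.

```python
# Prerequisite gate: process_refund is blocked until get_customer has
# returned a verified customer ID. Because the check is code, not a
# prompt instruction, the model cannot skip it.

class PrerequisiteError(Exception):
    pass

class SupportToolGate:
    def __init__(self):
        self.verified_customer_id = None

    def get_customer(self, email):
        # Stand-in lookup; a real tool would query the CRM.
        self.verified_customer_id = f"cust_{hash(email) % 10000}"
        return self.verified_customer_id

    def process_refund(self, amount):
        if self.verified_customer_id is None:
            raise PrerequisiteError(
                "get_customer must succeed before process_refund")
        return {"customer": self.verified_customer_id, "refunded": amount}

gate = SupportToolGate()
try:
    gate.process_refund(20.0)        # blocked: no verified customer yet
    blocked = False
except PrerequisiteError:
    blocked = True
gate.get_customer("kim@example.com")
receipt = gate.process_refund(20.0)  # allowed after verification
```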
Apply Agent SDK hooks for tool call interception and data normalization

Knowledge of

  • Hook patterns (e.g., PostToolUse) that intercept tool results for transformation before the model processes them
  • Hook patterns that intercept outgoing tool calls to enforce compliance rules (e.g., blocking refunds above a threshold)
  • The distinction between using hooks for deterministic guarantees versus relying on prompt instructions for probabilistic compliance

Skills in

  • Implementing PostToolUse hooks to normalize heterogeneous data formats (Unix timestamps, ISO 8601, numeric status codes) from different MCP tools before the agent processes them
  • Implementing tool call interception hooks that block policy-violating actions (e.g., refunds exceeding $500) and redirect to alternative workflows (e.g., human escalation)
  • Choosing hooks over prompt-based enforcement when business rules require guaranteed compliance
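The two hook behaviors above can be sketched as plain functions. The signatures here are illustrative rather than the Agent SDK's actual hook API: one normalizes a Unix-epoch timestamp after a tool runs, the other intercepts a refund call before it runs and redirects over-limit amounts.

```python
# Hook sketches: post-tool-use normalization and pre-tool-use policy
# enforcement. The $500 threshold follows the scenario above.
from datetime import datetime, timezone

REFUND_LIMIT = 500.0

def post_tool_use(tool_name, result):
    # Normalize epoch timestamps so the agent always sees ISO 8601.
    ts = result.get("timestamp")
    if isinstance(ts, (int, float)):
        result["timestamp"] = datetime.fromtimestamp(
            ts, tz=timezone.utc).isoformat()
    return result

def pre_tool_use(tool_name, tool_input):
    # Deterministically block policy-violating refunds and redirect
    # to human escalation instead of relying on prompt compliance.
    if tool_name == "process_refund" and tool_input.get("amount", 0) > REFUND_LIMIT:
        return {"blocked": True, "redirect": "escalate_to_human"}
    return {"blocked": False}

normalized = post_tool_use("get_order", {"timestamp": 0})
decision = pre_tool_use("process_refund", {"amount": 750})
```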
Design task decomposition strategies for complex workflows

Knowledge of

  • When to use fixed sequential pipelines (prompt chaining) versus dynamic adaptive decomposition based on intermediate findings
  • Prompt chaining patterns that break reviews into sequential steps (e.g., analyze each file individually, then run a cross-file integration pass)
  • The value of adaptive investigation plans that generate subtasks based on what is discovered at each step

Skills in

  • Selecting task decomposition patterns appropriate to the workflow: prompt chaining for predictable multi-aspect reviews, dynamic decomposition for open-ended investigation tasks
  • Splitting large code reviews into per-file local analysis passes plus a separate cross-file integration pass to avoid attention dilution
  • Decomposing open-ended tasks (e.g., "add comprehensive tests to a legacy codebase") by first mapping structure, identifying high-impact areas, then creating a prioritized plan that adapts as dependencies are discovered
Manage session state, resumption, and forking

Knowledge of

  • Named session resumption using --resume <session-name> to continue a specific prior conversation
  • fork_session for creating independent branches from a shared analysis baseline to explore divergent approaches
  • The importance of informing the agent about changes to previously analyzed files when resuming sessions after code modifications
  • Why starting a new session with a structured summary is more reliable than resuming with stale tool results

Skills in

  • Using --resume with session names to continue named investigation sessions across work sessions
  • Using fork_session to create parallel exploration branches (e.g., comparing two testing strategies or refactoring approaches from a shared codebase analysis)
  • Choosing between session resumption (when prior context is mostly valid) and starting fresh with injected summaries (when prior tool results are stale)
  • Informing a resumed session about specific file changes for targeted re-analysis rather than requiring full re-exploration

Tool Design & MCP Integration

Domain 02
Design effective tool interfaces with clear descriptions and boundaries

Knowledge of

  • Tool descriptions as the primary mechanism LLMs use for tool selection; minimal descriptions lead to unreliable selection among similar tools
  • The importance of including input formats, example queries, edge cases, and boundary explanations in tool descriptions
  • How ambiguous or overlapping tool descriptions cause misrouting (e.g., analyze_content vs analyze_document with near-identical descriptions)
  • The impact of system prompt wording on tool selection: keyword-sensitive instructions can create unintended tool associations

Skills in

  • Writing tool descriptions that clearly differentiate each tool's purpose, expected inputs, outputs, and when to use it versus similar alternatives
  • Renaming tools and updating descriptions to eliminate functional overlap (e.g., renaming analyze_content to extract_web_results with a web-specific description)
  • Splitting generic tools into purpose-specific tools with defined input/output contracts (e.g., splitting a generic analyze_document into extract_data_points, summarize_content, and verify_claim_against_source)
  • Reviewing system prompts for keyword-sensitive instructions that might override well-written tool descriptions
Implement structured error responses for MCP tools

Knowledge of

  • The MCP isError flag pattern for communicating tool failures back to the agent
  • The distinction between transient errors (timeouts, service unavailability), validation errors (invalid input), business errors (policy violations), and permission errors
  • Why uniform error responses (generic "Operation failed") prevent the agent from making appropriate recovery decisions
  • The difference between retryable and non-retryable errors, and how returning structured metadata prevents wasted retry attempts

Skills in

  • Returning structured error metadata including errorCategory (transient/validation/permission), isRetryable boolean, and human-readable descriptions
  • Including retriable: false flags and customer-friendly explanations for business rule violations so the agent can communicate appropriately
  • Implementing local error recovery within subagents for transient failures, propagating to the coordinator only errors that cannot be resolved locally along with partial results and what was attempted
  • Distinguishing between access failures (needing retry decisions) and valid empty results (representing successful queries with no matches)
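As a sketch, structured error metadata of the kind described above could be built like this. `isError` is the MCP flag pattern; the other field names (`errorCategory`, `isRetryable`) follow the bullets above rather than any fixed specification.

```python
# Structured tool errors: categorized, with an explicit retry signal,
# instead of a uniform "Operation failed".

def make_error(category, retryable, message):
    return {
        "isError": True,
        "errorCategory": category,  # transient | validation | permission
        "isRetryable": retryable,
        "description": message,
    }

def classify_failure(exc):
    # Map exception types to categories the agent can act on.
    if isinstance(exc, TimeoutError):
        return make_error("transient", True,
                          "Upstream service timed out; retry is safe.")
    if isinstance(exc, ValueError):
        return make_error("validation", False,
                          "Invalid input; fix the arguments, do not retry.")
    if isinstance(exc, PermissionError):
        return make_error("permission", False,
                          "Caller lacks access to this resource.")
    return make_error("transient", True, str(exc))

err = classify_failure(ValueError("order_id must be numeric"))
```

With this shape, the agent can retry transient failures locally and propagate only non-retryable errors (with the description) to the coordinator.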
Distribute tools appropriately across agents and configure tool choice

Knowledge of

  • The principle that giving an agent access to too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability by increasing decision complexity
  • Why agents with tools outside their specialization tend to misuse them (e.g., a synthesis agent attempting web searches)
  • Scoped tool access: giving agents only the tools needed for their role, with limited cross-role tools for specific high-frequency needs
  • tool_choice configuration options: "auto", "any", and forced tool selection ({"type": "tool", "name": "..."})

Skills in

  • Restricting each subagent's tool set to those relevant to its role, preventing cross-specialization misuse
  • Replacing generic tools with constrained alternatives (e.g., replacing fetch_url with load_document that validates document URLs)
  • Providing scoped cross-role tools for high-frequency needs (e.g., a verify_fact tool for the synthesis agent) while routing complex cases through the coordinator
  • Using tool_choice forced selection to ensure a specific tool is called first (e.g., forcing extract_metadata before enrichment tools), then processing subsequent steps in follow-up turns
  • Setting tool_choice: "any" to guarantee the model calls a tool rather than returning conversational text
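The three `tool_choice` modes above map to request parameters as follows. The `extract_metadata` tool definition is a hypothetical example used only to show the shape.

```python
# tool_choice configurations and the tool definition they reference.

extract_metadata_tool = {
    "name": "extract_metadata",
    "description": "Extract title, author, and date from a document.",
    "input_schema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}

auto_choice = {"type": "auto"}    # model may answer in plain text instead
any_choice = {"type": "any"}      # model must call some tool, its choice which
forced_choice = {"type": "tool", "name": "extract_metadata"}  # must call this one

# Forcing extract_metadata first; enrichment tools would then be offered
# in follow-up turns once the metadata is in context.
request = {"tools": [extract_metadata_tool], "tool_choice": forced_choice}
```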
Integrate MCP servers into Claude Code and agent workflows

Knowledge of

  • MCP server scoping: project-level (.mcp.json) for shared team tooling vs user-level (~/.claude.json) for personal/experimental servers
  • Environment variable expansion in .mcp.json (e.g., ${GITHUB_TOKEN}) for credential management without committing secrets
  • That tools from all configured MCP servers are discovered at connection time and available simultaneously to the agent
  • MCP resources as a mechanism for exposing content catalogs (e.g., issue summaries, documentation hierarchies, database schemas) to reduce exploratory tool calls

Skills in

  • Configuring shared MCP servers in project-scoped .mcp.json with environment variable expansion for authentication tokens
  • Configuring personal/experimental MCP servers in user-scoped ~/.claude.json
  • Enhancing MCP tool descriptions to explain capabilities and outputs in detail, preventing the agent from preferring built-in tools (like Grep) over more capable MCP tools
  • Choosing existing community MCP servers over custom implementations for standard integrations (e.g., Jira), reserving custom servers for team-specific workflows
  • Exposing content catalogs as MCP resources to give agents visibility into available data without requiring exploratory tool calls
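As a sketch, a project-scoped `.mcp.json` with environment variable expansion might look like the following; the server package and variable names are illustrative, not a recommendation.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
```

Because `${GITHUB_TOKEN}` is expanded from each developer's environment at load time, the file can be committed for the whole team without committing any secrets.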
Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

Knowledge of

  • Grep for content search (searching file contents for patterns like function names, error messages, or import statements)
  • Glob for file path pattern matching (finding files by name or extension patterns)
  • Read/Write for full file operations; Edit for targeted modifications using unique text matching
  • When Edit fails due to non-unique text matches, using Read + Write as a fallback for reliable file modifications

Skills in

  • Selecting Grep for searching code content across a codebase (e.g., finding all callers of a function, locating error messages)
  • Selecting Glob for finding files matching naming patterns (e.g., **/*.test.tsx)
  • Using Read to load full file contents followed by Write when Edit cannot find unique anchor text
  • Building codebase understanding incrementally: starting with Grep to find entry points, then using Read to follow imports and trace flows, rather than reading all files upfront
  • Tracing function usage across wrapper modules by first identifying all exported names, then searching for each name across the codebase

Claude Code Configuration & Workflows

Domain 03
Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization

Knowledge of

  • The CLAUDE.md configuration hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), and directory-level (subdirectory CLAUDE.md files)
  • That user-level settings apply only to that user — instructions in ~/.claude/CLAUDE.md are not shared with teammates via version control
  • The @import syntax for referencing external files to keep CLAUDE.md modular (e.g., importing specific standards files relevant to each package)
  • .claude/rules/ directory for organizing topic-specific rule files as an alternative to a monolithic CLAUDE.md

Skills in

  • Diagnosing configuration hierarchy issues (e.g., a new team member not receiving instructions because they're in user-level rather than project-level configuration)
  • Using @import to selectively include relevant standards files in each package's CLAUDE.md based on maintainer domain knowledge
  • Splitting large CLAUDE.md files into focused topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md, deployment.md)
  • Using the /memory command to verify which memory files are loaded and diagnose inconsistent behavior across sessions
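A package-level CLAUDE.md using `@import` might look like the following sketch; the paths and standards files are invented for illustration.

```markdown
# CLAUDE.md (packages/payments)

@../../standards/typescript.md
@../../standards/api-conventions.md

## Package-specific notes
- All money amounts are integer cents; never use floats.
- Webhook handlers must be idempotent.
```

Each package imports only the standards files relevant to it, so maintainers curate context per domain instead of loading one monolithic file everywhere.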
Create and configure custom slash commands and skills

Knowledge of

  • Project-scoped commands in .claude/commands/ (shared via version control) vs user-scoped commands in ~/.claude/commands/ (personal)
  • Skills in .claude/skills/ with SKILL.md files that support frontmatter configuration including context: fork, allowed-tools, and argument-hint
  • The context: fork frontmatter option for running skills in an isolated sub-agent context, preventing skill outputs from polluting the main conversation
  • Personal skill customization: creating personal variants in ~/.claude/skills/ with different names to avoid affecting teammates

Skills in

  • Creating project-scoped slash commands in .claude/commands/ for team-wide availability via version control
  • Using context: fork to isolate skills that produce verbose output (e.g., codebase analysis) or exploratory context (e.g., brainstorming alternatives) from the main session
  • Configuring allowed-tools in skill frontmatter to restrict tool access during skill execution (e.g., limiting to file write operations to prevent destructive actions)
  • Using argument-hint frontmatter to prompt developers for required parameters when they invoke the skill without arguments
  • Choosing between skills (on-demand invocation for task-specific workflows) and CLAUDE.md (always-loaded universal standards)
Apply path-specific rules for conditional convention loading

Knowledge of

  • .claude/rules/ files with YAML frontmatter paths fields containing glob patterns for conditional rule activation
  • How path-scoped rules load only when editing matching files, reducing irrelevant context and token usage
  • The advantage of glob-pattern rules over directory-level CLAUDE.md files for conventions that span multiple directories (e.g., test files spread throughout a codebase)

Skills in

  • Creating .claude/rules/ files with YAML frontmatter path scoping (e.g., paths: ["terraform/**/*"]) so rules load only when editing matching files
  • Using glob patterns in path-specific rules to apply conventions to files by type regardless of directory location (e.g., **/*.test.tsx for all test files)
  • Choosing path-specific rules over subdirectory CLAUDE.md files when conventions must apply to files spread across the codebase
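A path-scoped rule file following the bullets above might look like this sketch (e.g., saved as `.claude/rules/testing.md`); the glob and the conventions themselves are illustrative.

```markdown
---
paths: ["**/*.test.tsx"]
---

# Testing conventions
- Use React Testing Library queries; avoid snapshot tests.
- Each test file mirrors the path of the component it covers.
```

The rule loads only when a matching test file is being edited, so it never spends tokens in sessions that touch no tests, regardless of which directory the tests live in.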
Determine when to use plan mode vs direct execution

Knowledge of

  • Plan mode is designed for complex tasks involving large-scale changes, multiple valid approaches, architectural decisions, and multi-file modifications
  • Direct execution is appropriate for simple, well-scoped changes (e.g., adding a single validation check to one function)
  • Plan mode enables safe codebase exploration and design before committing to changes, preventing costly rework
  • The Explore subagent for isolating verbose discovery output and returning summaries to preserve main conversation context

Skills in

  • Selecting plan mode for tasks with architectural implications (e.g., microservice restructuring, library migrations affecting 45+ files, choosing between integration approaches with different infrastructure requirements)
  • Selecting direct execution for well-understood changes with clear scope (e.g., a single-file bug fix with a clear stack trace, adding a date validation conditional)
  • Using the Explore subagent for verbose discovery phases to prevent context window exhaustion during multi-phase tasks
  • Combining plan mode for investigation with direct execution for implementation (e.g., planning a library migration, then executing the planned approach)
Apply iterative refinement techniques for progressive improvement

Knowledge of

  • Concrete input/output examples as the most effective way to communicate expected transformations when prose descriptions are interpreted inconsistently
  • Test-driven iteration: writing test suites first, then iterating by sharing test failures to guide progressive improvement
  • The interview pattern: having Claude ask questions to surface considerations the developer may not have anticipated before implementing
  • When to provide all issues in a single message (interacting problems) versus fixing them sequentially (independent problems)

Skills in

  • Providing 2-3 concrete input/output examples to clarify transformation requirements when natural language descriptions produce inconsistent results
  • Writing test suites covering expected behavior, edge cases, and performance requirements before implementation, then iterating by sharing test failures
  • Using the interview pattern to surface design considerations (e.g., cache invalidation strategies, failure modes) before implementing solutions in unfamiliar domains
  • Providing specific test cases with example input and expected output to fix edge case handling (e.g., null values in migration scripts)
  • Addressing multiple interacting issues in a single detailed message when fixes interact, versus sequential iteration for independent issues
Integrate Claude Code into CI/CD pipelines

Knowledge of

  • The -p (or --print) flag for running Claude Code in non-interactive mode in automated pipelines
  • --output-format json and --json-schema CLI flags for enforcing structured output in CI contexts
  • CLAUDE.md as the mechanism for providing project context (testing standards, fixture conventions, review criteria) to CI-invoked Claude Code
  • Session context isolation: why the same Claude session that generated code is less effective at reviewing its own changes compared to an independent review instance

Skills in

  • Running Claude Code in CI with the -p flag to prevent interactive input hangs
  • Using --output-format json with --json-schema to produce machine-parseable structured findings for automated posting as inline PR comments
  • Including prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues to avoid duplicate comments
  • Providing existing test files in context so test generation avoids suggesting duplicate scenarios already covered by the test suite
  • Documenting testing standards, valuable test criteria, and available fixtures in CLAUDE.md to improve test generation quality and reduce low-value test output
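A CI step wiring these flags together could look like the following sketch, using GitHub Actions syntax as an assumed host; the schema filename and prompt are invented.

```yaml
# Illustrative pipeline fragment: non-interactive Claude Code producing
# machine-parseable review findings.
- name: Automated review
  run: |
    claude -p "Review the changes in this PR against our review criteria" \
      --output-format json \
      --json-schema review-schema.json > findings.json
```

The `-p` flag prevents the run from hanging on interactive input, and the JSON output can then be parsed by a follow-up step that posts inline PR comments.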

Prompt Engineering & Structured Output

Domain 04
Design prompts with explicit criteria to improve precision and reduce false positives

Knowledge of

  • The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate")
  • How general instructions like "be conservative" or "only report high-confidence findings" fail to improve precision compared to specific categorical criteria
  • The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories

Skills in

  • Writing specific review criteria that define which issues to report (bugs, security) versus skip (minor style, local patterns) rather than relying on confidence-based filtering
  • Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories
  • Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification
Apply few-shot prompting to improve output consistency and quality

Knowledge of

  • Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results
  • The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps)
  • How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases
  • The effectiveness of few-shot examples for reducing hallucination in extraction tasks (e.g., handling informal measurements, varied document structures)

Skills in

  • Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives
  • Including few-shot examples that demonstrate specific desired output format (location, issue, severity, suggested fix) to achieve consistency
  • Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization
  • Using few-shot examples to demonstrate correct handling of varied document structures (inline citations vs bibliographies, methodology sections vs embedded details)
  • Adding few-shot examples showing correct extraction from documents with varied formats to address empty/null extraction of required fields
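
The few-shot pattern above can be sketched as alternating user/assistant turns prepended to the real request. The example findings and the output format (location, issue, severity, suggested fix) are illustrative placeholders, not a fixed contract.

```python
# Hypothetical few-shot examples demonstrating both the desired output format
# and the distinction between a genuine issue and an acceptable pattern.
FEW_SHOT = [
    {"role": "user", "content": "Review: `if user = admin: grant()`"},
    {"role": "assistant", "content":
        "location: auth.py:12\n"
        "issue: assignment used where comparison was intended\n"
        "severity: high\n"
        "suggested fix: use `==` instead of `=`"},
    {"role": "user", "content": "Review: `total = sum(prices)  # compute total`"},
    {"role": "assistant", "content":
        "no issues: the comment matches the actual behavior"},
]

def build_messages(code_to_review: str) -> list[dict]:
    """Prepend few-shot turns so the real request inherits their format."""
    return FEW_SHOT + [{"role": "user", "content": f"Review: {code_to_review}"}]

messages = build_messages("def f(x): return x / 0")
```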
Enforce structured output using tool use and JSON schemas

Knowledge of

  • Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
  • The distinction between tool_choice: "auto" (model may return text instead of calling a tool), "any" (model must call a tool but can choose which), and forced tool selection (model must call a specific named tool)
  • That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total, values in wrong fields)
  • Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories

Skills in

  • Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
  • Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
  • Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values to satisfy required fields
  • Adding enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
  • Including format normalization rules in prompts alongside strict output schemas to handle inconsistent source formatting
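
The schema-enforced extraction pattern above can be sketched as a tool definition plus a forced tool_choice. The tool name, fields, and document types are illustrative; the parsing helper works on the list of content blocks a Messages API response returns.

```python
# Hypothetical extraction tool: nullable fields prevent fabricated values,
# and the enum includes "unclear" and "other" + detail for extensibility.
EXTRACT_TOOL = {
    "name": "extract_metadata",
    "description": "Record structured metadata extracted from a document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "invoice_total": {"type": ["number", "null"]},  # may be absent from source
            "category": {
                "type": "string",
                "enum": ["invoice", "receipt", "contract", "unclear", "other"],
            },
            "category_detail": {"type": "string"},  # free text when category is "other"
        },
        "required": ["title", "category"],
    },
}

def extract_tool_input(response_content: list[dict]) -> dict:
    """Pull the structured arguments out of the first tool_use block."""
    for block in response_content:
        if block["type"] == "tool_use":
            return block["input"]
    raise ValueError("model returned no tool_use block")

# The live call (requires the anthropic SDK and an API key) would pass
# tools=[EXTRACT_TOOL] and tool_choice={"type": "tool", "name": "extract_metadata"}
# to guarantee this specific extraction runs. Here we parse a stand-in response:
fake_response = [{"type": "tool_use", "name": "extract_metadata",
                  "input": {"title": "Q3 invoice", "category": "invoice"}}]
data = extract_tool_input(fake_response)
```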
Implement validation, retry, and feedback loops for extraction quality

Knowledge of

  • Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction
  • The limits of retry: retries are ineffective when the required information is simply absent from the source document (vs format or structural errors)
  • Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns
  • The difference between semantic validation errors (values don't sum, wrong field placement) and schema syntax errors (eliminated by tool use)

Skills in

  • Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction
  • Identifying when retries will be ineffective (e.g., information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors)
  • Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings
  • Designing self-correction validation flows: extracting calculated_total alongside stated_total to flag discrepancies, adding conflict_detected booleans for inconsistent source data
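
The retry-with-error-feedback loop described above can be sketched as follows, assuming a hypothetical invoice schema (line_items, stated_total) and a caller-supplied call_model function standing in for the API call.

```python
def validate(extraction: dict) -> list[str]:
    """Semantic checks a strict JSON schema cannot express (values must sum)."""
    errors = []
    line_sum = sum(item["amount"] for item in extraction.get("line_items", []))
    if abs(line_sum - extraction.get("stated_total", 0)) > 0.01:
        errors.append(f"line items sum to {line_sum} but stated_total is "
                      f"{extraction.get('stated_total')}")
    return errors

def extract_with_retry(document: str, call_model, max_retries: int = 2) -> dict:
    prompt = f"Extract the invoice fields from:\n{document}"
    extraction = call_model(prompt)
    for _ in range(max_retries):
        errors = validate(extraction)
        if not errors:
            break
        # Include the document, the failed output, and the specific errors
        # so the model can self-correct rather than guess.
        prompt = (f"Extract the invoice fields from:\n{document}\n\n"
                  f"Your previous extraction was:\n{extraction}\n\n"
                  f"It failed validation:\n- " + "\n- ".join(errors) +
                  "\nPlease correct it.")
        extraction = call_model(prompt)
    return extraction
```

Note the loop only helps with format and structural errors; if the information is absent from the document, no number of retries will recover it.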
Design efficient batch processing strategies

Knowledge of

  • The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA
  • Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits, nightly test generation) and inappropriate for blocking workflows (pre-merge checks)
  • The batch API does not support multi-turn tool calling within a single request (cannot execute tools mid-request and return results)
  • custom_id fields for correlating batch request/response pairs

Skills in

  • Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis
  • Calculating batch submission frequency from SLA constraints (e.g., 4-hour submission windows plus 24-hour batch processing stays within a 30-hour SLA)
  • Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications (e.g., chunking documents that exceeded context limits)
  • Using prompt refinement on a sample set before batch-processing large volumes to maximize first-pass success rates and reduce iterative resubmission costs
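
Request construction and failure handling keyed on custom_id can be sketched as follows. The document IDs and model string are illustrative; the actual submission would pass the request list to the Message Batches API (client.messages.batches.create in the Python SDK).

```python
def build_requests(documents: dict[str, str]) -> list[dict]:
    """One batch request per document, correlated by custom_id."""
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-sonnet-4-5",  # illustrative model id
                "max_tokens": 1024,
                "messages": [
                    {"role": "user", "content": f"Extract fields from:\n{text}"}
                ],
            },
        }
        for doc_id, text in documents.items()
    ]

def failed_ids(results: list[dict]) -> set[str]:
    """Select only failed documents for resubmission (possibly after chunking)."""
    return {r["custom_id"] for r in results if r["result"]["type"] != "succeeded"}

requests = build_requests({"doc-1": "...", "doc-2": "..."})
retry = failed_ids([
    {"custom_id": "doc-1", "result": {"type": "succeeded"}},
    {"custom_id": "doc-2", "result": {"type": "errored"}},
])
```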
Design multi-instance and multi-pass review architectures

Knowledge of

  • Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session
  • Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking
  • Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution and contradictory findings

Skills in

  • Using a second independent Claude instance to review generated code without the generator's reasoning context
  • Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis
  • Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing

Context Management & Reliability

Domain 05
Manage conversation context to preserve critical information across long interactions

Knowledge of

  • Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries
  • The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections
  • How tool results accumulate in context and consume tokens disproportionately to their relevance (e.g., 40+ fields per order lookup when only 5 are relevant)
  • The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence

Skills in

  • Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history
  • Extracting and persisting structured issue data (order IDs, amounts, statuses) into a separate context layer for multi-issue sessions
  • Trimming verbose tool outputs to only relevant fields before they accumulate in context (e.g., keeping only return-relevant fields from order lookups)
  • Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects
  • Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis
  • Modifying upstream agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains when downstream agents have limited context budgets
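
Two of the patterns above can be sketched together: trimming verbose tool results before they enter context, and carrying a persistent case-facts block outside the summarized history. The field names are hypothetical.

```python
# Hypothetical set of return-relevant fields from a 40+ field order lookup.
RETURN_RELEVANT = {"order_id", "status", "total", "purchase_date",
                   "return_window_ends"}

def trim_order_lookup(raw: dict) -> dict:
    """Drop irrelevant fields so tool output stops consuming context tokens."""
    return {k: v for k, v in raw.items() if k in RETURN_RELEVANT}

def build_prompt(case_facts: dict, summarized_history: str, user_msg: str) -> str:
    """Keep transactional facts verbatim, outside the lossy summary."""
    facts = "\n".join(f"- {k}: {v}" for k, v in case_facts.items())
    return (f"Case facts (authoritative, do not paraphrase):\n{facts}\n\n"
            f"Conversation summary:\n{summarized_history}\n\n"
            f"Customer: {user_msg}")

trimmed = trim_order_lookup({"order_id": "A-1001", "status": "delivered",
                             "total": 84.50, "warehouse_bin": "14C",
                             "purchase_date": "2024-11-02"})
prompt = build_prompt(trimmed, "Customer asked about returning a jacket.",
                      "Can I still return it?")
```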
Design effective escalation and ambiguity resolution patterns

Knowledge of

  • Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps (not just complex cases), and inability to make meaningful progress
  • The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward
  • Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity
  • How multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection

Skills in

  • Adding explicit escalation criteria with few-shot examples to the system prompt demonstrating when to escalate versus resolve autonomously
  • Honoring explicit customer requests for human agents immediately without first attempting investigation
  • Acknowledging frustration while offering resolution when the issue is within the agent's capability, escalating only if the customer reiterates their preference
  • Escalating when policy is ambiguous or silent on the customer's specific request (e.g., competitor price matching when policy only addresses own-site adjustments)
  • Instructing the agent to ask for additional identifiers when tool results return multiple matches, rather than selecting based on heuristics
Implement error propagation strategies across multi-agent systems

Knowledge of

  • Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions
  • The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches)
  • Why generic error statuses ("search unavailable") hide valuable context from the coordinator
  • Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns

Skills in

  • Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery
  • Distinguishing access failures from valid empty results in error reporting so the coordinator can make appropriate decisions
  • Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve, including what was attempted and partial results
  • Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources
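
The structured error context above can be sketched as a small result type whose fields mirror those bullets; the names and status values are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class SubagentResult:
    status: str                 # "ok", "empty", or "failed"
    failure_type: str = ""      # e.g. "timeout"; empty for valid empty results
    attempted: str = ""         # what query or approach was tried
    partial_results: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)  # approaches to try next

def classify(result: SubagentResult) -> str:
    """Distinguish access failures (retry candidates) from valid empty results."""
    if result.status == "failed":
        return "retry-or-reroute"   # coordinator may retry or use alternatives
    if result.status == "empty":
        return "accept-no-matches"  # successful query, genuinely no matches
    return "accept"

r = SubagentResult(status="failed", failure_type="timeout",
                   attempted="site-restricted search",
                   alternatives=["general web search", "cached index"])
decision = classify(r)
```

Returning a generic "search unavailable" instead of this structure would hide the attempted query and alternatives the coordinator needs for recovery.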
Manage context effectively in large codebase exploration

Knowledge of

  • Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier
  • The role of scratchpad files for persisting key findings across context boundaries
  • Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding
  • Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume

Skills in

  • Spawning subagents to investigate specific questions (e.g., "find all test files," "trace refund flow dependencies") while the main agent preserves high-level coordination
  • Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation
  • Summarizing key findings from one exploration phase before spawning subagents for the next phase, injecting summaries into initial context
  • Designing crash recovery using structured agent state exports (manifests) that the coordinator loads on resume and injects into agent prompts
  • Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output
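
The crash-recovery pattern above can be sketched as state exports plus a manifest the coordinator loads on resume. The directory layout and field names are hypothetical.

```python
import json
from pathlib import Path

STATE_DIR = Path("agent_state")  # hypothetical known location

def export_state(agent_id: str, findings: dict) -> None:
    """Each agent writes its findings and refreshes the shared manifest."""
    STATE_DIR.mkdir(exist_ok=True)
    (STATE_DIR / f"{agent_id}.json").write_text(json.dumps(findings))
    manifest = {p.stem: str(p)
                for p in STATE_DIR.glob("*.json") if p.stem != "manifest"}
    (STATE_DIR / "manifest.json").write_text(json.dumps(manifest))

def resume() -> dict:
    """On restart, load the manifest and rehydrate each agent's findings
    for injection into agent prompts."""
    manifest = json.loads((STATE_DIR / "manifest.json").read_text())
    return {agent: json.loads(Path(path).read_text())
            for agent, path in manifest.items()}

export_state("test-mapper", {"test_files": ["tests/test_refunds.py"]})
state = resume()
```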
Design human review workflows and confidence calibration

Knowledge of

  • The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields
  • Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns
  • Field-level confidence scores calibrated using labeled validation sets for routing review attention
  • The importance of validating accuracy by document type and field segment before automating high-confidence extractions

Skills in

  • Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection
  • Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review
  • Having models output field-level confidence scores, then calibrating review thresholds using labeled validation sets
  • Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity
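
Stratified sampling and confidence-based routing can be sketched as follows; the strata (document types), thresholds, and field names are illustrative.

```python
import random
from collections import defaultdict

def stratified_sample(extractions: list[dict], per_stratum: int,
                      seed: int = 0) -> list[dict]:
    """Sample evenly across document types so no segment goes unmeasured,
    even when aggregate accuracy looks high."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for e in extractions:
        strata[e["doc_type"]].append(e)
    sample = []
    for docs in strata.values():
        sample.extend(rng.sample(docs, min(per_stratum, len(docs))))
    return sample

def needs_review(extraction: dict, threshold: float = 0.9) -> bool:
    """Route low-confidence or flagged-ambiguous extractions to humans."""
    return (extraction["confidence"] < threshold
            or extraction.get("conflict_detected", False))

pool = [{"doc_type": "invoice", "confidence": 0.97},
        {"doc_type": "invoice", "confidence": 0.99},
        {"doc_type": "contract", "confidence": 0.95}]
audit = stratified_sample(pool, per_stratum=1)
```

In practice the threshold would be calibrated against a labeled validation set rather than fixed up front.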
Preserve information provenance and handle uncertainty in multi-source synthesis

Knowledge of

  • How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings
  • The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings
  • How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value
  • Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions

Skills in

  • Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis
  • Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations and methodological context
  • Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis
  • Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation
  • Rendering different content types appropriately in synthesis outputs — financial data as tables, news as prose, technical findings as structured lists — rather than converting everything to a uniform format
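
The claim-source mapping pattern above can be sketched as a merge step that preserves every source and annotates conflicts instead of picking a winner. The claim IDs, values, and URLs are illustrative.

```python
def merge_claims(subagent_outputs: list[list[dict]]) -> list[dict]:
    """Merge findings across subagents, preserving claim-source mappings
    (URL, date) and flagging conflicting values for the coordinator."""
    by_claim: dict[str, dict] = {}
    for output in subagent_outputs:
        for claim in output:
            key = claim["claim_id"]
            if key not in by_claim:
                by_claim[key] = {**claim, "sources": [claim["source"]]}
            else:
                merged = by_claim[key]
                merged["sources"].append(claim["source"])
                if claim["value"] != merged["value"]:
                    merged["conflict_detected"] = True  # annotate, don't pick
    return list(by_claim.values())

merged = merge_claims([
    [{"claim_id": "market-size", "value": "12B",
      "source": {"url": "https://example.com/a", "date": "2023-05"}}],
    [{"claim_id": "market-size", "value": "15B",
      "source": {"url": "https://example.com/b", "date": "2024-02"}}],
])
```

Because each source carries a date, downstream synthesis can also check whether a flagged "conflict" is really just a temporal difference between publications.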