Kwangmin Kim - MINERVA Agent Delegation: The Pattern in Which a Supervisor Calls Sub-Agents

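1 What Is Agent Delegation

Definition: Multi-Agent / Agent Delegation

A pattern in which several agents cooperate to handle a single user request. Each agent has its own graph and State, and a higher-level agent (the Supervisor) decides which sub-agent to call and when.

Core unit: Supervisor + Worker(s), each a BaseAgent v2 instance
Transition medium: messages (MessagesState) or an explicit handoff schema
Differentiator: unlike nodes inside a single graph (C12, C13), the workers are separate instances, so each can be deployed, replaced, and tested independently

LangGraph offers three multi-agent patterns.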

Pattern	Structure	Reference
Supervisor (centralized)	A central supervisor routes every worker	03-Use-Cases/07 Multi-Agent-Supervisor
Collaboration (peer-to-peer)	Workers exchange messages directly	03-Use-Cases/06 Multi-Agent-Collaboration
Hierarchical	A supervisor above other supervisors (tree)	03-Use-Cases/08 Hierarchial-Agent-Team

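Of these, the Supervisor pattern is closest to MINERVA's current structure.

2 Limits of a Single Agent: Why Delegation Is Needed

Tool Binding, ReAct, and Plan-and-Execute in C11 through C13 all chose tools and steps dynamically inside a single agent. That structure is powerful, but it runs into the following limits.

Tool count explosion

Once the tool count passes 50 or 100, it becomes hard to fit every tool description into one LLM context. If the tool descriptions alone consume the token budget, too few tokens remain for reasoning about the actual task. Similar tools should be grouped and encapsulated in a separate agent.

Mixed responsibilities

Bundling search, classification, auditing, and summarization into one agent bloats the prompt, and a change in one domain affects behavior in another. Splitting agents by domain shrinks the blast radius of a change.

Context sprawl / accumulated tokens

As ReAct messages accumulate, the context grows linearly with the number of tool calls N. With delegation, only a summarized result of the sub-agent's messages propagates to the parent, so the parent graph's messages stay light.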
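The arithmetic behind this can be sketched as follows. The token figures (500 base, 400 per tool call, 150 per summary) are hypothetical numbers chosen for illustration, not measurements from MINERVA.

```python
def single_agent_context(base: int, per_call: int, n_calls: int) -> int:
    """Context size after n tool calls when every call and result stays in messages."""
    return base + per_call * n_calls


def parent_context_with_delegation(base: int, summary: int) -> int:
    """Parent context size when the sub-agent hands back a single summary message."""
    return base + summary


# Hypothetical token counts, for illustration only.
BASE, PER_CALL, SUMMARY = 500, 400, 150

sizes = [single_agent_context(BASE, PER_CALL, n) for n in (1, 5, 10)]
parent = parent_context_with_delegation(BASE, SUMMARY)

print(sizes)   # [900, 2500, 4500]
print(parent)  # 650
```

The sub-agent still spends tokens on its own tool calls; delegation moves that growth out of the parent's context rather than removing it.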

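Team-level ownership

At the 50-person or 1000-person stage, keeping every tool and prompt in one place causes frequent change conflicts. Dividing ownership by agent lets each team deploy, roll back, and experiment independently (this connects to the Phase C-7 skill ecosystem).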

3 MINERVA's Current State: Already a Static Supervisor

DataStandardizerAgent already has a supervisor structure, even without LangGraph (see Parts 02, 08, and 10).

# src/agents/data_standardizer/agent.py (simplified)
class DataStandardizerAgent(BaseAgent):
    name = "data_standardizer"

    def run(self, query: Query) -> Response:
        recommender = self._ensure_recommender()      # sub_agent 1
        mode = _resolve_mode(query)                    # static routing
        raw, docs = recommender.run(query.text, mode=mode)

        processed = _apply_post_processing(raw, ...)   # calls sub_agents/post_processing
        if _is_full_format(processed):
            processed = self._auditor.audit_and_fix(processed)  # sub_agent 2

        return self._build_response(processed, docs, query, ...)

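What it already has:

RagRecommender sub_agent (RAG + LLM)
DomainAuditor sub_agent (LLM + rule-based)
post_processing/{tables,code} function modules (ALBERT classifier + abbreviation dictionary)

What is missing:

Supervisor routing is a static if/else: no LLM decides dynamically
The sub_agents are wired together as function calls: they cannot be deployed or replaced independently
The message flow between sub_agents is implicit: the processed text is passed straight to the next step

Moving to LangGraph Multi-Agent keeps these assets intact while filling in the gaps.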

4 LangGraph Supervisor Pattern: Skeleton

One node in the parent graph calls a sub-agent's graph.

# Main packages (as of 2026)
# langchain >= 1.1, langgraph >= 1.0

from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_openai import AzureChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage
from typing import Literal
from dotenv import load_dotenv

load_dotenv()
llm = AzureChatOpenAI(model="gpt-4.1", temperature=0.0)

4.1 Sub-agents: each with an independent graph

Each sub-agent has its own graph.

class RetrievalAgent(BaseAgent):
    name = "retrieval"
    state_schema = RetrievalState

    def __init__(self, config):
        self.graph = build_retrieval_subgraph(config)

    def invoke(self, payload: dict) -> dict:
        """Standard entry point called by the Supervisor."""
        return self.graph.invoke(payload)


class AuditorAgent(BaseAgent):
    name = "auditor"
    state_schema = AuditorState

    def __init__(self, config):
        self.graph = build_auditor_subgraph(config)

    def invoke(self, payload: dict) -> dict:
        return self.graph.invoke(payload)

Because every BaseAgent v2 (Part 02-1) exposes a graph attribute, the supervisor calls each sub-agent through the same interface.

4.2 Supervisor: the routing LLM

The Supervisor node asks the LLM which sub-agent to call next, or whether to finish.

class SupervisorState(MessagesState):
    next_agent: str | None
    final_response: str | None


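_SUPERVISOR_PROMPT = (
    "You are the data standardization supervisor. "
    "Read the user request and choose the next worker.\n"
    "- recommender: when a RAG recommendation of standard abbreviations or domains is needed\n"
    "- auditor: when a table-format answer needs domain-priority verification\n"
    "- post_processor: when physical names must be assigned or the table sorted\n"
    "- FINISH: finish once enough information has been gathered\n"
    "Choose exactly one worker at a time."
)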


def supervisor_node(state: SupervisorState) -> dict:
    messages = [
        {"role": "system", "content": _SUPERVISOR_PROMPT},
        *state["messages"],
    ]
    response = llm.with_structured_output(_RouterSchema).invoke(messages)
    return {"next_agent": response.next}

_RouterSchema is a Pydantic model whose next field is typed Literal["recommender", "auditor", "post_processor", "FINISH"]. Forcing structured output keeps the supervisor from answering in free text.
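The constraint this schema enforces can be illustrated without any dependency. The sketch below is a stand-in, not the source's definition: the source uses Pydantic, where the equivalent would be a BaseModel with a single Literal-typed next field, while RouterDecision and parse_route here are hypothetical names built on a standard-library dataclass.

```python
from dataclasses import dataclass
from typing import Literal, get_args

Route = Literal["recommender", "auditor", "post_processor", "FINISH"]


@dataclass(frozen=True)
class RouterDecision:
    """Dependency-free stand-in for the Pydantic _RouterSchema (hypothetical name)."""

    next: str

    def __post_init__(self) -> None:
        # Dataclasses do not enforce Literal types at runtime, so check by hand.
        if self.next not in get_args(Route):
            raise ValueError(f"unknown route: {self.next!r}")


def parse_route(raw: str) -> RouterDecision:
    """Strip surrounding whitespace, then validate against the allowed routes."""
    return RouterDecision(next=raw.strip())


print(parse_route(" auditor ").next)  # auditor
```

Any value outside the four allowed routes raises ValueError instead of reaching the graph, which is the property the structured output is meant to guarantee.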

4.3 Worker node: call the sub-agent and feed the result into messages

def make_worker_node(agent: BaseAgent, name: str):
    def worker_node(state: SupervisorState) -> dict:
        # Reached only when the supervisor routed to this worker: call the sub-agent
        result = agent.invoke({"messages": state["messages"]})
        # Append the sub-agent's result to the parent messages as an AIMessage
        return {
            "messages": [AIMessage(content=result["output"], name=name)],
            "next_agent": None,  # back to the supervisor
        }
    return worker_node

When a worker finishes, it appends its result to the parent messages and control returns to the supervisor. The supervisor reads the new message and either picks the next worker or ends with FINISH.

4.4 Graph assembly

recommender_node = make_worker_node(RetrievalAgent(config), "recommender")
auditor_node = make_worker_node(AuditorAgent(config), "auditor")
post_processor_node = make_worker_node(PostProcessorAgent(config), "post_processor")


def route_supervisor(state: SupervisorState) -> str:
    nxt = state["next_agent"]
    if nxt == "FINISH":
        return END
    return nxt


graph = StateGraph(SupervisorState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("recommender", recommender_node)
graph.add_node("auditor", auditor_node)
graph.add_node("post_processor", post_processor_node)

graph.add_edge(START, "supervisor")
graph.add_conditional_edges(
    "supervisor",
    route_supervisor,
    {
        "recommender": "recommender",
        "auditor": "auditor",
        "post_processor": "post_processor",
        END: END,
    },
)
# Each worker returns to the supervisor when it finishes
graph.add_edge("recommender", "supervisor")
graph.add_edge("auditor", "supervisor")
graph.add_edge("post_processor", "supervisor")

app = graph.compile()

This graph resembles the ReAct loop (C12), except that each node is an entire sub-agent. The supervisor selects an “agent”, not a “tool”.
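The control flow of this graph can be traced without LangGraph or an LLM. In the sketch below, scripted_router is a hypothetical stand-in for the routing LLM, the workers are stub functions, and max_steps is an assumed safety guard; only the supervisor → worker → supervisor cycle mirrors the graph above.

```python
from typing import Callable

State = dict  # {"messages": [(sender, text), ...], "next_agent": str | None}


def scripted_router(state: State) -> str:
    """Hypothetical stand-in for the routing LLM: pick the first worker not yet seen."""
    seen = {sender for sender, _ in state["messages"]}
    for worker in ("recommender", "post_processor", "auditor"):
        if worker not in seen:
            return worker
    return "FINISH"


def run_supervisor_loop(
    state: State,
    router: Callable[[State], str],
    workers: dict[str, Callable[[State], str]],
    max_steps: int = 10,
) -> list[str]:
    """Alternate supervisor and worker turns until the router returns FINISH."""
    visited: list[str] = []
    for _ in range(max_steps):
        choice = router(state)               # supervisor turn
        if choice == "FINISH":
            return visited
        output = workers[choice](state)      # worker turn
        state["messages"].append((choice, output))
        visited.append(choice)
    raise RuntimeError("supervisor did not finish within max_steps")


workers = {
    "recommender": lambda s: "PT_ID recommended",
    "post_processor": lambda s: "table sorted",
    "auditor": lambda s: "priority verified",
}
state = {"messages": [("user", "standardize the patient ID column")], "next_agent": None}
order = run_supervisor_loop(state, scripted_router, workers)
print(order)  # ['recommender', 'post_processor', 'auditor']
```

Each iteration is one supervisor decision followed by one whole sub-agent run, which is the difference from a ReAct loop where each iteration is a single tool call.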

5 State Separation: Parent and Sub-graph

A sub-agent has its own State. How should it be isolated from the parent graph?

5.1 Without separation: a single giant State

class GiantState(TypedDict):
    query: Query
    child_docs: list[Document]
    parent_docs: list[Document]
    raw_recommendation: str
    domain_classifications: list[dict]
    physical_names: dict[str, str]
    audited_table: str
    audit_corrections: list[dict]
    final_response: Response
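    # ... 50 fields

This is the bloated-State anti-pattern covered in Part 15 (State Design). When every sub-agent shares the same dict, the boundaries of responsibility blur.

5.2 Separation: subgraph + explicit conversion

Each sub-agent keeps its own State, and the parent converts only the inputs and outputs it passes across.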

class SupervisorState(MessagesState):
    next_agent: str | None
    final_response: str | None


class RetrievalState(TypedDict):
    query_text: str
    docs: list[Document]
    citations: list[Citation]


def make_recommender_node(agent: BaseAgent):
    def recommender_node(state: SupervisorState) -> dict:
        # Convert parent state -> sub-agent input
        sub_input = {"query_text": state["messages"][-1].content}
        sub_result = agent.invoke(sub_input)
        # Convert sub-agent output -> parent messages
        summary = _format_recommendation(sub_result["docs"])
        return {
            "messages": [AIMessage(content=summary, name="recommender")],
            "next_agent": None,
        }
    return recommender_node

This conversion function acts as a firewall that keeps the sub-agent's internal data structures hidden from the parent. If the sub-agent adds an internal field, only the conversion function needs to change and the parent is unaffected.
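A minimal sketch of the two conversion directions, using plain dicts in place of LangChain message objects. RetrievalResult, to_sub_input, and to_parent_message are hypothetical names, and rerank_scores stands for any internal field the sub-agent might add later.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalResult:
    """Hypothetical sub-agent output; its internal fields may change freely."""

    docs: list[str]
    rerank_scores: list[float] = field(default_factory=list)  # internal detail


def to_sub_input(parent_messages: list[dict]) -> dict:
    """Parent -> sub-agent: pass only the text of the latest user message."""
    last_user = next(m for m in reversed(parent_messages) if m["role"] == "user")
    return {"query_text": last_user["content"]}


def to_parent_message(result: RetrievalResult, name: str) -> dict:
    """Sub-agent -> parent: expose a short summary, never the raw fields."""
    if result.docs:
        summary = f"documents found: {len(result.docs)}; top hit: {result.docs[0]}"
    else:
        summary = "no documents found"
    return {"role": "assistant", "name": name, "content": summary}


messages = [{"role": "user", "content": "standard abbreviation for patient ID"}]
sub_input = to_sub_input(messages)
result = RetrievalResult(docs=["PT_ID: patient identifier"], rerank_scores=[0.92])
msg = to_parent_message(result, "recommender")
print(msg["content"])  # documents found: 1; top hit: PT_ID: patient identifier
```

This sketch searches for the latest user message rather than taking the last entry, on the assumption that after the first round the last entry may be another worker's output. The parent message carries only role, name, and content, so rerank_scores never leaks upward.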

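6 Messages vs. Explicit Handoff

How should workers pass information to one another? There are two options.

6.1 Message style (natural language)

# Append the recommender's output to messages as natural language
return {
    "messages": [AIMessage(
        content="Patient identifier column standardized as PT_ID. Domain group candidate: patient information.",
        name="recommender",
    )],
}

Pros: the supervisor reads natural language and decides the next step, which is flexible
Cons: natural-language parsing errors are possible, plus token cost

6.2 Explicit handoff (structured)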

class HandoffPayload(BaseModel):
    physical_name: str
    domain_candidates: list[str]
    confidence: float


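# Hand off through a dedicated State field
class SupervisorState(MessagesState):
    next_agent: str | None
    handoff: HandoffPayload | None

Pros: structured data, so it is precise
Cons: every sub-agent needs its own handoff schema, and adding a new sub-agent means modifying the parent State

MINERVA recommendation: use messages for small domains (2 to 3 sub-agents) and handoff for large systems (5+ sub-agents that need an exact data flow). The two can also be mixed: messages for supervisor routing, structured handoff for worker results.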
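The mixed approach might look like the sketch below, where a worker returns both a short message for routing and a typed payload for the next worker. It assumes a standard-library dataclass in place of the Pydantic BaseModel used above; recommender_result and auditor_input are hypothetical helpers, and the field values are illustrative.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HandoffPayload:
    """Dataclass stand-in for the Pydantic HandoffPayload shown above."""

    physical_name: str
    domain_candidates: list[str]
    confidence: float


def recommender_result() -> dict:
    """Return a short message for routing plus a structured handoff."""
    payload = HandoffPayload("PT_ID", ["patient_info"], 0.9)
    message = {
        "role": "assistant",
        "name": "recommender",
        "content": f"Recommended {payload.physical_name}.",
    }
    return {"messages": [message], "handoff": payload}


def auditor_input(state: dict) -> dict:
    """The next worker reads typed fields instead of parsing the sentence."""
    handoff: Optional[HandoffPayload] = state.get("handoff")
    if handoff is None:
        raise ValueError("auditor requires a handoff payload")
    return asdict(handoff)


state = recommender_result()
print(auditor_input(state)["physical_name"])  # PT_ID
```

The supervisor only needs the one-line message to route, while the auditor receives exact values and fails loudly when the handoff is missing.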

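7 MINERVA Scenario: Operating the DataStandardizer Supervisor

Question: “Tell me the standard abbreviation and domain for the patient identifier column, and sort the answer table by domain priority.”

START
  ▼
[supervisor] LLM decision → "recommender"
  ▼
[recommender] RAG + LLM → "PT_ID standardized, patient information domain candidate"
  ▼ (result appended to messages)
[supervisor] LLM decision → "post_processor"  (table needs sorting)
  ▼
[post_processor] table sorting + physical name assignment
  ▼ (messages updated)
[supervisor] LLM decision → "auditor"  (domain priority verification)
  ▼
[auditor] priority rules + LLM verification → corrections
  ▼ (messages updated)
[supervisor] LLM decision → "FINISH"
  ▼
END (the final messages are the user-facing answer)

Key improvements:

The supervisor decides the call order dynamically: the static if/else disappears
Each worker is an independent graph: RetrievalAgent can be reused by another agent (QnA)
Messages are the data structure: runs.jsonl/Checkpointer capture them naturally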

8 Observability: The Tracing Unit for Multi-Agent Systems

The [timing] log of a single agent was per node, but a multi-agent system is traced in two dimensions: per agent and per node.

{"timestamp": "...", "agent": "supervisor", "node": "supervisor", "ms": 412}
{"timestamp": "...", "agent": "retrieval", "node": "retrieve_node", "ms": 187}
{"timestamp": "...", "agent": "retrieval", "node": "rerank_node", "ms": 56}
{"timestamp": "...", "agent": "supervisor", "node": "supervisor", "ms": 320}
{"timestamp": "...", "agent": "auditor", "node": "audit_node", "ms": 1840}
{"timestamp": "...", "agent": "supervisor", "node": "supervisor", "ms": 280}

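An operations dashboard should be able to answer the following questions.

Which agent is slowest on average? (bottleneck identification)
Which agent is called most often? (hotspot)
How many times, on average, is the supervisor called to route workers? (routing cost)
If a particular worker fails, does the supervisor route around it gracefully? (resilience)

These metrics reuse the runs.jsonl/monitoring/ab infrastructure from Part 09 (State Management) and Part 06 (A/B Experiments) as is; the only difference is that the agent_name field now varies by worker.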
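The first three questions reduce to a group-by over the log. The sketch below aggregates the six sample records shown earlier in this section; the timestamp field is left out because the sample elides it, and summarize is a hypothetical helper rather than part of the MINERVA monitoring code.

```python
import json
from collections import defaultdict

LOG_LINES = """\
{"agent": "supervisor", "node": "supervisor", "ms": 412}
{"agent": "retrieval", "node": "retrieve_node", "ms": 187}
{"agent": "retrieval", "node": "rerank_node", "ms": 56}
{"agent": "supervisor", "node": "supervisor", "ms": 320}
{"agent": "auditor", "node": "audit_node", "ms": 1840}
{"agent": "supervisor", "node": "supervisor", "ms": 280}
"""


def summarize(lines: str) -> dict[str, dict[str, float]]:
    """Group timing records by agent and compute call count and mean latency."""
    durations: dict[str, list[int]] = defaultdict(list)
    for line in lines.splitlines():
        if line.strip():
            record = json.loads(line)
            durations[record["agent"]].append(record["ms"])
    return {
        agent: {"calls": len(ms), "mean_ms": sum(ms) / len(ms)}
        for agent, ms in durations.items()
    }


stats = summarize(LOG_LINES)
slowest = max(stats, key=lambda a: stats[a]["mean_ms"])
busiest = max(stats, key=lambda a: stats[a]["calls"])

print(slowest)                       # auditor
print(busiest)                       # supervisor
print(stats["supervisor"]["calls"])  # 3
```

In this sample the auditor is the bottleneck by mean latency, while the supervisor is the most frequently called agent, with three routing calls for a single request.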

9 Hierarchical: Delegating the Delegation

A supervisor itself can serve as a worker, in the form team_supervisor → team_a_supervisor → worker_a1, a2, .... Once MINERVA reaches the 50-person stage, the following structure becomes possible.

[platform_supervisor]
  ├── [data_team_supervisor]
  │     ├── retrieval_agent
  │     ├── auditor_agent
  │     └── post_processor_agent
  │
  ├── [insilico_team_supervisor]
  │     ├── ast_parser_agent
  │     ├── code_search_agent
  │     └── docstring_agent
  │
  └── [governance_team_supervisor]
        ├── policy_agent
        └── compliance_agent

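Each team supervisor routes only the workers in its own domain. The platform supervisor routes only teams and knows nothing about a team's internal workers. This isolation is the foundation of the team-level ownership model (this connects to Phase C-10 platform scaling).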
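The isolation can be sketched with plain callables. Everything here is illustrative: make_team is a hypothetical helper, the keyword-based pick functions stand in for each supervisor's routing LLM, and only two of the three teams from the tree are modeled.

```python
from typing import Callable

Worker = Callable[[str], str]


def make_team(workers: dict[str, Worker], pick: Callable[[str], str]) -> Worker:
    """Hide a team's workers behind one callable; `pick` is the team's router."""
    def team_supervisor(request: str) -> str:
        worker_name = pick(request)
        return workers[worker_name](request)
    return team_supervisor


data_team = make_team(
    {
        "retrieval_agent": lambda r: f"retrieved docs for: {r}",
        "auditor_agent": lambda r: f"audited: {r}",
    },
    pick=lambda r: "auditor_agent" if "audit" in r else "retrieval_agent",
)
governance_team = make_team(
    {"policy_agent": lambda r: f"policy checked: {r}"},
    pick=lambda r: "policy_agent",
)

# The platform level sees only team names, never the workers inside a team.
platform_teams: dict[str, Worker] = {"data": data_team, "governance": governance_team}


def platform_supervisor(request: str) -> str:
    """Route to a team by keyword (a stand-in for the platform routing LLM)."""
    team = "governance" if "policy" in request else "data"
    return platform_teams[team](request)


print(platform_supervisor("audit the table"))  # audited: audit the table
print(platform_supervisor("policy for PII"))   # policy checked: policy for PII
```

Adding or renaming a worker inside data_team changes nothing at the platform level, which is what lets a team own its agents independently.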

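10 Application Areas

Area	Scenario where Multi-Agent shines
MINERVA Data Standardizer	RetrievalAgent + AuditorAgent + PostProcessorAgent
MINERVA QnA Chatbot	Search agent + answer generation agent (separated)
MINERVA Insilico (not yet written)	AST parser + dependency search + document search + summarization, each domain separated
Governance reports	Data collection + policy verification + automated report
Multi-domain chatbot	HR, standardization, code analysis, routed by a domain supervisor
End-to-end analysis pipeline	Extract + transform + validate + load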

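11 Migration: DataStandardizer to Multi-Agent

This adds one more step on top of the evolution stages of C11 through C13.

Step	Task	Impact
1	Wrap the current sub_agents in BaseAgent v2 (RetrievalAgent, AuditorAgent, PostProcessorAgent)	No behavior change; only the interface is unified
2	Implement a graph (sub-graph) for each sub_agent (apply the separation criteria from Part 15)	Smaller unit-test surface
3	Introduce the Supervisor LLM: replace the static if/else with LLM routing	Isolate as an A/B arm and measure routing accuracy
4	Decide between messages and handoff: define a schema per sub-agent	Explicitness ↑
5	Separate observability: add per-agent_name metrics to runs.jsonl	Identify bottlenecks and failure patterns
6	Introduce Hierarchical (on reaching the 50-person stage): per-team supervisors	Ownership separation, change blast radius ↓

Each step proceeds incrementally on top of the BaseAgent v2 contract (Part 02-1).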

12 Common Error Patterns

WRONG:

# Forces the supervisor to always pick a worker
graph.add_conditional_edges(
    "supervisor",
    lambda s: s["next_agent"],  # no FINISH case
    {"recommender": "recommender", "auditor": "auditor"},
)

CORRECT:

def route_supervisor(state):
    if state["next_agent"] == "FINISH":
        return END
    return state["next_agent"]

graph.add_conditional_edges(
    "supervisor",
    route_supervisor,
    {"recommender": "recommender", "auditor": "auditor", END: END},
)

List “FINISH” as an option in the supervisor prompt and have the routing function branch to END. Otherwise the graph falls into an infinite loop (supervisor → worker → supervisor → …), which in practice runs until LangGraph's recursion limit aborts it.
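A small pre-flight check can catch the missing END branch before the graph is compiled. This is a dependency-free sketch: END is a local stand-in for langgraph.graph.END, and missing_routes is a hypothetical helper, not a LangGraph API.

```python
END = "__end__"  # local stand-in for langgraph.graph.END in this sketch


def missing_routes(router_options: set[str], route_map: dict[str, str]) -> set[str]:
    """Return router outputs that the conditional-edge map cannot dispatch."""
    # route_supervisor translates "FINISH" into END, so END must be a key.
    reachable = {END if option == "FINISH" else option for option in router_options}
    return reachable - set(route_map)


options = {"recommender", "auditor", "FINISH"}
wrong_map = {"recommender": "recommender", "auditor": "auditor"}
correct_map = {**wrong_map, END: END}

print(missing_routes(options, wrong_map))    # {'__end__'}
print(missing_routes(options, correct_map))  # set()
```

Running a check like this in a unit test keeps the router's allowed outputs and the edge map in sync when a worker is added or removed.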

WRONG:

# The sub-agent mutates the parent State directly
def worker_node(state):
    state["raw_internal_field"] = ...  # exposes a sub-agent internal field in the parent State
    return {...}

CORRECT:

def worker_node(state):
    sub_result = agent.invoke({"query": state["messages"][-1].content})
    summary = _format_for_parent(sub_result)  # the conversion function acts as a firewall
    return {"messages": [AIMessage(content=summary, name=agent.name)]}

If a sub-agent's internal fields flow straight into the parent State, a change inside the sub-agent breaks the parent graph. Isolate them with a conversion function.

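WRONG:

return {"messages": [AIMessage(content=summary)]}  # unclear which agent produced this message

CORRECT:

return {"messages": [AIMessage(content=summary, name="recommender")]}

When the supervisor decides on the next worker, it needs to know who produced the previous message. Filling in the name field improves routing accuracy and makes the observability metrics unambiguous.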

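13 Summary

Item	Key point
Patterns	Supervisor (centralized), Collaboration (peer-to-peer), Hierarchical
Unit	Each agent is a BaseAgent v2 instance with its own graph
State	The Supervisor uses MessagesState + next_agent; sub-agents use independent State
Transition medium	Natural-language messages or a structured handoff schema
Observability	Two-dimensional agent_name + node metrics, reusing runs.jsonl as is
MINERVA application	Split DataStandardizer into Recommender + Auditor + PostProcessor
Termination condition	FINISH listed in the supervisor prompt + routing to END
Migration	Wrap sub_agents → sub-graph → Supervisor LLM → handoff → observability → Hierarchical

Delegation should be introduced just before a single agent collapses under its growing responsibilities. MINERVA's DataStandardizer is already a static supervisor, so the move to LangGraph Multi-Agent is a natural one, and it can proceed incrementally on top of the BaseAgent v2 contract (Part 02-1). On the way to the 50-person and 1000-person stages, the Hierarchical pattern becomes the foundation of team-level ownership (Phase C-10).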

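14 Wrapping Up Phase C-3

The four parts of Phase C-3 (C11 through C14) covered Agentic Mode.

C11 Tool Binding: attach tools to the LLM
C12 ReAct: repeat tool use in a cycle
C13 Plan-and-Execute: compress the cycle into an upfront plan
C14 (this post) Agent Delegation: decompose a single agent into multiple agents

These four patterns are alternatives, not layers to stack. In production, pick one according to the complexity of the task. Starting with Phase C-4, the series turns to how the quality of these autonomous agents is measured and improved: experiments, utterance analysis, and observability.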

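15 Related Topics

Prerequisites (same series)

Tool Binding – the starting point for tool registration
ReAct loop – tool calls per cycle
Multi-step planning – plan first, then execute
BaseAgent contract v2 – the common interface for all agents
State design – criteria for subgraph separation

Follow-up topics (Phase C-4 to C-10)

A/B testing in depth (planned) – measuring supervisor routing accuracy
Intelligent routing (planned) – learning routing with Thompson Sampling
Observability design (planned) – metrics split by agent_name

Links to other categories

LangGraph Multi-Agent Supervisor – reference implementation of this pattern
LangGraph Multi-Agent Collaboration – the peer-to-peer variant
LangGraph Hierarchial Agent Team – tree structure
LangGraph Subgraph – basics of sub-graph encapsulation
Multi-agent design patterns – theoretical background
AWS Deep Insight multi-agent – industry case study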