Building an autonomous agent that “finds and verifies on its own” with Agentic RAG (as of March 2026)

Published 2026/03/02

By Daewook Kwon

14 min read

Introduction

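Conventional RAG is usually a linear pipeline: query → retrieve → generate. The problem is that real-world questions are rarely that simple. A request like “Answer based on our internal docs, but check the web for the latest changes” requires (1) query decomposition, (2) choosing which sources to search, (3) evaluating result quality, and (4) retrying or exploring alternatives when something fails.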

That is why “Agentic RAG (AgRAG)” took off over 2025-2026: it promotes RAG from a fixed sequence of steps to a tool that the agent pulls out when it needs one. Stateful orchestration frameworks such as LangGraph, LlamaIndex's agentic strategies, and Qdrant's Agentic RAG reference all emphasize the same point: the plan-execute-evaluate-revise loop is the core. (qdrant.tech)
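The loop itself can be sketched in a few lines of framework-free Python. This is a minimal illustration, not how LangGraph, LlamaIndex, or Qdrant implement it: `plan_execute_evaluate_revise`, `LoopResult`, and the four callables are hypothetical names, and in a real agent each callable would wrap an LLM or retriever call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    answer: str = ""
    accepted: bool = False
    trace: List[str] = field(default_factory=list)  # every plan that was tried


def plan_execute_evaluate_revise(
    query: str,
    plan: Callable[[str], str],            # query -> initial plan (e.g. a search query)
    execute: Callable[[str], str],         # plan -> result (e.g. retrieved text)
    evaluate: Callable[[str, str], bool],  # (query, result) -> good enough?
    revise: Callable[[str, str], str],     # (plan, rejected result) -> revised plan
    max_iters: int = 3,
) -> LoopResult:
    out = LoopResult()
    current = plan(query)
    for _ in range(max_iters):  # bounded, so the loop cannot run forever
        out.trace.append(current)
        result = execute(current)
        if evaluate(query, result):
            out.answer, out.accepted = result, True
            return out
        current = revise(current, result)
    return out  # accepted stays False: the caller decides on a fallback
```

With toy callables such as `execute=lambda p: f"doc for {p}" if "2026" in p else ""` and `revise=lambda p, r: p + " 2026"`, the first attempt is rejected and the revised plan is accepted on the second iteration.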

🔧 Key Concepts

1) What Agentic RAG is

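Agentic RAG is RAG in which “the LLM plans its own retrieval strategy (planning), orchestrates multiple tools (tool orchestration), and evaluates and revises intermediate results (adaptation).” In other words, RAG is no longer the main pipeline; it becomes one tool inside the agent loop. (agentic-design.ai)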
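To make “RAG as one tool among several” concrete, here is a hedged sketch in which retrieval is just an entry in a tool registry. `vector_search`, `web_search`, `choose_tool`, and `agent_step` are hypothetical names; `choose_tool` is a fixed rule standing in for the LLM's planning decision, and both tools are stubs.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], List[str]]


def vector_search(query: str) -> List[str]:
    # Hypothetical stand-in for an internal vector store lookup.
    corpus = {"vacation policy": ["Employees accrue leave monthly (internal wiki)."]}
    return corpus.get(query.lower(), [])


def web_search(query: str) -> List[str]:
    # Hypothetical stand-in for a web search tool.
    return [f"(web) result for '{query}'"]


# Retrieval is registered as a tool, exactly like any other capability.
TOOLS: Dict[str, Tool] = {"vector_search": vector_search, "web_search": web_search}


def choose_tool(query: str, tried: List[str]) -> Optional[str]:
    # Stand-in for the LLM's planning step: prefer internal docs, then the web.
    for name in ("vector_search", "web_search"):
        if name not in tried:
            return name
    return None


def agent_step(query: str) -> Tuple[Optional[str], List[str]]:
    """Let the 'agent' pick tools until one returns evidence or the tools run out."""
    tried: List[str] = []
    while (name := choose_tool(query, tried)) is not None:
        tried.append(name)
        evidence = TOOLS[name](query)
        if evidence:  # adaptation: stop only once a tool produced something usable
            return name, evidence
    return None, []
```

In a real agent the LLM would see the tool descriptions and the query and pick the next tool itself; the point of the sketch is only that retrieval sits behind the same interface as every other tool.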

2) Why you need a graph (state machine)

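The essence of an autonomous agent is iteration. Rather than “search once and you're done,” it needs branches such as:

If the document search results are inaccurate, rewrite the query and search again
If internal documents are insufficient, fall back to web search
If the final answer lacks supporting evidence, gather more evidence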

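This is where a LangGraph-style approach pays off: it structurally supports (a) persistent state, (b) reusable nodes, (c) conditional branching, and (d) observability and debugging (tracing). Corrective/Adaptive RAG patterns (retrying after relevance grading, routing time-sensitive queries to the web, and so on) are the canonical examples. (leanware.co)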
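The sketch below shows properties (a) to (c) without any framework: shared state, reusable nodes, and conditional routing, wired as a small Corrective/Adaptive RAG flow, with a `trace` list as a crude stand-in for (d). `MiniGraph` is a toy written for illustration and is not LangGraph's API (LangGraph's own equivalents are `StateGraph.add_node`, `add_edge`, and `add_conditional_edges`); the keyword checks in the hypothetical nodes stand in for an LLM router and grader.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]   # returns a partial state update
Router = Callable[[State], str]   # returns the next node name, or "END"


class MiniGraph:
    """Toy state machine: nodes return partial updates; edges or routers pick the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Dict[str, str] = {}
        self.routers: Dict[str, Router] = {}

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src] = dst

    def add_router(self, src: str, router: Router) -> None:
        self.routers[src] = router

    def run(self, entry: str, state: State, max_steps: int = 20) -> State:
        current, trace = entry, []
        for _ in range(max_steps):  # hard step cap, so a bad router cannot loop forever
            if current == "END":
                break
            trace.append(current)
            state = {**state, **self.nodes[current](state)}  # merge the node's update
            router = self.routers.get(current)
            current = router(state) if router else self.edges.get(current, "END")
        return {**state, "trace": trace}


# Hypothetical nodes for an Adaptive/Corrective RAG flow.
def route(state: State) -> State:
    return {"source": "web" if "latest" in str(state["query"]).lower() else "vector"}


def retrieve(state: State) -> State:
    q = str(state.get("rewritten") or state["query"])
    return {"docs": ["LangGraph supports conditional edges"] if "langgraph" in q.lower() else []}


def grade(state: State) -> State:
    return {"relevant": bool(state["docs"])}


def rewrite(state: State) -> State:
    return {"rewritten": f"{state['query']} LangGraph", "tries": int(state.get("tries", 0)) + 1}


def web(state: State) -> State:
    return {"docs": [f"(web) {state['query']}"]}


g = MiniGraph()
for name, fn in [("route", route), ("retrieve", retrieve), ("grade", grade), ("rewrite", rewrite), ("web", web)]:
    g.add_node(name, fn)
g.add_router("route", lambda s: "web" if s["source"] == "web" else "retrieve")  # time-sensitive -> web
g.add_edge("retrieve", "grade")
g.add_router("grade", lambda s: "END" if s["relevant"] else ("rewrite" if int(s.get("tries", 0)) < 1 else "web"))
g.add_edge("rewrite", "retrieve")
g.add_edge("web", "END")
```

Running `g.run("route", {"query": "state machine orchestration"})` visits route, retrieve, grade, rewrite, retrieve, grade: the first retrieval misses, the rewritten query hits. A query containing "latest" is routed straight to the web node.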

3) An evaluator plus guardrails are the agent's brakes

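The biggest risks in Agentic RAG are infinite loops, runaway costs, and “plausible but unsupported synthesis.” That is why the pattern write-ups converge on the same best practices (agentic-design.ai):

Caps on iterations, tokens, and time
Scoring of source credibility and relevance
Fallback strategies on failure (a different retriever, different keywords, web fallback)
Provenance (source tracking)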
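One simple way to turn credibility/relevance scoring and provenance into code is to weight each hit's relevance by a per-source trust score and drop anything without a traceable source. This is an assumption-laden sketch: the `SOURCE_CREDIBILITY` weights, the multiplicative combination, and the 0.5 threshold are illustrative choices, not values from the cited pattern docs.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical trust weights per source type; tune them for your own corpus.
SOURCE_CREDIBILITY: Dict[str, float] = {"internal_wiki": 1.0, "vendor_docs": 0.9, "web": 0.6}


@dataclass
class Evidence:
    text: str
    source_type: str
    source_uri: str
    relevance: float  # e.g. a retriever or reranker score normalized to [0, 1]


def score(ev: Evidence) -> float:
    credibility = SOURCE_CREDIBILITY.get(ev.source_type, 0.3)  # unknown sources get a low default
    return ev.relevance * credibility


def select_evidence(items: List[Evidence], threshold: float = 0.5, k: int = 3) -> List[Evidence]:
    # Provenance is mandatory: evidence without a source URI is discarded outright.
    kept = [e for e in items if e.source_uri and score(e) >= threshold]
    return sorted(kept, key=score, reverse=True)[:k]
```

A highly relevant web hit can still lose to a moderately relevant internal one, which is usually the intended bias when the answer must be grounded in internal docs.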

4) Implementation trends as of March 2026 (a practitioner's summary)

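The natural progression is to start with a single agent plus a toolset (retriever/web/evaluator/router) and expand to multi-agent orchestration (a planner composing executors) only when it becomes necessary. There is also research on using multiple agents to plan adaptive workflows. (arxiv.org)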
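A planner-executor split can start very small. In this hypothetical sketch, `plan` is a rule-based stand-in for an LLM planner that splits a compound request and assigns each part to an executor; the executors only tag their input, where real ones would be retrieval or web-search agents.

```python
from typing import Callable, Dict, List, Tuple

Executor = Callable[[str], str]


def internal_executor(task: str) -> str:
    return f"[internal] {task}"  # stand-in for an internal-docs RAG agent


def web_executor(task: str) -> str:
    return f"[web] {task}"  # stand-in for a web-search agent


EXECUTORS: Dict[str, Executor] = {"internal": internal_executor, "web": web_executor}


def plan(query: str) -> List[Tuple[str, str]]:
    # Stand-in for an LLM planner: split on " and ", send time-sensitive parts to the web.
    steps = []
    for part in query.split(" and "):
        part = part.strip()
        steps.append(("web" if "latest" in part.lower() else "internal", part))
    return steps


def run_planner_executor(query: str) -> List[str]:
    # Each (executor, sub-task) pair from the plan runs independently.
    return [EXECUTORS[name](task) for name, task in plan(query)]
```

For the intro's example request, "summarize our RAG policy and check the latest LangGraph release" becomes one internal sub-task and one web sub-task.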

💻 Hands-on Code

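The example below is a minimal yet practical “Agentic RAG autonomous search agent.” It:

Runs a first-pass search against a vector DB (Qdrant)
Has the LLM grade the relevance of the results
Rewrites the query and searches again if the results are weak
Stops with “web search needed” if they are still insufficient (in production, this is where you would call a web_search tool)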

Prerequisites: pip install langgraph langchain-openai qdrant-client (with OPENAI_API_KEY set in the environment)
Qdrant can run locally or in the cloud (the example only assumes a URL)

  
```python
import os
from typing import TypedDict, List, Literal, Optional

from qdrant_client import QdrantClient
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.documents import Document
from langgraph.graph import StateGraph, END

# ----------------------------
# 1) State definition: the agent's "working memory"
# ----------------------------
class AgentState(TypedDict):
    query: str
    rewritten_query: Optional[str]
    docs: List[Document]
    decision: Literal["answer", "rewrite", "need_web"]
    answer: Optional[str]
    tries: int


# ----------------------------
# 2) External components and limits
# ----------------------------
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # swap the model to suit your environment

# Assumption: the collection was indexed with this same embedding model
# (text-embedding-3-small returns 1536-dimensional vectors by default).
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

qdrant = QdrantClient(
    url=os.environ.get("QDRANT_URL", "http://localhost:6333"),
    api_key=os.environ.get("QDRANT_API_KEY"),
)

COLLECTION = os.environ.get("QDRANT_COLLECTION", "docs")
MAX_TRIES = 2  # maximum number of query rewrites before escalating to web search


def qdrant_retrieve(query: str, top_k: int = 5) -> List[Document]:
    """
    Embed the query and run a vector search in Qdrant.
    Assumes each point's payload stores the chunk under "text" and its origin under "source".
    (In production, also add reranking and a stricter payload schema.)
    """
    query_vector = embeddings.embed_query(query)
    # query_points supersedes the older qdrant.search(), which is deprecated in recent qdrant-client releases.
    hits = qdrant.query_points(
        collection_name=COLLECTION,
        query=query_vector,
        limit=top_k,
        with_payload=True,
    ).points
    docs = []
    for h in hits:
        payload = h.payload or {}
        docs.append(
            Document(
                page_content=payload.get("text", ""),
                metadata={"source": payload.get("source", "unknown"), "score": h.score},
            )
        )
    return docs


# ----------------------------
# 3) Node: retrieve
# ----------------------------
def node_retrieve(state: AgentState) -> AgentState:
    q = state["rewritten_query"] or state["query"]
    docs = qdrant_retrieve(q, top_k=6)
    return {**state, "docs": docs}


# ----------------------------
# 4) Node: grade (relevance / sufficiency check)
# ----------------------------
def node_grade(state: AgentState) -> AgentState:
    q = state["rewritten_query"] or state["query"]
    docs = state["docs"]

    # Evaluator: "answer" if the docs suffice, "rewrite" if borderline, "need_web" if internal docs cannot answer
    prompt = f"""
You are an evaluator for an agentic RAG system.
Given a user query and retrieved documents, decide next action:
- "answer": documents are sufficient and relevant
- "rewrite": documents are weak; rewrite the query and try again
- "need_web": internal docs clearly insufficient or time-sensitive; web search is needed

Return ONLY one word: answer | rewrite | need_web

Query: {q}

Docs (snippets):
{chr(10).join([f"- ({d.metadata.get('source')}) {d.page_content[:240]}" for d in docs])}
""".strip()

    # Normalize stray quotes, trailing periods, and casing before validating the label
    decision = llm.invoke(prompt).content.strip().strip("\"'.").lower()
    if decision not in ("answer", "rewrite", "need_web"):
        decision = "rewrite"  # defensive default
    if decision == "rewrite" and state["tries"] >= MAX_TRIES:
        decision = "need_web"  # loop guard: stop rewriting after MAX_TRIES attempts

    return {**state, "decision": decision}


# ----------------------------
# 5) Node: rewrite (query rewriting)
# ----------------------------
def node_rewrite(state: AgentState) -> AgentState:
    q = state["rewritten_query"] or state["query"]
    tries = state["tries"] + 1

    prompt = f"""
Rewrite the query to improve retrieval.
- Add specific keywords, synonyms
- Disambiguate intent
- Keep it concise
Return ONLY the rewritten query.

Original query: {q}
""".strip()

    rq = llm.invoke(prompt).content.strip()
    return {**state, "rewritten_query": rq, "tries": tries}


# ----------------------------
# 6) Node: answer (grounded generation)
# ----------------------------
def node_answer(state: AgentState) -> AgentState:
    q = state["rewritten_query"] or state["query"]
    docs = state["docs"]

    context = "\n\n".join(
        [f"[source={d.metadata.get('source')}, score={d.metadata.get('score')}] {d.page_content}" for d in docs]
    )

    prompt = f"""
You are a senior engineer writing a grounded answer.
Use ONLY the provided context. If insufficient, say so.

Query: {q}

Context:
{context}

Answer in the same language as the query; keep technical terms in English.
""".strip()

    ans = llm.invoke(prompt).content.strip()
    return {**state, "answer": ans}


# ----------------------------
# 7) Graph wiring: retrieve -> grade -> (answer | rewrite | need_web)
# ----------------------------
def route_after_grade(state: AgentState) -> str:
    # node_grade already applied the MAX_TRIES guard, so the decision maps directly to the next step
    return state["decision"]


graph = StateGraph(AgentState)
graph.add_node("retrieve", node_retrieve)
graph.add_node("grade", node_grade)
graph.add_node("rewrite", node_rewrite)
graph.add_node("answer", node_answer)

graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "grade")
graph.add_conditional_edges(
    "grade",
    route_after_grade,
    {
        "answer": "answer",
        "rewrite": "rewrite",
        "need_web": END,  # a production system would wire a web_search tool node here
    },
)
graph.add_edge("rewrite", "retrieve")
graph.add_edge("answer", END)

app = graph.compile()


if __name__ == "__main__":
    init_state: AgentState = {
        "query": "Explain how to implement an Agentic RAG autonomous agent and the architecture patterns involved",
        "rewritten_query": None,
        "docs": [],
        "decision": "rewrite",  # placeholder; node_grade overwrites it
        "answer": None,
        "tries": 0,
    }

    result = app.invoke(init_state)
    print("Decision:", result["decision"])
    print("Tries:", result["tries"])
    print("Rewritten:", result["rewritten_query"])
    print("Answer:", result["answer"])
```

The key point is not retrieval itself: once grade → rewrite → retry is in place, the agent starts “improving its own search.” This is a minimal implementation of the Corrective/Adaptive RAG patterns described in LangGraph references. (leanware.co)

⚡ Practical Tips

1) Don't let the agent grade its own retrieval quality; separate out the evaluator

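Separate the generation LLM from the evaluator LLM (different models, or at least different prompts), and have the evaluator judge only evidence sufficiency, source credibility, and whether the query needs up-to-date information. This noticeably reduces hallucination (the gist of Corrective RAG). (leanware.co)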
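A minimal sketch of a separated evaluator, assuming the evaluator is any `str -> str` callable (for example a second chat-model instance with its own prompt). It asks only the narrow questions above and maps them onto the three actions used in the hands-on code; `EVALUATOR_PROMPT`, `Verdict`, and `evaluate` are hypothetical names, and the JSON contract is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Evaluator-only prompt; the generator keeps its own prompt (and ideally its own model).
EVALUATOR_PROMPT = """You are an evidence evaluator. Do NOT answer the question.
Return only JSON with boolean keys: sufficient, credible, needs_fresh_data.

Query: {query}
Evidence:
{evidence}"""


@dataclass
class Verdict:
    sufficient: bool
    credible: bool
    needs_fresh_data: bool

    @property
    def decision(self) -> str:
        # Map the narrow judgments onto the graph's three actions.
        if self.needs_fresh_data:
            return "need_web"
        if self.sufficient and self.credible:
            return "answer"
        return "rewrite"


def evaluate(query: str, evidence: str, call_evaluator: Callable[[str], str]) -> Verdict:
    raw = call_evaluator(EVALUATOR_PROMPT.format(query=query, evidence=evidence))
    try:
        data = json.loads(raw)
        # `is True` rejects strings like "false" that bool() would treat as truthy.
        return Verdict(
            sufficient=data["sufficient"] is True,
            credible=data["credible"] is True,
            needs_fresh_data=data["needs_fresh_data"] is True,
        )
    except (json.JSONDecodeError, KeyError, TypeError):
        # Unparseable verdict: fall back to "rewrite" so the retry cap still bounds the loop.
        return Verdict(sufficient=False, credible=False, needs_fresh_data=False)
```

With the hands-on code's components, `call_evaluator` could be `lambda p: evaluator_llm.invoke(p).content`, where `evaluator_llm` is a hypothetical separate ChatOpenAI instance. Treating an unparseable verdict as "rewrite" keeps failures inside the loop instead of silently producing an answer.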

2) Enforce loop termination in code
The classic trap that Agentic RAG pattern docs warn about is “uncontrolled loops.” Always set MAX_TRIES, a time limit, and cost limits (tokens / number of tool calls). (agentic-design.ai)
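The hands-on code only caps rewrites. A broader budget, enforced in code rather than in the prompt, might look like the following sketch; `Budget`, `BudgetExceeded`, `run_bounded`, and all default limits are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop exceeds one of its hard limits."""


@dataclass
class Budget:
    max_iterations: int = 3
    max_tool_calls: int = 10
    max_tokens: int = 20_000
    max_seconds: float = 60.0
    iterations: int = 0
    tool_calls: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, iterations: int = 0, tool_calls: int = 0, tokens: int = 0) -> None:
        """Record usage, then fail fast if any cap is exceeded."""
        self.iterations += iterations
        self.tool_calls += tool_calls
        self.tokens += tokens
        if self.iterations > self.max_iterations:
            raise BudgetExceeded("iteration cap")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call cap")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token cap")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("time cap")


def run_bounded(step: Callable[[], bool], budget: Budget) -> Tuple[bool, str]:
    """Call step() until it reports success or the budget runs out."""
    while True:
        try:
            budget.charge(iterations=1, tool_calls=1)
        except BudgetExceeded as exc:
            return False, str(exc)  # hand over to a fallback or an explicit "cannot answer"
        if step():
            return True, "ok"
```

Tokens would be charged after each LLM call, for example from the `usage_metadata` that recent langchain-core versions attach to AI messages; check your client version for the exact field.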

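3) Layer your fallback strategy
The priority order usually recommended in practice is:

(1) internal vector store (policies / in-house knowledge)
(2) internal DB/API (structured data with high consistency)
(3) web search (freshness / external evidence)
This routing itself is the core of “Adaptive RAG.” (leanware.co)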
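The priority order above maps naturally onto a tiered fallback. In this sketch, `tiered_retrieve` tries each tier in order and returns the first that yields enough hits; the three tier functions are hypothetical stubs, and swallowing every exception is a deliberate simplification (production code would catch narrower errors and log them).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Retriever = Callable[[str], List[str]]


def tiered_retrieve(
    query: str,
    tiers: Sequence[Tuple[str, Retriever]],
    min_hits: int = 1,
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in priority order; return (tier_name, hits) for the first good-enough tier."""
    for name, retriever in tiers:
        try:
            hits = retriever(query)
        except Exception:
            continue  # a failing tier degrades to the next one instead of crashing the agent
        if len(hits) >= min_hits:
            return name, hits
    return None, []


# Hypothetical tiers standing in for a vector store, an internal API, and web search.
def internal_vector(query: str) -> List[str]:
    return ["policy chunk"] if "policy" in query else []


def internal_api(query: str) -> List[str]:
    if "order" in query:
        return [f"order record for {query!r}"]
    raise ConnectionError("internal API unavailable")


def web(query: str) -> List[str]:
    return [f"(web) {query}"]


TIERS = [("internal_vector", internal_vector), ("internal_api", internal_api), ("web", web)]
```

Returning the tier name alongside the hits lets the answer step record which layer supplied the evidence, which feeds directly into provenance.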

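4) Go multi-agent only when you need it
Research and case studies propose multi-agent adaptive RAG, in which a planner composes several executors, but it significantly increases operational complexity (observability, cost, debugging). The safer path is to capture the gains of a single agent plus a graph first and expand only when you hit a bottleneck. (arxiv.org)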

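5) In production, provenance metadata decides whether users trust the product
Enforce source, timestamp, doc_version, and access_scope as metadata on every document chunk, and keep the supporting evidence in the final answer too; only then are auditing and verification possible. “Where it came from” matters more than “the agent sounded impressive.” (agentic-design.ai)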
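A hedged sketch of enforcing those four fields at ingestion, filtering by `access_scope` before anything reaches the prompt, and appending citations to the answer. `Chunk`, `missing_fields`, `validate`, `visible`, `with_citations`, and the example chunk are hypothetical, and the citation format is an illustrative choice.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

REQUIRED_FIELDS = ("source", "timestamp", "doc_version", "access_scope")


class ProvenanceError(ValueError):
    pass


@dataclass
class Chunk:
    text: str
    metadata: Dict[str, str]


def missing_fields(chunk: Chunk) -> List[str]:
    return [k for k in REQUIRED_FIELDS if not chunk.metadata.get(k)]


def validate(chunk: Chunk) -> Chunk:
    # Reject chunks at ingestion time instead of discovering gaps at answer time.
    missing = missing_fields(chunk)
    if missing:
        raise ProvenanceError(f"missing provenance fields: {missing}")
    return chunk


def visible(chunks: List[Chunk], user_scopes: Set[str]) -> List[Chunk]:
    # Only chunks the caller is allowed to see may reach the prompt.
    return [c for c in chunks if c.metadata["access_scope"] in user_scopes]


def with_citations(answer: str, chunks: List[Chunk]) -> str:
    lines = [answer, "", "Sources:"]
    for i, c in enumerate(chunks, start=1):
        m = c.metadata
        lines.append(f"[{i}] {m['source']} ({m['doc_version']}, {m['timestamp']})")
    return "\n".join(lines)


EXAMPLE = Chunk(
    text="Agentic RAG runbook excerpt",
    metadata={"source": "wiki/agentic-rag.md", "timestamp": "2026-03-01", "doc_version": "v3", "access_scope": "eng"},
)
```

Filtering by scope before generation, rather than redacting afterwards, means the model never sees text the user is not entitled to.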

🚀 Wrapping Up

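The essence of an Agentic RAG autonomous agent is not “upgrading RAG” but building a loop in which the agent plans, evaluates, and revises its own retrieval. As of March 2026, the most reproducible approach in practice is to:

Build an Adaptive/Corrective RAG loop with stateful orchestration such as LangGraph (leanware.co)
Secure safe autonomy with an evaluator and termination conditions (agentic-design.ai)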

Recommended next steps:

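Implement the “Corrective RAG / Adaptive RAG” patterns as a graph yourself (routing plus retry-policy tuning)
Follow the latest research on multi-agent planner-executor architectures (when you actually need them) (arxiv.org)
(At the product stage) add tool-based RAG such as web search / file search to satisfy both external freshness and internal grounding (theverge.com)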

AI, Agent

ai agent trend 2026-03