[Research] Thread context sharing patterns in multi-agent LLM systems

Summary

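Pantheon runs each persona as an independent bot, so no persona can see what the other personas have said in the same Slack thread. This is a common problem across the industry, and the solutions converge on five main patterns: a GroupChat manager (AutoGen), a shared TypedDict state plus checkpointer (LangGraph), explicitly passed context_variables (OpenAI Swarm), orchestrator-worker plus filesystem (Anthropic), and a shared blackboard (a convergence of academic and industry work). For our case (multiple Slack bots with asynchronous, human-driven conversation), the best fit is to inject the thread context immediately before each call, with a blackboard as the evolution path.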

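Background / Question

Pantheon runs several personas (jarvis, raphael, wansu, jini, nano, and others) as separate Slack bots in the #northstar channel. Current limitations:

Each persona receives only *messages in which it is directly mentioned (or the mentioning user's utterance)* as context.
It cannot see what other personas said in the same thread, so it cannot answer questions such as "do you agree with the other persona's opinion?"
Temporary workaround: call slack_read_thread right before every response to read the whole thread. This is inefficient and guarantees no consistency.

Question: how do other systems solve this problem, and which pattern fits Pantheon?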

Investigation

1. Shared context in multi-agent LLM frameworks

#### AutoGen GroupChat (Microsoft)

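All agents share a single groupchat.messages list, so *every utterance is visible to every agent*.
The GroupChatManager selects the next speaker each turn, broadcasts the message, then selects the next speaker again, and repeats.
SelectorGroupChat has an LLM look at the shared context and pick the next speaker dynamically.
Pantheon mapping: the GroupChatManager would act as the router that decides which persona should respond.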
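
The shared-transcript idea above can be sketched in plain Python. This is not AutoGen's API: `GroupChat`, `select_next_speaker`, and `make_agent` are hypothetical names, and the round-robin policy is only a stand-in for whatever selection logic a real manager would apply.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    speaker: str
    text: str


@dataclass
class GroupChat:
    """One transcript that every agent reads from and appends to."""
    agents: dict[str, Callable[[list[Message]], str]]
    messages: list[Message] = field(default_factory=list)


def select_next_speaker(chat: GroupChat) -> str:
    """Hypothetical manager policy: round-robin after the last agent speaker."""
    names = list(chat.agents)
    if not chat.messages or chat.messages[-1].speaker not in names:
        return names[0]
    last = names.index(chat.messages[-1].speaker)
    return names[(last + 1) % len(names)]


def run_turns(chat: GroupChat, turns: int) -> None:
    for _ in range(turns):
        name = select_next_speaker(chat)
        # The selected agent receives the full shared transcript.
        reply = chat.agents[name](chat.messages)
        chat.messages.append(Message(name, reply))


def make_agent(name: str) -> Callable[[list[Message]], str]:
    """Toy agent that reports whose utterances it can see."""
    def agent(history: list[Message]) -> str:
        seen = sorted({m.speaker for m in history})
        return f"{name} has seen: {', '.join(seen)}"
    return agent


chat = GroupChat(agents={n: make_agent(n) for n in ("jarvis", "raphael", "wansu")})
chat.messages.append(Message("user", "Do you agree with each other?"))
run_turns(chat, 3)
```

Each later speaker sees every earlier utterance, which is exactly the visibility that separate Pantheon bots currently lack.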

#### CrewAI Shared Memory

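Setting memory=True automatically enables crew-level shared memory.
Before each task runs, relevant context is injected into the prompt; after it runs, facts are automatically extracted and stored.
In 2026 this was consolidated into Cognitive Memory (a single Memory class with a composite score of semantic similarity, recency, and importance).
Limitation: a crew is a task-oriented, synchronous model, which does not sit naturally with asynchronous human conversation.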

#### LangGraph State + Checkpointer

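Every node in a StateGraph shares a single TypedDict state.
The add_messages reducer accumulates messages (append, not replace).
A checkpointer (PostgresSaver, MemorySaver) persists state per thread_id, which enables interrupt and resume.
Strengths: per-thread state is managed explicitly, with support for auditing and time travel.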
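
To make the reducer and per-thread persistence ideas concrete, here is a stdlib-only sketch. It deliberately does not import LangGraph: `append_messages`, `InMemoryCheckpointer`, and `run_node` are hypothetical stand-ins that only mimic the behaviour described above (append-style merging and one snapshot history per thread_id).

```python
import copy
from typing import Callable, TypedDict


class ThreadState(TypedDict):
    messages: list[dict]


def append_messages(current: list[dict], new: list[dict]) -> list[dict]:
    """Reducer: merge a node's update by appending, never replacing."""
    return current + new


class InMemoryCheckpointer:
    """Hypothetical saver that keeps every state snapshot per thread_id."""

    def __init__(self) -> None:
        self._history: dict[str, list[ThreadState]] = {}

    def load(self, thread_id: str) -> ThreadState:
        snapshots = self._history.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else {"messages": []}

    def save(self, thread_id: str, state: ThreadState) -> None:
        self._history.setdefault(thread_id, []).append(copy.deepcopy(state))

    def history(self, thread_id: str) -> list[ThreadState]:
        """All snapshots in order, which is what makes replay possible."""
        return copy.deepcopy(self._history.get(thread_id, []))


def run_node(
    saver: InMemoryCheckpointer,
    thread_id: str,
    node: Callable[[ThreadState], dict],
) -> ThreadState:
    """Load the thread's state, apply one node, merge its update, persist."""
    state = saver.load(thread_id)
    update = node(state)
    state["messages"] = append_messages(state["messages"], update["messages"])
    saver.save(thread_id, state)
    return state
```

Because every node call goes through `load` and `save`, a second persona invoked later on the same thread_id starts from the state the first one left behind.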

#### OpenAI Swarm

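Stateless, with two primitives: routines and handoffs.
An explicit dict, context_variables, is passed on every call, so there is *no hidden state*.
Functions can receive context_variables, modify it, and return it.
Strength: simple and explicit. Weakness: the abstraction is thin, so large-scale operation means building much of the machinery yourself.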
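
The explicit-state style can be illustrated without the Swarm library itself. In the sketch below, the agent functions, the `AGENTS` registry, and the `run` loop are all hypothetical; the only idea borrowed from the description above is that a plain dict is handed to every call and handed back, so nothing is stored between calls.

```python
from typing import Callable, Optional

Context = dict[str, str]
# An agent returns (reply, updated context, name of the agent to hand off to or None).
AgentFn = Callable[[str, Context], tuple[str, Context, Optional[str]]]


def triage(message: str, context_variables: Context) -> tuple[str, Context, Optional[str]]:
    """Classify the request and record the topic in a new context dict."""
    topic = "billing" if "invoice" in message.lower() else "general"
    updated = {**context_variables, "topic": topic}
    handoff = "billing_agent" if topic == "billing" else None
    return "Routing your request.", updated, handoff


def billing_agent(message: str, context_variables: Context) -> tuple[str, Context, Optional[str]]:
    """Read only what was explicitly passed in; there is no hidden state."""
    reply = f"Hi {context_variables['user_name']}, I can help with {context_variables['topic']}."
    return reply, context_variables, None


AGENTS: dict[str, AgentFn] = {"triage": triage, "billing_agent": billing_agent}


def run(agent_name: str, message: str, context_variables: Context) -> tuple[list[str], Context]:
    """Follow handoffs until an agent returns no successor."""
    replies: list[str] = []
    current: Optional[str] = agent_name
    while current is not None:
        reply, context_variables, current = AGENTS[current](message, context_variables)
        replies.append(reply)
    return replies, context_variables


replies, ctx = run("triage", "Where is my invoice?", {"user_name": "Dana"})
```

The caller owns the dict before and after the run, which is what makes this style easy to audit and also why persistence is left to the application.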

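#### Anthropic Multi-Agent Research System (Claude Code based)

Orchestrator-worker pattern: a Lead Researcher makes the plan, then subagents execute in parallel.
Each subagent has an *independent context window* and returns only a structured result to the orchestrator.
Coordination relies mainly on the **filesystem + tool call results**.
Performance: an Opus 4 lead with Sonnet 4 subagents outperformed single-agent Opus 4 by 90.2% on Anthropic's internal research eval (in their BrowseComp analysis, token usage alone explained 80% of the performance variance).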

2. Slack/Discord multi-bot operations in practice

#### Redis distributed locking + event idempotency

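Duplicate handling of the same Slack event ID is blocked with SET NX.
A short-lived per-thread lease prevents race conditions from concurrent processing.
This is the production-safety baseline when multiple workers respond to the same thread.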
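
A minimal sketch of the two guards above, using an in-memory stand-in so it runs without a Redis server. `FakeKeyValueStore` is hypothetical and only imitates the "set if absent, with expiry" semantics of Redis `SET key value NX EX ttl`; the key names and TTL values are assumptions as well.

```python
import time
from typing import Callable, Optional


class FakeKeyValueStore:
    """In-memory stand-in for the small subset of Redis used here."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[str, Optional[float]]] = {}

    def _live(self, key: str) -> Optional[tuple[str, Optional[float]]]:
        entry = self._data.get(key)
        if entry is not None and entry[1] is not None and entry[1] <= self._clock():
            del self._data[key]  # expired entries behave as if absent
            return None
        return entry

    def set(self, key: str, value: str, nx: bool = False, ex: Optional[float] = None) -> Optional[bool]:
        if nx and self._live(key) is not None:
            return None  # key already held: the NX write is refused
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True

    def get(self, key: str) -> Optional[str]:
        entry = self._live(key)
        return entry[0] if entry else None

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def claim_event(store: FakeKeyValueStore, event_id: str, ttl: float = 3600) -> bool:
    """Idempotency guard: only the first worker to see an event ID wins."""
    return store.set(f"slack:event:{event_id}", "1", nx=True, ex=ttl) is True


def acquire_thread_lease(store: FakeKeyValueStore, thread_ts: str, worker_id: str, ttl: float = 30) -> bool:
    """Short-lived lease so only one worker handles a thread at a time."""
    return store.set(f"slack:lease:{thread_ts}", worker_id, nx=True, ex=ttl) is True


def release_thread_lease(store: FakeKeyValueStore, thread_ts: str, worker_id: str) -> bool:
    """Release only a lease we still own. Against a real Redis, this
    check-and-delete would need to be atomic (for example a server-side script)."""
    key = f"slack:lease:{thread_ts}"
    if store.get(key) == worker_id:
        store.delete(key)
        return True
    return False
```

The lease TTL matters as much as the lock itself: if a worker crashes mid-response, the thread becomes available again once the lease expires.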

#### Multi-agent bot coordination (e.g. OpenClaw)

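allowBots: true explicitly allows each bot to *consume other bots' messages as context*.
requireMention: true prevents bot loops (respond only when mentioned).
A centralized state store (usually Redis) caches context per thread.
Slack's recommendation is to "extract only the thread slices you need, store them as structured state, and inject that into the LLM on every response" (minimizing re-queries).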

3. Alternative architecture patterns

#### Blackboard architecture (Hearsay-II → LLM revival)

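A design pattern that originated in Hearsay-II, a 1970s speech recognition system.
Agents do not communicate directly; they read from and write to a *shared blackboard*, which decouples them.
Two papers, arxiv 2507.01701 (Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture) and 2510.01285 (Information Discovery in Data Science), report recent results on applying this to LLMs: competitive with static and dynamic multi-agent systems, with *lower token usage*.
Pattern: every utterance is recorded on the blackboard as an event → a controller inspects the blackboard state and selects the next speaker (persona) → only the *relevant chunk is injected* at call time.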
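
The three-step pattern above (record events, let a controller pick the next persona, inject only the relevant chunk) might look like the following sketch. The tag-based matching and the `PERSONA_INTERESTS` table are assumptions made for illustration; the cited papers use their own selection mechanisms.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    seq: int
    author: str
    text: str
    tags: frozenset[str]


@dataclass
class Blackboard:
    """Append-only shared board; agents never message each other directly."""
    events: list[Event] = field(default_factory=list)

    def post(self, author: str, text: str, tags: Iterable[str] = ()) -> Event:
        event = Event(len(self.events) + 1, author, text, frozenset(tags))
        self.events.append(event)
        return event

    def relevant(self, interests: frozenset[str], limit: int = 3) -> list[Event]:
        """Most recent events that share at least one tag with the reader."""
        hits = [e for e in self.events if e.tags & interests]
        return hits[-limit:]


# Hypothetical interest tags per persona (persona names come from the channel).
PERSONA_INTERESTS: dict[str, frozenset[str]] = {
    "jarvis": frozenset({"planning"}),
    "raphael": frozenset({"design"}),
    "nano": frozenset({"cost"}),
}


def select_next_persona(board: Blackboard) -> Optional[str]:
    """Controller: first persona, other than the author, interested in the latest event."""
    if not board.events:
        return None
    latest = board.events[-1]
    for persona, interests in PERSONA_INTERESTS.items():
        if persona != latest.author and interests & latest.tags:
            return persona
    return None


def build_prompt_context(board: Blackboard, persona: str) -> str:
    """Inject only the slice relevant to this persona, not the whole board."""
    events = board.relevant(PERSONA_INTERESTS[persona])
    return "\n".join(f"[{e.seq}] {e.author}: {e.text}" for e in events)
```

Adding a persona here means adding one entry to the interest table; no existing agent needs to know the new one exists.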

#### Event sourcing + INMS

Every utterance, decision, and handoff is stored in an immutable event log, which preserves the reasoning as well.
INMS (Interactive Memory Sharing, arxiv 2404.09982): an asynchronous multi-agent shared memory pool with real-time filtering, storage, and retrieval.
Conflicts are handled with CRDTs (Conflict-free Replicated Data Types).

#### Distillation + Structured Context Objects

Google Developers Blog recommendation: the orchestrator holds a typed context object and passes workers *only the fields they need*.
Dumping the entire history wastes tokens; structured slices are more efficient.

4. Related papers and primary sources

Anthropic blog: "How we built our multi-agent research system" (2025).
arxiv 2507.01701 / 2510.01285: LLM blackboard systems.
arxiv 2404.09982: INMS (memory sharing framework).
Slack Developer Docs: Context management (docs.slack.dev/ai/agent-context-management/).
LangChain docs: Graph API, State, Checkpointer.

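Findings

| Pattern | How context is shared | Pros | Cons | Fit for Pantheon |
| --- | --- | --- | --- | --- |
| AutoGen GroupChat | Single messages list + Manager selection | Every utterance is guaranteed to be shared | Central Manager is a bottleneck; scales poorly | Medium (Manager can map to the router) |
| CrewAI shared memory | Crew-level, automatic extraction/injection | Highly automated | Task-oriented sync model | Low (poor fit for asynchronous conversation) |
| LangGraph State + Checkpointer | TypedDict + thread_id persistence | Auditing, time travel, resume | Learning curve; graph must be defined up front | Medium-high (long-term evolution option) |
| OpenAI Swarm context_variables | Explicit dict passed on every call | Simple, explicit | Thin abstraction | Medium (burden of building a thin library on top) |
| Anthropic orchestrator-worker | Filesystem + tool return | Parallelism, independent contexts | Everything is forced through the orchestrator | Medium (validated for Claude Code-style work) |
| Slack allowBots + Redis lock | Bot-to-bot message visibility + distributed lock | Fits our infrastructure directly | Adds Redis infrastructure | Very high |
| Blackboard (Hearsay-II → LLM) | Shared blackboard, indirect communication | Decoupling, token savings, easy to extend | Trigger and selector rules must be designed | High (evolution path) |
| Event sourcing + INMS | Every utterance in an event log | Auditing, replay, asynchrony | Storage cost | Medium-high (Slack already plays a similar role) |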

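Conclusion

To restate Pantheon's situation:

*Slack itself already serves as the event log and shared blackboard* (the thread).
What is missing is **a router/injection layer that makes each persona bot read that blackboard right before it is called**.

The recommended path therefore has two stages: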

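Stage 1: apply immediately (Slack allowBots pattern + thread injection right before the call)

With Slack Bolt middleware, when an app_mention arrives, fetch the whole thread with conversations.replies just before the bot calls the LLM and inject it into the system prompt as *structured context*.
Prefix other personas' messages with bot_id/username to tell them apart.
Cost is close to zero and no infrastructure is added. Weakness: the full thread is fetched on every call (a token burden for long threads).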
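
The formatting half of stage 1 can be sketched as a pure function over the message dicts that conversations.replies returns. The `BOT_PERSONAS` mapping, the bot IDs, the `(you)` marker, and the `<thread_context>` wrapper are assumptions for illustration; in a Bolt handler the `messages` list would come from something like `client.conversations_replies(channel=..., ts=thread_ts)["messages"]`.

```python
from typing import Optional

# Hypothetical mapping from Slack bot_id to persona name.
BOT_PERSONAS: dict[str, str] = {
    "B_JARVIS": "jarvis",
    "B_RAPHAEL": "raphael",
}


def speaker_label(message: dict, personas: dict[str, str]) -> str:
    """Persona name for known bots, username or raw bot_id otherwise, user ID for humans."""
    bot_id: Optional[str] = message.get("bot_id")
    if bot_id:
        return personas.get(bot_id) or message.get("username") or bot_id
    return f"user:{message.get('user', 'unknown')}"


def format_thread_context(messages: list[dict], personas: dict[str, str], self_bot_id: str) -> str:
    """Render a thread as prefixed lines, oldest first, for the system prompt."""
    lines = []
    for message in sorted(messages, key=lambda m: float(m["ts"])):
        label = speaker_label(message, personas)
        if message.get("bot_id") == self_bot_id:
            label += " (you)"  # lets the persona recognise its own earlier replies
        lines.append(f"{label}: {message.get('text', '')}")
    return "<thread_context>\n" + "\n".join(lines) + "\n</thread_context>"


thread = [
    {"ts": "1700000002.000200", "bot_id": "B_RAPHAEL", "text": "I disagree."},
    {"ts": "1700000001.000100", "user": "U1", "text": "@jarvis thoughts?"},
    {"ts": "1700000003.000300", "bot_id": "B_JARVIS", "text": "Noted."},
]
context = format_thread_context(thread, BOT_PERSONAS, self_bot_id="B_JARVIS")
```

Keeping the formatter separate from the fetch makes it easy to unit-test the prefix policy without touching the Slack API.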

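Stage 2: evolution path (blackboard-style compressed store)

Accumulate thread events in *summarized form* in a separate store (a Redis stream or a Postgres event table).
When a persona is called, inject only the *relevant slice*, retrieved by semantic similarity or recency, following the pattern in Anthropic's system and arxiv 2507.01701.
This saves tokens and makes personas easy to add or remove.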

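Not adopted (for now):

Adopting CrewAI wholesale: its task-oriented sync model is a mismatch for our asynchronous conversations.
Adopting LangGraph: an explicit graph is too heavy an abstraction for natural conversation (kept as a long-term option).
Building our own GroupChat Manager: routing policy is already handled by Slack mentions, so it would be redundant.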

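Next steps / related tickets

**Decision ticket candidate**: Decision: automatic thread context injection for Pantheon personas (apply stage 1 immediately + ADR for stage 2)
**Stage 1 experiment work units**:

1. Add inject_thread_context to the Slack Bolt middleware (fetch & prepend right before the LLM call).

2. Bot prefix policy: distinguish other personas' utterances with a : prefix (the bot_id/username prefix described in stage 1).

3. Truncation rule for long threads: recent N + slices matching decision/rule keywords.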
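
Work unit 3 could start from something as small as the sketch below. The keyword list and the default of five recent messages are placeholders, not agreed values, and plain substring matching is only the simplest possible reading of "keyword matching".

```python
from typing import Sequence

# Hypothetical markers for messages worth keeping regardless of age.
DECISION_KEYWORDS: tuple[str, ...] = ("decision", "decided", "rule", "agreed")


def slice_thread(
    messages: Sequence[dict],
    recent_n: int = 5,
    keywords: Sequence[str] = DECISION_KEYWORDS,
) -> list[dict]:
    """Keep the last recent_n messages plus any older message that mentions
    a decision/rule keyword, preserving the original order."""
    cutoff = max(len(messages) - recent_n, 0)
    kept = []
    for index, message in enumerate(messages):
        text = message.get("text", "").lower()
        if index >= cutoff or any(keyword in text for keyword in keywords):
            kept.append(message)
    return kept
```

Because the function preserves order and never duplicates a message, its output can be fed straight into whatever formatter builds the injected context.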

**Stage 2 ADR trigger**: in stage 1, the channel's average token cost per response exceeds a set threshold, or the number of personas exceeds four.

References

[Anthropic — How we built our multi-agent research system](https://www.anthropic.com/news/how-we-built-our-multi-agent-research-system)
[How Anthropic Built a Multi-Agent Research System (ByteByteGo)](https://blog.bytebytego.com/p/how-anthropic-built-a-multi-agent)
[AutoGen — Group Chat design pattern](https://microsoft.github.io/autogen/stable//user-guide/core-user-guide/design-patterns/group-chat.html)
[AutoGen — Selector Group Chat](https://microsoft.github.io/autogen/stable//user-guide/agentchat-user-guide/selector-group-chat.html)
[CrewAI — Memory concept docs](https://docs.crewai.com/en/concepts/memory)
[CrewAI — How we built Cognitive Memory for Agentic Systems](https://crewai.com/blog/how-we-built-cognitive-memory-for-agentic-systems)
[LangChain — Graph API overview](https://docs.langchain.com/oss/python/langgraph/graph-api)
[State Management in LangGraph: Checkpointing and Time Travel](https://rajatpandit.com/agentic-ai/langgraph-state-management-checkpoints/)
[OpenAI Swarm — GitHub](https://github.com/openai/swarm)
[Slack — Context management for AI agents](https://docs.slack.dev/ai/agent-context-management/)
[Redis tutorial — Slack bot with distributed locking](https://redis.io/tutorials/chat-sdk-slackbot-distributed-locking/)
[arxiv 2507.01701 — Exploring Advanced LLM Multi-Agent Systems Based on Blackboard Architecture](https://arxiv.org/abs/2507.01701)
[arxiv 2510.01285 — LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science](https://arxiv.org/abs/2510.01285)
[arxiv 2404.09982 — INMS: Memory Sharing for Large Language Model based Agents](https://arxiv.org/abs/2404.09982)
[Memory in multi-agent systems: technical implementations (Artium.AI)](https://artium.ai/insights/memory-in-multi-agent-systems-technical-implementations)
[Architecting efficient context-aware multi-agent framework for production (Google Dev Blog)](https://developers.googleblog.com/architecting-efficient-context-aware-multi-agent-framework-for-production/)