Pantheon durable replay disabled incident wrap-up
Summary
TL;DR
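In the 2026-06-20 Jini schedule-adjustment thread, the Calendar change completed but the final Slack report was never posted. The cause was not the calendar tool itself: an Asurada BrokenPipeError cluster restarted the bridge, and because DATABASE_URL was not set in production, that Jini turn was never recorded in the durable replay DB.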
Incident
- Slack thread: C0B5KLB7DHR, parent 1781959782.542359.
- 2026-06-20 22:50 KST: the user corrected Jini, asking it to create the schedule-move event as a separate event and to leave the main 17:30-19:00 interview untouched.
- Jini posted a progress/background notice, and the Calendar side effect completed.
- The final Slack completion report was never posted.
- Around 22:53 KST, an Asurada socket BrokenPipeError cluster caused the bridge to exit early.
- Log after restart:
[durable_replay] no running turns to replay.
Root Cause
durable_replay only works if it can claim rows in the running state from pantheon_turns. The production environment had no DATABASE_URL, so NoopTurnStore was selected and the turn row was never written in the first place.
| Check | Result |
|---|---|
| live bridge.py process env | DATABASE_URL not set |
| launchd env | DATABASE_URL not set |
| pantheon/.env after load | DATABASE_URL not set |
| venv dependency | psycopg installed |
| live Postgres connection | no connection on port 5432 |
| turn_store.get_turn_store() | NoopTurnStore |
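
The failure mode above can be illustrated with a minimal sketch. It is not the Pantheon code: the class shapes, method names, select_turn_store, and replay_summary are all hypothetical, and an in-memory list stands in for the Postgres-backed store. Only the names NoopTurnStore and DATABASE_URL and the "no running turns to replay" log line come from this incident. The point is that a no-op store accepts writes silently, so a later replay pass finds nothing even when a turn was in flight.

```python
from typing import List, Optional


class NoopTurnStore:
    """Hypothetical shape: accepts writes and silently drops them."""

    def record_running(self, turn_id: str) -> None:
        pass  # nothing is persisted

    def claim_running(self) -> List[str]:
        return []  # nothing was recorded, so nothing can be claimed


class InMemoryTurnStore:
    """Illustrative stand-in for a database-backed store (not the real class)."""

    def __init__(self) -> None:
        self._running: List[str] = []

    def record_running(self, turn_id: str) -> None:
        self._running.append(turn_id)

    def claim_running(self) -> List[str]:
        claimed, self._running = self._running, []
        return claimed


def select_turn_store(database_url: Optional[str]):
    # Assumed selection rule: a missing DATABASE_URL yields the no-op store.
    return InMemoryTurnStore() if database_url else NoopTurnStore()


def replay_summary(store) -> str:
    turns = store.claim_running()
    if not turns:
        # Log line as observed after the restart in this incident.
        return "[durable_replay] no running turns to replay."
    # Made-up message for the non-empty case, for illustration only.
    return f"[durable_replay] replaying {len(turns)} turn(s)."


if __name__ == "__main__":
    store = select_turn_store(None)    # DATABASE_URL unset
    store.record_running("jini-turn")  # the in-flight turn is dropped
    print(replay_summary(store))
```

Under these assumptions the replay log line is identical whether no turn was running or the turn was simply never stored, which is why the log alone cannot rule out a lost in-flight turn.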
Fix Shipped
Deployment
- main fast-forward to 27e09ac.
- live bridge.py child restarted through existing run.sh wrapper.
- old PID 90075, new PID 94310.
- health endpoint: 6/6 personas active.
- startup log confirmed:
[turn_store] DATABASE_URL unset; durable turn store disabled.
Follow-Up
- Configure real production DATABASE_URL and ensure Postgres is running.
- Restart bridge and confirm startup no longer logs durable store disabled.
- Trigger or simulate a long-running turn, restart bridge, and verify pantheon_turns has running -> replay_queued recovery behavior.
- Consider owner-facing admin/DM alert if durable turn store is disabled in production, not just log warning.
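
The running -> replay_queued check in the list above could be rehearsed offline before touching production. The sketch below is only a rehearsal under stated assumptions: it uses an in-memory sqlite3 database as a stand-in for Postgres, and the column names (turn_id, persona, status), the done status, and the helper functions are hypothetical. Only the table name pantheon_turns and the running and replay_queued states come from this document.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal schema; the real pantheon_turns columns may differ.
    conn.execute(
        "CREATE TABLE pantheon_turns ("
        "turn_id TEXT PRIMARY KEY, persona TEXT, status TEXT)"
    )


def record_turn(conn: sqlite3.Connection, turn_id: str, persona: str,
                status: str = "running") -> None:
    conn.execute(
        "INSERT INTO pantheon_turns (turn_id, persona, status) VALUES (?, ?, ?)",
        (turn_id, persona, status),
    )
    conn.commit()


def claim_for_replay(conn: sqlite3.Connection) -> List[str]:
    """Move every row still marked running to replay_queued; return their ids."""
    rows = conn.execute(
        "SELECT turn_id FROM pantheon_turns "
        "WHERE status = 'running' ORDER BY turn_id"
    ).fetchall()
    turn_ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE pantheon_turns SET status = 'replay_queued' "
        "WHERE turn_id = ? AND status = 'running'",
        [(turn_id,) for turn_id in turn_ids],
    )
    conn.commit()
    return turn_ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    record_turn(conn, "t1", "jini")          # in flight when the bridge dies
    record_turn(conn, "t2", "jini", "done")  # already finished
    print(claim_for_replay(conn))            # simulated post-restart claim
```

A second call to claim_for_replay returns an empty list because no row is left in the running state. The same before/after comparison is what the production verification would look for.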
Notes For Future Agents
When investigating the log line [durable_replay] no running turns to replay, do not assume there were no in-flight turns. First check whether turn_store is actually active:
cd ~/Workspace/pantheon
./venv/bin/python - <<'PY'
import os
from dotenv import load_dotenv
load_dotenv(".env", override=True)
import turn_store
print(bool(os.environ.get("DATABASE_URL")))
print(turn_store.get_turn_store().__class__.__name__)
PY
If this prints False and NoopTurnStore, durable replay is disabled by configuration.
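
The same check could be turned into a startup guard that produces an alert message instead of relying on someone reading the log. This is a sketch only: the function durable_store_warning and the PANTHEON_ENV variable are hypothetical and do not exist in the codebase as far as this note shows, and delivering the message (admin channel or DM) is left out.

```python
from typing import Mapping, Optional


def durable_store_warning(store_class_name: str,
                          env: Mapping[str, str]) -> Optional[str]:
    """Return an alert message if durable replay is off in production, else None."""
    # PANTHEON_ENV is an assumed variable name used only for this sketch.
    in_production = env.get("PANTHEON_ENV", "") == "production"
    if not in_production or store_class_name != "NoopTurnStore":
        return None
    url_state = "set" if env.get("DATABASE_URL") else "unset"
    return (
        "durable turn store disabled in production "
        f"(store={store_class_name}, DATABASE_URL {url_state})"
    )


if __name__ == "__main__":
    # Example: the configuration seen in this incident, expressed as a plain dict.
    print(durable_store_warning("NoopTurnStore", {"PANTHEON_ENV": "production"}))
```

Passing the environment as a mapping keeps the guard testable without mutating os.environ. In the bridge it would presumably be called with os.environ and the class name returned by turn_store.get_turn_store().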
Links
- Linear: HAN-727
- PR: PR #286
- Extends: Pantheon audit vs durable replay
- Extends: ADR-004 blackboard context store
- Related: Pantheon redeploy mechanism