Agent Harness Survey: Learning Resource Index

A collection of primary and secondary sources to read alongside Agent Harness Engineering: A Survey when studying it in depth.

1. Primary Sources

Type	Link	Notes
Project page	https://picrew.github.io/LLM-Harness/	Single-page summary + key diagrams
PDF (project)	https://picrew.github.io/LLM-Harness/main.pdf	Full-text PDF
PDF (OpenReview)	https://openreview.net/pdf?id=eONq7FdiHa	Same PDF, citable
OpenReview forum	https://openreview.net/forum?id=eONq7FdiHa	Discussion and metadata
GitHub (paper repo)	https://github.com/Picrew/LLM-Harness	docs/ + README, MIT/CC BY-SA 4.0
Awesome list	https://github.com/Picrew/awesome-agent-harness	207 entries, 9 categories, 87.9% GitHub-backed (2026-05-25 verified)
HF dataset	https://huggingface.co/datasets/ChenLiu1996/Agent-Harness-Engineering	Coded project dataset
BibTeX	`li2026agentharness`	Cites the OpenReview PDF URL

Re-reading the 207 entries against ETCLOVG:

Awesome category	Entries	ETCLOVG mapping
Harness Architecture & Orchestration	27	L
Context & Working-State Engineering	10	C
Execution Substrates & Sandboxing	23	E
Protocols, Tool Interfaces & Agent Contracts	14	T
Evaluation Harnesses & Benchmarks	24	V
Observability & Reliability Operations	14	O
Guardrails, Security & Governance	16	G
Reference Harness Implementations	50	(mixed)
Essential Readings & Ecosystem Maps	29	meta

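The categories follow the awesome list's classification, whereas the statistics in the survey body are based on “primary layer” counts, so the two definitions differ slightly.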
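
As a quick sanity check on the table above, the per-category counts can be tallied against the stated total of 207. The snippet below is a minimal sketch: the counts and layer letters are copied from the table, while the names `CATEGORIES` and `summarize` are illustrative, and treating the "(mixed)" and "meta" rows as having no single layer is an assumption made only for this tally.

```python
# Entry counts and layer letters copied from the category table above.
# None marks rows without a single ETCLOVG layer ("(mixed)" and "meta");
# handling them this way is an assumption of this sketch.
CATEGORIES = [
    ("Harness Architecture & Orchestration", 27, "L"),
    ("Context & Working-State Engineering", 10, "C"),
    ("Execution Substrates & Sandboxing", 23, "E"),
    ("Protocols, Tool Interfaces & Agent Contracts", 14, "T"),
    ("Evaluation Harnesses & Benchmarks", 24, "V"),
    ("Observability & Reliability Operations", 14, "O"),
    ("Guardrails, Security & Governance", 16, "G"),
    ("Reference Harness Implementations", 50, None),
    ("Essential Readings & Ecosystem Maps", 29, None),
]


def summarize(categories):
    """Return (total entries, entries per layer, entries with a single layer)."""
    total = sum(count for _, count, _ in categories)
    by_layer = {layer: count for _, count, layer in categories if layer is not None}
    layered = sum(by_layer.values())
    return total, by_layer, layered


total, by_layer, layered = summarize(CATEGORIES)
print(f"total entries: {total}")
print(f"entries mapped to a single layer: {layered}")
for layer in "ETCLOVG":
    print(f"{layer}: {by_layer[layer]:>3} ({by_layer[layer] / total:.1%} of all entries)")

# The list reports 87.9% GitHub-backed entries; the implied count is an
# estimate derived here, not a figure stated in the list itself.
github_backed_estimate = round(0.879 * total)
print(f"implied GitHub-backed entries: {github_backed_estimate}")
```

The nine rows sum to the stated 207, and the seven single-layer categories account for 128 of them. These shares describe the awesome list only and should not be read as the survey's own "primary layer" statistics.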

First-party posts from companies running agents in production:

Anthropic — Scaling Managed Agents (“brain ↔ hands decoupling”) — meta-harness architecture for long-horizon agents
Anthropic — Claude Code auto mode — classifier-backed approval delegation
OpenAI — Harness engineering field report — building reliable agent-first software via harness constraints + verification
Anthropic — Building Effective AI Agents — workflows vs autonomous agents
Anthropic — Writing effective tools for AI agents — tool interface design
Anthropic — Effective harnesses for long-running agents — state / resumability / reliability
Anthropic — Harness design for long-running application development — follow-up
LangChain — Improving Deep Agents with harness engineering — evidence that changing only the harness yields +13.7 points
LangChain — Evaluating Deep Agents: Our Learnings — evaluating stateful, long-running agents
Inngest — Your Agent Needs a Harness, Not a Framework — reliability-first 인프라

Era	Systems
2022–2023 (ReAct)	ReAct, AutoGPT, BabyAGI
2023–2024 (tool & multi-agent)	Gorilla, ToolLLM, Toolformer · CAMEL, ChatDev, MetaGPT, Mixture-of-Agents
2023–2024 (benchmarks)	SWE-bench, AgentBench, WebArena, GAIA
2024–2025 (protocols)	MCP, A2A
2025–2026 (harness era)	LangChain Deep Agents, Anthropic Managed Agents, OpenAI harness engineering, Picrew survey

Notebook: Agent Harness Engineering — A Survey (2026)
Notebook ID: 8e64ee73-5358-4ec2-a8d1-3b8d322c5228
PDF copy (vault): assets/papers/2026-li-agent-harness-engineering-survey.pdf (3.3MB)

Sources added (4):

Source	Type	ID
`2026-li-agent-harness-engineering-survey.pdf`	file	`8bae14d0-be73-463c-b5e1-20f61d76bb78`
Project page (picrew.github.io/LLM-Harness)	url	`7f808eff-f067-473d-8213-8760404285c8`
Awesome list (Picrew/awesome-agent-harness)	url	`4b27d8ee-3a8e-426e-8d20-64ac88d16072`
OpenReview forum	url	`0c6e2baa-003e-4759-bde2-f231d65e9af3`

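Good opening questions to ask (translated from the Korean originals):

“Summarize in a table the exact differences between ETCLOVG and the existing 6-component frameworks. Which frameworks are being compared?”
“Pull a concrete case that illustrates the harness coupling problem from the paper, as a direct quotation.”
“Does the paper give real implementation examples of the durable artifacts pattern recommended under Open Problem 2 (reliable state in long-running agents)?”
“Which open-source project does the paper rate as having the best implementation of trace-native failure diagnosis?”
“Which ETCLOVG layers is Anthropic's ‘managed agents’ post most strongly connected to?”
“Quote verbatim the passage where this survey discusses the shift from ‘agent framework → agent platform’.”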

Once automation is available again:

nlm login            # re-authenticate
# Then ask Claude: "Create a notebook in NotebookLM and upload assets/papers/2026-li-...pdf"