the 5 core harnesses worth tracking now
this is the working shortlist. it mixes control plane, wrapper, continuity, memory, and cognition on purpose because those are the layers our product must eventually fuse.
paperclip
visible governance for agent work: goals, org charts, budgets, wakeups, tickets, audit.
oh-my-codex
shows that a strong engine becomes much better when wrapped in ritual, hooks, roles, teams, and local durable state.
hermes-agent
the closest current runtime to “one agent that keeps living after the session dies.”
supermemory
the memory layer that feels closest to our context-engineering direction: hydration, shared memory, plugins, mcp, and cross-runtime recall.
arcgentica
the strongest current proof trigger for `harness > model`: decomposition, orchestrator-subagent flow, and persistent runtime on hard reasoning tasks.
what each one contributes to the stack
the point is not to choose a winner. the point is to see which missing mechanic each source exposes and which loop it strengthens.
make the control plane visible
strongest proof that approvals, wakeups, issue threads, and governance should be product surface, not invisible plumbing.
the wrapper can beat the raw engine
best proof that workflow rituals, hooks, and team runtime can compound a strong base agent without replacing it.
continuity must outlive the laptop
best current pressure source for memory, messaging, scheduling, and long-lived runtime semantics.
memory should hydrate, not just store
best current pressure source for context packaging, shared recall across runtimes, and memory as living substrate instead of dead archive.
cognition is also a harness problem
best current pressure source for orchestrator-subagent reasoning, decomposition, persistent runtime, and compression over raw context.
what each live harness says right now
every card uses the same contract: thesis, mechanism, community read, transfer move (`borrow`), and rejection rule (`reject`).
paperclip: governed orchestration
turns a fleet of agents into a managed operating system instead of a bag of prompts.
- mechanism: org charts, budgets, goals, tickets, wakeups, traceable execution.
- borrow: visible approvals, issue-centric execution, explicit audit.
- reject: zero-human-company metaphysics and company-sim maximalism.
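the mechanism above (budgets, tickets, approvals, audit) can be sketched in a few lines. this is a minimal illustration, not paperclip's actual schema: `Ticket`, `approve`, `spend`, and the audit tuple shapes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """hypothetical unit of governed agent work (illustrative, not a real schema)."""
    title: str
    budget_usd: float
    approved: bool = False
    spent_usd: float = 0.0
    audit: list = field(default_factory=list)  # append-only trail of events

    def approve(self, approver: str) -> None:
        self.approved = True
        self.audit.append(("approved", approver))

    def spend(self, agent: str, amount_usd: float) -> bool:
        """record a spend attempt; refuse it if unapproved or over budget."""
        if not self.approved:
            self.audit.append(("blocked_unapproved", agent))
            return False
        if self.spent_usd + amount_usd > self.budget_usd:
            self.audit.append(("blocked_over_budget", agent))
            return False
        self.spent_usd += amount_usd
        self.audit.append(("spent", agent, amount_usd))
        return True


ticket = Ticket(title="draft weekly brief", budget_usd=2.0)
first = ticket.spend("writer-agent", 0.5)   # refused: no approval yet
ticket.approve("principal")
second = ticket.spend("writer-agent", 0.5)  # allowed
third = ticket.spend("writer-agent", 5.0)   # refused: would exceed the budget
```

the design choice worth noticing is that refusals are written to the same trail as successes, so the control plane stays visible even when nothing happens.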
oh-my-codex: workflow-first augmentation
assumes the engine already works, then upgrades how planning, coordination, and memory behave around it.
- mechanism: hooks, roles, skills, team mode, persistent local runtime layer.
- borrow: codified rituals, agent teams, local state, workflow hardening.
- reject: coding-task bias and feature-surface inflation.
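a rough sketch of the wrapper idea: the base engine is left untouched and rituals are attached as hooks before and after it. `wrap_engine`, `require_plan`, and `add_review_note` are hypothetical names for this example and do not describe oh-my-codex's real hook api.

```python
from typing import Callable, List

Engine = Callable[[str], str]
Hook = Callable[[str], str]


def wrap_engine(engine: Engine, pre_hooks: List[Hook], post_hooks: List[Hook]) -> Engine:
    """return a new engine that runs pre-hooks, the base engine, then post-hooks."""
    def wrapped(task: str) -> str:
        for hook in pre_hooks:
            task = hook(task)
        result = engine(task)
        for hook in post_hooks:
            result = hook(result)
        return result
    return wrapped


def base_engine(task: str) -> str:
    # stand-in for a real base agent; assumed to work already
    return f"done: {task}"


def require_plan(task: str) -> str:
    # pre-hook: a codified ritual that forces planning before execution
    return f"plan first, then {task}"


def add_review_note(result: str) -> str:
    # post-hook: marks the result as having passed a review step
    return result + " [reviewed]"


agent = wrap_engine(base_engine, [require_plan], [add_review_note])
output = agent("fix the login bug")
```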
hermes-agent: personal continuity runtime
closest current pressure source for a persistent personal agent that spans memory, messaging, schedules, and subagents.
- community read: openclaw users who want lower maintenance or better continuity.
- mechanism: persistent memory, user model, gateway surfaces, schedules, skills, parallel subagents.
- borrow: continuity outside sessions, messaging-native runtime, living agent home.
- reject: self-improvement mystique without inspectable learning.
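"continuity outside sessions" reduces, at minimum, to state that is written somewhere durable and reloaded by the next session. the sketch below assumes a single json file as the "agent home"; `AgentHome` and its fields are hypothetical and say nothing about how hermes-agent stores state.

```python
import json
import tempfile
from pathlib import Path


class AgentHome:
    """hypothetical durable home for one agent: memory notes plus scheduled tasks."""

    def __init__(self, path: Path):
        self.path = path
        if path.exists():
            # a later session starts from what earlier sessions left behind
            self.state = json.loads(path.read_text())
        else:
            self.state = {"memory": [], "schedule": []}

    def remember(self, note: str) -> None:
        self.state["memory"].append(note)
        self._save()

    def schedule(self, when: str, task: str) -> None:
        self.state["schedule"].append({"when": when, "task": task})
        self._save()

    def _save(self) -> None:
        # plain json keeps the state inspectable by a human
        self.path.write_text(json.dumps(self.state, indent=2))


home_file = Path(tempfile.mkdtemp()) / "agent_home.json"

session_one = AgentHome(home_file)
session_one.remember("principal prefers short briefs")
session_one.schedule("monday 09:00", "send weekly summary")
del session_one  # the session dies

session_two = AgentHome(home_file)  # a new session picks up the same home
```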
supermemory: memory as hydration layer
the closest current memory stack to our framing: not only recall, but a distributed context layer across tools and agent runtimes.
- mechanism: memory api, developer plugins, claude integration, mcp bridge, shared recall across runtimes.
- borrow: hydration, memory portability, context packaging, cross-runtime memory layer.
- reject: the idea that memory substrate alone solves judgment, trust, or doctrine.
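one way to read "hydrate, not just store": instead of returning everything that matches, the memory layer packages a bounded context packet for the task at hand. the sketch below assumes tag-based relevance, recency ordering, and a character budget; `MemoryRecord` and `hydrate` are hypothetical and unrelated to supermemory's actual api.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    tags: frozenset
    updated_at: int  # larger means more recent


def hydrate(records, task_tags, max_chars):
    """pick relevant records, newest first, skipping any that would overflow the budget."""
    relevant = [r for r in records if r.tags & task_tags]
    relevant.sort(key=lambda r: r.updated_at, reverse=True)
    packet, used = [], 0
    for record in relevant:
        if used + len(record.text) > max_chars:
            continue  # does not fit; try the next, possibly shorter, record
        packet.append(record.text)
        used += len(record.text)
    return packet


records = [
    MemoryRecord("prefers short briefs", frozenset({"writing"}), 3),
    MemoryRecord("uses metric units", frozenset({"travel"}), 2),
    MemoryRecord("old style guide: long-form reports with appendices", frozenset({"writing"}), 1),
]
packet = hydrate(records, {"writing"}, max_chars=40)
```

here the stale, oversized style guide is left out of the packet even though it matches the tag, which is the difference between hydration and plain retrieval in this toy setup.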
arcgentica: cognition through orchestration
important because it proves the model is not the full unit of intelligence. the harness can massively change reasoning quality on hard tasks.
- mechanism: orchestrator-subagent architecture, compressed briefs, persistent runtime objects, code-mode reasoning.
- borrow: decomposition, adaptive specialist routing, runtime-as-context, problem-solving harness logic.
- reject: assuming benchmark transfer without local workflow replication.
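a toy version of the orchestrator-subagent flow: the orchestrator hands each subtask to a specialist together with a compressed brief rather than the full context, and collects results in a runtime object that persists across steps. everything here is an assumption for illustration: the decomposition is given up front, `compress` is crude word truncation standing in for real summarization, and none of the names come from arcgentica.

```python
from typing import Callable, Dict, List, Tuple

Specialist = Callable[[str, str], str]


def compress(context: str, max_words: int) -> str:
    # crude stand-in for summarization: keep only the first max_words words
    return " ".join(context.split()[:max_words])


def orchestrate(
    subtasks: List[Tuple[str, str]],
    specialists: Dict[str, Specialist],
    context: str,
    brief_words: int = 8,
) -> Dict[str, str]:
    runtime: Dict[str, str] = {}  # persistent object shared across all steps
    brief = compress(context, brief_words)
    for kind, subtask in subtasks:
        specialist = specialists[kind]  # route by subtask kind
        runtime[subtask] = specialist(subtask, brief)
    return runtime


def researcher(subtask: str, brief: str) -> str:
    return f"researched '{subtask}' given: {brief}"


def writer(subtask: str, brief: str) -> str:
    return f"wrote '{subtask}' given: {brief}"


results = orchestrate(
    subtasks=[("research", "find prior art"), ("write", "draft summary")],
    specialists={"research": researcher, "write": writer},
    context="the principal wants a one page comparison of five agent harnesses by friday",
    brief_words=6,
)
```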
the sources that should not be mixed into the same bucket
these matter a lot, but they are principle layers or pressure layers, not the same class as the core harnesses.
daniel miessler
not a harness. category language, scaffolding-over-model doctrine, personal ai infrastructure framing.
symbolica
keep this as the principle layer behind arcgentica: runtime-as-context, typed tools, trace graphs, persistent objects, benchmark-driven proof.
openclaw
still huge in community pull, but useful here mainly as maintenance-tax and migration-pressure evidence, not as a clean inspiration source.
openclaw vs hermes vs secondme
this should become a permanent architecture section, because it explains both the market wedge and the trust difference.
| system | what it proved | where it breaks | what secondme changes |
|---|---|---|---|
| openclaw | showed the surface potential of a personal agent that can act across contexts and tools | maintenance tax, weak context management, goal forgetting, low trust in long loops | we shift from infra burden to bounded leverage: inspectable state, better context management, explicit approvals, lower operator pain |
| hermes-agent | showed a stronger continuity runtime and a more ambitious self-improvement story | self-improvement may still be too fuzzy; trust burden rises if learning is not inspectable enough | we care about inspectable memory governance and explicit learning loops tied to user-reported failures and accepted edits |
| secondme | target state: chief-of-staff harness for context management, bounded action, and compounding principal-specific continuity | not yet proven locally across a frozen workflow | must combine control plane, continuity, hydration, cognition, and inspectable self-improvement without inheriting the maintenance tax |
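the "explicit learning loops tied to user-reported failures and accepted edits" in the table can be sketched as a log in which every proposed behavior change must cite its evidence and only takes effect once accepted. `LearningProposal` and `LearningLog` are hypothetical names; this is one possible shape under those assumptions, not a description of any existing system.

```python
from dataclasses import dataclass, field


@dataclass
class LearningProposal:
    rule: str
    evidence: str           # the user-reported failure or accepted edit behind the rule
    accepted: bool = False  # nothing changes behavior until a human accepts it


@dataclass
class LearningLog:
    proposals: list = field(default_factory=list)

    def propose(self, rule: str, evidence: str) -> LearningProposal:
        if not evidence.strip():
            raise ValueError("a proposal must cite a failure report or accepted edit")
        proposal = LearningProposal(rule, evidence)
        self.proposals.append(proposal)
        return proposal

    def active_rules(self) -> list:
        # only accepted proposals are allowed to shape future behavior
        return [p.rule for p in self.proposals if p.accepted]


log = LearningLog()
p1 = log.propose("keep briefs under 200 words", "user report: brief was too long")
p2 = log.propose("cc the assistant on invites", "accepted edit to calendar draft")
p1.accepted = True
active = log.active_rules()
```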
how to run the deep research per harness
each dossier should be comparable. otherwise we will get five different essays and no actual transfer logic.
same shape for every harness
1. what it is and who it is for.
2. why the community cares, with a distinction between repo traction and real user chatter.
3. the architectural primitives: runtime, memory, permissions, traces, orchestration.
4. the self-improvement loop: does it learn, how, and can we inspect it.
5. what secondme should steal, what it should reject, and what local spike tests transfer.
6. what evidence would promote it into doctrine instead of keeping it as pressure source only.
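to keep the dossiers comparable, the six sections above could be pinned down as one record type with a completeness check. the field names and the sample values below are placeholders invented for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass
class Dossier:
    harness: str
    what_and_who: str = ""            # section 1
    community_read: str = ""          # section 2
    primitives: str = ""              # section 3
    self_improvement_loop: str = ""   # section 4
    steal_reject_spike: str = ""      # section 5
    promotion_evidence: str = ""      # section 6


def missing_sections(dossier: Dossier) -> list:
    """list the sections that are still empty, in the agreed order."""
    return [f.name for f in fields(dossier) if not getattr(dossier, f.name).strip()]


draft = Dossier(
    harness="paperclip",
    what_and_who="placeholder: control plane for agent fleets",
    primitives="placeholder: tickets, budgets, wakeups",
)
gaps = missing_sections(draft)
```

a dossier would count as finished only when `missing_sections` returns an empty list, which makes "five different essays" structurally harder to produce.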
run these first
- now: `paperclip`, `hermes-agent`, `arcgentica`.
- next: `oh-my-codex`, `supermemory`.
- watch-only: `openclaw`, `symbolica`, `miessler`.
what this research is already saying
the point is not only to compare tools. the point is to extract the architecture claims strong enough to carry into the white paper.
the control plane is part of trust
paperclip is the strongest proof that tickets, approvals, wakeups, and traces should be visible product surface, not hidden backend machinery.
continuity beats session sharpness
hermes-agent and supermemory both point in the same direction: the user experiences value when the system starts from living context instead of cold prompts.
cognition is also a harness problem
arcgentica is the strongest current proof trigger that better decomposition, specialist routing, and runtime state can change reasoning quality materially.
self-improvement needs inspection
hermes-agent is useful precisely because it pressures this question: does the system improve only by writing skills, or by safely changing how it works over time?
memory is not just retrieval
supermemory is most useful where it distinguishes temporal memory and profile hydration from plain document search or generic rag.
maintenance tax kills the dream
openclaw still matters because it shows the gap between perceived potential and lived operator burden. that is a wedge, not a side note.
what secondme should actually take
the product should not clone any one source. it should take one hard mechanic from each winning layer.
| source | strongest mechanic | what we borrow | what we reject |
|---|---|---|---|
| paperclip control plane | ticketed, audited, wakeup-driven execution | visible approvals, traces, issue documents, governance | company-sim maximalism and zero-human rhetoric |
| oh-my-codex wrapper layer | workflow ritual around a strong base engine | hooks, roles, skills, plan discipline, team runtime | coding-shaped surface area as product identity |
| hermes-agent continuity layer | one runtime that survives across surfaces | persistent state, messaging surfaces, schedules, skills | fuzzy self-improvement story without inspectability |
| supermemory memory substrate | portable shared memory and context hydration across runtimes | hydration, portability, context packaging, plugin ecosystem | equating memory layer with full judgment layer |
| arcgentica cognition layer | decomposition and persistent-runtime problem solving | compressed specialist briefs, orchestration, runtime-as-context | assuming benchmark wins transfer without workflow replication |
repo metrics checked on 2026-04-09. community reads are a mix of github traction, official docs/sites, local secondme notes, and sampled public threads around `openclaw -> hermes`, `supermemory`, and `arcgentica` benchmark discourse.