secondme Harness Research Dashboard

core dashboard

the 5 core harnesses worth tracking now

this is the working shortlist. it mixes control plane, wrapper, continuity, memory, and cognition on purpose because those are the layers our product must eventually fuse.

control plane

paperclip

visible governance for agent work: goals, org charts, budgets, wakeups, tickets, audit.

transfer fit: high · 50.2k stars · breakout

workflow wrapper

oh-my-codex

shows that a strong engine becomes much better when wrapped in ritual, hooks, roles, teams, and local durable state.

transfer fit: high · 19.3k stars · docs-led

continuity runtime

hermes-agent

the closest current runtime to “one agent that keeps living after the session dies.”

continuity: high · 39.3k stars · migration pull

memory substrate

supermemory

the memory layer that feels closest to our context-engineering direction: hydration, shared memory, plugins, mcp, and cross-runtime recall.

memory pressure · 21.6k stars · ecosystem

cognitive harness

arcgentica

the strongest current proof trigger for `harness > model`: decomposition, orchestrator-subagent flow, and persistent runtime on hard reasoning tasks.

cognition pressure · 36% arc result · repo signal weak

signals

what each one contributes to the stack

the point is not to choose a winner. the point is to see which missing mechanic each source exposes and which loop it strengthens.

paperclip

make the control plane visible

strongest proof that approvals, wakeups, issue threads, and governance should be product surface, not invisible plumbing.

oh-my-codex

the wrapper can beat the raw engine

best proof that workflow rituals, hooks, and team runtime can compound a strong base agent without replacing it.

hermes-agent

continuity must outlive the laptop

best current pressure source for memory, messaging, scheduling, and long-lived runtime semantics.

supermemory

memory should hydrate, not just store

best current pressure source for context packaging, shared recall across runtimes, and memory as living substrate instead of dead archive.

arcgentica

cognition is also a harness problem

best current pressure source for orchestrator-subagent reasoning, decomposition, persistent runtime, and compression over raw context.

research cards

what each live harness says right now

every card uses the same contract: thesis, mechanism, community read, transfer move, and rejection rule.
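a minimal sketch of that card contract as a data structure, so a card with an empty slot is easy to spot. the class name `ResearchCard`, the field names, and the helper `missing_fields` are hypothetical and chosen for illustration only; `borrow` stands in for the transfer move and `reject` for the rejection rule. the values are copied from the paperclip card.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResearchCard:
    """One research card. Field names are illustrative, not an existing schema."""
    name: str
    thesis: str
    mechanism: list[str]
    community_read: str
    borrow: list[str]   # the "transfer move" slot
    reject: list[str]   # the "rejection rule" slot


def missing_fields(card: ResearchCard) -> list[str]:
    """Return the names of contract fields that are empty on this card."""
    return [f.name for f in fields(card) if not getattr(card, f.name)]


paperclip = ResearchCard(
    name="paperclip",
    thesis="turns a fleet of agents into a managed operating system "
           "instead of a bag of prompts.",
    mechanism=["org charts", "budgets", "goals", "tickets", "wakeups",
               "traceable execution"],
    community_read="breakout github signal is real.",
    borrow=["visible approvals", "issue-centric execution", "explicit audit"],
    reject=["zero-human-company metaphysics", "company-sim maximalism"],
)

if __name__ == "__main__":
    # An empty list means the card satisfies the contract.
    print(missing_fields(paperclip))
```

keeping the contract as one frozen record means every card can be compared field by field, which is the point of using the same shape for all five.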

paperclip

governed orchestration

turns a fleet of agents into a managed operating system instead of a bag of prompts.

created 2026-03-02 · 50.2k stars · typescript

community read: the breakout github signal is real. in this repo it is already more than inspiration because we use it for eval and writing loops.

mechanism: org charts, budgets, goals, tickets, wakeups, traceable execution.
borrow: visible approvals, issue-centric execution, explicit audit.
reject: zero-human-company metaphysics and company-sim maximalism.

trust fit: high · transfer fit: high · maintenance: medium

oh-my-codex

workflow-first augmentation

assumes the engine already works, then upgrades how planning, coordination, and memory behave around it.

created 2026-02-02 · 19.3k stars · typescript

community read: the signal is mostly github and docs quality, not broad reddit chatter. that is fine. it still proves the runtime wrapper thesis.

mechanism: hooks, roles, skills, team mode, persistent local runtime layer.
borrow: codified rituals, agent teams, local state, workflow hardening.
reject: coding-task bias and feature-surface inflation.

transfer fit: high · trust fit: medium · maintenance: low

hermes-agent

personal continuity runtime

closest current pressure source for a persistent personal agent that spans memory, messaging, schedules, and subagents.

created 2025-07-22 · 39.3k stars · python

community read: strong repo pull plus visible migration chatter from openclaw users who want lower maintenance or better continuity.

mechanism: persistent memory, user model, gateway surfaces, schedules, skills, parallel subagents.
borrow: continuity outside sessions, messaging-native runtime, living agent home.
reject: self-improvement mystique without inspectable learning.

continuity: high · transfer fit: medium · trust risk: medium

supermemory

memory as hydration layer

the closest current memory stack to our framing: not only recall, but a distributed context layer across tools and agent runtimes.

created 2024-02-27 · 21.6k stars · typescript

community read: good repo traction plus an integration ecosystem around claude, mcp, and openclaw. useful because it is not only a memory api, but a context-distribution posture.

mechanism: memory api, developer plugins, claude integration, mcp bridge, shared recall across runtimes.
borrow: hydration, memory portability, context packaging, cross-runtime memory layer.
reject: the idea that memory substrate alone solves judgment, trust, or doctrine.

memory fit: high · trust fit: medium · maintenance: low

arcgentica

cognition through orchestration

important because it proves the model is not the full unit of intelligence. the harness can massively change reasoning quality on hard tasks.

created 2026-02-12 · 176 stars · python

community read: weak github traction on the repo itself, but strong conceptual weight because of the benchmark result and the architectural lesson attached to it.

mechanism: orchestrator-subagent architecture, compressed briefs, persistent runtime objects, code-mode reasoning.
borrow: decomposition, adaptive specialist routing, runtime-as-context, problem-solving harness logic.
reject: assuming benchmark transfer without local workflow replication.

cognition fit: high · loop fit: medium · transfer proof: low

forgotten but relevant

the sources that should not be mixed into the same bucket

these matter a lot, but they are principle layers or pressure layers, not the same class as the core harnesses.

boundary source

daniel miessler

not a harness. a source of category language, scaffolding-over-model doctrine, and personal ai infrastructure framing.

source layer · doctrine: high

boundary source

symbolica

keep this as the principle layer behind arcgentica: runtime-as-context, typed tools, trace graphs, persistent objects, benchmark-driven proof.

architecture pressure · runtime state

anti-pattern

openclaw

still huge in community pull, but useful here mainly as maintenance-tax and migration-pressure evidence, not as a clean inspiration source.

maintenance tax · 352.5k stars

architecture contrast

openclaw vs hermes vs secondme

this should become a permanent architecture section, because it explains both the market wedge and the trust difference.

system	what it proved	where it breaks	what secondme changes
openclaw	showed the surface potential of a personal agent that can act across contexts and tools	maintenance tax, weak context management, goal forgetting, low trust in long loops	we shift from infra burden to bounded leverage: inspectable state, better context management, explicit approvals, lower operator pain
hermes-agent	showed a stronger continuity runtime and a more ambitious self-improvement story	self-improvement may still be too fuzzy; trust burden rises if learning is not inspectable enough	we care about inspectable memory governance and explicit learning loops tied to user-reported failures and accepted edits
secondme	target state: chief-of-staff harness for context management, bounded action, and compounding principal-specific continuity	not yet proven locally across a frozen workflow	must combine control plane, continuity, hydration, cognition, and inspectable self-improvement without inheriting the maintenance tax

dossier plan

how to run the deep research per harness

each dossier should be comparable. otherwise we will get five different essays and no actual transfer logic.

mandatory dossier contract

same shape for every harness

1. what it is and who it is for.
2. why the community cares, with a distinction between repo traction and real user chatter.
3. the architectural primitives: runtime, memory, permissions, traces, orchestration.
4. the self-improvement loop: does it learn, how, and can we inspect it.
5. what secondme should steal, what it should reject, and which local spike would test the transfer.
6. what evidence would promote it into doctrine instead of keeping it as pressure source only.
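one way to keep dossiers comparable is to check each draft against the six sections above before it is accepted. the sketch below assumes a dossier is a plain mapping from section key to text; the keys in `DOSSIER_SECTIONS` and the function `check_dossier` are hypothetical names, not an existing secondme format.

```python
# Hypothetical section keys, one per numbered item in the dossier contract.
DOSSIER_SECTIONS = (
    "what_and_for_whom",         # 1. what it is and who it is for
    "community_signal",          # 2. repo traction vs real user chatter
    "architectural_primitives",  # 3. runtime, memory, permissions, traces, orchestration
    "self_improvement_loop",     # 4. does it learn, how, can we inspect it
    "steal_reject_spike",        # 5. steal, reject, local spike
    "promotion_evidence",        # 6. evidence that would promote it into doctrine
)


def check_dossier(dossier: dict[str, str]) -> list[str]:
    """Return a list of contract violations; an empty list means the shape is right."""
    problems = []
    # Every contract section must be present and non-blank.
    for section in DOSSIER_SECTIONS:
        if not dossier.get(section, "").strip():
            problems.append(f"missing: {section}")
    # Anything outside the contract is flagged so dossiers stay the same shape.
    for key in dossier:
        if key not in DOSSIER_SECTIONS:
            problems.append(f"unexpected: {key}")
    return problems


if __name__ == "__main__":
    draft = {"what_and_for_whom": "governed orchestration for agent fleets"}
    for problem in check_dossier(draft):
        print(problem)
```

flagging extra sections as well as missing ones is a deliberate choice here: it is what stops a dossier from drifting into a free-form essay.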

recommended order

run these first

now: `paperclip`, `hermes-agent`, `arcgentica`.
next: `oh-my-codex`, `supermemory`.
watch-only: `openclaw`, `symbolica`, `miessler`.

why this order: first resolve control plane, continuity, and cognition. then refine wrapper ritual and memory hydration.
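the ordering can be kept as a small piece of data so the run order and the layers it resolves stay in sync. this is only a sketch: `RESEARCH_QUEUE`, `LAYER`, `dossier_order`, and `layers_resolved` are hypothetical names, and the layer labels are taken from the core dashboard shortlist.

```python
# Tiers exactly as listed in the recommended order.
RESEARCH_QUEUE = {
    "now": ["paperclip", "hermes-agent", "arcgentica"],
    "next": ["oh-my-codex", "supermemory"],
    "watch-only": ["openclaw", "symbolica", "miessler"],
}

# Layer of each core harness, from the core dashboard shortlist.
LAYER = {
    "paperclip": "control plane",
    "oh-my-codex": "workflow wrapper",
    "hermes-agent": "continuity runtime",
    "supermemory": "memory substrate",
    "arcgentica": "cognitive harness",
}


def dossier_order(queue: dict[str, list[str]]) -> list[str]:
    """Harnesses that get a full dossier, in run order; watch-only is skipped."""
    return [name for tier in ("now", "next") for name in queue[tier]]


def layers_resolved(queue: dict[str, list[str]], tier: str) -> list[str]:
    """Layers that a given tier of dossiers is meant to resolve."""
    return [LAYER[name] for name in queue[tier]]


if __name__ == "__main__":
    print(dossier_order(RESEARCH_QUEUE))
    print(layers_resolved(RESEARCH_QUEUE, "now"))
```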

white paper use

what this research is already saying

the point is not only to compare tools. the point is to extract the architecture claims strong enough to carry into the white paper.

claim 1

the control plane is part of trust

paperclip is the strongest proof that tickets, approvals, wakeups, and traces should be visible product surface, not hidden backend machinery.

claim 2

continuity beats session sharpness

hermes-agent and supermemory both point in the same direction: the user experiences value when the system starts from living context instead of cold prompts.

claim 3

cognition is also a harness problem

arcgentica is the strongest current proof trigger that better decomposition, specialist routing, and runtime state can change reasoning quality materially.

claim 4

self-improvement needs inspection

hermes-agent is useful precisely because it pressures this question: does the system improve only by writing skills, or by safely changing how it works over time?

claim 5

memory is not just retrieval

supermemory is most useful where it distinguishes temporal memory and profile hydration from plain document search or generic rag.

claim 6

maintenance tax kills the dream

openclaw still matters because it shows the gap between perceived potential and lived operator burden. that is a wedge, not a side note.

comparison map

what secondme should actually take

the product should not clone any one source. it should take one hard mechanic from each winning layer.

source	strongest mechanic	what we borrow	what we reject
paperclip (control plane)	ticketed, audited, wakeup-driven execution	visible approvals, traces, issue documents, governance	company-sim maximalism and zero-human rhetoric
oh-my-codex (wrapper layer)	workflow ritual around a strong base engine	hooks, roles, skills, plan discipline, team runtime	coding-shaped surface area as product identity
hermes-agent (continuity layer)	one runtime that survives across surfaces	persistent state, messaging surfaces, schedules, skills	fuzzy self-improvement story without inspectability
supermemory (memory substrate)	portable shared memory and context hydration across runtimes	hydration, portability, context packaging, plugin ecosystem	equating memory layer with full judgment layer
arcgentica (cognition layer)	decomposition and persistent-runtime problem solving	compressed specialist briefs, orchestration, runtime-as-context	assuming benchmark wins transfer without workflow replication

repo metrics checked on 2026-04-09. community reads are a mix of github traction, official docs/sites, local secondme notes, and sampled public threads around `openclaw -> hermes`, `supermemory`, and `arcgentica` benchmark discourse.

five core harnesses. two principle sources. one pressure map.
