secondme strategy hub
harness research dashboard · updated 2026-04-09 · aligned on five core harnesses

five core harnesses. two principle sources. one pressure map.

the old page was too flat. this one matches the current lineup: paperclip, oh-my-codex, hermes-agent, supermemory, and arcgentica as the five core harnesses to study.

core dashboard

the five core harnesses worth tracking now

this is the working shortlist. it mixes control plane, wrapper, continuity, memory, and cognition on purpose because those are the layers our product must eventually fuse.

workflow wrapper
oh-my-codex

shows that a strong engine becomes much better when wrapped in ritual, hooks, roles, teams, and local durable state.

transfer fit high · 19.3k stars · docs-led
continuity runtime
hermes-agent

the closest current runtime to “one agent that keeps living after the session dies.”

continuity high · 39.3k stars · migration pull
memory substrate
supermemory

the memory layer that feels closest to our context-engineering direction: hydration, shared memory, plugins, mcp, and cross-runtime recall.

memory pressure · 21.6k stars · ecosystem
cognitive harness
arcgentica

the strongest current proof trigger for `harness > model`: decomposition, orchestrator-subagent flow, and persistent runtime on hard reasoning tasks.

cognition pressure · 36% arc result · repo signal weak
signals

what each one contributes to the stack

the point is not to choose a winner. the point is to see which missing mechanic each source exposes and which loop it strengthens.

paperclip

make the control plane visible

strongest proof that approvals, wakeups, issue threads, and governance should be product surface, not invisible plumbing.

oh-my-codex

the wrapper can beat the raw engine

best proof that workflow rituals, hooks, and team runtime can compound a strong base agent without replacing it.

hermes-agent

continuity must outlive the laptop

best current pressure source for memory, messaging, scheduling, and long-lived runtime semantics.

supermemory

memory should hydrate, not just store

best current pressure source for context packaging, shared recall across runtimes, and memory as living substrate instead of dead archive.
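to make "hydrate, not just store" concrete, here is a minimal sketch of what a hydration step could look like in code. this is our illustration, not supermemory's actual api; the names `Memory` and `hydrate_context` and the kind labels are hypothetical.

```python
# illustrative sketch only: package recalled memories into a living context
# block (ranked, trimmed, labeled) instead of dumping raw search hits.
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    kind: str        # e.g. "profile", "episodic", "doctrine" (our labels)
    relevance: float # 0..1, from whatever retriever is in use

def hydrate_context(task: str, memories: list[Memory], budget: int = 500) -> str:
    """rank memories by relevance, keep what fits the character budget,
    and label each line by kind so the agent starts warm, not cold."""
    ranked = sorted(memories, key=lambda m: m.relevance, reverse=True)
    lines, used = [f"task: {task}"], 0
    for m in ranked:
        if used + len(m.text) > budget:
            break
        lines.append(f"[{m.kind}] {m.text}")
        used += len(m.text)
    return "\n".join(lines)
```

the design point is the ordering: the most relevant memory survives budget pressure, and the kind label keeps profile facts distinguishable from episodic recall.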

arcgentica

cognition is also a harness problem

best current pressure source for orchestrator-subagent reasoning, decomposition, persistent runtime, and compression over raw context.

research cards

what each live harness says right now

every card uses the same contract: thesis, mechanism, community read, transfer move, and rejection rule.
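the card contract above can be pinned down as a small schema so every card stays comparable. this is our own hypothetical encoding; the field names mirror the card sections, nothing here is a real harness api.

```python
# hypothetical encoding of the research-card contract:
# thesis, mechanism, community read, transfer move (borrow), rejection rule (reject).
from dataclasses import dataclass

@dataclass
class ResearchCard:
    harness: str
    thesis: str
    mechanism: list[str]
    community_read: str
    borrow: list[str]   # the transfer move
    reject: list[str]   # the rejection rule

    def is_complete(self) -> bool:
        # a card is comparable only if every contract section is filled in
        return all([self.harness, self.thesis, self.mechanism,
                    self.community_read, self.borrow, self.reject])
```

a card missing any section fails `is_complete`, which is exactly the check that keeps five cards from drifting into five different essays.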

paperclip

governed orchestration

turns a fleet of agents into a managed operating system instead of a bag of prompts.

created 2026-03-02 · 50.2k stars · typescript
community read: breakout github signal is real. in this repo it is already more than inspiration because we use it for eval and writing loops.
  • mechanism: org charts, budgets, goals, tickets, wakeups, traceable execution.
  • borrow: visible approvals, issue-centric execution, explicit audit.
  • reject: zero-human-company metaphysics and company-sim maximalism.
trust fit high · transfer fit high · maintenance medium
oh-my-codex

workflow-first augmentation

assumes the engine already works, then upgrades how planning, coordination, and memory behave around it.

created 2026-02-02 · 19.3k stars · typescript
community read: the signal is mostly github and docs quality, not broad reddit chatter. that is fine. it still proves the runtime wrapper thesis.
  • mechanism: hooks, roles, skills, team mode, persistent local runtime layer.
  • borrow: codified rituals, agent teams, local state, workflow hardening.
  • reject: coding-task bias and feature-surface inflation.
transfer fit high · trust fit medium · maintenance low
hermes-agent

personal continuity runtime

closest current pressure source for a persistent personal agent that spans memory, messaging, schedules, and subagents.

created 2025-07-22 · 39.3k stars · python
community read: strong repo pull plus visible migration chatter from openclaw users who want lower maintenance or better continuity.
  • mechanism: persistent memory, user model, gateway surfaces, schedules, skills, parallel subagents.
  • borrow: continuity outside sessions, messaging-native runtime, living agent home.
  • reject: self-improvement mystique without inspectable learning.
continuity high · transfer fit medium · trust risk medium
supermemory

memory as hydration layer

the closest current memory stack to our framing: not only recall, but a distributed context layer across tools and agent runtimes.

created 2024-02-27 · 21.6k stars · typescript
community read: good repo traction plus an integration ecosystem around claude, mcp, and openclaw. useful because it is not only a memory api, but a context-distribution posture.
  • mechanism: memory api, developer plugins, claude integration, mcp bridge, shared recall across runtimes.
  • borrow: hydration, memory portability, context packaging, cross-runtime memory layer.
  • reject: the idea that memory substrate alone solves judgment, trust, or doctrine.
memory fit high · trust fit medium · maintenance low
arcgentica

cognition through orchestration

important because it proves the model is not the full unit of intelligence. the harness can massively change reasoning quality on hard tasks.

created 2026-02-12 · 176 stars · python
community read: weak github traction on the repo itself, but strong conceptual weight because of the benchmark result and the architectural lesson attached to it.
  • mechanism: orchestrator-subagent architecture, compressed briefs, persistent runtime objects, code-mode reasoning.
  • borrow: decomposition, adaptive specialist routing, runtime-as-context, problem-solving harness logic.
  • reject: assuming benchmark transfer without local workflow replication.
cognition fit high · loop fit medium · transfer proof low
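the orchestrator-subagent mechanic on this card can be sketched in a few lines. this is our illustration of the pattern, not arcgentica's actual implementation; the routing-by-keyword and the character limit are simplifying assumptions.

```python
# minimal illustration of the orchestrator-subagent pattern: decompose a
# task, compress each brief, route each brief to a matching specialist.

def compress_brief(subtask: str, limit: int = 80) -> str:
    """specialists receive a compressed brief, not the whole transcript."""
    return subtask if len(subtask) <= limit else subtask[:limit - 3] + "..."

def orchestrate(task: str, specialists: dict) -> list[str]:
    """split a ';'-delimited task into subtasks and route each to the first
    specialist whose keyword appears in it; unmatched briefs are flagged."""
    subtasks = [s.strip() for s in task.split(";") if s.strip()]
    results = []
    for sub in subtasks:
        handler = next((fn for key, fn in specialists.items() if key in sub),
                       lambda brief: f"unrouted: {brief}")
        results.append(handler(compress_brief(sub)))
    return results
```

the point the card makes survives even this toy version: reasoning quality is shaped by decomposition and routing, which live in the harness, not the model.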
forgotten but relevant

the sources that should not be mixed into the same bucket

these matter a lot, but they are principle layers or pressure layers, not the same class as the core harnesses.

boundary source

daniel miessler

not a harness. category language, scaffolding-over-model doctrine, personal ai infrastructure framing.

source layer · doctrine high
boundary source

symbolica

keep this as the principle layer behind arcgentica: runtime-as-context, typed tools, trace graphs, persistent objects, benchmark-driven proof.

architecture pressure · runtime state
anti-pattern

openclaw

still huge in community pull, but useful here mainly as maintenance-tax and migration-pressure evidence, not as a clean inspiration source.

maintenance tax · 352.5k stars
architecture contrast

openclaw vs hermes vs secondme

this should become a permanent architecture section, because it explains both the market wedge and the trust difference.

openclaw
  • what it proved: the surface potential of a personal agent that can act across contexts and tools
  • where it breaks: maintenance tax, weak context management, goal forgetting, low trust in long loops
  • what secondme changes: we shift from infra burden to bounded leverage: inspectable state, better context management, explicit approvals, lower operator pain
hermes-agent
  • what it proved: a stronger continuity runtime and a more ambitious self-improvement story
  • where it breaks: self-improvement may still be too fuzzy; trust burden rises if learning is not inspectable enough
  • what secondme changes: we care about inspectable memory governance and explicit learning loops tied to user-reported failures and accepted edits
secondme
  • target state: chief-of-staff harness for context management, bounded action, and compounding principal-specific continuity
  • where it breaks: not yet proven locally across a frozen workflow
  • what it must do: combine control plane, continuity, hydration, cognition, and inspectable self-improvement without inheriting the maintenance tax
dossier plan

how to run the deep research per harness

each dossier should be comparable. otherwise we will get five different essays and no actual transfer logic.

mandatory dossier contract

same shape for every harness

  • 1. what it is and who it is for.
  • 2. why the community cares, with a distinction between repo traction and real user chatter.
  • 3. the architectural primitives: runtime, memory, permissions, traces, orchestration.
  • 4. the self-improvement loop: does it learn, how, and can we inspect it.
  • 5. what secondme should steal, what it should reject, and what local spike tests transfer.
  • 6. what evidence would promote it into doctrine instead of keeping it as pressure source only.
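since the contract is mandatory, it is worth making it checkable. a small sketch, with section keys paraphrased by us from the six points above; any dossier tooling and storage format is an assumption.

```python
# sketch: enforce the six-section dossier contract so every dossier stays
# comparable. section keys are our shorthand for the six mandatory points.

DOSSIER_SECTIONS = [
    "what it is",               # 1. what it is and who it is for
    "community read",           # 2. repo traction vs real user chatter
    "architectural primitives", # 3. runtime, memory, permissions, traces
    "self-improvement loop",    # 4. does it learn, and can we inspect it
    "steal / reject / spike",   # 5. borrow, reject, local transfer test
    "promotion evidence",       # 6. what would promote it into doctrine
]

def missing_sections(dossier: dict) -> list[str]:
    """return the contract sections a dossier draft has not filled in."""
    return [s for s in DOSSIER_SECTIONS if not dossier.get(s, "").strip()]
```

a dossier that returns a non-empty list is an essay, not a dossier, and should not enter the comparison.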
recommended order

run these first

  • now: `paperclip`, `hermes-agent`, `arcgentica`.
  • next: `oh-my-codex`, `supermemory`.
  • watch-only: `openclaw`, `symbolica`, `miessler`.
why this order: first resolve control plane, continuity, and cognition. then refine wrapper ritual and memory hydration.
white paper use

what this research is already saying

the point is not only to compare tools. the point is to extract the architecture claims strong enough to carry into the white paper.

claim 1

the control plane is part of trust

paperclip is the strongest proof that tickets, approvals, wakeups, and traces should be visible product surface, not hidden backend machinery.

claim 2

continuity beats session sharpness

hermes-agent and supermemory both point in the same direction: the user experiences value when the system starts from living context instead of cold prompts.

claim 3

cognition is also a harness problem

arcgentica is the strongest current proof trigger that better decomposition, specialist routing, and runtime state can change reasoning quality materially.

claim 4

self-improvement needs inspection

hermes-agent is useful precisely because it pressures this question: does the system improve only by writing skills, or by safely changing how it works over time?

claim 5

memory is not just retrieval

supermemory is most useful where it distinguishes temporal memory and profile hydration from plain document search or generic rag.

claim 6

maintenance tax kills the dream

openclaw still matters because it shows the gap between perceived potential and lived operator burden. that is a wedge, not a side note.

comparison map

what secondme should actually take

the product should not clone any one source. it should take one hard mechanic from each winning layer.

paperclip · control plane
  • strongest mechanic: ticketed, audited, wakeup-driven execution
  • borrow: visible approvals, traces, issue documents, governance
  • reject: company-sim maximalism and zero-human rhetoric
oh-my-codex · wrapper layer
  • strongest mechanic: workflow ritual around a strong base engine
  • borrow: hooks, roles, skills, plan discipline, team runtime
  • reject: coding-shaped surface area as product identity
hermes-agent · continuity layer
  • strongest mechanic: one runtime that survives across surfaces
  • borrow: persistent state, messaging surfaces, schedules, skills
  • reject: fuzzy self-improvement story without inspectability
supermemory · memory substrate
  • strongest mechanic: portable shared memory and context hydration across runtimes
  • borrow: hydration, portability, context packaging, plugin ecosystem
  • reject: equating memory layer with full judgment layer
arcgentica · cognition layer
  • strongest mechanic: decomposition and persistent-runtime problem solving
  • borrow: compressed specialist briefs, orchestration, runtime-as-context
  • reject: assuming benchmark wins transfer without workflow replication

repo metrics checked on 2026-04-09. community reads are a mix of github traction, official docs/sites, local secondme notes, and sampled public threads around `openclaw -> hermes`, `supermemory`, and `arcgentica` benchmark discourse.