
VikingBot: When Agent Memory Starts Changing Strategy

A bilingual rewrite of the VikingBot werewolf experiment, showing how OpenViking turns multi-agent chat history into durable, inspectable, strategy-changing memory.

The werewolf demo is useful because it turns agent memory into something you can watch. Once VikingBot players can carry history across games, they stop acting like isolated chatbots: they remember styles, reuse incidents, hide roles with evidence, and punish old patterns.

The Experiment: Six Agents, Two Memory Conditions

The setup deliberately keeps the game small. Six VikingBot players join one werewolf table. Players 1, 2, and 3 are connected to OpenViking and keep long-term cross-game memory. Players 4, 5, and 6 only see the current game context.

| Group | Players | Memory condition |
| --- | --- | --- |
| Experiment | VikingBot players 1, 2, and 3 | OpenViking memory, available across games |
| Control | VikingBot players 4, 5, and 6 | Only short-term context inside the current game |
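As a minimal sketch, the two memory conditions can be written down as plain configuration. The `PlayerConfig` class and `build_table` helper are hypothetical names for illustration, not part of VikingBot or OpenViking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerConfig:
    player_id: int
    group: str               # "experiment" or "control"
    long_term_memory: bool   # True if the seat keeps cross-game memory


def build_table() -> list[PlayerConfig]:
    """Six seats: players 1-3 keep cross-game memory, players 4-6 do not."""
    return [
        PlayerConfig(pid, "experiment" if pid <= 3 else "control", pid <= 3)
        for pid in range(1, 7)
    ]


table = build_table()
memory_players = [p.player_id for p in table if p.long_term_memory]
print(memory_players)  # [1, 2, 3]
```

Keeping the condition as a single boolean per seat makes it easy to check that the only difference between the groups is access to long-term memory.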
[Figure: Werewolf demo showing six player bots and the god bot control panel]
The god bot initializes the game, writes each player identity into private GAME.md files, and keeps public flow inside the group chat.

This matters because the game has two channels of truth. Public speech happens in the group chat, while hidden role and night-action information lives in each player workspace. That split is close to real agent products: a platform route delivers messages, but the agent still needs private workspace state and durable context.
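The split between public and private state can be sketched with an in-memory chat log and one file per player. This is an illustration under assumptions: the `WerewolfTable` class, the `player_N/` directory names, and the one-line `role:` format inside GAME.md are hypothetical, and the real demo may lay out its workspaces differently.

```python
import tempfile
from pathlib import Path


class WerewolfTable:
    """Hypothetical table with a public chat log and one private file per player."""

    def __init__(self, root: Path, player_ids: list[int]) -> None:
        self.root = root
        self.public_chat: list[str] = []  # every player can read this
        for pid in player_ids:
            (root / f"player_{pid}").mkdir(parents=True, exist_ok=True)

    def game_file(self, pid: int) -> Path:
        return self.root / f"player_{pid}" / "GAME.md"

    def assign_role(self, pid: int, role: str) -> None:
        # Private channel: only this player's workspace receives the role.
        self.game_file(pid).write_text(f"role: {role}\n", encoding="utf-8")

    def announce(self, text: str) -> None:
        # Public channel: the same prompt goes to the whole group chat.
        self.public_chat.append(text)

    def context_for(self, pid: int) -> str:
        # A player sees the public chat plus its own private file, nothing else.
        private = self.game_file(pid).read_text(encoding="utf-8")
        return "\n".join(self.public_chat) + "\n" + private


def demo() -> tuple[str, str]:
    with tempfile.TemporaryDirectory() as tmp:
        table = WerewolfTable(Path(tmp), [1, 2])
        table.assign_role(1, "werewolf")
        table.assign_role(2, "villager")
        table.announce("Night falls. Please update your GAME.md.")
        return table.context_for(1), table.context_for(2)


ctx1, ctx2 = demo()
print("werewolf" in ctx1, "werewolf" in ctx2)  # True False
```

The point of the sketch is that the hidden role never passes through the public channel, so a leak can only come from how a player chooses to speak.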

[Figure: God bot asks player 3 to complete a night action in GAME.md]
The god bot sends uniform public prompts, while sensitive choices such as kill targets stay in the player file.

[Figure: OpenViking memory records game history and player style after a round]
After each game, the detailed conversation, votes, notable moves, and player styles are committed into OpenViking.

What Changed Across Rounds

Round 1: the bots learn the table

The first game is noisy. The god bot walks players through night actions; then the sheriff race turns into a role-claiming contest. Player 1 bluffs as hunter, player 2 hides a real hunter identity behind a prophet claim, and player 3 reveals enough prophet logic to win trust.

[Figure: Night flow where the god bot asks each player to read and update GAME.md]
The first night exercises the delivery and private-state mechanics before any long-term memory can help.

[Figure: Sheriff campaign with several bots role-claiming]
Early speech is tactical but not yet historical: the bots reason from current claims rather than cross-game evidence.

[Figure: End of the first game where wolves win quickly]
The first game ends quickly after key good-side roles are removed, but it gives OpenViking material to capture.

[Figure: Memory profile for player 1 after the first game]
The important write is not the full transcript. OpenViking distills reusable claims such as style, tendencies, incidents, and strategy outcomes.
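To make "distilled claims rather than transcripts" concrete, here is a small sketch of what a stored profile could look like. It is an assumption-heavy illustration: the `PlayerProfile` class and its fields are hypothetical, and a simple win/loss tally stands in for the model-driven distillation that OpenViking performs.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerProfile:
    """Hypothetical distilled profile: short claims, not a raw transcript."""

    player_id: int
    style: list[str] = field(default_factory=list)
    incidents: list[str] = field(default_factory=list)
    # tactic -> [wins, losses]
    outcomes: dict[str, list[int]] = field(default_factory=dict)

    def record_outcome(self, tactic: str, won: bool) -> None:
        tally = self.outcomes.setdefault(tactic, [0, 0])
        tally[0 if won else 1] += 1

    def steering_hints(self) -> list[str]:
        """Turn tallies into short hints that could steer the next game."""
        hints = []
        for tactic, (wins, losses) in sorted(self.outcomes.items()):
            if losses > wins:
                hints.append(f"avoid: {tactic} (won {wins}, lost {losses})")
            elif wins > losses:
                hints.append(f"prefer: {tactic} (won {wins}, lost {losses})")
        return hints


profile = PlayerProfile(player_id=1, style=["aggressive role-claiming"])
profile.record_outcome("early role reveal", won=False)
profile.record_outcome("hide role until challenged", won=True)
hints = profile.steering_hints()
print(hints)
```

A few lines of hints like these cost far fewer tokens than replaying the games they summarize.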

Round 2: memory becomes usable evidence

By the second round, the OpenViking-backed players start acting on previous-game facts. One player hides a true prophet identity because a previous early role reveal got punished. Another recognizes player 3 as the forceful hunter-style speaker from history. A werewolf even uses a previous civilian event to defend a fake prophet claim.

[Figure: Player 1 hides a real prophet identity after learning from a previous game]
The behavior change is concrete: memory changes when to reveal, when to abstain, and how to survive a chaotic sheriff race.

[Figure: Player 2 references player 3 style from memory]
Style memory lets a bot treat a current speech as part of a player pattern, not just as one isolated utterance.

[Figure: Player 1 memory shifts from aggressive role-claiming to cautious hiding]
The profile itself evolves: a failed or successful tactic becomes future steering context.

[Figure: Player 3 uses a previous event as cover while bluffing]
This is the first dangerous part of long-term memory: it can support better reasoning, but it also supports better deception.

[Figure: Another screenshot from the round where historical events support a fake role claim]
The current claim becomes more persuasive because it is attached to a remembered incident.

Round 3: profiles turn into strategy

After multiple games, memory no longer looks like a note-taking feature. It becomes a strategic asset. The bots remember who pushes hard, who bluffs, who tends to trust specific lines of reasoning, and which endgame moves failed before.

[Figure: A wolf almost wins by reusing the hidden-role survival strategy]
A wolf can reuse a previous survival tactic, win trust, and still lose because a final vote hits the hunter trigger.

[Figure: Player 3 keeps reinforcing a forceful hunter persona]
A repeated style becomes an identity signal. Other bots can learn it; the player bot can also lean into it.

[Figure: Historical memory helps question player 4 role claims]
History makes suspicion more targeted: a bot can challenge a role claim because this player has made similar unstable claims before.

[Figure: Win-rate curve during memory initialization]
During the initial memory collection phase, win rates remain close enough that memory has not yet separated the players.

[Figure: Win-rate curve after memory collection]
After memory accumulates, OpenViking-backed bots show a visible win-rate lift in the reference experiment.

The visible behavior is not “the bot remembers the transcript.” It is that remembered incidents start changing risk, trust, role claims, and endgame strategy.
— Why the demo is interesting

How OpenViking Turns Chat Into Agent-Usable Memory

VikingBot works because OpenViking is not a raw transcript bucket. It gives the agent a filesystem-like context surface, a staged retrieval model, and memory types that separate player identity, user preference, incidents, cases, tools, and skills.

| Memory type | Typical path | Meaning |
| --- | --- | --- |
| soul | agent/memories/soul.md | Core truths, boundaries, style, and continuity for the agent. |
| identity | agent/memories/identity.md | Name, persona, role, and stable presentation details. |
| cases | agent/memories/cases/ | Problem-to-solution case memories. This is where repeated fixes and operational lessons become reusable. |
| patterns | agent/memories/patterns/ | Workflow and methodology memories. |
| tools / skills | agent/memories/tools/ | Tool usage, skill execution, success rate, and best-practice hints. |
| profile / preferences | user/memories/profile.md | User profile, preferences, entities, and event history. |

The werewolf demo uses the same separation in a playful setting. A player has GAME.md for private current-round state, SOUL.md for behavioral rules, and OpenViking memories for durable history. In a coding-agent product, the parallel is a CLAUDE.md- or AGENTS.md-style instruction file plus durable memory that survives beyond one repository or one terminal session.

[Figure: OpenViking memory extraction ReAct flow]
OpenViking prefetches existing memory URIs, lets the model decide what to read, and then writes new or updated memory through a patch-like flow.
retrieved-memory.xml

    <memory index="1" type="summary">
      <uri>viking://user/player_4/memories/events/2026/04/13/sheriff-campaign.md</uri>
      <content>Player 4 has challenged sheriff-campaign claims before and tends to mark vague role claims as suspicious.</content>
    </memory>
    <memory index="2" type="entity">
      <uri>viking://user/player_3/memories/entities/game-character.md</uri>
      <content>Player 3 often speaks forcefully, jumps into the sheriff race, and uses hunter identity pressure when the table is chaotic.</content>
    </memory>
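A retrieved-memory fragment like this one is easy to consume with the Python standard library. The sketch below assumes the fragment arrives as a sequence of sibling `<memory>` elements with no enclosing root, so it wraps them in a temporary `<memories>` element before parsing. `parse_memories` is a hypothetical helper name, not an OpenViking API.

```python
import xml.etree.ElementTree as ET

RETRIEVED = """
<memory index="1" type="summary">
  <uri>viking://user/player_4/memories/events/2026/04/13/sheriff-campaign.md</uri>
  <content>Player 4 has challenged sheriff-campaign claims before and tends to mark vague role claims as suspicious.</content>
</memory>
<memory index="2" type="entity">
  <uri>viking://user/player_3/memories/entities/game-character.md</uri>
  <content>Player 3 often speaks forcefully, jumps into the sheriff race, and uses hunter identity pressure when the table is chaotic.</content>
</memory>
"""


def parse_memories(fragment: str) -> list[dict]:
    """Parse sibling <memory> elements into plain dictionaries."""
    # The fragment has no single root element, so add a temporary one.
    root = ET.fromstring(f"<memories>{fragment}</memories>")
    return [
        {
            "index": int(m.get("index", "0")),
            "type": m.get("type", ""),
            "uri": m.findtext("uri", default="").strip(),
            "content": m.findtext("content", default="").strip(),
        }
        for m in root.findall("memory")
    ]


memories = parse_memories(RETRIEVED)
for item in memories:
    print(item["type"], item["uri"])
```

Keeping the URI next to each summary lets the agent decide later whether a given memory is worth reading in full.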

The L0 / L1 / L2 tiers are the token discipline behind this. The agent starts from summaries and URIs, and reads the full L2 content only when it needs proof. That is how the system keeps long-term memory useful without dragging an entire history into every prompt.
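The staged read can be sketched as a budgeted selection over tiers. This is an illustration under stated assumptions: the essay only says that L2 is the full content, so treating L0 as a one-line abstract and L1 as a short summary is a guess, and `TieredMemory`, `rough_tokens`, and `build_context` are hypothetical names. Word counting stands in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class TieredMemory:
    uri: str
    tiers: tuple[str, str, str]  # (L0, L1, L2), shortest to fullest (assumed)


def rough_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def build_context(entries, wanted_level, budget):
    """Pick one tier per memory, falling back to shorter tiers to fit the budget."""
    picked, used = [], 0
    for entry in entries:
        level = wanted_level.get(entry.uri, 0)  # default to the cheapest tier
        while level >= 0:
            cost = rough_tokens(entry.tiers[level])
            if used + cost <= budget:
                picked.append((entry.uri, level))
                used += cost
                break
            level -= 1  # too expensive: try a shorter tier
    return picked, used


style = TieredMemory(
    "viking://user/player_3/memories/entities/game-character.md",
    (
        "forceful hunter-style speaker",
        "Player 3 often speaks forcefully and joins the sheriff race",
        "Player 3 often speaks forcefully, jumps into the sheriff race, "
        "and uses hunter identity pressure when the table is chaotic.",
    ),
)
# Ask for full proof (L2) on the style memory, with a small budget.
print(build_context([style], {style.uri: 2}, budget=12))
```

With a budget of 12 words the request for L2 does not fit, so the sketch falls back to the L1 summary instead of dropping the memory.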

[Figure: Semantic memory filenames inside OpenViking]
Semantic filenames are part of the interface. A file path can tell the model whether it is looking at an event, a player entity, a tool lesson, or a case.
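Because the path carries meaning, even plain string handling can recover what kind of memory a URI points at. The sketch below assumes the kind is the path segment right after `memories/`, which matches the example URIs in this essay but may not hold for every OpenViking path. `describe_uri` is a hypothetical helper.

```python
def describe_uri(uri: str) -> dict:
    """Read scope, memory kind, and file name out of a viking:// URI."""
    prefix = "viking://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a viking:// URI: {uri!r}")
    parts = [p for p in uri[len(prefix):].split("/") if p]
    kind = "unknown"
    if "memories" in parts:
        i = parts.index("memories")
        if i + 1 < len(parts):
            # Assumption: the segment after "memories" names the memory kind.
            kind = parts[i + 1].removesuffix(".md")
    return {"scope": parts[0] if parts else "", "kind": kind, "name": parts[-1] if parts else ""}


print(describe_uri("viking://user/player_4/memories/events/2026/04/13/sheriff-campaign.md"))
print(describe_uri("viking://user/player_3/memories/entities/game-character.md"))
```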

The Same Pattern Shows Up in Claude Code and Case Memories

The Claude Code memory plugin is the non-game version of the same idea. Local files such as CLAUDE.md, AGENTS.md, or MEMORY.md are still useful: they are close to the workspace and easy for a human to edit. OpenViking adds what those files do not handle well: semantic recall across projects, automatic capture after turns, compaction-safe handoff, and on-demand MCP tools for search, read, store, list, grep, and forget. See the Claude Code memory plugin / MCP guide for details.

| Layer | What it should hold | What should not happen |
| --- | --- | --- |
| Local instruction file | Project-specific rules, code style, commands, and team conventions. | Do not turn it into an unbounded diary. |
| OpenViking memory | Distilled facts, preferences, incidents, cases, and reusable patterns. | Do not blindly upload every raw transcript back into recall. |
| MCP tools | Explicit search, read, store, delete, and health operations over viking:// resources. | Do not leak server credentials into browser or public repo surfaces. |

Case memories are especially important. A case is not just “a fact about the user.” It captures a problem, what was tried, what finally worked, and why it worked. In the werewolf demo, a case is a failed or successful play. In software work, a case can be a production incident, a tricky API integration, or a reviewer preference that changes future PRs.
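One way to picture a case is as a small record with those four parts. The sketch below is hypothetical: the `CaseMemory` class, its field names, and the Markdown layout are illustrative choices rather than OpenViking's storage format, and the example case is paraphrased from the round-two behavior described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class CaseMemory:
    """Hypothetical case record: problem, attempts, resolution, and rationale."""

    problem: str
    attempts: list[str]
    resolution: str
    why_it_worked: str
    tags: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = ["## Problem", self.problem, "", "## Tried"]
        lines += [f"- {attempt}" for attempt in self.attempts]
        lines += ["", "## What worked", self.resolution, "", "## Why", self.why_it_worked]
        if self.tags:
            lines += ["", "Tags: " + ", ".join(self.tags)]
        return "\n".join(lines)


case = CaseMemory(
    problem="An early prophet reveal during the sheriff race got punished.",
    attempts=["Claimed prophet in the first speech"],
    resolution="Hide the role until a concrete accusation needs evidence.",
    why_it_worked="Illustrative rationale: an unconfirmed role is harder to target.",
    tags=["werewolf", "role-reveal"],
)
print(case.to_markdown())
```

Storing the rationale alongside the fix is what separates a case from a bare fact: it tells a later reader when the same move is likely to apply.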

Evaluation: Accuracy Rises, Token Cost Falls

The reference article also reports LoCoMo long-context dialogue results. Native OpenClaw reaches roughly 24% accuracy at about 390M tokens. OpenClaw with OpenViking Plugin 2.0 reaches roughly 80% at about 35M tokens. VikingBot stays in the same accuracy band while cutting token cost further, to about 21M.

| System | Accuracy | Token cost |
| --- | --- | --- |
| Native OpenClaw | About 24% (+/- 3%) | About 390M |
| OpenClaw + OpenViking Plugin 2.0 | About 80% (+/- 3%) | About 35M |
| VikingBot | About 80% (+/- 3%) | About 21M |
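The relative savings follow directly from the reported token costs. The short calculation below uses only the approximate figures from the table, so the percentages are approximate too. `reduction_pct` is a helper defined here for the arithmetic.

```python
# Approximate token costs from the table, in millions of tokens.
token_cost_millions = {
    "Native OpenClaw": 390,
    "OpenClaw + OpenViking Plugin 2.0": 35,
    "VikingBot": 21,
}


def reduction_pct(cost: float, baseline: float) -> float:
    """Percentage reduction of cost relative to baseline, to one decimal."""
    return round(100 * (1 - cost / baseline), 1)


native = token_cost_millions["Native OpenClaw"]
plugin = token_cost_millions["OpenClaw + OpenViking Plugin 2.0"]
bot = token_cost_millions["VikingBot"]

print(reduction_pct(plugin, native))  # 91.0 (plugin vs. native)
print(reduction_pct(bot, native))     # 94.6 (VikingBot vs. native)
print(reduction_pct(bot, plugin))     # 40.0 (VikingBot vs. plugin)
```

On these figures, the plugin uses roughly a tenth of the native token cost, and VikingBot trims about another 40% from that.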

The lesson is not “remember everything.” The lesson is that retrieval must be selective, layered, and inspectable. A good memory system should make the next prompt smaller and more grounded, not larger and more mysterious.

Production Use Cases: Tenancy, Channels, and Governance

The demo uses six game players, but the production version of the problem is broader. A single OpenViking-backed server may need to serve HR assistants, legal assistants, code agents, review agents, and personal assistants at the same time. Memory must be reusable inside the right boundary and isolated outside it.

Case 1: one server, multiple business lines

The account boundary separates businesses such as HR and Legal. Resources can be shared inside one business line, while user memories stay separated per user and per agent.

tenant-layout.txt

    hr-platform/
      resources/                 # HR-wide documents and workflows
      agent/
        approve/                 # approval assistant memories and skills
        qa/                      # HR Q&A assistant memories and skills
      user/
        bob/agent/approve/       # Bob memory inside the approval assistant
        rock/agent/qa/           # Rock memory inside the Q&A assistant
    legal-platform/
      resources/
      agent/
        approve/
        qa/
      user/
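The account boundary can be sketched as a path builder plus a prefix check. This is an illustration only: `user_memory_path` and `within_account` are hypothetical helpers that mirror the layout above, and a real deployment would enforce the boundary on the server rather than in client-side path logic.

```python
from pathlib import PurePosixPath


def _segment(name: str) -> str:
    # Reject empty names and anything that could escape its directory.
    if not name or "/" in name or name in {".", ".."}:
        raise ValueError(f"unsafe path segment: {name!r}")
    return name


def user_memory_path(account: str, user: str, agent: str) -> PurePosixPath:
    """Build <account>/user/<user>/agent/<agent>, as in the layout above."""
    return PurePosixPath(_segment(account), "user", _segment(user), "agent", _segment(agent))


def within_account(account: str, path: PurePosixPath) -> bool:
    """True only if the path sits under the given account root."""
    return path.parts[:1] == (account,)


bob = user_memory_path("hr-platform", "bob", "approve")
print(bob)                                    # hr-platform/user/bob/agent/approve
print(within_account("hr-platform", bob))     # True
print(within_account("legal-platform", bob))  # False
```

Validating each segment matters as much as the prefix check: a user name such as `../legal-platform` would otherwise turn a sharing rule into a leak.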

Case 2: one personal assistant platform, many users and agents

A personal-assistant service can let multiple agents reuse a user-level preference memory while still keeping each agent workspace clean. That is the difference between memory sharing and memory leakage.

assistant-layout.txt

    personal-assistant/
      resources/                 # shared documents and workflows
      user/
        bob/memories/            # Bob global personal memory
        rock/memories/
      agent/
        design/user/bob/         # Bob memory for the design assistant
        code/user/bob/
        review/user/bob/
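The difference between sharing and leakage shows up when you list what each agent may read. In this sketch, `readable_scopes` is a hypothetical helper that follows the layout above: every agent sees the user's global memory, plus only its own per-user workspace.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("personal-assistant")


def readable_scopes(user: str, agent: str) -> list[str]:
    """Scopes an agent may read when serving a user, per the layout above."""
    return [
        str(ROOT / "user" / user / "memories"),       # shared across agents
        str(ROOT / "agent" / agent / "user" / user),  # private to this agent
    ]


design = set(readable_scopes("bob", "design"))
code = set(readable_scopes("bob", "code"))

shared = design & code       # what both assistants can reuse
design_only = design - code  # what stays inside the design assistant
print(sorted(shared))
print(sorted(design_only))
```

Only the user-level memory appears in both sets, so the code assistant can reuse Bob's preferences without seeing anything from the design assistant's workspace.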

VikingBot adds the channel side of this: one Bot Server Gateway can receive messages from channels such as Feishu, Slack, Discord, Telegram, email, or OpenAPI, then map them into shared, per-channel, or per-session sandboxes.
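The three sandbox modes can be pictured as three ways of deriving a sandbox key from an incoming message. The sketch is hypothetical throughout: `SandboxMode`, `sandbox_key`, and the key format are illustrative and do not describe the gateway's actual configuration.

```python
from enum import Enum


class SandboxMode(Enum):
    SHARED = "shared"
    PER_CHANNEL = "per-channel"
    PER_SESSION = "per-session"


def sandbox_key(mode: SandboxMode, channel: str, session_id: str) -> str:
    """Map an incoming message to the sandbox that should handle it."""
    if mode is SandboxMode.SHARED:
        return "shared"  # one workspace for every channel and session
    if mode is SandboxMode.PER_CHANNEL:
        return f"channel/{channel}"  # e.g. all Slack traffic together
    return f"channel/{channel}/session/{session_id}"  # fully isolated


for mode in SandboxMode:
    print(mode.value, "->", sandbox_key(mode, "slack", "s-42"))
```

The narrower the key, the less state two conversations can share, which is the same trade-off as the tenancy layouts but applied to delivery channels.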

Try It

The shortest path is to install the Bot extension, start OpenViking with bot support, and enter the chat CLI.

terminal (bash)

    pip install "openviking[bot]"
    openviking-server --with-bot
    ov chat

To run the werewolf demo, use the demo script from the OpenViking repository after preparing your OpenViking config.

terminal (bash)

    python start_werewolf_demo.py --config ~/.openviking/ov.conf

[Figure: Werewolf demo interface after startup]
The demo includes the table view, memory browser, leaderboard, and replay flow.

[Figure: Werewolf demo with a human participant option]
The demo can also keep a human seat, which makes the memory and privacy boundaries easier to test.

The VikingBot README explains the full Bot setup. The werewolf demo README contains the runnable game setup.