VikingBot: When Agent Memory Starts Changing Strategy

The werewolf demo is useful because it turns agent memory into something you can watch. Once VikingBot players can carry history across games, they stop acting like isolated chatbots: they remember each other's styles, reuse past incidents, cite remembered evidence to disguise their roles, and punish patterns they have seen before.

The Experiment: Six Agents, Two Memory Conditions

The setup deliberately keeps the game small. Six VikingBot players join one werewolf table. Players 1, 2, and 3 are connected to OpenViking and keep long-term cross-game memory. Players 4, 5, and 6 only see the current game context.

Group	Players	Memory condition
Experiment	VikingBot players 1, 2, and 3	OpenViking memory, available across games
Control	VikingBot players 4, 5, and 6	Only short-term context inside the current game

Werewolf demo showing six player bots and the god bot control panel — The god bot initializes the game, writes each player identity into private GAME.md files, and keeps public flow inside the group chat.

This matters because the game has two channels of truth. Public speech happens in the group chat, while hidden role and night-action information lives in each player workspace. That split is close to real agent products: a platform route delivers messages, but the agent still needs private workspace state and durable context.

God bot asks player 3 to complete a night action in GAME.md — The god bot sends uniform public prompts, while sensitive choices such as kill targets stay in the player file.
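
The split between the public chat and the private GAME.md files can be sketched in a few lines. This is only an illustration under stated assumptions: the role assignment, the GAME.md field names, and the `init_game` helper are hypothetical and are not taken from the VikingBot code.

```python
import tempfile
from pathlib import Path

# Hypothetical role assignment; the article does not list the actual roles.
ROLES = {1: "werewolf", 2: "hunter", 3: "prophet",
         4: "villager", 5: "werewolf", 6: "villager"}


def init_game(root: Path, roles: dict[int, str]) -> list[str]:
    """Write each role to a private GAME.md and return the public chat log."""
    public_chat = []
    for player, role in sorted(roles.items()):
        workspace = root / f"player_{player}"
        workspace.mkdir(parents=True, exist_ok=True)
        # Private channel: only this player's workspace holds the role.
        (workspace / "GAME.md").write_text(
            f"# Private state\nrole: {role}\nnight_action: pending\n",
            encoding="utf-8",
        )
        # Public channel: the same prompt for everyone, with no role information.
        public_chat.append(
            f"@player_{player} please read your GAME.md and complete your night action."
        )
    return public_chat


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    chat = init_game(root, ROLES)
    private_text = (root / "player_3" / "GAME.md").read_text(encoding="utf-8")
    # No role name should ever appear in the public prompts.
    leaked = any(role in line for line in chat for role in set(ROLES.values()))
```

The point of the design is that the public prompt is identical for every seat, so the group chat reveals nothing about who holds which role.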

OpenViking memory records game history and player style after a round — After each game, the detailed conversation, votes, notable moves, and player styles are committed into OpenViking.

What Changed Across Rounds

Round 1: the bots learn the table

The first game is noisy. The god bot walks players through night actions; then the sheriff race turns into a role-claiming contest. Player 1 bluffs as hunter, player 2 hides a real hunter identity behind a prophet claim, and player 3 reveals enough prophet logic to win trust.

Night flow where the god bot asks each player to read and update GAME.md — The first night exercises the delivery and private-state mechanics before any long-term memory can help.

Sheriff campaign with several bots role-claiming — Early speech is tactical but not yet historical: the bots reason from current claims rather than cross-game evidence.

End of the first game where wolves win quickly — The first game ends quickly after key good-side roles are removed, but it gives OpenViking material to capture.

The important write is not the full transcript. OpenViking distills reusable claims such as style, tendencies, incidents, and strategy outcomes.

Round 2: memory becomes usable evidence

By the second round, the OpenViking-backed players start acting on previous-game facts. One player hides a true prophet identity because a previous early role reveal got punished. Another recognizes player 3 as the forceful hunter-style speaker from history. A werewolf even uses a previous civilian event to defend a fake prophet claim.

Player 1 hides a real prophet identity after learning from a previous game — The behavior change is concrete: memory changes when to reveal, when to abstain, and how to survive a chaotic sheriff race.

Player 2 references player 3 style from memory — Style memory lets a bot treat a current speech as part of a player pattern, not just as one isolated utterance.

Player 1 memory shifts from aggressive role-claiming to cautious hiding — The profile itself evolves: a failed or successful tactic becomes future steering context.

Player 3 uses a previous event as cover while bluffing — This is the first dangerous part of long-term memory: it can support better reasoning, but it also supports better deception.

Another screenshot from the round where historical events support a fake role claim — The current claim becomes more persuasive because it is attached to a remembered incident.

Round 3: profiles turn into strategy

After multiple games, memory no longer looks like a note-taking feature. It becomes a strategic asset. The bots remember who pushes hard, who bluffs, who tends to trust specific lines of reasoning, and which endgame moves failed before.

A wolf almost wins by reusing the hidden-role survival strategy — A wolf can reuse a previous survival tactic, win trust, and still lose because a final vote hits the hunter trigger.

Player 3 keeps reinforcing a forceful hunter persona — A repeated style becomes an identity signal. Other bots can learn it; the player bot can also lean into it.

Historical memory helps question player 4 role claims — History makes suspicion more targeted: a bot can challenge a role claim because this player has made similar unstable claims before.

Win-rate curve during memory initialization — During the initial memory collection phase, win rates remain close enough that memory has not yet separated the players.

Win-rate curve after memory collection — After memory accumulates, OpenViking-backed bots show a visible win-rate lift in the reference experiment.

The visible behavior is not “the bot remembers the transcript.” It is that remembered incidents start changing risk, trust, role claims, and endgame strategy.
— Why the demo is interesting

How OpenViking Turns Chat Into Agent-Usable Memory

VikingBot works because OpenViking is not a raw transcript bucket. It gives the agent a filesystem-like context surface, a staged retrieval model, and memory types that separate player identity, user preference, incidents, cases, tools, and skills.

Memory type	Typical path	Meaning
soul	agent/memories/soul.md	Core truths, boundaries, style, and continuity for the agent.
identity	agent/memories/identity.md	Name, persona, role, and stable presentation details.
cases	agent/memories/cases/	Problem-to-solution case memories. This is where repeated fixes and operational lessons become reusable.
patterns	agent/memories/patterns/	Workflow and methodology memories.
tools / skills	agent/memories/tools/	Tool usage, skill execution, success rate, and best-practice hints.
profile / preferences	user/memories/profile.md	User profile, preferences, entities, and event history.

The werewolf demo uses the same separation in a playful setting. A player has GAME.md for private current-round state, SOUL.md for behavioral rules, and OpenViking memories for durable history. In a coding-agent product, the parallel is a CLAUDE.md or AGENTS.md style instruction file plus durable memory that can survive beyond one repository or one terminal session.

OpenViking memory extraction ReAct flow — OpenViking prefetches existing memory URIs, lets the model decide what to read, and then writes new or updated memory through a patch-like flow.

retrieved-memory.xml

<memory index="1" type="summary">
  <uri>viking://user/player_4/memories/events/2026/04/13/sheriff-campaign.md</uri>
  <content>Player 4 has challenged sheriff-campaign claims before and tends to mark vague role claims as suspicious.</content>
</memory>

<memory index="2" type="entity">
  <uri>viking://user/player_3/memories/entities/game-character.md</uri>
  <content>Player 3 often speaks forcefully, jumps into the sheriff race, and uses hunter identity pressure when the table is chaotic.</content>
</memory>
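
A consumer of this fragment might parse it as follows. This is a minimal sketch, assuming the retrieved memories arrive as a sequence of top-level `<memory>` elements exactly as shown; the `parse_memories` helper and the synthetic `<memories>` wrapper are hypothetical, and the content strings are shortened.

```python
import xml.etree.ElementTree as ET

RETRIEVED = """
<memory index="1" type="summary">
  <uri>viking://user/player_4/memories/events/2026/04/13/sheriff-campaign.md</uri>
  <content>Player 4 has challenged sheriff-campaign claims before.</content>
</memory>
<memory index="2" type="entity">
  <uri>viking://user/player_3/memories/entities/game-character.md</uri>
  <content>Player 3 often speaks forcefully.</content>
</memory>
"""


def parse_memories(fragment: str) -> list[dict[str, str]]:
    """Turn a fragment of <memory> elements into plain dictionaries."""
    # The fragment has several top-level elements, so wrap it in a synthetic root.
    root = ET.fromstring(f"<memories>{fragment}</memories>")
    return [
        {
            "index": node.get("index", ""),
            "type": node.get("type", ""),
            "uri": node.findtext("uri", default="").strip(),
            "content": node.findtext("content", default="").strip(),
        }
        for node in root.findall("memory")
    ]


memories = parse_memories(RETRIEVED)
```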

The L0 / L1 / L2 layering is the token discipline behind this retrieval format. The agent starts from summaries and URIs; only when it needs proof does it read the full L2 content. That is why the system can keep long-term memory useful without dragging an entire history into every prompt.
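
The staged read can be sketched as a small budgeted context builder. This is not OpenViking's implementation: the `MemoryEntry` fields, the whitespace token estimate, and the `build_context` helper are assumptions made for illustration, and the sketch collapses the cheaper levels into a single summary layer.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    uri: str
    summary: str    # cheap layer shown to the model first
    full_text: str  # stands in for the full L2 content


def rough_tokens(text: str) -> int:
    # Crude whitespace estimate; a real system would use a tokenizer.
    return len(text.split())


def build_context(entries, needs_proof, budget):
    """Always include summaries; expand to full text only where proof is needed."""
    parts, used = [], 0
    for entry in entries:
        line = f"{entry.uri}: {entry.summary}"
        parts.append(line)
        used += rough_tokens(line)
    for entry in entries:
        if entry.uri in needs_proof:
            cost = rough_tokens(entry.full_text)
            # Skip the expensive read when it would exceed the budget.
            if used + cost <= budget:
                parts.append(entry.full_text)
                used += cost
    return parts, used


entries = [
    MemoryEntry(
        "viking://user/player_3/memories/entities/game-character.md",
        "Player 3 speaks forcefully.",
        "Round 1: player 3 joined the sheriff race and pressed a hunter claim. " * 5,
    ),
    MemoryEntry(
        "viking://user/player_4/memories/events/2026/04/13/sheriff-campaign.md",
        "Player 4 challenges vague claims.",
        "Round 1: player 4 questioned two role claims. " * 5,
    ),
]

summary_only, summary_used = build_context(entries, set(), budget=100)
expanded, expanded_used = build_context(entries, {entries[0].uri}, budget=100)
tight, tight_used = build_context(entries, {entries[0].uri}, budget=20)
```

With a generous budget the flagged entry is expanded; with a tight one the agent falls back to summaries and URIs only.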

Semantic memory filenames inside OpenViking — Semantic filenames are part of the interface. A file path can tell the model whether it is looking at an event, a player entity, a tool lesson, or a case.

The Same Pattern Shows Up in Claude Code and Case Memories

The Claude Code memory plugin is the non-game version of the same idea. Local files such as CLAUDE.md, AGENTS.md, or MEMORY.md are still useful: they are close to the workspace and easy for a human to edit. OpenViking adds the layer those files do not handle well: semantic recall across projects, automatic capture after turns, compaction-safe handoff, and on-demand MCP tools for search, read, store, list, grep, and forget. See the Claude Code memory plugin / MCP guide for details.

Layer	What it should hold	What should not happen
Local instruction file	Project-specific rules, code style, commands, and team conventions.	Do not turn it into an unbounded diary.
OpenViking memory	Distilled facts, preferences, incidents, cases, and reusable patterns.	Do not blindly upload every raw transcript back into recall.
MCP tools	Explicit search, read, store, delete, and health operations over viking:// resources.	Do not leak server credentials into browser or public repo surfaces.

Case memories are especially important. A case is not just “a fact about the user.” It captures a problem, what was tried, what finally worked, and why it worked. In the werewolf demo, a case is a failed or successful play. In software work, a case can be a production incident, a tricky API integration, or a reviewer preference that changes future PRs.
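
A case with those four parts could be represented as follows. This is a sketch only: the `CaseMemory` class, its field names, and the markdown layout are hypothetical and do not describe OpenViking's stored format. The example values paraphrase the round-two behavior described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class CaseMemory:
    problem: str
    attempts: list[str] = field(default_factory=list)  # what was tried
    resolution: str = ""                               # what finally worked
    rationale: str = ""                                # why it worked

    def to_markdown(self) -> str:
        """Render the case as a small markdown note."""
        lines = [f"# Case: {self.problem}", "", "## Tried"]
        lines += [f"- {attempt}" for attempt in self.attempts]
        lines += ["", "## What worked", self.resolution, "", "## Why", self.rationale]
        return "\n".join(lines)


case = CaseMemory(
    problem="Early role reveal got punished",
    attempts=["Claimed the role in the sheriff race", "Defended the claim under pressure"],
    resolution="Hide the real identity until later rounds",
    rationale="A hidden role is harder for wolves to target",
)
rendered = case.to_markdown()
```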

Evaluation: Accuracy Rises, Token Cost Falls

The reference article also reports LoCoMo long-context dialogue results. Native OpenClaw reaches roughly 24% accuracy at a token cost of about 390M. OpenClaw with OpenViking Plugin 2.0 reaches roughly 80% at about 35M. VikingBot stays in the same accuracy band while cutting token cost further, to about 21M.

System	Accuracy	Token cost
Native OpenClaw	About 24% (+/- 3%)	About 390M
OpenClaw + OpenViking Plugin 2.0	About 80% (+/- 3%)	About 35M
VikingBot	About 80% (+/- 3%)	About 21M
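
The relative savings follow directly from the table. The short calculation below reads the token cost column as token counts (an assumption) and uses the approximate figures as given, so the results are approximate too: roughly 91% and 95% fewer tokens than native OpenClaw, with VikingBot using about 40% fewer tokens than the plugin configuration.

```python
# Approximate figures from the table above: (accuracy, tokens).
RESULTS = {
    "Native OpenClaw": (0.24, 390e6),
    "OpenClaw + OpenViking Plugin 2.0": (0.80, 35e6),
    "VikingBot": (0.80, 21e6),
}

baseline = RESULTS["Native OpenClaw"][1]

# Fraction of tokens saved relative to the native baseline.
reductions = {name: 1 - tokens / baseline for name, (_, tokens) in RESULTS.items()}

for name, (accuracy, tokens) in RESULTS.items():
    print(
        f"{name}: {accuracy:.0%} accuracy, {tokens / 1e6:.0f}M tokens, "
        f"{reductions[name]:.1%} below baseline"
    )

# How much VikingBot saves on top of the plugin configuration.
further_cut = 1 - RESULTS["VikingBot"][1] / RESULTS["OpenClaw + OpenViking Plugin 2.0"][1]
```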

The lesson is not “remember everything.” The lesson is that retrieval must be selective, layered, and inspectable. A good memory system should make the next prompt smaller and more grounded, not larger and more mysterious.

Production Use Cases: Tenancy, Channels, and Governance

The demo uses six game players, but the production version of the problem is broader. A single OpenViking-backed server may need to serve HR assistants, legal assistants, code agents, review agents, and personal assistants at the same time. Memory must be reusable inside the right boundary and isolated outside it.

Case 1: one server, multiple business lines

The account boundary separates businesses such as HR and Legal. Resources can be shared inside one business line, while user memories stay separated per user and per agent.

tenant-layout.txt

hr-platform/
├── resources/                 # HR-wide documents and workflows
├── agent/
│   ├── approve/               # approval assistant memories and skills
│   └── qa/                    # HR Q&A assistant memories and skills
└── user/
    ├── bob/agent/approve/     # Bob memory inside the approval assistant
    └── rock/agent/qa/         # Rock memory inside the Q&A assistant

legal-platform/
├── resources/
├── agent/
│   ├── approve/
│   └── qa/
└── user/
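
An access check over this layout could look like the sketch below. The rules encoded here (resources shared inside an account, user memory readable only under the same user and agent) are a reading of the description above, and the `user_memory_path` and `can_access` helpers are hypothetical rather than part of OpenViking.

```python
from pathlib import PurePosixPath


def user_memory_path(account: str, user: str, agent: str) -> PurePosixPath:
    """Location of one user's memory inside one agent, following the tree above."""
    return PurePosixPath(account) / "user" / user / "agent" / agent


def can_access(account: str, user: str, agent: str, target: PurePosixPath) -> bool:
    parts = target.parts
    # Account boundary: nothing outside the caller's own account is visible.
    if not parts or parts[0] != account:
        return False
    # Resources are shared inside one business line.
    if len(parts) > 1 and parts[1] == "resources":
        return True
    # User memory is visible only under the caller's own user and agent path.
    own = user_memory_path(account, user, agent)
    return target == own or own in target.parents


shared_ok = can_access("hr-platform", "bob", "approve",
                       PurePosixPath("hr-platform/resources/handbook.md"))
own_ok = can_access("hr-platform", "bob", "approve",
                    PurePosixPath("hr-platform/user/bob/agent/approve/notes.md"))
other_user = can_access("hr-platform", "bob", "approve",
                        PurePosixPath("hr-platform/user/rock/agent/qa/notes.md"))
other_account = can_access("hr-platform", "bob", "approve",
                           PurePosixPath("legal-platform/resources/contracts.md"))
```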

Case 2: one personal assistant platform, many users and agents

A personal-assistant service can let multiple agents reuse a user-level preference memory while still keeping each agent workspace clean. That is the difference between memory sharing and memory leakage.

assistant-layout.txt

personal-assistant/
├── resources/                 # shared documents and workflows
├── user/
│   ├── bob/memories/          # Bob global personal memory
│   └── rock/memories/
└── agent/
    ├── design/user/bob/       # Bob memory for the design assistant
    ├── code/user/bob/
    └── review/user/bob/
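
One way to picture sharing without leakage is a layered lookup: each agent reads its own scope first and falls back to the user-level memory, while writes stay in the agent scope. The sketch below uses Python's `collections.ChainMap` for that behavior. The preference keys and values are invented for illustration, and nothing here reflects how OpenViking stores or resolves memories.

```python
from collections import ChainMap

# Hypothetical user-level preferences for Bob (user/bob/memories/).
bob_global = {"language": "English", "tone": "concise"}

# Hypothetical agent-scoped memories for Bob (agent/<name>/user/bob/).
agent_scoped = {"design": {"tone": "visual-first"}, "code": {}, "review": {}}


def view_for(agent: str) -> ChainMap:
    """Agent scope is consulted first, then the shared user-level memory."""
    return ChainMap(agent_scoped[agent], bob_global)


design = view_for("design")
code = view_for("code")

# ChainMap writes go to the first mapping, so this lands in the code agent only.
code["indent"] = "4 spaces"
```

Here `design["tone"]` resolves to the agent-specific value, `code["tone"]` falls back to the shared preference, and the new `indent` entry never reaches the user-level memory or the other agents.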

VikingBot adds the channel side of this: one Bot Server Gateway can receive messages from channels such as Feishu, Slack, Discord, Telegram, email, or OpenAPI, then map them into shared, per-channel, or per-session sandboxes.
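
The three sandbox scopes can be illustrated with a small key function. This is a sketch under assumptions: the `SandboxMode` enum, the `sandbox_key` helper, and the key format are hypothetical and do not describe the Bot Server Gateway's actual configuration.

```python
from enum import Enum


class SandboxMode(Enum):
    SHARED = "shared"
    PER_CHANNEL = "per-channel"
    PER_SESSION = "per-session"


def sandbox_key(mode: SandboxMode, channel: str, session_id: str) -> str:
    """Map an incoming message to the sandbox that should handle it."""
    if mode is SandboxMode.SHARED:
        # Every channel and session lands in the same workspace.
        return "shared"
    if mode is SandboxMode.PER_CHANNEL:
        # One workspace per channel, shared by all of its sessions.
        return f"channel/{channel}"
    # Strictest isolation: one workspace per channel and session.
    return f"channel/{channel}/session/{session_id}"


slack_a = sandbox_key(SandboxMode.PER_SESSION, "slack", "a1")
slack_b = sandbox_key(SandboxMode.PER_SESSION, "slack", "b2")
slack_channel = sandbox_key(SandboxMode.PER_CHANNEL, "slack", "a1")
everyone = sandbox_key(SandboxMode.SHARED, "telegram", "c3")
```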

Try It

The shortest path is to install the Bot extension, start OpenViking with bot support, and enter the chat CLI.

terminal

pip install "openviking[bot]"
openviking-server --with-bot
ov chat

To run the werewolf demo, use the demo script from the OpenViking repository after preparing your OpenViking config.

terminal

python start_werewolf_demo.py --config ~/.openviking/ov.conf

Werewolf demo interface after startup — The demo includes the table view, memory browser, leaderboard, and replay flow.

Werewolf demo with a human participant option — The demo can also keep a human seat, which makes the memory and privacy boundaries easier to test.

The VikingBot README explains the full Bot setup. The werewolf demo README contains the runnable game setup.