OpenViking starts from a plain problem: useful data exists, but agents still struggle to use it. A model needs an actor surface and a storage substrate; otherwise every task falls back to prompt stuffing.
On one benchmark, text-to-SQL accuracy rose from 0% to 10% with RAG-style tricks, and to roughly 35% when the prompt directly supplied the actual tables and joins.
The failure mode sits in the access plan around messy context: where to look, how far to search, which memory belongs to whom, and whether a write is safe. OpenViking frames that substrate as a context database.
Why A Filesystem-Shaped Interface
Most agent context is not born as clean relational records. It is code, documents, PDFs, images, tickets, meetings, chat logs, calendars, and memories. Using it is closer to search and recommendation than to normal transaction processing: first shrink a noisy corpus into a plausible scope, then rank, read, and refine.
Relational databases remain useful for metadata, billing, jobs, and structured state. They are a poor primary interface for agents because the agent must first discover schemas, tables, joins, and valid predicates before it can even ask for context. A path is a much cheaper control primitive: choose this project, this user memory space, this document subtree, this time bucket, then search inside it.
| Paradigm | What it solves | Where it breaks for agents |
|---|---|---|
| Relational schema | Precise operations over typed records. | The model must infer tables, joins, columns, and filters before retrieval starts. |
| Vector-only RAG | Semantic entry points over unstructured content. | As the corpus grows, embedding discrimination gets worse and small topK misses become fatal. |
| Scalar filters and rerankers | Useful narrowing and second-stage ordering. | They still need good candidate generation. A reranker cannot rescue evidence that never entered the candidate set, and it adds latency and cost. |
| Directory semantics | One compact scope parameter before vector search and rerank. | Ranking becomes more reliable after the search scope has already been narrowed. |
The Shape Of The System
The implementation is deliberately polyglot. Python owns the server because parsing, document processing, multimodal understanding, model SDKs, and AI dependencies live there; OpenViking is IO- and data-pipeline heavy before it is CPU-bound. Rust owns distribution- and latency-sensitive surfaces such as the CLI and RAGFS, where startup time and binary delivery matter. C++ carries the embedded vector database lineage from VikingDB so the project can reuse mature indexing code instead of rewriting the hardest part.
A database-shaped stack for agent context
That split defines the contract of each layer: agents speak in commands and URIs, the server enforces identity and jobs, AGFS/RAGFS gives context a traversable shape, and VikingDB plus file storage decide what can be retrieved or persisted.
Directory Semantics Are The Addressing Layer
Vector search has a scaling problem that matters more in RAG than in recommendation. Recommendation systems can recall thousands of candidates through multiple channels and then rely on coarse and fine ranking. An agent usually cannot pass thousands of chunks downstream. The final context window may only tolerate tens of chunks, and filling too much of it weakens the model before it starts reasoning.
Scalar filters are the first answer: tenant, owner, time, level, source type, and similar fields should prune the search space. Directory retrieval is the more general answer. A lot of useful context is already organized as a tree: code, calendars, wikis, books, service trees, category taxonomies, and geographies. VikingDB turned that observation into a path-aware vector index, and OpenViking exposes it through `viking://` URIs.
The important detail is that `path` is not stored as ordinary text. In VikingDB it is a `TYPE_PATH` index, so a query can choose a tree scope directly instead of scanning path strings as scalar metadata. That is what lowers filter-generation complexity for agents: one path plus a depth rule is much easier to produce than a hand-built predicate over unknown schema.
| Directory feature | Why prefix matching is not enough |
|---|---|
| Depth-aware retrieval | A query must mean current node, direct children, or entire subtree without rewriting string predicates. |
| Directory nodes can carry content | A wiki page can have its own body and child pages. Treating directories as empty prefixes loses that case. |
| Multiple roots and facets | The same kind of corpus may need project, calendar, category, or geography views; each root is a search boundary. |
| Index and permission boundary | The path participates in retrieval, cache, update, and authorization behavior. It is not only a display string. |
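A minimal sketch of why segment-aware, depth-aware matching differs from string prefix matching. The function names and the depth convention used here (`-1` for the whole subtree, `0` for the node itself, `n` for the node plus `n` levels below it) are illustrative assumptions, not VikingDB's actual implementation.

```python
from typing import List


def split_path(path: str) -> List[str]:
    """Split a slash-separated path into its non-empty segments."""
    return [seg for seg in path.split("/") if seg]


def in_scope(candidate: str, root: str, depth: int) -> bool:
    """Return True if `candidate` lies inside `root` at the given depth.

    Assumed convention (hypothetical, for illustration only):
      depth = -1 : the node and its entire subtree
      depth =  0 : the node itself
      depth =  n : the node plus descendants at most n levels down
    """
    cand, base = split_path(candidate), split_path(root)
    # Compare whole segments, so "/docs/api" never matches "/docs/api-old".
    if cand[: len(base)] != base:
        return False
    levels_below = len(cand) - len(base)
    return depth == -1 or levels_below <= depth
```

A plain `str.startswith` check would accept `/docs/api-old/auth` as a child of `/docs/api`; comparing segments avoids that, and the depth argument replaces hand-written string predicates.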
Multiple Roots Mean Multiple Logical Views
A multi-root tree is not multiple physical copies of the same file. It means the same object can be indexed under several logical trees, and each tree is a different way to narrow retrieval before vector search. A document may live in the project resource tree, appear again in a calendar tree by creation time, and also be reachable through a category or geography tree if the domain needs that view.
| Root | What it organizes | Agent query it simplifies |
|---|---|---|
| `viking://resources/...` | Project, repository, document, or uploaded resource structure. | Search inside this product, repo, folder, or knowledge base. |
| `viking://calendar/2026/05/...` | Time buckets such as day, month, quarter, or year. | Search memories or materials from last week, this month, or a known incident date. |
| `viking://geo/cn/zhejiang/...` | Geography such as country, province, city, or site. | Search policies, assets, or events inside a location boundary. |
| `viking://category/infra/storage/...` | Domain category, taxonomy, or service tree. | Search within a topic without asking the model to infer category fields. |
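One way to picture multiple roots is an index that maps several logical view URIs to a single canonical object, with no content copied. The class, method names, and the calendar URI layout below are assumptions made for illustration; the source does not describe how OpenViking stores these mappings.

```python
from datetime import date
from typing import Dict, List


class MultiRootIndex:
    """Maps logical view URIs to canonical URIs without copying content."""

    def __init__(self) -> None:
        self._views: Dict[str, str] = {}  # view URI -> canonical URI

    def add_view(self, canonical_uri: str, view_uri: str) -> None:
        """Make `canonical_uri` reachable through one more logical tree."""
        self._views[view_uri] = canonical_uri

    def under(self, root: str) -> List[str]:
        """Return canonical URIs reachable through views below `root`."""
        prefix = root.rstrip("/") + "/"
        found = {c for v, c in self._views.items() if v.startswith(prefix)}
        return sorted(found)


def calendar_view(created: date, name: str) -> str:
    """Derive a hypothetical calendar-root URI from a creation date."""
    return f"viking://calendar/{created:%Y/%m/%d}/{name}"
```

Registering one document under its resource path, a calendar view, and a category view then lets `under("viking://calendar/2026/05")` narrow the search by time while still returning the same canonical object.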
Directory depth selector

```json
{ "op": "must", "field": "path", "conds": ["/user/shengmaojia/memories"], "para": "-d=1" }
```

- `d=-1` means global retrieval under the current directory.
- `d=0` matches the current node itself.
- `d=x` searches downward by `x` levels.
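The filter shown above is small enough that an agent-side helper can produce it from a path and a depth. The sketch below only reproduces the four fields visible in that example; the helper name is hypothetical, and the assumption that the depth is always encoded as `-d=<n>` in `para` is inferred from the single example rather than from VikingDB documentation.

```python
import json
from typing import Any, Dict


def path_filter(path: str, depth: int = -1) -> Dict[str, Any]:
    """Build a path-scope condition in the shape of the example filter."""
    if depth < -1:
        raise ValueError("depth must be -1 (whole subtree), 0, or a positive level count")
    # Assumption: depth is carried in "para" as "-d=<n>", as in the example.
    return {"op": "must", "field": "path", "conds": [path], "para": f"-d={depth}"}


if __name__ == "__main__":
    print(json.dumps(path_filter("/user/shengmaojia/memories", depth=1)))
```

The point of the sketch is the size of the input: one path and one integer, instead of a predicate over a schema the agent has not seen.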
Progressive Disclosure For Context
| Level | What it stores | Why agents need it |
|---|---|---|
| L0 | Short summary | Fast orientation before spending tokens. |
| L1 | Structure and fields | Enough shape to plan a query or traversal. |
| L2 | Detailed source content | Only loaded when precision requires it. |
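The three levels suggest a simple loading policy: give every candidate its L0 summary, then upgrade to L1 and L2 only while a token budget allows. The following is a sketch under that assumption; `ContextItem`, `disclose`, and the whitespace-based token estimate are illustrative stand-ins, not OpenViking APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextItem:
    uri: str
    l0: str  # short summary
    l1: str  # structure and fields
    l2: str  # detailed source content


def rough_tokens(text: str) -> int:
    """Crude token estimate (whitespace split); a real tokenizer would differ."""
    return len(text.split())


def disclose(items: List[ContextItem], budget: int) -> List[str]:
    """Start every item at L0, then upgrade in order while budget remains.

    L0 summaries are always included, even if they alone exceed the budget.
    """
    chosen = [item.l0 for item in items]
    used = sum(rough_tokens(text) for text in chosen)
    for level in ("l1", "l2"):
        for i, item in enumerate(items):
            deeper = getattr(item, level)
            extra = rough_tokens(deeper) - rough_tokens(chosen[i])
            if used + extra <= budget:
                chosen[i] = deeper
                used += extra
    return chosen
```

With a tight budget the agent sees only summaries; as the budget grows, earlier-ranked items are the first to be expanded to full detail.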
Files, Virtual URIs, And Multimodal Objects
`viking://` is a logical database namespace, not the physical storage path. The original source path can be preserved as provenance, while the physical AGFS/RAGFS or object-store key stays internal. The visible URI is chosen by the upload command, a user-specified parent path, or OpenViking defaults, and that URI links the stored object with rows in the vector index.
| Path type | Who sees it | Purpose |
|---|---|---|
| Source path | ./docs/images/demo.png | Provenance: where the content came from. |
| Physical storage key | Internal only | Placement in local FS, AGFS/RAGFS, S3-like storage, or cache. |
| Canonical URI | viking://resources/images/20260509/... | Stable identity for read, cite, permission, update, and delete. |
| Matched view URI | viking://calendar/2026/05/09/... | Explains which logical root made the result relevant; it may differ from the canonical URI. |
When there is only one logical view, the canonical URI and matched URI are usually the same. With multiple roots, retrieval should show the matched view so the agent understands why the item appeared, while read and write operations still target the canonical URI.
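A retrieval hit can carry both URIs so that the explanation and the target of later operations stay separate. The sketch below assumes a hypothetical `Hit` record and shows one possible way to collapse several matched views of the same object into a single result.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    canonical_uri: str  # stable identity used for read, update, delete
    matched_uri: str    # logical view that made this result relevant
    score: float


def merge_hits(hits: List[Hit]) -> List[Hit]:
    """Keep the best-scoring hit per canonical URI, highest score first."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.canonical_uri)
        if current is None or hit.score > current.score:
            best[hit.canonical_uri] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


def read_target(hit: Hit) -> str:
    """Reads and writes address the canonical URI, not the matched view."""
    return hit.canonical_uri
```

The surviving hit keeps its `matched_uri`, so the agent can still see that a calendar or category view surfaced the item.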
Multimodality is a separate axis from directory semantics. Text, code, PDFs, and images all benefit from path-scoped retrieval. Images simply make the difference obvious: a query may hit the textual L0/L1 abstract, the image embedding for the L2 object, or both. The directory decides where to search; the modality-specific embeddings decide what is similar inside that scope.
For example, when `ov add-resource ./docs/images/demo.png` runs, OpenViking creates a resource URI, stores the original image as L2, and generates L0 and L1 summaries so an agent can decide whether to inspect the full object.
```bash
ov add-resource ./docs/images/demo.png
# creates a resource URI similar to:
# viking://resources/images/20260509/upload_321e98a827a0461f8721c683d726cbec_png
```

| Input | Stored shape | Agent value |
|---|---|---|
| demo.png | L0/L1 text abstracts plus L2 image embedding | Can be found through either abstract text or image similarity. |
| Code repository file | Original relative path preserved | Agents can navigate like code while retrieval stays semantic. |
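To make the separation between source path, physical key, and logical URI concrete, here is a sketch of an upload record. The URI layout imitates the example above (date bucket plus a 32-character hex id), but the actual naming scheme is not documented here; `ResourceRecord`, `register_upload`, and the `blobs/...` key format are all hypothetical.

```python
import uuid
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ResourceRecord:
    uri: str          # logical identity shown to agents
    source_path: str  # provenance only
    storage_key: str  # internal placement, never shown to agents


def register_upload(source_path: str, today: date, category: str = "images") -> ResourceRecord:
    """Assign a logical URI and an internal key to an uploaded file."""
    suffix = PurePosixPath(source_path).suffix.lstrip(".") or "bin"
    upload_id = uuid.uuid4().hex  # 32 hex characters
    uri = f"viking://resources/{category}/{today:%Y%m%d}/upload_{upload_id}_{suffix}"
    # Hypothetical internal key: nothing in it depends on the source path.
    storage_key = f"blobs/{upload_id[:2]}/{upload_id}"
    return ResourceRecord(uri=uri, source_path=source_path, storage_key=storage_key)
```

Because the three values are independent fields, the storage backend can move objects without changing the URI that agents cite.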
Distributed By Decoupling Storage
The open-source distribution starts as a single-machine service, but the architecture is pointed at managed deployment. The important move is to run OpenViking instances without data disks: vector storage, filesystem storage, logs, and telemetry are abstracted behind middleware interfaces.
The open-source build also avoids nonessential dependencies such as Redis and Kafka. Account information, temporary working directories, transactions, task records, and work queues are kept behind the same filesystem abstraction. That makes the local path easy to operate, while leaving a clear place to swap in managed storage later.
| Mode | What happens | Why it matters | Current caveat |
|---|---|---|---|
| Full read-write | Every instance accepts reads and writes. | Simpler scaling model and likely default direction. | Heavy writes can occupy CPU in a Python single-process server. |
| Read-write separation | Write and read clusters are separated. | Better isolation and availability boundaries. | Currently manual and not the recommended default. |
| Layer | Consistency expectation | OpenViking responsibility |
|---|---|---|
| VikingDB | Eventual consistency in managed vector storage. | Design retrieval and retries around visibility delay. |
| Embedded vector database | Strong consistency on a single machine. | Keep the local mode simple and predictable. |
| Distributed filesystem | Usually strong, still with ordering edge cases. | Protect writes with file and directory locks. |
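For the eventually consistent case, "design retrieval and retries around visibility delay" can be as simple as polling with backoff after a write. This is a generic sketch, not OpenViking code: `wait_until_visible` and its parameters are assumed names, and `lookup` stands in for whatever read call the caller uses.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until_visible(
    lookup: Callable[[], Optional[T]],
    attempts: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Poll `lookup` with exponential backoff until it returns a value.

    Returns None if the item is still not visible after `attempts` tries.
    """
    for attempt in range(attempts):
        result = lookup()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None
```

In the embedded, strongly consistent mode the first lookup succeeds and the loop costs nothing, so the same call site can serve both deployments.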
Where correctness has to be explicit
Identity: Treat Agents As Database Users
The hardest multi-tenant question is not accounts. It is whether an agent is subordinate to a human user, owns data by itself, or should be treated as a peer. OpenViking went through all three designs and is converging on the peer model.
Local multi-tenancy starts with a root API key and explicit user registration. Hosted OpenViking hides the root key and exposes user capacity through service tiers instead. The product surface changes, but the invariant stays the same: every read and write must carry a real identity before it touches private context.
- Agent belongs to User: simple RBAC, but one service agent cannot naturally serve many visitors with separate memory.
- Agent can own data: more flexible, but the authorization graph becomes hard to explain and harder to secure.
- Human and agent are peers: the target model. `user` is the only authenticated object besides root, and it may represent a human or an agent.
This is a privacy decision as much as a modeling decision. A customer-service agent may manage memories for visitors who are not registered OpenViking users. Forcing those visitors into the same `User` abstraction makes the authorization graph less true and less safe.
Identity flow decides what context can cross
- Admin authority: register users, rotate API keys, configure global policy.
- Human and agent as peer users: root is admin-only; every actor gets a scoped API key and a visible namespace.
- Private scope: index filters and read APIs enforce visible context.
- Secrets stay protected: Skills receive restored placeholders only when policy allows.
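The peer model can be sketched as a single `User` type whose kind is informational and whose private namespace is the only thing that gates reads. The `viking://user/<name>/` layout and the rule that non-private URIs are readable by every peer are assumptions for this illustration; real policy is likely richer.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    HUMAN = "human"
    AGENT = "agent"


@dataclass(frozen=True)
class User:
    name: str
    kind: Kind            # informational only; it grants no extra rights
    is_root: bool = False


def namespace(user: User) -> str:
    """Private namespace for a user (hypothetical layout)."""
    return f"viking://user/{user.name}/"


def can_read(user: User, uri: str) -> bool:
    """Root is admin-only; peers see their own private namespace."""
    if user.is_root:
        return False  # root manages users and keys, not private context
    if uri.startswith("viking://user/"):
        return uri.startswith(namespace(user))
    return True  # assumption: non-private roots are visible to all peers
```

A human and an agent pass through exactly the same check, which is what keeps the authorization graph easy to explain.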
```bash
# server ov.conf: configure root_api_key before startup
# client ovcli.conf: configure the same root_api_key
ov admin register-user default <your_name>
# client ovcli.conf: use the returned api_key for normal access
```

Performance Is A Pipeline Problem
Once the storage model is distributed, capacity is mostly a deployment choice. Performance is harder because write requests touch parsing, splitting, VLM calls, embedding, summarization, memory extraction, IO movement, and locks.
- Vector database: use VikingDB DSL filters for shared pools; dedicate a vector database for large tenants.
- Filesystem: local FS is fast but fragile; S3/TOS scales but can slow the agent loop.
- Write pipeline: parsing, splitting, VLM calls, embeddings, summaries, and memory extraction dominate latency.
- Locks: directory/file locks protect conflicting writes; transaction semantics are still evolving.
| Layer | Lightweight mode | Heavy mode | Tradeoff |
|---|---|---|---|
| Vector database | Shared VikingDB pool with Account/User scalar filters. | Dedicated vector database per OpenViking instance. | Shared mode saves resources; dedicated mode removes the practical index ceiling. |
| Filesystem | Local FS, ByteNAS, or managed shared FS. | TOS/S3 or EFS-like remote storage. | Local is fast; object storage scales but slows agent loops. |
| Write pipeline | Queue model calls and embedding work. | Globally controlled parallel ingestion. | More throughput, but lock and ordering costs become visible. |
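The lock and ordering costs mentioned in the table come from writers that touch more than one path at once. The sketch below is an in-process stand-in for file and directory locks, not OpenViking's locking code; `PathLocks` is a hypothetical name, and a distributed deployment would need a shared lock service instead of `threading.Lock`.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator, List


class PathLocks:
    """In-process lock table keyed by path."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}

    def _lock_for(self, path: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

    @contextmanager
    def hold(self, paths: Iterable[str]) -> Iterator[None]:
        """Hold the locks for all `paths` for the duration of the block."""
        # Acquire in sorted order so two writers can never deadlock by
        # taking the same pair of locks in opposite order.
        held: List[threading.Lock] = []
        try:
            for path in sorted(set(paths)):
                lock = self._lock_for(path)
                lock.acquire()
                held.append(lock)
            yield
        finally:
            for lock in reversed(held):
                lock.release()
```

A publish step that moves an artifact from a work directory into the visible namespace would hold both paths, for example `with locks.hold(["/work/a", "/visible/a"]): ...`.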
The bottleneck is a chain, not one database call
- Receive: upload, dedupe, place in work area.
- Parse: PDF, Office, code, images, archives.
- Model calls: VLM, embedding, summary, memory extraction.
- Index: vector write and scalar filters.
- Publish: move artifacts into visible namespace.
- Observe: logs, metrics, traces, retries.
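Treating the write path as a chain makes it natural to time each stage separately before optimizing anything. The sketch below is a generic harness with assumed names (`run_pipeline`, `slowest`); the stage functions are placeholders for the real receive, parse, and model-call steps.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(payload: Any, stages: List[Stage]) -> Tuple[Any, Dict[str, float]]:
    """Run stages in order and record wall-clock seconds spent in each."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        started = time.perf_counter()
        payload = stage(payload)
        timings[name] = time.perf_counter() - started
    return payload, timings


def slowest(timings: Dict[str, float]) -> str:
    """Name of the stage that dominated latency."""
    return max(timings, key=lambda name: timings[name])
```

Per-stage timings show whether a slow upload is caused by parsing, model calls, or data movement, which a single end-to-end latency number hides.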
Current optimization directions
- Queue and parallelize model calls with global concurrency control.
- Replace the Go AGFS server path with embedded calls and Rust where transfer cost matters.
- Parallelize tree operations such as `find` and `tree`.
- Reduce copies across receive, work, and visible directories during upload.
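The first direction, queued model calls under a global concurrency limit, maps directly onto a semaphore. This is a minimal `asyncio` sketch; `bounded_gather` is a hypothetical helper and each job stands in for a VLM, embedding, or summarization request.

```python
import asyncio
from typing import Awaitable, Callable, List


async def bounded_gather(
    jobs: List[Callable[[], Awaitable[str]]], limit: int
) -> List[str]:
    """Run all jobs, but never more than `limit` at the same time.

    Results are returned in the same order as `jobs`.
    """
    gate = asyncio.Semaphore(limit)

    async def run(job: Callable[[], Awaitable[str]]) -> str:
        async with gate:
            return await job()

    return await asyncio.gather(*(run(job) for job in jobs))
```

The limit is the knob that trades ingestion throughput against provider rate limits and CPU pressure on a single-process server.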
Privacy: Context Is Plaintext
A context database stores the material an agent uses to reason. That material is often sensitive by definition. OpenViking handles this with API-key identity, root isolation, user-scoped `viking://user` visibility, optional file encryption, and experimental Skill privacy configs.
| Control | Purpose |
|---|---|
| `dev` | Local development mode without authentication. |
| `api_key` | Required when the service listens beyond localhost. |
| `ov --sudo` | Root identity is explicit and limited to admin actions. |
| `viking://user` | Private data scope filtered at the index layer. |
| Privacy configs | Store Skill secrets in protected storage and restore placeholders at read time. |
Encryption is implemented, but it is not free. Different tenants or accounts can use different keys, which improves blast-radius control, but remote storage has to be decrypted before operations such as `grep`. For a context database, privacy controls affect latency and operator ergonomics, not only compliance posture.
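The Skill privacy configs listed in the table above restore placeholders at read time. The source does not show the placeholder syntax, so the sketch below assumes a hypothetical `{{name}}` form and a boolean policy decision; it illustrates the idea only.

```python
import re
from typing import Dict

# Hypothetical placeholder syntax: {{name}}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def restore(text: str, secrets: Dict[str, str], allowed: bool) -> str:
    """Replace {{name}} placeholders with secret values when policy allows.

    Unknown placeholder names are left untouched.
    """
    if not allowed:
        return text  # the caller only ever sees placeholders
    return PLACEHOLDER.sub(lambda m: secrets.get(m.group(1), m.group(0)), text)
```

Stored Skill text then never contains the secret itself; only a read that passes policy sees the restored value.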
```bash
openviking privacy categories
openviking privacy list skill
openviking privacy upsert skill byted-viking-search-knowledgebase \
  --values-json '{"api_key":"secret-2","base_url":"https://example.com"}'
openviking privacy activate skill byted-viking-search-knowledgebase 2
```

What To Remember
The critical architectural insight is that context is not a blob. It has paths, scopes, identities, consistency constraints, performance budgets, and privacy boundaries. OpenViking is useful because it lets agents consume those properties through an interface they can already navigate.
The architecture is still moving from concept to product construction. The open-source release has already produced enough usage, issues, and feedback to make capacity and performance the next hard priorities. The useful thing about the design is that OpenViking names the database properties context systems need to expose before agents can depend on them, while keeping consistency and latency work visible.
We are grateful to the 150+ contributors and participants, the work behind 1000+ merged changes, and the community that has pushed the project past 23k stars. That matters because the remaining questions are not slideware questions; they are the questions that show up when real agents, data, and users start sharing the same context substrate.

