Whoever Owns the Memory
Owns the AI Era
by Jacob Koenig
4/13/26

Everyone is converging on the same idea, and you don't have to be an engineer to build it.
Knowledge compounds into wisdom in ways that automating code and tasks never can. Andrej Karpathy, a founding member of OpenAI and former head of Tesla's self-driving AI, posted his personal AI workflow on X on April 3, and it took off among professionals and hobbyists. It's been quoted and re-posted across the web because his ideas have illuminated something so many of us have been thinking about on our own but haven't had the words for. His core insight was that the most valuable thing he does with AI now isn't writing code or automating tasks, it's building and curating his own knowledge base.
The Databricks AI Research team published a paper on memory scaling days later, formalizing why that approach works. The same observations have surfaced across LinkedIn, podcasts, and builder communities: the hard part of working with AI isn’t getting it to generate, it’s deciding what to keep, what to encode, and what to throw away.
Memory, more than the model, is becoming the differentiator. It comes back to the same principle every time: when you encode your best thinking into the system, you accelerate the depth of insights that you and the AI can generate together.
I’ve been building a personal AI system where every session makes the next one smarter. In my previous posts, I described the system I call Eji, an AI coaching architecture that encodes my negotiation frameworks, writing voice, coaching philosophy, and the full context of every professional and personal thread I’m managing.
The concept I kept coming back to was stereoscopic vision: two lenses, mine and the AI’s, trained on the same situation, generating insights that neither could produce alone. Those insights then get embedded back into the system, driving deeper insights and better interactions the next time. The loop compounds the more you use it.
What I didn’t have was the vocabulary for what I was building. Databricks and Karpathy have given me better words to have this conversation with. The convergence from multiple directions tells me this isn’t a niche idea anymore.


What hit the wall before the system changed?
The system I described in “Amplify Your Edge” worked, but it hit two hard limits. Moving from project folders to modular skills gave me some freedom, but results were not consistent.
The first problem was context window bloat. Every session pulled in too much: skill instructions, relationship history, tactical frameworks, persona files. The model was spending half its capacity holding unrelated background noise, leaving less room for the actual conversation.
Even if you have infinite resources to spend on compute, a bloated context window still erodes the quality of the insights you can get from the AI. Models have improved at managing that, but if you can give it a cleaner picture, you'll get better results faster.
The second was that context stayed siloed. A conversation about one relationship couldn’t surface patterns from another unless I manually bridged them or wasted even more context window capacity. The system had depth in individual threads but limited peripheral vision across them.
In the language Databricks uses, I had episodic memory, the raw records of past interactions, but no selective retrieval and no semantic memory layer. Everything loaded every time, and nothing connected across threads.

How did a retrieval layer fix the first problem?
The first fix was giving the system the ability to search for what it needed instead of loading everything up front. I broke my notes into smaller pieces and stored them in a searchable database, so instead of loading everything into every session, the AI could look up only the parts that mattered for whatever we were discussing. Two problems solved at once: the context window stayed lean, and cross-file search meant patterns could surface across threads.
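The chunk-then-retrieve idea can be sketched in a few lines. This is a minimal toy, not the actual system: the word-overlap score stands in for real embedding similarity, and the in-memory list stands in for the database.

```python
# Minimal sketch of chunk-then-retrieve: notes are split into small
# pieces, and each session pulls back only the top-scoring ones
# instead of loading every file. Word overlap is a toy stand-in for
# real embedding similarity.

def chunk(text, max_words=50):
    """Split a note into word-capped chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def score(query, chunk_text):
    """Toy relevance: fraction of query words found in the chunk."""
    q = set(query.lower().split())
    c = set(chunk_text.lower().split())
    return len(q & c) / max(len(q), 1)

def retrieve(query, store, top_k=2):
    """Return only the top_k most relevant chunks, not the whole store."""
    ranked = sorted(store, key=lambda ch: score(query, ch), reverse=True)
    return ranked[:top_k]

store = []
store += chunk("negotiation prep notes about timeline pushback and anchoring")
store += chunk("coaching philosophy notes about listening before advising")

relevant = retrieve("timeline pushback", store)
```

The point of the sketch is the shape of the loop: everything gets chunked once on the way in, and each session pays only for the handful of chunks it actually needs.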
Then I tried to use it from Claude’s chat interface on my phone and it couldn’t connect. The backend was local, which meant it only worked from the desktop environment (Claude Cowork or Claude Code or their equivalents). I work conversationally, iteratively, sometimes from my phone between meetings. A memory layer that only works from one interface isn’t useful for my needs.
So I moved the entire backend to Supabase, a hosted platform with built-in search. With Claude's help it was no trouble to put a remote connector in place, so every interface, desktop, chat, mobile, even scheduled overnight tasks, now reaches the same memory store and the same retrieval layer from everywhere.
Databricks describes this exact architectural step. They note that file-based memory “works well at small scale and for individual users, but it lacks indexing, structured queries, and efficient similarity search.” The natural next step, they write, is a database-backed system with unified search. That’s what I built, using Supabase’s PostgreSQL engine for both structured queries and vector similarity in a single store.

What happens when retrieval can’t see the connections?
The RAG layer could retrieve facts about any person or situation in my context files. What it couldn’t do was see the pattern connecting two completely different threads. A conversation about a business partner might involve the same interpersonal dynamic I’m navigating with a co-founder in a separate thread. The shared structure lived in my head, not in the database, and no keyword search would surface it because the connection wasn’t just about shared words.
I opened Obsidian (Karpathy’s note-taking app of choice) and started mapping the relationships between my context files visually. Threads connected to people, people connected to patterns, patterns recurred across threads. The graph proved the concept: there were cross-thread dynamics that keyword search would never find.
I love the graph framework, but Obsidian was also locally stored. So I formalized the same patterns directly into Supabase. I compiled 20 short pattern articles, each describing a recurring dynamic across my contexts, organized into three categories: growth edges I’m working on, patterns that recur across relationships, and bridges connecting different areas of my life and work. Then I built a lightweight index, about 300 lines, that maps every pattern with a one-line description and its connections. The index loads at the start of every session so the system knows what it knows before it starts searching.
That solves a problem Databricks identifies as the main bottleneck for memory scaling: “The agent must decide what to ask its memory store before it knows what is in there.” Without the index, a search for a person’s name against the pattern articles returns nothing, because names don’t appear in the pattern text. With the index, the system sees which patterns relate to that person’s thread, searches for those directly, and gets strong results.
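The index-first lookup can be illustrated with a small sketch. The pattern names, summaries, and thread labels below are hypothetical placeholders, not the actual contents of my index:

```python
# Sketch of index-first retrieval. The index maps each pattern to a
# one-line description and the threads it connects to, so the system
# knows what to search for before it searches. All names are made up.

PATTERN_INDEX = {
    "pre-conceding": {
        "summary": "conceding before the other side objects",
        "connections": ["negotiation-thread", "cofounder-thread"],
    },
    "over-explaining": {
        "summary": "adding justification when a short answer would land better",
        "connections": ["client-thread"],
    },
}

def patterns_for(thread, index=PATTERN_INDEX):
    """Step 1: consult the index to learn which patterns touch this thread."""
    return [name for name, entry in index.items()
            if thread in entry["connections"]]

def search_queries(thread):
    """Step 2: search for those patterns directly, instead of searching
    the raw thread or person name (which returns nothing, because
    names don't appear in the pattern text)."""
    return [f"{name}: {PATTERN_INDEX[name]['summary']}"
            for name in patterns_for(thread)]

queries = search_queries("cofounder-thread")
```

Two cheap lookups replace one blind search: the index tells the system what it knows, and the targeted queries do the rest.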
Databricks describes three types of memory that agents need: procedural, episodic, and semantic. Those map directly to the three layers in my system. Skills are procedural memory, the encoded instructions for how to act. Context is episodic memory, the record of what happened and where things stand. Concepts are semantic memory, the distilled understanding of what it all means.
But the system also needs to know which layer to reach for first. I've found the best results when it goes skills before context, context before concepts. When that order was undefined, the system was reaching for concepts before understanding the nature of the problem or the facts involved. Once the hierarchy was in place, every layer started serving the one above it, and the quality of the output improved immediately.
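The skills-then-context-then-concepts order can be sketched as a simple ordered assembly step. The stores here are stand-in dicts with invented entries; the real system queries a database:

```python
# Sketch of the layer hierarchy: procedural memory first (how to act),
# then episodic (what happened), then semantic (what it means), so
# facts ground the concepts rather than the other way around.
# All entries are illustrative placeholders.

LAYERS = ["skills", "context", "concepts"]  # fixed retrieval order

STORES = {
    "skills":   {"negotiation": "open with questions, not positions"},
    "context":  {"negotiation": "last call stalled on timeline"},
    "concepts": {"negotiation": "pre-conceding pattern is active here"},
}

def assemble(topic):
    """Pull from each layer in order; lower layers serve the ones above."""
    bundle = []
    for layer in LAYERS:
        entry = STORES[layer].get(topic)
        if entry:
            bundle.append(f"[{layer}] {entry}")
    return bundle

prompt_context = assemble("negotiation")
```

Making the order explicit is the whole fix: the same three lookups happen either way, but the hierarchy decides which one frames the others.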

Why did working through conversation produce a different architecture?
I should be direct about what makes my path different from Karpathy’s. I’m not an engineer. I don’t know Python and I don’t write my own retrieval pipelines or deployment scripts. The entire architecture, the memory layer, the concept layer, the voice calibration, was built through iterative conversation with the AI, bouncing ideas for how I operate off of the concepts that Karpathy introduced.
That’s the meta-proof of the thesis. The system that encodes my best thinking was itself built through the stereoscopic vision process I described in my earlier posts. It evolved naturally over the course of months. It wasn’t hard to set up once we dialed in on the direction. It just took multiple turns and an open mind, which is the same practice that makes it work once it’s running.
That constraint also forced a better solution for my use case. Karpathy loads his full context every time because his sessions are more structured. My sessions are long and exploratory, so I can’t afford to burn the context window on excess unrelated memory. I need that space to think through problems in real time.
That pushed me toward selective retrieval: a lightweight index that loads cheaply, followed by targeted searches for only the relevant patterns. I keep large context files across different areas of my life, but two retrieval calls and a minimal context footprint replace loading each file in its entirety.
The architecture emerged from use. I think that makes it more practical for anyone who works with AI conversationally rather than programmatically, which is most people.

What does the compounding actually look like?
Here’s a concrete example. I was preparing for a negotiation where I expected pushback on timeline. The concept layer surfaced a pattern it had compiled across several past conversations: when I anticipate resistance, I tend to pre-concede before the other person even objects. I’d never named that tendency myself.
But the system had also flagged that the same dynamic was active in a completely separate relationship I was managing, one that had nothing to do with this negotiation. Seeing the pattern across both threads at once changed how I opened. The system connected something I couldn’t see from inside either thread on its own.
The compounding doesn’t come from any single session being brilliant. It comes from the practice: every coaching session, every conversation prep, every drafted email feeding back into the system. The context files get updated, new patterns get compiled into concept articles, and the index grows. And the next session is slightly more grounded than the last because the system remembers what I learned.
Karpathy suggests using hooks in Claude Code to automatically log each interaction. Claude’s chat interface doesn’t have that yet, so the intake layer had to be built differently in my case.
I can drop a file into a folder on my PC or send myself an email with a tag in the subject line, and it gets ingested into the database automatically. New raw material goes into an intake queue where it’s available for the next session to reference, process, or encode into the context files. The bar for getting information in is as low as I could make it.
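The intake step amounts to appending raw material to a queue with just enough metadata to process it later. A minimal sketch, with a hypothetical drop folder and status field:

```python
# Sketch of low-friction intake: anything dropped into a folder (or
# emailed with a tag) lands in a queue, untouched, for a later session
# to encode or discard. Paths and fields are assumptions.

from pathlib import Path
import tempfile
import time

def ingest(path: Path, queue: list):
    """Move raw material into the intake queue with minimal metadata."""
    queue.append({
        "source": path.name,
        "body": path.read_text(),
        "received": time.time(),
        "status": "unprocessed",  # a later session decides what to keep
    })

queue = []
drop = Path(tempfile.mkdtemp()) / "note.txt"
drop.write_text("Follow up with the partner about the revised timeline.")
ingest(drop, queue)
```

The design choice is deferral: ingest does no interpretation at all, so getting information in stays cheap and the thinking happens in session, where it belongs.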
An overnight pipeline keeps the rest current. A scheduled task runs while I sleep to sync the skill files, check data integrity, flag anything stale, and compile a summary of what changed. It then checks with me the next morning about what new information should be encoded into the context files, since the overnight task doesn’t have access to the full chat history from the previous day’s sessions. The context and concept files are monitored automatically for drift.
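The staleness check and morning summary in that overnight pass can be sketched as a single report function. The two-week cutoff and the file fields are my assumptions for illustration, not the pipeline's actual values:

```python
# Sketch of the overnight maintenance pass: flag stale context files
# and compile a summary of what changed, for the morning check-in.
# Thresholds and fields are illustrative assumptions.

import time

STALE_AFTER = 14 * 24 * 3600  # two weeks, an arbitrary cutoff

def nightly_report(files, now=None):
    """Flag stale entries and list what changed in the last day."""
    now = now or time.time()
    stale = [f["name"] for f in files
             if now - f["updated"] > STALE_AFTER]
    changed = [f["name"] for f in files
               if now - f["updated"] < 24 * 3600]
    return {
        "stale": stale,        # surfaced for the morning check-in
        "changed": changed,    # candidates to encode into context files
        "needs_review": bool(stale or changed),
    }

files = [
    {"name": "partner-thread.md", "updated": time.time() - 3600},
    {"name": "old-project.md", "updated": time.time() - 30 * 24 * 3600},
]
report = nightly_report(files)
```

Nothing here writes to the context files directly; the report only queues questions for the human checkpoint the next morning, which is the curation step described above.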
Karpathy's setup is certainly more automated on the code side, but the curation step in mine is intentional. My context is subjective, relationship-heavy, and full of nuance that I don't want misunderstood, so the human checkpoint is a feature. The result is mostly automated, with just enough curation to make sure no misunderstandings about important context get encoded.

Where does this go from here?
Databricks closes their paper with a prediction: “As foundation models converge in capability, the differentiator for enterprise agents will increasingly be what memory they have accumulated rather than which model they call.”
I think that’s true for personal agents too. The model is a reasoning engine, and it’s swappable. The memory, the encoded frameworks, the compiled patterns, the relationship history, the voice calibration, is mine. It took over a year of daily use to build, and it compounds with every session.
The convergence from Karpathy, Databricks, and the broader builder community tells me this isn’t early anymore. The infrastructure matters more than the model and the curation matters more than the generation, but there’s an open question about who will own the memory and how.
Every major platform is trying to become the layer that remembers what users do, what they need, and how they work. Vertical players are building memory into their products because it’s the stickiest thing they can offer. Hobbyists and engineers are building their own because they understand what’s at stake and they don’t want to hand it to someone else. But building and maintaining personal memory is real work, and most people won’t do it. In the end, whoever owns the largest share of it, whether that’s a platform, a product, or an individual practice, will own the era.
The prism keeps refracting and the depth keeps growing, because the practice never stops feeding it.

If you want to try the universal Eji package or compare notes on what you’ve been building, reach out. jkoenig@komcp.com
This article was also posted separately on LinkedIn: https://www.linkedin.com/pulse/whoever-owns-memory-ai-era-jacob-koenig-wmiye