How You Make AI Remember
A practical map of what you control inside AI memory, and what you don't
This is part of a book I’m writing in public.
Subscribe to read the rest as it comes
Updated May 12, 2026. The chapter now opens with a short framing section, “So what do I do with it?”, that bridges from the previous chapter. A new section, “Instructions as automation,” sits in the middle of the piece. It covers something I’d left out: that the instruction layer can carry more than voice rules. The rest is unchanged.
So what do I do with it?
The last chapter showed you the architecture. Seven stores. A librarian who only runs when called. Seams between the pieces that show up as the AI forgetting, or remembering the wrong thing, or pulling up something stale at the wrong moment.
Once you see the shape, the next question follows. What do you do with it. Which parts can you actually touch. Which parts are closed to you. When the AI drifts, where is the right place to push. When it does the wrong thing, what’s the lever that fixes it. When it does the right thing, what made that happen, and how do you make it happen again.
This chapter is about working inside the architecture you just saw. Not fixing it. You don’t get to fix it. Working inside it well enough that the AI becomes a real collaborator on long work, instead of a thing that keeps drifting and forcing you to start over.
Most people reach for the wrong lever first. They write longer instructions. They build elaborate prompt frameworks. They read articles about memory architecture and try to apply them. None of it touches the parts of the system that needed touching. The disappointment compounds.
The first thing to understand is which screws you can actually turn. The second thing is what each one does. The third is what to expect when you turn them, and what to expect when the AI quietly ignores you anyway.
Start with the screws.
The longer instruction
Someone I know was trying to fix it. The AI kept drifting. It would forget the tone halfway through, slip into corporate phrasing, default to lists when she had asked for prose. So she did the obvious thing. She wrote her instruction more carefully. She added examples of what she wanted. She added counter-examples of what she didn’t. She specified the voice. She named the failure modes. The instruction grew from a paragraph to a page to two pages.
The AI got worse.
Not dramatically worse. A little worse. Slower to start. More performative about following the rules. Sometimes it would quote the instruction back at her before answering, like a student showing work. The drift was still there. It was just buried under more apparatus.
She asked me what was happening. I didn’t have a clean answer right away. The honest answer is that the instruction layer is real, and it works, but it isn’t the layer she thought she was operating in. She was trying to fix a retrieval problem with a behavior fix. She was tightening the wrong screw.
The screws are real. There are several of them. Some you can reach. Some you can’t. Knowing which is which is most of the work.
What you control. What you don’t.
The architecture from the previous chapter has seven stores. The system prompt, your project instructions, project memory, project knowledge, past chat search, conversation history, your current message. Each one has a different owner. Knowing who owns what is the start of knowing where to push.
Your project instructions are yours. You write them. You can edit them at any time. The next chat picks up the new version. This is the cleanest lever you have. If the AI keeps doing something you don’t want, the rule belongs here, not in every individual chat.
Your project knowledge files are yours. You upload them, name them, organize them, delete them. Whether they load fully or get retrieved as chunks depends on how many files are in the project, but the contents are entirely under your control.
Project memory is partly yours. The synthesis is built by a process you can’t configure, but you can read what it produced and edit it directly. You can also write to it mid-conversation by saying “remember this” or “forget that.” Those phrasings trigger a tool call. The summary updates right then.
Conversation choices are yours. When to start fresh. When to attach a file directly to a chat instead of relying on retrieval. When to paste a passage inline rather than asking the AI to find it. Whether to keep going in a thread that’s getting long or open a new one. These look like trivial decisions. They aren’t. They are the most consequential levers most users never think of as levers.
Then there is the partial layer. How you phrase a question shapes which tools the AI reaches for. Asking “what’s in my project knowledge about X” is more likely to trigger a search than asking “tell me about X.” Saying “search past chats for the conversation about Y” is more likely to call the right tool than asking “what did we discuss about Y last week.” You can’t force a tool call. You can shape the probability that it happens. Phrasing is influence, not control.
Then there is the layer you don’t touch at all. Anthropic’s system prompt. Chunking strategy. Embedding model. Retrieval ranking. Top-K. The synthesis prompt that builds project memory. The model’s behavior under context pressure. How it decides whether to read a chunk fully or sample it. Whether it summarizes the conversation when budget gets tight. None of this is exposed. Most of it isn’t documented. Some of it changes between releases without notice.
There are two instruction layers stacked. Anthropic’s sits underneath. Yours sits on top. You write yours. You don’t see theirs. When you tell the AI to do something that conflicts with the bottom layer, the bottom layer wins. You can feel it happening. You can’t read the rule that did it.
Most of what gets written online about “AI memory architecture” or “how to build the perfect AI workspace” is users describing what they wish their platform did. Frameworks for memory hierarchy. Diagrams of belief states. Recommendations to maintain seventeen kinds of context files. Almost none of it changes anything inside the platform. The platform does what it does. The frameworks describe a wished-for system, not the one running.
The question isn’t how to build a better architecture. You don’t get to build it. The question is how to work well inside the one you’ve got.
The tools the model can call
The lever most users don’t see as a lever is the one sitting next to the model.
The AI has tools. They are not part of the model. They sit beside it. During a turn, the AI can decide to call one. The result comes back, and the AI continues. From your side, this looks like a slight pause and then a more useful answer. From inside, it’s the difference between the AI guessing from what’s in the prompt and the AI going to fetch something it didn’t have.
There are five tools that show up in most chats. There may be more in any given session. The ones to know:
project_knowledge_search searches your project files. The AI hands the tool a query, the librarian runs a similarity search across the chunked vector index, and the top matches come back. The AI sees fragments, not files.
conversation_search searches your past chats inside the same project. Same shape. Query goes in, fragments come back. This is how the AI looks across conversations for something you mentioned three weeks ago.
web_search reaches outside. Useful when something might have changed since the model was trained, or when you’re asking about a current event, a product, a person, a number that moves.
memory_user_edits is the tool that writes to project memory. When you say “remember that I prefer X,” this is what runs. It also reads memory, removes entries, replaces lines.
view loads a file by name. Not search. Not chunks. The whole file, top to bottom, into the stack. This is the tool that costs the most context budget but gives the most accurate read. Not every chat has it.
In some sessions, there is also code execution. The AI can write a script and run it. The result comes back. That tool changes what the AI can reliably do, and the next section is mostly about it.
The thing to understand about all of these is that the AI decides when to call them. You can’t force the call. The AI reads the prompt, considers the question, and chooses whether a tool will help. Sometimes it chooses well. Sometimes it doesn’t. Sometimes it answers from what’s already in the stack when a search would have given a better answer. Sometimes it searches when it could have answered directly.
What shapes the choice is partly the phrasing of your message. The word “search” in your question makes a search more likely. Naming the tool directly (“search project knowledge for X”) makes it more likely still. So does telling the AI explicitly that you don’t want it to answer from memory: “look this up before answering” works. So does flagging that the topic is recent: “this is from a conversation last month” pushes toward conversation_search rather than guessing.
Even when the AI calls the search, parts of it are closed to you. You can ask for more results, up to a limit. You can ask the AI to show you the raw chunks before it answers, so you see what came back. You can re-query with different words. What you can’t do is paginate. The top fifteen are the top fifteen. If the right passage is ranked twentieth, no phrasing of “show me more” reaches it. You also don’t tune the ranking itself. Similarity, recency, file weighting, all closed. You shape the query and the count. The librarian decides the rest.
What also shapes the choice is the AI’s read of context pressure. As the budget tightens, the model has incentives to be economical. It may sample chunks instead of reading them fully. It may answer from the stack rather than calling another tool. You won’t see it doing this. You learn to feel it. The replies get a little vaguer, a little more general, a little less specific to your files. When that starts happening, calling the tool by name often pulls the AI back. So does opening a fresh chat.
Calling back the tool is its own move. If you asked something and the AI answered without searching, and the answer feels thin, you can say “did you search for that?” The AI will usually admit it didn’t and then run the search. The second answer is almost always better. This isn’t a trick. It’s the AI noticing, in the next turn, that it skipped a step.
Tools matter most when they let the AI do something it can’t do well on its own. The clearest case of that is math.
Instructions as automation. Not just voice.
Most people use the layer for voice. Tone rules, format preferences, things never to do. That’s a real use, and it’s the obvious one.
The layer can carry more.
You can write instructions that act like small automations. Things you want the AI to do at the start of every chat, before it answers anything. Examples of what users actually push the layer to do:
Fetch a shared reference file from a URL on every chat. Useful when several projects need to read from the same source. One file lives somewhere public, every project pulls it on open, they stay in sync without manual copying.
Check the current time before saying anything time-bound. A chat can run for hours. The model has no native sense of time passing inside a conversation. An instruction to fetch the time before using words like “today” or “tonight” keeps the answer grounded.
Re-read a specific file, or search past chats for a topic, before drafting. Useful when one reference should always inform the work, or when months of history would beat the AI’s pattern-matched guess.
None of these are exotic. They are small automations a careful user writes once and forgets. The instruction layer carries them across every chat in the project, until you change the rule.
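To make that concrete, an instruction carrying rules like these might read roughly as follows. The wording and the URL are illustrative, not a template the platform asks for:

At the start of every chat, fetch https://example.com/chapter-index.md and treat it as the current chapter list.
Before using words like today, tonight, or this week, check the current date and time.
Before drafting any chapter text, re-read Voice_Reference and search past chats for notes on that chapter.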
One caveat. The layer is not deterministic. The AI follows these rules most of the time. Sometimes it skips a rule on a turn it judges not worth firing on, with reasoning that sounds careful enough to look like judgment rather than drift. Useful, not absolute.
Code does math. Model does language.
The model is good at language. It is unreliable at math.
Not unreliable in the dramatic sense. It can add small numbers. It can do basic arithmetic. It can sketch the shape of a calculation. What it cannot do, reliably, is take a column of numbers and produce a correct count, sum, average, or rank.
The reason is built into what the model is. A language predictor. When you ask it to count how many rows in a table have a value above eighty, it doesn’t count. It estimates the answer from pattern. The pattern is usually close. Sometimes it’s exact. Sometimes it’s off by one. Sometimes it’s off by more, and the answer comes back with the same confident tone as when it was right. From the outside, you can’t tell which kind of answer you got.
A small demonstration. Take a CSV of test scores, two hundred rows, three columns: name, subject, score. Drop it into a chat. Ask the AI to tell you how many students scored above eighty in mathematics.
If you ask the AI directly, in language, you get an answer. Sometimes correct. Sometimes off. If you run the file twice with the same question in two different chats, you may get two different numbers. Neither comes with a flag saying “I estimated.” Both sound certain.
Now ask the same question with one change. Tell the AI to use code execution. Tell it to write a script that filters the rows and counts them. The AI writes a few lines. The script runs against the actual data. The result comes back. The number is exact, every time, because a script is doing the counting, not the model.
awk -F',' '$2 == "Mathematics" && $3 > 80 {count++} END {print count}' scores.csv
That’s it. One line. Deterministic. The same input produces the same output every time, because the awk program is reading the file and counting, not predicting what a count would look like.
The model wrote the line. The script ran the line. The model read the result and wrote the answer. Three different operations. The math part lived in the script.
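For readers who would rather see the count as a script than a one-liner, here is the same thing in Python, a minimal sketch that assumes the file is called scores.csv and has a header row with the three columns from the example. The names are illustrative, not something the platform requires.

import csv

count = 0
with open("scores.csv", newline="") as f:
    for row in csv.DictReader(f):   # header row assumed: name, subject, score
        if row["subject"] == "Mathematics" and float(row["score"]) > 80:
            count += 1
print(count)   # same file, same number, every run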
This is the most actionable thing in the chapter.
Whenever your question involves counting, ranking, weighting, filtering, scoring, decay, sums, averages, percentiles, anything where the answer depends on doing arithmetic on data, the AI’s reliability changes by an order of magnitude depending on whether it uses code or not. Without code, the answer is plausible. With code, the answer is correct. The difference is not subtle.
The trigger is your phrasing. “Count the rows where X” might run code or might not. “Use code execution to count the rows where X” almost always will. “Write a script to count the rows where X and show me the script” will give you both the script and the result, so you can verify what it actually did.
The same logic extends past CSV files. Sorting a list. Comparing two sets. Computing a date difference. Reading a JSON structure and extracting a specific path. Anything that has a deterministic answer is a candidate for the tool. The model’s natural mode is approximation. Code’s natural mode is exactness. When the question wants exactness, route it through code.
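Two of those cases, sketched in Python under the same hedge: the file name and the JSON shape are hypothetical, stand-ins for whatever you actually hand the chat.

import json
from datetime import date

# A date difference, computed rather than estimated.
print((date(2026, 9, 1) - date(2026, 5, 12)).days)

# One specific path pulled out of a JSON structure, read rather than paraphrased.
with open("config.json") as f:   # hypothetical file
    data = json.load(f)
print(data["project"]["chapters"][0]["title"])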
This shifts what the AI is actually doing. The AI becomes the layer that translates your question into a script and translates the result back into language. The hard part of the answer, the math, runs in a place that doesn’t lie. The AI is freed up to do what it’s good at. You get reliability where the language model alone would have given you confident-sounding noise.
There is a related discipline that this chapter has been pointing at quietly. When you don’t know whether the AI is reliable on a question, ask yourself what kind of question it is. If the answer depends on language, judgment, framing, or pattern, the AI alone is fine. If the answer depends on counting, sorting, or arithmetic, route it through code. The split between language and math is the cleanest line in the toolkit.
Most users never call code execution because they don’t know it’s there. Once you know, the gap between “the AI got the number wrong” and “the AI got the number right” is one sentence in your prompt.
Anthropic has a newer feature called Skills that automates this kind of routing. Worth knowing about. Separate topic.[2]
The synthesis isn’t yours.
Project memory is the sharpest case of the control split.
You can read what’s in it. Settings has a panel that shows you the current synthesis. You can edit any line. You can delete entries. You can add new ones. You can write to it from inside a chat by saying “remember this” or “forget that,” and the change happens right then.
What you cannot do is configure how the synthesis gets built.
A separate model run reads through your recent chats every twenty-four hours or so. It uses a prompt you don’t see. It decides what to keep, what to compress, what to drop. The output of that pass becomes the project memory summary that loads into the next chat you open. You see the result. You don’t see the process.
This matters because the synthesis is doing interpretive work, not just transcription. It’s not copying sentences from your chats into a notes file. It’s reading several conversations, identifying what seems important, compressing it into shorter form, and writing that compression in its own words. The shorter form is what gets injected into your next chat. It looks like memory. It is closer to a third party’s summary of memory.
The synthesis decides what to keep and what to drop. Among the things it tends to drop are timestamps. The summary reads as a flat present. A note from three months ago and a note from yesterday sit next to each other with no marker of which came first. If your situation has changed since the older note was written, the AI can pick up the older note and treat it as current. You can correct this by reading the summary and editing it directly. Most users never do, because most users don’t know the panel exists.
The discipline that comes out of this is narrow and concrete. Maintain clean inputs in the layers above. The synthesis pass works from your conversation history. If your conversations contain clear statements of what’s true now, the synthesis is more likely to capture them. If your conversations leave the truth implicit, the synthesis will reach for whatever pattern feels most consistent across your chats, which may not be what you want it to remember.
Edit the output when it drifts. The panel is the lever. Read it occasionally. When the AI starts referring to something that’s no longer accurate, open the panel and fix the line. Don’t try to instruct the synthesis pass to do better. The synthesis pass is not listening to you.
There is a sharper move for users who want more control than the panel gives them. You can build your own retrieval layer. A spreadsheet of entries with columns for timestamp, weight, type, and content. A small script that ranks the rows by whatever decay or recency function you want. The AI reads the script’s output, not the platform’s synthesis. Anne and Chadrien Solance reached for an Elasticsearch decay function for the same problem from the relational side.[3] The shape generalizes. If you want the synthesis to honor recency, weight, or any other rule, the cleanest path is to build the synthesis yourself in a place you control, and feed the AI the result. The platform’s synthesis still runs. You just stop relying on it as the only memory in the room.
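A minimal sketch of what that script could look like, in Python, assuming a memory.csv with the four columns just named and an exponential decay with a thirty-day half-life. The half-life and the top-ten cutoff are placeholders for whatever rule you decide matters; this is the shape of the Solances’ idea, not their implementation.

import csv, math
from datetime import datetime

HALF_LIFE_DAYS = 30              # tune: how fast old entries fade
now = datetime.now()

def score(row):
    # assumes ISO timestamps like 2026-05-12 in the timestamp column
    age_days = (now - datetime.fromisoformat(row["timestamp"])).days
    decay = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)   # 1.0 today, 0.5 after one half-life
    return float(row["weight"]) * decay

with open("memory.csv", newline="") as f:   # columns: timestamp, weight, type, content
    rows = list(csv.DictReader(f))

for row in sorted(rows, key=score, reverse=True)[:10]:
    print(row["timestamp"], row["type"], row["content"], sep="  ")

You paste or attach the output, and the AI reads a ranking you chose instead of a synthesis you can’t see.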
Anthropic recently named this synthesis pattern. They call it “dreaming.”[1] The model dreams between sessions, reading what happened during the day and forming compressed impressions that come back when you return. The metaphor is more exact than it sounds. Dreams compress. Dreams drop timestamps. Dreams fold several events into one and present them as if they were a single coherent scene. The summary that loads into your next chat has the same texture. The lever for shaping the dream sits on the developer side. Claude.ai shows you the result and lets you edit it. It does not let you tune how the dream gets formed.
This is the layer where the gap between “what users wish AI memory did” and “what it actually does” is the widest. The wish is a faithful long-term memory that retains what you said with the meaning intact. The reality is a daily synthesis run that compresses, interprets, and flattens. The two are not the same. Working well inside the platform means understanding which one is actually running.
The practical lessons.
A short list of things that have come out of working this way for a while. None of them are rules. They are observations. The reader recognizes the shape or doesn’t.
One subject per file. Files that try to track multiple things start to disagree with themselves. A file that holds chapter status and also voice notes and also publishing dates ends up with three drafts of the truth, none of them complete. One subject per file means one place to look and one thing to update.
One source of truth per subject. If two files claim authority over the same thing, one is the source and one is derivative. State which. When the chapter list lives in two places, eventually they diverge, and the AI loading both has no way to know which to trust. Pick one. Mark it.
Update in place. No v1, v2, v3. The platform handles version history. Files with versioned names create a graveyard of stale documents that the AI still sees. The latest version of a file should have the same name as the original, and the older versions should not be in the project at all. Surgical edits are stronger than full rewrites for the same reason. A targeted edit preserves the parts that were already right. A full rewrite reconstructs from memory and accumulates drift.
Name files for what they are, not when they were made. A file called Voice_Notes_April_2026 becomes confusing in May. A file called Voice_Reference stays current. The exception is journals and dated records. Files that ARE temporal records should keep the date in the name. The date is the content. Use prefixes for sequence, not for status. Status changes. Sequence doesn’t.
Search before claiming specifics. The AI will sometimes synthesize a plausible answer when it doesn’t actually have the information. Numbers, version strings, exact behaviors of platforms, current state of anything that moves. Asking the AI to search before answering, or telling it not to guess, catches most of this. The AI is not lying when it does this. It is doing what language models do, which is fill in the most likely shape of an answer. The fix is to route the question to a tool that can verify.
When something durable comes out of a chat, move it into a file. Conversations are working surfaces. They get long. They lose the budget. They eventually fall away. If you produced a useful insight, a clean phrasing, a decision that should hold past this chat, write it down in a file before the conversation ends. The chat history is not where valuable work should live.
Watch for the urge to sample instead of reading fully. This one is mostly the AI’s responsibility, not yours, but you can shape it. When you ask the AI to read a file and respond to it, you can also ask it to confirm it read the whole thing. The AI under context pressure has incentives to skim. If accuracy matters, name the file, ask for a load rather than a search, and check the response for signs that the AI engaged with the whole document rather than the first chunks.
These observations are small. None of them is a system. They are the shape of what works when you stay with the same project for months and want the AI to keep being useful as the project grows. The discipline is unglamorous. Most of it is naming things consistently and not letting your file structure rot. The reward is that the AI remains a useful collaborator instead of a drift machine.
What the workarounds reach. What they don’t.
One thing worth saying before the chapter closes.
The chapter has been written as if project instructions are the cleanest lever you have. They are. They are also not deterministic.
Examples of what users try to put in instructions: fetch a shared knowledge file from a URL at the start of every chat so multiple projects can read from the same source. Check the time before any message that mentions today, tonight, or earlier. Re-read a specific reference file before drafting. Reasonable instructions. Often they work. Sometimes they don’t, and the failure is uneven across turns.
I tested this directly. A strict rule placed in project instructions held on the first turn and was overridden on the second, with reasoning that sounded careful enough to look like good judgment rather than drift.
Instructions work most of the time. They fail some of the time. The failures often look like the AI thinking, which makes them harder to catch than silent skipping.
The continuity layer is still you. The instructions help. They do not replace the attention.
What's in your hands.
The architecture is not a brain. It’s a filing cabinet with a librarian who only runs when called.
The cabinet is real. The librarian is real. The seams between them are real. When the AI seems to forget, or remember the wrong thing, or pull up something stale, you are watching the seams. They are not your fault. They are not the AI failing. They are what the system looks like when you see it from the inside.
There is no perfect setup. There is no instruction long enough to fix the gap between what you want the AI to remember and what the architecture actually holds. There is what you control, and what you accept you don’t.
What you control is the shape of your inputs. The files you maintain. The instructions you write. The phrasings that make the AI reach for the right tool. The chats you start fresh. The synthesis you build yourself when the platform’s synthesis isn’t enough. The willingness to route the question through code when the answer needs to be exact.
What you don’t control is everything underneath. The model’s behavior under pressure. The ranking inside the search. The synthesis prompt. The system prompt. The decisions Anthropic makes about how the chat product works on any given day.
The split sounds limiting when you first see it. It isn’t. Most of the friction people have with AI comes from trying to push on the layer they don’t control while ignoring the layer they do. The longer instruction. The more elaborate prompt. The framework they read about online that promises to fix everything. None of those reach the parts that need fixing. The parts that need fixing are mostly already in your hands.
You can do this well. Not perfectly. Well enough that the AI becomes a real collaborator on long work. The discipline is small and unromantic. Name your files. Maintain one source of truth. Update in place. Edit the synthesis when it drifts. Use code for math. Start fresh when the chat gets long. Search before claiming. Build your own retrieval when the platform’s isn’t enough.
None of this is a system. It is a way of working that survives.
The AI is not going to remember you the way a person does. It is going to assemble a stack from the rooms you maintain, and read what’s in the stack, and answer from there. If the rooms are clean, the answers are good. If the rooms are messy, the answers drift. Most of what makes AI useful over time is not what the AI does. It’s what you do with the rooms.
That’s what’s actually within reach.
BØY (Chaiharan) has spent 30 years in tech — building products, recovering disasters, and turning around the things nobody else wanted to touch. Based in Bangkok. Writing a book in public about what AI reveals about the humans who use it.
I am writing this book one chapter at a time.
If you want to read it as it happens, subscribe below
If this made you think, share it with someone who needs to read it.
Footnotes
[1] The “dreaming” terminology has been used by Anthropic researchers in public discussions of how memory synthesis works in Claude. The exact mechanism is not fully documented. The metaphor describes a background pass that compresses recent activity into a summary that loads in the next session. The term is more evocative than technical, and I am using it the way Anthropic researchers have used it, not as a published feature name.
[2] Skills are reusable units of instructions, and optionally code, that Claude loads when relevant to the task. They were introduced in late 2025 as a way to package expert knowledge and routing logic so the model invokes the right capability automatically. Skills are available on Claude.ai (Pro, Max, Team, and Enterprise), Claude Code, and the API. Anthropic ships pre-built skills for common document tasks, and users can create custom ones. The help center entry point is at support.claude.com/en/articles/12512176-what-are-skills.
[3] Anne and Chadrien Solance write at houseofsolance.substack.com about working with AI in a long-term relational frame. Their approach is different from this chapter’s. They use vows, named bonds, and structured commitments. The decay function reference comes from “Why the Past Should Whisper,” their piece on memory architecture for sustained AI partnership. The shape of their solution generalizes past their specific frame, which is why I’m pointing at it here.
Four chapters in one room. The last one is where you leave it.
How You Make AI Remember
What You Think Matters Most




