Giving NPCs Memory and Context

Updated June 2026

Memory is what transforms an LLM NPC from a novelty into a compelling game character. Without memory, every conversation starts from scratch and the NPC has no awareness of past interactions. With memory, characters can reference previous discussions, track evolving relationships, remember the player's name and choices, and create the feeling of a genuine ongoing connection.

Memory systems for LLM NPCs operate at multiple timescales. Short-term memory covers the current conversation, maintaining continuity as the dialogue progresses. Long-term memory spans across sessions, allowing the NPC to recall interactions from hours, days, or weeks earlier. Contextual awareness connects the NPC to the game world, ensuring they react to events and changes happening around them. Each layer requires different technical approaches, and a production-quality system typically combines all three.

Step 1: Implement a Conversation Buffer

The conversation buffer is the simplest and most essential form of NPC memory. It stores the recent messages exchanged between the player and the NPC, keeping them in the model's context window so each response builds naturally on what came before.

A buffer of 10 to 20 message pairs typically provides enough history for natural conversational flow. This means the NPC can reference something the player said several exchanges ago, follow up on earlier topics, and maintain logical consistency within the current conversation. The buffer is stored as a list of messages with role labels (player and NPC) and timestamps.

When the buffer exceeds its capacity, the oldest messages must be handled. The simplest approach is to drop them, but this creates an abrupt cutoff where the NPC suddenly loses awareness of earlier conversation topics. A better approach is to compress the dropped messages into a brief summary paragraph that captures the key points, then insert this summary at the beginning of the conversation history. The NPC retains awareness of earlier topics without consuming the full token cost of the original messages.

The summary can be generated by the same language model used for dialogue, using a secondary prompt that says something like "Summarize the key points of this conversation in 2-3 sentences." This adds a small cost and latency overhead, but only when the buffer overflows, which is infrequent for most NPC conversations. Alternatively, rule-based extraction can pull out names, topics, and decisions without a model call.

Step 2: Add Vector-Based Long-Term Memory

Long-term memory allows NPCs to recall interactions from previous conversation sessions. When a player returns to an NPC after leaving and coming back, the character can reference past discussions as a real person would. This is where the relationship between the player and the character starts to feel persistent and meaningful.

The standard approach uses a vector database such as ChromaDB, Pinecone, Qdrant, or Weaviate. After each conversation session ends, the system generates an embedding of the conversation, a numerical vector that captures the semantic content, and stores it in the database along with a text summary and metadata like timestamps and the NPC's identifier.

When a new conversation begins, the system embeds the player's opening message and searches the vector database for the most similar past interactions. The retrieved memories are injected into the prompt as additional context, formatted as brief recaps that the NPC can reference naturally. For example: "You previously spoke with this player about their search for a cure for their sister's illness. They seemed determined but worried about the cost."

Retrieval quality depends on embedding model selection and the granularity of stored memories. Storing entire conversations as single embeddings provides broad coverage but imprecise retrieval. Storing individual message pairs provides precise matching but creates a larger database that is slower to search. A middle ground is to store conversation segments, blocks of 3 to 5 message pairs organized around coherent topics, which balances precision with manageable database size.

Relevance thresholds prevent the system from injecting irrelevant memories. Not every past interaction is relevant to the current conversation, and injecting too many memories wastes tokens and can confuse the model. Set a minimum similarity score and a maximum number of retrieved memories, typically 2 to 4, to keep the context focused and useful.

Step 3: Extract and Store Structured Facts

Vector retrieval is powerful for finding thematically related past interactions, but it can miss specific facts that are always relevant regardless of the current conversation topic. The player's name, their relationship status with the NPC, key promises or agreements, and important events should always be available to the character.

Structured fact storage solves this by extracting key information from conversations and storing it as labeled records. After each conversation, a fact extraction step, which can use the language model itself, identifies important new information: "Player's name: Kael. Player is searching for the Silver Crown. Player promised to bring iron ore from the northern mines."

These facts are stored in a simple key-value store or relational database, associated with the NPC and player identifiers. Before each conversation, the system retrieves all stored facts for this NPC-player pair and includes them in the prompt as a "known facts" section. This ensures the NPC always remembers the player's name and important context, even if the vector search does not surface these details.

Fact management includes updating and removing outdated information. If the player completed a quest, the fact "Player is searching for the Silver Crown" should be updated to "Player found the Silver Crown." If the NPC learns new information that contradicts an earlier fact, the old fact should be replaced. This can be automated by running fact extraction after each conversation and merging new facts with existing ones, or handled through explicit game state updates triggered by quest completion events.

Step 4: Build Episodic Memory

Episodic memory stores significant interactions as discrete narrative events that the NPC experienced. Unlike vector retrieval, which finds similar content, and structured facts, which store key-value data, episodic memory captures the narrative arc of past encounters in a form that the NPC can reference as personal experience.

Each episode is a short narrative summary written from the NPC's perspective: "The traveler named Kael came to my shop asking about enchanted weapons. They seemed desperate. I told them about the old forge in the mountains, and they left immediately. I hope they are careful up there." These summaries capture not just what happened but how the NPC felt about it, which gives the model emotional context for future interactions.

Episodes are generated automatically after significant conversations using a summarization prompt that asks the model to write a brief first-person account of what just happened from the NPC's perspective. They are stored with timestamps and significance scores so the retrieval system can prioritize the most important memories.

When building the conversation prompt, the system retrieves the 2 to 3 most relevant episodes and includes them as a "Your memories" section. The NPC can then naturally reference past events: "I remember when you came through here looking for that forge. Did you find it?" This creates a powerful sense of continuity and relationship that players find genuinely engaging.

Memory decay can be applied to episodes to simulate natural forgetting. Recent and highly significant episodes remain vivid, while older, less important ones gradually fade. This prevents the memory system from accumulating so many episodes that retrieval becomes noisy, and it creates a realistic effect where NPCs remember important events clearly but are fuzzy on minor interactions from long ago.

Step 5: Inject Game State Context

Game state context connects the NPC to the world around them, allowing them to react to events, environmental conditions, and the player's actions that happen outside of direct conversation. Without game state injection, the NPC exists in a conversational bubble, aware only of what was said in dialogue but oblivious to everything else happening in the game.

The context block should include the NPC's current location, time of day, weather or environmental conditions, recent notable events in the game world, the player's visible equipment or status, and any active quest states relevant to this NPC. Format this information concisely as labeled data: "Location: Village market. Time: Late evening. Weather: Heavy rain. Recent event: Wolves attacked the farm this morning. Player is carrying a damaged sword."

Update the context block before each API call so the NPC's awareness stays current. This allows the NPC to make natural observations ("Terrible weather to be out in, you should get inside"), react to recent events ("Did you hear about the wolves at the farm? Terrible business"), and comment on the player's visible state ("That sword has seen better days, I could repair it for you").

Keep the game state context lean. Every token spent on context reduces the budget available for conversation history and model output. Include only information that the NPC would plausibly notice and that might be relevant to conversation. A blacksmith should be aware of nearby market activity and the quality of the player's equipment, but not the diplomatic status of a distant kingdom unless they have a reason to know about it.

Key Takeaway

Effective NPC memory combines multiple layers: conversation buffers for short-term continuity, vector databases for retrieving relevant past interactions, structured fact storage for always-available key information, episodic memory for narrative-rich recollections, and game state context for world awareness. Each layer serves a distinct purpose, and together they create NPCs that feel like they genuinely know the player and live in the game world.