Bits With Brains
Curated AI News for Decision-Makers
What Every Senior Decision-Maker Needs to Know About AI and its Impact
Solving the Context Window Problem: Another Step towards Artificial General Intelligence
10/29/23
Editorial team at Bits with Brains
Artificial intelligence has made tremendous strides in recent years, but one key limitation is the inability to effectively retain long-term memory and context. Most AI systems today, including large language models like GPT-4, have limited memory and can only reason within a small context window.
The context window sets the upper limit on the complexity of a problem that can be solved using an LLM, as well as the duration of a chat before the LLM starts to lose track of the conversation. Most LLM context windows range from roughly 2k-4k tokens, and up to 8k-32k for GPT-4.
Anthropic’s Claude model has the largest context window to date at 100k tokens. A 100k-token context translates into approximately 75k words, and while this is adequate for many applications, it falls far short when trying to parse, for example, a corporate database.
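The token-to-word conversion above uses the common rule of thumb that one token corresponds to roughly 0.75 English words (the exact ratio varies by tokenizer and text). A quick sketch of the arithmetic:

```python
# Rule of thumb: 1 token ~= 0.75 English words (varies by tokenizer and text).
WORDS_PER_TOKEN = 0.75

def approx_words(context_tokens: int) -> int:
    """Estimate how many English words fit in a given context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

print(approx_words(100_000))  # ~75,000 words for a 100k-token window
```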
There are two obvious ways of addressing this limitation. The first is to increase the size of the context window by implementing some form of long-term memory. The second is to utilize the available context window more effectively.
Increasing the Context Window via Long-Term Memory
New research from UC Berkeley proposes a system that mimics how computers manage memory to give AI expansive, long-term memory. The key innovation is treating an AI system's limited context window as the equivalent of a computer's RAM, while adding an unlimited external database as the hard drive equivalent. The AI then uses functions to move information in and out of its working memory as needed, just as an operating system swaps data between RAM and hard drives.
Called MemGPT, this virtual memory system could enable AI agents to remember details across thousands or even millions of conversational turns or text documents, massively improving their reasoning abilities. The researchers demonstrated the system’s capabilities on two essential tasks: multi-session dialog and document analysis. For both applications, the system maintained high accuracy even when drawing on hundreds of documents, while conventional models' performance degraded rapidly.
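The RAM-versus-hard-drive analogy can be sketched in a few lines of Python. This is a minimal illustration of the paging idea only, not MemGPT's actual implementation; the class and method names here are invented for the example:

```python
from collections import OrderedDict

class VirtualContext:
    """Toy sketch of MemGPT-style memory paging: a small 'working context'
    (the RAM analogue) backed by an unbounded archive (the disk analogue)."""

    def __init__(self, window_size: int):
        self.window_size = window_size   # max facts held in working context
        self.working = OrderedDict()     # in-context memory, LRU-ordered
        self.archive = {}                # external store, effectively unlimited

    def store(self, key: str, fact: str) -> None:
        """Write a fact into working context, evicting the least recently
        used item to the archive when the window is full."""
        self.working[key] = fact
        self.working.move_to_end(key)
        while len(self.working) > self.window_size:
            old_key, old_fact = self.working.popitem(last=False)
            self.archive[old_key] = old_fact   # page out to external storage

    def recall(self, key: str):
        """Read a fact, paging it back into working context if needed."""
        if key not in self.working and key in self.archive:
            self.store(key, self.archive.pop(key))   # page back in
        if key in self.working:
            self.working.move_to_end(key)
            return self.working[key]
        return None

ctx = VirtualContext(window_size=2)
ctx.store("name", "user is called Ada")
ctx.store("city", "user lives in Lyon")
ctx.store("job", "user is an engineer")   # evicts "name" to the archive
print(ctx.recall("name"))                  # paged back in from the archive
```

In the real system, the LLM itself decides when to issue these store and recall operations via function calls, rather than relying on a fixed eviction policy.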
MemGPT could represent an important step towards human-like memory for AI, as current models tend to "forget" earlier information, making consistency challenging in long conversations. MemGPT agents can explicitly store facts, retrieve memories, and correct outdated information, behaving more like humans do. This could significantly improve performance on real-world applications like customer service chatbots that require long, coherent dialog.
One of the best things is that the UC Berkeley team has open-sourced MemGPT to allow others to build on their work. Next steps include supporting more diverse AI architectures, as well as integration with multi-agent frameworks like Microsoft's AutoGen.
While there are still challenges to overcome, such as the computational overhead of MemGPT's memory management, the foundations are set for AI systems that learn and reason well beyond current models.
Sparse Priming Representations: Better Use of Available Memory
Sparse Priming Representations (SPRs) are a method of priming language models to think in a certain way, similar to how the human brain works. The concept is based on the idea of semantic associations, where a few words or phrases can trigger a cascade of related ideas, facts, and images.
This is akin to how the human brain forms rich mental models, and just a few related words can activate these models. For example, the phrase “Golden Age of Rome” conjures history, art, architecture, and philosophy.
Large language models like GPT-4 appear to form similar internal models. They exhibit capabilities such as theory of mind, planning, reasoning, and logic. Though their logic is sometimes flawed, so is human reasoning.
The power of SPRs lies in their ability to remind these models of information they already know, using shorthand notes. Advocates believe this is a more efficient approach compared to models like MemGPT, which involve complex loops to distill and retrieve information.
SPRs rely on making use of an important large language model feature: latent space. Latent space refers to the embedded knowledge, abilities, and concepts in these models, which can be activated with the correct series of words as inputs. This creates a useful internal state in the neural network, like how the right shorthand cues can prime a human mind to think in a certain way.
SPRs compress huge blocks of information into succinct statements, assertions, associations, concepts, analogies, and metaphors. The idea is to capture as much conceptually as possible but with as few words as possible. For instance, a large amount of text can be distilled into a list of statements, which can then be used to prime the model. This is essentially a form of semantic compression. The model can then reconstruct the original idea from these statements.
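A minimal sketch of how an SPR might be assembled into a priming prompt follows. The compressed statements and the "Project Atlas" scenario are invented for illustration; in practice the compression itself is typically produced by asking an LLM to distill the source text:

```python
# Sketch: using a Sparse Priming Representation as a compact prompt prefix,
# instead of pasting thousands of tokens of source text into the context.

spr = [
    "Project Atlas: internal search tool.",                      # hypothetical
    "Indexes wikis, tickets, chat logs nightly.",
    "Ranking: keyword first pass, re-rank with embeddings.",
    "Constraint: results must respect per-team access controls.",
    "Open issue: stale index after document renames.",
]

def build_priming_prompt(statements: list[str]) -> str:
    """Assemble an SPR into a prompt that primes the model's latent space,
    asking it to reconstruct the full context from the compressed notes."""
    lines = "\n".join(f"- {s}" for s in statements)
    return (
        "You will be primed with compressed notes. Reconstruct the full "
        "context they imply, then answer questions about it.\n"
        f"NOTES:\n{lines}"
    )

prompt = build_priming_prompt(spr)
print(prompt)
```

The token savings come from the fact that the model reconstructs the surrounding detail from its latent space, so only the distilled statements need to occupy the context window.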
While MemGPT and similar models are the focus of much research right now, SPRs may offer an equally powerful and efficient solution. They leverage the associative nature of large language models, providing a token efficient way of conveying complex concepts. As large language models continue to evolve, it's clear that SPRs are a promising direction to explore.
If long-term memory and more efficient memory usage are the missing pieces that are holding AI back from more general intelligence, these two approaches could provide one of the final pushes we need.