
Jamba: Is this Architecture the Next Leap in AI's Evolutionary Ladder?

3/31/24

Editorial team at Bits with Brains

AI21 Labs has introduced Jamba, a new model that integrates the Mamba Structured State Space model (SSM) with the traditional Transformer architecture. This hybrid approach promises substantial gains in efficiency, throughput, and long-context performance, and could reshape how large language models are built.

Jamba stands out with its unique SSM-Transformer hybrid architecture. While traditional models like GPT and Gemini rely solely on the Transformer architecture, Jamba combines the strengths of Mamba SSM layers and Transformer attention layers. This integration results in a model that can handle significantly longer contexts (a 256K-token context window) while fitting up to 140K tokens, roughly 210 pages of text, on a single 80GB GPU.
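
To make the hybrid idea a little more concrete, here is a minimal sketch of how such a stack could be interleaved, with a few attention layers mixed into a majority of Mamba layers. The layer counts and the ratio below are illustrative assumptions, not Jamba's published configuration.

```python
# Illustrative sketch of an interleaved SSM-Transformer stack.
# The ratio of Mamba to attention layers is an assumption for demonstration,
# not Jamba's actual published layout.

def build_hybrid_stack(num_layers: int = 32, attention_every: int = 8) -> list[str]:
    """Return a layer plan that mixes occasional attention layers into a Mamba stack."""
    plan = []
    for i in range(num_layers):
        if (i + 1) % attention_every == 0:
            plan.append("attention")   # full self-attention layer
        else:
            plan.append("mamba")       # state-space (SSM) layer
    return plan

if __name__ == "__main__":
    print(build_hybrid_stack(16, 8))
    # ['mamba', 'mamba', ..., 'attention', 'mamba', ..., 'attention']
```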


The model's efficiency is further enhanced by its mixture-of-experts (MoE) layers, which allow it to draw on just 12B of its available 52B parameters during inference. This selective use of parameters enables Jamba to deliver three times the throughput on long contexts compared to similar-sized Transformer-based models.
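
As a rough illustration of how an MoE layer touches only a fraction of the model's weights, the sketch below routes each token to its top-k scoring experts and evaluates only those. The expert count, the value of k, and the routing details are assumptions for demonstration, not Jamba's actual configuration.

```python
import numpy as np

def moe_layer(x, experts, gate_weights, k=2):
    """Minimal mixture-of-experts sketch: route each token to its top-k experts.

    x            : (tokens, dim) input activations
    experts      : list of callables, each mapping (dim,) -> (dim,)
    gate_weights : (dim, num_experts) router projection
    k            : number of experts evaluated per token
    """
    logits = x @ gate_weights                          # (tokens, num_experts)
    top_k = np.argsort(logits, axis=-1)[:, -k:]        # indices of the best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top_k[t]
        # softmax over only the selected experts' logits
        w = np.exp(logits[t, sel] - logits[t, sel].max())
        w /= w.sum()
        out[t] = sum(wi * experts[e](x[t]) for wi, e in zip(w, sel))
    return out

# Toy usage: 4 experts, only 2 are evaluated per token.
dim, num_experts = 8, 4
rng = np.random.default_rng(0)
experts = [lambda v, W=rng.normal(size=(dim, dim)): v @ W for _ in range(num_experts)]
gate = rng.normal(size=(dim, num_experts))
tokens = rng.normal(size=(3, dim))
print(moe_layer(tokens, experts, gate).shape)  # (3, 8)
```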


Since their introduction in 2017, Transformers have been the backbone of modern AI, but they come with their own set of challenges. The attention mechanism, a defining feature of Transformers, has a compute cost that grows quadratically with sequence length, which slows throughput because each new token must attend to every token that came before it. In addition, the memory footprint of a Transformer's key/value cache grows linearly with context length, making it hard to serve long context windows or many parallel batches without extensive hardware resources.
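
A quick back-of-the-envelope calculation shows why memory becomes the bottleneck: the key/value cache a plain Transformer must keep grows linearly with context length. The model dimensions below are illustrative assumptions, not the configuration of Jamba or any specific model.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_value=2):
    """Approximate key/value cache size for one sequence in a plain Transformer.

    Two tensors (K and V) per layer, each of shape (seq_len, n_kv_heads, head_dim),
    stored at `bytes_per_value` (2 bytes for fp16/bf16). All dimensions are illustrative.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value

for ctx in (4_096, 32_768, 256_000):
    gib = kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>8} tokens -> ~{gib:,.1f} GiB of KV cache")
# With these toy dimensions the cache alone outgrows an 80GB GPU long before 256K tokens.
```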


Jamba's SSM component addresses these limitations by processing the sequence with a fixed-size recurrent state rather than attending over the full history, giving it a more memory-efficient way to handle large context windows. The Mamba SSM architecture, originally proposed by researchers at Carnegie Mellon and Princeton Universities, requires far less memory for long sequences and comes with a hardware-aware algorithm that keeps the model efficient on existing hardware.
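
The contrast with attention is easiest to see in the recurrence itself: a state space layer carries a fixed-size hidden state from token to token, so memory does not grow with context length. The sketch below is a plain linear SSM recurrence; it omits Mamba's input-dependent (selective) parameters and its hardware-aware scan.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    x : (seq_len, d_in)    input sequence
    A : (d_state, d_state) state transition
    B : (d_state, d_in)    input projection
    C : (d_out, d_state)   output projection
    The hidden state h stays the same size no matter how long the sequence is.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # one pass over the sequence, constant memory
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
seq = rng.normal(size=(10, 4))
y = ssm_scan(seq, A=0.9 * np.eye(8), B=rng.normal(size=(8, 4)), C=rng.normal(size=(2, 8)))
print(y.shape)  # (10, 2)
```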


Jamba's introduction is not just a technical milestone; it represents a shift towards more accessible, efficient, and powerful AI models. With its ability to deliver three times the throughput on long contexts and its impressive context window, Jamba sets a new benchmark for memory efficiency and context handling.


The model has already demonstrated remarkable results on various benchmarks, matching or outperforming state-of-the-art models in its size class across a wide range of tasks. Moreover, Jamba has been released with open weights under the Apache 2.0 license, fostering community engagement and further innovation.
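
Since the weights are openly available, here is a hedged sketch of how one might load them with the Hugging Face transformers library, assuming the checkpoint is published under an identifier such as ai21labs/Jamba-v0.1 and that your installed version supports the architecture (or accepts remote code).

```python
# Hedged sketch: loading the open-weight Jamba release with Hugging Face transformers.
# The repository name below is an assumption; adjust it to the actual published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"  # assumed repository identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # use the checkpoint's native precision (bf16/fp16)
    device_map="auto",        # spread layers across available devices if needed
    trust_remote_code=True,   # needed if Jamba support is not built into your transformers version
)

prompt = "In a few sentences, explain what a hybrid SSM-Transformer model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```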


For senior executives contemplating the integration of AI into their operations, Jamba's capabilities offer several enticing prospects. Its efficiency and throughput on long contexts could translate into cost savings and improved performance for AI-driven tasks. The ability to handle extensive context windows also opens up new possibilities for applications that require deep understanding and retention of large volumes of information.


However, note that at present Jamba is released as a research model without the necessary safeguards for commercial use. AI21 Labs plans to release a fine-tuned, safer version in the coming weeks, which will likely set new standards for AI model performance and application.


Sources:

[1] https://siliconangle.com/2024/03/28/ai21-labs-jamba-infuses-mamba-bring-context-transformer-based-llms/

[2] https://www.prnewswire.com/news-releases/unveiling-jamba-ai21s-groundbreaking-hybrid-ssm-transformer-open-source-model-302102779.html

[3] https://www.reddit.com/r/MachineLearning/comments/190q1vb/d_so_mamba_vs_transformers_is_the_hype_real/

[4] https://news.ycombinator.com/item?id=39853958

[5] https://venturebeat.com/ai/ai21-labs-juices-up-gen-ai-transformers-with-jamba/

[6] https://www.maginative.com/article/ai21-labs-unveils-jamba-the-first-production-grade-mamba-based-ai-model/

[7] https://www.marktechpost.com/2024/03/28/ai21-labs-breaks-new-ground-with-jamba-the-pioneering-hybrid-ssm-transformer-large-language-model/

[8] https://lazyprogrammer.me/mamba-transformer-alternative-the-future-of-llms-and-chatgpt/

[9] https://www.harvard.edu/kempner-institute/2024/02/05/repeat-after-me-transformers-are-better-than-state-space-models-at-copying/

[10] https://www.ai21.com/blog/announcing-jamba

[11] https://www.ai21.com/blog/llm-product-development

[12] https://techcrunch.com/2024/03/28/ai21-labs-new-text-generating-ai-model-is-more-efficient-than-most/

[13] https://www.linkedin.com/posts/philipp-schmid-a6a2bb196_jamba-released-ai21-labs-just-released-the-activity-7179121093482315776-xbmX

