Magic is a San Francisco-based startup focused on developing AI assistants for software development tasks. In their recent research update, Magic shared that they have achieved a major milestone by training their first model to handle a context window of 100 million tokens. Let’s take a deeper look at what this means and how Magic achieved this frontier-level breakthrough.
Magic's First 100 Million Token Model: LTM-2-mini
Magic reveals that they have successfully trained their first 100 million token context model, called LTM-2-mini. That is roughly the equivalent of 10 million lines of code or about 750 novels' worth of text. A window that size could hold an entire codebase, along with its documentation and dependencies, in a single prompt, which is what makes such a large context so promising for software development tasks.
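As a rough sanity check on those equivalences, the sketch below converts a 100 million token budget into lines of code and novels using commonly cited rules of thumb (about 10 tokens per line of code, roughly 0.75 words per token, and around 100,000 words per novel). These ratios are assumptions for illustration, not figures from Magic.

```python
# Rough sanity check on the context-size equivalences quoted above.
# The per-line and per-novel ratios are common rules of thumb, not Magic's numbers.

CONTEXT_TOKENS = 100_000_000   # LTM-2-mini's context window
TOKENS_PER_LOC = 10            # assumed average tokens per line of code
WORDS_PER_TOKEN = 0.75         # assumed average words per token
WORDS_PER_NOVEL = 100_000      # assumed average novel length in words

lines_of_code = CONTEXT_TOKENS / TOKENS_PER_LOC
novels = CONTEXT_TOKENS * WORDS_PER_TOKEN / WORDS_PER_NOVEL

print(f"~{lines_of_code:,.0f} lines of code")  # ~10,000,000
print(f"~{novels:,.0f} novels")                # ~750
```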
Performance Evaluation of LTM-2-mini: HashHop
Current methods for evaluating long-context models, such as “Needle in a Haystack”, are imperfect because the planted fact stands out semantically from the surrounding text, effectively hinting at what the model should retrieve. Magic has created a new benchmark called “HashHop” that uses pairs of random hashes instead, removing any semantic clues and truly testing a model's ability to store and recall information from anywhere in its context.
HashHop evaluates both single-hop recall and multi-hop recall, where the model must follow a chain of hash pairs, with and without writing out the intermediate steps as a chain of thought. Magic shares preliminary results showing that LTM-2-mini can chain inferences over hashes and complete simple code generation tasks using an in-context framework.
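To make the setup concrete, here is a minimal sketch of how a HashHop-style prompt could be constructed. Magic has not published the benchmark's exact format, so the “A -> B” pair notation, the build_hashhop_prompt helper, and the question wording are illustrative assumptions.

```python
# Minimal sketch of a HashHop-style prompt, assuming a plain "A -> B" pair format.
# Magic's actual benchmark format and hash scheme may differ.
import hashlib
import random

def rand_hash(rng: random.Random) -> str:
    """Short random hex string with no semantic content (Python 3.9+ for randbytes)."""
    return hashlib.sha256(rng.randbytes(16)).hexdigest()[:16]

def build_hashhop_prompt(num_distractors: int = 1000, hops: int = 3, seed: int = 0):
    rng = random.Random(seed)
    # One multi-hop chain: h0 -> h1 -> ... -> h_hops
    chain = [rand_hash(rng) for _ in range(hops + 1)]
    pairs = list(zip(chain, chain[1:]))
    # Unrelated distractor pairs the model must ignore
    pairs += [(rand_hash(rng), rand_hash(rng)) for _ in range(num_distractors)]
    rng.shuffle(pairs)  # bury the chain somewhere in the long context
    context = "\n".join(f"{a} -> {b}" for a, b in pairs)
    question = f"Starting from {chain[0]}, follow the arrows {hops} times. What is the final hash?"
    return context, question, chain[-1]

context, question, answer = build_hashhop_prompt()
```

Asking only for the final hash, as above, corresponds to the harder variant where the model must hop without writing out intermediate steps; asking it to list each intermediate hash corresponds to the chain-of-thought variant.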
Efficiency Gains of Magic’s Architecture
Magic notes that, for each decoded token, the sequence-dimension algorithm used in LTM-2-mini is roughly 1,000 times cheaper to compute than the attention mechanism of Llama 3.1 405B at a 100 million token context window. Their approach also requires vastly less memory, allowing them to serve this huge context on a small fraction of the hardware that conventional attention would demand.
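To see why conventional attention does not simply scale to this regime, the back-of-envelope sketch below estimates the key-value cache needed to serve a single 100M-token context with a Llama-3.1-405B-class transformer (126 layers, 8 grouped-query KV heads, head dimension 128, 16-bit values). The result, tens of terabytes spread across hundreds of H100s for one user, is an order-of-magnitude illustration rather than Magic's exact figure.

```python
# Back-of-envelope KV-cache estimate for serving one 100M-token context
# with conventional attention. Hyperparameters approximate Llama 3.1 405B;
# treat the output as an order-of-magnitude illustration only.

SEQ_LEN = 100_000_000      # context length in tokens
NUM_LAYERS = 126           # transformer layers
NUM_KV_HEADS = 8           # grouped-query attention KV heads
HEAD_DIM = 128             # dimension per head
BYTES_PER_VALUE = 2        # fp16/bf16
H100_MEMORY_BYTES = 80 * 1024**3  # 80 GB per GPU

bytes_per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # K and V
total_bytes = bytes_per_token * SEQ_LEN

print(f"KV cache per token: {bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at 100M tokens: {total_bytes / 1024**4:.1f} TiB")
print(f"H100s needed just to hold it: {total_bytes / H100_MEMORY_BYTES:.0f}")
```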
Building Supercomputers with Google Cloud
To train their larger LTM models, Magic has partnered with Google Cloud to build two new supercomputers: Magic-G4, built on NVIDIA H100 GPUs, and Magic-G5, built on NVIDIA's new GB200 NVL72 systems. Google Cloud will provide the infrastructure and tools to train models at scale. Magic believes these supercomputers can enable the next breakthroughs in AI by training models at a scale that was not previously possible.
Roadmap to Larger Models and Applications
While LTM-2-mini showed promising results on hash recall, Magic acknowledges it is still limited on complex tasks like code generation. It does, however, prove their approach is viable. Their new supercomputers will enable training far larger models that can tackle nuanced software development problems.
Going Forward
Magic recently raised $320M in funding from investors including Eric Schmidt, Jane Street, and Sequoia Capital, bringing their total funding to date to $465M. The new funds will support scaling their AI infrastructure through the Google Cloud partnership and advancing their safety-focused research.
The company is hiring across various engineering and research roles to expand their 23-person team. This includes supercomputing experts to manage their growing GPU clusters as well as key roles in security, distributed systems and more. The ultimate goal is to deploy AI assistants capable of massive contextual understanding to transform software development.
For more technical details, visit Magic's blog.