
DeepSeek-V3 on M4 Mac: Blazing Fast Inference on Apple Silicon

We just witnessed something incredible: the largest open-source language model flexing its muscles on Apple Silicon. We’re talking about the massive DeepSeek-V3 on M4 Mac, specifically the 671 billion parameter model running on a cluster of 8 M4 Pro Mac Minis with 64GB of RAM each – that’s a whopping 512GB of combined memory!

This isn’t just about bragging rights. It opens up new possibilities for researchers, developers, and anyone interested in pushing the boundaries of AI. Let’s dive into the details and see why DeepSeek-V3 on M4 Mac is such a big deal.

The Results Are In: DeepSeek V3 671B Performance on the M4 Mac Mini Cluster

You want the numbers, right? Here’s how the DeepSeek-V3 on M4 Mac cluster performed, compared to some other well-known models:

Benchmark results: DeepSeek-V3 on Apple’s M4 Mac Mini cluster

The immediate takeaway? DeepSeek-V3 on M4 Mac, despite its immense size, isn’t just running – it’s running surprisingly well. The Time-To-First-Token (TTFT) is impressively low, and the Tokens-Per-Second (TPS) is solid.

But the real head-turner is this: DeepSeek-V3, with 671 billion parameters, runs faster than Llama 70B on the same M4 Mac setup. Yes, you read that correctly. Let’s break down why.

Why So Fast? Understanding the DeepSeek-V3 on M4 Mac Performance Advantage

To understand this surprising result, we need to take a step back and look at how Large Language Models (LLMs) work during inference – the process of generating text. Think of it as the model “thinking” and producing its output.

While we’re excited to share these initial findings about DeepSeek-V3 on M4 Mac, the full story of how the software in this setup distributes the model across machines is a bit more complex. For now, let’s focus on the big picture of why DeepSeek-V3 on M4 Mac performs so well.

LLM Inference Explained: A Systems Perspective on Running Large Models

Imagine an LLM as a giant recipe book filled with billions of ingredients (parameters). When you ask it a question, it needs to find the right ingredients and combine them in the correct order to give you an answer (generated text).

At its heart, an LLM is a massive collection of these parameters, billions of numbers that determine how the model behaves. LLMs are “autoregressive,” meaning they generate text one token (a word or part of a word) at a time, and each token depends on the previous ones.

For each token generated, the model performs a lot of calculations using these parameters. These calculations happen on powerful processors, typically GPUs, which are designed for this kind of heavy lifting.

Here’s the crucial point: for standard (dense) LLMs, generating each token requires accessing all of those billions of parameters. Think of it as needing to flip through the entire recipe book for each word you write.

So, what happens for each token?

  1. Load the Model Parameters: The model’s instructions (parameters) need to be loaded onto the processor.
  2. Perform Calculations: The processor performs mathematical operations using these parameters.
  3. Sample the Next Token: Based on the calculations, the model chooses the next word or part of a word.
  4. Repeat: This process repeats, feeding the newly generated token back into the model to generate the next one.

Steps 1 and 2 are the most time-consuming, so let’s focus on them. How quickly we can load the parameters and perform calculations determines how fast the model can generate text.
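
To make that loop concrete, here is a minimal sketch of autoregressive decoding in plain Python. The `model_forward` and `sample_next` functions are hypothetical placeholders standing in for the forward pass and the sampling step, not part of any specific library:

```python
# Minimal sketch of autoregressive decoding (illustrative only).
# `model_forward` and `sample_next` are hypothetical placeholders.

def generate(model_forward, sample_next, prompt_tokens, max_new_tokens=64, eos_id=None):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Steps 1 & 2: the forward pass streams the model's parameters
        # through the processor and produces scores (logits) for the next token.
        logits = model_forward(tokens)
        # Step 3: sample the next token from those scores.
        next_token = sample_next(logits)
        # Step 4: feed it back in and repeat.
        tokens.append(next_token)
        if eos_id is not None and next_token == eos_id:
            break
    return tokens
```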

Memory Bandwidth vs. Compute: The Bottlenecks in LLM Inference

There are two main things that can slow down this process:

  • Memory Bandwidth: How fast can we move those billions of parameters from memory to the processor? Think of this as the width of the highway delivering the ingredients. If the highway is narrow, it takes longer to get everything there.
  • Compute: How fast can the processor perform the calculations once it has the parameters? This is like how quickly the chef can chop, mix, and cook the ingredients.

Whether inference is limited by memory bandwidth or compute depends on the relationship between these two factors. We can express this relationship using a ratio: C / M

Where:

  • C (Compute Rate): How many parameters can the processor work on per second? This is calculated as: FLOPS/second ÷ (FLOPS/parameter)
    • FLOPS/second: The total number of floating-point operations the processor can do per second (its raw processing power).
    • FLOPS/parameter: The number of floating-point operations needed for each parameter.
  • M (Memory Transfer Rate): How many parameters can we move to the processor per second? This is calculated as: Memory bandwidth ÷ (Bytes/parameter)
    • Memory bandwidth: How much data can be moved from memory to the processor each second.
    • Bytes/parameter: How much memory each parameter takes up (this depends on the model’s precision, like 4-bit in the example).

If C / M > 1, we’re limited by memory bandwidth – the highway is too narrow. If C / M < 1, we’re limited by compute – the chef isn’t fast enough, even with all the ingredients ready.
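
As a back-of-envelope illustration, the sketch below plugs in example figures, assuming 2 FLOPs per parameter per token and 4-bit weights (0.5 bytes per parameter), and using the M4 Max numbers quoted later in this post:

```python
def bottleneck(flops_per_second, bandwidth_bytes_per_second,
               flops_per_parameter=2, bytes_per_parameter=0.5):
    """Return the C/M ratio and which resource limits inference."""
    c = flops_per_second / flops_per_parameter             # parameters computed per second
    m = bandwidth_bytes_per_second / bytes_per_parameter   # parameters streamed per second
    ratio = c / m
    return ratio, ("memory bandwidth" if ratio > 1 else "compute")

# Illustrative numbers: ~34 TFLOPS of FP16 compute and 546 GB/s of bandwidth.
ratio, limit = bottleneck(34e12, 546e9)
print(f"C/M = {ratio:.1f}, limited by {limit}")  # C/M ~ 15.6 -> memory bound
```

In other words, at these figures the processor could chew through parameters far faster than memory can deliver them.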

Interestingly, this relationship changes depending on how many requests the model is processing at once (the batch size). For generating one sequence at a time (batch size = 1), like in the tests with DeepSeek-V3 on M4 Mac, inference is often limited by memory bandwidth.

Apple Silicon’s Secret Weapon: Unified Memory and High Bandwidth for DeepSeek-V3 on M4 Mac

This is where Apple Silicon shines. It’s particularly good at running LLMs with a batch size of 1, like when you’re having a conversation with an AI. Why? Two key reasons:

  1. Unified Memory: Apple Silicon uses a “unified memory” architecture: the CPU, GPU, and memory sit on the same package with incredibly fast connections between them. This lets the GPU address the machine’s entire memory pool (up to 192GB on the highest-end chips) at very high speeds. It’s like having all the ingredients right next to the chef.
  2. High Memory Bandwidth to FLOPS Ratio: The ratio of memory bandwidth to processing power is very high in Apple Silicon, especially in the latest M4 chips. For example, the M4 Max has a memory bandwidth of 546GB/s and roughly 34 TFLOPS of processing power (FP16). This translates to a ratio of approximately 8.02. In comparison, an NVIDIA RTX 4090 has a ratio of around 1.52.

This means Apple Silicon is exceptionally good at quickly feeding the processor with the data it needs for single requests, making it surprisingly efficient for running large models like DeepSeek-V3 on M4 Mac when you’re generating one response at a time.

Mixture-of-Experts (MoE) Models: The Key to DeepSeek V3’s Efficiency

Now, let’s bring Mixture-of-Experts (MoE) models into the picture. This is the architecture used by DeepSeek V3 671B, and it’s crucial to understanding its performance on the DeepSeek-V3 on M4 Mac cluster.

Think of an MoE model as having multiple specialized “expert” models within it. For each input, only a small subset of these experts is activated to process the information.

So, while DeepSeek-V3 on M4 Mac has a massive 671 billion parameters, it doesn’t use all of them for every token generation. It only activates a smaller group of experts. However, the catch is, the model needs to have all the parameters readily available because it doesn’t know in advance which experts will be needed.
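
As a toy illustration of that idea (a generic top-k router in NumPy, not DeepSeek’s actual architecture): every expert’s weights must be resident in memory, but only the few experts selected for a given token do any work.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2          # toy sizes, purely illustrative

# All experts' weights are kept in memory...
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))

def moe_layer(x):
    scores = x @ router                      # router scores each expert for this token
    top = np.argsort(scores)[-TOP_K:]        # pick the top-k experts
    weights = np.exp(scores[top]) / np.exp(scores[top]).sum()
    # ...but only the selected experts' parameters are actually used.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(D)).shape)   # (16,)
```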

DeepSeek-V3 on M4 Mac: Why This Setup Works So Well for MoE Models

This is where the combination of DeepSeek-V3 on M4 Mac really shines:

  • Ample Memory: The 512GB of combined memory in the M4 Mac Mini cluster allows us to load all 671 billion parameters of DeepSeek V3. All the “experts” are ready and waiting.
  • Efficient Inference: Because Apple Silicon is so good at quickly accessing data, the model can efficiently load the parameters needed for the activated experts.

In the case of DeepSeek V3, while it has 671 billion parameters, it might only use around 37 billion for generating a single token. Compare this to a dense model like Llama 70B, which uses all 70 billion parameters for every token. As long as we can keep all the parameters in memory, DeepSeek-V3 on M4 Mac can generate a single response faster because it’s only doing calculations on a smaller subset of its total parameters.
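
To put rough numbers on that comparison, here is a back-of-envelope sketch of the memory-bandwidth ceiling on decode speed. It assumes roughly 4-bit weights (0.5 bytes per parameter) and ignores KV-cache traffic, activations, and networking overhead; the per-machine bandwidth figure is an assumption for illustration, not a measured result:

```python
def max_tokens_per_second(active_params, bandwidth_bytes_per_s, bytes_per_param=0.5):
    """Upper bound on decode speed when each token must stream all
    active parameters from memory once (the memory-bound regime)."""
    return bandwidth_bytes_per_s / (active_params * bytes_per_param)

BANDWIDTH = 273e9  # assumed ~273 GB/s for a single M4 Pro Mac Mini (illustrative)

print(max_tokens_per_second(37e9, BANDWIDTH))  # MoE, ~37B active params: ~14.8 tok/s ceiling
print(max_tokens_per_second(70e9, BANDWIDTH))  # dense 70B params: ~7.8 tok/s ceiling
```

The absolute numbers are crude, but the ratio is the point: reading roughly half as many parameters per token roughly doubles the memory-bound ceiling.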

Exploring Key Considerations: Power, Cost, and Alternative Setups for Running DeepSeek-V3

Power Consumption:

The impressive performance of DeepSeek-V3 on an M4 Mac cluster naturally leads to some important questions about the practicalities of running such powerful models. One immediate consideration is power consumption. Running large AI models can be energy-intensive, and understanding the power requirements is crucial for planning and budgeting.

In this setup, the cluster of eight Mac Minis has a maximum power draw of around 1120W, with an idle draw of about 40W. Of course, this doesn’t account for the power needed for networking or any client devices involved. Notably, this level of power consumption is relatively modest compared to the high-end GPU-based systems often used for similar tasks.
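
As a rough illustration of what that draw means on a power bill (the electricity rate and duty cycle below are assumed examples, not measurements):

```python
def monthly_energy_cost(avg_watts, price_per_kwh, hours=24 * 30):
    """Energy used and cost for a month of continuous operation."""
    kwh = avg_watts / 1000 * hours
    return kwh, kwh * price_per_kwh

# Assumed: the cluster's ~1120W maximum draw, running around the clock,
# at an example rate of $0.15/kWh.
kwh, cost = monthly_energy_cost(1120, 0.15)
print(f"{kwh:.0f} kWh, about ${cost:.0f} per month")  # ~806 kWh, ~$121
```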

Efficiency:

Another key area of interest is the cost and performance comparison with alternative hardware. Many are curious about how a Mac cluster stacks up against GPU-based setups, like those using NVIDIA 3090s. For models that fit comfortably within the VRAM of a 3090, those setups can be very effective. However, for larger models like DeepSeek V3, which exceed the memory capacity of a single 3090, the Mac cluster demonstrates a compelling level of performance.

The discussion around cost-effectiveness is multi-faceted. While the initial investment for a Mac cluster needs to be considered, the second-hand market for GPUs like the 3090 offers another angle, providing potentially attractive price-to-performance ratios. Furthermore, the ongoing cost of electricity plays a significant role, especially in regions with higher energy prices. For individual users or smaller-scale deployments, the lower power consumption of the Mac setup can be a considerable advantage over time. However, it’s also important to acknowledge that the physical design and form factor of Mac Minis might not be ideal for traditional data center environments.

Older But Beefier Hardware:

Beyond dedicated clusters, there’s also the question of utilizing existing or more affordable hardware. The possibility of running large language models like DeepSeek V3 on used servers with substantial amounts of RAM is an interesting one. Systems with hundreds of gigabytes of RAM, paired with older server CPUs, could potentially house these large models.

Old server system with 256GB of RAM

The primary limitation in such setups would likely be the processing power of the CPU. While the model might fit in memory, the speed at which computations can be performed would likely be significantly lower compared to GPU-accelerated systems. This could result in slower token generation speeds. However, for specific use cases where real-time responsiveness isn’t paramount, exploring these more budget-friendly hardware options could be a worthwhile endeavor.

Conclusion: The Future of LLM Inference on Apple Silicon with DeepSeek-V3 on M4 Mac

Running DeepSeek-V3 on M4 Mac is more than just a technical achievement. It signifies a shift in how we can approach large language models. The unified memory architecture and the impressive memory bandwidth of Apple Silicon make it a surprisingly capable platform for running massive MoE models.

While GPU clusters remain powerful, the DeepSeek-V3 on M4 Mac example highlights the potential of Apple’s hardware, especially for research, development, and potentially even edge deployments where power efficiency and ease of use are important.

This opens the door for more individuals and smaller teams to experiment with cutting-edge AI models without requiring massive and power-hungry server infrastructure.

Latest From Us


Cohere AI Drops Command A, The AI That’s Smarter, Faster and More Affordable

In today’s fast-moving world of AI, a powerful AI system needs tons of expensive computer equipment to run properly. Companies have to spend a fortune on hardware just to keep these advanced AI systems running, so they are always looking for technology that works great without breaking the bank: AI that can do impressive things with minimal computing needs. This balance is tricky to get right. But what if there were an AI model that is just as smart and fast but needs way less computing power? That’s exactly what Cohere AI has accomplished with its newest model, Command A.

Meet Command A by Cohere AI

Command A is the newest and most impressive AI model from Cohere AI. It is super smart, really fast, and more secure than earlier versions like Command R and Command R+. What makes it special is that it performs similarly to, or even better than, famous AI models like GPT-4o and DeepSeek-V3 but doesn’t need nearly as much computing power. This gives businesses powerful AI without the huge electric bills and expensive computer equipment.

Key Features of Command A for Enterprises

This model is designed with businesses in mind. It has several features that make it perfect for companies:

1. Command A’s Chat Capabilities

Out of the box, Command A works as a conversational AI with interactive behavior. This setup is perfect for chatbots and other dialogue applications. The model takes text inputs and creates text outputs using an optimized architecture. It has two safety modes: contextual mode allows wider-ranging interactions while maintaining core protections, and strict mode avoids all sensitive topics.

2. 256k Context Window

Under the hood, it has some impressive specs. It has 111 billion parameters and can handle really long texts – up to 256,000 tokens at once. Most competing AIs can only handle half that amount.

3. Advanced RAG Capabilities

Command A comes with “retrieval-augmented generation” (RAG). It can look up information and include references for its answers. People who tested it found it better than GPT-4o at this task. Its answers were smoother, more accurate, and more useful.

4. Multilingual Excellence

Global companies need AI that works in many languages. Command A supports 23 languages spoken by most of the world’s population. It consistently answers in any of the 23 languages you ask for. In tests, people preferred it over DeepSeek-V3 across most languages for business tasks.

5. Enhanced Code Generation Capabilities

Command A is much better at coding tasks than previous models, outperforming similar-sized models on business-relevant tasks like SQL generation and code translation. Users can ask for code snippets, explanations, or rewrites and get better results by using certain settings for code-related requests.

6. Enterprise-Grade Security

Command A has strong security features to protect sensitive business information. It can also connect with other business tools and apps, making it a versatile addition to existing systems.

7. Agentic Tool Use

The real magic happens when Command A powers AI agents within a company. It works seamlessly with North, Cohere’s platform for secure AI agents. This lets businesses build custom AI helpers that can work inside their secure systems, connecting to customer databases, inventory systems, and search tools.

How Well Command A Performs

When tested side-by-side with the biggest names in AI, like GPT-4o and DeepSeek-V3, Command A holds its own and often comes out on top. It performed better on business tasks, science problems, and computer coding challenges. 


The model matches or beats the bigger and slower AI models while working much more efficiently.

  • Command A processes information at up to 156 tokens per second – that’s 1.75 times faster than GPT-4o and 2.4 times faster than DeepSeek-V3.
  • It only needs two GPUs to run, while other AIs might need up to 32!

Moreover, this tool does great on standard tests for following instructions, working with other tools, and acting as a helpful assistant.


How to Get Started With Command A

Command A is available right now through several channels. You can chat with it in the Cohere AI playground here. You can also try it out through the Hugging Face Space demo here. Soon, it will be available through major cloud providers. Companies that want to install it on their own servers can contact Cohere’s sales team.

Command A Pricing Structure

Cohere AI has set competitive prices for using Command A:

  • Input tokens: $2.50 per million
  • Output tokens: $10.00 per million

This pricing lets businesses predict costs based on how much they’ll use the system, making budget planning easier.
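
As a quick sketch of how that per-token pricing translates into a bill (the traffic volumes below are made-up assumptions):

```python
INPUT_PRICE_PER_M = 2.50    # $ per million input tokens (from the pricing above)
OUTPUT_PRICE_PER_M = 10.00  # $ per million output tokens

def monthly_cost(input_tokens, output_tokens):
    """Estimated monthly spend for a given token volume."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Assumed workload: 50M input tokens and 10M output tokens per month.
print(f"${monthly_cost(50e6, 10e6):.2f}")  # $225.00
```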

The Command A Advantage

Cohere AI worked hard to make Command A super efficient. They wanted it to be powerful but not power-hungry. The result? An AI that gives answers much faster than its competitors. Businesses that install Command A on their own computers instead of using it through the internet can save up to 50% on costs compared to paying for each use. What does this mean in real life? Businesses using Command A can:

  • Get answers for customers more quickly
  • Spend less money on fancy computers
  • Grow their AI use without huge cost increases
  • Save money overall

Wrapping Up

As more businesses bring AI into their daily operations, tools like Command A will become more important. In a crowded AI market, its ability to deliver great results with minimal resources addresses one of the biggest challenges in business AI adoption.

By putting efficiency first without sacrificing performance, Cohere AI has created a solution that fits perfectly with what modern businesses actually need. For sure, this practical tool can help businesses stay competitive in our AI-powered world.


Google Launches Gemma 3, A Powerful Yet Lightweight Family of AI Models

Google has just launched the latest addition to the Gemma family of generative AI models, Gemma 3. It is a collection of lightweight, super-smart AI models based on Gemini 2.0. With a remarkable 100 million downloads within its first year and an impressive community that has crafted over 60,000 variants, Gemma has established itself as a cornerstone in the realm of AI development. Gemma 3 is specially designed to run directly on your devices, including phones, laptops, and desktop computers. This means you don’t need expensive cloud servers to use powerful AI models. 

Gemma 3 AI Models

These models come in four sizes (1B, 4B, 12B, and 27B) and five precision levels, from full 32-bit down to 4-bit. Bigger models with higher precision generally work better but need more computing power and memory. Smaller models with lower precision use fewer resources but might not be quite as capable. You can pick the one that works best for your device and what you want to do.

The memory needed varies a lot depending on which model you choose. The smallest version (Gemma 3 1B in 4-bit precision) needs only about 861 MB of memory – less than a typical smartphone has! The largest version (Gemma 3 27B in full 32-bit precision) needs about 108 GB – that’s like needing a high-end server.
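
Those memory figures follow roughly from parameter count times bytes per parameter; the sketch below reproduces that arithmetic (real requirements run a bit higher because of activations, the KV cache, and runtime overhead):

```python
BITS = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}

def approx_weight_memory_gb(num_params, precision):
    """Rough weight footprint: parameters x bytes per parameter."""
    return num_params * BITS[precision] / 8 / 1e9

print(approx_weight_memory_gb(27e9, "fp32"))  # ~108 GB, matching the figure above
print(approx_weight_memory_gb(1e9, "int4"))   # ~0.5 GB of weights alone, before overhead
```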

Key Features of Gemma 3

1. Run on a Single GPU

The Gemma 3 models outperform much bigger models like Llama-405B, DeepSeek-V3, and o3-mini while running on just one GPU or TPU, making good AI cheaper and more accessible for everyone.

2. Multimodal Capabilities

The models (except the smallest 1B size) can understand both pictures and text. This lets apps do cool things like recognize objects in photos, read text from images, and answer questions about pictures.

3. Expanded Context Window

With a 128k-token context window, Gemma 3 can remember and understand lots of information at once. This is 16 times bigger than older Gemma models! You could feed it several multi-page articles, larger single documents, or hundreds of images in a single prompt.

4. Multilingual Support

The models speak over 35 languages right out of the box and have been trained on more than 140 languages in total. This lets developers build apps that talk to users in their own language, which opens up their apps to many more people.

5. Function Calling Support

Gemma 3 supports “function calling,” which means it can trigger other programs to do things. This facilitates the automation of complex tasks, enhancing the overall functionality and utility of applications built with it.

6. Quantization Support

The models come in “quantized” versions that use less memory and computing power while still being accurate. These versions range from full 32-bit precision down to tiny 4-bit versions, so developers can choose what works best for their needs.

7. Easy Integration with Existing Tools

It plays nicely with lots of popular development tools like Hugging Face Transformers, Ollama, JAX, Keras, PyTorch, Google AI Edge, UnSloth, vLLM, and Gemma.cpp. 

8. Easy to Customize

It comes with recipes for fine-tuning and running it efficiently. Developers can train and adapt the model using platforms like Google Colab, Vertex AI, or even a gaming GPU. 

9. Works Great on NVIDIA GPUs

NVIDIA has specially optimized these models to work well on all their GPUs, from the small Jetson Nano to their newest Blackwell chips. 

How Gemma 3 Compares to Other AI Models

This family has scored impressively on AI benchmarks. The 27B version scored 1338 on the Chatbot Arena Elo leaderboard, putting it in the same league as much bigger models. What’s really amazing is that while some competing models need up to 32 huge NVIDIA H100 GPUs (which cost thousands of dollars each), the 27B variant needs just one GPU. That’s like getting sports car performance for the price of a compact car!

Real-World Uses for Gemma 3

1. Smart Apps on Your Phone

Gemma 3’s efficiency makes it perfect for creating smart apps that run directly on your phone. Developers can build AI assistants, language translators, content creators, and image analyzers that work quickly without needing to connect to the cloud all the time.

2. Edge Computing

For Internet of Things (IoT) devices and edge computing, it lets AI processing happen right where the data is collected. This reduces the need to send data back and forth, which saves bandwidth and keeps private data local.

3. AI for Small Businesses

Gemma 3 makes advanced AI available to organizations with limited resources. Small and medium businesses can now use sophisticated AI without spending a fortune on cloud computing. They can run its applications on the computers they already have.

4. Educational Tools

Schools and universities can use it to help students learn about AI. Students can experiment with cutting-edge AI on regular school computers, and researchers can innovate without needing super expensive systems.

Getting Started With Gemma 3

Developers can try them instantly in their web browser using Google AI Studio. No complicated setup needed! They can also get an API key from Google AI Studio to use it with Google’s GenAI SDK.

For those who want to adapt it to their specific needs, the models are available for download from Hugging Face, Ollama, or Kaggle. You can easily fine-tune and adapt the model using Hugging Face’s Transformers library or other tools you prefer.
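
For instance, here is a minimal sketch of loading one of the smaller checkpoints with Hugging Face’s Transformers library; the model ID and settings are assumptions for illustration, so check the model card for the exact name and access requirements:

```python
# Minimal sketch; assumes a recent transformers release with Gemma 3 support
# and that you have accepted the model's license on Hugging Face.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",  # assumed instruction-tuned 1B checkpoint
)

messages = [{"role": "user", "content": "Explain quantization in one sentence."}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```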


Alibaba Introduces VACE, The Ultimate AI Model That Takes Video Editing to the Next Level

Alibaba is on fire when it comes to AI. The company keeps dropping one AI model after another, including image generators, video generators, chatbots, and much more. Now, they have introduced VACE, a super cool all-in-one AI model for creating and editing videos. Whether you want to generate new videos, edit existing ones, or manipulate specific parts of a clip, VACE has got you covered. Most AI video tools focus on just one or two tasks, maybe simple editing, image generation, basic animation, or color adjustments. But Alibaba’s VACE does it all in one place. 

Key Features of Alibaba VACE for Video Creation and Editing

VACE comes packed with amazing features that change how we make and edit videos. It handles tasks like reference-to-video generation (R2V), video-to-video editing (V2V), and masked video-to-video editing (MV2V). Moreover, it offers cool features like Move-Anything, Swap-Anything, Reference-Anything, Expand-Anything, and Animate-Anything.

1. Text-to-Video Generation (T2V)

VACE includes an amazing Text-to-Video Generation (T2V) feature, which is one of the most basic yet powerful video creation capabilities. You just provide a text prompt, and the video is generated accordingly.

2. Reference-to-Video Generation Feature

VACE’s Reference-to-Video (R2V) feature lets users generate new videos based on reference images. If you have a certain style or aesthetic in mind, VACE can analyze that and create videos that match it.

3. Video-to-Video Editing Feature

This feature lets users make changes to existing videos. It can help you apply a new visual style, change elements in a scene, or tweak colors. The best part? It does all of this while keeping edits smooth and natural, with no weird jumps or inconsistencies.

4. Masked Video-to-Video Editing Feature

This feature lets you edit specific parts of a video. You can define a specific area in the video and make changes to just that part, leaving the rest untouched. This makes it perfect for everything from fixing mistakes to adding new creative elements.


5. Move-Anything Feature

This feature lets users grab objects in a video and move them around while keeping everything looking smooth and natural. Just select, move, and watch the AI do the heavy lifting. It even understands perspective and occlusions, so objects blend right into their new spots without looking out of place. 

6. Swap-Anything Feature

This feature swaps anything out of a video without it looking fake. Whether it’s changing a person’s outfit, replacing a background, or switching out objects, the AI ensures the new elements match the original’s motion, lighting, and surroundings. This is a game-changer, especially for virtual try-ons.

7. Reference-Anything Feature

This feature takes style transfer to the next level. Instead of just applying a filter, VACE lets users bring in colors, textures, and even composition elements from one video or image and apply them to another.

8. Expand-Anything Feature

This feature helps you adjust a video’s aspect ratio without awkward cropping or stretching. It extends the frame, generating new visuals that match the existing scene. Whether you’re repurposing a landscape video for a vertical format or adjusting a shot to fit different screens, this feature makes sure everything looks natural and cohesive. 

9. Animate-Anything Feature

This feature turns still images into moving visuals. With Animate-Anything, VACE analyzes a static image, figures out what could move naturally, and creates realistic motion sequences. You can add subtle movement or full-blown animations. This is perfect for breathing life into any photo.

Performance Evaluation of VACE

What makes VACE stand out? Most AI models focus on just one or two specific tasks. VACE is being built to unify multiple video-editing functions within a single framework. To test its performance, researchers developed the VACE-Benchmark, a framework designed to evaluate video generation quality across multiple factors. 

Compared to task-specific models like I2VGenXL, CogVideoX-I2V, ProPainter, and Control-A-Video, VACE has demonstrated competitive or even superior results in human and automated evaluations. The model showed impressive performance across aesthetic quality, background consistency, dynamic degree, imaging quality, motion smoothness, overall consistency, subject consistency, and temporal flickering, marking it as the best all-in-one tool.


Potential Applications of VACE

VACE has the potential to shake up multiple creative fields. Here’s how it could be used:

1. Film and Video Production

It can help streamline post-production workflows by enabling seamless editing and video generation.

2. Advertising

The Alibaba VACE can create high-quality video ads with specific reference materials and controlled stylistic elements.

3. Gaming and Animation

It can generate animated sequences or game cinematics based on reference imagery or existing footage.

4. Social Media Content

This video model can help creators quickly produce and edit high-quality videos for various platforms.

5. Virtual Reality

It can expand the possibilities for creating immersive visual experiences.

By combining multiple video editing and generation tools into one model, VACE could become a go-to solution for industries that need speed, quality, and creative flexibility.

Accessibility and Availability

While VACE has been introduced, it’s not publicly available yet. But, the model and code are expected to be released soon, along with support for ComfyUI workflow, VACE-Benchmark, Wan-VACE Model Inference, and LTX-VACE Model Inference. If the early tests are any indication, this could be one of the biggest leaps in AI-driven video editing yet. Stay tuned for updates!

For more technical details, you can check the model paper.
