
Reka Flash 3, The 21B AI Reasoning Model That Excels at Chat, Coding and More


Reka AI has just released a new AI model, Reka Flash 3. It packs strong reasoning abilities into a relatively small package of 21 billion parameters and is good at chatting, writing code, following instructions, and function calling. What makes Reka Flash 3 special is that it performs on par with proprietary models like OpenAI’s o1-mini while being openly available to everyone, making it one of the strongest open models of its size. If you need a fast AI model that doesn’t demand heavy resources, it is a great fit.

How Reka Flash 3 Was Trained

Reka Flash 3 was built from scratch using a three-step process:

1. Pretraining: First, researchers fed it loads of information from public sources and datasets. 

2. Instruction Tuning: Next, they taught it to follow instructions using carefully selected, high-quality examples. 

3. Reinforcement Learning: Finally, they used a learning method called REINFORCE Leave-One-Out (RLOO), which rewards the model when it does well (a simplified sketch of the idea appears at the end of this section).

Unlike some models that only focus on math or coding, Reka Flash 3 was trained to be good at many different things. 
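To give a rough sense of what the RLOO step does, here is a minimal, hypothetical sketch of the leave-one-out baseline: the model samples several answers per prompt, each answer gets a reward score, and each answer’s advantage is its reward minus the average reward of the other samples. This is only an illustration of the general technique; Reka’s actual reward functions and training code are not described in this article.

```python
# Minimal, illustrative sketch of the REINFORCE Leave-One-Out (RLOO) baseline.
# This is NOT Reka's training code; the rewards below are placeholders.
from typing import List

def rloo_advantages(rewards: List[float]) -> List[float]:
    """For each sampled completion, subtract the mean reward of the *other*
    samples for the same prompt (the leave-one-out baseline)."""
    k = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

# Example: 4 sampled answers for one prompt, scored by some reward function.
rewards = [1.0, 0.0, 0.5, 0.0]      # hypothetical reward scores
advantages = rloo_advantages(rewards)
print(advantages)                    # positive values reinforce those samples
```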

Reasoning Process of Reka Flash 3

It shows you its thinking process using <reasoning> and </reasoning> tags. This lets you see exactly how it solves problems. For complicated questions, it might think for a long time, but you can tell it to wrap up its thinking after a certain number of steps. This is called “budget forcing.”

Even with limited thinking time, Reka Flash 3 still gives good answers. Tests on math problems (AIME-2024) show that while it does better with more thinking time, it can still perform well with budget constraints.

The budget-forcing feature is a clever way to balance deep thinking with efficiency. By using tags to mark the thinking process, developers can control how much time the model spends reasoning.
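As a rough illustration, here is a hedged sketch of what budget forcing could look like at inference time with Hugging Face Transformers: generation is capped at a token budget, and the closing </reasoning> tag is then appended so the model moves on to its final answer. The exact mechanism Reka uses may differ, and priming the opening <reasoning> tag in the prompt is an assumption, so treat this as a sketch rather than an official recipe.

```python
# Hedged sketch of "budget forcing": cap the reasoning phase at a token budget,
# then close the reasoning block so the model produces its final answer.
# Prompt priming and tag handling here are assumptions, not Reka's official method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RekaAI/reka-flash-3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "human: What is 17 * 24? <sep> assistant: <reasoning>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Phase 1: let the model "think", but only up to a fixed token budget.
thinking = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Phase 2: force the reasoning block to close, then ask for the final answer.
# (Simplified: a real implementation would check whether </reasoning> already appeared.)
forced = tokenizer.decode(thinking[0], skip_special_tokens=True) + "\n</reasoning>\n"
final_inputs = tokenizer(forced, return_tensors="pt").to(model.device)
answer = model.generate(**final_inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(answer[0][final_inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```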

Key Features of Reka Flash 3

1. Advanced Reasoning Capabilities

As discussed above, the AI model uses reasoning tags (<reasoning> and </reasoning>) to think through problems step-by-step before delivering answers. 

2. Compact Architecture

With 21B parameters, Reka Flash 3 is smaller than many competitors but rivals larger models in performance. 

3. Long Context Window

A 32k context length allows it to handle lengthy documents, multi-turn conversations, and detailed instructions.

4. On-Device Deployment Potential

Its efficient size and strong performance make it ideal for on-device deployment, ensuring privacy, local processing, and usability in low-connectivity environments.

5. Open-Source Availability

Available under the Apache 2.0 license, Reka Flash 3 lets developers freely download, modify, and use the model weights. 

6. Llama-Compatible Format

Released in a Llama-compatible format, it integrates seamlessly with tools like Hugging Face Transformers and vLLM. 

7. Multilingual Understanding

While primarily trained in English, it demonstrates impressive capabilities in understanding and communicating in other languages.

Comparison With Other AI Models

When tested against other AI models, Reka Flash 3 holds its own remarkably well. It was compared directly with OpenAI’s o1-mini and Alibaba’s QwQ-32B. While QwQ-32B does better on some math tests (AIME’24), Reka Flash 3 matches it on newer ones (AIME’25). Despite its smaller size, it keeps up with the larger competitors.

Moreover, Reka Flash 3 is much better than its previous version, Reka Flash 2.5.

How to Use Reka Flash 3

You can try the AI model in a few ways. Right now, the easiest way is to visit Reka Space and start chatting with the model. If you’re a developer or researcher, you can download the model weights under the Apache 2.0 license, which lets you modify and use them freely.

Developers will be happy to know that it works with existing Llama-compatible libraries, making it easy to implement. There are two main ways to start using it.

You can use Hugging Face by loading the model and tokenizer from “RekaAI/reka-flash-3” with the right settings. Or you can use vLLM with Docker to serve the model. These options make it accessible to developers so they can quickly add it to their projects.
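For example, a minimal Hugging Face Transformers sketch might look like the following. The model ID “RekaAI/reka-flash-3” comes from the article; the dtype, device placement, and generation settings are assumptions you may need to adjust for your hardware.

```python
# Minimal sketch: load Reka Flash 3 with Hugging Face Transformers.
# The dtype, device_map, and generation settings are assumptions, not official defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RekaAI/reka-flash-3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # roughly 39GB of memory at 16-bit precision
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```

The chat template shipped with the model takes care of the prompt format described in the next section, so for most applications you will not need to build prompts by hand.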

1. Prompt Formatting

Reka Flash 3 uses the cl100k_base tokenizer without extra special tokens. It follows a specific format for prompts: “human: [prompt] <sep> assistant: [response] <sep>”. It stops generating text when it sees “<sep>” or “<|endoftext|>”.

You can add system prompts by putting them before the first user message. For conversations with multiple turns, it’s best to remove the reasoning traces from previous responses to save tokens. If you’re using Hugging Face or vLLM, the chat_template handles formatting automatically.
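As a concrete illustration of this raw format, the helper below builds a multi-turn prompt by hand and strips earlier reasoning traces. The function name and the regex are my own and purely illustrative; they are not part of Reka’s tooling.

```python
# Hypothetical helper that builds a raw Reka Flash 3 prompt by hand.
# Format per the article: "human: [prompt] <sep> assistant: [response] <sep>".
# The helper name and reasoning-stripping regex are illustrative, not official.
import re

def build_prompt(turns, system_prompt=None):
    """turns: list of (user_message, assistant_reply_or_None) pairs."""
    parts = []
    for i, (user, assistant) in enumerate(turns):
        user_text = user
        if i == 0 and system_prompt:
            # System prompts go before the first user message.
            user_text = f"{system_prompt}\n{user}"
        parts.append(f"human: {user_text} <sep>")
        if assistant is not None:
            # Drop <reasoning>...</reasoning> traces from earlier turns to save tokens.
            cleaned = re.sub(r"<reasoning>.*?</reasoning>", "", assistant,
                             flags=re.DOTALL).strip()
            parts.append(f"assistant: {cleaned} <sep>")
    parts.append("assistant:")  # generation continues here and stops at <sep>
    return " ".join(parts)

print(build_prompt([("What is the capital of France?", None)],
                   system_prompt="Be concise."))
```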

2. Quantization

This AI model is great for apps that need to work quickly or run directly on your device. With 21 billion parameters, it’s 35% smaller than QwQ-32B, so it needs less computing power and memory.

The full version needs 39GB of memory (fp16), but you can shrink it down to just 11GB using 4-bit quantization, and it will still work well. Compare that to QwQ-32B, which needs 64GB at bf16 and 18GB with 4-bit quantization.
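A hedged sketch of 4-bit loading with the bitsandbytes integration in Transformers is shown below. The article does not specify the exact quantization recipe behind the ~11GB figure, so treat this configuration as one reasonable option rather than the reference setup.

```python
# Sketch: load Reka Flash 3 in 4-bit via the bitsandbytes integration in Transformers.
# One plausible configuration; not necessarily the setup behind the ~11GB figure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "RekaAI/reka-flash-3"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
print(model.get_memory_footprint() / 1e9, "GB")  # should land roughly in the ~11GB range
```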

Real-World Uses

Reka Flash 3’s versatile abilities make it useful for many practical applications. It is a true general-purpose model that performs impressively across general chat, coding assistance, instruction following, and function calling.

This versatility makes it suitable for building a wide range of applications without needing multiple specialized models.

Organizations can use it to boost productivity across departments, from marketing content creation to technical support and data analysis, all while keeping control of their AI systems by running them on their own computers if needed.

What’s Next for Reka Flash 3

Reka Flash 3 is just the beginning of Reka AI’s vision for accessible, powerful AI models. Despite being impressive, it has some limitations you should know about. Since it’s a smaller model, it’s not the best at answering questions that require lots of specific knowledge. Its score of 65.0 on the MMLU-Pro test is good for its size but shows room for improvement.

Overall, the model shows that efficient AI design can deliver amazing capabilities in a compact package. Its strong performance against larger models proves that smart training and architecture can overcome size limitations. For users who want powerful AI without the resource demands of huge models, Reka Flash 3 is an ideal solution. 
