Large reasoning models (LRMs), like OpenAI's o1, Qwen-QwQ, and DeepSeek-R1, show impressive abilities in complex step-by-step reasoning. However, these models frequently run into knowledge insufficiency, which can introduce errors and uncertainty into their reasoning process. To counteract this issue, a team at Renmin University of China has developed Search-o1, an AI framework that enhances LRMs by integrating an agentic search workflow into their reasoning process.
Problems With Large Reasoning Models
Large reasoning models use reinforcement learning to handle complex tasks in areas like math, science, and coding. They rely on a "slow thinking" approach that enables deep, logical reasoning. However, these models often hit knowledge gaps mid-reasoning, which surface as a high frequency of uncertain terms such as "perhaps" or "possibly". When an LRM lacks a fact it needs, the resulting guess can cascade into errors across the rest of a long solution. For example, experiments show that models like OpenAI-o1 can express uncertainty over 30 times in a single reasoning sequence. Search-o1 is designed to address exactly this failure mode.
Introducing Search-o1 Framework
To address these challenges, the Search-o1 framework integrates an agentic search workflow into the o1-like reasoning process of LRMs. This enables LRMs to dynamically retrieve external knowledge whenever they reach an uncertain point in their reasoning. By using an agentic retrieval-augmented generation mechanism, Search-o1 lets models autonomously seek out information, minimizing the impact of knowledge gaps on reasoning accuracy. According to its authors, Search-o1 is the first framework to give LRM reasoning this kind of autonomous knowledge supplementation.
Core Components of Search-o1
1. Agentic Retrieval-Augmented Generation (RAG)
The agentic RAG component allows the reasoning model to autonomously determine when to initiate a retrieval step during the reasoning process. As the model generates reasoning sequences, it may produce search queries encapsulated within special symbols. Upon detecting these queries, the model pauses its reasoning, retrieves relevant documents, and incorporates the newfound knowledge into its reasoning chain. This iterative retrieval process ensures that the model can continue generating coherent reasoning steps while accessing external knowledge.
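As a rough sketch of this control flow (the `<search>…</search>` delimiters, `model_step`, and `search` are illustrative stand-ins, not the framework's actual special symbols or API), the generate-pause-retrieve loop might look like:

```python
import re

# Illustrative delimiters; Search-o1 defines its own special symbols
# for marking search queries inside the reasoning chain.
QUERY_PATTERN = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def agentic_generate(model_step, search, max_searches=5):
    """Generate a reasoning chain, pausing to retrieve whenever the
    model emits a search query wrapped in the delimiter tokens."""
    chain = ""
    scanned = 0  # position up to which queries have already been handled
    for _ in range(max_searches + 1):
        chain += model_step(chain)            # continue the reasoning chain
        match = QUERY_PATTERN.search(chain, scanned)
        if match is None:
            break                             # no pending query: done
        scanned = match.end()
        docs = search(match.group(1).strip()) # retrieve external knowledge
        # Inject the (refined) knowledge back into the reasoning chain
        chain += f"\n[retrieved] {docs}\n"
    return chain
```

The key property is that retrieval is model-initiated: the loop only searches when the reasoning text itself asks for it.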
2. Reason-in-Documents Module
The Reason-in-Documents module within the Search-o1 framework operates independently from the main reasoning chain and addresses the challenge of redundant and lengthy retrieved documents. When external knowledge is retrieved, the Reason-in-Documents module analyzes the information and condenses it into concise, relevant knowledge. This refined information is then seamlessly integrated into the existing reasoning chain, preserving coherence and enhancing the overall reasoning quality.
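Conceptually, the module takes the prior reasoning steps, the current query, and the raw documents, and asks a generation pass to distill them. A minimal sketch (function names and the prompt wording are assumptions, not the framework's actual implementation):

```python
def refine_documents(aux_generate, prev_reasoning, query, documents):
    """Condense raw retrieved documents into a short, query-focused
    snippet before it is injected into the main reasoning chain."""
    prompt = (
        "Previous reasoning steps:\n" + prev_reasoning + "\n\n"
        "Current search query: " + query + "\n\n"
        "Retrieved documents:\n" + "\n---\n".join(documents) + "\n\n"
        "Extract only the facts that help answer the query, "
        "stated concisely."
    )
    return aux_generate(prompt)
```

Because this runs outside the main chain, long or noisy documents never enter the reasoning context directly; only the distilled snippet does.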
3. Knowledge Refinement Process
The knowledge refinement process ensures that the information injected into the reasoning chain is relevant and enhances logical consistency. By analyzing retrieved documents based on previous reasoning steps and the current search query, the framework extracts and refines knowledge that directly contributes to advancing the reasoning process. This ensures that the reasoning model can effectively use external knowledge while maintaining focus on the original question.
Search-o1 Inference Process
Search-o1 uses a batch generation mechanism with an interleaved search strategy. It begins by combining task instructions with input questions to create reasoning sequences. Tokens are generated for all sequences simultaneously, and search queries retrieve relevant documents in batches. These documents are refined and added back into the reasoning chains, repeating the process until completion. This approach enhances accuracy, reliability, and the model’s ability to handle complex reasoning tasks.
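The batching idea above can be sketched as follows (all names here are hypothetical; `step_batch` stands for one parallel decoding pass, `batch_search` for batched retrieval, and `refine` for the Reason-in-Documents step):

```python
import re

QUERY = re.compile(r"<search>(.*?)</search>", re.DOTALL)  # illustrative delimiters

def batch_search_o1(step_batch, batch_search, refine, questions, max_turns=10):
    """One decoding pass per turn over all unfinished sequences; queries
    emitted during the turn are retrieved together in a single batch."""
    chains = [f"Question: {q}\nReasoning:" for q in questions]
    active = set(range(len(questions)))
    for _ in range(max_turns):
        if not active:
            break
        idx = sorted(active)
        outputs = step_batch([chains[i] for i in idx])  # parallel generation
        pending = []                                    # (chain index, query)
        for i, out in zip(idx, outputs):
            chains[i] += out
            m = QUERY.search(out)
            if m:
                pending.append((i, m.group(1).strip()))
            else:
                active.discard(i)                       # finished reasoning
        if pending:
            docs = batch_search([q for _, q in pending])  # batched retrieval
            for (i, q), d in zip(pending, docs):
                chains[i] += "\n" + refine(chains[i], q, d) + "\n"
    return chains
```

The point of the design is throughput: sequences that need retrieval do not stall the ones that don't, and all retrievals within a turn are issued as one batch.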
Experimental Results
1. Performance Analysis on Complex Reasoning Tasks
The effectiveness of Search-o1 has been evaluated through comprehensive experiments on challenging reasoning tasks, including PhD-level science questions, mathematics, coding benchmarks, and open-domain question-answering tasks. The results consistently demonstrate Search-o1’s superior performance, affirming that its search mechanism effectively meets the knowledge requirements during reasoning. It significantly outperformed traditional reasoning models. This success is attributed to its ability to autonomously retrieve and integrate knowledge, resulting in enhanced accuracy and reliability in responses.
2. Case Studies
Multiple case studies further illustrate the capabilities of the Search-o1 framework. For instance, on the GPQA dataset, the model demonstrated remarkable proficiency in handling complex questions by effectively combining the agentic RAG mechanism with the Reason-in-Documents module.
Similarly, on the HotpotQA dataset, Search-o1 excelled in both single-hop and multi-hop question-answering tasks, further validating its robustness.
How to Get Started With Search-o1
1. Environment Setup
To utilize Search-o1 effectively, users must first set up their environment. This process begins with creating a new conda environment specifically for Search-o1:
conda create -n search_o1 python=3.9
conda activate search_o1
Once the environment is activated, the next step is to install the required dependencies by navigating to the Search-o1 directory and executing the following command:
cd Search-o1
pip install -r requirements.txt
2. Data Preparation
Users can preprocess their datasets using the provided Jupyter Notebook located in data/data_pre_process.ipynb. The datasets are categorized into two main types: Challenging Reasoning Tasks and Open-domain QA Tasks, each of which requires specific preprocessing steps. To prepare your data, load your dataset into the notebook, run the preprocessing cells to convert the raw data into the standardized JSON format that Search-o1 expects, and save the processed datasets in the data/ directory. If your data doesn't fit the predefined categories, format it as {'Question': str, 'answer': str}, or as {'Question': str, 'Correct Choice': str} for multiple-choice tasks.
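For data outside the predefined categories, producing the expected JSON is straightforward. A minimal sketch (the raw examples and output file name are illustrative; a temporary directory stands in for the repo's data/ directory):

```python
import json
import os
import tempfile

# Illustrative raw data: (question, answer) pairs from some custom source.
raw = [
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "Shakespeare"),
]

# Convert to the standardized format described above.
records = [{"Question": q, "answer": a} for q, a in raw]

out_dir = tempfile.mkdtemp()  # stand-in for the repo's data/ directory
path = os.path.join(out_dir, "my_task.json")
with open(path, "w") as f:
    json.dump(records, f, indent=2)
```

For multiple-choice tasks, the same pattern applies with `"Correct Choice"` in place of `"answer"`.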
3. Running Inference with Search-o1
Once the environment and data are ready, run the inference script, replacing the placeholder values with your own model path and API keys:
python scripts/run_search_o1.py \
--dataset_name aime \
--split test \
--max_search_limit 5 \
--max_turn 10 \
--top_k 10 \
--max_doc_len 3000 \
--use_jina True \
--model_path "YOUR_MODEL_PATH" \
--jina_api_key "YOUR_JINA_API_KEY" \
--bing_subscription_key "YOUR_BING_SUBSCRIPTION_KEY"
4. Evaluation
Search-o1’s inference scripts automatically save the model’s input and output texts for evaluation. Users can also apply a backoff strategy that falls back to the direct-generation result whenever the retrieval-augmented run does not yield a conclusive answer. To use it, provide the path to the direct generation results in the scripts/evaluate.py file and run the evaluation command:
python scripts/evaluate.py --output_path outputs/... --apply_backoff
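The backoff idea itself is simple; a minimal sketch (the function name is hypothetical, and the real selection logic lives in scripts/evaluate.py):

```python
def pick_answer(retrieval_answer, direct_answer):
    """Prefer the retrieval-augmented answer; fall back to the direct
    generation result when the former is missing or empty."""
    if retrieval_answer and retrieval_answer.strip():
        return retrieval_answer
    return direct_answer
```

This way, failed or inconclusive retrieval runs degrade gracefully to the model's unassisted answer instead of counting as outright misses.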
Concluding Remarks
By integrating autonomous knowledge retrieval through the agentic RAG mechanism and the Reason-in-Documents module, the Search-o1 framework improves the reliability and applicability of LRMs in complex reasoning tasks. This approach addresses the challenge of knowledge insufficiency and paves the way for more versatile and reliable intelligent systems. The strong results Search-o1 achieves across a diverse range of challenging reasoning tasks and open-domain QA benchmarks underscore its potential. For more technical details, please see the model's arXiv paper and the project page.