
DeepSeek-Prover: The First LLM Trained on Synthetic Data That Outperforms GPT-4 in Math

Automated theorem proving has come a long way with the help of deep learning models. While models like GPT-4 show significant promise, their effectiveness is restricted by the scarcity of high-quality training data in formal languages. Researchers at DeepSeek have proposed DeepSeek-Prover, an approach that generates large-scale synthetic Lean 4 proof data with nothing but an LLM as the starting point. The resulting model outperforms GPT-4 on mathematical problem-solving and theorem-proving benchmarks.

What is DeepSeek-Prover?

DeepSeek-Prover is an AI system developed by DeepSeek to tackle automated theorem proving through self-supervised learning. It is built on DeepSeekMath-Base 7B, a large language model pre-trained on over 120 billion tokens of mathematical text. To excel at formal proof generation, however, the model needed extensive supervised training data, which is scarce in formal theorem-proving domains.

The resulting DeepSeek-Prover is a specialized version of DeepSeekMath for automated theorem proving. Where other AI provers stall for lack of human-written proofs, DeepSeek-Prover shows that a capable model can teach itself complex mathematical skills through carefully designed self-supervision.

How DeepSeek-Prover Works

The pipeline uses DeepSeekMath to do five main things:

1. Automatic Formalization

It takes informal math problems scraped from the internet and turns them into formal Lean 4 statements that a computer can check.
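For instance, an informal statement like "the sum of two even integers is even" might be autoformalized into a Lean 4 theorem along the following lines. This is a hypothetical illustration, not an example from the paper; it assumes Mathlib's `Even` predicate and `Even.add` lemma:

```lean
import Mathlib.Algebra.Group.Even

-- Hypothetical autoformalization of:
-- "Prove that the sum of two even integers is even."
theorem sum_of_evens_is_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) :=
  ha.add hb
```

The autoformalization step produces only the statement; proofs like the `ha.add hb` term above are what the prover is later trained to generate.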

2. Quality Filtering

Not all automatically formalized statements are of good quality, so the pipeline scores each statement with the model and removes low-quality ones.
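A minimal sketch of this kind of filter is shown below. The `score_statement` function is a hypothetical stand-in for an LLM call that rates a formal statement; the paper's actual scoring prompts and thresholds differ.

```python
# Hypothetical quality filter for autoformalized statements.
# score_statement stands in for an LLM call that rates a formal
# statement's faithfulness and well-formedness on a 0-1 scale.

def score_statement(statement: str) -> float:
    # Stub: a real system would prompt the model here. This toy
    # version just flags statements with obvious red flags.
    red_flags = ("sorry", "undefined", "???")
    return 0.1 if any(flag in statement for flag in red_flags) else 0.9

def filter_statements(statements, threshold=0.5):
    """Keep only statements the scorer rates above the threshold."""
    return [s for s in statements if score_statement(s) > threshold]

candidates = [
    "theorem t1 (n : Nat) : n + 0 = n := by sorry",  # proof body leaked in
    "theorem t2 (a b : Int) : a + b = b + a",
]
print(filter_statements(candidates))  # keeps only t2
```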

3. Statement Proving

For each statement, the prover attempts both the statement and its negation at the same time. False statements are discarded quickly because their negations are usually easy to prove.
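One way to sketch this dual attempt is shown below. `try_prove` and `try_refute` are hypothetical stand-ins for "sample a proof from the model and verify it with Lean"; here they just evaluate toy arithmetic claims so the sketch runs on its own.

```python
# Hypothetical sketch: attempt a statement and its negation,
# accepting whichever side a verified proof is found for.

def try_prove(claim: str) -> bool:
    lhs, rhs = claim.split("==")
    return eval(lhs) == eval(rhs)  # toy stand-in for Lean verification

def try_refute(claim: str) -> bool:
    lhs, rhs = claim.split("==")
    return eval(lhs) != eval(rhs)  # i.e. proving the negation

def prove_or_disprove(claim: str) -> str:
    """Attempt the claim and its negation; report which side closed."""
    if try_prove(claim):
        return "proved"
    if try_refute(claim):
        return "disproved"
    return "unresolved"

print(prove_or_disprove("2 + 2 == 4"))  # proved
print(prove_or_disprove("2 + 2 == 5"))  # disproved
```

Racing both sides means a false conjecture does not waste the full proof-search budget: its negation typically closes quickly.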

4. Fine-Tuning the Prover

The verified proofs become new training data, and DeepSeekMath is fine-tuned on this data to become a better prover.

5. Repetition

The whole process repeats in cycles, with each round producing a stronger prover and higher-quality data.
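The five steps above can be sketched as a single loop. All function names here are hypothetical stand-ins for LLM and Lean calls, wired with toy implementations so the sketch runs end to end:

```python
def expert_iteration(informal_problems, rounds=3):
    """Sketch of the pipeline: formalize, filter, prove,
    then fine-tune on the verified proofs and repeat."""
    training_data = []
    for _ in range(rounds):
        statements = [formalize(p) for p in informal_problems]          # step 1
        statements = [s for s in statements if is_high_quality(s)]      # step 2
        proofs = [pf for s in statements if (pf := attempt_proof(s))]   # step 3
        training_data.extend(proofs)                                    # step 4
        fine_tune(training_data)                                        # step 4
    return training_data                                                # step 5

# Toy stand-ins for the real LLM / Lean components:
def formalize(p): return f"theorem: {p}"
def is_high_quality(s): return "bad" not in s
def attempt_proof(s): return f"proof of {s}" if "hard" not in s else None
def fine_tune(data): pass  # a real system would update model weights here

data = expert_iteration(["easy lemma", "bad statement", "hard problem"])
print(len(data))  # one verified proof per round -> 3
```

In the real pipeline the fine-tuned model re-enters the loop, so each cycle both proves more statements and generates better training data.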


Impressive Performance of DeepSeek-Prover

The researchers evaluated the model on two benchmarks: miniF2F and FIMO.

1. DeepSeek-Prover vs GPT-4

On the miniF2F benchmark, DeepSeek-Prover outperforms competing methods by a wide margin in cumulative score. On miniF2F-valid, it achieves 60.2%, while GPT-4 manages only 25.41%. On miniF2F-test, DeepSeek-Prover scores 52.0%, well ahead of GPT-4 at 22.95%. Even the best tree-search method, Hypertree Proof Search with a 600M model, falls short at 58.6% on miniF2F-valid and 41.0% on miniF2F-test.


Additionally, this model successfully proved 5 out of 148 problems in the Lean 4 Formalized International Mathematical Olympiad (FIMO) benchmark, while GPT-4 failed to prove any.

2. Effectiveness of DeepSeek-Prover in Theorem Proving

DeepSeek-Prover also scales well with compute: as more proof attempts are sampled per statement, performance keeps improving. Starting from 30.03% with greedy decoding, accuracy rises to 50.0% when the sampling budget is increased to 65,536 generations per statement. This scalability makes it well suited to hard proof scenarios and solidifies its position as a leading method in theorem proving.
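The scaling knob here is the number of independent proof attempts per theorem: a statement counts as solved if any one of the sampled proofs verifies. A toy sketch of that budgeted search is below; `sample_and_verify` is a hypothetical stand-in for sampling a proof from the model and checking it with Lean, simulated with per-statement success probabilities.

```python
import random

def sample_and_verify(statement: str, rng: random.Random) -> bool:
    # Stub: pretend each sampled proof verifies with a probability
    # that reflects the statement's difficulty.
    difficulty = {"easy": 0.5, "hard": 0.01}[statement]
    return rng.random() < difficulty

def solved_within_budget(statement: str, k: int, seed: int = 0) -> bool:
    """A theorem counts as proved if any of k independent samples verifies."""
    rng = random.Random(seed)
    return any(sample_and_verify(statement, rng) for _ in range(k))

# Larger budgets solve more theorems, which is the scaling trend reported.
for k in (1, 16, 1024):
    solved = sum(solved_within_budget(s, k) for s in ("easy", "hard"))
    print(k, solved)
```

Because each additional sample is another independent chance at a verified proof, success rates rise monotonically with the budget, at the cost of proportionally more compute.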

Concluding Thoughts

In conclusion, DeepSeek-Prover demonstrates that with enough synthetic data, an LLM can surpass much larger models without any human-written formal proofs. Large-scale synthetic datasets let researchers overcome data scarcity and pave the way for more efficient and effective automated theorem proving. More broadly, the study provides a blueprint for using self-supervised learning to scale up specialized AI models. For more details, you can read the full paper here.

Faizan Ali Naqvi

Research is my hobby and I love to learn new skills. I make sure that every piece of content that you read on this blog is easy to understand and fact checked!
