Synthetic Data Revolutionizes Math Reasoning in Large Language Models
The impending scarcity of high-quality internet data has prompted researchers to explore alternative training methods for large language models (LLMs). One promising approach is the use of synthetic data, which can be generated in abundance and tailored to specific tasks. However, the quality and integrity of synthetic data are crucial factors in determining the performance of LLMs.
Researchers from Carnegie Mellon University, Google DeepMind, and MultiOn have conducted a study to investigate the impact of synthetic data on LLM math reasoning capabilities. The study examines both positive and negative synthetic data, finding that positive data improves performance but with slower scaling rates than pretraining. Notably, self-generated positive responses often match the effectiveness of twice the amount of data from larger models.
The Challenge of Synthetic Data
The core challenge lies in designing synthetic data that effectively addresses data scarcity without compromising the quality and integrity of the resulting models. This task is particularly daunting due to the current lack of understanding regarding how synthetic data influences LLM behavior.
Mitigating Biases and Amplification
To mitigate these issues, researchers are investigating the use of negative model-generated responses to identify and unlearn problematic patterns in training data. This approach has shown promise, improving data-scaling efficiency by up to eight times compared to training on positive data alone.
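In practice, positive and negative sets of this kind are built by sampling many candidate solutions per problem and checking each final answer against a reference. A minimal sketch of that filtering step (function and field names here are illustrative, not from the paper):

```python
# Illustrative sketch: partition model-generated solutions into positive
# (correct final answer) and negative (incorrect) sets. The data layout
# and function name are assumptions for this example.

def split_by_correctness(samples):
    """Separate sampled solutions by whether the extracted final answer
    matches the reference answer."""
    positives, negatives = [], []
    for s in samples:
        bucket = positives if s["answer"] == s["reference"] else negatives
        bucket.append(s["solution"])
    return positives, negatives

samples = [
    {"solution": "2 + 2 = 4, so the answer is 4", "answer": "4", "reference": "4"},
    {"solution": "2 + 2 = 5, so the answer is 5", "answer": "5", "reference": "4"},
]
pos, neg = split_by_correctness(samples)
```

Both sets are useful: the positives serve as supervised targets, while the negatives feed the preference-based unlearning described above.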
The Study’s Findings
The study develops scaling laws for both data types on common reasoning benchmarks, offering valuable insights into optimizing synthetic data use for math reasoning tasks. The proposed method involves several key components: a synthetic data pipeline, dataset construction, and learning algorithms.
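Scaling laws of this kind are typically power laws, error ≈ a·D^(−b) in the dataset size D, so the exponent can be recovered by linear regression in log-log space. A toy sketch with made-up numbers (the paper's actual fits will differ):

```python
import math

# Toy sketch: fit a power law  error = a * D**(-b)  to (size, error)
# pairs via least squares in log-log space. The data points below are
# fabricated for illustration only.

def fit_power_law(sizes, errors):
    xs = [math.log(d) for d in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = math.exp(my - slope * mx)
    return a, -slope  # error ~ a * D**(-b)

sizes = [1e3, 1e4, 1e5, 1e6]
errors = [0.40, 0.30, 0.225, 0.169]  # roughly a 25% error drop per decade
a, b = fit_power_law(sizes, errors)
```

Comparing the fitted exponent b across positive-only, self-generated, and negative-augmented datasets is what makes the efficiency multipliers reported below quantitative.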
The study reveals significant insights into synthetic data scaling for LLM math reasoning. Performance improves as positive data scales, but at slower rates than pretraining. Surprisingly, self-generated positive data matches data from more capable models at half the quantity, doubling efficiency. The most striking result comes from strategically using negative data with per-step Direct Preference Optimization (DPO), which increases data efficiency by 8x compared to positive data alone.
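The per-step variant builds on the standard DPO objective, which scores a preference pair by the gap between policy and reference log-probabilities of the chosen and rejected traces. A minimal sketch of that core pairwise loss (the paper's per-step refinement is not reproduced here; the log-probability values are made up):

```python
import math

# Sketch of the standard DPO loss on one preference pair, computed from
# summed sequence log-probabilities under the policy (pi) and a frozen
# reference model (ref). "w" marks the chosen (correct) trace, "l" the
# rejected (incorrect) one.

def dpo_loss(logp_pi_w, logp_ref_w, logp_pi_l, logp_ref_l, beta=0.1):
    """-log sigmoid(beta * ((logp_pi_w - logp_ref_w)
                            - (logp_pi_l - logp_ref_l)))"""
    margin = beta * ((logp_pi_w - logp_ref_w) - (logp_pi_l - logp_ref_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy already prefers the correct trace more strongly than
# the reference does, the margin is positive and the loss falls below
# log(2), the value at zero margin.
loss = dpo_loss(logp_pi_w=-10.0, logp_ref_w=-12.0,
                logp_pi_l=-15.0, logp_ref_l=-14.0, beta=0.1)
```

Applying this loss at the level of individual solution steps, rather than whole traces, is what lets the negative data penalize specific spurious steps instead of entire solutions.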
Conclusion
This study explores the impact of synthetic data on improving LLMs’ math reasoning capabilities. It finds that the traditional approach of training on positive solutions from more capable models shows limited efficiency. Self-generated positive data improves efficiency by 2x but can amplify a model’s reliance on spurious reasoning steps. Incorporating negative (incorrect) traces addresses these limitations: by using negative data to estimate step-wise advantages and applying reinforcement learning techniques, the research demonstrates an 8x improvement in synthetic data efficiency.
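One way to estimate a step-wise advantage, consistent with this description, is Monte Carlo: compare the estimated success rate of rollouts from a solution prefix before and after appending the candidate step. A hypothetical sketch, with the rollout sampling itself abstracted into pre-collected outcome lists:

```python
# Hypothetical sketch of Monte Carlo step-wise advantage estimation.
# Each outcomes list holds 1/0 flags for whether a sampled completion
# from the given prefix reached a correct final answer; generating those
# completions is abstracted away here.

def success_rate(outcomes):
    return sum(outcomes) / len(outcomes)

def step_advantage(outcomes_before, outcomes_after):
    """Advantage of a step = Q(prefix + step) - V(prefix), each estimated
    as the fraction of rollouts ending in a correct answer."""
    return success_rate(outcomes_after) - success_rate(outcomes_before)

# A spurious step lowers the chance of reaching a correct answer:
adv = step_advantage(outcomes_before=[1, 1, 0, 1], outcomes_after=[0, 1, 0, 0])
# adv = 0.25 - 0.75 = -0.5  -> the step is flagged as harmful
```

Steps with negative estimated advantage are exactly the "spurious steps" the negative data helps the model unlearn.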
This approach, utilizing preference optimization objectives, significantly enhances LLMs’ mathematical reasoning abilities by effectively balancing positive and negative synthetic data.