The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

The emergence of large language models (LLMs) has sparked discussion of Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, existing alignment methods struggle to guide such advanced AI and ensure its safety in the future. It is therefore essential to discuss the alignment of such AI now. Superalignment, the alignment of AI systems at superhuman levels of capability with human values and safety requirements, aims to address two primary goals: scalability in supervision, to provide high-quality guidance signals, and robust governance, to ensure alignment with human values. In this survey, we review the original scalable oversight problem, the corresponding methods, and potential solutions for superalignment. Specifically, we introduce the challenges and limitations of current alignment paradigms in addressing the superalignment problem. We then review scalable oversight methods for superalignment. Finally, we discuss the key challenges and propose pathways for the safe and continual improvement of future AI systems. By comprehensively reviewing the current literature, we aim to provide a systematic introduction to existing methods, analyze their strengths and limitations, and discuss potential future directions.