The possibility of AI posing a threat to humanity is no longer confined to the realm of science fiction. Prominent figures such as Elon Musk and OpenAI CEO Sam Altman have publicly raised concerns about AI safety. Addressing these concerns, OpenAI, the company behind groundbreaking AI models like ChatGPT and GPT-4, has published two new papers on AI safety and governance.
The first paper focuses on the challenge of aligning AI with human interests as we approach an era in which humans must supervise AI systems significantly smarter than themselves. The paper underscores a critical issue: naive human supervision is unlikely to scale effectively to superhuman AI models.
This challenge is studied through the concept of ‘weak-to-strong generalization’, in which a weaker AI model supervises a more capable one; in OpenAI's experiments, GPT-4-class models were finetuned on labels produced by much weaker GPT-2-class supervisors. The study shows that strong models can outperform the weak supervisors that trained them, but fully recovering their capabilities requires more than naive finetuning. The findings suggest that current techniques, such as reinforcement learning from human feedback (RLHF), may not be sufficient for managing superhuman AI models without further innovation.
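To make the setup concrete, here is a minimal PyTorch sketch of one weak-to-strong training step. The model, batch format, and `alpha` schedule are illustrative assumptions, not OpenAI's actual training code; the auxiliary confidence loss, which mixes the weak supervisor's labels with the strong model's own hardened predictions, follows the idea described in the paper.

```python
import torch
import torch.nn.functional as F

def weak_to_strong_step(strong_model, batch, weak_labels, alpha=0.5):
    """One weak-to-strong finetuning step (illustrative sketch).

    weak_labels: soft class probabilities produced by the weak supervisor.
    alpha:       weight on the auxiliary confidence loss; alpha=0 reduces
                 to naive finetuning on the weak supervisor's labels.
    """
    logits = strong_model(batch)                 # strong model's predictions
    log_probs = F.log_softmax(logits, dim=-1)

    # Naive objective: imitate the weak supervisor
    # (cross-entropy against its soft labels).
    loss_weak = -(weak_labels * log_probs).sum(dim=-1).mean()

    # Auxiliary confidence loss: also train toward the strong model's own
    # "hardened" (argmax) predictions, so it can learn to disagree with a
    # mistaken supervisor rather than copy its errors.
    hardened = F.one_hot(logits.argmax(dim=-1), logits.size(-1)).float()
    loss_self = -(hardened.detach() * log_probs).sum(dim=-1).mean()

    return (1 - alpha) * loss_weak + alpha * loss_self
```

With `alpha=0` this is exactly the naive finetuning the paper finds insufficient; the confidence term is one of the paper's proposed ways to let the strong model's own knowledge win out over noisy weak labels.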
The second paper presents seven key practices for the governance of agentic AI systems – AI capable of pursuing complex goals with minimal supervision. These practices include:
- evaluating AI systems for specific tasks
- requiring human approval for critical decisions (see the sketch after this list)
- setting default behaviors
- enhancing the legibility of AI activities and thought processes
- implementing automatic monitoring
- ensuring reliable attribution
- maintaining the ability to deactivate the AI system.
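As a concrete illustration of how several of these practices fit together, here is a minimal Python sketch of an oversight wrapper around an agent's actions. The `Action` type, the set of critical actions, and the console approval prompt are all illustrative assumptions, not part of OpenAI's paper.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-monitor")

@dataclass
class Action:
    name: str                              # e.g. "send_email", "delete_file"
    args: dict = field(default_factory=dict)

# Hypothetical policy: which action types count as critical decisions.
CRITICAL_ACTIONS = {"send_email", "transfer_funds", "delete_file"}

def execute_with_oversight(action: Action, execute, deactivated: bool = False):
    """Run one agent action under the governance practices listed above.

    - automatic monitoring: every proposal is logged for reliable attribution
    - deactivation: a kill switch vetoes everything
    - human approval: critical actions block until a person confirms
    - default behavior: low-risk actions run without intervention
    """
    log.info("agent proposed %s with args %s", action.name, action.args)

    if deactivated:  # maintaining the ability to deactivate the system
        log.warning("system deactivated; refusing %s", action.name)
        return None

    if action.name in CRITICAL_ACTIONS:  # human approval for critical decisions
        answer = input(f"Approve critical action {action.name}? [y/N] ")
        if answer.strip().lower() != "y":
            log.info("human rejected %s", action.name)
            return None

    return execute(action)  # default behavior for routine actions
```

In a real deployment the approval step would route to a review queue rather than a console prompt, but the shape of the control flow, monitor everything, gate the risky subset, and keep an off switch, is what the practices describe.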
This framework aims to mitigate potential failures, vulnerabilities, and abuses of AI, emphasizing the need for robust governance as AI systems become more autonomous and integrated into society.
One of the most significant concerns highlighted is the risk that a superhuman AI model simply learns to imitate its weak supervisor, reproducing the supervisor's mistakes instead of applying its own superior knowledge: the ‘human simulator’ failure mode. Avoiding this scenario will require new approaches to AI alignment.
These studies by OpenAI mark a critical step in understanding and shaping the future of AI. As AI systems become more advanced and autonomous, effective supervision and governance will only grow more important, and these papers both contribute valuable insights into those challenges and pave the way for future research and development.
You can find the first paper, ‘Weak-to-Strong Generalization’, and the second, ‘Practices for Governing Agentic AI Systems’, on OpenAI's website.