A quick overview of ChatGPT’s architecture…
The AI Powering ChatGPT: a combination of the InstructGPT architecture and reinforcement learning from human feedback.
The main ideas behind ChatGPT were pioneered by another OpenAI model, InstructGPT, which was released earlier this year. InstructGPT fine-tunes GPT to follow instructions, which opens the door to a wider set of human interactions. ChatGPT takes some of the ideas pioneered by InstructGPT to a new level with a novel architecture and training process.
Similarly to InstructGPT, the core architecture of ChatGPT relies on a "human-annotated data + reinforcement learning" approach, known as reinforcement learning from human feedback (RLHF). The main idea of RLHF is to continuously fine-tune the underlying language model to better follow the intent behind human commands. However, ChatGPT differs in its data collection setup: the supervised fine-tuning data was produced by human AI trainers playing both the user and the AI assistant. The core ChatGPT training process is segmented into three main phases:
Phase 1: Supervised Policy Model
Phase 2: Reward Model Training
Phase 3: Reinforcement Learning Enhancement
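The three phases above can be sketched with a toy example. This is a minimal, hypothetical illustration, not OpenAI's actual implementation: the policy is reduced to a table of response scores, the reward model to tallies over preference comparisons, and the reinforcement learning update to a simple score adjustment standing in for PPO. All names (`demonstrations`, `comparisons`, `policy`) are illustrative assumptions.

```python
# Toy sketch of the three RLHF phases (illustrative only; real systems
# use neural language models and PPO, not lookup tables).

# Phase 1: supervised policy model — initialize the policy from
# human-written demonstrations (prompt -> preferred response).
demonstrations = {"greet": "hello!", "farewell": "goodbye!"}
policy = {prompt: {resp: 1.0} for prompt, resp in demonstrations.items()}
# An alternative candidate response the policy could also produce:
policy["greet"]["hey."] = 1.0

# Phase 2: reward model training — derive scores from human preference
# comparisons (the first response in each pair was ranked above the second).
comparisons = [("greet", "hello!", "hey.")]
reward = {}
for prompt, better, worse in comparisons:
    reward[(prompt, better)] = reward.get((prompt, better), 0.0) + 1.0
    reward[(prompt, worse)] = reward.get((prompt, worse), 0.0) - 1.0

# Phase 3: reinforcement learning enhancement — nudge the policy toward
# responses the reward model scores highly (a stand-in for a PPO update).
lr = 0.5
for prompt, candidates in policy.items():
    for resp in candidates:
        candidates[resp] += lr * reward.get((prompt, resp), 0.0)

best = max(policy["greet"], key=policy["greet"].get)
print(best)  # the human-preferred response wins after the update
```

After the update, the human-preferred response outscores the alternative, which is the essential feedback loop RLHF relies on at far larger scale.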