20 December, 2022 09:51

A quick overview of ChatGPT’s architecture…

The AI Powering ChatGPT: A clever combination of the InstructGPT architecture with reinforcement learning models.

"The main ideas behind ChatGPT were pioneered by another OpenAI model, InstructGPT, which was released earlier this year. InstructGPT fine-tunes GPT to follow instructions, which opens the door to a wider set of human interactions. ChatGPT takes some of the ideas pioneered by InstructGPT to a whole new level with a novel architecture and training process.

Like InstructGPT, the core architecture of ChatGPT relies on a “human-annotated data + reinforcement learning” method, known as reinforcement learning from human feedback (RLHF). The main idea of RLHF is to continuously fine-tune the underlying language model to understand the meaning of human commands. However, ChatGPT differs in its data collection setup: its supervised fine-tuning uses conversations in which human AI trainers play both sides, the user and the AI assistant. The core ChatGPT training process is divided into three main phases:

Phase 1: Supervised Policy Model
Phase 2: Reward Model Training
Phase 3: Reinforcement Learning Enhancement"

https://jrodthoughts.medium.com/the-ai-powering-chatgpt-68968d452d79
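
To make the three phases listed above a little more concrete, here is a toy sketch in Python. It is not OpenAI's implementation: the "policy" is just a softmax over three made-up candidate responses, the "reward model" is one scalar per response, and a plain policy-gradient update stands in for the reinforcement learning step. All names (RESPONSES, supervised_policy, train_reward_model, reinforce, run_pipeline) and all data are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy setup: the policy picks one of three canned responses.
RESPONSES = ["helpful answer", "off-topic answer", "rude answer"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_policy(demonstrations, steps=200, lr=0.1):
    """Phase 1: fit the policy to human demonstrations (maximum likelihood)."""
    logits = [0.0] * len(RESPONSES)
    for _ in range(steps):
        probs = softmax(logits)
        for demo in demonstrations:
            for i in range(len(logits)):
                # Gradient of log p(demo) with respect to logit i.
                grad = (1.0 if i == demo else 0.0) - probs[i]
                logits[i] += lr * grad / len(demonstrations)
    return logits


def train_reward_model(comparisons, steps=200, lr=0.1):
    """Phase 2: learn scalar rewards from (preferred, rejected) human rankings."""
    rewards = [0.0] * len(RESPONSES)
    for _ in range(steps):
        for preferred, rejected in comparisons:
            # Pairwise model: P(preferred wins) = sigmoid(r_preferred - r_rejected).
            diff = rewards[preferred] - rewards[rejected]
            p_win = 1.0 / (1.0 + math.exp(-diff))
            rewards[preferred] += lr * (1.0 - p_win)
            rewards[rejected] -= lr * (1.0 - p_win)
    return rewards


def reinforce(logits, rewards, steps=200, lr=0.5):
    """Phase 3: push the policy toward responses the reward model scores highly."""
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))  # expected reward
        for i in range(len(logits)):
            # Exact gradient of the expected reward for a softmax policy.
            logits[i] += lr * probs[i] * (rewards[i] - baseline)
    return logits


def run_pipeline():
    demonstrations = [0, 0, 1]              # made-up trainer demonstrations
    comparisons = [(0, 1), (0, 2), (1, 2)]  # made-up human preference pairs
    sft_logits = supervised_policy(demonstrations)
    rewards = train_reward_model(comparisons)
    tuned_logits = reinforce(sft_logits, rewards)
    return softmax(sft_logits), rewards, softmax(tuned_logits)


if __name__ == "__main__":
    sft_probs, rewards, tuned_probs = run_pipeline()
    for name, before, after in zip(RESPONSES, sft_probs, tuned_probs):
        print(f"{name}: after phase 1 = {before:.3f}, after phase 3 = {after:.3f}")
```

In this sketch, phase 1 only imitates the demonstrations, so the policy keeps some probability on the off-topic response it was shown. Phases 2 and 3 then use the ranking signal to shift probability toward the response the simulated human labelers preferred. A real system would replace each piece with a large language model and a far more careful RL procedure.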

