Catastrophic Forgetting in LLMs
Recent research has shed light on a critical challenge facing Large Language Models (LLMs): the phenomenon known as catastrophic forgetting (CF). This issue, also referred to as model drift, describes the tendency of LLMs to lose previously acquired knowledge as they assimilate new information.
Recent Studies Highlight Performance Degradation
Two recent publications have brought attention to a concerning trend in LLMs: not only do these models exhibit drift, but they also experience a decline in performance over time. This revelation has significant implications for Generative Applications (Gen-Apps) and LLM-based Conversational UIs, which rely heavily on the stability and consistency of their underlying models.
The Non-Deterministic Nature of LLMs
While the non-deterministic behavior of LLMs (producing varied outputs for identical inputs) is well known, recent studies have demonstrated that models undergo more substantial changes over time. Contrary to expectations of improvement, these changes often result in performance degradation.
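To make the non-determinism concrete, here is a minimal sketch of temperature-based sampling, the decoding step that produces different outputs for the same input. The logits and four-token vocabulary are invented for illustration:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits using temperature scaling."""
    rng = rng or np.random.default_rng()
    scaled = np.array(logits, dtype=float) / temperature
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Identical "input": the same logits over a toy 4-token vocabulary.
logits = [2.0, 1.5, 0.5, 0.1]
vocab = ["cat", "dog", "bird", "fish"]

# Repeated sampling yields different outputs for the same input.
for _ in range(5):
    print(vocab[sample_next_token(logits, temperature=0.8)])
```

Lowering the temperature concentrates probability on the top token; raising it spreads the distribution and increases output variability.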
Defining Catastrophic Forgetting
Catastrophic forgetting refers to the LLMs' propensity to lose or forget previously learned information when trained on new data or fine-tuned for specific tasks. This phenomenon likely stems from limitations in the training process, which tends to prioritize recent data or tasks at the expense of earlier knowledge.
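The effect is easy to reproduce at toy scale. The sketch below is a deliberately simplified stand-in for LLM fine-tuning, not anything from the studies themselves: a small PyTorch classifier is trained on synthetic task A, then fine-tuned on a conflicting task B with no task A data, and its task A accuracy collapses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two synthetic binary-classification tasks with conflicting decision rules:
# Task A labels points by the sign of feature 0; Task B by the sign of feature 1.
def make_task(n, feature_idx):
    x = torch.randn(n, 2)
    y = (x[:, feature_idx] > 0).float().unsqueeze(1)
    return x, y

xa, ya = make_task(2000, feature_idx=0)  # Task A (learned first)
xb, yb = make_task(2000, feature_idx=1)  # Task B (learned second)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()

def train(x, y, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(x, y):
    with torch.no_grad():
        return ((model(x) > 0).float() == y).float().mean().item()

train(xa, ya)
print(f"Task A accuracy after training on A: {accuracy(xa, ya):.2f}")

train(xb, yb)  # Sequential fine-tuning on Task B, with no Task A data.
print(f"Task A accuracy after training on B: {accuracy(xa, ya):.2f}")  # Typically drops to near chance.
```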
Evaluating Model Drift: GPT-3.5 and GPT-4
A comparative study conducted in March and June 2023 on GPT-3.5 and GPT-4 revealed significant variations in performance and behavior across diverse tasks. Notable findings include:
- GPT-4's accuracy at identifying prime versus composite numbers dropped sharply between the March and June snapshots, while GPT-3.5 improved substantially on the same task.
- The June version of GPT-4 was less willing to answer sensitive questions than the March version.
- Both models made more formatting mistakes in code generation in June, reducing the share of responses that were directly executable.
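A simple way to detect this kind of drift is to replay a frozen benchmark against each model snapshot. The sketch below assumes a hypothetical `query_model` helper standing in for whatever provider API you use; the prime-number prompts mirror the task evaluated in the study:

```python
# query_model is a hypothetical stand-in for whatever API call returns
# a model's answer; swap in your provider's client here.
def query_model(prompt: str, model_version: str) -> str:
    raise NotImplementedError("Replace with a real API call.")

# A frozen benchmark: the same prompts and expected answers are replayed
# against every snapshot, so score changes reflect the model, not the data.
BENCHMARK = [
    {"prompt": "Is 17077 a prime number? Answer yes or no.", "expected": "yes"},
    {"prompt": "Is 14589 a prime number? Answer yes or no.", "expected": "no"},
]

def score_snapshot(model_version: str) -> float:
    correct = 0
    for case in BENCHMARK:
        answer = query_model(case["prompt"], model_version)
        correct += case["expected"] in answer.lower()
    return correct / len(BENCHMARK)

# Comparing the same benchmark across dated snapshots exposes drift, e.g.:
# scores = {v: score_snapshot(v) for v in ["gpt-4-0314", "gpt-4-0613"]}
```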
Chain-of-Thought Prompting and Model Performance
The study highlighted changes in the models' ability to leverage Chain-of-Thought (CoT) prompting:
- In March, GPT-4 followed step-by-step instructions on the prime-number task and answered with high accuracy; by June it often skipped the intermediate reasoning and answered directly, and its accuracy dropped accordingly.
- GPT-3.5's accuracy on the same task moved in the opposite direction, improving markedly between March and June.
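Whether a model actually honors a CoT instruction can be checked mechanically. The heuristic below is an assumption for illustration, not the study's methodology: it flags a response as CoT-compliant only if substantive text precedes the bracketed final answer.

```python
import re

def make_cot_prompt(question: str) -> str:
    # Chain-of-Thought prompting: ask the model to reason before answering.
    return f"{question}\nThink step by step and then answer [Yes] or [No]."

def followed_cot(response: str) -> bool:
    # Crude heuristic (an assumption, not from the study): treat any
    # substantial text before the bracketed final answer as evidence
    # of intermediate reasoning.
    match = re.search(r"\[(Yes|No)\]", response)
    return bool(match) and match.start() > 40

# March-style behavior: reasoning precedes the answer.
print(followed_cot("17077 is odd; no divisor up to 131 divides it, so it is prime. [Yes]"))  # True
# June-style behavior: the instruction is ignored and the answer comes directly.
print(followed_cot("[No]"))  # False
```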
The schematic below shows the fluctuation in model accuracy over a period of four months. In some cases the degradation is quite stark, with accuracy losses of more than 60%.
Long-term Implications and Mitigation Strategies
The research underscores the necessity for continuous monitoring of LLMs due to their evolving behavior. Evidence suggests that GPT-4's ability to follow user instructions decreased over time, contributing to behavioral drift.
Conclusion: The Path Forward
The study on catastrophic forgetting during continual fine-tuning of LLMs revealed that CF is a pervasive issue, with larger models experiencing more severe forgetting in domain knowledge, reasoning, and reading comprehension. However, the research also indicates that instruction tuning may offer a potential strategy to mitigate the CF problem, opening avenues for future improvements in LLM stability and performance.
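The paper frames general instruction tuning as a possible mitigation; a closely related and widely used technique is rehearsal, i.e. replaying a small slice of earlier data during fine-tuning. The sketch below illustrates that idea on the same kind of toy task as before; it is an illustration of the principle, not the paper's experimental setup:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup (an illustration, not the paper's method): two synthetic tasks
# distinguished by a task-indicator feature, so one network can represent both.
def make_task(n, task_id, feature_idx):
    x = torch.randn(n, 2)
    t = torch.full((n, 1), float(task_id))  # task-indicator feature
    y = (x[:, feature_idx] > 0).float().unsqueeze(1)
    return torch.cat([x, t], dim=1), y

xa, ya = make_task(2000, task_id=0, feature_idx=0)  # earlier "knowledge"
xb, yb = make_task(2000, task_id=1, feature_idx=1)  # new fine-tuning task

def new_model():
    return nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 1))

def train(model, x, y, epochs=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def accuracy(model, x, y):
    with torch.no_grad():
        return ((model(x) > 0).float() == y).float().mean().item()

# Baseline: sequential fine-tuning on task B alone tends to erode task A.
seq = new_model()
train(seq, xa, ya)
train(seq, xb, yb)

# Rehearsal: replay a small slice of earlier data alongside the new task.
reh = new_model()
train(reh, xa, ya)
replay_x = torch.cat([xb, xa[:200]])  # 10% replay buffer from task A
replay_y = torch.cat([yb, ya[:200]])
train(reh, replay_x, replay_y)

print(f"Task A accuracy, sequential: {accuracy(seq, xa, ya):.2f}")
print(f"Task A accuracy, rehearsal:  {accuracy(reh, xa, ya):.2f}")
```

Even a small replay buffer typically preserves far more of task A than purely sequential fine-tuning, which is the intuition behind mixing earlier or general instruction data into later training stages.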
If you are an AI enthusiast who likes to read and learn about the nuances of AI, or you are venturing into a career in AI, Data Science, Machine Learning, or Generative AI, then this newsletter is for you. Subscribe to this newsletter and the YouTube channel AccelerateAICareers to stay tuned for new content. Share this edition with your network if you enjoyed it!