Mutual information optimization for mitigating catastrophic forgetting in continual learning: An information-theoretic approach
DOI: https://doi.org/10.54939/1859-1043.j.mst.106.2025.129-136

Keywords: Continual learning; Catastrophic forgetting; Mutual information; Information theory; Neural networks; Memory replay.

Abstract
Continual learning systems encounter the critical challenge of catastrophic forgetting, where neural networks lose previously acquired knowledge when adapting to new tasks. In this paper, we propose Continual Mutual Information Preservation (CMIP), an information-theoretic approach that leverages Mutual Information (MI) optimization and entropy regularization to retain prior knowledge while learning compact and informative latent representations. CMIP uses an auxiliary network to estimate MI and a replay memory, in which each mini-batch comprises 50% current-task samples and 50% samples replayed from previous tasks. Experiments are conducted on the MNIST-Split and CIFAR-100-Split datasets for the class-incremental learning (Class-IL) setting. On MNIST-Split, CMIP achieves 90.97% accuracy with an 8.81% forgetting rate, outperforming EWC (20.64% accuracy, ~77% forgetting) and GEM (65.1% accuracy, ~33% forgetting). This method is applicable to real-world scenarios, such as robotic perception and real-time data streams.
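The abstract outlines two concrete mechanisms: an auxiliary network that estimates MI between representations of old and new tasks, and mini-batches composed half of current-task data and half of replayed samples. The PyTorch sketch below illustrates how these pieces could fit together under stated assumptions; StatisticsNet, mixed_batch, cmip_style_loss, and the weights mi_weight and ent_weight are hypothetical names and settings chosen for illustration, not the paper's actual implementation, and the entropy term here acts on the predictive distribution as a simplification of the latent-space regularizer described in the abstract.

import math
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class StatisticsNet(nn.Module):
    # Auxiliary network T(z_old, z_new) for a Donsker-Varadhan (MINE-style)
    # lower bound on the mutual information between old and new latent codes.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, z_a, z_b):
        return self.net(torch.cat([z_a, z_b], dim=1))

def mi_lower_bound(t_net, z_old, z_new):
    # I(Z_old; Z_new) >= E_joint[T] - log E_marginal[exp(T)]
    joint = t_net(z_old, z_new).mean()
    perm = torch.randperm(z_new.size(0))   # shuffle one side to mimic the product of marginals
    marg = t_net(z_old, z_new[perm])
    return joint - (torch.logsumexp(marg, dim=0) - math.log(marg.size(0))).squeeze()

def mixed_batch(cur_x, cur_y, replay_buffer, batch_size):
    # 50% current-task samples, 50% samples drawn uniformly from the replay memory;
    # the buffer is assumed to store (input tensor, integer label) pairs.
    half = batch_size // 2
    rep = random.sample(replay_buffer, min(half, len(replay_buffer)))
    rep_x = torch.stack([x for x, _ in rep])
    rep_y = torch.tensor([y for _, y in rep])
    return torch.cat([cur_x[:half], rep_x]), torch.cat([cur_y[:half], rep_y])

def cmip_style_loss(logits, targets, mi_estimate, mi_weight=1.0, ent_weight=0.1):
    # Task loss minus an MI-preservation bonus, plus an entropy penalty that keeps
    # predictions compact; the weighting scheme is an illustrative assumption.
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    pred_entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    return ce - mi_weight * mi_estimate + ent_weight * pred_entropy

In a training loop, z_old would plausibly come from a frozen copy of the encoder (or cached features of replayed samples) and z_new from the current encoder, so that maximizing the estimated bound encourages the network to retain information shared with earlier tasks while the cross-entropy term fits the new one.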
References
[1]. R. M. French, “Catastrophic forgetting in connectionist networks,” Trends in Cognitive Sciences, Vol. 3, No. 4, pp. 128–135, (1999). https://doi.org/10.1016/S1364-6613(99)01294-2
[2]. J. Kirkpatrick et al., “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, Vol. 114, No. 13, pp. 3521–3526, (2017).
[3]. D. Lopez-Paz and M. Ranzato, “Gradient episodic memory for continual learning,” Advances in Neural Information Processing Systems, Vol. 30, pp. 6467–6476, (2017).
[4]. N. Tishby, F. C. Pereira, and W. Bialek, “The information bottleneck method,” arXiv preprint, physics/0004057, (2000). https://arxiv.org/abs/physics/0004057
[5]. M. I. Belghazi et al., “Mutual Information Neural Estimation,” Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR 80, pp. 531–540, (2018). https://arxiv.org/abs/1801.04062
[6]. T. Chen et al., “A simple framework for contrastive learning of visual representations,” Proceedings of the 37th International Conference on Machine Learning, Vol. 119, pp. 1597–1607, (2020).
[7]. Y. Polyanskiy and Y. Wu, “Information theory and deep learning: A modern perspective,” Annual Review of Statistics and Its Application, Vol. 11, pp. 101–125, (2024).
[8]. T. Hospedales et al., “Meta-learning in neural networks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 9, pp. 5149–5169, (2022).
[9]. Z. Mai et al., “Online Continual Learning in Image Classification: An Empirical Survey,” Neurocomputing, Vol. 469, pp. 28–51, (2022). https://doi.org/10.1016/j.neucom.2021.10.021
[10]. G. M. van de Ven, T. Tuytelaars, and A. S. Tolias, “Three types of incremental learning,” Nature Machine Intelligence, Vol. 4, pp. 1185–1197, (2022). https://doi.org/10.1038/s42256-022-00568-3