Adaptive Fault Tolerance in Machine Learning Systems: A Self-Healing Framework

Faiza Fathima; Adwaith Raj; Jojo George; Nandana Babu; Ashmila K.P; Dr. S. Vadhana Kumari

doi:https://www.doi.org/10.59256/indjcst.20260501030

ARCHIVES

Original Article

Adaptive Fault Tolerance in Machine Learning Systems: A Self-Healing Framework

Faiza Fathima¹ Adwaith Raj² Jojo George³ Nandana Babu⁴ Ashmila K.P⁵ Dr. S. Vadhana Kumari⁶

¹ ² ³ ⁴ ⁵ ⁶ Computer Science and Engineering and Business Systems, Vimal Jyothi Engineering College, Kannur, Kerala, India.

Published Online: January-April 2026

Pages: 214-220

Cite this article

↗ https://www.doi.org/10.59256/indjcst.20260501030

Abstract

View PDF

This project presents a plugin-based self-healing machine learning framework aimed at improving the reliability and robustness of deployed ML models in dynamic real-world environments. The framework autonomously detects, classifies, and mitigates runtime errors by continuously monitoring model behavior using reliability signals such as data drift, prediction confidence, entropy, and label consistency. Detected anomalies are semantically classified into error types including data drift, overconfidence, and label noise, enabling a policy-driven healing mechanism to select appropriate corrective actions such as safe retraining or protective blocking. Safety guards and validation checks ensure that learning and adaptation occur only under reliable conditions, preventing harmful self-updates. Through a gated feedback loop and selective learning strategy, the framework maintains long-term model stability while reducing performance degradation. Its modular, plugin- based design allows seamless integration with existing machine learning models without modifying core model logic, thereby minimizing human intervention and providing a practical approach to fault-tolerant machine learning systems suitable for real-world deployment.

Quick Links

Download

Manuscript Template Copyright Form

Policies

Share Article

X

Facebook

Or copy link

https://test.indjcst.com/archives/10.59256/indjcst.20260501030

*Instagram doesn't support direct link sharing from web. Copy the link and share it in your Instagram story or post.

ARCHIVES

Adaptive Fault Tolerance in Machine Learning Systems: A Self-Healing Framework

Cite this article

Abstract

Related Articles

Artificial Intelligence in Learning and Teaching

Admin Assist: An AI – Driven Configuration and Orchestration for Enterprise Application

Enhancing Blood Group Identification using pigeon inspired optimization: An Innovative Approach

Eco-Genius: Power Up Smart, Power Down Waste

Crowd-Sourced Disaster Response and Rescue Assistant

Unveiling Deepfake Detection Using Vision Transformers: A Survey and Experimental Study

PlumX Metrics

Dimension

Quick Links

Download

Policies

Share Article