Original Article

RATM: Reinforcement Learning For Co-Optimized CPU Scheduling and NUMA Memory Management

Dr. T C Mnajunath¹, Sreerama M P², Shrivatsa K S³, Tanzila Khanam⁴, Nandini Modi⁵
¹ Dean, Department of Computer Science and Engineering, Rajarajeswari College of Engineering, R&D RRCE Bengaluru, Karnataka, India.
²,³,⁴,⁵ Department of Computer Science and Engineering, Rajarajeswari College of Engineering, R&D RRCE Bengaluru, Karnataka, India.

Published Online: September-December 2025

Pages: 350-362

Abstract

Modern operating systems rely on static heuristics, carefully tuned at design time, to manage CPU scheduling and memory allocation. These heuristics fundamentally fail under the diverse, dynamically shifting workloads characteristic of contemporary data centers, where batch processing, real-time analytics, and interactive services coexist. This paper presents RATM (Resource-Aware Adaptive Task Manager), a novel "Authoritative Controller" architecture implemented in Rust that replaces static policies with a Deep Q-Network (DQN) reinforcement learning agent capable of optimizing kernel behavior at runtime. Our system enforces a strict policy-mechanism separation: a model-free DQN agent observes continuous system state and selects actions, while the VRRP (Varying Response Ratio Priority) Scheduler and NAAT (NUMA-Aware Adaptive Tiered) Allocator execute commands as passive, tunable mechanisms. The RATM controller mediates between these layers, enforcing safety invariants and translating abstract actions into concrete API calls. Experimental results show that our RL-driven kernel achieves up to a 70% reduction in average wait latency in calibration scenarios compared to the static baseline, while maintaining high fairness. The RL agent learns to proactively trigger NUMA page migrations during workload phase transitions, effectively "flattening the curve" of latency spikes that plague traditional schedulers. The entire implementation, including lock-free data structures, atomic metrics collection, and the RL training loop, is realized in safe Rust, leveraging the language's ownership model and `Send`/`Sync` traits to eliminate data races by construction. This work demonstrates that adaptive, learning-based kernel subsystems are not only feasible but can be implemented with the same safety guarantees expected of production operating systems.
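The policy-mechanism separation described in the abstract can be illustrated with a minimal Rust sketch. It is not the authors' implementation: the `Action`, `Command`, `Metrics`, `Policy`, `ThresholdPolicy`, and `Controller` names, the slice bounds, and the 1000 µs threshold are all hypothetical, and a fixed threshold rule stands in for the DQN agent. The sketch shows only the shape of the idea: metrics are collected with atomics, a policy maps observed state to an abstract action, and a controller enforces invariants before emitting a concrete command.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Abstract actions the policy layer may choose (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    SetTimeSliceMs(u64),
    MigratePages { from_node: usize, to_node: usize },
    NoOp,
}

/// Concrete commands handed to the passive mechanisms (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Command {
    SchedulerTimeSlice(u64),
    AllocatorMigrate { from_node: usize, to_node: usize },
    None,
}

/// Lock-free counters; `Metrics` is `Send + Sync` because `AtomicU64` is.
#[derive(Default)]
struct Metrics {
    total_wait_us: AtomicU64,
    tasks: AtomicU64,
}

impl Metrics {
    fn record_wait(&self, wait_us: u64) {
        self.total_wait_us.fetch_add(wait_us, Ordering::Relaxed);
        self.tasks.fetch_add(1, Ordering::Relaxed);
    }

    fn avg_wait_us(&self) -> f64 {
        let n = self.tasks.load(Ordering::Relaxed);
        if n == 0 {
            return 0.0;
        }
        self.total_wait_us.load(Ordering::Relaxed) as f64 / n as f64
    }
}

/// Policy layer: maps an observed state to an abstract action.
trait Policy {
    fn select(&self, avg_wait_us: f64) -> Action;
}

/// Stand-in for the DQN agent: a fixed threshold rule, not a learned policy.
struct ThresholdPolicy;

impl Policy for ThresholdPolicy {
    fn select(&self, avg_wait_us: f64) -> Action {
        if avg_wait_us > 1000.0 {
            Action::SetTimeSliceMs(1)
        } else {
            Action::NoOp
        }
    }
}

/// Controller: enforces invariants, then translates actions into commands.
struct Controller {
    min_slice_ms: u64,
    max_slice_ms: u64,
    numa_nodes: usize,
}

impl Controller {
    fn translate(&self, action: Action) -> Command {
        match action {
            // Clamp the requested slice into the allowed range.
            Action::SetTimeSliceMs(ms) => {
                Command::SchedulerTimeSlice(ms.clamp(self.min_slice_ms, self.max_slice_ms))
            }
            // Accept a migration only between two distinct, existing nodes.
            Action::MigratePages { from_node, to_node }
                if from_node < self.numa_nodes
                    && to_node < self.numa_nodes
                    && from_node != to_node =>
            {
                Command::AllocatorMigrate { from_node, to_node }
            }
            // NoOp and rejected migrations produce no command.
            _ => Command::None,
        }
    }
}

fn main() {
    let metrics = Arc::new(Metrics::default());
    // Four workers record synthetic wait times concurrently, without locks.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let m = Arc::clone(&metrics);
            thread::spawn(move || {
                for _ in 0..100 {
                    m.record_wait(2000);
                }
            })
        })
        .collect();
    for w in workers {
        w.join().unwrap();
    }

    let controller = Controller { min_slice_ms: 2, max_slice_ms: 50, numa_nodes: 2 };
    let action = ThresholdPolicy.select(metrics.avg_wait_us());
    println!("{:?} -> {:?}", action, controller.translate(action));

    // A migration to a node that does not exist is rejected by the controller.
    let bad = Action::MigratePages { from_node: 0, to_node: 7 };
    println!("{:?} -> {:?}", bad, controller.translate(bad));
}
```

Because the policy never calls a mechanism directly, a learned agent could replace `ThresholdPolicy` behind the same trait without touching the invariant checks in `translate`.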

