RATM: Reinforcement Learning For Co-Optimized CPU Scheduling and NUMA Memory Management

Dr. T C Mnajunath; Sreerama M P; Shrivatsa K S; Tanzila Khanam; Nandini Modi

DOI: https://www.doi.org/10.59256/indjcst.20250403054


Original Article


Dr. T C Mnajunath¹ Sreerama M P² Shrivatsa K S³ Tanzila Khanam⁴ Nandini Modi⁵

¹ Dean, Department of Computer Science and Engineering, Rajarajeswari College of Engineering, R&D RRCE Bengaluru, Karnataka, India.
² ³ ⁴ ⁵ Department of Computer Science and Engineering, Rajarajeswari College of Engineering, R&D RRCE Bengaluru, Karnataka, India.

Published Online: September-December 2025

Pages: 350-362


Abstract


Modern operating systems rely on static heuristics, carefully tuned at design time, to manage CPU scheduling and memory allocation. These heuristics fundamentally fail under the diverse, dynamically shifting workloads characteristic of contemporary data centers, where batch processing, real-time analytics, and interactive services coexist. This paper presents RATM (Resource-Aware Adaptive Task Manager), a novel "Authoritative Controller" architecture implemented in Rust that replaces static policies with a Deep Q-Network (DQN) reinforcement learning agent capable of optimizing kernel behavior at runtime. Our system introduces a strict Policy-Mechanism Separation: a model-free DQN agent observes continuous system state and selects actions, while the VRRP (Varying Response Ratio Priority) Scheduler and NAAT (NUMA-Aware Adaptive Tiered) Allocator execute commands as passive, tunable mechanisms. The RATM controller mediates between these layers, enforcing safety invariants and translating abstract actions into concrete API calls. The experimental results demonstrate that our RL-driven kernel achieves up to a 70% reduction in average wait latency in calibration scenarios compared to the static baseline, while maintaining high fairness. The RL agent learns to proactively trigger NUMA page migrations during workload phase transitions, effectively "flattening the curve" of the latency spikes that plague traditional schedulers. The entire implementation, including lock-free data structures, atomic metrics collection, and the RL training loop, is realized in safe Rust, leveraging the language's ownership model and `Send`/`Sync` traits to eliminate data races by construction. This work demonstrates that adaptive, learning-based kernel subsystems are not only feasible but can be implemented with the same safety guarantees expected of production operating systems.
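The policy-mechanism separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Action` and `Command` variants, the `Controller` and `Metrics` types, and the time-slice bounds are all hypothetical names and values chosen for illustration. The sketch shows a controller that translates an abstract agent action into a concrete command while enforcing simple safety invariants, alongside a lock-free metrics collector built on standard-library atomics.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Abstract actions an RL agent might select (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    NoOp,
    /// Change the scheduler time slice by a signed delta, in microseconds.
    AdjustTimeSlice(i64),
    MigratePages { from_node: usize, to_node: usize },
}

/// Concrete commands handed to the passive mechanisms (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    None,
    SetTimeSliceUs(u64),
    Migrate { from_node: usize, to_node: usize },
}

/// Lock-free counters; `AtomicU64` is `Sync`, so `&Metrics` can be shared
/// across threads without a mutex.
#[derive(Default)]
pub struct Metrics {
    total_wait_us: AtomicU64,
    completed: AtomicU64,
}

impl Metrics {
    pub fn record_wait(&self, wait_us: u64) {
        self.total_wait_us.fetch_add(wait_us, Ordering::Relaxed);
        self.completed.fetch_add(1, Ordering::Relaxed);
    }

    /// Average wait so far. The two loads are independent, so under
    /// concurrent writers this is an approximation, not a snapshot.
    pub fn avg_wait_us(&self) -> f64 {
        let n = self.completed.load(Ordering::Relaxed);
        if n == 0 {
            return 0.0;
        }
        self.total_wait_us.load(Ordering::Relaxed) as f64 / n as f64
    }
}

// Assumed safety bounds for the time slice; illustrative values only.
const MIN_SLICE_US: i64 = 500;
const MAX_SLICE_US: i64 = 20_000;

/// Mediates between the agent (policy) and the mechanisms.
pub struct Controller {
    time_slice_us: u64,
    numa_nodes: usize,
}

impl Controller {
    pub fn new(numa_nodes: usize) -> Self {
        Self { time_slice_us: 4_000, numa_nodes }
    }

    /// Translate an abstract action into a command, enforcing invariants:
    /// the time slice stays within bounds, and migrations must name two
    /// distinct, existing NUMA nodes.
    pub fn translate(&mut self, action: Action) -> Command {
        match action {
            Action::NoOp => Command::None,
            Action::AdjustTimeSlice(delta) => {
                let proposed = (self.time_slice_us as i64).saturating_add(delta);
                let clamped = proposed.clamp(MIN_SLICE_US, MAX_SLICE_US) as u64;
                self.time_slice_us = clamped;
                Command::SetTimeSliceUs(clamped)
            }
            Action::MigratePages { from_node, to_node } => {
                let valid = from_node != to_node
                    && from_node < self.numa_nodes
                    && to_node < self.numa_nodes;
                if valid {
                    Command::Migrate { from_node, to_node }
                } else {
                    Command::None // unsafe request is rejected, not forwarded
                }
            }
        }
    }
}
```

In this arrangement the agent never touches scheduler or allocator state directly. An out-of-range request, such as a migration to a nonexistent node, is turned into `Command::None` before it reaches a mechanism, which is one plausible reading of "enforcing safety invariants" in the abstract.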
