Keywords: Operating System Optimization; Deep Learning; GPU Scheduling; Tensor-Aware Memory Management; Heterogeneous Computing; AI Workloads; Kernel-Level Performance

Abstract

Deep learning has rapidly reshaped modern computing, but it has also exposed serious gaps in how traditional operating systems (OS) manage today’s hardware. Most operating systems were designed around CPU-centric workloads, so they fall short when handling the growing mix of GPUs, TPUs, and NPUs that power current AI models. They struggle to coordinate different accelerators, manage separate memory spaces, and keep up with the fast, frequent data movement that deep learning tasks require. This mismatch leads to delays, inefficient scheduling, high context-switch overhead, and poor utilization of hardware that should be boosting performance.

To overcome these challenges, this paper introduces an OS-level optimization framework designed specifically for AI workloads. The proposed approach builds accelerator-aware scheduling and tensor-aware memory management directly into the kernel, so that the OS can recognize and respond to the demands of deep learning applications. By improving data locality, predicting memory usage patterns, and balancing tasks across heterogeneous devices, the framework keeps execution coordinated across accelerators.
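
As a rough illustration of how data locality and load balancing can be combined in one placement decision, the user-space sketch below picks, for each task, the device with the earliest estimated finish time. It is not the paper's kernel implementation: the `Device` and `Task` classes, the `estimated_finish` cost model, and all numbers are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A hypothetical accelerator as a scheduler might model it."""
    name: str
    throughput: float        # relative compute speed (work units per second)
    link_bandwidth: float    # bytes per second for copying tensors to the device
    busy_until: float = 0.0  # time at which already-queued work finishes
    resident: set = field(default_factory=set)  # tensors already in device memory


@dataclass
class Task:
    """A unit of work plus its input tensors (name -> size in bytes)."""
    name: str
    work: float
    tensors: dict


def estimated_finish(task, device):
    """Queue wait + transfer of non-resident tensors + compute time."""
    missing_bytes = sum(
        size for tensor, size in task.tensors.items()
        if tensor not in device.resident
    )
    transfer = missing_bytes / device.link_bandwidth
    compute = task.work / device.throughput
    return device.busy_until + transfer + compute


def place(task, devices):
    """Greedy placement: pick the device with the earliest estimated finish."""
    best = min(devices, key=lambda d: estimated_finish(task, d))
    best.busy_until = estimated_finish(task, best)
    best.resident.update(task.tensors)  # inputs now live in that device's memory
    return best


if __name__ == "__main__":
    # Assumed example values: a fast GPU with cold memory and a slower NPU
    # that already holds the model weights.
    gpu = Device("gpu0", throughput=10.0, link_bandwidth=1e9)
    npu = Device("npu0", throughput=4.0, link_bandwidth=1e9, resident={"weights"})
    tasks = [
        Task("forward", work=20.0, tensors={"weights": 4e9, "batch0": 1e8}),
        Task("backward", work=40.0, tensors={"weights": 4e9, "grads": 1e8}),
    ]
    for task in tasks:
        chosen = place(task, [gpu, npu])
        print(f"{task.name} -> {chosen.name}, finishes at {chosen.busy_until:.2f}s")
```

With these assumed values, the first task goes to the slower NPU because the weights are already resident there and no large copy is needed. The second task goes to the GPU because the NPU's queue is now long enough that paying the transfer cost is cheaper than waiting.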

Experiments with TensorFlow and PyTorch show clear improvements: up to 25% higher throughput, 18% shorter training time, and a 36% reduction in context-switch latency for models such as ResNet-50 and BERT. These gains also translate into lower energy consumption, because accelerators spend less time idling and more time doing useful work. Overall, the findings indicate that making the OS “AI-aware” is not merely beneficial but essential for achieving the performance, efficiency, and scalability required by modern deep learning systems and the next generation of intelligent computing environments.
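
Percentage reductions in time and percentage gains in rate are not on the same scale, so it can help to convert the reported figures into ratios. The short sketch below does only that arithmetic; the helper name `speedup_from_time_reduction` is introduced here for illustration, and the reported values are "up to" figures that need not come from the same run.

```python
def speedup_from_time_reduction(reduction):
    """Convert a fractional reduction in run time into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Figures taken from the abstract: 18% shorter training time,
# 36% lower context-switch latency.
training_speedup = speedup_from_time_reduction(0.18)
latency_ratio = 1.0 - 0.36

print(f"18% shorter training time = {training_speedup:.2f}x faster")
print(f"36% lower context-switch latency = {latency_ratio:.2f}x the baseline latency")
```

An 18% shorter training time therefore corresponds to roughly a 1.22x speedup, and a 36% latency reduction leaves 0.64x of the baseline latency.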

How to Cite
Pawar, S., & Deobhankar, P. (2026). AI-Aware Operating System Architecture: Redefining Scheduling and Memory Management. International Journal Of Mathematics And Computer Research, 14(03), 96-100. https://doi.org/10.47191/ijmcr/v14iSPC3.20