Optimization in Artificial Intelligence and Deep Learning — Algorithms, Theory, and Practice

Nikumbha Neha Ravindranath; Kartha Pearly Prasoon

doi:10.47191/ijmcr/v14iSPC3.26

Keywords:-

Deep learning, Stochastic Gradient Descent (SGD), Adaptive methods, Adam optimizer, Momentum, Second-order optimization, Natural gradient, K-FAC, Learning-rate schedules, Regularization, Sharpness-aware minimization (SAM), Batch size scaling, Neural network optimization, Weight decay, Practical training recipes.

Article Content:-

Abstract

Optimization lies at the core of machine learning and deep learning models. This paper summarizes the principal categories of optimization algorithms used in AI and deep learning, explains the mathematical concepts underpinning them in non-technical language, and relates these concepts to training recipes and engineering considerations. We discuss first-order stochastic algorithms (SGD and its variants); adaptive algorithms (AdaGrad, RMSProp, Adam) (4)(5); momentum and acceleration; second-order concepts and their approximations (Newton, quasi-Newton, natural gradient, K-FAC); and recent topics such as learning-rate schedules, batch-normalization behavior, sharp versus flat minima, generalization, and the training of large models. Pseudocode, recommended hyperparameter settings, and common practical pitfalls are provided so that the paper serves both as a conceptual primer and as a practical guide.
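The abstract names several families of update rules. The pure-Python script below is a minimal sketch of them and is not the authors' pseudocode. It applies textbook forms of plain SGD, heavy-ball momentum, AdaGrad, RMSProp, Adam, AdamW-style decoupled weight decay, and a Newton step to an assumed toy quadratic f(w) = 0.5(w_1^2 + 10 w_2^2). The objective, the names `loss`, `grad`, `run`, `LR` and `CURVATURE`, and every hyperparameter value are illustrative assumptions chosen for this toy problem. They are not recommendations from the paper.

```python
import math
from typing import Dict, List

Vector = List[float]

# Assumed toy ill-conditioned quadratic: f(w) = 0.5 * (w0^2 + 10 * w1^2),
# minimized at w = (0, 0). CURVATURE is the diagonal of its Hessian.
CURVATURE = [1.0, 10.0]


def loss(w: Vector) -> float:
    return 0.5 * sum(h * wi * wi for h, wi in zip(CURVATURE, w))


def grad(w: Vector) -> Vector:
    return [h * wi for h, wi in zip(CURVATURE, w)]


# Illustrative learning rates for this toy problem only (hypothetical, untuned).
LR: Dict[str, float] = {
    "sgd": 0.05, "momentum": 0.01, "adagrad": 0.5,
    "rmsprop": 0.01, "adam": 0.01, "adamw": 0.01, "newton": 1.0,
}


def run(name: str, steps: int = 1000) -> Vector:
    """Minimize the toy loss from w = (1, 1) with the named update rule."""
    if name not in LR:
        raise ValueError(f"unknown optimizer: {name}")
    lr, eps = LR[name], 1e-8
    beta1, beta2, rho, weight_decay = 0.9, 0.999, 0.9, 0.01
    w = [1.0, 1.0]
    m = [0.0] * len(w)  # momentum / first-moment buffer
    s = [0.0] * len(w)  # sum (AdaGrad) or moving average (RMSProp, Adam) of g^2
    for t in range(1, steps + 1):
        g = grad(w)
        for i, gi in enumerate(g):
            if name == "sgd":
                w[i] -= lr * gi
            elif name == "momentum":  # Polyak heavy ball: v <- beta*v + g
                m[i] = beta1 * m[i] + gi
                w[i] -= lr * m[i]
            elif name == "adagrad":  # step shrinks as squared gradients accumulate
                s[i] += gi * gi
                w[i] -= lr * gi / (math.sqrt(s[i]) + eps)
            elif name == "rmsprop":  # moving average of g^2 instead of a sum
                s[i] = rho * s[i] + (1 - rho) * gi * gi
                w[i] -= lr * gi / (math.sqrt(s[i]) + eps)
            elif name in ("adam", "adamw"):
                m[i] = beta1 * m[i] + (1 - beta1) * gi
                s[i] = beta2 * s[i] + (1 - beta2) * gi * gi
                m_hat = m[i] / (1 - beta1 ** t)  # bias correction
                s_hat = s[i] / (1 - beta2 ** t)
                if name == "adamw":  # decoupled weight decay acts on w directly
                    w[i] -= lr * weight_decay * w[i]
                w[i] -= lr * m_hat / (math.sqrt(s_hat) + eps)
            else:  # "newton": rescale by the exact (diagonal) Hessian
                w[i] -= lr * gi / CURVATURE[i]
    return w


if __name__ == "__main__":
    for name in LR:
        print(f"{name:>8}: final loss = {loss(run(name)):.3e}")
```

On this problem plain SGD is stable only for a step size below 2/10, which is set by the largest curvature. The adaptive rules rescale each coordinate separately. Because the toy Hessian is diagonal and known exactly, a single Newton step lands on the minimizer. In deep networks the Hessian is neither diagonal nor cheap to form, and this is the gap that quasi-Newton, natural-gradient, and K-FAC approximations try to fill.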

References:-

Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics.

Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics.

Nesterov, Y. (1983). A method for solving the convex programming problem with convergence rate O(1/k^2).

Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization (AdaGrad).

Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization.

Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization (AdamW).

Keskar, N. S., et al. (2017). On large-batch training for deep learning: Generalization gap and sharp minima.

Zhang, Y., et al. (2020). A brief survey of adaptive optimization methods in deep learning.

Foret, P., Kleiner, A., Mobahi, H., & Neyshabur, B. (2020). Sharpness-aware minimization for efficiently improving generalization. arXiv:2010.01412.

Kwon, J., Kim, J., Park, H., & Choi, I. K. (2021). ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks. ICML 2021.

Du, J., Yan, H., Feng, J., Zhou, J. T., Zhen, L., Goh, R. S. M., & Tan, V. Y. F. (2021). Efficient sharpness-aware minimization for improved training of neural networks. arXiv:2110.03141.

Yue, Y., Jiang, J., Ye, Z., Gao, N., Liu, Y., & Zhang, K. (2023). Sharpness-aware minimization revisited: Weighted sharpness as a regularization term (WSAM). KDD 2023.

Yu, R., Zhang, Y., & Kwok, J. (2024). Improving sharpness-aware minimization by lookahead. ICML 2024.

Zhou, Z., Wang, M., Mao, Y., Li, B., & Yan, J. (2025). Sharpness-aware minimization efficiently selects flatter minima late in training. ICLR 2025.

Martens, J., & Grosse, R. (2015). Optimizing neural networks with Kronecker-factored approximate curvature (K-FAC). arXiv:1503.05671.

Luk, K., & Grosse, R. (2018). A coordinate-free construction of scalable natural gradient. arXiv:1808.10340.

Bae, J., Zhang, G., & Grosse, R. (2018). Eigenvalue corrected noisy natural gradient. arXiv:1811.12565.

Surianarayanan, C., Lawrence, J. J., Chelliah, P. R., Prakash, E., & Hewage, C. (2023). A survey on optimization techniques for edge artificial intelligence (AI). Sensors.

Lee, Y.-A., Yi, G.-T., Liu, M.-Y., Lu, J.-C., Yang, G.-B., & Chen, Y.-N. (2025). Compound AI systems optimization: A survey of methods, challenges, and future directions. arXiv:2506.08234.


How to Cite

Ravindranath, N., & Prasoon, K. (2026). Optimization in Artificial Intelligence and Deep Learning — Algorithms, Theory, and Practice. International Journal Of Mathematics And Computer Research, 14(03), 123-127. https://doi.org/10.47191/ijmcr/v14iSPC3.26

International Journal of Mathematics And Computer Research