Evaluating Machine Learning-Based Reinforcement Algorithms for Inverted Pendulum Stabilization

Authors

  • N. Venkateswaran, Department of Master of Business Administration, Panimalar Engineering College, Chennai, Tamil Nadu 600123, India
  • A. V. K. Shanthi, Department of Computer Science, Vel Tech Ranga Sanku Arts College, Chennai, Tamil Nadu 600062, India
  • M. Revathi, Department of Artificial Intelligence and Data Science, St. Joseph’s Institute of Technology, Chennai, Tamil Nadu 600119, India
  • D. Shyamprakash, Department of Computer Science and Business Systems, Sri Eshwar College of Engineering, Coimbatore, Tamil Nadu 641202, India
  • S. Muthuselvan, Department of Information Technology, KCG College of Technology, Chennai, Tamil Nadu 600097, India
  • V. G. Pratheep, Department of Electrical and Electronics Engineering, Velalar College of Engineering and Technology, Erode, Tamil Nadu 638012, India
  • K. R. Prasanna Kumar, Department of Computer Science and Design, Kongu Engineering College, Erode
  • Shailendra Kumar Bohidar, Department of Mechanical Engineering, School of Engineering & I.T., MATS University, Raipur, Chhattisgarh 493441, India

DOI:

https://doi.org/10.24237/djes.2026.19106

Keywords:

Reinforcement Learning, Q-Learning, Hill Climbing, REINFORCE, Deep Q-Network (DQN)

Abstract

This work evaluates classical and modern reinforcement learning methods on the CartPole-v0 environment, comparing Q-learning, Hill Climbing, the REINFORCE algorithm, several variants of the Deep Q-Network (DQN) algorithm, and policy-gradient approaches such as PPO and A2C. All of these models are implemented and compared in terms of convergence speed, stability, sample efficiency, robustness in noisy environments, and generalization to modified environments. The experiments also cover hyperparameter optimization and algorithmic extensions such as Double DQN, Dueling DQN, and reward shaping. Compared with classical control approaches such as the Linear Quadratic Regulator (LQR), the deep RL approaches prove markedly more flexible, stable, and efficient. Reward curves, policy heatmaps, and trajectory plots are used to improve the interpretability of the learned policies. The results show that PPO and Dueling DQN are the most effective methods, combining fast convergence with good stability, while also providing insight into computational efficiency. Finally, this research introduces a hybrid framework that combines Hill Climbing with reinforcement learning and a stability-oriented reward-shaping technique, and this framework outperforms both classical reinforcement learning and classical control approaches.
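The paper itself does not include code, but the tabular Q-learning baseline it describes is typically applied to CartPole by discretizing the four-dimensional continuous state. The sketch below illustrates that idea; the bin edges, function names, and constants are our own illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical bin edges for the four CartPole-v0 state variables
# (cart position, cart velocity, pole angle, pole angular velocity).
BIN_EDGES = [
    [-2.4, -0.8, 0.8, 2.4],      # cart position
    [-1.0, 0.0, 1.0],            # cart velocity
    [-0.21, -0.05, 0.05, 0.21],  # pole angle (rad)
    [-1.5, 0.0, 1.5],            # pole angular velocity
]

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(sum(x > edge for edge in edges)
                 for x, edges in zip(obs, BIN_EDGES))

def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

def epsilon_greedy(Q, s, n_actions=2, eps=0.1):
    """Pick a random action with probability eps, else the greedy one."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

Q = defaultdict(lambda: [0.0, 0.0])  # two actions: push left / push right
s = discretize([0.0, 0.3, 0.02, -0.1])
q_update(Q, s, a=1, r=1.0, s_next=s, done=False)
print(Q[s][1])  # 0.1 after the first update (target = 1 + 0.99*0, Q was 0)
```

In a full training loop, `discretize` would be applied to each observation returned by the environment, and `q_update` would run once per transition; the deep variants in the paper replace the table `Q` with a neural network over the raw continuous state.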

Published

2026-03-15

How to Cite

[1]
“Evaluating Machine Learning-Based Reinforcement Algorithms for Inverted Pendulum Stabilization”, DJES, vol. 19, no. 1, pp. 82–95, Mar. 2026, doi: 10.24237/djes.2026.19106.
