Evaluating Machine Learning-Based Reinforcement Algorithms for Inverted Pendulum Stabilization
DOI: https://doi.org/10.24237/djes.2026.19106

Keywords: Reinforcement Learning, Q-Learning, Hill Climbing, REINFORCE, Deep Q-Network (DQN)

Abstract
This work evaluates classical and modern reinforcement learning algorithms for inverted pendulum stabilization, using the CartPole-v0 environment to implement Q-learning, Hill Climbing, the REINFORCE algorithm, several Deep Q-Network (DQN) variants, and policy-gradient methods such as PPO and A2C. All models are compared in terms of convergence speed, stability, sample efficiency, and robustness to observation noise, as well as their generalization to modified environments. The experiments also cover hyperparameter optimization and algorithmic extensions such as Double DQN, Dueling DQN, and reward shaping. Compared with classical control approaches such as the Linear Quadratic Regulator (LQR), the deep RL methods prove markedly more flexible, stable, and efficient. Reward curves, policy heatmaps, and trajectory plots are used to improve the interpretability of the learned policies. The results show that PPO and Dueling DQN are the most effective methods, achieving the best convergence rates and stability while also providing insights into computational efficiency. Finally, this research introduces a hybrid framework that combines Hill Climbing with reinforcement learning and a stability-oriented reward-shaping scheme, which outperforms both classical reinforcement learning and classical control approaches in stability and efficiency.
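To make the tabular baseline concrete, the following is a minimal sketch of Q-learning on a cart-pole task of the kind studied here. The lightweight dynamics, the discretization bins, and the hyperparameters (`alpha`, `gamma`, `eps`) are all illustrative assumptions, not the paper's actual setup; the paper itself uses the Gym CartPole-v0 environment, which this standalone simulator only approximates.

```python
import math
import random

# Illustrative cart-pole dynamics using the standard parameter values
# (cart mass 1.0 kg, pole mass 0.1 kg, half-length 0.5 m, 10 N force).
GRAVITY, MASS_CART, MASS_POLE, LENGTH, FORCE, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
TOTAL_MASS = MASS_CART + MASS_POLE
POLEMASS_LENGTH = MASS_POLE * LENGTH

def step(state, action):
    """One Euler step; reward is +1 per step until the pole or cart leaves bounds."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    x, x_dot = x + TAU * x_dot, x_dot + TAU * x_acc
    theta, theta_dot = theta + TAU * theta_dot, theta_dot + TAU * theta_acc
    done = abs(x) > 2.4 or abs(theta) > 12 * math.pi / 180
    return (x, x_dot, theta, theta_dot), 1.0, done

def discretize(state, bins=6):
    """Clip each state variable into a coarse grid so a Q-table applies."""
    bounds = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
    idx = 0
    for v, (lo, hi) in zip(state, bounds):
        v = min(max(v, lo), hi - 1e-9)
        idx = idx * bins + int((v - lo) / (hi - lo) * bins)
    return idx

def train(episodes=500, alpha=0.1, gamma=0.99, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the learned Q-table."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = (0.0, 0.0, rng.uniform(-0.05, 0.05), 0.0)
        done, steps = False, 0
        while not done and steps < 200:
            s = discretize(state)
            qs = q.setdefault(s, [0.0, 0.0])
            a = rng.randrange(2) if rng.random() < eps else qs.index(max(qs))
            state, reward, done = step(state, a)
            s2 = discretize(state)
            target = reward + (0.0 if done else gamma * max(q.setdefault(s2, [0.0, 0.0])))
            qs[a] += alpha * (target - qs[a])  # standard TD(0) update
            steps += 1
    return q
```

The DQN variants compared in the paper replace the discretized Q-table with a neural network over the raw four-dimensional state, which removes the binning step entirely.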
License
Copyright (c) 2026 N. Venkateswaran, A. V. K. Shanthi, M. Revathi, D. Shyamprakash, S. Muthuselvan, V. G. Pratheep, K. R. Prasanna Kumar, Shailendra Kumar Bohidar

This work is licensed under a Creative Commons Attribution 4.0 International License.