A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot

Jacky Baltes; Hanjaya Mandala; Saeed Saeedvand

doi:10.1017/S0269888924000080

2024 Volume 39

RESEARCH ARTICLE Open Access

Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan

Corresponding author: Saeed Saeedvand; Email: saeedvand@ntnu.edu.tw

Received: 12 June 2022
Revised: 18 September 2023
Accepted: 28 November 2023
Published online: 20 November 2024
The Knowledge Engineering Review 39, Article number: e7 (2024)

Abstract

Robotics development and control have advanced rapidly in recent years. Although humans manipulate everyday objects effortlessly, enabling robots to interact with human-made objects in real-world environments remains a challenge despite years of dedicated research. Typing on a keyboard, for example, requires a robot to adapt to external conditions such as the size and position of the keyboard, and demands high accuracy for the robot to use it properly. This paper introduces a novel hierarchical reinforcement learning algorithm based on the Deep Deterministic Policy Gradient (DDPG) algorithm to address the dual-arm robot typing problem. In the first stage, the proposed algorithm employs a Convolutional Auto-Encoder (CAE) to handle the complexity of the continuous state and action spaces; a DDPG algorithm then serves as the strategy controller for the typing problem. We evaluated the proposed algorithm extensively on a dual-arm humanoid robot in both simulation and real-world experiments. The approach achieves an average success rate of 96.14% in simulation and 92.2% in real-world settings. Furthermore, we demonstrate that the proposed algorithm outperforms DDPG and Deep Q-Learning, two algorithms frequently employed in robotic applications.
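
The two-stage structure described in the abstract (an encoder that compresses observations, followed by a DDPG controller acting on the compressed state) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projection in `encode` is only a placeholder for a trained CAE encoder, the image, latent, and action sizes are hypothetical, and the weights are random. The `td_target` and `soft_update` helpers show the standard DDPG critic target and target-network averaging under the assumed values of `gamma` and `tau`.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, w_enc):
    """Placeholder for a trained CAE encoder (assumption): flatten, project, squash."""
    return np.tanh(image.reshape(-1) @ w_enc)

def actor(latent, w_act):
    """Deterministic policy: latent state -> continuous action bounded to [-1, 1]."""
    return np.tanh(latent @ w_act)

def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target: r + gamma * Q'(s', mu'(s')), with no bootstrap at terminal steps."""
    return reward + gamma * (1.0 - done) * next_q

def soft_update(target, online, tau=0.005):
    """Polyak averaging: move target-network weights a fraction tau toward the online weights."""
    return (1.0 - tau) * target + tau * online

# Hypothetical sizes: 16x16 grayscale observation, 8-D latent state, 3-D action.
w_enc = rng.normal(scale=0.1, size=(16 * 16, 8))
w_act = rng.normal(scale=0.1, size=(8, 3))
image = rng.random((16, 16))

latent = encode(image, w_enc)   # stage 1: compress the observation
action = actor(latent, w_act)   # stage 2: policy acts on the latent state
```

Separating the encoder from the controller means the policy and critic operate on a low-dimensional latent vector instead of raw pixels, which is the motivation the abstract gives for placing the CAE before the DDPG stage.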
Rights and permissions

This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.

About this article

Cite this article

Jacky Baltes, Hanjaya Mandala, Saeed Saeedvand. 2024. A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot. The Knowledge Engineering Review 39, e7. doi:10.1017/S0269888924000080
