Department of Electrical Engineering, National Taiwan Normal University, Taipei, Taiwan
2024 Volume 39
RESEARCH ARTICLE   Open Access    

A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot

  • Abstract: Robotics development and control have advanced rapidly in recent years. Although humans manipulate everyday objects effortlessly, enabling robots to interact with human-made objects in real-world environments remains a challenge despite years of dedicated research. Typing on a keyboard, for example, requires adapting to external conditions such as the size and position of the keyboard, and demands high accuracy from the robot. This paper introduces a novel hierarchical reinforcement learning algorithm based on the Deep Deterministic Policy Gradient (DDPG) algorithm to address the dual-arm robot typing problem. In the first stage, the proposed algorithm employs a Convolutional Auto-Encoder (CAE) to deal with the complexities of continuous state and action spaces; a DDPG algorithm then serves as the strategy controller for the typing problem. We evaluated the proposed algorithm extensively with a dual-arm humanoid robot in simulation and in real-world experiments. The approach achieved an average success rate of 96.14% in simulation and 92.2% in real-world settings. Furthermore, we demonstrate that the proposed algorithm outperforms DDPG and Deep Q-Learning, two algorithms frequently employed in robotic applications.
  • Cite this article

    Jacky Baltes, Hanjaya Mandala, Saeed Saeedvand. 2024. A hierarchical deep reinforcement learning algorithm for typing with a dual-arm humanoid robot. The Knowledge Engineering Review 39(1), doi: 10.1017/S0269888924000080

    • This work was financially supported by the ‘Chinese Language and Technology Center’ of the National Taiwan Normal University (NTNU) from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan and the Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 108-2634-F-003-002, MOST 108-2634-F-003-003, and MOST 108-2634-F-003-004 (administered through Pervasive Artificial Intelligence Research (PAIR) Labs) as well as MOST 107-2811-E-003-503. We are grateful to the National Center for High-performance Computing for computer time and facilities to conduct this research.

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.