Loughborough University, Epinal Way, Loughborough, LE11 3TU, UK
2024 Volume 39
RESEARCH ARTICLE   Open Access    

‘I don’t want to play with you anymore’: dynamic partner judgements in moody reinforcement learners playing the prisoner’s dilemma

  • Abstract: Reinforcement learning algorithms that incorporate human traits into their conceptual architecture have been shown to encourage cooperation in social dilemmas relative to their unaltered counterparts. In particular, adding a mood mechanism facilitates more cooperative behaviour in multi-agent iterated prisoner’s dilemma (IPD) games, in both static and dynamic network contexts. Mood-altered agents also exhibit humanlike behavioural trends when environmental aspects of the dilemma are altered, such as the structure of the payoff matrix used. Other environmental effects from both human and agent-based research may interact with moody structures in previously unstudied ways. As the literature on these interactions is currently sparse, we expand on previous research by introducing two further environmental dimensions: voluntary interaction in dynamic networks, and stability of interaction through varied network restructuring. Starting from an initial Erdős–Rényi random network, we manipulate the structure of a networked IPD following existing methodology from human-based research, to investigate whether its findings replicate. We also facilitate strategic selection of opponents by introducing two partner evaluation mechanisms, testing two selection thresholds for each. We find that even minimally strategic play termination in dynamic networks is enough to raise cooperation above the static level, though the thresholds for these strategic decisions are critical to the desired outcomes. More forgiving thresholds lead to better maintenance of cooperation between kinder strategies than stricter ones, despite overall cooperation levels being relatively low. Additionally, moody reinforcement learning combined with certain play-termination decision strategies can mimic trends in human cooperation driven by structural changes to the IPD played on dynamic networks, as can kind and simple strategies such as Tit-For-Tat. Implications of these results for comparison with human data are discussed, and suggestions for diversifying further testing are made.
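The setting described in the abstract, agents on a random network who may terminate play with partners judged too uncooperative, can be sketched with a toy simulation. Everything below is our own minimal illustration: the strategies (unconditional defectors versus Tit-for-Tat), the defection-rate judgement, and all parameter names are assumptions for exposition, not the paper’s Moody SARSA agents or its actual evaluation mechanisms.

```python
import random

def erdos_renyi(n, p, rng):
    """Generate an Erdős–Rényi G(n, p) random graph as a set of edges."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}

def simulate(n=20, p=0.3, rounds=50, leave_threshold=0.5, seed=0):
    """IPD on a random network where either partner may sever a link once
    the other's observed defection rate exceeds leave_threshold."""
    rng = random.Random(seed)
    edges = erdos_renyi(n, p, rng)
    defector = {i: i % 2 == 0 for i in range(n)}  # toy mix: half always defect
    last = {}   # (edge, agent) -> that agent's last move on this edge
    seen = {}   # (edge, agent) -> (defections by agent, games played)
    coop = total = 0
    for _ in range(rounds):
        for e in list(edges):
            i, j = e
            # Tit-for-Tat copies the partner's previous move (cooperates first)
            mi = "D" if defector[i] else last.get((e, j), "C")
            mj = "D" if defector[j] else last.get((e, i), "C")
            last[(e, i)], last[(e, j)] = mi, mj
            for agent, move in ((i, mi), (j, mj)):
                d, g = seen.get((e, agent), (0, 0))
                seen[(e, agent)] = (d + (move == "D"), g + 1)
                coop += move == "C"
                total += 1
            # strategic play termination: judge each partner after 5 games
            for agent in (i, j):
                d, g = seen[(e, agent)]
                if g >= 5 and d / g > leave_threshold:
                    edges.discard(e)
    return (coop / total if total else 0.0), len(edges)
```

With a strict threshold, links to habitual defectors are pruned and the surviving subnetwork cooperates; with `leave_threshold=1.0` no link is ever severed (a defection rate cannot exceed 1.0), which approximates the static-network case the abstract compares against.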
References

Abdai, J. & Miklósi, Á. 2016. The origin of social evaluation, social eavesdropping, reputation formation, image scoring or what you will. Frontiers in Psychology 7, 1772. https://doi.org/10.3389/fpsyg.2016.01772.

Andreoni, J. & Miller, J. H. 1993. Rational cooperation in the finitely repeated Prisoner’s Dilemma: Experimental evidence. The Economic Journal 103(418), 570–585. https://doi.org/10.2307/2234532.

Axelrod, R. 1984. The Evolution of Cooperation. Basic Books.

Bazzan, A. L. C. & Bordini, R. H. 2001. A framework for the simulation of agents with emotions. In Proceedings of the Fifth International Conference on Autonomous Agents, AGENTS ’01. Association for Computing Machinery, 292–299. https://doi.org/10.1145/375735.376313.

Belkaid, M., Cuperlier, N. & Gaussier, P. 2017. Emotional metacontrol of attention: Top-down modulation of sensorimotor processes in a robotic visual search task. PLoS ONE 12(9). https://doi.org/10.1371/journal.pone.0184960.

Clore, G. L. & Ortony, A. 2013. Psychological construction in the OCC model of emotion. Emotion Review 5(4), 335–343.

Collenette, J., et al. 2017a. Environmental effects on simulated emotional and moody agents. The Knowledge Engineering Review 32, 1–24. https://doi.org/10.1017/S0269888917000170.

Collenette, J., et al. 2017b. Mood modelling within reinforcement learning. In Proceedings of ECAL’17. MIT Press, 106–113. https://doi.org/10.7551/ecal_a_021.

Collenette, J., et al. 2018a. Modelling mood in co-operative emotional agents. Distributed Autonomous Robotic Systems 6, 559–572. https://doi.org/10.1007/978-3-319-73008-0_39.

Collenette, J., et al. 2018b. On the role of mobility and interaction topologies in social dilemmas. In Proceedings of the Conference on Artificial Life, 477–484. https://doi.org/10.1162/isal_a_00088.

Colman, A. M., Pulford, B. D. & Krockow, E. M. 2018. Persistent cooperation and gender differences in repeated prisoner’s dilemma games: Some things never change. Acta Psychologica 187, 1–8. https://doi.org/10.1016/j.actpsy.2018.04.014.

Erev, I. & Roth, A. E. 1998. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review 88(4), 848–881. http://www.jstor.org/stable/117009. Accessed 30 August 2022.

Feehan, G. & Fatima, S. 2022. Augmenting reinforcement learning to enhance cooperation in the iterated prisoner’s dilemma. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence – Volume 3: ICAART, INSTICC. SciTePress, 146–157. https://doi.org/10.5220/0010787500003116.

Fehr, E. & Schmidt, K. M. 1999. A theory of fairness, competition, and cooperation. Quarterly Journal of Economics 114, 817–868. https://doi.org/10.1162/003355399556151.

Fu, F., et al. 2008. Reputation-based partner choice promotes cooperation in social networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 78(2), 026117. https://doi.org/10.1103/PhysRevE.78.026117.

Gallo, E., et al. 2022. Cooperation and punishment mechanisms in uncertain and dynamic social networks. Games and Economic Behavior 134, 75–103. https://doi.org/10.1016/j.geb.2022.03.015.

Gao, Y. 2012. A reinforcement learning based strategy for the double-game prisoner’s dilemma. In Proceedings of the First International Conference on Agreement Technologies, 918, 317–331.

Hagberg, A. A., Schult, D. A. & Swart, P. J. 2008. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference, Varoquaux, G., Vaught, T. & Millman, J. (eds). Pasadena, CA, USA, 1–15.

Hauk, E. 2001. Leaving the prison: Permitting partner choice and refusal in prisoner’s dilemma games. Computational Economics 18, 65–87. https://doi.org/10.1023/A:1013866527989.

Hauk, E. & Nagel, R. 2001. Choice of partners in multiple two-person prisoner’s dilemma games: An experimental study. The Journal of Conflict Resolution 45(6), 770–793. https://doi.org/10.1177/0022002701045006004.

Horita, Y., et al. 2017. Reinforcement learning accounts for moody conditional cooperation behavior: Experimental results. Scientific Reports 7, 1–10. https://doi.org/10.1038/srep39275.

Imhof, L. A., Fudenberg, D. & Nowak, M. A. 2007. Tit-for-tat or win-stay, lose-shift? Journal of Theoretical Biology 247(3), 574–580. https://doi.org/10.1016/j.jtbi.2007.03.027.

Izquierdo, S., Izquierdo, L. & Vega-Redondo, F. 2010. The option to leave: Conditional dissociation in the evolution of cooperation. Journal of Theoretical Biology 267(1), 76–84. https://doi.org/10.1016/j.jtbi.2010.07.039.

Jia, D., et al. 2021. Local and global stimuli in reinforcement learning. New Journal of Physics 23(8). https://doi.org/10.1088/1367-2630/ac170a.

Jusup, M., et al. 2022. Social physics. Physics Reports 948, 1–148. https://doi.org/10.1016/j.physrep.2021.10.005.

Kim, N.-R. & Shin, K.-S. 2015. A study on the impact of negativity bias on online spread of reputation: With a case study of election campaign. Journal of Information Technology Services 14(1), 263–276. https://doi.org/10.9716/KITS.2015.14.1.263.

Knoke, D. H. & Yang, S. 2008. Social Network Analysis, 2nd edition. Quantitative Applications in the Social Sciences. SAGE Publications.

Lin, B., et al. 2019. Reinforcement learning models of human behavior: Reward processing in mental disorders. In NeurIPS.

Melamed, D., Harrell, A. & Simpson, B. 2018. Cooperation, clustering, and assortative mixing in dynamic networks. Proceedings of the National Academy of Sciences 115(5), 951–956. https://doi.org/10.1073/pnas.1715357115.

Mesa 2021. Project Mesa. https://github.com/projectmesa/mesa.

NHS 2019. NHS choices: Symptoms of clinical depression. https://www.nhs.uk/mental-health/conditions/clinical-depression/symptoms/. Accessed 3 August 2021; 9 August 2022.

Nowak, M. & Sigmund, K. 1993. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature 364(6432), 56–58. https://doi.org/10.1038/364056a0.

Ortony, A., Clore, G. L. & Collins, A. 1988. The Cognitive Structure of Emotions. Cambridge University Press.

Perc, M. & Szolnoki, A. 2010. Coevolutionary games – A mini review. Biosystems 99(2), 109–125. https://doi.org/10.1016/j.biosystems.2009.10.003.

Perrone, G., Unpingco, J. & Lu, H.-M. 2020. Network visualizations with Pyvis and VisJS. arXiv:2006.04951. https://arxiv.org/abs/2006.04951.

Qu, X., et al. 2016. Conditional dissociation as a punishment mechanism in the evolution of cooperation. Physica A: Statistical Mechanics and its Applications 449, 215–223. https://doi.org/10.1016/j.physa.2015.12.128.

Rand, D. G., Arbesman, S. & Christakis, N. A. 2011. Dynamic social networks promote cooperation in experiments with humans. Proceedings of the National Academy of Sciences 108(48), 19193–19198. https://doi.org/10.1073/pnas.1108243108.

Shteingart, H. & Loewenstein, Y. 2014. Reinforcement learning and human behavior. Current Opinion in Neurobiology 25, 93–98.

Stanley, E. A., Ashlock, D. & Tesfatsion, L. 1993. Iterated Prisoner’s Dilemma with Choice and Refusal of Partners. ISU Economic Reports Series 199302010800001028. Iowa State University, Department of Economics. https://ideas.repec.org/p/isu/genstf/199302010800001028.html.

Sutton, R. & Barto, A. 2018. Reinforcement Learning: An Introduction. MIT Press.

Vaughan, R. T. 2008. Massively multi-robot simulation in Stage. Swarm Intelligence 2, 189–208.

Wang, J., Suri, S. & Watts, D. J. 2012. Cooperation and assortativity with dynamic partner updating. Proceedings of the National Academy of Sciences 109(36), 14363–14368. https://doi.org/10.1073/pnas.1120867109.

Wedekind, C. & Milinski, M. 2000. Cooperation through image scoring in humans. Science 288(5467), 850–852. https://doi.org/10.1126/science.288.5467.850.

Wilson, A. J. & Wu, H. 2017. At-will relationships: How an option to walk away affects cooperation and efficiency. Games and Economic Behavior 102, 487–507. https://doi.org/10.1016/j.geb.2017.02.007.

Wooldridge, M. 2013. An Introduction to Multiagent Systems, 2nd edition. Wiley.

Wrightsman, L. S., O’Connor, J. & Baker, N. J. 1972. Cooperation and Competition: Readings on Mixed-Motive Games. Brooks/Cole Pub. Co.

  • Cite this article

    Grace Feehan, Shaheen Fatima. 2024. ‘I don’t want to play with you anymore’: dynamic partner judgements in moody reinforcement learners playing the prisoner’s dilemma. The Knowledge Engineering Review 39(1). https://doi.org/10.1017/S0269888924000018.


    • The authors thank Loughborough University and UK Research and Innovation (UKRI) for the funding that supported this paper and the research within it. We also thank the original authors of the Moody SARSA algorithm, Joe Collenette and colleagues, for their initial assistance and for providing their algorithmic code.

    • Such as bilateral partnership negotiations, as discussed in Wang et al. (2012).

    • Contrasting with previous performance in Feehan and Fatima (2022) and Collenette et al. (2017b).

    • Known as the cooperation index, which is discussed thoroughly in Wrightsman et al. (1972) and Colman et al. (2018).

    • A value of 37.71%, taken from Andreoni and Miller (1993).

    • Such as >80%, in Izquierdo et al. (2010).

    • That is, values for score thresholds that identify a distinct payoff and nothing less, such as a minimum of 5. Such a threshold requires opponents to be exploited by the judging agent, leaving no room for transitional behaviours such as payoffs above 3.

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
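The footnote on strict score thresholds can be made concrete with a trivial check. The rule and names below are our hypothetical illustration, built on the conventional IPD payoffs (T=5, R=3, P=1, S=0) that the footnote’s example values suggest, not the paper’s implementation:

```python
# Hypothetical illustration of a score-based partner judgement.
# With conventional IPD payoffs (T=5, R=3, P=1, S=0), a strict threshold
# of 5 keeps a partner only under pure exploitation, while a more
# forgiving threshold of 3 also admits mutual cooperation.
def keep_partner(avg_payoff_against, threshold):
    """Keep playing a partner when the average payoff earned against
    them meets the judging agent's threshold."""
    return avg_payoff_against >= threshold

# Mutual cooperation (payoff 3) fails the strict threshold of 5,
# but passes the forgiving threshold of 3.
```

This is why, as the article finds, the choice of threshold is critical: a strict threshold severs even mutually cooperative partnerships, while a forgiving one preserves them.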