Loughborough University, Epinal Way, Loughborough, LE11 3TU, UK
2024 Volume 39
RESEARCH ARTICLE   Open Access    

‘I don’t want to play with you anymore’: dynamic partner judgements in moody reinforcement learners playing the prisoner’s dilemma

  • Abstract: Reinforcement learning algorithms that incorporate human traits into their conceptual architecture have been shown to encourage cooperation in social dilemmas relative to their unaltered counterparts. In particular, adding a mood mechanism facilitates more cooperative behaviour in multi-agent iterated prisoner’s dilemma (IPD) games, in both static and dynamic network contexts. Mood-altered agents also exhibit humanlike behavioural trends when environmental aspects of the dilemma are altered, such as the structure of the payoff matrix used. Other environmental effects from both human and agent-based research may interact with moody structures in previously unstudied ways. As the literature on these interactions is currently sparse, we expand on previous research by introducing two further environmental dimensions: voluntary interaction in dynamic networks, and stability of interaction through varied network restructuring. Starting from an initial Erdős–Rényi random network, we manipulate the structure of a networked IPD following existing methodology from human-based research, to investigate whether its findings replicate. We also facilitate strategic selection of opponents by introducing two partner evaluation mechanisms, testing two selection thresholds for each. We find that even minimally strategic play termination in dynamic networks is enough to raise cooperation above the static level, though the thresholds for these strategic decisions are critical to the desired outcomes. More forgiving thresholds lead to better maintenance of cooperation between kinder strategies than stricter ones, despite overall cooperation levels being relatively low. Additionally, moody reinforcement learning combined with certain play-termination decision strategies can mimic trends in human cooperation driven by structural changes to the IPD played on dynamic networks, as can kind and simple strategies such as Tit-For-Tat. Implications of these results for comparison with human data are discussed, and suggestions for diversifying further testing are made.
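The setting described in the abstract, agents on a random network who may terminate play with partners judged too uncooperative, can be sketched with a toy simulation. Everything below is our own minimal illustration: the strategies (unconditional defectors versus Tit-for-Tat), the defection-rate judgement, and all parameter names are assumptions for exposition, not the paper’s Moody SARSA agents or its actual evaluation mechanisms.

```python
import random

def erdos_renyi(n, p, rng):
    """Generate an Erdős–Rényi G(n, p) random graph as a set of edges."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p}

def simulate(n=20, p=0.3, rounds=50, leave_threshold=0.5, seed=0):
    """IPD on a random network where either partner may sever a link once
    the other's observed defection rate exceeds leave_threshold."""
    rng = random.Random(seed)
    edges = erdos_renyi(n, p, rng)
    defector = {i: i % 2 == 0 for i in range(n)}  # toy mix: half always defect
    last = {}   # (edge, agent) -> that agent's last move on this edge
    seen = {}   # (edge, agent) -> (defections by agent, games played)
    coop = total = 0
    for _ in range(rounds):
        for e in list(edges):
            i, j = e
            # Tit-for-Tat copies the partner's previous move (cooperates first)
            mi = "D" if defector[i] else last.get((e, j), "C")
            mj = "D" if defector[j] else last.get((e, i), "C")
            last[(e, i)], last[(e, j)] = mi, mj
            for agent, move in ((i, mi), (j, mj)):
                d, g = seen.get((e, agent), (0, 0))
                seen[(e, agent)] = (d + (move == "D"), g + 1)
                coop += move == "C"
                total += 1
            # strategic play termination: judge each partner after 5 games
            for agent in (i, j):
                d, g = seen[(e, agent)]
                if g >= 5 and d / g > leave_threshold:
                    edges.discard(e)
    return (coop / total if total else 0.0), len(edges)
```

With a strict threshold, links to habitual defectors are pruned and the surviving subnetwork cooperates; with `leave_threshold=1.0` no link is ever severed (a defection rate cannot exceed 1.0), which approximates the static-network case the abstract compares against.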
References

Abdai, J. & Miklósi, Á. 2016. The origin of social evaluation, social eavesdropping, reputation formation, image scoring or what you will. Frontiers in Psychology 7, 1772. https://doi.org/10.3389/fpsyg.2016.01772.

Andreoni, J. & Miller, J. H. 1993. Rational cooperation in the finitely repeated Prisoner’s Dilemma: Experimental evidence. The Economic Journal 103(418), 570–585. https://doi.org/10.2307/2234532.

Axelrod, R. 1984. The Evolution of Cooperation. Basic Books.

Bazzan, A. L. C. & Bordini, R. H. 2001. A framework for the simulation of agents with emotions. In Proceedings of the Fifth International Conference on Autonomous Agents, AGENTS ’01. Association for Computing Machinery, 292–299. https://doi.org/10.1145/375735.376313.

Belkaid, M., Cuperlier, N. & Gaussier, P. 2017. Emotional metacontrol of attention: Top-down modulation of sensorimotor processes in a robotic visual search task. PLoS ONE 12(9). https://doi.org/10.1371/journal.pone.0184960.

Clore, G. L. & Ortony, A. 2013. Psychological construction in the OCC model of emotion. Emotion Review 5(4), 335–343.

Collenette, J., et al. 2017a. Environmental effects on simulated emotional and moody agents. The Knowledge Engineering Review 32, 1–24. https://doi.org/10.1017/S0269888917000170.

Collenette, J., et al. 2017b. Mood modelling within reinforcement learning. In Proceedings of ECAL’17. MIT Press, 106–113. https://doi.org/10.7551/ecal_a_021.

Collenette, J., et al. 2018a. Modelling mood in co-operative emotional agents. Distributed Autonomous Robotic Systems 6, 559–572. https://doi.org/10.1007/978-3-319-73008-0_39.

Collenette, J., et al. 2018b. On the role of mobility and interaction topologies in social dilemmas. In Proceedings of the Conference on Artificial Life, 477–484. https://doi.org/10.1162/isal_a_00088.

Colman, A. M., Pulford, B. D. & Krockow, E. M. 2018. Persistent cooperation and gender differences in repeated prisoner’s dilemma games: Some things never change. Acta Psychologica 187, 1–8. https://doi.org/10.1016/j.actpsy.2018.04.014.

Erev, I. & Roth, A. E. 1998. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review 88(4), 848–881. http://www.jstor.org/stable/117009. Accessed 30 August 2022.

Feehan, G. & Fatima, S. 2022. Augmenting reinforcement learning to enhance cooperation in the iterated prisoner’s dilemma. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence – Volume 3: ICAART, INSTICC. SciTePress, 146–157. https://doi.org/10.5220/0010787500003116.

Fehr, E. & Schmidt, K. M. 1999. A theory of fairness, competition, and cooperation. Quarterly Journal of Economics 114, 817–868. https://doi.org/10.1162/003355399556151.

Fu, F., et al. 2008. Reputation-based partner choice promotes cooperation in social networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 78(2), 026117. https://doi.org/10.1103/PhysRevE.78.026117.

Gallo, E., et al. 2022. Cooperation and punishment mechanisms in uncertain and dynamic social networks. Games and Economic Behavior 134, 75–103. https://doi.org/10.1016/j.geb.2022.03.015.

Gao, Y. 2012. A reinforcement learning based strategy for the double-game prisoner’s dilemma. In Proceedings of the First International Conference on Agreement Technologies, 918, 317–331.

Hagberg, A. A., Schult, D. A. & Swart, P. J. 2008. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference, Varoquaux, G., Vaught, T. & Millman, J. (eds). Pasadena, CA, USA, 1–15.

Hauk, E. 2001. Leaving the prison: Permitting partner choice and refusal in prisoner’s dilemma games. Computational Economics 18, 65–87. https://doi.org/10.1023/A:1013866527989.

Hauk, E. & Nagel, R. 2001. Choice of partners in multiple two-person prisoner’s dilemma games: An experimental study. The Journal of Conflict Resolution 45(6), 770–793. https://doi.org/10.1177/0022002701045006004.

Horita, Y., et al. 2017. Reinforcement learning accounts for moody conditional cooperation behavior: Experimental results. Scientific Reports 7, 1–10. https://doi.org/10.1038/srep39275.

Imhof, L. A., Fudenberg, D. & Nowak, M. A. 2007. Tit-for-tat or win-stay, lose-shift? Journal of Theoretical Biology 247(3), 574–580. https://doi.org/10.1016/j.jtbi.2007.03.027.

Izquierdo, S., Izquierdo, L. & Vega-Redondo, F. 2010. The option to leave: Conditional dissociation in the evolution of cooperation. Journal of Theoretical Biology 267(1), 76–84. https://doi.org/10.1016/j.jtbi.2010.07.039.

Jia, D., et al. 2021. Local and global stimuli in reinforcement learning. New Journal of Physics 23(8). https://doi.org/10.1088/1367-2630/ac170a.

Jusup, M., et al. 2022. Social physics. Physics Reports 948, 1–148. https://doi.org/10.1016/j.physrep.2021.10.005.

Kim, N.-R. & Shin, K.-S. 2015. A study on the impact of negativity bias on online spread of reputation: With a case study of election campaign. Journal of Information Technology Services 14(1), 263–276. https://doi.org/10.9716/KITS.2015.14.1.263.

Knoke, D. H. & Yang, S. 2008. Social Network Analysis, 2nd edition. Quantitative Applications in the Social Sciences. SAGE Publications.

Lin, B., et al. 2019. Reinforcement learning models of human behavior: Reward processing in mental disorders. In NeurIPS.

Melamed, D., Harrell, A. & Simpson, B. 2018. Cooperation, clustering, and assortative mixing in dynamic networks. Proceedings of the National Academy of Sciences 115(5), 951–956. https://doi.org/10.1073/pnas.1715357115.

Mesa 2021. Project Mesa. https://github.com/projectmesa/mesa.

NHS 2019. NHS choices: Symptoms of clinical depression. https://www.nhs.uk/mental-health/conditions/clinical-depression/symptoms/. Accessed 3 August 2021; 9 August 2022.

Nowak, M. & Sigmund, K. 1993. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature 364(6432), 56–58. https://doi.org/10.1038/364056a0.

Ortony, A., Clore, G. L. & Collins, A. 1988. The Cognitive Structure of Emotions. Cambridge University Press.

Perc, M. & Szolnoki, A. 2010. Coevolutionary games – A mini review. Biosystems 99(2), 109–125. https://doi.org/10.1016/j.biosystems.2009.10.003.

Perrone, G., Unpingco, J. & Lu, H.-M. 2020. Network visualizations with Pyvis and VisJS. arXiv:2006.04951. https://arxiv.org/abs/2006.04951.

Qu, X., et al. 2016. Conditional dissociation as a punishment mechanism in the evolution of cooperation. Physica A: Statistical Mechanics and its Applications 449, 215–223. https://doi.org/10.1016/j.physa.2015.12.128.

Rand, D. G., Arbesman, S. & Christakis, N. A. 2011. Dynamic social networks promote cooperation in experiments with humans. Proceedings of the National Academy of Sciences 108(48), 19193–19198. https://doi.org/10.1073/pnas.1108243108.

Shteingart, H. & Loewenstein, Y. 2014. Reinforcement learning and human behavior. Current Opinion in Neurobiology 25, 93–98.

Stanley, E. A., Ashlock, D. & Tesfatsion, L. 1993. Iterated Prisoner’s Dilemma with Choice and Refusal of Partners. ISU Economic Reports Series 199302010800001028. Iowa State University, Department of Economics. https://ideas.repec.org/p/isu/genstf/199302010800001028.html.

Sutton, R. & Barto, A. 2018. Reinforcement Learning: An Introduction. MIT Press.

Vaughan, R. T. 2008. Massively multi-robot simulation in Stage. Swarm Intelligence 2, 189–208.

Wang, J., Suri, S. & Watts, D. J. 2012. Cooperation and assortativity with dynamic partner updating. Proceedings of the National Academy of Sciences 109(36), 14363–14368. https://doi.org/10.1073/pnas.1120867109.

Wedekind, C. & Milinski, M. 2000. Cooperation through image scoring in humans. Science 288(5467), 850–852. https://doi.org/10.1126/science.288.5467.850.

Wilson, A. J. & Wu, H. 2017. At-will relationships: How an option to walk away affects cooperation and efficiency. Games and Economic Behavior 102, 487–507. https://doi.org/10.1016/j.geb.2017.02.007.

Wooldridge, M. 2013. An Introduction to Multiagent Systems, 2nd edition. Wiley.

Wrightsman, L. S., O’Connor, J. & Baker, N. J. 1972. Cooperation and Competition: Readings on Mixed-Motive Games. Brooks/Cole Pub. Co.

  • Cite this article

    Grace Feehan, Shaheen Fatima. 2024. ‘I don’t want to play with you anymore’: dynamic partner judgements in moody reinforcement learners playing the prisoner’s dilemma. The Knowledge Engineering Review 39(1). https://doi.org/10.1017/S0269888924000018.


    • The authors thank Loughborough University and UK Research and Innovation (UKRI) for the funding that supported this paper and the research within it. We also thank the original authors of the Moody SARSA algorithm, Joe Collenette and colleagues, for their initial assistance and for providing their algorithmic code.

    • Such as bilateral partnership negotiations, as discussed in Wang et al. (2012).

    • Contrasting with previous performance in Feehan and Fatima (2022) and Collenette et al. (2017b).

    • Known as the cooperation index, which is discussed thoroughly in Wrightsman et al. (1972) and Colman et al. (2018).

    • A value of 37.71%, taken from Andreoni and Miller (1993).

    • Such as >80%, in Izquierdo et al. (2010).

    • That is, values for score thresholds that identify a distinct payoff and nothing less, such as a minimum of 5. Such a threshold requires opponents to be exploited by the judging agent, leaving no room for transitional behaviours such as payoffs above 3.

    • This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
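The footnote on strict score thresholds can be made concrete with a trivial check. The rule and names below are our hypothetical illustration, built on the conventional IPD payoffs (T=5, R=3, P=1, S=0) that the footnote’s example values suggest, not the paper’s implementation:

```python
# Hypothetical illustration of a score-based partner judgement.
# With conventional IPD payoffs (T=5, R=3, P=1, S=0), a strict threshold
# of 5 keeps a partner only under pure exploitation, while a more
# forgiving threshold of 3 also admits mutual cooperation.
def keep_partner(avg_payoff_against, threshold):
    """Keep playing a partner when the average payoff earned against
    them meets the judging agent's threshold."""
    return avg_payoff_against >= threshold

# Mutual cooperation (payoff 3) fails the strict threshold of 5,
# but passes the forgiving threshold of 3.
```

This is why, as the article finds, the choice of threshold is critical: a strict threshold severs even mutually cooperative partnerships, while a forgiving one preserves them.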