doi:10.1017/S0269888920000119

Abdallah , S. & Lesser , V.2006. Learning the task allocation game, In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS ’06, ACM Press, 850–857.

Agogino , A. K. & Tumer , K.2004. Unifying temporal and structural credit assignment problems, In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS ’04, IEEE, 980–987.

Auer , P., Cesa-Bianchi , N., Freund , Y. & Schapire , R. E.2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing32(1), 48–77.

Awerbuch , B. & Kleinberg , R. D.2004. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’04, ACM, 45–53.

Bar-Gera , H.2010. Traffic assignment by paired alternative segments. Transportation Research Part B: Methodological44(8–9), 1022–1046.

Bazzan , A. L. C.2009. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Autonomous Agents and Multiagent Systems18(3), 342–375.

Bazzan , A. L. C. & Klügl , F.2005. Case studies on the Braess paradox: simulating route recommendation and learning in abstract and microscopic models. Transportation Research C13(4), 299–319.

Bazzan , A. L. C. & Klügl , F.2013. Introduction to Intelligent Systems inTraffic and Transportation, Vol. 7. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan and Claypool.

Beckmann , M., McGuire , C. B. & Winsten , C. B.1956. Studies in the Economics of Transportation, Yale University Press, New Haven.

Bonifaci , V., Salek , M. & Schäfer , G.2011. Efficiency of restricted tolls in non-atomic network routing games. In Algorithmic Game Theory: Proceedings of the 4th International Symposium (SAGT 2011), Persiano , G. (ed). Springer, 302–313.

Bowling , M.2005. Convergence and no-regret in multiagent learning. In L. K.Saul , Y.Weiss & L.Bottou , (eds.) Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, MIT Press, 209–216.

Boyan , J. A. & Littman , M. L.1994. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems6, 671–678.

Braess , D.1968. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung12, 258.

Buşoniu , L., Babuska , R. & De Schutter , B.2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews38(2), 156–172.

Chen , H., An , B., Sharon , G., Hanna , J. P., Stone , P., Miao , C. & Soh , Y. C.2018. DyETC: Dynamic electronic toll collection for traffic congestion alleviation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), number February, AAAI Press, 757–765.

Chen , P.-A. & Kempe , D.2008. Altruism, selfishness, and spite in traffic routing. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC ’08), Riedl , J. & Sandholm , T. (eds.), ACM Press, 140–149.

Claus , C. & Boutilier , C.1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 746–752.

Colby , M., Duchow-Pressley , T., Chung , J. J. & Tumer , K.2016. Local approximation of difference evaluation functions. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), IFAAMAS, Singapore, 521–529.

Cole , R., Dodis , Y. & Roughgarden , T.2003. Pricing network edges for heterogeneous selfish users. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC ’03, ACM, 521–530.

Crites , R. H. & Barto , A. G.1998. Elevator group control using multiple reinforcement learning agents. Machine Learning33(2), 235–262.

de Palma , A. & Lindsey , R.2011. Traffic congestion pricing methodologies and technologies. Transportation Research Part C: Emerging Technologies19(6), 1377–1399.

Fehr , E. & Fischbacher , U.2003. The nature of human altruism. Nature425(6960), 785–791.

Fleischer , L., Jain , K. & Mahdian , M.2004. Tolls for heterogeneous selfish users in multicommodity networks and generalized congestion games. In 45th Annual IEEE Symposium on Foundations of Computer Science, IEEE, Rome, Italy, 277–285.

Foerster , J., Nardell , N., Farquhar , G., Afouras , T., Torr , P. H., Kohli , P. & Whiteson , S.2017. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, 70, PMLR, 1146–1155.

Centre for Economics and Business Research2014. The Future Economic and Environmental Costs of Gridlock in 2030, Technical report, Centre for Economics and Business Research, London.

Hernandez-Leal , P., Zhan , Y., Taylor , M. E., Sucar , L. E. & Munoz de Cote , E.2017. An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems31(5), 971–1002.

Hoefer , M. & Skopalik , A.2009. Altruism in atomic congestion games. In 17th Annual European Symposium on Algorithms, Fiat , A. & Sanders , P. (eds.), Springer, Berlin Heidelberg, 179–189.

Hu , J. & Wellman , M. P.1998. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, 242–250.

Hu , J. & Wellman , M. P.2003. Nash q-learning for general-sum stochastic games. Journal of Machine Learning Research4, 1039–1069.

Jayakrishnan , R., Cohen , M., Kim , J., Mahmassani , H. S. & Hu , T.-Y.1993. A Simulation-Based Framework for the Analysis of Traffic Networks Operating with Real-Time Information, Technical Report UCB-ITS-PRR-93-25, University of California, Berkeley.

Kaisers , M. & Tuyls , K.2010. Frequency adjusted multi-agent q-learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 309–316.

Kobayashi , K. & Do , M.2005. The informational impacts of congestion tolls upon route traffic demands. Transportation Research A39(7–9), 651–670.

Koutsoupias , E. & Papadimitriou , C.1999. Worst-case equilibria. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science (STACS), Springer-Verlag, 404–413.

Lanctot , M., Zambaldi , V., Gruslys , A., Lazaridou , A., Tuyls , K., Perolat , J., Silver , D. & Graepel , T.2017. A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems, Guyon , I., Luxburg , U. V., Bengio , S., Wallach , H., Fergus , R., Vishwanathan , S. & Garnett , R. (eds.), 30, Curran Associates, Inc., 4190–4203.

Lauer , M. & Riedmiller , M.2004. Reinforcement learning for stochastic cooperative multi-agent-systems. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004 3, 1516–1517.

Laurent , G. J., Matignon , L. & Le Fort-Piat , N.2011. The world of independent learners is not markovian. International Journal of Knowledge-based and Intelligent Engineering Systems15(1), 55–64.

LeBlanc , L. J., Morlok , E. K. & Pierskalla , W. P.1975. An efficient approach to solving the road network equilibrium traffic assignment problem. Transportation Research9(5), 309–318.

Levy , N. & Ben-Elia , E.2016. Emergence of system optimum: A fair and altruistic agent-based route-choice model. Procedia Computer Science83, 928–933.

Littman , M. L.1994. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning, ML, Morgan Kaufmann, 157–163.

Littman , M. L.2001. Friend-or-Foe Q-learning in general-sum games. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML’01). Morgan Kaufmann, 322–328.

Lujak , M., Giordani , S. & Ossowski , S.2015. Route guidance: Bridging system and user optimization in traffic assignment. Neurocomputing151, 449–460.

Malialis , K., Devlin , S. & Kudenko , D.2016. Resource abstraction for reinforcement learning in multiagent congestion problems. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 503–511.

Matignon , L., Laurent , G. J. & Le Fort-Piat , N.2012. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems. The Knowledge Engineering Review27(1), 1–31.

McFadden , D.2001. Disaggregate behavioral travel demand’s RUM side. In Travel Behaviour Research: The Leading Edge, Hensher , D. A. (ed), Elsevier, 17–63.

Meir , R. & Parkes , D.2018. Playing the wrong game: Bounding externalities in diverse populations of agents. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), IFAAMAS, Stockholm, 86–94.

Meir , R. & Parkes , D. C.2016. When are marginal congestion tolls optimal? In Proceedings of the Ninth Workshop on Agents in Traffic and Transportation (ATT-2016), Bazzan , A. L. C., Klügl , F., Ossowski , S. & Vizzari , G. (eds). CEUR-WS.org, 8.

Mirzaei , H., Sharon , G., Boyles , S., Givargis , T. & Stone , P.2018. Link-based parameterized micro-tolling scheme for optimal traffic management. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’18, Dastani , M., Sukthankar , G., André , E. & Koenig , S. (eds). IFAAMAS, 2013–2015.

Nash , J.1950. Non-Cooperative Games, PhD thesis, Princeton University.

National Surface Transportation Infrastructure Financing Commission2009. Paying Our Way: A New Framework for Transportation Finance, Technical report, National Surface Transportation Infrastructure Financing Commission, Washington DC.

Bureau of Public Roads1964. Traffic Assignment Manual, Technical report, US Department of Commerce, Washington, D. C.

Omidshafiei , S., Pazis , J., Amato , C., How , J. P. & Vian , J.2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th International Conference on Machine Learning, 70, 4108–4122.

Ortúzar , J. d. D. & Willumsen , L. G.2011. Modelling Transport, 4 edition, John Wiley & Sons.

Pigou , A.1920. The Economics of Welfare, Palgrave Classics in Economics, Palgrave Macmillan.

Proper , S. & Tumer , K.2012. Modeling difference rewards for multiagent learning. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, V., Winikoff, M., Padgham, L. & van der Hoek, W. (eds). IFAAMAS.

Rădulescu , R., Vrancx , P. & Nowé , A.2017. Analysing congestion problems in multi-agent reinforcement learning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 1705–1707.

Ramos , G. de O.2018. Regret Minimisation and System-Efficiency in Route Choice, PhD thesis, Universidade Federal do Rio Grande do Sul, Porto Alegre.

Ramos , G. de O. & Bazzan , A. L. C.2015. Towards the user equilibrium in traffic assignment using GRASP with path relinking. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference, GECCO ’15, ACM, 473–480.

Ramos , G. de O. & Bazzan , A. L. C.2016. Efficient local search in traffic assignment. In 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1493–1500.

Ramos , G. de O., da Silva , B. C. & Bazzan , A. L. C.2017. Learning to minimise regret in route choice. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Das , S., Durfee , E., Larson , K. & Winikoff , M. (eds). IFAAMAS, 846–855.

Ramos , G. de O., da Silva , B. C., Rădulescu , R. & Bazzan , A. L. C.2018. Learning system-efficient equilibria in route choice using tolls. In Proceedings of the Adaptive Learning Agents Workshop 2018 (ALA-18), Stockholm.

Ramos , G. de O., Rădulescu , R., Nowé , A. & Tavares , A. R. 2020. Toll-based learning for minimising congestion under heterogeneous preferences. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), An, B., Yorke-Smith, N., El Fallah Seghrouchni, A. & Sukthankar , G. (eds). IFAAMAS.

Robbins , H.1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society58(5), 527–535.

Rosenthal , R. W.1973. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory2, 65–67.

Roughgarden , T.2005. Selfish Routing and the Price of Anarchy, MIT Press.

Sandholm , T.2007. Perspectives on multiagent learning. Artificial Intelligence171(7), 382–391.

Sen , S., Sekaran , M. & Hale , J.1994. Learning to coordinate without sharing information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 426–431.

Sharon , G., Boyles , S. D., Alkoby , S. & Stone , P.2019. Marginal cost pricing with a fixed error factor in traffic networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Agmon , N., Taylor , M., Elkind , E. & Veloso , M. (eds). IFAAMAS, Montreal, 1539–1546.

Sharon , G., Hanna , J. P., Rambha , T., Levin , M. W., Albert , M., Boyles , S. D. & Stone , P.2017. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Das , S., Durfee , E., Larson , K. & Winikoff , M. (eds). IFAAMAS, 828–836.

Stefanello , F. & Bazzan , A. L. C.2016. Traffic Assignment Problem – Extending Braess Paradox, Technical report, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS.

Sutton , R. & Barto , A.1998. Reinforcement Learning: An Introduction, MIT Press.

Tan , M.1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, 330–337.

Tavares , A. R. & Bazzan , A. L.2014. An agent-based approach for road pricing: System-level performance and implications for drivers. Journal of the Brazilian Computer Society20(1), 15.

Tesauro , G.1994. Td-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation6(2), 215–219.

Tuyls , K. & Weiss , G.2012. Multiagent learning: Basics, challenges, and prospects. AI Magazine33(3), 41–52.

van Essen , M., Thomas , T., van Berkum , E. & Chorus , C.2016. From user equilibrium to system optimum: a literature review on the role of travel information, bounded rationality and non-selfish behaviour at the network and individual levels. Transport Reviews36(4), 527–548.

Verbeeck , K., Nowé , A., Parent , J. & Tuyls , K.2007. Exploring selfish reinforcement learning in repeated games with stochastic rewards. Autonomous Agents and Multi-Agent Systems14(3), 239–269.

Vrancx , P., Verbeeck , K. & Nowe , A.2008. Decentralized learning in markov games. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)38(4), 976–981.

Vrancx , P., Verbeeck , K. & Nowé , A.2010. Learning to take turns. In Proceedings of the AAMAS 2010 Workshop on Adaptive Learning Agents and Multi-Agent Systems (ALA 2010), 1–7.

Wardrop , J. G.1952. Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, Part II 1(36), 325–362.

Watkins , C. J. C. H. & Dayan , P.1992. Q-learning. Machine Learning8(3), 279–292.

Wolpert , D. H. & Tumer , K.1999. An introduction to Collective Intelligence, Technical report NASA-ARC-IC-99-63, NASA Ames Research Center. arXiv:cs/9908014 [cs.LG].

Wolpert , D. H. & Tumer , K.2002. Collective intelligence, data routing and Braess’ paradox. Journal of Artificial Intelligence Research16, 359–387.

Yang , H., Meng , Q. & Lee , D.-H.2004. Trial-and-error implementation of marginal-cost pricing on networks in the absence of demand functions. Transportation Research Part B: Methodological38(6), 477–493.

Ye , H., Yang , H. & Tan , Z.2015. Learning marginal-cost pricing via a trial-and-error procedure with day-to-day flow dynamics. Transportation Research Part B: Methodological81, 794–807.

Yen , J. Y.1971. Finding the k shortest loopless paths in a network. Management Science17(11), 712–716.

Youn , H., Gastner , M. T. & Jeong , H.2008. Price of anarchy in transportation networks: Efficiency and optimality control. Physical Review Letters101(12), 128701.

Zhang , J., Pourazarm , S., Cassandras , C. G. & Paschalidis , I. C.2016. The price of anarchy in transportation networks by estimating user cost functions from actual traffic data. In 2016 IEEE 55th Conference on Decision and Control (CDC), IEEE, 789–794.

Zinkevich , M.2003. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, AAAI Press, 928–936.