Toll-based reinforcement learning for efficient equilibria in route choice

Gabriel de O. Ramos; Bruno C. Da Silva; Roxana Rădulescu; Ana L. C. Bazzan; Ann Nowé; Gabriel de O. Ramos; Bruno C. Da Silva; Roxana Rădulescu; Ana L. C. Bazzan; Ann Nowé

doi:10.1017/S0269888920000119

2020 Volume 35

Article Contents

Next Previous

ORIGINAL RESEARCH Open Access

Toll-based reinforcement learning for efficient equilibria in route choice

¹Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil, e-mail: gdoramos@unisinos.br
²Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: roxana@ai.vub.ac.be, ann.nowe@ai.vub.ac.be
³Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, e-mails: bsilva@inf.ufrgs.br, bazzan@inf.ufrgs.br

More Information

Received: 16 December 2018
Revised: 30 January 2020
Accepted: 30 January 2020
Published online: 05 March 2020
The Knowledge Engineering Review 35, Article number: e8 (2020) | Cite this article

Abstract

Abstract: The problem of traffic congestion incurs numerous social and economical repercussions and has thus become a central issue in every major city in the world. For this work we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such a selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of the congestion levels by 9.1%, on average.
Rights and permissions
© Cambridge University Press 20202020Cambridge University Press

References

Abdallah , S. & Lesser , V.2006. Learning the task allocation game, In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS ’06, ACM Press, 850–857.

Google Scholar

Agogino , A. K. & Tumer , K.2004. Unifying temporal and structural credit assignment problems, In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS ’04, IEEE, 980–987.

Google Scholar

Auer , P., Cesa-Bianchi , N., Freund , Y. & Schapire , R. E.2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing32(1), 48–77.

Google Scholar

Awerbuch , B. & Kleinberg , R. D.2004. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’04, ACM, 45–53.

Google Scholar

Bar-Gera , H.2010. Traffic assignment by paired alternative segments. Transportation Research Part B: Methodological44(8–9), 1022–1046.

Google Scholar

Bazzan , A. L. C.2009. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Autonomous Agents and Multiagent Systems18(3), 342–375.

Google Scholar

Bazzan , A. L. C. & Klügl , F.2005. Case studies on the Braess paradox: simulating route recommendation and learning in abstract and microscopic models. Transportation Research C13(4), 299–319.

Google Scholar

Bazzan , A. L. C. & Klügl , F.2013. Introduction to Intelligent Systems inTraffic and Transportation, Vol. 7. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan and Claypool.

Google Scholar

Beckmann , M., McGuire , C. B. & Winsten , C. B.1956. Studies in the Economics of Transportation, Yale University Press, New Haven.

Google Scholar

Bonifaci , V., Salek , M. & Schäfer , G.2011. Efficiency of restricted tolls in non-atomic network routing games. In Algorithmic Game Theory: Proceedings of the 4th International Symposium (SAGT 2011), Persiano , G. (ed). Springer, 302–313.

Google Scholar

Bowling , M.2005. Convergence and no-regret in multiagent learning. In L. K.Saul , Y.Weiss & L.Bottou , (eds.) Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, MIT Press, 209–216.

Google Scholar

Boyan , J. A. & Littman , M. L.1994. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems6, 671–678.

Google Scholar

Braess , D.1968. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung12, 258.

Google Scholar

Buşoniu , L., Babuska , R. & De Schutter , B.2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews38(2), 156–172.

Google Scholar

Chen , H., An , B., Sharon , G., Hanna , J. P., Stone , P., Miao , C. & Soh , Y. C.2018. DyETC: Dynamic electronic toll collection for traffic congestion alleviation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), number February, AAAI Press, 757–765.

Google Scholar

Chen , P.-A. & Kempe , D.2008. Altruism, selfishness, and spite in traffic routing. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC ’08), Riedl , J. & Sandholm , T. (eds.), ACM Press, 140–149.

Google Scholar

Claus , C. & Boutilier , C.1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 746–752.

Google Scholar

Colby , M., Duchow-Pressley , T., Chung , J. J. & Tumer , K.2016. Local approximation of difference evaluation functions. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), IFAAMAS, Singapore, 521–529.

Google Scholar

Cole , R., Dodis , Y. & Roughgarden , T.2003. Pricing network edges for heterogeneous selfish users. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC ’03, ACM, 521–530.

Google Scholar

Crites , R. H. & Barto , A. G.1998. Elevator group control using multiple reinforcement learning agents. Machine Learning33(2), 235–262.

Google Scholar

de Palma , A. & Lindsey , R.2011. Traffic congestion pricing methodologies and technologies. Transportation Research Part C: Emerging Technologies19(6), 1377–1399.

Google Scholar

Fehr , E. & Fischbacher , U.2003. The nature of human altruism. Nature425(6960), 785–791.

Google Scholar

Fleischer , L., Jain , K. & Mahdian , M.2004. Tolls for heterogeneous selfish users in multicommodity networks and generalized congestion games. In 45th Annual IEEE Symposium on Foundations of Computer Science, IEEE, Rome, Italy, 277–285.

Google Scholar

Foerster , J., Nardell , N., Farquhar , G., Afouras , T., Torr , P. H., Kohli , P. & Whiteson , S.2017. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, 70, PMLR, 1146–1155.

Google Scholar

Centre for Economics and Business Research2014. The Future Economic and Environmental Costs of Gridlock in 2030, Technical report, Centre for Economics and Business Research, London.

Google Scholar

Hernandez-Leal , P., Zhan , Y., Taylor , M. E., Sucar , L. E. & Munoz de Cote , E.2017. An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems31(5), 971–1002.

Google Scholar

Hoefer , M. & Skopalik , A.2009. Altruism in atomic congestion games. In 17th Annual European Symposium on Algorithms, Fiat , A. & Sanders , P. (eds.), Springer, Berlin Heidelberg, 179–189.

Google Scholar

Hu , J. & Wellman , M. P.1998. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, 242–250.

Google Scholar

Hu , J. & Wellman , M. P.2003. Nash q-learning for general-sum stochastic games. Journal of Machine Learning Research4, 1039–1069.

Google Scholar

Jayakrishnan , R., Cohen , M., Kim , J., Mahmassani , H. S. & Hu , T.-Y.1993. A Simulation-Based Framework for the Analysis of Traffic Networks Operating with Real-Time Information, Technical Report UCB-ITS-PRR-93-25, University of California, Berkeley.

Google Scholar

Kaisers , M. & Tuyls , K.2010. Frequency adjusted multi-agent q-learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 309–316.

Google Scholar

Kobayashi , K. & Do , M.2005. The informational impacts of congestion tolls upon route traffic demands. Transportation Research A39(7–9), 651–670.

Google Scholar

Koutsoupias , E. & Papadimitriou , C.1999. Worst-case equilibria. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science (STACS), Springer-Verlag, 404–413.

Google Scholar

Lanctot , M., Zambaldi , V., Gruslys , A., Lazaridou , A., Tuyls , K., Perolat , J., Silver , D. & Graepel , T.2017. A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems, Guyon , I., Luxburg , U. V., Bengio , S., Wallach , H., Fergus , R., Vishwanathan , S. & Garnett , R. (eds.), 30, Curran Associates, Inc., 4190–4203.

Google Scholar

Lauer , M. & Riedmiller , M.2004. Reinforcement learning for stochastic cooperative multi-agent-systems. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004 3, 1516–1517.

Google Scholar

Laurent , G. J., Matignon , L. & Le Fort-Piat , N.2011. The world of independent learners is not markovian. International Journal of Knowledge-based and Intelligent Engineering Systems15(1), 55–64.

Google Scholar

LeBlanc , L. J., Morlok , E. K. & Pierskalla , W. P.1975. An efficient approach to solving the road network equilibrium traffic assignment problem. Transportation Research9(5), 309–318.

Google Scholar

Levy , N. & Ben-Elia , E.2016. Emergence of system optimum: A fair and altruistic agent-based route-choice model. Procedia Computer Science83, 928–933.

Google Scholar

Littman , M. L.1994. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning, ML, Morgan Kaufmann, 157–163.

Google Scholar

Littman , M. L.2001. Friend-or-Foe Q-learning in general-sum games. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML’01). Morgan Kaufmann, 322–328.

Google Scholar

Lujak , M., Giordani , S. & Ossowski , S.2015. Route guidance: Bridging system and user optimization in traffic assignment. Neurocomputing151, 449–460.

Google Scholar

Malialis , K., Devlin , S. & Kudenko , D.2016. Resource abstraction for reinforcement learning in multiagent congestion problems. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 503–511.

Google Scholar

Matignon , L., Laurent , G. J. & Le Fort-Piat , N.2012. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems. The Knowledge Engineering Review27(1), 1–31.

Google Scholar

McFadden , D.2001. Disaggregate behavioral travel demand’s RUM side. In Travel Behaviour Research: The Leading Edge, Hensher , D. A. (ed), Elsevier, 17–63.

Google Scholar

Meir , R. & Parkes , D.2018. Playing the wrong game: Bounding externalities in diverse populations of agents. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), IFAAMAS, Stockholm, 86–94.

Google Scholar

Meir , R. & Parkes , D. C.2016. When are marginal congestion tolls optimal? In Proceedings of the Ninth Workshop on Agents in Traffic and Transportation (ATT-2016), Bazzan , A. L. C., Klügl , F., Ossowski , S. & Vizzari , G. (eds). CEUR-WS.org, 8.

Google Scholar

Mirzaei , H., Sharon , G., Boyles , S., Givargis , T. & Stone , P.2018. Link-based parameterized micro-tolling scheme for optimal traffic management. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’18, Dastani , M., Sukthankar , G., André , E. & Koenig , S. (eds). IFAAMAS, 2013–2015.

Google Scholar

Nash , J.1950. Non-Cooperative Games, PhD thesis, Princeton University.

Google Scholar

National Surface Transportation Infrastructure Financing Commission2009. Paying Our Way: A New Framework for Transportation Finance, Technical report, National Surface Transportation Infrastructure Financing Commission, Washington DC.

Google Scholar

Bureau of Public Roads1964. Traffic Assignment Manual, Technical report, US Department of Commerce, Washington, D. C.

Google Scholar

Omidshafiei , S., Pazis , J., Amato , C., How , J. P. & Vian , J.2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th International Conference on Machine Learning, 70, 4108–4122.

Google Scholar

Ortúzar , J. d. D. & Willumsen , L. G.2011. Modelling Transport, 4 edition, John Wiley & Sons.

Google Scholar

Pigou , A.1920. The Economics of Welfare, Palgrave Classics in Economics, Palgrave Macmillan.

Google Scholar

Proper , S. & Tumer , K.2012. Modeling difference rewards for multiagent learning. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, V., Winikoff, M., Padgham, L. & van der Hoek, W. (eds). IFAAMAS.

Google Scholar

Rădulescu , R., Vrancx , P. & Nowé , A.2017. Analysing congestion problems in multi-agent reinforcement learning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 1705–1707.

Google Scholar

Ramos , G. de O.2018. Regret Minimisation and System-Efficiency in Route Choice, PhD thesis, Universidade Federal do Rio Grande do Sul, Porto Alegre.

Google Scholar

Ramos , G. de O. & Bazzan , A. L. C.2015. Towards the user equilibrium in traffic assignment using GRASP with path relinking. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference, GECCO ’15, ACM, 473–480.

Google Scholar

Ramos , G. de O. & Bazzan , A. L. C.2016. Efficient local search in traffic assignment. In 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1493–1500.

Google Scholar

Ramos , G. de O., da Silva , B. C. & Bazzan , A. L. C.2017. Learning to minimise regret in route choice. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Das , S., Durfee , E., Larson , K. & Winikoff , M. (eds). IFAAMAS, 846–855.

Google Scholar

Ramos , G. de O., da Silva , B. C., Rădulescu , R. & Bazzan , A. L. C.2018. Learning system-efficient equilibria in route choice using tolls. In Proceedings of the Adaptive Learning Agents Workshop 2018 (ALA-18), Stockholm.

Google Scholar

Ramos , G. de O., Rădulescu , R., Nowé , A. & Tavares , A. R. 2020. Toll-based learning for minimising congestion under heterogeneous preferences. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), An, B., Yorke-Smith, N., El Fallah Seghrouchni, A. & Sukthankar , G. (eds). IFAAMAS.

Google Scholar

Robbins , H.1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society58(5), 527–535.

Google Scholar

Rosenthal , R. W.1973. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory2, 65–67.

Google Scholar

Roughgarden , T.2005. Selfish Routing and the Price of Anarchy, MIT Press.

Google Scholar

Sandholm , T.2007. Perspectives on multiagent learning. Artificial Intelligence171(7), 382–391.

Google Scholar

Sen , S., Sekaran , M. & Hale , J.1994. Learning to coordinate without sharing information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 426–431.

Google Scholar

Sharon , G., Boyles , S. D., Alkoby , S. & Stone , P.2019. Marginal cost pricing with a fixed error factor in traffic networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Agmon , N., Taylor , M., Elkind , E. & Veloso , M. (eds). IFAAMAS, Montreal, 1539–1546.

Google Scholar

Sharon , G., Hanna , J. P., Rambha , T., Levin , M. W., Albert , M., Boyles , S. D. & Stone , P.2017. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Das , S., Durfee , E., Larson , K. & Winikoff , M. (eds). IFAAMAS, 828–836.

Google Scholar

Stefanello , F. & Bazzan , A. L. C.2016. Traffic Assignment Problem – Extending Braess Paradox, Technical report, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS.

Google Scholar

Sutton , R. & Barto , A.1998. Reinforcement Learning: An Introduction, MIT Press.

Google Scholar

Tan , M.1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, 330–337.

Google Scholar

Tavares , A. R. & Bazzan , A. L.2014. An agent-based approach for road pricing: System-level performance and implications for drivers. Journal of the Brazilian Computer Society20(1), 15.

Google Scholar

Tesauro , G.1994. Td-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation6(2), 215–219.

Google Scholar

Tuyls , K. & Weiss , G.2012. Multiagent learning: Basics, challenges, and prospects. AI Magazine33(3), 41–52.

Google Scholar

van Essen , M., Thomas , T., van Berkum , E. & Chorus , C.2016. From user equilibrium to system optimum: a literature review on the role of travel information, bounded rationality and non-selfish behaviour at the network and individual levels. Transport Reviews36(4), 527–548.

Google Scholar

Verbeeck , K., Nowé , A., Parent , J. & Tuyls , K.2007. Exploring selfish reinforcement learning in repeated games with stochastic rewards. Autonomous Agents and Multi-Agent Systems14(3), 239–269.

Google Scholar

Vrancx , P., Verbeeck , K. & Nowe , A.2008. Decentralized learning in markov games. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)38(4), 976–981.

Google Scholar

Vrancx , P., Verbeeck , K. & Nowé , A.2010. Learning to take turns. In Proceedings of the AAMAS 2010 Workshop on Adaptive Learning Agents and Multi-Agent Systems (ALA 2010), 1–7.

Google Scholar

Wardrop , J. G.1952. Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, Part II 1(36), 325–362.

Google Scholar

Watkins , C. J. C. H. & Dayan , P.1992. Q-learning. Machine Learning8(3), 279–292.

Google Scholar

Wolpert , D. H. & Tumer , K.1999. An introduction to Collective Intelligence, Technical report NASA-ARC-IC-99-63, NASA Ames Research Center. arXiv:cs/9908014 [cs.LG].

Google Scholar

Wolpert , D. H. & Tumer , K.2002. Collective intelligence, data routing and Braess’ paradox. Journal of Artificial Intelligence Research16, 359–387.

Google Scholar

Yang , H., Meng , Q. & Lee , D.-H.2004. Trial-and-error implementation of marginal-cost pricing on networks in the absence of demand functions. Transportation Research Part B: Methodological38(6), 477–493.

Google Scholar

Ye , H., Yang , H. & Tan , Z.2015. Learning marginal-cost pricing via a trial-and-error procedure with day-to-day flow dynamics. Transportation Research Part B: Methodological81, 794–807.

Google Scholar

Yen , J. Y.1971. Finding the k shortest loopless paths in a network. Management Science17(11), 712–716.

Google Scholar

Youn , H., Gastner , M. T. & Jeong , H.2008. Price of anarchy in transportation networks: Efficiency and optimality control. Physical Review Letters101(12), 128701.

Google Scholar

Zhang , J., Pourazarm , S., Cassandras , C. G. & Paschalidis , I. C.2016. The price of anarchy in transportation networks by estimating user cost functions from actual traffic data. In 2016 IEEE 55th Conference on Decision and Control (CDC), IEEE, 789–794.

Google Scholar

Zinkevich , M.2003. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, AAAI Press, 928–936.

Google Scholar

About this article

Cite this article

Gabriel de O. Ramos, Bruno C. Da Silva, Roxana Rădulescu, Ana L. C. Bazzan, Ann Nowé. 2020. Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review. 35: doi: 10.1017/S0269888920000119

Gabriel de O. Ramos, Bruno C. Da Silva, Roxana Rădulescu, Ana L. C. Bazzan, Ann Nowé. 2020. Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review. 35: doi: 10.1017/S0269888920000119

Download PDF

Article Metrics

Article views(288) PDF downloads(315)

{{lists.name}}

Toll-based reinforcement learning for efficient equilibria in route choice

Abstract

Rights and permissions

References

About this article

Cite this article

Article Metrics

Access History

Other Articles By Authors