Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil, e-mail: gdoramos@unisinos.br"/> Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: roxana@ai.vub.ac.be, ann.nowe@ai.vub.ac.be"/> Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, e-mails: bsilva@inf.ufrgs.br, bazzan@inf.ufrgs.br"/>
Search
2020 Volume 35
Article Contents
ORIGINAL RESEARCH   Open Access    

Toll-based reinforcement learning for efficient equilibria in route choice

More Information
  • Abstract: The problem of traffic congestion incurs numerous social and economical repercussions and has thus become a central issue in every major city in the world. For this work we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such a selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of the congestion levels by 9.1%, on average.
  • 加载中
  • Abdallah , S. & Lesser , V.2006. Learning the task allocation game, In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, AAMAS ’06, ACM Press, 850–857.

    Google Scholar

    Agogino , A. K. & Tumer , K.2004. Unifying temporal and structural credit assignment problems, In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS ’04, IEEE, 980–987.

    Google Scholar

    Auer , P., Cesa-Bianchi , N., Freund , Y. & Schapire , R. E.2002. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing32(1), 48–77.

    Google Scholar

    Awerbuch , B. & Kleinberg , R. D.2004. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’04, ACM, 45–53.

    Google Scholar

    Bar-Gera , H.2010. Traffic assignment by paired alternative segments. Transportation Research Part B: Methodological44(8–9), 1022–1046.

    Google Scholar

    Bazzan , A. L. C.2009. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Autonomous Agents and Multiagent Systems18(3), 342–375.

    Google Scholar

    Bazzan , A. L. C. & Klügl , F.2005. Case studies on the Braess paradox: simulating route recommendation and learning in abstract and microscopic models. Transportation Research C13(4), 299–319.

    Google Scholar

    Bazzan , A. L. C. & Klügl , F.2013. Introduction to Intelligent Systems inTraffic and Transportation, Vol. 7. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan and Claypool.

    Google Scholar

    Beckmann , M., McGuire , C. B. & Winsten , C. B.1956. Studies in the Economics of Transportation, Yale University Press, New Haven.

    Google Scholar

    Bonifaci , V., Salek , M. & Schäfer , G.2011. Efficiency of restricted tolls in non-atomic network routing games. In Algorithmic Game Theory: Proceedings of the 4th International Symposium (SAGT 2011), Persiano , G. (ed). Springer, 302–313.

    Google Scholar

    Bowling , M.2005. Convergence and no-regret in multiagent learning. In L. K.Saul , Y.Weiss & L.Bottou , (eds.) Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, MIT Press, 209–216.

    Google Scholar

    Boyan , J. A. & Littman , M. L.1994. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems6, 671–678.

    Google Scholar

    Braess , D.1968. Über ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung12, 258.

    Google Scholar

    Buşoniu , L., Babuska , R. & De Schutter , B.2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews38(2), 156–172.

    Google Scholar

    Chen , H., An , B., Sharon , G., Hanna , J. P., Stone , P., Miao , C. & Soh , Y. C.2018. DyETC: Dynamic electronic toll collection for traffic congestion alleviation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), number February, AAAI Press, 757–765.

    Google Scholar

    Chen , P.-A. & Kempe , D.2008. Altruism, selfishness, and spite in traffic routing. In Proceedings of the 9th ACM Conference on Electronic Commerce (EC ’08), Riedl , J. & Sandholm , T. (eds.), ACM Press, 140–149.

    Google Scholar

    Claus , C. & Boutilier , C.1998. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 746–752.

    Google Scholar

    Colby , M., Duchow-Pressley , T., Chung , J. J. & Tumer , K.2016. Local approximation of difference evaluation functions. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), IFAAMAS, Singapore, 521–529.

    Google Scholar

    Cole , R., Dodis , Y. & Roughgarden , T.2003. Pricing network edges for heterogeneous selfish users. In Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, STOC ’03, ACM, 521–530.

    Google Scholar

    Crites , R. H. & Barto , A. G.1998. Elevator group control using multiple reinforcement learning agents. Machine Learning33(2), 235–262.

    Google Scholar

    de Palma , A. & Lindsey , R.2011. Traffic congestion pricing methodologies and technologies. Transportation Research Part C: Emerging Technologies19(6), 1377–1399.

    Google Scholar

    Fehr , E. & Fischbacher , U.2003. The nature of human altruism. Nature425(6960), 785–791.

    Google Scholar

    Fleischer , L., Jain , K. & Mahdian , M.2004. Tolls for heterogeneous selfish users in multicommodity networks and generalized congestion games. In 45th Annual IEEE Symposium on Foundations of Computer Science, IEEE, Rome, Italy, 277–285.

    Google Scholar

    Foerster , J., Nardell , N., Farquhar , G., Afouras , T., Torr , P. H., Kohli , P. & Whiteson , S.2017. Stabilising experience replay for deep multi-agent reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, 70, PMLR, 1146–1155.

    Google Scholar

    Centre for Economics and Business Research2014. The Future Economic and Environmental Costs of Gridlock in 2030, Technical report, Centre for Economics and Business Research, London.

    Google Scholar

    Hernandez-Leal , P., Zhan , Y., Taylor , M. E., Sucar , L. E. & Munoz de Cote , E.2017. An exploration strategy for non-stationary opponents. Autonomous Agents and Multi-Agent Systems31(5), 971–1002.

    Google Scholar

    Hoefer , M. & Skopalik , A.2009. Altruism in atomic congestion games. In 17th Annual European Symposium on Algorithms, Fiat , A. & Sanders , P. (eds.), Springer, Berlin Heidelberg, 179–189.

    Google Scholar

    Hu , J. & Wellman , M. P.1998. Multiagent reinforcement learning: Theoretical framework and an algorithm. In Proceedings of the 15th International Conference on Machine Learning, Morgan Kaufmann, 242–250.

    Google Scholar

    Hu , J. & Wellman , M. P.2003. Nash q-learning for general-sum stochastic games. Journal of Machine Learning Research4, 1039–1069.

    Google Scholar

    Jayakrishnan , R., Cohen , M., Kim , J., Mahmassani , H. S. & Hu , T.-Y.1993. A Simulation-Based Framework for the Analysis of Traffic Networks Operating with Real-Time Information, Technical Report UCB-ITS-PRR-93-25, University of California, Berkeley.

    Google Scholar

    Kaisers , M. & Tuyls , K.2010. Frequency adjusted multi-agent q-learning. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 309–316.

    Google Scholar

    Kobayashi , K. & Do , M.2005. The informational impacts of congestion tolls upon route traffic demands. Transportation Research A39(7–9), 651–670.

    Google Scholar

    Koutsoupias , E. & Papadimitriou , C.1999. Worst-case equilibria. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science (STACS), Springer-Verlag, 404–413.

    Google Scholar

    Lanctot , M., Zambaldi , V., Gruslys , A., Lazaridou , A., Tuyls , K., Perolat , J., Silver , D. & Graepel , T.2017. A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems, Guyon , I., Luxburg , U. V., Bengio , S., Wallach , H., Fergus , R., Vishwanathan , S. & Garnett , R. (eds.), 30, Curran Associates, Inc., 4190–4203.

    Google Scholar

    Lauer , M. & Riedmiller , M.2004. Reinforcement learning for stochastic cooperative multi-agent-systems. Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2004 3, 1516–1517.

    Google Scholar

    Laurent , G. J., Matignon , L. & Le Fort-Piat , N.2011. The world of independent learners is not markovian. International Journal of Knowledge-based and Intelligent Engineering Systems15(1), 55–64.

    Google Scholar

    LeBlanc , L. J., Morlok , E. K. & Pierskalla , W. P.1975. An efficient approach to solving the road network equilibrium traffic assignment problem. Transportation Research9(5), 309–318.

    Google Scholar

    Levy , N. & Ben-Elia , E.2016. Emergence of system optimum: A fair and altruistic agent-based route-choice model. Procedia Computer Science83, 928–933.

    Google Scholar

    Littman , M. L.1994. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the 11th International Conference on Machine Learning, ML, Morgan Kaufmann, 157–163.

    Google Scholar

    Littman , M. L.2001. Friend-or-Foe Q-learning in general-sum games. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML’01). Morgan Kaufmann, 322–328.

    Google Scholar

    Lujak , M., Giordani , S. & Ossowski , S.2015. Route guidance: Bridging system and user optimization in traffic assignment. Neurocomputing151, 449–460.

    Google Scholar

    Malialis , K., Devlin , S. & Kudenko , D.2016. Resource abstraction for reinforcement learning in multiagent congestion problems. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 503–511.

    Google Scholar

    Matignon , L., Laurent , G. J. & Le Fort-Piat , N.2012. Independent reinforcement learners in cooperative markov games: a survey regarding coordination problems. The Knowledge Engineering Review27(1), 1–31.

    Google Scholar

    McFadden , D.2001. Disaggregate behavioral travel demand’s RUM side. In Travel Behaviour Research: The Leading Edge, Hensher , D. A. (ed), Elsevier, 17–63.

    Google Scholar

    Meir , R. & Parkes , D.2018. Playing the wrong game: Bounding externalities in diverse populations of agents. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), IFAAMAS, Stockholm, 86–94.

    Google Scholar

    Meir , R. & Parkes , D. C.2016. When are marginal congestion tolls optimal? In Proceedings of the Ninth Workshop on Agents in Traffic and Transportation (ATT-2016), Bazzan , A. L. C., Klügl , F., Ossowski , S. & Vizzari , G. (eds). CEUR-WS.org, 8.

    Google Scholar

    Mirzaei , H., Sharon , G., Boyles , S., Givargis , T. & Stone , P.2018. Link-based parameterized micro-tolling scheme for optimal traffic management. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’18, Dastani , M., Sukthankar , G., André , E. & Koenig , S. (eds). IFAAMAS, 2013–2015.

    Google Scholar

    Nash , J.1950. Non-Cooperative Games, PhD thesis, Princeton University.

    Google Scholar

    National Surface Transportation Infrastructure Financing Commission2009. Paying Our Way: A New Framework for Transportation Finance, Technical report, National Surface Transportation Infrastructure Financing Commission, Washington DC.

    Google Scholar

    Bureau of Public Roads1964. Traffic Assignment Manual, Technical report, US Department of Commerce, Washington, D. C.

    Google Scholar

    Omidshafiei , S., Pazis , J., Amato , C., How , J. P. & Vian , J.2017. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In Proceedings of the 34th International Conference on Machine Learning, 70, 4108–4122.

    Google Scholar

    Ortúzar , J. d. D. & Willumsen , L. G.2011. Modelling Transport, 4 edition, John Wiley & Sons.

    Google Scholar

    Pigou , A.1920. The Economics of Welfare, Palgrave Classics in Economics, Palgrave Macmillan.

    Google Scholar

    Proper , S. & Tumer , K.2012. Modeling difference rewards for multiagent learning. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Conitzer, V., Winikoff, M., Padgham, L. & van der Hoek, W. (eds). IFAAMAS.

    Google Scholar

    Rădulescu , R., Vrancx , P. & Nowé , A.2017. Analysing congestion problems in multi-agent reinforcement learning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, International Foundation for Autonomous Agents and Multiagent Systems, 1705–1707.

    Google Scholar

    Ramos , G. de O.2018. Regret Minimisation and System-Efficiency in Route Choice, PhD thesis, Universidade Federal do Rio Grande do Sul, Porto Alegre.

    Google Scholar

    Ramos , G. de O. & Bazzan , A. L. C.2015. Towards the user equilibrium in traffic assignment using GRASP with path relinking. In Proceedings of the 2015 on Genetic and Evolutionary Computation Conference, GECCO ’15, ACM, 473–480.

    Google Scholar

    Ramos , G. de O. & Bazzan , A. L. C.2016. Efficient local search in traffic assignment. In 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 1493–1500.

    Google Scholar

    Ramos , G. de O., da Silva , B. C. & Bazzan , A. L. C.2017. Learning to minimise regret in route choice. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Das , S., Durfee , E., Larson , K. & Winikoff , M. (eds). IFAAMAS, 846–855.

    Google Scholar

    Ramos , G. de O., da Silva , B. C., Rădulescu , R. & Bazzan , A. L. C.2018. Learning system-efficient equilibria in route choice using tolls. In Proceedings of the Adaptive Learning Agents Workshop 2018 (ALA-18), Stockholm.

    Google Scholar

    Ramos , G. de O., Rădulescu , R., Nowé , A. & Tavares , A. R. 2020. Toll-based learning for minimising congestion under heterogeneous preferences. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), An, B., Yorke-Smith, N., El Fallah Seghrouchni, A. & Sukthankar , G. (eds). IFAAMAS.

    Google Scholar

    Robbins , H.1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society58(5), 527–535.

    Google Scholar

    Rosenthal , R. W.1973. A class of games possessing pure-strategy Nash equilibria. International Journal of Game Theory2, 65–67.

    Google Scholar

    Roughgarden , T.2005. Selfish Routing and the Price of Anarchy, MIT Press.

    Google Scholar

    Sandholm , T.2007. Perspectives on multiagent learning. Artificial Intelligence171(7), 382–391.

    Google Scholar

    Sen , S., Sekaran , M. & Hale , J.1994. Learning to coordinate without sharing information. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 426–431.

    Google Scholar

    Sharon , G., Boyles , S. D., Alkoby , S. & Stone , P.2019. Marginal cost pricing with a fixed error factor in traffic networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Agmon , N., Taylor , M., Elkind , E. & Veloso , M. (eds). IFAAMAS, Montreal, 1539–1546.

    Google Scholar

    Sharon , G., Hanna , J. P., Rambha , T., Levin , M. W., Albert , M., Boyles , S. D. & Stone , P.2017. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Das , S., Durfee , E., Larson , K. & Winikoff , M. (eds). IFAAMAS, 828–836.

    Google Scholar

    Stefanello , F. & Bazzan , A. L. C.2016. Traffic Assignment Problem – Extending Braess Paradox, Technical report, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS.

    Google Scholar

    Sutton , R. & Barto , A.1998. Reinforcement Learning: An Introduction, MIT Press.

    Google Scholar

    Tan , M.1993. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, 330–337.

    Google Scholar

    Tavares , A. R. & Bazzan , A. L.2014. An agent-based approach for road pricing: System-level performance and implications for drivers. Journal of the Brazilian Computer Society20(1), 15.

    Google Scholar

    Tesauro , G.1994. Td-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation6(2), 215–219.

    Google Scholar

    Tuyls , K. & Weiss , G.2012. Multiagent learning: Basics, challenges, and prospects. AI Magazine33(3), 41–52.

    Google Scholar

    van Essen , M., Thomas , T., van Berkum , E. & Chorus , C.2016. From user equilibrium to system optimum: a literature review on the role of travel information, bounded rationality and non-selfish behaviour at the network and individual levels. Transport Reviews36(4), 527–548.

    Google Scholar

    Verbeeck , K., Nowé , A., Parent , J. & Tuyls , K.2007. Exploring selfish reinforcement learning in repeated games with stochastic rewards. Autonomous Agents and Multi-Agent Systems14(3), 239–269.

    Google Scholar

    Vrancx , P., Verbeeck , K. & Nowe , A.2008. Decentralized learning in markov games. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)38(4), 976–981.

    Google Scholar

    Vrancx , P., Verbeeck , K. & Nowé , A.2010. Learning to take turns. In Proceedings of the AAMAS 2010 Workshop on Adaptive Learning Agents and Multi-Agent Systems (ALA 2010), 1–7.

    Google Scholar

    Wardrop , J. G.1952. Some theoretical aspects of road traffic research. Proceedings of the Institution of Civil Engineers, Part II 1(36), 325–362.

    Google Scholar

    Watkins , C. J. C. H. & Dayan , P.1992. Q-learning. Machine Learning8(3), 279–292.

    Google Scholar

    Wolpert , D. H. & Tumer , K.1999. An introduction to Collective Intelligence, Technical report NASA-ARC-IC-99-63, NASA Ames Research Center. arXiv:cs/9908014 [cs.LG].

    Google Scholar

    Wolpert , D. H. & Tumer , K.2002. Collective intelligence, data routing and Braess’ paradox. Journal of Artificial Intelligence Research16, 359–387.

    Google Scholar

    Yang , H., Meng , Q. & Lee , D.-H.2004. Trial-and-error implementation of marginal-cost pricing on networks in the absence of demand functions. Transportation Research Part B: Methodological38(6), 477–493.

    Google Scholar

    Ye , H., Yang , H. & Tan , Z.2015. Learning marginal-cost pricing via a trial-and-error procedure with day-to-day flow dynamics. Transportation Research Part B: Methodological81, 794–807.

    Google Scholar

    Yen , J. Y.1971. Finding the k shortest loopless paths in a network. Management Science17(11), 712–716.

    Google Scholar

    Youn , H., Gastner , M. T. & Jeong , H.2008. Price of anarchy in transportation networks: Efficiency and optimality control. Physical Review Letters101(12), 128701.

    Google Scholar

    Zhang , J., Pourazarm , S., Cassandras , C. G. & Paschalidis , I. C.2016. The price of anarchy in transportation networks by estimating user cost functions from actual traffic data. In 2016 IEEE 55th Conference on Decision and Control (CDC), IEEE, 789–794.

    Google Scholar

    Zinkevich , M.2003. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning, AAAI Press, 928–936.

    Google Scholar

  • Cite this article

    Gabriel de O. Ramos, Bruno C. Da Silva, Roxana Rădulescu, Ana L. C. Bazzan, Ann Nowé. 2020. Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000119
    Gabriel de O. Ramos, Bruno C. Da Silva, Roxana Rădulescu, Ana L. C. Bazzan, Ann Nowé. 2020. Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000119

Article Metrics

Article views(73) PDF downloads(71)

ORIGINAL RESEARCH   Open Access    

Toll-based reinforcement learning for efficient equilibria in route choice

Abstract: Abstract: The problem of traffic congestion incurs numerous social and economical repercussions and has thus become a central issue in every major city in the world. For this work we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome, even when agents interact in a competitive environment for sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such a selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the is attainable. In contrast to previous works, however, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that it converges to a system-efficient equilibrium (i.e., an equilibrium aligned to the system optimum) in the limit. Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into a reduction of the congestion levels by 9.1%, on average.

    • The authors thank the anonymous reviewers for their thorough analysis and valuable suggestions. Ramos, Rădulescu, and Nowé were supported by Flanders Innovation & Entrepreneurship (VLAIO), SBO project 140047: Stable MultI-agent LEarnIng for neTworks (SMILE-IT). Part of this research was also supported by the The Flanders AI Research Impulse Program, Belgium. Ramos and da Silva were partially supported by FAPERGS (grants 19/2551-0001277-2 and 17/2551-000, respectively). Bazzan was partially supported by CNPq (grant 307215/2017-2). This work was also partially supported by CNPq and CAPES scholarships.

    • We abuse notation here and use $n_u^{p}$ ($n_v^{p}$) to denote the start (end) node of the $p^{th}$ link of route R.

    • As discussed forward, links’ costs are represented by positive reals. Since for every route with a cycle we can obtain an equivalent sequence without cycles, then cycles can be ignored without loss of generality.

    • Observe that although the reward an agent receives is formulated as a function of its route, it actually depends on the flow of vehicles on the links that comprise that route. This is expressed by means of the VDF function, as explained in Section 2.1, whose value is a function of the flow of vehicles on its links. Furthermore, since we assume a macroscopic traffic model, we remark that all routes’ costs can be computed together, when all drivers complete their trips.

    • We remark that, using a macroscopic traffic model (see Section 2), travel times are computed whenever all drivers complete their trips.

    • Hereafter, we refer to the action with highest Q-value as the best action and to the other actions as non-best. Observe that the best action is not necessarily optimal.

    • In order to improve presentation, whenever it is clear from the context, we refer to $\rho({{\hbox{$\begin{matrix}\scriptscriptstyle+\\[-5.6pt]a\end{matrix}$}}}{}^t_i)$ and $\rho(\bar{a}{}^t_i)$ as ${{\hbox{$\begin{matrix}\scriptscriptstyle+\\[-5.6pt]\rho\end{matrix}$}}}{}^t_i$ and $\bar{\rho}{}^t_i$, respectively. Whenever possible, we also omit t and i.

    • The road networks are publicly available at .

    • As for the BB networks, we enforced the route with fewest links among the shortest ones, otherwise the system optimum would not be attainable, as discussed in Ramos (2018).

    • We remark that, although other tolling schemes are also available in the literature (see discussion in Section 3.2), these cannot be directly applied to our problem. This is because such approaches do not present a learning scheme and cannot be run distributedly by the drivers.

    • Since our objective here is to compare the quality of the solutions obtained by the methods, we adopted the traditional version of difference rewards (DR, which assumes full observability in order to compute the difference signals) rather than the most recent, function approximation-based version (FADR) by Colby et al. (2016). The reason is that DR obtains better results than FADR (as discussed in Section 3). Hence, we believe this choice promotes a fairer comparison of DR against our method.

    • © Cambridge University Press 20202020Cambridge University Press
References (88)
  • About this article
    Cite this article
    Gabriel de O. Ramos, Bruno C. Da Silva, Roxana Rădulescu, Ana L. C. Bazzan, Ann Nowé. 2020. Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000119
    Gabriel de O. Ramos, Bruno C. Da Silva, Roxana Rădulescu, Ana L. C. Bazzan, Ann Nowé. 2020. Toll-based reinforcement learning for efficient equilibria in route choice. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000119
  • Catalog

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return