This work is partially supported by the Smart Surface ANR (French National Research Agency) project (ANR_06_ROBO_0009_03).
-
Markov games are also called stochastic games, but we use the term Markov games to avoid confusion with stochastic (non-deterministic) Markov games.
-
The greedy policy based on Q_i picks, for every state, the action with the highest Q-value.
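As an illustration of this footnote, a greedy policy can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the dictionary layout of the Q-table, the function name `greedy_policy`, and the state and action labels are all assumptions made for the example.

```python
# Hypothetical Q-table of one agent i: q_table[state][action] -> Q-value.
# The layout and all names here are illustrative assumptions.

def greedy_policy(q_table):
    """Map every state to the action with the highest Q-value.

    Ties are broken in favour of the action listed first.
    """
    return {
        state: max(action_values, key=action_values.get)
        for state, action_values in q_table.items()
    }


# Example Q-values (made up for illustration only).
q_i = {
    "s0": {"a": 1.0, "b": 3.5},
    "s1": {"a": 0.2, "b": -1.0},
}

policy = greedy_policy(q_i)
```

Here `policy` maps each state to a single action, so the agent's behaviour is deterministic given its current Q-values.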
-
Library of Simulink tools for reinforcement learning.
-
12 and 6 are received with equal probabilities in the stochastic mode.
-
This concept is close to that of off-policy algorithms for single-agent problems; see Sutton and Barto (1998), for example.
-
With the exception of distributed Q-learning, which complies with its theoretical guarantees.
-
Copyright © Cambridge University Press 2012
Laetitia Matignon, Guillaume J. Laurent, Nadine Le Fort-Piat. 2012. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review. 27:57 doi: 10.1017/S0269888912000057





