doi:10.1017/S0269888925100052

Abels, A., Roijers, D., Lenaerts, T., Nowé, A. & Steckelmacher, D. 2019. Dynamic weights in multi-objective deep reinforcement learning. In International Conference on Machine Learning (ICML), 11–20.

Alegre, L. N., Bazzan, A. L., Roijers, D. M., Nowé, A. & da Silva, B. C. 2023. Sample-efficient multi-objective learning via generalized policy improvement prioritization. arXiv preprint arXiv:2301.07784.

Bai, Q., Agarwal, M. & Aggarwal, V. 2021. Joint optimization of multi-objective reinforcement learning with policy gradient based algorithm. arXiv preprint arXiv:2105.14125. doi:10.1613/jair.1.13981

Basaklar, T., Gumussoy, S. & Ogras, U. Y. 2022. PD-MORL: Preference-driven multi-objective reinforcement learning algorithm. arXiv preprint arXiv:2208.07914.

Bryce, D., Cushing, W. & Kambhampati, S. 2007. Probabilistic Planning Is Multi-Objective. Arizona State University Computer Science and Engineering Technical Report 07-006.

Cai, X. Q., Zhang, P., Zhao, L., Bian, J., Sugiyama, M. & Llorens, A. 2023. Distributional Pareto-optimal multi-objective reinforcement learning. In Advances in Neural Information Processing Systems 36, 15593–15613.

Ding, K. 2022. Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning. arXiv preprint arXiv:2211.08669.

Dornheim, J. 2022. gTLO: A generalized and non-linear multi-objective deep reinforcement learning approach. arXiv preprint arXiv:2204.04988.

Drugan, M. M. & Nowé, A. 2013. Designing multi-objective multi-armed bandits algorithms: A study. In The 2013 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE. doi:10.1109/IJCNN.2013.6707036

Fan, Z., Peng, N., Tian, M. & Fain, B. 2022. Welfare and fairness in multi-objective reinforcement learning. arXiv preprint arXiv:2212.01382.

Felten, F., Alegre, L. N., Nowé, A., Bazzan, A. L. C., Talbi, E. G., Danoy, G. & da Silva, B. C. 2023. A toolkit for reliable benchmarking and research in multi-objective reinforcement learning. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023).

Gábor, Z., Kalmár, Z. & Szepesvári, C. 1998. Multi-criteria reinforcement learning. In ICML '98, 197–205.

Hayes, C. F., Howley, E. & Mannion, P. 2020. Dynamic thresholded lexicographic ordering. In Adaptive and Learning Agents Workshop (AAMAS 2020).

Hayes, C. F., Rădulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., Verstraeten, T., Zintgraf, L., Dazeley, R., Heintz, F., Howley, E., Irissappane, A., Mannion, P., Nowé, A., Ramos, G., Restelli, M., Vamplew, P. & Roijers, D. 2022b. A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems 36. doi:10.1007/s10458-022-09552-y

Hayes, C. F., Roijers, D. M., Howley, E. & Mannion, P. 2022a. Multi-objective distributional value iteration. In Adaptive and Learning Agents Workshop (AAMAS 2022).

Huanca-Anquise, C. A., Bazzan, A. L. C. & Tavares, A. R. 2023. Multi-objective, multi-armed bandits: Algorithms for repeated games and application to route choice. Revista de Informática Teórica e Aplicada 30(1), 11–23. doi:10.22456/2175-2745.122929

Issabekov, R. & Vamplew, P. 2012. An empirical comparison of two common multiobjective reinforcement learning algorithms. In AI 2012: Advances in Artificial Intelligence, Thielscher, M. & Zhang, D. (eds). Lecture Notes in Computer Science, 626–636.

Jin, J. & Ma, X. 2017. A multi-objective multi-agent framework for traffic light control. In 2017 11th Asian Control Conference (ASCC), 1199–1204. IEEE. doi:10.1109/ASCC.2017.8287341

Kulkarni, T. D., Saeedi, A., Gautam, S. & Gershman, S. J. 2016. Deep successor reinforcement learning. https://arxiv.org/abs/1606.02396.

Lian, Z., Lv, C. & Lu, W. 2023. Inkjet OLED printing planning based on deep reinforcement learning and reward-based TLO. Journal of Physics: Conference Series 2450, 012081. IOP Publishing.

Lu, H., Herman, D. & Yu, Y. 2023. Multi-objective reinforcement learning: Convexity, stationarity and Pareto optimality. In The Eleventh International Conference on Learning Representations.

Machado, M. C., Barreto, A., Precup, D. & Bowling, M. 2023. Temporal abstraction in reinforcement learning with the successor representation. Journal of Machine Learning Research 24(80), 1–69.

Parisi, S., Pirotta, M., Smacchia, N., Bascetta, L. & Restelli, M. 2014. Policy gradient approaches for multi-objective sequential decision making. In 2014 International Joint Conference on Neural Networks (IJCNN), 2323–2330. IEEE. doi:10.1109/IJCNN.2014.6889738

Reymond, M., Bargiacchi, E. & Nowé, A. 2022. Pareto conditioned networks. arXiv preprint arXiv:2204.05036.

Roijers, D. M., Vamplew, P., Whiteson, S. & Dazeley, R. 2013. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research 48, 67–113. doi:10.1613/jair.3987

Röpke, W., Reymond, M., Mannion, P., Roijers, D. M., Nowé, A. & Rădulescu, R. 2024. Divide and conquer: Provably unveiling the Pareto front with multi-objective reinforcement learning. arXiv preprint arXiv:2402.07182.

Siddique, U., Weng, P. & Zimmer, M. 2020. Learning fair policies in multi-objective (deep) reinforcement learning with average and discounted rewards. In International Conference on Machine Learning, 8905–8915. PMLR.

Skalse, J., Hammond, L., Griffin, C. & Abate, A. 2022. Lexicographic multi-objective reinforcement learning. arXiv preprint arXiv:2212.13769. doi:10.24963/ijcai.2022/476

Sutton, R. S. & Barto, A. G. 2018. Reinforcement Learning: An Introduction. MIT Press.

Sutton, R. S., Precup, D. & Singh, S. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1–2), 181–211. doi:10.1016/S0004-3702(99)00052-1

Tercan, A. 2022. Solving MDPs with Thresholded Lexicographic Ordering Using Reinforcement Learning. PhD thesis, Colorado State University.

Tessler, C., Mankowitz, D. J. & Mannor, S. 2018. Reward constrained policy optimization. arXiv preprint arXiv:1805.11074.

Vamplew, P., Dazeley, R., Barker, E. & Kelarev, A. 2009. Constructing stochastic mixture policies for episodic multiobjective reinforcement learning tasks. In AJCAI, 340–349. Springer. doi:10.1007/978-3-642-10439-8_35

Vamplew, P., Dazeley, R., Berry, A., Issabekov, R. & Dekker, E. 2011. Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning 84(1–2), 51–80. doi:10.1007/s10994-010-5232-5

Vamplew, P., Dazeley, R. & Foale, C. 2017. Softmax exploration strategies for multiobjective reinforcement learning. Neurocomputing 263, 74–86. doi:10.1016/j.neucom.2016.09.141

Vamplew, P., Foale, C., Dazeley, R. & Bignold, A. 2021. Potential-based multiobjective reinforcement learning approaches to low-impact agents for AI safety. Engineering Applications of Artificial Intelligence 100, 104186. doi:10.1016/j.engappai.2021.104186

Vamplew, P., Foale, C. & Dazeley, R. 2022a. The impact of environmental stochasticity on value-based multiobjective reinforcement learning. Neural Computing and Applications 34(3), 1783–1799. doi:10.1007/s00521-021-05859-1

Vamplew, P., Foale, C. & Dazeley, R. 2024. Value function interference and greedy action selection in value-based multi-objective reinforcement learning. arXiv preprint arXiv:2402.06266.

Vamplew, P., Smith, B. J., Källström, J., Ramos, G., Rădulescu, R., Roijers, D. M., Hayes, C. F., Heintz, F., Mannion, P., Libin, P. J., et al. 2022b. Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021). Autonomous Agents and Multi-Agent Systems 36(2), 41. doi:10.1007/s10458-022-09575-5

Vamplew, P., Yearwood, J., Dazeley, R. & Berry, A. 2008. On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts. Springer-Verlag. doi:10.1007/978-3-540-89378-3_37

Van Hasselt, H., Doron, Y., Strub, F., Hessel, M., Sonnerat, N. & Modayil, J. 2018. Deep reinforcement learning and the deadly triad. arXiv preprint arXiv:1812.02648.

Van Moffaert, K., Drugan, M. M. & Nowé, A. 2013. Scalarized multi-objective reinforcement learning: Novel design techniques. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). doi:10.1109/ADPRL.2013.6615007

Vincent, M. 2024. Nonlinear scalarization in stochastic multi-objective MDPs. Neural Computing and Applications, 1–13. doi:10.1007/s00521-024-10504-8

Xu, J., Tian, Y., Ma, P., Rus, D., Sueda, S. & Matusik, W. 2020. Prediction-guided multi-objective reinforcement learning for continuous robot control. In International Conference on Machine Learning, 10607–10616. PMLR.

Yang, R., Sun, X. & Narasimhan, K. 2019. A generalized algorithm for multi-objective reinforcement learning and policy adaptation. In Advances in Neural Information Processing Systems 32.