Generative Curricula for Multi-Agent Path Finding via Unsupervised and Reinforcement Learning

Abstract

T. Phan, T. Phan and S. Koenig. Generative Curricula for Multi-Agent Path Finding via Unsupervised and Reinforcement Learning. Journal of Artificial Intelligence Research, 82, 2471-2534, 2025.

Abstract: Multi-Agent Path Finding (MAPF) is the challenging problem of finding collision-free paths for multiple agents, which has a wide range of applications, such as automated warehouses, smart manufacturing, and traffic management. Recently, machine learning-based approaches have become popular in addressing MAPF problems in a decentralized and potentially generalizing way. Most learning-based MAPF approaches use reinforcement and imitation learning to train agent policies for decentralized execution under partial observability. However, current state-of-the-art approaches suffer from a prevalent bias to micro-aspects of particular MAPF problems, such as congestions in corridors and potential delays caused by single agents, leading to tight specializations through extensive engineering via oversized models, reward shaping, path finding algorithms, and communication. These specializations are generally detrimental to the sample efficiency, i.e., the learning progress given a certain amount of experience, and generalization to previously unseen scenarios. In contrast, curriculum learning offers an elegant and much simpler way of training agent policies in a step-by-step manner to master all aspects implicitly without extensive engineering. In this paper, we propose a generative curriculum approach to learning-based MAPF using Variational Autoencoder Utilized Learning of Terrains (VAULT). We introduce a two-stage framework to (I) train the VAULT via unsupervised learning to obtain a latent space representation of maps and (II) use the VAULT to generate curricula in order to improve sample efficiency and generalization of learning-based MAPF methods. For the second stage, we propose a bi-level curriculum scheme by combining our VAULT curriculum with a low-level curriculum method to improve sample efficiency further. Our framework is designed in a modular and general way, where each proposed component serves its purpose in a black-box manner without considering specific micro-aspects of the underlying problem. We empirically evaluate our approach in maps of the public MAPF benchmark set as well as novel artificial maps generated with the VAULT. Our results demonstrate the effectiveness of the VAULT as a map generator and our VAULT curriculum in improving sample efficiency and generalization of learning-based MAPF methods compared to alternative approaches. We also demonstrate how data pruning can further reduce the dependence on available maps without affecting the generalization potential of our approach.

Download the paper in pdf.

Many publishers do not want authors to make their papers available electronically after the papers have been published. Please use the electronic versions provided here only if hardcopies are not yet available. If you have comments on any of these papers, please send me an email! Also, please send me your papers if we have common interests.

This page was automatically created by a bibliography maintenance system that was developed as part of an undergraduate research project, advised by Sven Koenig.