Abstract

W. Chen, S. Koenig and B. Dilkina. Solving Multi-Agent Path Finding as an LLM Benchmark: How, How Good and Why. Transactions on Machine Learning Research, June, 2025.

Abstract: The rapid success of large language models (LLMs) has spurred extensive research into their ability to solve a wide range of tasks. However, their potential in multi-agent planning remains underexplored. Multi-agent planning presents unique challenges due to the combined complexity of coordination and long-horizon reasoning, often making it difficult to leverage external tools for assistance. In this paper, we introduce Multi-Agent Path Finding (MAPF), also known as multi-robot route planning, as a novel benchmark for evaluating the reasoning capabilities of LLMs. We first describe how the MAPF benchmark can be adapted for LLM-based evaluation, including dataset curation and an agentic workflow for LLMs. We show the motivating success of single-agent planning and multi-agent pathfinding in an empty room map without obstacles, then the failure to plan on the harder room map and maze map of the standard MAPF benchmark. We present our position on why directly solving MAPF with LLMs has not been successful yet, and we use various experiments to support our hypothesis. Based on our results, we discussed how researchers with different backgrounds could help with this problem from different perspectives.

Many publishers do not want authors to make their papers available electronically after the papers have been published. Please use the electronic versions provided here only if hardcopies are not yet available. If you have comments on any of these papers, please send me an email! Also, please send me your papers if we have common interests.


This page was automatically created by a bibliography maintenance system that was developed as part of an undergraduate research project, advised by Sven Koenig.