Multi-Object Navigation (MultiON) Challenge

Held in conjunction with the Embodied AI Workshop at CVPR 2024

An example episode from the MultiON challenge 2024.
Go to the kitchen sink cabinet.
Find the pet house.
Go to the window curtain on the sliding door.


The Multi-Object Navigation (MultiON) challenge is hosted at the Embodied AI Workshop, CVPR 2024. The challenge builds on the task introduced in MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation (NeurIPS 2020). MultiON is a long-horizon generalization of the object-navigation task in which the agent has to navigate to multiple goals.


Challenge starts (dataset and starter code available; EvalAI opens for Minival Phase submissions): April 19, 2024
Leaderboard opens (Test Standard Phase and Test Challenge Phase submissions): April 19, 2024
Challenge submission deadline: June 3, 2024 (AoE)
Winner announcement at Embodied AI Workshop, CVPR 2024: June 2024


In MultiON, an agent is tasked with navigating to a sequence of objects. These objects are placed into a realistic 3D environment. The task is based on the Habitat platform and Habitat Synthetic Scenes Dataset (HSSD) scenes.

Each episode contains 3 target objects randomly sampled from the objects present in the scene. Unlike previous challenges, this time the objects are described by a language instruction such as ‘Find the mantel clock on the chest of drawers’ instead of an object category. Moreover, the set of goal objects is not known a priori, which brings the task closer to a real-world setting where the agent might be asked to find ‘any’ object in the environment. Each language instruction may contain a coarse description of the object (‘Go to the candle’) or a fine-grained description (‘Find the mini spa candle’). For a coarse description, multiple valid goal objects might exist, and navigating to any of them is considered successful. Instructions may also contain spatial relations between objects, such as ‘Find the red short pillar candle on the grey nightstand’.
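As a rough illustration of the paragraph above, the sketch below models an episode as three language instructions, each paired with the set of objects that satisfy it. The class names, field names, scene name, and object ids are hypothetical and are not the MultiON dataset schema; the instruction strings are the examples quoted in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    """One goal: a language instruction plus the objects that satisfy it."""
    instruction: str
    valid_object_ids: List[str]  # hypothetical ids, not real dataset ids

    @property
    def is_ambiguous(self) -> bool:
        # A coarse description may match several objects in the scene.
        return len(self.valid_object_ids) > 1


@dataclass
class Episode:
    """A hypothetical episode record: one scene and an ordered list of targets."""
    scene_id: str
    targets: List[Target]

    def __post_init__(self) -> None:
        # The challenge description states that each episode has 3 targets.
        if len(self.targets) != 3:
            raise ValueError("a MultiON 2024 episode has exactly 3 targets")


episode = Episode(
    scene_id="example_scene",  # placeholder name
    targets=[
        Target("Go to the candle", ["candle_0", "candle_1"]),
        Target("Find the mini spa candle", ["candle_1"]),
        Target(
            "Find the red short pillar candle on the grey nightstand",
            ["candle_0"],
        ),
    ],
)
```

Here the first instruction is coarse, so reaching either candle would count, while the other two each single out one object.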

In summary, in each episode, the agent is initialized at a random starting position and orientation in an unseen environment and is given a sequence of 3 target objects. The agent must navigate to each target object in the given order and call the FOUND action to indicate discovery. The agent has access to an RGB-D camera and a noiseless GPS+Compass sensor. The GPS+Compass sensor provides the agent's current location and orientation relative to its pose at the start of the episode.
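To make "relative to the start of the episode" concrete, here is a minimal 2D sketch of how a world-frame pose can be expressed in the frame of the starting pose. It assumes a flat ground plane with counter-clockwise headings in radians; the `Pose2D` type and `relative_pose` function are illustrative names, and the axis conventions are not taken from Habitat.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float        # metres
    y: float        # metres
    heading: float  # radians, counter-clockwise


def relative_pose(start: Pose2D, current: Pose2D) -> Pose2D:
    """Express `current` in the coordinate frame attached to `start`."""
    dx = current.x - start.x
    dy = current.y - start.y
    cos_h = math.cos(start.heading)
    sin_h = math.sin(start.heading)
    # Rotate the world-frame displacement by -start.heading.
    rel_x = cos_h * dx + sin_h * dy
    rel_y = -sin_h * dx + cos_h * dy
    # Wrap the heading difference into [-pi, pi].
    diff = current.heading - start.heading
    rel_heading = math.atan2(math.sin(diff), math.cos(diff))
    return Pose2D(rel_x, rel_y, rel_heading)


# The agent starts at (1, 2) facing along +y and moves 3 m straight ahead.
start = Pose2D(1.0, 2.0, math.pi / 2)
now = Pose2D(1.0, 5.0, math.pi / 2)
rel = relative_pose(start, now)
```

In the start frame the agent has moved 3 m along its initial facing direction with no sideways offset and no change in heading.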

Evaluation Details

The episode terminates when the agent has found all objects in the episode's sequence or when it calls FOUND incorrectly. A FOUND action is incorrect if the agent is not within 1.5m of its current target object when it calls the action. Note that the agent is not required to be viewing the object when it calls FOUND. After the episode terminates, the agent is evaluated using the Progress and PPL metrics defined below.
Progress: The proportion of objects correctly found in the episode.
PPL: Progress weighted by path length. PPL quantifies the efficiency of the agent's trajectory with respect to the optimal trajectory.
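The rules and metrics above can be sketched in a few lines. This is not the official evaluation code: it assumes straight-line (Euclidean) distance in 2D and an inclusive 1.5m threshold, and it assumes PPL takes the SPL-style form Progress × d / max(p, d), where p is the agent's path length and d is the length of the reference optimal path. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
SUCCESS_RADIUS_M = 1.5  # distance threshold from the challenge description


def found_is_correct(agent: Point, valid_goals: Sequence[Point]) -> bool:
    """FOUND is correct if the agent is close enough to any valid goal."""
    return any(math.dist(agent, g) <= SUCCESS_RADIUS_M for g in valid_goals)


def progress(found_positions: Sequence[Point],
             goals: Sequence[Sequence[Point]]) -> float:
    """Fraction of targets found, counting up to the first incorrect FOUND."""
    found = 0
    for agent_pos, valid_goals in zip(found_positions, goals):
        if not found_is_correct(agent_pos, valid_goals):
            break  # an incorrect FOUND terminates the episode
        found += 1
    return found / len(goals)


def ppl(progress_value: float, agent_path_m: float,
        optimal_path_m: float) -> float:
    """Progress weighted by path length (assumed SPL-style form)."""
    longest = max(agent_path_m, optimal_path_m)
    if longest == 0:
        return 0.0
    return progress_value * optimal_path_m / longest


# Three targets in order; the first has two valid goal objects.
goals = [[(0.0, 0.0), (5.0, 5.0)], [(10.0, 0.0)], [(20.0, 0.0)]]
# Agent positions at each FOUND call; the third call is too far away.
found_positions = [(1.0, 0.0), (10.0, 1.0), (15.0, 0.0)]

episode_progress = progress(found_positions, goals)
episode_ppl = ppl(episode_progress, agent_path_m=15.0, optimal_path_m=12.0)
```

In this example two of the three targets are found before the incorrect FOUND, so Progress is 2/3, and the longer-than-optimal path scales PPL down by 12/15.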

Submission Guidelines

Participants must make submissions through our EvalAI page. There are three phases in the challenge.

Phase 1: Minival Phase

This phase evaluates MultiON agents on the minival set of the MultiON dataset. This phase is meant to be used for sanity checking the results of remote evaluation against your local evaluations.

Phase 2: Test Standard Phase

The results of this phase will be used to prepare the public leaderboard for the challenge. We suggest using this phase for reporting results in papers and for comparing with other methods. Each team is allowed a maximum of 3 submissions per day for this phase.

Phase 3: Test Challenge Phase

Only submissions made in this phase are considered entries to the MultiON Challenge, since this phase is used to decide the winners. Each team is allowed a total of 3 submissions to this phase until the phase ends. For detailed submission instructions, please refer to this.

Challenge Updates

Any updates related to the challenge will be posted here. Please join the Google Group email list to receive updates about the challenge: click here to join or send an email to

Terms and Conditions

Habitat-Sim is released under the MIT license. To use the HSSD dataset, please refer here. If you use Habitat-Sim or the MultiON dataset in a paper, please consider citing the following publications:
    title={Habitat: {A} {P}latform for {E}mbodied {AI} {R}esearch},
    author={Manolis Savva and Abhishek Kadian and Oleksandr Maksymets and Yili Zhao and Erik Wijmans and Bhavana Jain and Julian Straub and Jia Liu and Vladlen Koltun and Jitendra Malik and Devi Parikh and Dhruv Batra},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},

    title={Multi-ON: Benchmarking Semantic Map Memory using Multi-Object Navigation},
    author={Saim Wani and Shivansh Patel and Unnat Jain and Angel X. Chang and Manolis Savva},
    booktitle={Neural Information Processing Systems (NeurIPS)},

If you use the proposed baseline:

    title={MOPA: Modular Object Navigation With PointGoal Agents},
    author={Raychaudhuri, Sonia and Campari, Tommaso and Jain, Unnat and Savva, Manolis and Chang, Angel X},
    booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},

If you use the HSSD dataset:

    title={Habitat synthetic scenes dataset (hssd-200): An analysis of 3d scene scale and realism tradeoffs for objectgoal navigation},
    author={Khanna, Mukul and Mao, Yongsen and Jiang, Hanxiao and Haresh, Sanjay and Schacklett, Brennan and Batra, Dhruv and Clegg, Alexander and Undersander, Eric and Chang, Angel X and Savva, Manolis},
    journal={arXiv preprint arXiv:2306.11290},