Generalist Language Grounding Agents Challenge

Embodied AI Workshop @ CVPR 2023

Recent embodied agents have been successful in learning navigation and interaction skills from large-scale datasets, but progress has been limited to single domains such as instruction following or dialogue-driven tasks. To avoid over-specialization of models to specific datasets and tasks, this challenge encourages the development of generalist language grounding agents whose architectures transfer language-understanding and decision-making capabilities across tasks. For this first iteration, we unify aspects of the ALFRED and TEACh datasets. While both datasets are set in the AI2-THOR simulator, they differ along several axes:

  • Declarative (ALFRED) vs dialogue (TEACh) language introduces grounding and alignment challenges
  • Different agent heights affect depth-estimation and segmentation pipelines
  • Differences in action spaces and room layouts require world-model generalization

Participants will submit independently to each leaderboard, and submissions will be ranked by a combined unseen success metric.
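The exact form of the combined metric is not specified here; as a minimal sketch, assuming the combined score is an unweighted mean of the two unseen success rates, ranking could look like:

```python
# Hypothetical sketch only: the challenge does not define the exact combination
# formula, so this assumes an unweighted mean of the two unseen success rates
# (each expressed as a fraction in [0, 1]).
def combined_unseen_success(alfred_unseen_sr: float, teach_unseen_sr: float) -> float:
    """Combine the unseen success rates from the ALFRED and TEACh leaderboards."""
    return (alfred_unseen_sr + teach_unseen_sr) / 2.0

# Rank hypothetical submissions by the combined score, highest first.
submissions = {
    "agent_a": combined_unseen_success(0.35, 0.08),
    "agent_b": combined_unseen_success(0.28, 0.15),
}
leaderboard = sorted(submissions, key=submissions.get, reverse=True)
print(leaderboard)
```

A weighted mean (or min over the two tasks) would be an equally plausible choice; the point is only that each submission needs results on both unseen splits.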

The ALFRED Leaderboard is accepting submissions to the Embodied AI Challenge with [EAI23] tags! Humans have a success rate of 91% on unseen environments, but the best models are still far behind 😢

Can you do even better? Code, precomputed features, and the AI2-THOR simulator are all available on GitHub. For a quick start, check out FILM, a SoTA agent.

The TEACh Leaderboard is coming soon! It is not yet accepting submissions, but you can run evaluations on the validation set. The best-performing agents have an unseen success rate of <1% 😖

Can you do even better? Code and baselines are all available on GitHub.