ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

Mohit Shridhar Jesse Thomason Daniel Gordon Yonatan Bisk
Winson Han Roozbeh Mottaghi Luke Zettlemoyer Dieter Fox

ALFRED (Action Learning From Realistic Environments and Directives) is a new benchmark for learning a mapping from natural language instructions and egocentric vision to sequences of actions for household tasks. We include long, compositional rollouts with non-reversible state changes, among other phenomena, to shrink the gap between research benchmarks and real-world applications.


@inproceedings{ALFRED20,
  title     = {{ALFRED: A Benchmark for Interpreting Grounded
                Instructions for Everyday Tasks}},
  author    = {Mohit Shridhar and Jesse Thomason and
               Daniel Gordon and Yonatan Bisk and
               Winson Han and Roozbeh Mottaghi and
               Luke Zettlemoyer and Dieter Fox},
  booktitle = {The IEEE Conference on Computer Vision
               and Pattern Recognition (CVPR)},
  year      = {2020},
  url       = {}
}