ALFRED Challenge

The leaderboard is now live for submissions to the ALFRED challenge! Humans have a success rate of 91% on unseen environments, but our baseline model has a 0.4% success rate. 😢

Challenge Winner: After 17 submissions, the success rate rose roughly 10x to 4.5%, achieved by the challenge winners, Van-Quang Nguyen and Takayuki Okatani of Tohoku University!

Can you do even better? Code, precomputed features, and the AI2-THOR simulator are all available on GitHub for a quick start.

Leaderboard

Embodied Vision, Actions & Language Workshop @ ECCV

Speakers and Panelists

Kristen Grauman (Talk)	Nick Roy (Talk)	Chelsea Finn (Talk)
Jean Oh (Talk)	Jason Baldridge (Talk)	Abhinav Gupta (Talk)

Live Sessions!

There will be two live sessions, including panels, poster presentations, and the award for the best challenge submissions:

Aug 22
7 - 7:45pm	Opening Remarks & Paper Talks (Recording)
7:45 - 9pm	Poster Q&A
Aug 23
11am - 12pm	Invited Speaker Panel (Recording)
12 - 1pm	Social Hour

All times EDT (UTC-4)

Poster Session

#1. A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks (Talk)
Unnat Jain, Luca Weihs, Eric Kolve, Ali Farhadi, Svetlana Lazebnik, Aniruddha Kembhavi, Alex Schwing

#2. On the Evaluation of Vision-and-Language Navigation Instructions (Talk)
Ming Zhao, Peter Anderson, Alexander Ku, Vihan Jain, Jason Baldridge, Eugene Ie

#3. Sim-to-Real Transfer for Vision-and-Language Navigation (Talk)
Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

#4. Modular Pretraining for Vision Language Navigation (Talk)
Felix Labelle, Xiaopeng Lu, Nariaki Kitamura, Jean Oh

#5. Semantic Visual Navigation by Watching YouTube Videos (Talk)
Matthew Chang, Arjun Gupta, Saurabh Gupta

#6. ALFRED Speaks: Automatic Instruction Generation for Egocentric Skill Learning (Talk)
Legg Yeung, Yonatan Bisk, Oleksandr Polozov

#7. ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes (Talk)
Panos Achlioptas, Ahmed Abdelreheem, Fei Xia, Mohamed Elhoseiny, Leonidas Guibas

#8. The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation (Talk)
Shurjo Banerjee, Jesse Thomason, Jason J. Corso

#9. Improving Mask Prediction for Long Horizon Instruction Following (Talk)
Kunal Pratap Singh, Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi

#10. A Hierarchical Attention Model for Action Learning from Realistic Environments and Directives (Talk)
Van-Quang Nguyen, Takayuki Okatani

Workshop Details

The focus of this workshop is on embodied visual tasks that require the grounding of language to actions in real-world settings. Specifically, we want to draw focus to challenges like partial observability, continuous state spaces, and irrevocable actions for language-guided agents in visual environments. Such challenges are not captured by current datasets for grounding and embodiment [1, 2, 3].

Key Topics

Egocentric and Robotic vision
Language Grounding
Navigation and Motion Planning
Interactive/Causal Reasoning
Learning from Demonstration
Task and Symbolic Planning
Deep Reinforcement Learning
Commonsense Reasoning

To encourage research in embodied vision & language, the workshop includes a benchmark challenge based on ALFRED. This benchmark captures real-world complexities like object state changes, and requires long-horizon planning. This workshop exists to bring together Vision, Robotics, and NLP researchers to tackle the unique challenges of this three-field intersection that are often avoided when focusing only on vision-and-language or vision-and-robotics (i.e., 'embodied AI').

Organizers

Yonatan Bisk
Jesse Thomason
Mohit Shridhar
Chris Paxton
Peter Anderson
Roozbeh Mottaghi
Eric Kolve

Submission Details

Contributed Papers
Standard ECCV 2020 format -- Submit papers to OpenReview

Challenge Papers
Participants are required to upload their model to our evaluation server, which automatically evaluates it on an unseen test set. Final numbers for the prize challenge will be frozen on Aug 5.

Publication Options: Archival vs. Unofficial
Papers can be submitted either for publication in the official proceedings (archival) or to be hosted on this website (unofficial). Both submission types can be presented at the workshop, but opting out of the proceedings allows you to submit your work for publication at another venue. Unofficial submissions are not required to be in the ECCV format. Please add either Archival or Unofficial as a keyword when submitting to indicate the correct submission track. No submissions will be made public during the review process. If there is any doubt, we will contact the authors to confirm the desired submission track.

Important Dates

Contributed Papers
~~Submission~~	~~July 31~~
Challenge Papers
~~Prize Leaderboard closes~~	~~Aug 5~~
~~Abstract deadline~~	~~Aug 5~~
~~System descriptions~~	~~Aug 10 (5pm PDT)~~

~~Notification~~	~~Aug 17~~
Camera Ready	Sept 10

Abstract Deadline: The ECCV conference organizers have asked that a version of every paper (official and unofficial) be in the conference system by Aug 5. We interpret this requirement as applying to the abstract only: you will be allowed to change the full paper via OpenReview through the deadline, and all papers can be heavily revised for the camera-ready version. We are sorry for the new restrictions that come with the virtual format. We will continue to update this space (including how to upload abstracts) as the conference organizers give us more information. Please don't hesitate to reach out with questions!

NEW! System Descriptions: Submit your descriptions to OpenReview by Aug 10 (5pm Pacific Daylight Time).

Submission FAQ

Do unofficial submissions need to be in ECCV format?
No

Can unofficial submissions already be published?
Yes

Page lengths for official submissions?
4 - 14 pages (ECCV rules require no major overlap with existing published work)

Page lengths for unofficial submissions?
Minimum 4 pages

Is reviewing blind?
Official submissions are double-blind; unofficial submissions are single-blind

Supplemental Material
Can be appended to the end of the submission PDF

Competition Metric
Ranking and awards are based on Unseen Success Rate
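
As a rough illustration of how a success-rate metric of this kind can be computed, here is a minimal sketch. It assumes each evaluation episode in an unseen environment yields a single binary success flag. The function name and the sample outcomes are hypothetical and are not taken from the evaluation server.

```python
from typing import Sequence


def success_rate(episode_outcomes: Sequence[bool]) -> float:
    """Return the percentage of episodes flagged as successful.

    Assumption: one boolean per episode, True if the task was completed.
    """
    if not episode_outcomes:
        raise ValueError("need at least one episode outcome")
    return 100.0 * sum(episode_outcomes) / len(episode_outcomes)


# Hypothetical data: 45 successes out of 1000 unseen-environment episodes.
outcomes = [True] * 45 + [False] * 955
rate = success_rate(outcomes)
print(f"Unseen success rate: {rate:.1f}%")
```

With this made-up sample the rate is 4.5%, the same order as the winning entry reported above; the actual leaderboard numbers come only from the evaluation server.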