Ego-Exo4D V2: Full Ego-Exo4D Release

We are excited to announce the full release of Ego-Exo4D. Since sharing an initial V1 of this dataset in December of last year, we have been working to round out everything Ego-Exo4D makes available, and we are very pleased to share all Ego-Exo4D resources with the community today. Please read below to learn how we have updated the data and annotations and made them more easily accessible.

To further demonstrate what Ego-Exo4D can enable, we also announce two new teaser challenges to conclude at the EgoVis Workshop at CVPR, focused on advancing our benchmarks on EgoPose (Body) and EgoPose (Hands). Look for an official announcement in the next few days!

Users who have signed the license can explore V2 immediately in the visualizer.

Full Release Description


  • Ego-Exo4D now includes 1,286.30 total video hours across 5,035 takes (221.26 egocentric hours). Compared to V1:
    • 1,341 additional takes, spanning 254.8 more total hours (44.4 egocentric hours)
    • 119 additional Narrate & Act takes (456 takes)
  • 99% of all takes now contain eye gaze (2D and 3D), trajectory data and 3D point clouds
  • Note on Dataset Quality: the task ID label of each take has been checked by external annotators and verified by university data POCs. Please see this forum post for additional context: Ego-Exo4D Dataset Changes: Quality Issues & Future Update
  • NEW: Best exocentric labels available for 90% of the takes
  • NEW: Tight take time boundaries indicating when the task occurs within a take


  • UPDATED: Expert Commentary: 11,689 annotations time-stamped to 3,055 takes (117,812 audio recordings) from 50 experts, providing detailed activity descriptions from an expert point of view
  • UPDATED: EgoPose (Body): human-generated ground truth available for 1,358 takes, containing 376K 3D body poses and 2M 2D body pose annotations. Automatic annotations covering 2,559 takes, 9.2M 3D body poses and 46.87M 2D body poses are also available.
  • UPDATED: EgoPose (Hand): human-generated ground truth available for 458 takes, containing 68K 3D hand poses and 340K 2D hand pose annotations. Automatic annotations covering 976 takes, 4.3M 3D hand poses and 21M 2D hand poses are also available.
  • UPDATED: Atomic Action Descriptions covering 4,965 takes with 432,467 text descriptions. Compared to V1:
    • 72% more descriptions (250,742 descriptions in V1) across 2,202 more takes
    • Bug fix: “unsure” and “best exo” fields corrected
  • UPDATED: Relation Annotations: 2.2M segmentation masks across 1,653 takes. Compared to V1:
    • 832,733 more segmentation masks across 396 more takes
  • UPDATED: Train/val/test sets now cover all takes in the dataset: 3072 train/842 val/1121 test
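
As a quick sanity check on the split sizes quoted above, the short Python sketch below sums the three counts and reports each split's share of the dataset. The dictionary and variable names are purely illustrative and are not part of any Ego-Exo4D tooling.

```python
# Split sizes as quoted in the release notes (number of takes per split).
splits = {"train": 3072, "val": 842, "test": 1121}

# The three splits should together account for every take in the dataset.
total_takes = sum(splits.values())

# Fraction of the dataset that falls into each split.
fractions = {name: count / total_takes for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name:>5}: {count:5d} takes ({fractions[name]:.1%})")
print(f"total: {total_takes:5d} takes")
```

The total matches the 5,035 takes reported for the full release, with roughly 61% of takes in train, 17% in val and 22% in test.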


  • NEW: Transcriptions and pre-extracted audio for all takes, including Narrate & Act takes, where camera wearers verbally describe what they are doing while they are doing it.
  • NEW: MAWS CLIP features for each frame of video in Ego-Exo4D
  • NEW: 2D eye gaze for 99% of takes (derived from the 3D eye gaze; projected into the egocentric image plane)
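
To illustrate what projecting a 3D gaze point into an image plane involves, here is a minimal Python sketch using an idealized pinhole camera. This is not the pipeline used to produce the released 2D gaze: the function name, the intrinsics (`fx`, `fy`, `cx`, `cy`) and the coordinate convention are all assumptions made for illustration, and a real egocentric camera may need a lens distortion model that a pinhole projection ignores.

```python
from typing import Optional, Tuple


def project_gaze_to_image(
    point_cam: Tuple[float, float, float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates with a pinhole model.

    Assumes (hypothetically) that the point is already expressed in the
    camera frame with x to the right, y down and z pointing forward.
    Returns None when the point lies at or behind the camera plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None  # Not in front of the camera, so no valid projection.
    u = fx * x / z + cx  # Horizontal pixel coordinate.
    v = fy * y / z + cy  # Vertical pixel coordinate.
    return (u, v)


# Hypothetical intrinsics for a 640x480 image, for illustration only.
gaze_pixel = project_gaze_to_image((0.5, -0.25, 2.0), fx=400.0, fy=400.0, cx=320.0, cy=240.0)
print(gaze_pixel)
```

With these made-up intrinsics, a gaze point half a metre to the right, a quarter metre up and two metres ahead lands to the right of and above the image centre.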

Please refer to our research documentation pages for more information about accessing Ego-Exo4D data and annotations, and see our research paper for a full description of our effort. If you encounter issues with the dataset, please use our Forum. We do not regularly staff our email inbox, so the Forum is the best way to get in touch. Thank you for being a part of Ego-Exo4D! We are very excited to see how the community uses this resource!