Downloading only narration.json from the annotations folder

I have a list of video ids that I want to download. I am particularly interested in the summaries section (narration.json/video_uid/narration_pass_1/summaries), so I only want to download the narration.json file. I tried running the command below (in Google Colab):

ego4d --output_directory="/content/ego_4d_2" --datasets video_540ss annotations --video_uid_file '/content/sample_data/ego4d_ids' --aws_profile_name ego4d --version v2 --no-metadata

I noticed that this downloads all of the annotation files (the annotations folder is around 6 GB). Is there any workaround to download only the narration.json file for these videos?

No, it is currently not possible to download only the narration.json file with the CLI tool.

You may be able to copy it directly from S3. Please let me know if this command fails.

aws s3 cp s3://ego4d-consortium-sharing/public/v2/annotations/narration.json narration.json
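Once narration.json is downloaded, the summaries for just your video ids can be pulled out in Python. This is a minimal sketch, assuming the layout narration.json/<video_uid>/narration_pass_1/summaries described in the question and a uid file with one video uid per line; the function names and output filenames are hypothetical, not part of the Ego4D tooling.

```python
import json
from pathlib import Path


def load_video_uids(path):
    """Read video uids from a text file, one per line (assumed format)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def extract_summaries(narrations, video_uids, pass_name="narration_pass_1"):
    """Return {video_uid: summaries} for the requested uids only.

    Assumes narration.json is keyed as <video_uid> -> <pass_name> -> "summaries".
    Uids that are missing, or that lack the pass or its summaries, map to [].
    """
    result = {}
    for uid in video_uids:
        video = narrations.get(uid) or {}
        narration_pass = video.get(pass_name) or {}
        result[uid] = narration_pass.get("summaries") or []
    return result


def main(narration_path, uid_path, out_path):
    """Hypothetical driver: load the full file, keep only the listed videos, save."""
    with open(narration_path) as f:
        narrations = json.load(f)
    summaries = extract_summaries(narrations, load_video_uids(uid_path))
    with open(out_path, "w") as f:
        json.dump(summaries, f)
    return summaries
```

For example, main("narration.json", "/content/sample_data/ego4d_ids", "summaries_subset.json") writes a much smaller JSON file containing only the requested videos, which is easier to keep around on Colab than the full annotation file.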

As a couple of side notes:

  1. I recommend using all_narrations_redacted.json: this file contains more narrations and summaries. I now realize we have no documentation for it (sorry).
  2. If you're short on storage in Colab, you could also consider using the features (FP16 variant). Here is an example Colab notebook using the FP16 features: Google Colab

Download (1) via:

aws s3 cp s3://ego4d-consortium-sharing/public/v2/annotations/all_narrations_redacted.json all_narrations_redacted.json

Note: narration.json was derived from the full set of narrations in (1) to create two “tracks” of narrations.

Hi Miguel,

Thanks for responding to my query.

  1. I was able to download both narration.json and all_narrations_redacted.json successfully using the commands you mentioned.
  2. I will definitely go through the FP16 features Colab notebook.