EgoTracks dataset download failure

Hello.
I received my AWS license from Ego4D yesterday, and I'm trying to download the "EgoTracks" dataset.

I can successfully download the viz and annotations, but the EgoTracks videos are failing.

ego4d --output_directory="~/scratch/data/tracking/ego4d" --datasets egotracks
output:
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

Also, I wanted to ask how large (in GB) the EgoTracks dataset is. I can only find that it consists of 5.9K videos, but I couldn't find the actual size stated anywhere.

I have followed the EgoTracks download instructions and the Ego4D CLI instructions.

Hi

I am running into the same problem. If I do not set the region in the config file, it throws ValueError: Invalid endpoint, and if I use us-east-1 or us-east-2, it throws the error stated above. Did you find any solution?
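
For reference, the region setting here is the standard AWS CLI config (~/.aws/config); assuming the default profile, the entry looks like:

[default]
region = us-east-1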

Thanks,

Hello, I have not been able to resolve this. Also, I think --datasets annotations_540ss doesn't work either… Please let me know where I can find the downscaled 540ss annotations!

Please try with the --version v2 argument as well. Can you confirm that works for EgoTracks?

(Looking at 540ss, will come back shortly.)

The --version v2 argument works, but it says the dataset is only 2.4 GB, which I highly doubt is the size of the entire EgoTracks dataset.

ego4d --output_directory ./ --datasets egotracks --version v2
Expected size of downloaded files is 2.4 GB. Do you want to start the download?

Could you confirm for me that 2.4 GB is the correct size of the entire EgoTracks dataset?

I think --datasets egotracks only gives the annotation JSON files.
So do I have to download the video data with the following command?
ego4d --output_directory ./ --datasets full_scale --benchmark EM --version v2

Hello. I have downloaded version 2 of the full-scale videos for the EM benchmark, but while running the preprocessing step, I get this error.

python tools/preprocess/extract_ego4d_clip_frames.py

line 77, in extract_clip_ids
clip_uids.append(c["exported_clip_uid"])
KeyError: 'exported_clip_uid'

Could you please provide a full guide for the EgoTracks dataset download & preprocessing so I can follow it?
Thank you.

Hi, which annotation_path are you using (train, val or test)?

  • This should not happen with the challenge test set, but if it does, please let us know!
  • For train and val, we are working on pushing an updated preprocessing script that should solve the problem. The workaround is to simply ignore these clips (it should be less than 1% of them); we don't have the exported_clip_uid for these frames because of a conversion error (see the sketch below).
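
For example, a minimal way to skip them (a sketch only, assuming the annotations are the list of clip dicts that extract_clip_ids already iterates over) would be:

def extract_clip_ids(annotations):
    # Collect exported clip UIDs, skipping the few entries that lack the field.
    clip_uids = []
    skipped = 0
    for c in annotations:
        uid = c.get("exported_clip_uid")
        if uid is None:
            skipped += 1  # < 1% of clips, dropped due to the conversion error
            continue
        clip_uids.append(uid)
    print(f"Skipping {skipped} videos... Total {len(clip_uids)} to be processed")
    return clip_uids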

Thank you for the reply.
Can you please confirm for me that the download command I used is correct?

ego4d --output_directory ./ --datasets egotracks full_scale --benchmark EM --version v2

The total size was about 2.7 TB.

Hi, I am working on confirming the download script, but you don't need full_scale; only the clips for EM are needed. So it should be the following:

ego4d --output_directory ./ --datasets egotracks clips --benchmark EM --version v2

Thank you so much for the fast reply! I think I will try to re-download only the clips dataset in the meantime. Please let me know when the preprocessing script is cleaned up :). Thanks @haotang!

@haotang
Just as an FYI: in "Skipping 113 videos... Total 3433 to be processed", the 113 is the number of videos that don't have the "exported_clip_uid" field in the annotations.

Hi @aram, thanks for sharing the numbers! This looks correct to me. I ignored a few more videos because of issues with the frame conversion for certain bounding boxes. I created a fix in Egotracks fix by tanghaotommy · Pull Request #42 · EGO4D/episodic-memory · GitHub and am waiting for review, but you may take a look and use that.

Thank you for making this adjustment!!

I have tried running the preprocessing code, but it hangs. Also, I can no longer cd or ls into the drive storing the data. Currently I am using a 4 TB SSD to store all the data. Could you please tell me how much total disk space is required to run the preprocessing script?

I believe the clips themselves are less than 1 TB. The preprocessed data takes about 800 GB (only annotated frames).

Thanks. I have a 4 TB SSD, and I have a problem where the extraction code hangs in the middle, and then I cannot ls into my SSD (probably the extraction process/thread is not exiting?).

So I have cancelled and restarted multiple times, but now the extracted-frames folder is ~3.8 TB (the disk is basically full).
Do you suspect anything is going wrong?

Major problems:

  1. The frame extraction process hangs (probably due to 2, but not sure).
  2. The disk space requirement is > 8 TB.

I did a rough calculation:
Each video is ~8 min at 30 fps = 8 * 60 * 30 = 14400 frames (I checked, and an extracted folder actually has 14400 frames).
Each frame is ~200 KB.
Each video's image folder (extracted frames) = 14400 * 200 KB = 2.88 GB.
The train set includes 3000 videos, which leads to ~8.6 TB of disk space for extracted frames.
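
The same estimate in code form:

frames_per_video = 8 * 60 * 30                 # ~8 min at 30 fps = 14400 frames
gb_per_video = frames_per_video * 200e3 / 1e9  # ~200 KB per frame -> ~2.88 GB per video
tb_total = 3000 * gb_per_video / 1000          # ~3000 train videos -> ~8.6 TB
print(f"{gb_per_video:.2f} GB per video, {tb_total:.1f} TB total")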

Would really appreciate your reply!

I noticed that you mentioned we should be extracting "only annotated frames", but I guess the current preprocessing code extracts all frames?

Please correct me if I am wrong.

Yes, true. We annotate at 5 FPS, so it should be fine if you only extract those frames. If you extract at 30 FPS, the disk space is not enough. Please take a look at the pull request here: Egotracks fix by tanghaotommy · Pull Request #42 · EGO4D/episodic-memory · GitHub. EgoTracks/tools/preprocess/extract_ego4d_clip_annotated_frames.py only extracts annotated frames.
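
The idea is roughly the following (a minimal sketch only, not the actual script; the frame-index convention and argument names here are assumptions, so please defer to extract_ego4d_clip_annotated_frames.py in the PR):

import os
import av  # PyAV, the same decoder the preprocessing tools use

def save_annotated_frames(clip_path, annotated_frame_idxs, out_dir):
    # Decode the clip sequentially and keep only the frames whose index
    # appears in annotated_frame_idxs (0-based indices; an assumption here).
    wanted = set(annotated_frame_idxs)
    if not wanted:
        return
    os.makedirs(out_dir, exist_ok=True)
    last = max(wanted)
    with av.open(clip_path) as container:
        stream = container.streams.video[0]
        for idx, frame in enumerate(container.decode(stream)):
            if idx in wanted:
                frame.to_image().save(os.path.join(out_dir, f"{idx:07d}.jpg"))
            if idx >= last:  # stop once the last annotated frame is saved
                break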


Start processing db211359-c259-4515-9d6c-be521711b6d0!
Start processing 87b52dc5-3ac3-47e7-9648-1b719049732f!
Start processing b7fc5f98-e5d5-405d-8561-68cbefa75106!
Start processing 59daca91-5433-48a4-92fc-422b406b551f!

I have problems preprocessing the videos above (the process never ends and then gives this error):

File "av/enum.pyx", line 60, in av.enum.EnumType.__getitem__
KeyError: 'ERRORTYPE_2'

Could you please help me check what is going wrong with these?
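
In case it helps narrow this down, a minimal repro outside the full preprocessing script (just PyAV decoding every frame of a clip, a rough sketch) would be:

import av

def clip_decodes_ok(clip_path):
    # Try to decode the whole clip; report any exception (e.g. the
    # KeyError: 'ERRORTYPE_2' above) instead of hanging the batch job.
    try:
        with av.open(clip_path) as container:
            for _ in container.decode(container.streams.video[0]):
                pass
        return True
    except Exception as exc:
        print(f"{clip_path}: {type(exc).__name__}: {exc}")
        return False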