The DER score is fine, but I don’t think the MOTA and IDF1 scores are being computed correctly by EvalAI. I used the same detection and tracking algorithms provided in Ego4D’s audio-visual repo and reproduced the exact same scores on the validation set (74.52, 84.92). I should therefore get (71.94, 80.07), but EvalAI gave me (-73.38, 0.054). Could you confirm that the submission format for ‘person’ is [frame_num, person_id, x1, y1, x2, y2], not [frame_num, person_id, x1, y1, width, height]?
The format should be [frame_num, person_id, x1, y1, width, height]. Sorry for the inaccurate description; it has been updated. Please try again and let us know whether you get the correct numbers.
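For anyone whose tracker outputs corner coordinates, here is a minimal sketch of the conversion to the width/height layout. The function name and the sample rows are made up for illustration and are not part of the Ego4D or EvalAI code; it also assumes each row is a flat list in the order given above.

```python
def xyxy_to_xywh(row):
    """Convert [frame_num, person_id, x1, y1, x2, y2]
    to [frame_num, person_id, x1, y1, width, height]."""
    frame_num, person_id, x1, y1, x2, y2 = row
    # Width and height are the differences between the two corners.
    return [frame_num, person_id, x1, y1, x2 - x1, y2 - y1]


# Hypothetical sample rows in corner format.
rows_xyxy = [
    [0, 1, 10, 20, 50, 100],
    [1, 1, 12, 22, 52, 104],
]

rows_xywh = [xyxy_to_xywh(r) for r in rows_xyxy]
print(rows_xywh)
```

Submitting corner coordinates where width and height are expected makes every box too large, which would be consistent with the negative MOTA reported above.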
Now it works fine. Thanks!