PACO challenge question

jonathanli · April 20, 2023, 12:38pm

Hi, I’m participating in the PACO challenge and I have 3 questions.

1. Is the scope of each query all 9892 test images, or only the positive and negative annotations associated with that query?
2. Were the baseline models’ results in MODEL_ZOO.md calculated on test-std or test-dev? Could you provide the baseline results on test-std?
3. The Submission Guidelines say ‘Results should be submitted as a list of prediction dictionaries for all images in the dataset’, but for the reasons above the result can’t be calculated accurately. Can I directly submit the det_scores matrix?

vpetrovi · April 20, 2023, 6:25pm

Hi there, thanks for your interest in our challenge! Below are answers to your questions:

1. The scope of results is all of the 9892 test images. Please run your model on all 9892 images and provide results in the format detailed in the submission instructions: https://eval.ai/web/challenges/challenge-page/1970/submission.
2. MODEL_ZOO.md shows results on the entire test dataset (std + dev). We have provided results on the std and dev splits in the camera-ready version of the paper, but I see that it is not up on arXiv yet; it will take a few days for the new version to show up. I can send you the camera-ready version privately to unblock you in the meantime.
3. We have released the test_dev split, so you can compare the performance of your own models on it (again using predictions on all images; the eval code in paco/paco_query_evaluation.py in the facebookresearch/paco GitHub repository supports evaluation on the subset that has ground truths). You can also compare the results of the baseline models we released in the model zoo with your own, which should give you an idea of whether your model is better or worse than the baseline. Once you are satisfied, you can upload the results in the format specified in the submission instructions; the eval server will evaluate them on test_std and post the results on the leaderboard if you decide to do so.

Hope this helps, good luck with the challenge!

jonathanli · April 21, 2023, 12:20pm

Thank you for your answer. According to the submission instructions https://eval.ai/web/challenges/challenge-page/1970/submission, I should submit a dict for each image.
However, the evaluation methods in your code don’t use per-image dicts. In fact, the evaluation code here https://github.com/facebookresearch/paco/blob/main/paco/evaluation/paco_query_evaluation.py first calculates the det_scores matrix using the compute_similarity_matrix() function, in which det_scores[i][j] is the score of the ith box matched with the jth query_id. I first tried selecting, for each box, the query_id with the max score as that box’s final result, but after I submitted the result it turned out that the L2 and L3 queries scored 0, which means they were not predicted at all. In fact, the high scores in the det_scores matrix all belong to L1 queries; the scores for L2 and L3 queries are very low.
Then I checked your evaluation code and found that you don’t choose the max score as I did. Instead, you first select the query_ids that match the image, and then choose the boxes with the max score for those specific query_ids.
However, you haven’t provided the query_ids that match each image in test_std, so I can’t submit results in the right format.
Could you answer this question, or can I just submit the det_scores matrix? Thank you!
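
The difference between the two selection strategies described in this post can be illustrated with a toy example. This is only a sketch with made-up numbers, not the PACO evaluation code: the 3 x 4 matrix and the assumption that columns 0-1 play the role of L1 queries and columns 2-3 the role of L2/L3 queries are hypothetical.

```python
import numpy as np

# Hypothetical toy score matrix: 3 boxes (rows) x 4 queries (columns).
# Assumption for illustration: columns 0-1 act as L1 queries and
# columns 2-3 as L2/L3 queries, which receive lower scores overall.
det_scores = np.array([
    [0.90, 0.80, 0.20, 0.05],
    [0.70, 0.60, 0.10, 0.30],
    [0.85, 0.40, 0.15, 0.02],
])

# Strategy A: for each box, keep only its highest-scoring query.
per_box_query = det_scores.argmax(axis=1)
queries_covered = set(per_box_query.tolist())

# Strategy B: for each query, find the box that scores highest for it.
per_query_box = det_scores.argmax(axis=0)

print("queries covered by per-box argmax:", queries_covered)
print("best box for each query:", per_query_box.tolist())
```

With strategy A, every box selects query 0, so queries 1 to 3 receive no prediction at all, which matches the zero scores reported for L2 and L3 queries. With strategy B, every query is ranked against the boxes regardless of how low its absolute scores are.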

vpetrovi · April 21, 2023, 8:17pm

Hi there, not sure why you are looking at the eval code to determine the format of the submissions, but the complete eval process, starting from the input prediction dump (which contains a dict for each image), is given in paco/query_predictions_eval.py in the facebookresearch/paco GitHub repository. This code takes the prediction dump in the required format and converts it to the internal representation (det_bboxes and det_scores). The det_scores matrix is used only internally and cannot be submitted directly.

So your model should take an image, detect bounding boxes for objects in the image, and then for each box predict scores for the 4345 queries that exist in this dataset. This should be done for all images. Now, you could provide a Kx4345 matrix of scores (for all boxes and all queries) for each image, but the prediction dumps would be too big (10-100GB) and you would have issues uploading the submission. Detectors usually keep only the top N boxes based on predicted scores. This effectively sparsifies the prediction score matrix and reduces the prediction dump size, so we opted for this format (arrays of bounding boxes, predicted query classes, and predicted scores, all of the same length).
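
The sparsification described above could be sketched roughly as follows. This is a minimal illustration, not the challenge's dumping code: the function name `sparsify_predictions` and the dictionary keys `boxes`, `query_ids`, and `scores` are placeholders, and the actual field names must be taken from the submission instructions.

```python
import numpy as np

def sparsify_predictions(boxes, scores, top_n):
    """Keep the top_n highest-scoring (box, query) pairs of a dense K x Q matrix.

    boxes:  (K, 4) array of box coordinates.
    scores: (K, Q) array with scores[i, j] for box i and query j.
    Returns three equal-length lists. The key names are hypothetical.
    """
    flat = scores.ravel()
    # Indices of the top_n largest scores, in descending order.
    top_idx = np.argsort(-flat, kind="stable")[:top_n]
    # Map flat indices back to (box index, query index) pairs.
    box_idx, query_idx = np.unravel_index(top_idx, scores.shape)
    return {
        "boxes": boxes[box_idx].tolist(),
        "query_ids": query_idx.tolist(),
        "scores": flat[top_idx].tolist(),
    }

# Toy usage: 2 boxes, 3 queries, keep the 3 best (box, query) pairs.
toy_boxes = np.array([[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0]])
toy_scores = np.array([[0.1, 0.9, 0.3], [0.8, 0.2, 0.7]])
toy_pred = sparsify_predictions(toy_boxes, toy_scores, top_n=3)
print(toy_pred)
```

For a sense of scale, assuming 100 boxes per image and 32-bit scores, the dense matrices would take 9892 x 100 x 4345 x 4 bytes, or roughly 17 GB, which falls within the range quoted above.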

Hope this helps clarify submission format.

jonathanli · April 23, 2023, 6:13am

Thank you! But how can I generate the prediction dump in the required format from the code?

vpetrovi · April 23, 2023, 8:25pm

We haven’t released the dumping code for the baselines (and there is no plan to do so). We assume that participants developing their own methods have their own code and will write their own code for dumping predictions in the submission format.

If you need baseline numbers on test-dev for comparison with your method, the baseline code can be run on the test-dev dataset and evaluation numbers can be obtained directly by using the PACOQueryEvaluator.

jonathanli · April 24, 2023, 5:38am

Thank you! The problem has been solved!
