Hi, I'm participating in the PACO challenge and I have 3 questions.
- I would like to ask whether the scope of each query is all 9892 test images, or only the positive and negative annotations associated with the query.
- Were the baseline models' results in MODEL_ZOO.md calculated on test-std or test-dev? Could you provide the baseline models' results on test-std?
- The Submission Guidelines say 'Results should be submitted as a list of prediction dictionaries for all images in the dataset', but for the reasons above I can't compute the result accurately in that form. Can I directly submit the det_scores matrix?
Hi there, thanks for your interest in our challenge! Below are answers to your questions:
- The scope of results is all of the 9892 test images. Please run your model on all 9892 images and provide results in the format detailed in the submission instructions: https://eval.ai/web/challenges/challenge-page/1970/submission.
- MODEL_ZOO.md shows results on the entire test dataset (std + dev). We have provided results on the std and dev splits in the camera-ready version of the paper, but I see that it is not up on arXiv yet; it will take a few days for the new version to show up. I can send you the camera-ready version privately to unblock you in the meantime.
- We have released the test_dev split, so you can compare the performance of your own models on it (again using predictions on all images; the eval code in paco/paco_query_evaluation.py at main · facebookresearch/paco · GitHub supports evaluation on a subset that has the ground truths). You can also compare the results of the baseline models we released in the model zoo to your results; that should give you an idea of whether your model is better or worse than the baseline. Once you are satisfied, you can upload the results in the format specified in the submission instructions, and the eval server will evaluate on test_std and post the results on the leaderboard if you decide to do so.
Hope this helps, good luck with the challenge!
Thank you for your answer. According to the submission instructions https://eval.ai/web/challenges/challenge-page/1970/submission, I should submit a dict for each image.
However, the evaluation methods in your code don't use the images' dicts. In fact, the evaluation code here https://github.com/facebookresearch/paco/blob/main/paco/evaluation/paco_query_evaluation.py first calculates the det_scores matrix using the compute_similarity_matrix() function, in which det_scores[i][j] is the score of the i-th box matched with the j-th query_id. I first considered selecting the query_id with the max score as the final result dict for each box, but after I submitted the result, it turned out that the L2 and L3 queries had a score of 0, which means they are not predicted at all. In fact, the high scores in the det_scores matrix are all on L1 queries, while the scores of the L2 and L3 queries are very low.
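Here is roughly what I did (the variable names are just my own, for illustration):

```python
import numpy as np

def argmax_predictions(det_bboxes, det_scores, query_ids):
    """My first attempt: keep only the single highest-scoring query per box.

    det_bboxes: (K, 4) predicted boxes for one image.
    det_scores: (K, 4345) scores from compute_similarity_matrix().
    query_ids:  the 4345 query ids, in the same column order as det_scores.
    """
    predictions = []
    for i, box in enumerate(det_bboxes):
        j = int(np.argmax(det_scores[i]))  # best query for this box
        predictions.append({
            "bbox": list(box),
            "query_id": int(query_ids[j]),
            "score": float(det_scores[i, j]),
        })
    return predictions
```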
Then I checked your evaluation code and found that you don't choose the max score as I did. Instead, you first select the query_ids that match the image, and then choose the boxes with the max score for those specific query_ids.
However, you haven't provided the query_ids that match each image in test_std, so I can't produce results in the right format.
Could you answer this question, or can I just submit the det_scores matrix? Thank you!
Hi there, not sure why you are looking at the eval code to determine the format of the submissions, but the complete eval process from the input prediction dump (which contains a dict for each image) is given in paco/query_predictions_eval.py at main · facebookresearch/paco · GitHub (this code takes the prediction dump in the required format and converts it to the internal representation (det_bboxes and det_scores)). The det_scores matrix is used only internally and cannot be submitted directly.
So your model should take an image, detect bounding boxes for objects in the image, and then for each box predict scores for the 4345 queries that exist in this dataset. This should be done for all images. Now, you could provide a Kx4345 matrix of scores (for all boxes and all queries) for each image, but the prediction dumps would be too big (10-100 GB) and you would have issues with the submission upload. The way detectors usually work is that they keep only the top N boxes based on predicted scores. This basically sparsifies the prediction score matrix and reduces the prediction dump size, so we opted for this format (arrays of bounding boxes, predicted query classes, and predicted scores, all of the same length).
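To make this concrete, here is a rough sketch of how a Kx4345 score matrix for a single image could be sparsified into same-length arrays. The function and dict keys below are only illustrative, not the official format, so please follow the submission instructions for the exact field names:

```python
import numpy as np

def sparsify_image_predictions(image_id, det_bboxes, det_scores, query_ids, top_n=300):
    """Keep only the top_n (box, query) pairs by score for a single image.

    det_bboxes: (K, 4) array of predicted boxes.
    det_scores: (K, 4345) array of per-query scores for each box.
    query_ids:  (4345,) array of query ids, in the same column order as det_scores.
    top_n:      arbitrary cap, chosen here just for illustration.
    """
    # Flatten the (box, query) pairs and keep the globally highest-scoring ones.
    flat = det_scores.ravel()
    keep = np.argsort(flat)[::-1][:top_n]
    box_idx, query_idx = np.unravel_index(keep, det_scores.shape)

    # Three arrays of the same length, as described above. The key names are
    # illustrative only; use the exact keys from the submission instructions.
    return {
        "image_id": image_id,
        "bboxes": det_bboxes[box_idx].tolist(),
        "query_ids": query_ids[query_idx].tolist(),
        "scores": flat[keep].tolist(),
    }
```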
Hope this helps clarify the submission format.
Thank you! But how can I get the prediction dump in the required format from the code?
We haven't released the dumping code for the baselines (and there is no plan to do so); we assume that participants developing their own methods have their own code and will write their own code for dumping their predictions in the submission format.
If you need baseline numbers on test-dev for comparison with your method, the baseline code can be run on the test-dev dataset and the evaluation numbers can be obtained directly using the PACOQueryEvaluator.
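As a very rough illustration only (this is not our baselines' dumping code; the model.predict call and key names are placeholders, and sparsify_image_predictions refers to the sketch above), such dumping code could look something like this:

```python
import json

def dump_predictions(model, images, out_path="predictions.json"):
    # Collect one prediction dict per image, for all images in the test set.
    all_predictions = []
    for image_id, image in images:
        # Your own inference code goes here; the three outputs below are assumed.
        det_bboxes, det_scores, query_ids = model.predict(image)
        all_predictions.append(
            sparsify_image_predictions(image_id, det_bboxes, det_scores, query_ids)
        )
    # Write the list of per-image dicts as a single JSON file for upload.
    with open(out_path, "w") as f:
        json.dump(all_predictions, f)
```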
Thank you! The problem has been solved!