TTM Baseline Implementation Query

I have a question about the baseline implementation. The baseline models the task much like audio-visual video classification, with both audio and video frames used as input. When the model is run on the test set, the same prediction score is repeated for every frame of a segment. Is this a bug, or is it intentional? I ask because the test set specifically requires a per-frame prediction score.


Hi, it is intentional. Our baseline uses a very simple and straightforward framework that makes segment-level predictions during training. For evaluation, we assign each segment-wise prediction to every frame in that segment, because the evaluation metric computes mAP from frame-level results.
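
To illustrate the idea, here is a minimal sketch of copying one segment-level score to every frame in that segment. It is not the baseline's actual code: the function name `expand_segment_scores` and the `(start_frame, end_frame, score)` tuple layout (with an inclusive end frame) are assumptions made only for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical segment format: (start_frame, end_frame, score), end inclusive.
Segment = Tuple[int, int, float]


def expand_segment_scores(segments: List[Segment]) -> Dict[int, float]:
    """Assign each segment's single score to all frames it covers."""
    frame_scores: Dict[int, float] = {}
    for start, end, score in segments:
        for frame in range(start, end + 1):
            # Every frame in the segment gets the same segment-level score.
            frame_scores[frame] = score
    return frame_scores


# Two made-up segments, used only to show the shape of the output.
segments = [(0, 2, 0.8), (3, 4, 0.1)]
frame_scores = expand_segment_scores(segments)
print(frame_scores)
```

The resulting per-frame scores can then be passed to a frame-level metric, which is why all frames within a segment show an identical score.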