I have a question about the baseline implementation. In the baseline, the modelling is done in the same way as audio-visual video classification (both the audio and the video frames are used as input). However, when the model is evaluated on the test set, the same prediction score is repeated for every frame across the whole segment. Is this a bug, or is it intentional? I ask because the test set specifically requires a per-frame prediction score.
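To make the difference concrete, here is a rough sketch of what I mean. The function names and the toy "models" are my own placeholders, not code from the baseline: the first function mirrors the behaviour I am seeing (one segment-level score copied to every frame), and the second is what I would expect a per-frame evaluation to produce.

```python
from typing import Callable, List, Sequence

def segment_level_scores(frames: Sequence[float],
                         segment_model: Callable[[Sequence[float]], float]) -> List[float]:
    # Placeholder for the observed behaviour: a single score is computed
    # for the whole segment and then repeated for every frame.
    score = segment_model(frames)
    return [score] * len(frames)

def frame_level_scores(frames: Sequence[float],
                       frame_model: Callable[[float], float]) -> List[float]:
    # Placeholder for what the test set seems to ask for:
    # an independent score for each frame.
    return [frame_model(f) for f in frames]

if __name__ == "__main__":
    frames = [0.1, 0.9, 0.4]                        # toy per-frame features
    mean_model = lambda fs: sum(fs) / len(fs)       # toy segment-level model
    identity_model = lambda f: f                    # toy frame-level model
    print(segment_level_scores(frames, mean_model))     # identical values
    print(frame_level_scores(frames, identity_model))   # varies per frame
```

With segment-level scoring, every frame in a segment receives an identical value, so the output cannot localise anything within the segment.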
Thanks