I sent this email to the organisers and thought that sharing it here would be a good way to get additional feedback.
Can models really have access to all frames at query time?
Dear Ego4D team,
As a PhD student in ML working on episodic memory, I am very interested in your challenges. However, according to my current understanding of the rules, it seems that someone like me does not stand a chance, because the tasks might not require episodic memory.
In neuroscience and cognitive science, episodic memory refers to what-where-when information. This aspect is well captured by your challenge. However, episodic memory also implies retention over a timespan that is too long for activity-based memory, and I do not see how your challenge captures this. In fact, after reading the description of the natural language queries challenge, it seems that candidate models have direct access to the whole video at query time, which undermines the need for memory altogether, not to mention the need for weight-based long-term memory.
Could you please provide some clarification?
Please note that I do not intend to criticise for the sake of criticising. I am very happy to see interest in episodic memory in ML, but as a cognitive scientist, I would like to make sure that an accurate definition of the concept is used. On a personal note, I would like to stand a chance with my model of episodic memory.