Our approach to long-term action anticipation involves substantial data wrangling on both the test and training data. I have outlined it below in case it conflicts with what you are interested in, or is of interest to other teams. Any comments or feedback would be appreciated.
A major challenge for AI systems operating “in the wild” is that they lack common sense and therefore underperform relative to human estimation. The Key Three Data approach to long-term action anticipation is designed to embed “common sense” anticipation in the aggregator stage in order to produce better predictions.
Methodology
The Key Three Data team's approach is based on Social Sequence Analysis and consists of the following steps:
- DATA PREPARATION: essentially generating a single ground truth with lower ambiguity and equivalent accuracy. This means standardising the descriptive language of the summaries and narrations in the annotation file provided, consolidating the two annotators' points of view, and smoothing the action descriptions in time. (This is done in parallel to the input provided, for all video that is not based on artificial behaviour, e.g. role-play or a constrained environment.)
- Verb dependencies based on the identified noun will be used to improve action identification.
- ACTIVITY TO PURPOSE: disentangling parallel activities to generate sequences of activities undertaken for a single purpose (clustering, noun standardisation, etc.).
- GENERATING TRANSITION MATRICES PER PURPOSE: calculating the probability that any given action follows the current one. This covers actions that follow immediately as well as those that follow in proximity to the current action, and it is conditioned either on the current action alone or on multiple prior actions.
- ROLLING PROJECTIONS: the training and projection method will use rolling projections, so that we can assess how many sequence steps, or which kinds of sequence steps, increase the probability of accurately anticipating future actions.
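As a rough illustration of the last two steps, here is a minimal Python sketch. It is not our actual pipeline: the function names, the `order` and `window` parameters, and the action labels are all hypothetical, and it assumes the sequences passed in have already been cleaned and grouped under a single purpose.

```python
from collections import Counter, defaultdict


def transition_probs(sequences, order=1, window=1):
    """Estimate how often each action follows a context of `order` prior actions.

    window=1 counts only the immediately following action; window>1 also
    counts actions occurring within `window` steps after the context, so
    each row then holds the share of follow-up occurrences in that window.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            for nxt in seq[i:i + window]:
                counts[context][nxt] += 1
    # Normalise each row of counts so that it sums to 1.
    return {
        ctx: {action: n / sum(c.values()) for action, n in c.items()}
        for ctx, c in counts.items()
    }


def rolling_accuracy(train, test, order=1):
    """Roll through each test sequence, predicting the next action at every step."""
    probs = transition_probs(train, order=order)
    hits = total = 0
    for seq in test:
        for i in range(order, len(seq)):
            dist = probs.get(tuple(seq[i - order:i]))
            if dist is None:
                continue  # context never seen in training: skipped in this sketch
            total += 1
            hits += max(dist, key=dist.get) == seq[i]
    return hits / total if total else 0.0


# Hypothetical action sequences, all belonging to one purpose.
train = [
    ["take_knife", "cut_onion", "put_knife", "take_pan"],
    ["take_knife", "cut_onion", "cut_onion", "put_knife"],
]
test = [["take_knife", "cut_onion", "put_knife"]]

probs = transition_probs(train, order=1)

# Compare how much context (1 or 2 prior actions) helps the projection.
for order in (1, 2):
    print(order, rolling_accuracy(train, test, order=order))
```

Running `rolling_accuracy` for several values of `order` is one simple way to check how many prior steps actually improve anticipation. A fuller version would also need a fallback for contexts that never appear in the training data.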
Implications of following this methodology
Because we are regenerating the noun-verb dictionaries, some under-performance relative to the benchmark will be due to differing action descriptions.
A broader set of training data will be used.