Thank you for the amazing data! If I want the timestamp of each ASR-recognized text token in the expert commentary (e.g. 0.webm → whisper → text tokens → the timestamp of each text token), what should I do?
You can see how I transcribed the expert commentary audio files here: Ego4d/ego4d/internal/expert_commentary/transcribe.py at main · facebookresearch/Ego4d · GitHub
You can re-process the audio files with whisper, as it does support word-level timestamps (refer to Ego4d/ego4d/egoexo/scripts/extract_audio_transcribe.py at main · facebookresearch/Ego4d · GitHub for details)
I can do this on my end, but it won't be available for a while.
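In the meantime, here is a minimal sketch of re-processing one commentary file yourself. It assumes the `openai-whisper` package and ffmpeg are installed, and that a file such as `0.webm` can be decoded by ffmpeg. The helper names `flatten_words` and `transcribe_with_word_times` and the `"base"` model choice are illustrative, not what the linked scripts necessarily use. Note that Whisper reports timestamps per word rather than per model token.

```python
from typing import Any


def flatten_words(result: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Collect (word, start_sec, end_sec) from a Whisper transcription result.

    Expects the dict returned by `model.transcribe(..., word_timestamps=True)`,
    where each segment carries a "words" list.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            # Whisper keeps the leading space on each word; strip it here.
            words.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return words


def transcribe_with_word_times(
    audio_path: str, model_name: str = "base"
) -> list[tuple[str, float, float]]:
    """Run Whisper on one audio file and return per-word timestamps."""
    import whisper  # openai-whisper, imported lazily so flatten_words works without it

    model = whisper.load_model(model_name)  # model size is an assumption
    result = model.transcribe(audio_path, word_timestamps=True)
    return flatten_words(result)


if __name__ == "__main__":
    # "0.webm" stands in for one expert commentary recording.
    for word, start, end in transcribe_with_word_times("0.webm"):
        print(f"{start:7.2f} {end:7.2f}  {word}")
```

The start and end values are in seconds from the beginning of the audio file that was passed in, so they still need to be offset if you want to align them with the video timeline.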