Can I get an explanation of what the AV annotations mean?

Hi, I am interested in the AV benchmark.
I am exploring the AV annotations, but I could not find an explanation of what they mean.

From here (Annotation Schemas | Ego4D), I was able to check how the annotations are structured, but I could not understand what the fields mean.

For example, what is ‘persons’? Does it mean all persons in the video, or only persons who spoke at least once in the clip?

I would appreciate an explanation of these fields.
Thank you in advance.