A new system dispenses with the human annotation of training
data required by its predecessors but achieves comparable results.
A central topic in spoken-language-systems research is
speaker diarization: computationally determining how many
speakers feature in a recording and which of them speaks when. Speaker
diarization would be an essential function of any program that automatically
annotated audio or video recordings.