NeMo
Two routes:
- Automatic Speech Recognition (ASR), e.g. Whisper
- Voice activity detection (VAD), e.g. MarbleNet, followed by a diarizer, e.g. SpeechBrain
Note: The second route is the familiar one: a VAD (e.g. Silero or NeMo's MarbleNet) followed by a diarizer (e.g. SpeechBrain, pyannote.audio, or NeMo's MSDD).
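A minimal sketch of the second route's shape, assuming the VAD and the diarizer are wrapped as plain callables. The names `run_second_route`, `vad`, and `diarizer` are hypothetical and are not part of NeMo, Silero, SpeechBrain, or pyannote.audio; each real library would need its own thin wrapper to fit this interface.

```python
from typing import Callable, List, Tuple

Region = Tuple[float, float]       # (start_sec, end_sec) of detected speech
Turn = Tuple[float, float, str]    # (start_sec, end_sec, speaker_label)


def run_second_route(
    audio_path: str,
    vad: Callable[[str], List[Region]],
    diarizer: Callable[[str, List[Region]], List[Turn]],
) -> List[Turn]:
    """Run the VAD first, then pass only the speech regions to the diarizer."""
    speech_regions = vad(audio_path)
    if not speech_regions:
        # No speech detected, so there is nothing to label.
        return []
    turns = diarizer(audio_path, speech_regions)
    # Return the speaker turns in chronological order.
    return sorted(turns, key=lambda turn: turn[0])
```

Keeping both stages behind callables makes it easy to swap one VAD or diarizer for another when comparing them.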
Conclusion from NeMo Diarization
NeMo's segmentation model is prone to cutting speakers off. This can be improved by using NeMo's neural diarizer.
Improved model idea: take pyannote's segmentation approach and combine it with NeMo's speaker detection.
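One possible reading of this idea, sketched in plain Python: keep the segment boundaries from one system and give each segment the speaker label that overlaps it most in the other system's output. The notes do not say how the two should be combined, so the maximum-overlap rule is an assumption, and `label_segments` is a hypothetical helper. The inputs are plain tuples; no pyannote or NeMo API is called here.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

Segment = Tuple[float, float]      # (start_sec, end_sec), e.g. from pyannote-style segmentation
Turn = Tuple[float, float, str]    # (start_sec, end_sec, speaker), e.g. from NeMo-style speaker labels


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(
    segments: List[Segment], speaker_turns: List[Turn]
) -> List[Tuple[float, float, Optional[str]]]:
    """Keep the segment boundaries; assign each the speaker with the most overlap."""
    labeled = []
    for seg_start, seg_end in segments:
        totals = defaultdict(float)
        for turn_start, turn_end, speaker in speaker_turns:
            totals[speaker] += overlap(seg_start, seg_end, turn_start, turn_end)
        best = max(totals, key=totals.get) if totals else None
        if best is not None and totals[best] == 0.0:
            best = None  # no speaker turn overlaps this segment
        labeled.append((seg_start, seg_end, best))
    return labeled


segments = [(0.0, 2.0), (2.0, 5.0), (6.0, 7.0)]
turns = [(0.0, 1.5, "A"), (1.5, 4.0, "B"), (4.0, 5.0, "A")]
print(label_segments(segments, turns))
# [(0.0, 2.0, 'A'), (2.0, 5.0, 'B'), (6.0, 7.0, None)]
```

Because the boundaries come only from the segmentation side, a speaker turn that the second system cuts short does not shorten the final segment; it only affects which label wins.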