Notes
Notes from 1st meeting:
Open-source diarization models
How to evaluate a diarization model
Boundary Recall & F1 score
Diarization measures
Using Whisper and feeding the transcript to an LLM to separate who is speaking (sort of like a dialogue between two people)
Learn about textgrids (praat)
Babaloon (Look at )
For next meeting: explain my topic
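The boundary recall / F1 idea in the evaluation notes above can be sketched as follows. This is only a minimal illustration, not an established implementation: the 0.25 s tolerance window and the greedy one-to-one matching of boundaries are my assumptions.

```python
def boundary_f1(ref, hyp, tol=0.25):
    """Boundary precision/recall/F1 for segmentation evaluation.

    ref, hyp: lists of boundary times in seconds (reference vs. model output).
    A hypothesis boundary counts as a hit if it lies within `tol` seconds
    of a reference boundary that has not been matched yet (greedy matching).
    """
    matched = set()
    hits = 0
    for h in hyp:
        for i, r in enumerate(ref):
            if i not in matched and abs(h - r) <= tol:
                matched.add(i)
                hits += 1
                break
    recall = hits / len(ref) if ref else 0.0
    precision = hits / len(hyp) if hyp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with reference boundaries [1.0, 3.5, 7.2] and hypothesis boundaries [1.1, 3.3, 6.0], two of three boundaries match within the tolerance, giving precision, recall and F1 of 2/3 each.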
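As a starting point for the TextGrid note above: a minimal sketch that renders diarization segments as a Praat TextGrid (long text form), so model output can be inspected in Praat. The tier name and the filling of gaps with empty-label intervals are my assumptions about the layout; overlapping segments are not handled.

```python
def segments_to_textgrid(segments, xmax, tier_name="speakers"):
    """Render (start, end, label) segments as a Praat TextGrid string.

    Gaps between segments are filled with empty-label intervals so that the
    tier covers [0, xmax] contiguously, as Praat interval tiers require.
    Assumes segments do not overlap.
    """
    intervals = []
    cursor = 0.0
    for start, end, label in sorted(segments):
        if start > cursor:
            intervals.append((cursor, start, ""))
        intervals.append((start, end, label))
        cursor = end
    if cursor < xmax:
        intervals.append((cursor, xmax, ""))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{label}"',
        ]
    return "\n".join(lines)
```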
Hugging Face access token: (redacted)
Tried so far:
- To get voice_type_classifier running on macOS (might just be some outdated dependencies in the GitHub repo)
- ALICE for counting number of words, syllables and phonemes in adult speakers
- WhisperX requires Nvidia CUDA; it does not work on CPU.
- PyannoteAI activated for a month. The best diarization model so far (not open source)
- So far, pyannote, reverb and VBx are working
Diarization models:
- pyannote
- reverb
- silero-vad - just detects when someone is speaking, not who is speaking. https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb#scrollTo=5w5AkskZ2Fwr. Found this silero-vad playground notebook, which takes an example.wav and, based on the timestamps, splits the audio file to create a new one without the "empty" noise. Based on that, I thought it would be a good idea to take the timestamps and split the speakers completely into their own audio files. This could be used for easy comparison of models, judged by how cleanly the audio was split, i.e. whether overlapping voices can be heard in each other's files.
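The speaker-splitting idea above can be sketched with just the stdlib `wave` module, assuming the diarization output has already been converted into (start, end, speaker) tuples. Overlap handling and resampling are deliberately left out; the output naming scheme (one `<speaker>.wav` per speaker) is my own choice for illustration.

```python
import wave

def split_by_speaker(in_path, segments, out_dir="."):
    """Concatenate each speaker's segments into a separate wav file.

    segments: list of (start_sec, end_sec, speaker) tuples, e.g. derived
    from a diarization model's output. Returns the written file paths.
    """
    with wave.open(in_path, "rb") as wf:
        params = wf.getparams()
        frames = wf.readframes(params.nframes)
    bytes_per_frame = params.sampwidth * params.nchannels

    per_speaker = {}
    for start, end, spk in segments:
        a = int(start * params.framerate) * bytes_per_frame
        b = int(end * params.framerate) * bytes_per_frame
        per_speaker.setdefault(spk, bytearray()).extend(frames[a:b])

    paths = []
    for spk, audio in per_speaker.items():
        path = f"{out_dir}/{spk}.wav"
        with wave.open(path, "wb") as out:
            out.setnchannels(params.nchannels)
            out.setsampwidth(params.sampwidth)
            out.setframerate(params.framerate)
            out.writeframes(bytes(audio))
        paths.append(path)
    return paths
```

Each resulting file contains only one speaker's audio back to back, which is exactly the per-model comparison format described above.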
Complete diarization models
PyannoteAI API (monthly subscription)
Source: https://dashboard.pyannote.ai/
Pyannote (open-source)
Source: https://huggingface.co/pyannote/speaker-diarization-3.1
Reverb-diarization-v2
Source: https://huggingface.co/Revai/reverb-diarization-v2
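Diarization models like pyannote and reverb typically emit RTTM files, so a small parser makes their outputs directly comparable. A minimal sketch, assuming well-formed `SPEAKER` lines in the standard field order (type, file, channel, onset, duration, ortho, stype, name, conf) and skipping anything else:

```python
def parse_rttm(text):
    """Parse RTTM diarization output into sorted (start, end, speaker) tuples.

    RTTM SPEAKER lines carry the segment onset in field 4 and the duration
    in field 5; the speaker label is field 8.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        start = float(fields[3])
        dur = float(fields[4])
        segments.append((start, start + dur, fields[7]))
    return sorted(segments)
```

With two models' RTTM files parsed into the same tuple format, the segments can be fed straight into a boundary or overlap comparison.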
Possible other models
Silero-vad + speechbrain
Silero-vad
Source: https://github.com/snakers4/silero-vad/?tab=readme-ov-file
Speechbrain
Source: https://github.com/speechbrain/speechbrain