Notes

Notes from 1st meeting:

Open source diarization models
How to evaluate a diarization model

Boundary Recall & F1 score

Diarization measures
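One of the measures mentioned above, boundary recall/F1, can be sketched in plain Python: predicted boundary timestamps count as hits when they fall within a tolerance of an unmatched reference boundary. The function name, greedy matching, and the 0.25 s tolerance are illustrative choices, not a standard implementation.

```python
# Sketch: boundary precision/recall/F1 for diarization output.
# `reference` and `hypothesis` are lists of boundary times in seconds.
# A predicted boundary is a hit if it lands within `tol` seconds of a
# reference boundary that has not already been matched (greedy matching).

def boundary_f1(reference, hypothesis, tol=0.25):
    unmatched = list(reference)
    hits = 0
    for b in hypothesis:
        # closest still-unmatched reference boundary
        best = min(unmatched, key=lambda r: abs(r - b), default=None)
        if best is not None and abs(best - b) <= tol:
            hits += 1
            unmatched.remove(best)
    precision = hits / len(hypothesis) if hypothesis else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# 2 of 3 predicted boundaries match within tolerance
p, r, f = boundary_f1([1.0, 3.5, 7.2], [1.1, 3.4, 6.0])
```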

Use Whisper to transcribe, then feed the transcript to an LLM to separate who is speaking (sort of like a dialogue between two people)
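The Whisper-to-LLM idea above could start with something like this: format Whisper's timestamped segments into a prompt asking an LLM to label each line with a speaker. The segment keys mirror Whisper's output (`start`, `end`, `text`); the prompt wording and speaker labels are illustrative, and the actual LLM call is left out.

```python
# Sketch: turn Whisper segments into an LLM prompt for speaker attribution.
# Segment dicts follow Whisper's output shape: {"start", "end", "text"}.

def build_speaker_prompt(segments, n_speakers=2):
    lines = [
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text'].strip()}"
        for seg in segments
    ]
    return (
        f"The following is a transcript of a conversation between "
        f"{n_speakers} people. Label each line with the most likely "
        f"speaker (SPEAKER_1, SPEAKER_2, ...):\n\n" + "\n".join(lines)
    )

segments = [
    {"start": 0.0, "end": 2.1, "text": "Hi, how are you?"},
    {"start": 2.3, "end": 4.0, "text": "Fine, thanks. And you?"},
]
prompt = build_speaker_prompt(segments)
```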

Learn about TextGrids (Praat)
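As a first look at TextGrids: diarization output can be written as a minimal Praat TextGrid with one IntervalTier, one interval per speaker turn. This is a simplified hand-rolled writer (assuming sorted, contiguous intervals), not the full format; libraries like `textgrid` or `praatio` handle the general case.

```python
# Sketch: write a minimal Praat TextGrid (full text format, one IntervalTier)
# from (start, end, label) tuples, e.g. diarization output.
# Assumes intervals are sorted and contiguous.

def to_textgrid(intervals, tier_name="speaker"):
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        f'xmin = {xmin}',
        f'xmax = {xmax}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f'        xmin = {xmin}',
        f'        xmax = {xmax}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{label}"',
        ]
    return "\n".join(lines)

tg = to_textgrid([(0.0, 2.1, "SPEAKER_1"), (2.1, 4.0, "SPEAKER_2")])
```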

Babaloon (Look at )

For next meeting: explain my topic

Tried so far:

  • Tried to get voice_type_classifier running on macOS (might just be some outdated dependencies in the GitHub repo)
  • ALICE for counting the number of words, syllables and phonemes in adult speakers
  • WhisperX requires NVIDIA CUDA; it does not work on CPU
  • PyannoteAI activated for a month. The best diarization model so far (not open source)
  • So far, pyannote, reverb and VBx are working

Diarization models:

  • pyannote
  • reverb
  • silero-vad - only detects when someone is speaking, not who is speaking. https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb#scrollTo=5w5AkskZ2Fwr. Found this silero-vad playground, which takes an example.wav and, based on the timestamps, splits the audio file to create a new one without “empty noises”. Based on that, it could be a good idea to take the timestamps and use them to split the speakers completely into their own audio files. This would allow easy comparison of models by how well the audio was split, so that no overlapping voices can be heard in each other's audio
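The splitting idea in the last bullet can be sketched with only the stdlib `wave` module: given diarization timestamps, cut a mono WAV into one file per speaker by concatenating that speaker's segments. The `(speaker, start, end)` segment format and file naming are illustrative; the demo at the bottom uses synthetic audio instead of a real recording.

```python
# Sketch: split a WAV into per-speaker files from diarization timestamps.
import wave

def split_by_speaker(wav_path, segments, out_prefix="speaker"):
    """segments: list of (speaker_label, start_sec, end_sec)."""
    with wave.open(wav_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        frame_bytes = src.getsampwidth() * src.getnchannels()
        audio = src.readframes(src.getnframes())

    # concatenate each speaker's byte ranges
    per_speaker = {}
    for label, start, end in segments:
        a = int(start * rate) * frame_bytes
        b = int(end * rate) * frame_bytes
        per_speaker[label] = per_speaker.get(label, b"") + audio[a:b]

    paths = []
    for label, data in per_speaker.items():
        path = f"{out_prefix}_{label}.wav"
        with wave.open(path, "wb") as dst:
            dst.setparams(params)  # nframes is corrected on close
            dst.writeframes(data)
        paths.append(path)
    return paths

# demo: 1 second of synthetic mono 16-bit audio at a 1 kHz sample rate
import os, struct, tempfile
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "mix.wav")
with wave.open(src_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(1000)
    w.writeframes(struct.pack("<1000h", *range(1000)))

paths = split_by_speaker(src_path, [("A", 0.0, 0.5), ("B", 0.5, 1.0)],
                         out_prefix=os.path.join(tmpdir, "spk"))
```

With per-speaker files like these, models could be compared by listening for leakage of the other speaker, as described above.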

Complete diarization models

PyannoteAI API (monthly subscription)

Source: https://dashboard.pyannote.ai/

Pyannote (open-source)

Source: https://huggingface.co/pyannote/speaker-diarization-3.1

Reverb-diarization-v2

Source: https://huggingface.co/Revai/reverb-diarization-v2

Possible other models

Silero-vad + speechbrain

Silero-vad

Source: https://github.com/snakers4/silero-vad/?tab=readme-ov-file

Speechbrain

Source: https://github.com/speechbrain/speechbrain