Meetings
Meeting 1
10 June
To-Do:
- Open-source diarisation models
- How to evaluate
- Writing up the evaluation
- Abstract
- Learn about Praat and TextGrids (reading sketch after this list)
- Look at data
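A minimal sketch of reading a Praat TextGrid in Python, assuming the third-party `textgrid` package (praatio would work similarly); the file name is a placeholder.

```python
# Sketch: list the labelled intervals in a Praat TextGrid.
# Assumes the third-party `textgrid` package (pip install textgrid) and that
# the tiers are interval tiers; "example.TextGrid" is a placeholder file name.
import textgrid

tg = textgrid.TextGrid.fromFile("example.TextGrid")

for tier in tg:                      # one tier per annotation layer (e.g. per speaker)
    for interval in tier:            # labelled time intervals within the tier
        if interval.mark.strip():    # skip empty (silence) intervals
            print(f"{tier.name}: {interval.minTime:.2f}-{interval.maxTime:.2f} '{interval.mark}'")
```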
Meeting 2
27 June
To-Do:
- What is DER? ← write in report (formula sketched after this list)
- Other error metrics
- Label ground-truth adult speech
- Literature review of open-source models
- Get VTC running
- Evaluation code
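For the report, a one-line definition of DER: the durations of false-alarm speech, missed speech, and speaker confusion are summed over the file and divided by the total duration of reference speech (scoring tools usually also apply a forgiveness collar around segment boundaries and can optionally skip overlapped regions):

```latex
\mathrm{DER} = \frac{T_{\text{false alarm}} + T_{\text{missed speech}} + T_{\text{speaker confusion}}}{T_{\text{total reference speech}}}
```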
Meeting 3
23 July
To-Do:
- Send TextGrids for VAD (ground truths)
- Child vs adult discrimination
- Label child/adult speech segments
- Evaluation code - figure out DER (scoring sketch after this list)
- Train on our data
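A minimal sketch of scoring DER once the TextGrid ground truths are available, assuming pyannote.metrics; the intervals and speaker labels below are placeholders standing in for segments read out of the TextGrids.

```python
# Sketch: score a diarisation hypothesis against TextGrid-derived ground truth.
# Assumes pyannote.core / pyannote.metrics are installed; the segments below
# are hypothetical stand-ins for intervals read out of TextGrids.
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Reference (ground truth): label each speech interval with its speaker.
reference = Annotation(uri="example_recording")
reference[Segment(0.0, 2.5)] = "adult"
reference[Segment(2.5, 4.0)] = "child"

# Hypothesis: the diariser's output in the same format.
hypothesis = Annotation(uri="example_recording")
hypothesis[Segment(0.1, 2.4)] = "spk0"
hypothesis[Segment(2.4, 4.2)] = "spk1"

# collar forgives small boundary errors; skip_overlap ignores overlapped regions.
metric = DiarizationErrorRate(collar=0.25, skip_overlap=False)
print(f"DER = {metric(reference, hypothesis):.3f}")
```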
Meeting 4
30 July
To-Do:
- Finalize the ground truths
Meeting 5
6 Aug
To-Do:
- “Final” result (with all metrics) for a first model (on dev)
- Fine-tune on our data and check
- Train adult/child discriminator on our data
- Decide on metrics (after reading)
- Agreement form
Meeting 6
13 Aug
To-Do:
- Writing (including metrics & figures)
- Figure out DER hyperparameters in Pyannote
- kNN and LogReg for adult/child classification
- PC
Key takeaway from this meeting: how to build the adult/child classifier on top of speaker embeddings (sketched below).
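A minimal sketch of the kNN vs. logistic-regression comparison, assuming the speaker embeddings (e.g. ECAPA vectors) and binary adult/child labels have already been extracted; the arrays, dimensions, and hyperparameters below are illustrative placeholders, not the project's actual setup.

```python
# Sketch: compare kNN and logistic regression for adult/child classification
# on precomputed speaker embeddings. X and y are hypothetical placeholders:
# X = (n_segments, embedding_dim) float array, y = 0/1 adult/child labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 192))      # stand-in for real embeddings
y = rng.integers(0, 2, size=200)     # stand-in for adult/child labels

models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```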
Meeting 7
20 Aug
(I think Herman was away at Interspeech.)
Meeting 8
27 Aug
To-Do:
- NeMo
- Read up on SpeechBrain embeddings (x-vector, but there are others)
- Logistic regression binary classifier (adult vs child)
- Fine-tuning
- Writing
Meeting 9
3 Sep
To-Do:
- MAC address of the dongle (used the Citrix VPN to connect to JABA in the end)
- NeMo diarisation extraction using JABA
- Read up on speaker embeddings (SpeechBrain ECAPA-TDNN (spkrec-ecapa-voxceleb) & x-vector TDNN (spkrec-xvect-voxceleb)) - extraction sketch after this list
- Fine-tuning
- Writing report
- If time: other classifiers (SVM, AHC, etc.) and other embedding types
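A minimal sketch of extracting one ECAPA-TDNN embedding with SpeechBrain, assuming a standard SpeechBrain install (in releases >= 1.0 the import path is speechbrain.inference.speaker rather than speechbrain.pretrained); the wav path is a placeholder.

```python
# Sketch: extract a speaker embedding with SpeechBrain's pretrained ECAPA-TDNN.
# Assumes speechbrain + torchaudio are installed; "example.wav" is a placeholder.
import torchaudio
from speechbrain.pretrained import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",   # swap for spkrec-xvect-voxceleb to get x-vectors
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

signal, sample_rate = torchaudio.load("example.wav")   # model expects 16 kHz mono audio
embedding = encoder.encode_batch(signal)               # shape: (batch, 1, embedding_dim)
print(embedding.squeeze().shape)                       # 192-dim for ECAPA, 512-dim for x-vector
```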
Meeting 10
10 Sep
No meeting (holiday week).
Worked on writing and gathered more research to include in the report.
Coding-wise, finished the NeMo and VBx diarisations and experimented with combining different VADs with embedding models capable of diarisation. Used k-means to cluster the embeddings (decent performance; sketch below), but other clustering methods are still worth trying.
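A minimal sketch of that k-means step, assuming per-segment speaker embeddings have already been produced by a VAD + embedding pipeline; the segment times, array shapes, and choice of two clusters are illustrative placeholders.

```python
# Sketch: cluster per-segment speaker embeddings with k-means to assign speaker
# labels. `segments` and `embeddings` are hypothetical placeholders for the
# output of a VAD + embedding-extraction pipeline.
import numpy as np
from sklearn.cluster import KMeans

segments = [(0.0, 1.2), (1.4, 2.8), (3.0, 4.1)]               # (start, end) times from the VAD
embeddings = np.random.default_rng(0).normal(size=(3, 192))   # one embedding per segment

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)      # e.g. two speakers
labels = kmeans.fit_predict(embeddings)

for (start, end), label in zip(segments, labels):
    print(f"{start:.2f}-{end:.2f}s -> speaker_{label}")
```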
NeMo results on afr training set:
- [Train] Average DER (collar=0.10s): 44.62%
- [Train] Average DER (collar=0.25s): 39.00%
- [Train] Average DER (collar=0.50s): 31.50%
- [Train] Average JER: 35.79%
VBx results on afr training set:
- [Train] Evaluated 221 files.
- [Train] Average DER (collar=0.10s): 81.96%
- [Train] Average DER (collar=0.25s): 80.94%
- [Train] Average DER (collar=0.50s): 82.80%
- [Train] Average JER: 37.50%
VBx is punished harshly by DER because it cannot handle overlapping speech; its JER, however, is on par with NeMo's (JER definition sketched below).
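For context on the JER numbers, the Jaccard error rate as defined for the DIHARD evaluations (the dscore toolkit) averages a per-speaker Jaccard-style error over the reference speakers after an optimal one-to-one speaker mapping; unmapped reference speakers score 100%:

```latex
\mathrm{JER} = \frac{1}{N_{\text{ref}}} \sum_{i=1}^{N_{\text{ref}}}
\left( 1 - \frac{\lvert \mathrm{ref}_i \cap \mathrm{sys}_{m(i)} \rvert}{\lvert \mathrm{ref}_i \cup \mathrm{sys}_{m(i)} \rvert} \right)
```

JER weights every reference speaker equally rather than by speech duration, which is why it can diverge sharply from DER.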
To-Do:
- Fine-tuning
- Writing