Diarization - Medera

Medera diarizes conversational speech-to-text (STT) sessions with up to 4 speakers, assigning speaker labels with sub-second latency.

How it works

Speaker embeddings extracted per voiced window
Online clustering with a minimum-cluster-duration heuristic
Label stability via Viterbi smoothing
Optional enrollment for known clinicians
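As a rough illustration of the steps above, the Python sketch below clusters precomputed speaker embeddings online and then smooths the labels with Viterbi decoding. It is not Medera's implementation. Embedding extraction is out of scope, and the window length, similarity threshold, minimum cluster duration, switch penalty, and all function names are hypothetical. Clustering compares each window to running-mean centroids by cosine similarity. Windows from clusters that have not yet reached the minimum duration stay unlabelled (None), and the smoother picks the label path that maximises total similarity minus a fixed penalty per speaker change.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters; the page does not document Medera's actual values.
WINDOW_S = 0.5         # length of one voiced window, in seconds
SIM_THRESHOLD = 0.7    # cosine similarity needed to join an existing cluster
MIN_CLUSTER_S = 2.0    # speech a cluster must accumulate before it gets a label
MAX_SPEAKERS = 4       # matches the documented 4-speaker limit
SWITCH_PENALTY = 0.75  # Viterbi cost for changing speaker between adjacent windows


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    centroid: list
    count: int = 1

    def add(self, emb):
        # Update the centroid as a running mean of member embeddings.
        self.count += 1
        self.centroid = [c + (e - c) / self.count for c, e in zip(self.centroid, emb)]

    @property
    def duration(self):
        return self.count * WINDOW_S


def online_cluster(embeddings):
    """Assign windows to clusters in arrival order; None means 'not yet labelled'."""
    clusters, labels = [], []
    for emb in embeddings:
        sims = [cosine(emb, c.centroid) for c in clusters]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and (sims[best] >= SIM_THRESHOLD or len(clusters) >= MAX_SPEAKERS):
            clusters[best].add(emb)
        else:
            # Dissimilar to every cluster and below the speaker cap: open a new one.
            clusters.append(Cluster(list(emb)))
            best = len(clusters) - 1
        # Minimum-cluster-duration heuristic: withhold labels from short-lived clusters.
        labels.append(best if clusters[best].duration >= MIN_CLUSTER_S else None)
    return labels, clusters


def viterbi_smooth(embeddings, centroids):
    """Best label path: total similarity minus SWITCH_PENALTY per speaker change."""
    if not embeddings or not centroids:
        return []
    k = len(centroids)

    def step_cost(p, s):
        return 0.0 if p == s else SWITCH_PENALTY

    best = [cosine(embeddings[0], c) for c in centroids]
    back = []
    for emb in embeddings[1:]:
        ptrs, new = [], []
        for s, c in enumerate(centroids):
            # Best predecessor for state s, accounting for the switch penalty.
            prev = max(range(k), key=lambda p: best[p] - step_cost(p, s))
            ptrs.append(prev)
            new.append(best[prev] - step_cost(prev, s) + cosine(emb, c))
        back.append(ptrs)
        best = new
    # Trace back from the highest-scoring final state.
    path = [max(range(k), key=best.__getitem__)]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]
```

With two speakers A = [1, 0] and B = [0, 1], decoding A, A, A, B, A, A against their centroids yields speaker 0 throughout, because one dissimilar window does not outweigh two switch penalties. A live system would presumably run the smoother over a short look-back buffer to stay within the sub-second latency target; the sketch decodes the whole sequence at once for clarity.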

Enrollment

POST /api/providers/{id}/voice-enroll
Content-Type: audio/wav

Upload 30 s of clean speech from the clinician identified by {id} to register their speaker embedding. Enrolled speakers are then labelled consistently across sessions.
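A minimal Python sketch of preparing the enrollment request with only the standard library follows. It assumes the 30 s figure is a minimum duration, and the base URL, provider ID, file name, and helper names (wav_duration, build_enroll_request) are hypothetical. The page does not describe authentication or the response body, so both are left out.

```python
import io
import urllib.parse
import urllib.request
import wave

# The docs ask for 30 s of clean speech; treating it as a minimum is an assumption.
ENROLL_SECONDS = 30.0


def wav_duration(wav_bytes):
    """Length of a WAV payload in seconds, read from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def build_enroll_request(base_url, provider_id, wav_bytes):
    """Build (but do not send) POST /api/providers/{id}/voice-enroll with a WAV body."""
    duration = wav_duration(wav_bytes)
    if duration < ENROLL_SECONDS:
        raise ValueError(f"enrollment audio is {duration:.1f} s; expected {ENROLL_SECONDS:.0f} s")
    pid = urllib.parse.quote(str(provider_id), safe="")
    url = f"{base_url.rstrip('/')}/api/providers/{pid}/voice-enroll"
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


# Usage (this part actually sends the request; authentication is omitted):
# with open("clinician.wav", "rb") as f:
#     req = build_enroll_request("https://medera.example", "42", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Checking the duration locally before uploading avoids a round trip for clips that are obviously too short.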