Engine inputs
| Channel | Input | Requirement |
|---|---|---|
| Audio | NumPy array (mono PCM) | 16 kHz, ≥ 1.0 s for vocal features |
| Video | List of frames (np.ndarray) | ≥ 15 fps; ≥ 30 s for HR, ≥ 60 s for HRV |
| Assessment | Dict of screener scores | PHQ-9, GAD-7, C-SSRS optional |
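The duration and rate requirements above decide which feature groups can be computed at all. Below is a minimal pre-flight sketch that checks them before a call to the analyzer. `check_inputs` and `InputReport` are hypothetical helpers, not part of the engine's API. The sketch only checks shape, sample rate, frame rate, and duration against the thresholds in this table.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np

# Thresholds copied from the input table; the helper itself is hypothetical.
AUDIO_SAMPLE_RATE_HZ = 16_000
MIN_AUDIO_SECONDS = 1.0
MIN_VIDEO_FPS = 15.0
MIN_HR_SECONDS = 30.0
MIN_HRV_SECONDS = 60.0


@dataclass
class InputReport:
    """Which feature groups the supplied inputs are long enough for."""
    vocal: bool = False
    heart_rate: bool = False
    hrv: bool = False
    problems: list[str] = field(default_factory=list)


def check_inputs(audio: np.ndarray | None, sample_rate: int,
                 frames: list[np.ndarray] | None, fps: float) -> InputReport:
    report = InputReport()
    if audio is not None:
        if audio.ndim != 1:
            report.problems.append("audio must be mono (1-D array)")
        elif sample_rate != AUDIO_SAMPLE_RATE_HZ:
            report.problems.append(f"audio must be {AUDIO_SAMPLE_RATE_HZ} Hz, got {sample_rate}")
        else:
            # Vocal features need at least 1.0 s of audio.
            report.vocal = len(audio) / sample_rate >= MIN_AUDIO_SECONDS
    if frames:
        if fps < MIN_VIDEO_FPS:
            report.problems.append(f"video must be >= {MIN_VIDEO_FPS} fps, got {fps}")
        else:
            duration = len(frames) / fps
            report.heart_rate = duration >= MIN_HR_SECONDS
            report.hrv = duration >= MIN_HRV_SECONDS
    return report
```

For example, 2 s of 16 kHz audio plus 45 s of 30 fps video is enough for vocal features and HR, but not for HRV.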
Vocal pipeline
- Parselmouth Sound object loaded
- F0 contour via Praat autocorrelation (gender-aware range)
- Voice quality (jitter, shimmer, HNR) via Praat point-process
- Prosody (speaking rate, pauses, intensity)
- Spectral (MFCC, centroid, bandwidth, rolloff, flatness) via Librosa
- Clinical marker fusion (depression, anxiety, distress indices)
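Below is a simplified NumPy sketch of two of the steps above: autocorrelation F0 with a sex-dependent search range, and local jitter and shimmer. It is not Praat's algorithm. Praat's autocorrelation pitch tracker adds windowing, normalization, and path finding across frames, and the pipeline calls Praat through Parselmouth rather than anything like this. The `F0_RANGES_HZ` values are placeholder assumptions. The jitter and shimmer functions follow the form of Praat's "local" measures and assume the period and peak-amplitude sequences have already been extracted.

```python
import numpy as np

# Hypothetical sex-dependent pitch search ranges in Hz (not the engine's documented bounds).
F0_RANGES_HZ = {"male": (75.0, 300.0), "female": (100.0, 500.0), "unknown": (75.0, 500.0)}


def autocorr_f0(frame: np.ndarray, sample_rate: int, sex: str = "unknown") -> float:
    """Estimate F0 of one voiced frame from the highest autocorrelation peak in the allowed lag range."""
    fmin, fmax = F0_RANGES_HZ[sex]
    x = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    min_lag = int(sample_rate // fmax)                       # highest allowed F0 -> shortest lag
    max_lag = min(int(sample_rate // fmin), len(r) - 1)      # lowest allowed F0 -> longest lag
    lag = min_lag + int(np.argmax(r[min_lag:max_lag + 1]))
    return sample_rate / lag


def local_jitter(periods_s: np.ndarray) -> float:
    """Mean absolute difference of consecutive periods divided by the mean period."""
    return float(np.mean(np.abs(np.diff(periods_s))) / np.mean(periods_s))


def local_shimmer(amplitudes: np.ndarray) -> float:
    """Mean absolute difference of consecutive peak amplitudes divided by the mean amplitude."""
    return float(np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes))
```

Restricting the lag range by sex is what keeps the peak search from locking onto subharmonics (too-long lags) or formant structure (too-short lags) for voices outside the expected range.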
Facial pipeline
- Haar Cascade face detection per frame
- ROI extraction (forehead / cheeks / full face)
- RGB trace extraction across frames
- CHROM rPPG conversion + bandpass filtering
- SNR-gated HR extraction (HeartPy + SciPy FFT)
- HRV from inter-beat intervals
- BP estimation from PPG morphology + HR
- Respiration from RGB amplitude modulation
- Stress + ANS balance from HRV + HR
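Below is a minimal sketch of the rPPG core of this pipeline: de Haan and Jeanne's CHROM projection on per-frame mean ROI colours, a Butterworth bandpass, an FFT heart-rate peak gated by a spectral SNR, and time-domain HRV from inter-beat intervals. Several details are assumptions rather than the engine's documented choices: the 0.7 to 4.0 Hz band, the filter order, the SNR window widths, and the `min_snr_db` threshold. CHROM is applied to the whole trace instead of overlapping windows, and HeartPy's peak detection is not reproduced. All function names are illustrative.

```python
from __future__ import annotations

import numpy as np
from scipy.signal import butter, filtfilt

HR_BAND_HZ = (0.7, 4.0)  # 42-240 bpm; assumed passband, not taken from the engine


def chrom_pulse(rgb: np.ndarray, fps: float) -> np.ndarray:
    """CHROM rPPG: rgb has shape (n_frames, 3), the mean R, G, B of the skin ROI per frame."""
    norm = rgb / rgb.mean(axis=0)            # normalize each channel by its temporal mean
    r, g, b = norm[:, 0], norm[:, 1], norm[:, 2]
    xs = 3.0 * r - 2.0 * g                   # chrominance signal X
    ys = 1.5 * r + g - 1.5 * b               # chrominance signal Y
    bb, aa = butter(3, HR_BAND_HZ, btype="bandpass", fs=fps)
    xf, yf = filtfilt(bb, aa, xs), filtfilt(bb, aa, ys)
    alpha = np.std(xf) / np.std(yf)          # tune Y so intensity/motion components cancel
    return xf - alpha * yf


def hr_from_pulse(pulse: np.ndarray, fps: float,
                  min_snr_db: float = 0.0) -> tuple[float | None, float]:
    """Return (bpm, snr_db), or (None, snr_db) when the spectral SNR is below the gate."""
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    power = np.abs(np.fft.rfft(pulse)) ** 2
    band = (freqs >= HR_BAND_HZ[0]) & (freqs <= HR_BAND_HZ[1])
    f0 = freqs[band][np.argmax(power[band])]
    # Signal = bins near the peak and its first harmonic; noise = the rest of the band.
    signal = (np.abs(freqs - f0) <= 0.1) | (np.abs(freqs - 2 * f0) <= 0.2)
    snr_db = float(10 * np.log10(power[band & signal].sum() / power[band & ~signal].sum()))
    if snr_db < min_snr_db:
        return None, snr_db
    return 60.0 * f0, snr_db


def hrv_time_domain(ibi_ms: np.ndarray) -> dict[str, float]:
    """SDNN and RMSSD in milliseconds from a sequence of inter-beat intervals."""
    ibi = np.asarray(ibi_ms, dtype=float)
    return {
        "sdnn": float(np.std(ibi, ddof=1)),
        "rmssd": float(np.sqrt(np.mean(np.diff(ibi) ** 2))),
    }
```

The α ratio is the key design choice in CHROM. Variations that scale all three channels equally, such as illumination changes and most motion, produce matching components in X and Y that cancel in X − αY. The blood-volume pulse has a different colour signature, so it survives. Note that 30 s of video at 30 fps gives an FFT resolution of 1/30 Hz, or 2 bpm. This is one reason the HR minimum in the input table matters.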
RDoC fusion
ConstructActivation:
Analyzer orchestration
MultimodalTherapyAnalyzer.analyze_session_with_multimodal(...) orchestrates the engines, computes the RDoC profile, optionally pulls supporting evidence from AdvancedClinicalRAG, and returns a canonical payload of flat ClinicalMetric envelopes, which is the shape the frontend renders.
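Below is a rough sketch of the orchestration pattern described above. It runs each engine, fuses the outputs into an RDoC profile, optionally attaches evidence, and flattens everything into one list of flat metric records. `MetricEnvelope`, `flatten`, and `analyze` are hypothetical stand-ins, and the engines, RDoC fusion, and evidence lookup are passed in as plain callables. The real ClinicalMetric fields and the analyzer's signature are not documented here, so this shows only the shape of the data flow.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional


@dataclass
class MetricEnvelope:
    """Hypothetical flat metric record; the real ClinicalMetric fields may differ."""
    source: str   # which stage produced it, e.g. "vocal", "facial", "rdoc"
    name: str     # dotted path of the value in the engine's nested output
    value: float


def flatten(source: str, features: dict[str, Any], prefix: str = "") -> list[MetricEnvelope]:
    """Turn nested numeric engine output into one envelope per numeric leaf (non-numeric leaves are skipped)."""
    out: list[MetricEnvelope] = []
    for key, val in features.items():
        path = f"{prefix}{key}"
        if isinstance(val, dict):
            out.extend(flatten(source, val, prefix=f"{path}."))
        elif isinstance(val, (int, float)) and not isinstance(val, bool):
            out.append(MetricEnvelope(source, path, float(val)))
    return out


def analyze(engines: dict[str, Callable[[], dict[str, Any]]],
            rdoc: Callable[[dict[str, dict[str, Any]]], dict[str, float]],
            evidence: Optional[Callable[[dict[str, float]], list[str]]] = None) -> dict[str, Any]:
    """Run engines, fuse into an RDoC profile, optionally attach evidence, and flatten for the frontend."""
    raw = {name: run() for name, run in engines.items()}
    profile = rdoc(raw)
    metrics = [m for name, feats in raw.items() for m in flatten(name, feats)]
    metrics += flatten("rdoc", profile)
    return {
        "metrics": [asdict(m) for m in metrics],
        "rdoc_profile": profile,
        "evidence": evidence(profile) if evidence else [],
    }
```

Flattening at the end keeps the frontend independent of each engine's internal nesting. A renderer only needs to iterate over `metrics`, whatever features a given engine emits.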
Latency
| Operation | Typical latency |
|---|---|
| Vocal feature extraction (10 s window) | 80–150 ms |
| Facial feature extraction (10 s window) | 120–220 ms |
| RDoC construct activation (15 constructs) | 60–110 ms |
| End-to-end analyze_session (60 s audio + 30 s video) | 1.8–3.5 s |