Medera’s Multimodal Sensing layer fuses three real-time clinical signal channels (vocal acoustics, facial physiology, and assessment context) into 15 RDoC construct activations. The pipeline runs in ai-services under src/multimodal/engines/ and src/rdoc/.
The three engines
Vocal Acoustic Engine
VocalAcousticEngine: Librosa + Parselmouth (Praat). F0 statistics, jitter, shimmer, HNR, MFCC (13), prosodic features, and clinical markers.
Facial Physiological Engine
FacialPhysiologicalEngine: OpenCV + HeartPy + SciPy. rPPG-derived HR, BP, HRV (SDNN / RMSSD / LF-HF / SD1 / SD2), respiration, stress index.
RDoC Construct Computer
RDoCConstructComputer: fuses facial features, vocal features, and assessment context into 15 named constructs across 5 domains.
Architecture
Endpoints
The multimodal pipeline is exposed through two API surfaces.

| Surface | Endpoint | Use |
|---|---|---|
| Backend (3001) | POST /api/multimodal-therapy/analyze-session | Full multimodal analysis + biomarker persistence + metered credit |
| Backend (3001) | POST /api/multimodal-therapy/analyze-audio-only | Audio-only proxy to AI services |
| AI Services (8000) | POST /api/multimodal-therapy/analyze-session | Source-of-truth analyzer |
| AI Services (8000) | POST /api/multimodal-therapy/analyze-audio-only | Vocal-only analysis |
| AI Services (8000) | GET /api/multimodal-therapy/health | Engine availability + dep status |
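The routes in the table can be addressed with a small helper such as the sketch below. The ports and paths come from the table; the `localhost` host, the plain `http` scheme, the helper names `endpoint_url` and `check_health`, and the assumption that the health route answers with JSON are all illustrative and not confirmed by this page.

```python
import json
from urllib.request import urlopen

# Ports and path prefix are taken from the endpoint table.
# The host and the http scheme are assumptions for local development.
PORTS = {"backend": 3001, "ai-services": 8000}
PREFIX = "/api/multimodal-therapy"


def endpoint_url(surface: str, route: str, host: str = "localhost") -> str:
    """Build the URL for a documented route on the given surface."""
    return f"http://{host}:{PORTS[surface]}{PREFIX}/{route}"


def check_health(host: str = "localhost", timeout: float = 5.0) -> dict:
    """GET the AI Services health route and decode the body as JSON.

    Assumes a JSON response; the response fields are not documented here.
    """
    url = endpoint_url("ai-services", "health", host)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `check_health()` before submitting a session is one way to confirm that the engines and their dependencies are available. Request bodies for the two POST routes are not specified on this page, so they are left out of the sketch.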
Quality and confidence
Every output reports confidence in ConfidenceLevel bands (defined in advanced_clinical_rag.py):

| Band | Range |
|---|---|
| VERY_HIGH | ≥ 0.95 |
| HIGH | ≥ 0.85 |
| MODERATE | ≥ 0.70 |
| LOW | ≥ 0.50 |
| VERY_LOW | < 0.50 |
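As a minimal sketch of how a score maps onto these bands, the thresholds from the table can be checked from highest to lowest. The enum member values and the `band_for` helper are assumptions made for illustration; the actual ConfidenceLevel definition in advanced_clinical_rag.py may differ.

```python
from enum import Enum


class ConfidenceLevel(Enum):
    # Member names follow the table; the string values are illustrative.
    VERY_HIGH = "very_high"
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    VERY_LOW = "very_low"


# Lower bound of each band, ordered from highest to lowest.
_BAND_FLOORS = [
    (0.95, ConfidenceLevel.VERY_HIGH),
    (0.85, ConfidenceLevel.HIGH),
    (0.70, ConfidenceLevel.MODERATE),
    (0.50, ConfidenceLevel.LOW),
]


def band_for(score: float) -> ConfidenceLevel:
    """Return the band whose lower bound the score meets or exceeds."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {score}")
    for floor, level in _BAND_FLOORS:
        if score >= floor:
            return level
    return ConfidenceLevel.VERY_LOW
```

Because each band is defined only by its lower bound, a score of exactly 0.85 lands in HIGH and a score of 0.0 lands in VERY_LOW.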
Below MODERATE, results are delivered with requires_human_review: true. If signal quality fails (SNR < 3.0 on the facial channel, or too low a voiced fraction on the vocal channel), the engine returns the metric with confidence: 0.0 and the analyzer falls back to a GPT-4o transcript-inferred analysis, never to fabricated values.
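The quality gate described above might look like the following sketch. Only the facial SNR limit of 3.0 and the 0.70 MODERATE floor come from this page. The `Metric` type, the function names, the vocal voiced-fraction limit of 0.2, keeping the measured value when confidence is zeroed, and applying the review flag strictly below MODERATE are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace

FACIAL_MIN_SNR = 3.0             # from the text: SNR < 3.0 fails the facial channel
VOCAL_MIN_VOICED_FRACTION = 0.2  # hypothetical: the text only says "too low"
REVIEW_BELOW = 0.70              # MODERATE floor; review assumed strictly below it


@dataclass(frozen=True)
class Metric:
    """Hypothetical container for one engine output."""
    name: str
    value: float
    confidence: float


def gate_facial(metric: Metric, snr: float) -> Metric:
    """Zero the confidence when the facial signal fails the SNR check."""
    return replace(metric, confidence=0.0) if snr < FACIAL_MIN_SNR else metric


def gate_vocal(metric: Metric, voiced_fraction: float,
               min_voiced_fraction: float = VOCAL_MIN_VOICED_FRACTION) -> Metric:
    """Zero the confidence when too little of the audio is voiced."""
    if voiced_fraction < min_voiced_fraction:
        return replace(metric, confidence=0.0)
    return metric


def requires_human_review(confidence: float) -> bool:
    """Flag results whose confidence falls below the MODERATE band."""
    return confidence < REVIEW_BELOW
```

A metric gated to `confidence: 0.0` always triggers `requires_human_review`, which matches the intent that low-quality signals are never reported as trustworthy measurements.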
What’s next
Architecture
Pipeline deep dive.
Quickstart
Stream your first multimodal session.
RDoC Constructs
15 named constructs.
Co-Therapy Agent
Agent-level documentation.