Engine inputs

| Channel | Input | Requirement |
|---|---|---|
| Audio | NumPy array (mono PCM) | 16 kHz, ≥ 1.0 s for vocal features |
| Video | List of frames (np.ndarray) | ≥ 15 fps; ≥ 30 s for HR, ≥ 60 s for HRV |
| Assessment | Dict of screener scores | PHQ-9, GAD-7, C-SSRS optional |
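The requirements in the table can be checked before any engine is called. The sketch below is a hypothetical pre-flight helper (`check_inputs` and its return keys are not part of the documented API); it only encodes the thresholds listed above and does not validate the assessment dict.

```python
import numpy as np

# Thresholds taken from the input requirements table.
AUDIO_SR = 16_000      # Hz
MIN_AUDIO_S = 1.0      # seconds of audio for vocal features
MIN_FPS = 15           # minimum video frame rate
MIN_HR_S = 30.0        # seconds of video for HR
MIN_HRV_S = 60.0       # seconds of video for HRV


def check_inputs(audio: np.ndarray, sr: int, n_frames: int, fps: float) -> dict:
    """Report which analyses the given inputs can support (hypothetical helper)."""
    audio_s = audio.shape[0] / sr if audio.ndim == 1 else 0.0  # mono only
    video_s = n_frames / fps if fps > 0 else 0.0
    video_ok = fps >= MIN_FPS
    return {
        "vocal": sr == AUDIO_SR and audio.ndim == 1 and audio_s >= MIN_AUDIO_S,
        "hr": video_ok and video_s >= MIN_HR_S,
        "hrv": video_ok and video_s >= MIN_HRV_S,
    }


audio = np.zeros(24_000, dtype=np.float32)  # 1.5 s of silence at 16 kHz
print(check_inputs(audio, sr=16_000, n_frames=1_350, fps=30))  # 45 s of video
# {'vocal': True, 'hr': True, 'hrv': False}
```

With 45 s of 30 fps video the HR threshold is met but the 60 s HRV threshold is not.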

Vocal pipeline

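```python
engine = VocalAcousticEngine(
    sample_rate=16000,   # Hz; input must be mono PCM at this rate
    frame_length_ms=25,  # analysis window length
    hop_length_ms=10,    # step between successive windows
    gender=None,         # None = auto-detect (drives the gender-aware F0 range)
)
# `audio`: mono NumPy array at 16 kHz, at least 1.0 s long
features: VocalFeatures = engine.extract_features(audio, sr=16000)
```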
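At 16 kHz, the 25 ms window and 10 ms hop correspond to 400 and 160 samples. The generic framing sketch below illustrates that arithmetic; it assumes no edge padding, which may differ from how Parselmouth and Librosa frame internally (Librosa centers frames by default, for example). `frame_signal` is a hypothetical helper, not part of the engine.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def frame_signal(x: np.ndarray, sr: int, frame_ms: float, hop_ms: float) -> np.ndarray:
    """Split a mono signal into overlapping frames of shape (n_frames, frame_len)."""
    frame_len = int(round(sr * frame_ms / 1000))  # 25 ms at 16 kHz -> 400 samples
    hop_len = int(round(sr * hop_ms / 1000))      # 10 ms at 16 kHz -> 160 samples
    if x.shape[0] < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    # One window per starting sample, then keep every hop_len-th window.
    return sliding_window_view(x, frame_len)[::hop_len]


audio = np.zeros(16_000, dtype=np.float32)  # 1.0 s at 16 kHz
frames = frame_signal(audio, sr=16_000, frame_ms=25, hop_ms=10)
print(frames.shape)  # (98, 400)
```

One second of audio therefore yields 1 + (16000 − 400) // 160 = 98 frames without padding.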
Engine internals:
  1. Audio loaded as a Parselmouth Sound object
  2. F0 contour via Praat autocorrelation (gender-aware range)
  3. Voice quality (jitter, shimmer, HNR) via Praat point-process
  4. Prosody (speaking rate, pauses, intensity)
  5. Spectral (MFCC, centroid, bandwidth, rolloff, flatness) via Librosa
  6. Clinical marker fusion (depression, anxiety, distress indices)
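Step 3 relies on Praat's perturbation measures. As a simplified illustration (not the engine's code), Praat's local jitter and local shimmer share one formula, the mean absolute difference between consecutive values divided by the mean value, applied to glottal periods and per-period peak amplitudes respectively. The sketch omits Praat's period floor/ceiling filtering, and the input values are made up.

```python
import numpy as np


def local_perturbation(values) -> float:
    """Mean absolute difference of consecutive values divided by their mean.

    On glottal periods this matches Praat's "jitter (local)"; on per-period
    peak amplitudes it matches "shimmer (local)". Praat also discards
    periods outside its floor/ceiling, which this sketch does not do.
    """
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        return float("nan")
    return float(np.mean(np.abs(np.diff(values))) / np.mean(values))


periods = np.array([0.01000, 0.01005, 0.00998, 0.01002])  # seconds (~100 Hz voice)
amplitudes = np.array([0.50, 0.52, 0.49, 0.51])          # illustrative peak amplitudes
jitter = local_perturbation(periods)
shimmer = local_perturbation(amplitudes)
print(f"jitter={jitter:.2%} shimmer={shimmer:.2%}")
```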

Facial pipeline

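```python
engine = FacialPhysiologicalEngine(
    fps=30,               # frame rate of the input video (≥ 15 fps required)
    min_hr_duration=30,   # seconds of video required before HR is reported
    min_hrv_duration=60,  # seconds of video required before HRV is reported
    roi_type='forehead',  # region of interest for the RGB trace
)
# `frames`: list of np.ndarray images
signals: PhysiologicalSignals = engine.extract_features(frames, timestamps=None)
```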
Engine internals:
  1. Haar Cascade face detection per frame
  2. ROI extraction (forehead / cheeks / full face)
  3. RGB trace extraction across frames
  4. CHROM rPPG conversion + bandpass filtering
  5. SNR-gated HR extraction (HeartPy + SciPy FFT)
  6. HRV from inter-beat intervals
  7. BP estimation from PPG morphology + HR
  8. Respiration from RGB amplitude modulation
  9. Stress + ANS balance from HRV + HR
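Steps 3 to 5 can be sketched with NumPy alone. The version below follows the CHROM chrominance projection (X = 3R − 2G, Y = 1.5R + G − 1.5B, pulse = X − αY with α = σ(X)/σ(Y)) over the whole trace rather than in overlapping windows, and it swaps the documented bandpass filter and HeartPy/SciPy peak search for plain FFT masking. The SNR definition, `snr_min_db`, `tol_hz`, and all helper names are assumptions, not the engine's parameters.

```python
import numpy as np


def bandpass_fft(x: np.ndarray, fs: float, lo: float, hi: float) -> np.ndarray:
    """Zero FFT bins outside [lo, hi] Hz; a crude stand-in for a Butterworth bandpass."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(spec, n=x.size)


def chrom_pulse(rgb: np.ndarray, fs: float, lo: float = 0.7, hi: float = 4.0) -> np.ndarray:
    """CHROM pulse signal from an (n_frames, 3) trace of mean ROI RGB values."""
    norm = rgb / rgb.mean(axis=0)          # divide out per-channel skin tone / illumination
    r, g, b = norm[:, 0], norm[:, 1], norm[:, 2]
    xs = 3.0 * r - 2.0 * g                 # chrominance signal X
    ys = 1.5 * r + g - 1.5 * b             # chrominance signal Y
    xf, yf = bandpass_fft(xs, fs, lo, hi), bandpass_fft(ys, fs, lo, hi)
    alpha = np.std(xf) / np.std(yf)        # balance X and Y before subtracting
    return xf - alpha * yf


def heart_rate_fft(pulse, fs, lo=0.7, hi=4.0, snr_min_db=0.0, tol_hz=0.1):
    """Return (HR in bpm or None, SNR in dB); HR is None when SNR < snr_min_db."""
    power = np.abs(np.fft.rfft(pulse)) ** 2
    freqs = np.fft.rfftfreq(pulse.size, d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)   # 0.7-4 Hz = 42-240 bpm
    f_peak = freqs[band][np.argmax(power[band])]
    peak = band & (np.abs(freqs - f_peak) <= tol_hz)
    rest = band & ~peak
    snr_db = 10.0 * np.log10(power[peak].sum() / (power[rest].sum() + 1e-12))
    hr = float(f_peak * 60.0) if snr_db >= snr_min_db else None
    return hr, float(snr_db)


# Synthetic check: a 1.2 Hz (72 bpm) pulse plus noise, 30 s at 30 fps.
fs = 30.0
t = np.arange(int(30 * fs)) / fs
pulse_dir = np.array([0.33, 0.77, 0.53])   # approximate skin pulse direction (illustrative)
rng = np.random.default_rng(0)
rgb = 100.0 + 0.5 * np.outer(np.sin(2 * np.pi * 1.2 * t), pulse_dir)
rgb += rng.normal(0.0, 0.05, rgb.shape)
hr, snr_db = heart_rate_fft(chrom_pulse(rgb, fs), fs)
print(hr, round(snr_db, 1))
```

With 30 s of video the FFT bin spacing is 1/30 Hz, i.e. 2 bpm, so longer windows also sharpen the HR estimate.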

RDoC fusion

```python
computer = RDoCConstructComputer()
profile: RDoCActivationProfile = computer.compute_all_constructs(
    facial_features=signals.to_dict(),     # output of the facial pipeline
    vocal_features=features.to_dict(),     # output of the vocal pipeline
    assessments={"phq9": 14, "gad7": 11},  # screener scores (PHQ-9, GAD-7)
    context={"chief_complaint": "..."},
)
```
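How features and screener scores are weighted per construct is not documented here. The sketch below is one hypothetical way such a fusion could work: screener totals are scaled by their maximum possible scores (27 for PHQ-9, 21 for GAD-7), combined with 0–1 features in a weighted sum, and squashed to 0–1 with a logistic function, producing contributors in the [{feature, value, contribution}] shape. The weights, bias, and the `anxiety_index` feature are invented for illustration.

```python
import math

# Maximum possible totals: PHQ-9 has 9 items and GAD-7 has 7, each scored 0-3.
SCREENER_MAX = {"phq9": 27, "gad7": 21}


def normalize_assessments(scores: dict) -> dict:
    """Scale raw screener totals to 0-1; unknown keys are skipped."""
    return {k: v / SCREENER_MAX[k] for k, v in scores.items() if k in SCREENER_MAX}


def activate(inputs: dict, weights: dict, bias: float = 0.0):
    """Hypothetical construct score: logistic of a weighted sum, plus contributors."""
    contributors = [
        {"feature": f, "value": inputs[f], "contribution": w * inputs[f]}
        for f, w in weights.items()
        if f in inputs
    ]
    z = bias + sum(c["contribution"] for c in contributors)
    score = 1.0 / (1.0 + math.exp(-z))
    # Largest absolute contributions first, for display.
    contributors.sort(key=lambda c: abs(c["contribution"]), reverse=True)
    return score, contributors


inputs = normalize_assessments({"phq9": 14, "gad7": 11})
inputs["anxiety_index"] = 0.62  # hypothetical vocal marker, already on 0-1
score, contributors = activate(inputs, weights={"gad7": 2.0, "anxiety_index": 1.5}, bias=-1.5)
print(round(score, 3), [c["feature"] for c in contributors])
```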
For each of the 15 constructs, the computer returns a ConstructActivation:
```python
from dataclasses import dataclass


@dataclass
class ConstructActivation:
    name: str                     # e.g. "potential_threat_anxiety"
    score: float                  # 0–1 activation
    confidence: float             # 0–1 calibrated
    contributors: list            # [{feature, value, contribution}]
    interpretation: str           # clinical text
    domain: str                   # RDoC domain
    low_threshold: float = 0.3
    high_threshold: float = 0.7
    # get_severity() → "low" | "moderate" | "high"
```
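The thresholds suggest how get_severity() maps a score onto three bands, but the boundary handling is not specified. The sketch below assumes scores below low_threshold are "low", scores at or above high_threshold are "high", and everything in between is "moderate"; `SeverityBands` is a hypothetical stand-in holding only the fields that method needs.

```python
from dataclasses import dataclass


@dataclass
class SeverityBands:
    """Hypothetical stand-in with only the fields get_severity() needs."""
    score: float
    low_threshold: float = 0.3
    high_threshold: float = 0.7

    def get_severity(self) -> str:
        # Assumed boundaries: [0, low) -> low, [low, high) -> moderate, [high, 1] -> high.
        if self.score < self.low_threshold:
            return "low"
        if self.score >= self.high_threshold:
            return "high"
        return "moderate"


print([SeverityBands(s).get_severity() for s in (0.1, 0.3, 0.69, 0.7)])
# ['low', 'moderate', 'moderate', 'high']
```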

Analyzer orchestration

MultimodalTherapyAnalyzer.analyze_session_with_multimodal(...) orchestrates the engines, computes the RDoC profile, optionally pulls evidence from AdvancedClinicalRAG, and returns a canonical payload of flat ClinicalMetric envelopes, which is the shape the frontend renders.
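The ClinicalMetric schema is not documented on this page. As a hedged illustration of what a "flat envelope" means, the sketch maps each construct result to a single-level record that serializes directly to JSON; the `ClinicalMetric` fields shown, the `rdoc.` key prefix, and the input values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ClinicalMetric:
    """Hypothetical flat envelope; field names are illustrative, not the real schema."""
    key: str
    value: float
    confidence: float
    severity: str
    domain: str


def to_metrics(activations: list) -> list:
    """Turn per-construct results into a flat list of JSON-ready dicts."""
    metrics = [
        ClinicalMetric(
            key=f"rdoc.{a['name']}",
            value=round(a["score"], 3),
            confidence=round(a["confidence"], 3),
            severity=a["severity"],
            domain=a["domain"],
        )
        for a in activations
    ]
    return [asdict(m) for m in metrics]


payload = {
    "metrics": to_metrics([{
        "name": "potential_threat_anxiety",
        "score": 0.6172,
        "confidence": 0.8,
        "severity": "moderate",
        "domain": "negative_valence",  # example value only
    }])
}
print(json.dumps(payload, indent=2))
```

Keeping every metric at one nesting level lets a frontend render any construct with the same component, whatever engine produced it.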

Latency

| Operation | Typical latency |
|---|---|
| Vocal feature extraction (10 s window) | 80–150 ms |
| Facial feature extraction (10 s window) | 120–220 ms |
| RDoC construct activation (15 constructs) | 60–110 ms |
| End-to-end analyze_session (60 s audio + 30 s video) | 1.8–3.5 s |
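To compare these figures with your own hardware, a small timing harness is enough. The sketch below is a hypothetical helper (not part of the analyzer) that reports median and approximate 95th-percentile wall-clock latency after a few warm-up calls.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 20, warmup: int = 2) -> dict:
    """Median and approximate 95th-percentile latency of fn() in milliseconds."""
    for _ in range(warmup):  # let caches and lazy initialization settle first
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(round(0.95 * (len(samples) - 1))))]
    return {"median_ms": statistics.median(samples), "p95_ms": p95}


stats = time_call(lambda: sum(range(100_000)))
print(stats)
```

Wrapping a real call, for example `time_call(lambda: engine.extract_features(audio, sr=16000))`, gives numbers comparable to the table. For the end-to-end row, 1.8–3.5 s for 60 s of audio works out to roughly 3–6% of the recording's duration.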