Building Your Co-Therapy Multimodal Layer

The Co-Therapy Agent is a silent multimodal partner: it never speaks to the patient. It captures audio, video, and language during the session, fuses these signals into 15 neurobehavioral construct activations, and delivers them on a private clinician channel.

Architecture

AUDIO + VIDEO → Multimodal Engine → Neurobehavioral Construct Computer
        │
        ▼
Clinician private channel
        │
        ▼
Session note + outcomes

Multimodal channels

Channel	Engine	Outputs
Audio	Vocal Acoustic Engine	F0, jitter, shimmer, HNR, MFCC, prosodic flatness, depression/anxiety/distress indices
Video	Facial Physiological Engine	HR, BP, HRV (SDNN/RMSSD/LF-HF), respiration rate, stress index, affect
Language	Linguistic Content Expert	Topic, valence, certainty, pronoun shift

These fuse into 15 neurobehavioral construct activations. See Neurobehavioral Construct Overview.

Steps

Capture consent

Recording requires explicit patient consent per session. Configure your consent flow in the Console.

Start the session

Send POST /api/therapy-sessions/start with the participants and modality. The response contains a session_id and a multimodal WebSocket URL.
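The start call could be sketched in Python with only the standard library, as below. The endpoint path and the session_id field come from this guide. The base URL, the JSON keys `participants`, `modality`, and `websocket_url`, and the helper names are assumptions for illustration, not a documented schema.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the host for your deployment.
BASE_URL = "https://api.example.com"


def build_start_request(participants, modality, base_url=BASE_URL):
    """Build (but do not send) the POST request that starts a session.

    The body field names are assumptions, not a documented schema.
    """
    body = json.dumps({"participants": participants, "modality": modality})
    return urllib.request.Request(
        url=f"{base_url}/api/therapy-sessions/start",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_start_response(raw_body):
    """Extract the session ID and WebSocket URL from the JSON response.

    `session_id` is named in the guide; `websocket_url` is an assumed key.
    """
    payload = json.loads(raw_body)
    return payload["session_id"], payload["websocket_url"]
```

To send the request, pass it to `urllib.request.urlopen` and hand the response body to `parse_start_response`. Authentication is left out because this guide does not describe it.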

Stream audio + video

Stream PCM audio at 16 kHz and video frames (used by the vision pipeline for 468-point landmarks) over the multimodal WebSocket.
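One way to prepare the audio side is to slice the PCM buffer into fixed-duration chunks before sending. The sketch below uses the 16 kHz rate from this guide; the 16-bit sample width, mono layout, 20 ms chunk length, and function names are assumptions, since the wire format is not specified here.

```python
SAMPLE_RATE_HZ = 16_000  # from the guide
BYTES_PER_SAMPLE = 2     # assumption: 16-bit samples
CHANNELS = 1             # assumption: mono
CHUNK_MS = 20            # assumption: 20 ms of audio per message


def chunk_size_bytes(chunk_ms=CHUNK_MS):
    """Return the number of bytes in one chunk of the given duration."""
    samples = SAMPLE_RATE_HZ * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def iter_pcm_chunks(pcm, chunk_ms=CHUNK_MS):
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(chunk_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

Under these assumptions a 20 ms chunk is 640 bytes, and each chunk would go out as one binary WebSocket message. Video frames would need their own framing, which this sketch does not cover.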

Receive construct activations

The clinician’s private channel emits rdoc.activation events with feature contributions and confidence.
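A clinician-side client might filter and summarize these events as in the sketch below. Only the event name rdoc.activation and the presence of feature contributions and a confidence value come from this guide. The JSON keys `type`, `construct`, `confidence`, and `contributions`, and the confidence threshold, are hypothetical.

```python
import json

MIN_CONFIDENCE = 0.6  # hypothetical display threshold


def parse_activation(message):
    """Return a summary of an rdoc.activation event, or None to ignore it.

    The payload shape is an assumption, not a documented schema.
    """
    event = json.loads(message)
    if event.get("type") != "rdoc.activation":
        return None
    if event.get("confidence", 0.0) < MIN_CONFIDENCE:
        return None
    contributions = event.get("contributions", {})
    # Keep the three features with the largest absolute contribution.
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]
    return {
        "construct": event.get("construct"),
        "confidence": event["confidence"],
        "top_features": top,
    }
```

Ranking by absolute value keeps features that push an activation down as visible as those that push it up.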

Close the session

POST /api/therapy-sessions/stop finalizes the session. The Co-Therapy Agent drafts the session note with construct-anchored claims.
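Because the stop call is what finalizes the session, it can help to guarantee it runs even when streaming fails. The sketch below wraps the start and stop endpoints from this guide in a context manager. The `post` callable, the request body keys, and the idea that stop identifies the session by `session_id` are assumptions.

```python
from contextlib import contextmanager


@contextmanager
def therapy_session(post, participants, modality):
    """Start a session and always stop it on exit.

    `post(path, payload)` is a caller-supplied function (hypothetical) that
    sends a JSON POST and returns the decoded response as a dict.
    """
    started = post(
        "/api/therapy-sessions/start",
        {"participants": participants, "modality": modality},
    )
    session_id = started["session_id"]
    try:
        yield session_id
    finally:
        # Assumption: the stop endpoint takes the session_id in its body.
        post("/api/therapy-sessions/stop", {"session_id": session_id})
```

Used as `with therapy_session(post, participants, modality) as session_id:`, the stop request is sent whether the body of the block completes or raises.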

Related

Co-Therapy Agent

Agent-level documentation.

Facial Engine

Physiological signals from video.

Vocal Engine

Acoustic features from audio.

Neurobehavioral Constructs

15 documented constructs.
