EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services

University of Virginia
AAAI 2026
[Figure: EgoEMS overview]
Overview. High-fidelity egocentric multimodal EMS dataset with rich annotations to advance cognitive assistance research.

Highlights

  • 20+ hours of synchronized multimodal recordings
  • 233 simulated emergency scenarios
  • 62 participants (46 EMS professionals)
  • Modalities & annotations: egocentric video, audio, and IMU; keysteps, diarized transcripts, CPR metrics, bounding boxes, and segmentation masks

Abstract

Emergency Medical Services (EMS) are critical to patient survival in emergencies, but first responders often face intense cognitive demands in high-stakes situations. AI cognitive assistants, acting as virtual partners, have the potential to ease this burden by supporting real-time data collection and decision making. In pursuit of this vision, we introduce EgoEMS, the first end-to-end, high-fidelity, multimodal, multiperson dataset capturing over 20 hours of realistic, procedural EMS activities from an egocentric view in 233 simulated emergency scenarios performed by 62 participants, including 46 EMS professionals. Developed in collaboration with EMS experts and aligned with national standards, EgoEMS is captured using an open-source, low-cost, and replicable data collection system and is annotated with keysteps, timestamped audio transcripts with speaker diarization, action quality metrics, and bounding boxes with segmentation masks. Emphasizing realism, the dataset includes responder-patient interactions reflecting real-world emergency dynamics. We also present a suite of benchmarks for real-time multimodal keystep recognition and action quality estimation, essential for developing AI support tools for EMS. We hope EgoEMS inspires the research community to push the boundaries of intelligent EMS systems and ultimately contribute to improved patient outcomes.

Benchmarks

[Figure: EgoEMS benchmarks]
Tasks. Keystep classification, keystep segmentation, and CPR quality estimation.


Keystep Classification

Classify the ongoing procedural keystep from synchronized video, audio, and IMU input.
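For intuition only, the sketch below shows one way a late-fusion baseline could map per-modality clip features to keystep logits. The architecture, feature dimensions, and number of keystep classes are all placeholder assumptions, not the baselines evaluated in the paper.

```python
import torch
import torch.nn as nn

class LateFusionKeystepClassifier(nn.Module):
    """Toy late-fusion classifier: one linear projection per modality,
    concatenated and mapped to keystep logits. All sizes are placeholders."""
    def __init__(self, video_dim=768, audio_dim=512, imu_dim=64,
                 hidden_dim=256, num_keysteps=30):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.imu_proj = nn.Linear(imu_dim, hidden_dim)
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden_dim, num_keysteps),
        )

    def forward(self, video_feat, audio_feat, imu_feat):
        # Project each modality to a shared width, then classify the concat.
        fused = torch.cat([
            self.video_proj(video_feat),
            self.audio_proj(audio_feat),
            self.imu_proj(imu_feat),
        ], dim=-1)
        return self.head(fused)

# Example: a batch of 4 clip-level feature vectors (random stand-ins).
model = LateFusionKeystepClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 30])
```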

Keystep Segmentation

Detect transitions between steps over time.
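As a toy illustration of turning frame-level predictions into step boundaries, the heuristic below merges per-frame argmax labels into segments and absorbs very short runs to suppress flicker. It is a post-processing sketch, not the segmentation models benchmarked in the paper, and min_len is an arbitrary assumption.

```python
import numpy as np

def segments_from_frame_probs(frame_probs, min_len=8):
    """Turn per-frame keystep probabilities (T x C) into (start, end, label)
    segments: take the argmax per frame, merge constant runs, and absorb
    runs shorter than min_len frames into the preceding segment."""
    labels = frame_probs.argmax(axis=1)
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            if segments and t - start < min_len:
                # Too short to be a real step: extend the previous segment.
                segments[-1] = (segments[-1][0], t, segments[-1][2])
            else:
                segments.append((start, t, int(labels[start])))
            start = t
    return segments

# Example with random scores over 100 frames and 5 hypothetical keysteps.
rng = np.random.default_rng(0)
print(segments_from_frame_probs(rng.random((100, 5))))
```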

CPR Quality Estimation

Estimate compression rate and depth from smartwatch IMU and video.
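One common signal-processing route to the rate half of this task is spectral peak picking on the accelerometer signal. The sketch below assumes a vertical-axis acceleration trace and a plausible 1-3 Hz (60-180 compressions per minute) search band; it is an illustration, not the method used in the paper's benchmarks.

```python
import numpy as np

def compression_rate_from_imu(accel_z, fs_hz):
    """Estimate CPR compression rate (compressions per minute) as the
    dominant frequency of the vertical accelerometer signal, searched
    in an assumed CPR band of 1-3 Hz (60-180 cpm)."""
    sig = accel_z - np.mean(accel_z)           # remove gravity/DC offset
    spectrum = np.abs(np.fft.rfft(sig))
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs_hz)
    band = (freqs >= 1.0) & (freqs <= 3.0)     # restrict to CPR-like rates
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0

# Example: 10 s of synthetic 50 Hz IMU data with ~110 cpm compressions.
fs = 50
t = np.arange(0, 10, 1 / fs)
accel = np.sin(2 * np.pi * (110 / 60) * t) + 0.2 * np.random.randn(len(t))
print(compression_rate_from_imu(accel, fs))  # ~110, up to FFT resolution
```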

Data Access

The full dataset will be hosted publicly. Links will appear here when released.

  • Harvard Dataverse — coming soon
  • Alternate hosting — TBD

Data Collection System

[Figure: Data collection system]
Open-source, low-cost, and replicable system for synchronized egocentric multimodal data capture.
  • Synchronized egocentric video, audio, and smartwatch IMU (a toy alignment sketch follows below)
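The exact release format of the synchronized streams is not specified here. Assuming each stream carries timestamps on a shared clock, a nearest-timestamp lookup like the sketch below is one simple way to pair IMU samples with video frames; the function name and stream layout are hypothetical.

```python
import numpy as np

def align_imu_to_frames(frame_ts, imu_ts, imu_samples):
    """Assign each video frame the temporally nearest IMU sample,
    assuming both timestamp arrays are sorted and share one clock."""
    idx = np.searchsorted(imu_ts, frame_ts)
    idx = np.clip(idx, 1, len(imu_ts) - 1)
    # Pick whichever neighbor (left or right) is closer in time.
    left_closer = (frame_ts - imu_ts[idx - 1]) < (imu_ts[idx] - frame_ts)
    nearest = np.where(left_closer, idx - 1, idx)
    return imu_samples[nearest]

# Example: 30 fps frames and 50 Hz IMU over one second.
frame_ts = np.arange(0, 1, 1 / 30)
imu_ts = np.arange(0, 1, 1 / 50)
imu = np.random.randn(len(imu_ts), 6)   # e.g., 3-axis accel + 3-axis gyro
print(align_imu_to_frames(frame_ts, imu_ts, imu).shape)  # (30, 6)
```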

Preprint

BibTeX