Purpose: Robot-assisted minimally invasive surgery (RMIS) research increasingly demands multimodal data, yet access to proprietary surgical robot telemetry remains a critical barrier. This work introduces MiDAS, an open-source, platform-agnostic system designed to enable time-synchronized, non-invasive multimodal data acquisition across diverse surgical platforms.
Methods: MiDAS integrates external sensors to capture electromagnetic (EM) hand tracking, surgical video, RGB-Depth (RGB-D) hand tracking, and foot pedal interactions, without requiring access to proprietary hardware interfaces. We validated the system on both the open-source Raven-II and the clinical da Vinci Xi, collecting datasets during peg transfer and hernia repair suturing tasks. We conducted correlation analysis to quantify how well external EM tracking approximates internal robot kinematics, and performed downstream gesture recognition experiments with modality ablation studies.
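As an illustration of the correlation analysis described above, the following minimal sketch computes per-axis Pearson correlations between EM-tracked and robot-reported tool positions; the file names, array layout, and resampling assumption are illustrative and not the MiDAS implementation.

```python
# Minimal sketch: per-axis correlation between external EM tracking and
# internal robot kinematics. Assumes both trajectories have already been
# resampled onto a common, time-synchronized clock.
import numpy as np
from scipy.stats import pearsonr

# (T, 3) arrays of tool-tip positions (x, y, z); hypothetical file names.
em_pos = np.load("em_positions.npy")
robot_pos = np.load("robot_positions.npy")

for axis, name in enumerate("xyz"):
    r, _ = pearsonr(em_pos[:, axis], robot_pos[:, axis])
    print(f"Pearson r ({name}-axis): {r:.3f}")
```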
Results: Correlation analysis confirms that EM hand tracking closely approximates robot kinematics for both positional and rotational trajectories. Downstream gesture recognition experiments demonstrate that non-invasive motion signals (EM tracking) achieve performance comparable to that obtained with proprietary robot kinematics. Furthermore, visual streams benefit significantly from domain-adaptive and self-supervised pretraining strategies.
Conclusion: MiDAS enables accurate, extensible, and reproducible multimodal data collection for surgical robotics research across both open and commercial platforms. The system successfully lowers barriers to data-driven learning in RMIS by providing a non-invasive alternative to proprietary data access.
Dataset & code available at https://uva-dsa.github.io/MiDAS/.