✨MICCAI2025✨
Towards Holistic Surgical Scene Graph

Jongmin Shin*¹, Enki Cho*², Ka Young Kim*²,
Jung Yong Kim¹, Seong Tae Kim†², Namkee Oh†¹
¹Department of Surgery, Samsung Medical Center, Seoul 06351, Republic of Korea
²Kyung Hee University, Yongin 17104, Republic of Korea

Abstract

Surgical scene understanding is crucial for computer-assisted intervention systems, requiring visual comprehension of surgical scenes that involves diverse elements such as surgical tools, anatomical structures, and their interactions. To effectively represent the complex information in surgical scenes, graph-based approaches have been explored to structurally model surgical entities and their relationships. However, aspects such as tool–action–target combinations and the identity of the operating hand remain underexplored. To address this, we propose Endoscapes-SG201, a new dataset including annotations for action triplets (tool–action–target) and hand identity. We also introduce SSG-Com, a graph-based method designed to represent these critical elements. Experiments on downstream tasks—Critical View of Safety (CVS) assessment and action triplet recognition—demonstrate the importance of integrating these scene graph components, significantly advancing holistic surgical scene understanding.

Main Contributions

Endoscapes-SG201

Construction

We built Endoscapes-SG201, a dataset for holistic scene graph research, by extending and refining the publicly available Endoscapes-Bbox201 dataset released by CAMMA. As a basis for the additional labels, two clinical experts from Samsung Medical Center first refined the bounding boxes in Endoscapes-Bbox201.

  • Step 1: We refined Bounding Boxes from Endoscapes-Bbox201
  • Step 2: We subdivided the 'Tool' class into 6 classes
  • Step 3: We annotated Action labels (tool–structure interactions) and Hand Identity labels (which hand manipulates each tool)

Dataset Comparison

This table contrasts the datasets used in previous surgical scene graph studies with Endoscapes-SG201.

  • Endoscapes-SG201 is designed with holistic scene graph research in mind.
  • It incorporates:
    • Diverse tools and anatomical structures as graph nodes.
    • Diverse relationships as graph edges.
    • Hand Identity labels as attributes of the tool nodes.
  • By unifying these elements, the dataset provides a more expressive and comprehensive foundation for modeling surgical scenes.

Endoscapes-SG201 Details

This table presents the category-wise distribution of the additional labels introduced in Endoscapes-SG201.

Additional Annotations:

  • 6 Surgical Instruments: Hook (HK), Grasper (GP), Clipper (CL), Bipolar (BP), Irrigator (IG), Scissors (SC)
  • 6 Surgical Actions: Dissect (Dis.), Retract (Ret.), Grasp (Gr.), Clip (Cl.), Coagulate (Co.), Null
  • 3 Hand Identities: Operator’s Right Hand (Rt), Operator’s Left Hand (Lt), Assistant’s Hand (Assi)
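
The label sets above, combined with the node, edge, and attribute design described earlier, can be pictured as a small data structure. The following is a minimal sketch, not the dataset's actual file format: the class names (`ToolNode`, `ActionEdge`, `SceneGraph`), the field names, and the anatomy strings in the example (`"gallbladder"`, `"cystic_duct"`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Label vocabularies as listed above.
TOOLS = ("Hook", "Grasper", "Clipper", "Bipolar", "Irrigator", "Scissors")
ACTIONS = ("Dissect", "Retract", "Grasp", "Clip", "Coagulate", "Null")
HANDS = ("Rt", "Lt", "Assi")


@dataclass
class ToolNode:
    tool: str  # one of TOOLS
    hand: str  # one of HANDS, stored as an attribute of the tool node

    def __post_init__(self):
        if self.tool not in TOOLS:
            raise ValueError(f"unknown tool: {self.tool}")
        if self.hand not in HANDS:
            raise ValueError(f"unknown hand identity: {self.hand}")


@dataclass
class ActionEdge:
    tool_index: int  # index into SceneGraph.tools
    action: str      # one of ACTIONS
    target: str      # anatomical structure label (free text in this sketch)

    def __post_init__(self):
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


@dataclass
class SceneGraph:
    tools: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def triplets(self):
        """Return one (tool, action, target, hand) tuple per action edge."""
        result = []
        for edge in self.edges:
            node = self.tools[edge.tool_index]
            result.append((node.tool, edge.action, edge.target, node.hand))
        return result


# Hypothetical frame: the left hand retracts while the right hand dissects.
graph = SceneGraph(
    tools=[ToolNode("Grasper", "Lt"), ToolNode("Hook", "Rt")],
    edges=[
        ActionEdge(0, "Retract", "gallbladder"),
        ActionEdge(1, "Dissect", "cystic_duct"),
    ],
)
print(graph.triplets())
```

Keeping hand identity on the tool node rather than on the edge mirrors the description above, where Hand Identity labels are attributes of the tool nodes.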

SSG-Com

SSG-Com Overall Architecture

SSG-Com is designed to leverage the diverse labels of Endoscapes-SG201.

  1. Graph Construction
    Nodes: Surgical instruments (with Hand identity), Anatomical structures
    Edges: Spatial relations, Surgical action relations
  2. Multi-task Training (3 classifiers)
    Classifier 1: Spatial relation classification
    Classifier 2: Action relation classification
    Classifier 3: Hand identity classification
    Total Loss: \[ L_{\text{total}} = L_{\text{LG}} + \lambda_{\text{action}} L_{\text{action}} + \lambda_{\text{hand}} L_{\text{hand}} \]
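
As a rough illustration of how the total loss combines its three terms, the sketch below computes a weighted sum in plain Python. Several things here are assumptions rather than details from the paper: softmax cross-entropy is used for the action and hand terms, the weights `lambda_action` and `lambda_hand` default to placeholder values of 1.0, and the L_LG term is treated as an opaque scalar supplied by the caller.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single sample (log-sum-exp for stability)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def total_loss(l_lg, l_action, l_hand, lambda_action=1.0, lambda_hand=1.0):
    """Weighted sum: L_total = L_LG + lambda_action * L_action + lambda_hand * L_hand."""
    return l_lg + lambda_action * l_action + lambda_hand * l_hand


# Hypothetical logits: 6 action classes and 3 hand identities.
action_logits = [2.0, 0.1, -1.0, 0.3, 0.0, -0.5]
hand_logits = [0.2, 1.5, -0.3]

l_action = cross_entropy(action_logits, target=0)  # true action: index 0
l_hand = cross_entropy(hand_logits, target=1)      # true hand: index 1
l_lg = 0.8                                         # placeholder value for L_LG

print(total_loss(l_lg, l_action, l_hand, lambda_action=0.5, lambda_hand=0.5))
```

The two weights control how strongly the action and hand classifiers influence training relative to the L_LG term; the values shown are illustrative only.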

Experimental Results

The latent graph of SSG-Com demonstrated its effectiveness across two downstream tasks.

  • Action Triplet Recognition
  • CVS Prediction

Quantitative Results

In Action Triplet Recognition (a):

  • Modeling action relations as graph edges between nodes improved performance from 18.0 mAP (LG-CVS) to 23.5 mAP.
  • Further incorporating Hand Identity increased performance to 24.2 mAP.

In CVS Prediction (b):

  • Using Endoscapes-SG201 improved the performance of LG-CVS by 0.9 mAP, and SSG-Com achieved the highest score of 64.6 mAP.
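
Both tasks are reported in mAP (mean average precision). For readers unfamiliar with the metric, here is a minimal sketch of one common definition: per-class average precision computed from ranked scores, then averaged over classes. The paper's exact evaluation protocol may differ, and the function names and toy data are hypothetical. The figures above appear to be on a 0 to 100 scale, whereas this sketch returns values between 0 and 1.

```python
import math


def average_precision(scores, labels):
    """AP for one class: mean of the precision measured at each positive sample,
    with samples ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        return float("nan")  # class has no positive samples
    return sum(precisions) / len(precisions)


def mean_average_precision(scores_by_class, labels_by_class):
    """Average the per-class AP, skipping classes without positives."""
    aps = [
        average_precision(scores, labels)
        for scores, labels in zip(scores_by_class, labels_by_class)
    ]
    valid = [ap for ap in aps if not math.isnan(ap)]
    return sum(valid) / len(valid)


# Toy example with two classes and four samples.
scores_by_class = [[0.9, 0.8, 0.7, 0.6], [0.9, 0.1, 0.2, 0.3]]
labels_by_class = [[1, 0, 1, 0], [1, 0, 0, 0]]
print(mean_average_precision(scores_by_class, labels_by_class))
```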

Qualitative Results

By employing Endoscapes-SG201 and SSG-Com, we construct a richer holistic surgical scene graph than existing approaches.

Collaborations

The authors thank Ms. Haeun Kim, M.F.A., for her professional assistance with the illustrations in this work.