Abstract
Main Contributions

Endoscapes-SG201

We built Endoscapes-SG201, a dataset for holistic scene graph research, by extending and refining the publicly available Endoscapes-Bbox201 dataset released by CAMMA. As the basis for the additional labels, two clinical experts from Samsung Medical Center first refined the bounding boxes in Endoscapes-Bbox201. The annotation proceeded in three steps:
- Step 1: We refined the bounding boxes from Endoscapes-Bbox201.
- Step 2: We subdivided the 'Tool' class into 6 classes.
- Step 3: We annotated Action labels (tool–structure interactions) and Hand Identity labels (which hand manipulates each tool).
Dataset Comparison

This table contrasts the datasets used in previous surgical scene graph studies with Endoscapes-SG201.
- Endoscapes-SG201 is designed with holistic scene graph research in mind.
- It incorporates:
  - Diverse tools and anatomical structures as graph nodes.
  - Diverse relationships as graph edges.
  - Hand Identity labels as attributes of the tool nodes.
- By unifying these elements, the dataset provides a more expressive and comprehensive foundation for modeling surgical scenes.
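To make the node, edge, and attribute roles above concrete, here is a minimal Python sketch of one frame's scene graph. It is an illustration only, not the dataset's actual file format: the class names (`Node`, `Edge`, `SceneGraph`), their fields, and the "Gallbladder" anatomy label are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A graph node: a surgical tool or an anatomical structure."""
    node_id: int
    category: str                 # e.g. "Hook", or an anatomy class name
    is_tool: bool
    hand: Optional[str] = None    # Hand Identity attribute, tool nodes only


@dataclass
class Edge:
    """A relationship between two nodes, stored by node index."""
    source: int
    target: int
    relation: str                 # e.g. an action such as "Dissect"


@dataclass
class SceneGraph:
    """Hypothetical container for one annotated frame."""
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, category: str, is_tool: bool,
                 hand: Optional[str] = None) -> int:
        # Hand Identity is an attribute of tool nodes, so reject it elsewhere.
        if hand is not None and not is_tool:
            raise ValueError("Hand Identity applies to tool nodes only")
        node = Node(len(self.nodes), category, is_tool, hand)
        self.nodes.append(node)
        return node.node_id

    def add_edge(self, source: int, target: int, relation: str) -> None:
        # Both endpoints must already exist in the graph.
        n = len(self.nodes)
        if not (0 <= source < n and 0 <= target < n):
            raise IndexError("edge endpoint is not a node in this graph")
        self.edges.append(Edge(source, target, relation))


# Example frame: a hook held in the operator's right hand dissects a structure.
graph = SceneGraph()
hook = graph.add_node("Hook", is_tool=True, hand="Rt")
anatomy = graph.add_node("Gallbladder", is_tool=False)  # illustrative label
graph.add_edge(hook, anatomy, "Dissect")
```

Storing Hand Identity on the node rather than on the edge mirrors the description above, where it is an attribute of the tool and not a relationship.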
Endoscapes-SG201 Details

This table presents the category-wise distribution of the additional labels introduced in Endoscapes-SG201.
Additional Annotations:
- 6 Surgical Instruments: Hook (HK), Grasper (GP), Clipper (CL), Bipolar (BP), Irrigator (IG), Scissors (SC)
- 6 Surgical Actions: Dissect (Dis.), Retract (Ret.), Grasp (Gr.), Clip (Cl.), Coagulate (Co.), Null
- 3 Hand Identities: Operator’s Right Hand (Rt), Operator’s Left Hand (Lt), Assistant’s Hand (Assi)
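The three label vocabularies above can be turned into integer class indices for training. The sketch below shows one way to do that in Python using the abbreviations listed here. The index order and the helper names `encode` and `decode` are assumptions for illustration and may not match the released annotation files.

```python
# Vocabularies keyed by the abbreviations used on this page.
# The ordering (and therefore each class index) is an assumption.
INSTRUMENTS = ("HK", "GP", "CL", "BP", "IG", "SC")
ACTIONS = ("Dis.", "Ret.", "Gr.", "Cl.", "Co.", "Null")
HANDS = ("Rt", "Lt", "Assi")


def encode(instrument, action, hand):
    """Map abbreviations to (instrument_idx, action_idx, hand_idx)."""
    try:
        return (
            INSTRUMENTS.index(instrument),
            ACTIONS.index(action),
            HANDS.index(hand),
        )
    except ValueError as exc:
        labels = (instrument, action, hand)
        raise ValueError(f"unknown label in {labels!r}") from exc


def decode(indices):
    """Inverse of encode: map class indices back to abbreviations."""
    i, a, h = indices
    return (INSTRUMENTS[i], ACTIONS[a], HANDS[h])


# A grasper in the assistant's hand that is retracting tissue.
example = encode("GP", "Ret.", "Assi")
```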
SSG-Com

SSG-Com is designed to leverage the diverse labels of Endoscapes-SG201.
- Graph Construction
  - Nodes: Surgical instruments (with Hand identity), Anatomical structures
  - Edges: Spatial relations, Surgical action relations
- Multi-task Training (3 classifiers)
  - Classifier 1: Spatial relation classification
  - Classifier 2: Action relation classification
  - Classifier 3: Hand identity classification

Total Loss:
\[ L_{\text{total}} = L_{\text{LG}} + \lambda_{\text{action}} L_{\text{action}} + \lambda_{\text{hand}} L_{\text{hand}} \]
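
As a minimal sketch of how the total loss combines its three terms, the Python below computes the weighted sum for a single example. Several things here are assumptions rather than details stated on this page: the action and hand terms are modelled as softmax cross-entropy, the weights default to 1.0 as placeholders, and L_LG is treated as an opaque scalar supplied by the caller.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed stably from raw logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def total_loss(loss_lg, loss_action, loss_hand,
               lambda_action=1.0, lambda_hand=1.0):
    """L_total = L_LG + lambda_action * L_action + lambda_hand * L_hand."""
    return loss_lg + lambda_action * loss_action + lambda_hand * loss_hand


# Toy example: 6 action classes and 3 hand classes, as in Endoscapes-SG201.
action_loss = cross_entropy([2.0, 0.5, 0.1, 0.0, 0.0, -1.0], target=0)
hand_loss = cross_entropy([0.2, 1.5, -0.3], target=1)
loss = total_loss(loss_lg=0.8, loss_action=action_loss, loss_hand=hand_loss,
                  lambda_action=0.5, lambda_hand=0.5)  # placeholder weights
```

Setting either weight to 0 removes that head's contribution, which is a simple way to ablate the action or hand term in this sketch.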
Experimental Results
The latent graph of SSG-Com demonstrated its effectiveness across two downstream tasks.
- Action Triplet Recognition
- CVS prediction
Quantitative Results

In Action Triplet Recognition (a):
- Modeling action relations as graph edges between nodes improved performance from 18.0 mAP (LG-CVS) to 23.5 mAP.
- Further incorporating Hand Identity increased performance to 24.2 mAP.
In CVS Prediction (b):
- Training on Endoscapes-SG201 improved the performance of LG-CVS by 0.9 mAP, and SSG-Com achieved the highest score of 64.6 mAP.
Qualitative Results


The authors thank Ms. Haeun Kim, M.F.A., for her professional assistance with the illustrations in this work.