✨MICCAI 2025✨
SurgX: Neuron-Concept Association for Explainable Surgical Phase Recognition

Ka Young Kim, Hyeon Bae Kim, Seong Tae Kim†
Kyung Hee University, Yongin, Republic of Korea

Abstract

Surgical phase recognition is central to workflow analysis, enabling applications such as monitoring, skill assessment, and process optimization. However, the underlying deep models are often black boxes, limiting interpretability and trust. SurgX is a concept-driven explanation framework that associates neurons with human-interpretable surgical concepts. We construct cholecystectomy-specific concept sets, identify representative neuron activation sequences, and annotate neurons with concepts. Evaluated on TeCNO and Causal ASFormer using Cholec80, SurgX yields meaningful explanations and improves transparency in surgical AI.

Main Contributions

  • Introduce SurgX, a concept-based explanation framework for surgical phase recognition.
  • Develop cholecystectomy-tailored concept sets and conduct ablation studies on best practices for concept selection, including how to choose representative frames and representative sequences for neuron–concept annotation.
  • Validate SurgX on Causal ASFormer and TeCNO, showing consistent concept–neuron associations that enhance interpretability.

SurgX STEP 1. Neuron–Concept Annotation

[Figure: overall pipeline]
The pipeline for annotating neurons with concepts proceeds in three stages:
    A. Neuron Representative Sequence Selection – Select representative activation sequences for each neuron.
    B. Concept Set Selection – Choose among three concept sets; ChoLec-270 performs best in our study.
    C. Neuron–Concept Association – Match neuron sequences with concepts via similarity in a surgical VLM space.
Details of each stage are provided below.

A. Neuron Representative Sequence Selection

[Figure: representative sequence selection]
Given a trained temporal phase recognizer (e.g., Causal ASFormer or TeCNO), we first select frames that yield high activations in the penultimate layer. Because temporal models respond to sequences rather than single frames, we extend each selected frame with its preceding frames to form a representative sequence. Ablation studies are summarized below.
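
The selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the penultimate-layer activations for one video are available as a NumPy array of shape (num_frames, num_neurons), and the function name and the parameters `top_k` and `seq_len` are hypothetical.

```python
import numpy as np

def select_representative_sequences(activations, neuron, top_k=3, seq_len=4):
    """Return (start, end) frame ranges for one neuron; end is exclusive.

    activations: array of shape (num_frames, num_neurons) with
    penultimate-layer activations (assumed layout, not from the paper).
    """
    scores = activations[:, neuron]
    # Indices of the top_k most strongly activating frames, highest first.
    top_frames = np.argsort(scores)[::-1][:top_k]
    sequences = []
    for t in top_frames:
        # Extend each selected frame with its preceding frames only.
        start = max(0, int(t) - seq_len + 1)
        sequences.append((start, int(t) + 1))
    return sequences

# Toy activations: 5 frames, 2 neurons.
acts = np.array([[0.10, 0.9],
                 [0.20, 0.1],
                 [0.80, 0.3],
                 [0.40, 0.2],
                 [0.95, 0.0]])
seqs = select_representative_sequences(acts, neuron=0, top_k=2, seq_len=3)
print(seqs)
```

Sequences near the start of a video are truncated rather than padded here; that choice is an assumption of this sketch.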

Ablation Study: Frame Selection

  • Concept Alignment: cosine similarity between the key neuron’s concepts and the ground-truth class.
  • Prediction Interpretability: cosine similarity between the key neuron’s concepts and the predicted class.
[Figure: frame selection ablation]
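
One way to compute the two metrics is sketched below. It assumes that concepts and class names have already been embedded as vectors in a shared text space, and that similarities are averaged over a key neuron's concepts; the averaging and all names in the code are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def concept_class_score(concept_embs, class_emb):
    """Mean cosine similarity between a neuron's concepts and one class.

    Averaging over concepts is an assumption made for illustration.
    """
    return float(np.mean([cosine(c, class_emb) for c in concept_embs]))

# Toy 2-D embeddings standing in for text-encoder outputs.
concept_embs = np.array([[1.0, 0.0], [1.0, 1.0]])
gt_class_emb = np.array([1.0, 0.0])    # ground-truth phase
pred_class_emb = np.array([0.0, 1.0])  # predicted phase

concept_alignment = concept_class_score(concept_embs, gt_class_emb)
prediction_interpretability = concept_class_score(concept_embs, pred_class_emb)
print(concept_alignment, prediction_interpretability)
```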

Ablation Study: Sequence Length

[Figure: sequence length ablation]

B. Concept Set Selection

[Figure: concept set selection]
Appropriate concept coverage is critical: if a neuron’s behavior is not representable by the concept set, reliable annotation is impossible. We therefore construct three cholecystectomy-related concept sets and compare them empirically.

Ablation Study: Concept Sets

[Figure: concept set ablation]

C. Neuron–Concept Association

[Figure: neuron–concept association]
Using the selected sequences and concept set, we compute the cosine similarity in a surgical VLM space (e.g., SurgVLP, PeskaVLP) between each neuron’s representative sequence and each concept text, and assign to each neuron the concepts with the highest similarity.
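
The matching step can be sketched as follows, assuming each neuron's representative sequences have already been encoded and pooled into a single vector, and each concept text into a vector of the same dimension. The embeddings here are toy values rather than SurgVLP or PeskaVLP outputs, and the function names and `top_k` are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def annotate_neurons(seq_emb, concept_emb, concepts, top_k=2):
    """Assign to each neuron the top_k concepts by cosine similarity.

    seq_emb: (num_neurons, d), one pooled sequence embedding per neuron.
    concept_emb: (num_concepts, d), one text embedding per concept.
    """
    sim = l2_normalize(seq_emb) @ l2_normalize(concept_emb).T
    # Sort each row by descending similarity and keep the first top_k columns.
    top = np.argsort(-sim, axis=1)[:, :top_k]
    return {n: [concepts[j] for j in row] for n, row in enumerate(top)}

concepts = ["gallbladder", "specimen bag", "extract the gallbladder into the bag"]
concept_emb = np.eye(3)                 # toy text embeddings
seq_emb = np.array([[0.9, 0.4, 0.1],    # neuron 0
                    [0.1, 0.2, 0.95]])  # neuron 1
annotations = annotate_neurons(seq_emb, concept_emb, concepts, top_k=2)
print(annotations)
```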

SurgX STEP 2. Model's Prediction Explanation

[Figure: model's prediction explanation]
The explanation process at test time proceeds as follows:
  • Input surgical frames are processed by the phase recognition model.
  • Neuron contributions to the final prediction are computed using the contribution formula.
  • Important neurons (e.g., Neuron 2, Neuron 128) are identified, and their annotated concepts (“extract the gallbladder into the bag”, “specimen bag”, “gallbladder”) explain the model’s decision.
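
The steps above can be sketched as follows. The contribution formula is not stated in the text, so this sketch assumes a common choice: with a final linear layer, the contribution of neuron i to class c is the weight W[c, i] times the neuron's activation h[i]. That assumption, the function name, and the toy values are all illustrative.

```python
import numpy as np

def explain_prediction(h, W, b, neuron_concepts, top_k=2):
    """Return the predicted class and the top contributing neurons.

    h: (num_neurons,) penultimate-layer activations for the current frame.
    W, b: weights (num_classes, num_neurons) and bias of the final linear layer.
    Contribution = weight * activation is an assumed formula.
    """
    logits = W @ h + b
    pred = int(np.argmax(logits))
    contrib = W[pred] * h  # per-neuron share of the predicted logit
    key = np.argsort(-contrib)[:top_k]
    explanation = [(int(i), float(contrib[i]), neuron_concepts.get(int(i), []))
                   for i in key]
    return pred, explanation

h = np.array([0.2, 1.0, 0.5])
W = np.array([[1.0, 0.1, 0.0],
              [0.0, 2.0, 1.0]])
b = np.zeros(2)
neuron_concepts = {1: ["specimen bag", "gallbladder"],
                   2: ["extract the gallbladder into the bag"]}
pred, explanation = explain_prediction(h, W, b, neuron_concepts)
print(pred, explanation)
```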

Qualitative Results – Explanation Examples
