Goals and procedure
The objective of MusteR-FM is to develop a reusable foundation model for minimally invasive surgical and endoscopic workflows. The model is intended to jointly capture two different clinical imaging domains: laparoscopic video, recorded with rigid optics and a wide field of view, and flexible gastrointestinal endoscopy in narrow luminal structures. The aim is a shared model basis that generalizes beyond individual datasets and types of intervention.
The project builds a multimodal spatio-temporal model architecture. Existing clinical video data and annotations are used to link visual information with medical-procedural descriptions, phase information and indicators of visibility quality. The model is thus meant to capture not only image content but also the temporal course of an intervention, and to infer cues about upcoming workflow transitions.
Evaluation is carried out using clinically validated laparoscopic data as well as endoscopic ESD videos from the participating clinical collaborations. In addition, publicly available datasets are included to investigate the robustness and transferability of the learned representations. The results are to be documented transparently and made usable for non-commercial research. Planned outputs include model and data cards, reproducible evaluation protocols and the open provision of suitable artefacts.
The resulting foundation model thus provides a basis for various downstream tasks such as workflow recognition, phase classification, prediction of next process steps, object detection, segmentation, image retrieval, quality assurance and training support. At the same time, MusteR-FM strengthens the connection between university-based AI research, clinical application and medical technology transfer in Bavaria.
Funded by
Bayerisches Staatsministerium für Wissenschaft und Kunst (Bavarian State Ministry of Science and the Arts), as part of the Bavarian AI Foundation Model Initiative.
Project page: https://www.ai-bay.eu/#