LANTERN-XGB interpretable multi-modal ML for NSCLC; external validation AUC ≈ 0.77

LANTERN-XGB: An Interpretable Multi-Modal Machine Learning for Improving Clinical Decision-Making in Lung Cancer

Dalfovo, D.; Sassorossi, C.; de Paolis, E.; Campanella, A.; Nachira, D.; Ciavarella, L.; Boldrini, L.; Troost, E. G. C.; Adany, R.; Farré, N.; Öztürk, E.; Minucci, A.; Trisolini, R.; Bria, E.; Löcke, S.; Margaritora, S.; Lococo, F.

Non-small cell lung cancer (NSCLC) remains the leading cause of cancer-related mortality globally. While multi-modal artificial intelligence (AI) models offer significant predictive potential, their translation into routine clinical practice is delayed by the “black box” nature of complex algorithms and the fragmentation of heterogeneous data. We present LANTERN-XGB, a hierarchical machine learning workflow designed to bridge this gap by generating interpretable “digital human avatars” for precision oncology. The methodology employs a multi-stage scalable tree boosting system (XGBoost) architecture utilizing Shapley additive explanations (SHAP) for rigorous hierarchical feature selection, missing value management, and patient-specific decision support. The workflow was developed and benchmarked using a retrospective cohort of 437 patients with clinical N0 NSCLC, followed by validation on a prospective dataset (n = 100) and an independent external dataset (n = 100). The pipeline integrates diverse data modalities to predict occult lymph node metastasis (OLM). LANTERN-XGB identified a robust consensus signature driven by non-linear interactions among CT textural fragmentation, PET metabolic heterogeneity, tumor density distribution, and systemic clinical modulators. Exploratory transcriptomic pathway analysis (GSVA) revealed that high-risk predictions strongly correlate with systemic molecular dysregulation, such as the enrichment of immune-inflammatory signaling and metabolic stress pathways. The model achieved robust discrimination in external validation (AUC ≈ 0.77), performing comparably to state-of-the-art nomogram benchmarks. Crucially, the LANTERN-XGB framework demonstrated superior utility in handling diagnostic ambiguity; local force plots allowed for the correct reclassification of “borderline” predictions by visualizing feature interactions that standard linear models fail to capture.
LANTERN-XGB provides a validated, open-source framework that successfully balances predictive power with clinical transparency. By empowering clinicians to visualize and verify the logic behind AI predictions, this workflow offers a pragmatic path for integrating reliable multi-modal avatars into daily medical decision-making.
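
The additive attribution idea behind SHAP, on which the local force plots mentioned in the abstract rely, can be illustrated with a minimal sketch. It is not the LANTERN-XGB pipeline: the feature names, the toy risk function, and the all-zero baseline are hypothetical, and the Shapley values are computed exactly by enumerating coalitions against a single baseline rather than with the tree-specific algorithms used in practice. The sketch only shows that the per-feature contributions sum to the difference between the patient's score and the baseline score, and that an interaction term is shared between the interacting features.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; NOT the signature reported in the paper.
FEATURES = ["ct_texture", "pet_heterogeneity", "tumor_density"]


def toy_risk(x):
    """Toy non-linear score with one interaction term (illustrative only)."""
    return (
        0.2 * x["ct_texture"]
        + 0.1 * x["tumor_density"]
        + 0.5 * x["ct_texture"] * x["pet_heterogeneity"]
    )


def shapley_values(model, patient, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(FEATURES)

    def value(coalition):
        x = {f: (patient[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(x)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


# Hypothetical patient and reference ("baseline") feature vectors.
patient = {"ct_texture": 1.0, "pet_heterogeneity": 1.0, "tumor_density": 0.5}
baseline = {"ct_texture": 0.0, "pet_heterogeneity": 0.0, "tumor_density": 0.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy case the interaction term (0.5) is split equally between the two interacting features, which is the kind of effect a purely linear model cannot represent.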

Keywords: multi-modal integration; artificial intelligence; precision oncology; lung cancer; radiogenomics

International Journal of Molecular Sciences 27(2026), 3128

Bautzner Landstraße 400, 01328 Dresden - Germany. Phone: +49 351 260 - 0. Email: kontakt@hzdr.de
