Research Article: Radiomics-machine learning model for predicting invasiveness of subcentimeter subsolid lung adenocarcinoma: a validation study with external cohort and SHAP interpretability
Abstract:
Preoperative discrimination of invasive adenocarcinoma (IAC) from pre-invasive lesions in subcentimeter subsolid nodules (SSNs) remains challenging using conventional computed tomography (CT). We aimed to develop and validate an interpretable radiomics-machine learning (ML) model for predicting invasiveness by leveraging SHapley Additive exPlanations (SHAP).
In this two-center retrospective study, 177 patients from Hospital 1 (training and internal validation) and 83 patients from Hospital 2 (independent external validation) with surgically confirmed lung adenocarcinoma manifesting as SSNs (≤1 cm) were enrolled. Radiomic features were extracted from preoperative CT using the uAI Research Portal. Following a reproducibility assessment (features with an intraclass correlation coefficient >0.75 were retained), minimum Redundancy Maximum Relevance (mRMR) and Least Absolute Shrinkage and Selection Operator (LASSO) regression were applied to select the most predictive features. Three ML classifiers, logistic regression (LR), random forest (RF), and support vector machine (SVM), were trained and internally validated on the Hospital 1 cohort using a 7:3 split, and the best-performing model was further evaluated in the external validation cohort. Model performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, F1 score, calibration, and decision curve analysis (DCA). SHAP analysis was used to provide global and local model interpretability.
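The evaluation metrics named above can be illustrated with a minimal sketch. It is not the study's code. The function names are illustrative, and the confusion-matrix counts are hypothetical: the study does not report them, and they were chosen only because they reproduce the internal-validation sensitivity, specificity, and F1 score quoted in the results. The AUC helper uses the standard rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), and the net-benefit helper uses the usual decision curve formula.

```python
def sensitivity(tp, fn):
    """Fraction of invasive lesions (positives) correctly flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of pre-invasive lesions (negatives) correctly cleared."""
    return tn / (tn + fp)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in count form."""
    return 2 * tp / (2 * tp + fp + fn)


def auc_by_ranking(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given threshold probability."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Hypothetical counts (NOT reported in the study), chosen to be
# consistent with the quoted internal-validation metrics.
TP, FN, TN, FP = 19, 5, 22, 8

print(round(sensitivity(TP, FN), 3))   # 0.792
print(round(specificity(TN, FP), 3))   # 0.733
print(round(f1_score(TP, FP, FN), 3))  # 0.745
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.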
Ten radiomic features were selected to predict invasiveness (IAC prevalence: 44.6%). The LR model showed the best performance during internal validation (AUC: 0.842; sensitivity: 79.2%; specificity: 73.3%; F1 score: 0.745) and exhibited superior generalizability compared to both the RF and SVM models. In the external validation cohort, the LR model maintained robust diagnostic performance, with an AUC of 0.778 (95% CI: 0.673–0.862), supporting its cross-institutional generalizability. Precision-recall curves further supported its stability across institutions, and decision curve analysis confirmed the model's superior clinical utility over empirical management strategies. SHAP analysis identified wavelet_HLL_glszm_LowGrayLevelZoneEmphasis (an indicator of necrosis), original_shape_Flatness (reflecting morphological irregularity), and log_firstorder_LoG.Minimum (suggestive of air-trapping) as the top predictors of invasiveness.
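For a logistic regression model, SHAP attributions have a simple closed form when features are treated as independent: on the log-odds scale, each feature's contribution is its coefficient times the feature's deviation from the background mean. The sketch below illustrates that idea only. The coefficients, intercept, and feature values are hypothetical and are not taken from the study. Only the three feature names come from the text, and the signs of the made-up coefficients say nothing about the fitted model.

```python
from statistics import fmean

# Feature names from the abstract; every number below is hypothetical.
FEATURES = [
    "wavelet_HLL_glszm_LowGrayLevelZoneEmphasis",
    "original_shape_Flatness",
    "log_firstorder_LoG.Minimum",
]
WEIGHTS = [0.8, -0.5, 0.3]   # assumed LR coefficients (log-odds scale)
INTERCEPT = -0.2             # assumed LR intercept


def log_odds(x):
    """Linear predictor of the logistic regression model."""
    return INTERCEPT + sum(w * xi for w, xi in zip(WEIGHTS, x))


def linear_shap(background, x):
    """SHAP values for a linear model under feature independence:
    phi_i = w_i * (x_i - mean of feature i over the background set)."""
    means = [fmean(column) for column in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(WEIGHTS, x, means)]


# Hypothetical standardized feature values for two background nodules
# and one nodule to be explained.
background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0]]
nodule = [2.0, 1.0, 3.0]

phi = linear_shap(background, nodule)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.2f}")

# Local accuracy: contributions sum to f(x) minus the average prediction.
baseline = fmean(log_odds(row) for row in background)
print(round(sum(phi), 6) == round(log_odds(nodule) - baseline, 6))
```

The per-nodule values give the local explanation. Averaging their absolute values over a cohort gives the kind of global feature ranking described above.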
The developed radiomics-LR model robustly predicts invasiveness in subcentimeter SSNs and provides biologically plausible explanations through SHAP. Its balanced performance and inherent interpretability support its potential integration into clinical workflow to aid in surgical decision-making.