Research Article: Development and validation of a machine learning model for predicting hypersplenism in Wilson disease patients
Abstract:
Wilson disease (WD) is a rare autosomal recessive copper metabolism disorder, with hypersplenism as a severe, common complication secondary to disease-related cirrhosis. Currently, there is a lack of precise early prediction tools for this complication. This study aimed to construct a hypersplenism prediction model for WD patients by integrating multidimensional clinical indicators and machine learning, providing references for early identification of high-risk individuals and personalized interventions.
A total of 524 WD patients were enrolled at the First Affiliated Hospital of Anhui University of Chinese Medicine from December 2019 to February 2025, including 244 with hypersplenism (HG group) and 280 without (non-HG group). Key variables were selected using LASSO regression, and multicollinearity among the selected variables was assessed using variance inflation factors (VIF). The predictive model was visualized as a nomogram. Five machine learning models were built, with 10-fold cross-validation used for parameter optimization. Finally, model performance was evaluated, and feature contributions were explained using the SHapley Additive exPlanations (SHAP) method.
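As an illustration of the 10-fold cross-validation step, the following minimal sketch splits a cohort into ten stratified folds so that each fold keeps roughly the same proportion of HG and non-HG patients. The function name `stratified_kfold`, the stratification itself, and the random seed are assumptions made for illustration; the abstract does not state how the folds were formed or how the training and test sets were divided, so the full cohort sizes are used here only as example input.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs with class proportions roughly preserved per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's samples round-robin so every fold gets a similar share.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)

    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val


# Group sizes taken from the abstract: 244 HG (label 1) and 280 non-HG (label 0).
# Using the whole cohort here is an illustrative assumption, not the study's split.
labels = [1] * 244 + [0] * 280
splits = list(stratified_kfold(labels, k=10))

for fold_no, (train_idx, val_idx) in enumerate(splits, start=1):
    n_hg = sum(labels[i] for i in val_idx)
    print(f"fold {fold_no}: train={len(train_idx)}, val={len(val_idx)}, HG in val={n_hg}")
```

In a parameter search, each candidate setting would be fitted on the training indices of every fold and scored on the corresponding validation indices, with the mean score across the ten folds used to choose among settings.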
Compared with the non-HG group, the HG group had significantly lower white blood cell count (WBC), platelet count (PLT), and ceruloplasmin (CER), and higher albumin/globulin ratio (A/G), procollagen III N-terminal peptide (PIIINP), type IV collagen (CIV), hyaluronic acid (HA), laminin (LN), and 24-h urinary copper (CUU) (all p < 0.05). Multivariate logistic regression showed that A/G, CIV, and PIIINP were independent risk factors, while WBC and PLT were independent protective factors. The SVM model performed best: training set AUC = 0.867 (95% CI: 0.830–0.904), accuracy = 0.807, specificity = 0.856, precision = 0.812, F1 score = 0.771; test set AUC = 0.771 (95% CI: 0.699–0.844), with AUC decay <10%. It also showed excellent calibration (Brier score = 0.146 in the training set and 0.206 in the test set) and clinical utility on decision curve analysis (DCA). SHAP analysis identified PIIINP as the core predictive feature, followed by WBC, PLT, and A/G, with CIV having a relatively weaker influence.
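For readers less familiar with the discrimination and calibration metrics reported here, the sketch below shows how AUC and the Brier score can be computed from predicted probabilities. The toy labels and probabilities are hypothetical and are not study data. The final lines assume that "AUC decay" means the absolute difference between training and test AUC; the abstract does not define the term, so this reading is an assumption.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy predictions (1 = hypersplenism, 0 = no hypersplenism).
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"toy AUC = {auc:.3f}, toy Brier score = {brier:.3f}")

# AUC values reported in the abstract; the decay definition is an assumption.
auc_train, auc_test = 0.867, 0.771
abs_decay = auc_train - auc_test
print(f"absolute AUC decay = {abs_decay:.3f}")
```

A lower Brier score indicates predicted probabilities that sit closer to the observed outcomes, whereas AUC reflects only how well the model ranks patients with hypersplenism above those without.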
The SVM-based predictive model exhibits superior discriminatory power, calibration accuracy, and clinical utility for hypersplenism in WD patients. The five key features (WBC, PLT, A/G, CIV, and PIIINP), with PIIINP as the core, provide an objective quantitative basis for risk stratification, facilitating early identification of high-risk patients and targeted intervention, and thereby improving WD prognosis.
Introduction:
Wilson disease (WD) is an autosomal recessive disorder of copper metabolism caused by mutations in the ATP7B gene (1). Dysfunction of the transmembrane copper transport ATPase encoded by this gene directly impairs the normal excretion of copper ions into bile or their binding to ceruloplasmin. This leads to progressive accumulation of copper in vital organs such as the liver and brain, triggering multisystem damage through mechanisms including oxidative stress and cytotoxicity (2). As the primary target organ…