Research Article: Associations between metabolic-inflammatory biomarkers and Helicobacter pylori infection: an interpretable machine learning prediction approach
Abstract:
This study investigated the association between metabolic-inflammatory markers and Helicobacter pylori (HP) infection using interpretable machine learning models, with a focus on the triglyceride-glucose (TyG) index, TyG/HDL-C ratio, and systemic inflammatory biomarkers.
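For readers unfamiliar with the index, the sketch below uses the commonly cited definition of the TyG index, ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2). The abstract does not state how the TyG/HDL-C ratio was computed. The helper `tyg_hdl_ratio` and its HDL-C unit are therefore assumptions made for illustration only, not the authors' confirmed method.

```python
import math


def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """TyG index: ln(TG [mg/dL] * fasting glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("concentrations must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)


def tyg_hdl_ratio(triglycerides_mg_dl: float, glucose_mg_dl: float, hdl_c: float) -> float:
    """Hypothetical helper: TyG index divided by HDL-C.

    Assumption: the ratio is a plain quotient. The HDL-C unit must match
    whatever the study used, which the abstract does not specify.
    """
    if hdl_c <= 0:
        raise ValueError("HDL-C must be positive")
    return tyg_index(triglycerides_mg_dl, glucose_mg_dl) / hdl_c


if __name__ == "__main__":
    # Illustrative values only, not taken from the study data.
    print(round(tyg_index(150.0, 100.0), 3))
    print(round(tyg_hdl_ratio(150.0, 100.0, 50.0), 4))
```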
Data from 2,924 NHANES participants and 1,021 patients from the Second Hospital of Jilin University were analyzed. Associations between metabolic-inflammatory markers and HP were assessed using multivariable regression. Eleven machine learning models were compared for predictive performance, evaluated by AUC, accuracy, sensitivity, specificity, precision, F1 score, and Kappa statistic. Interpretability was assessed via SHAP values, calibration plots, confusion matrices, and decision curve analysis.
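The threshold-based metrics listed above can all be derived from a single 2×2 confusion matrix. The following is a minimal sketch using the standard definitions; the function name `binary_metrics` and the example counts are hypothetical and are not drawn from the study. AUC is omitted because it needs predicted probabilities, not counts.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts for illustration.
    for name, value in binary_metrics(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.3f}")
```

The sketch assumes every denominator is non-zero; a production version would need to handle degenerate matrices, for example a model that never predicts the positive class.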
In NHANES, the TyG index was independently associated with HP infection (OR = 1.25, 95% CI 1.06–1.48, P = 0.009), and the TyG/HDL-C ratio remained significant after full adjustment (OR = 1.16, 95% CI 1.07–1.25, P < 0.001), while SIRI, IBI, and CRP lost significance. In the external Chinese cohort, the TyG association attenuated (P = 0.057), but higher TyG/HDL-C quartiles remained significant. Among the 11 algorithms, Random Forest (RF) and Gaussian Process (GP) achieved the highest AUCs on the training set (both 0.97) but dropped markedly on the validation set (both 0.75), indicating overfitting. In contrast, XGBoost (XGB) and the multilayer perceptron (MLP) had the same AUC in training and validation (0.77), reflecting better generalization. DeLong’s test indicated that both RF and XGB significantly outperformed baseline models (P < 0.001), while XGB demonstrated more stable validation performance. Decision curve and SHAP analyses supported the clinical relevance of XGB, highlighting race and age as the dominant contributors.
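As a consistency check on the reported odds ratios, the log-scale coefficient, its standard error, and an approximate two-sided P value can be recovered from an OR and its 95% CI. The sketch below assumes a Wald-type interval that is symmetric on the log scale with a normal reference distribution. Survey-weighted NHANES analyses often use design-based degrees of freedom instead, so the result is only an approximation, and `wald_from_or` is a hypothetical helper name.

```python
import math


def wald_from_or(or_point: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959964):
    """Recover (beta, se, z, p) from an odds ratio and its 95% CI.

    Assumes the CI is exp(beta +/- z_crit * se), i.e. a Wald interval.
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


if __name__ == "__main__":
    # Values as reported in the abstract for the NHANES cohort.
    for label, args in [("TyG", (1.25, 1.06, 1.48)),
                        ("TyG/HDL-C", (1.16, 1.07, 1.25))]:
        beta, se, z, p = wald_from_or(*args)
        print(f"{label}: beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.4f}")
```

Under these assumptions the TyG estimate yields a P value close to the reported 0.009, and the TyG/HDL-C estimate falls below 0.001, in line with the abstract.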
The TyG index and TyG/HDL-C ratio were independently associated with HP infection. Among machine learning models, XGBoost demonstrated the most stable and generalizable performance (AUC 0.77 in both training and validation), whereas RF and GP (AUC 0.97 → 0.75) exhibited overfitting. These results suggest that XGB provides a more reliable framework for infection risk prediction, though the cross-sectional design precludes causal inference.