Background: Periprosthetic joint infection (PJI) after hip arthroplasty for proximal femur fracture is a severe complication, and early postoperative identification remains challenging. This study developed and validated machine learning (ML) models for the early prediction of 90-day EBJIS 2021 “confirmed” PJI using routinely available perioperative data.

Methods: We performed a single-center retrospective study including 1182 consecutive adults undergoing primary hip arthroplasty for proximal femur fracture (2015–2022). Forty-seven perioperative candidate predictors were extracted, including early postoperative laboratory values (postoperative day 1–2 and maxima within 72 h). Six algorithms were trained and compared (logistic regression, random forest, support vector machine, multilayer perceptron, XGBoost, and stacking ensemble) using a stratified 80/20 training–test split with 10-fold cross-validation, grid-search hyperparameter tuning, and class weighting. A sensitivity-prioritizing classification threshold was derived using training data only and applied unchanged to evaluation cohorts. Uncertainty was estimated via 1000 bootstrap iterations. Calibration was assessed using the Brier score and calibration intercept/slope. Temporal validation was conducted in a same-center 2023 cohort (n = 147). Model predictions were explained using SHAP.

Results: EBJIS-confirmed 90-day PJI occurred in 58/1182 (4.9%) patients. In held-out testing, the final XGBoost model demonstrated good discrimination (AUC 0.889, 95% CI 0.804–0.960) with good overall calibration (Brier score 0.043). Using a prespecified sensitivity-prioritizing threshold selected in the training set, test-set sensitivity was 100%, specificity 58.5%, PPV 11.4%, and NPV 100%. The stacking ensemble yielded the highest discrimination (AUC 0.937; 95% CI 0.89–0.98). In temporal validation (same-center 2023 cohort; n = 147), model performance remained stable (AUC 0.892; sensitivity 85.7%; NPV 99.1% at the prespecified threshold). Calibration was favorable in the development cohort (Brier 0.041; intercept −0.04; slope 0.96) and in 2023 (Brier 0.038; intercept −0.06; slope 0.94). SHAP identified postoperative C-reactive protein, operative duration, body mass index, ASA class, and serum sodium as the most influential predictors.

Conclusions: ML models, particularly XGBoost, supported early postoperative risk stratification for 90-day EBJIS-confirmed PJI after fracture-related hip arthroplasty, with a consistently high NPV and stable calibration in a temporally independent same-center cohort. Prospective multi-center validation and impact evaluation are needed before clinical implementation.
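
Illustrative sketch (not the authors' code): the abstract describes deriving a sensitivity-prioritizing threshold on training data only and applying it unchanged to the evaluation cohorts. One plausible way to do this is shown below in Python with NumPy. The target sensitivity of 0.95, the function names `pick_threshold` and `classification_metrics`, and the synthetic scores are all assumptions made for illustration; the study's actual selection rule and predicted probabilities are not given in this record.

```python
import numpy as np


def pick_threshold(y_train, p_train, min_sensitivity=0.95):
    """Highest cutoff whose sensitivity on the training data meets the target.

    min_sensitivity=0.95 is an assumed target, not a value from the study.
    """
    y_train = np.asarray(y_train).astype(bool)
    p_train = np.asarray(p_train, dtype=float)
    pos_scores = np.sort(p_train[y_train])  # ascending scores of true cases
    # Number of true cases that may fall below the cutoff without
    # dropping sensitivity under the target.
    allowed_misses = int(np.floor((1.0 - min_sensitivity) * pos_scores.size + 1e-9))
    return float(pos_scores[allowed_misses])


def classification_metrics(y_true, p_pred, threshold):
    """Sensitivity, specificity, PPV and NPV, calling score >= threshold positive."""
    y_true = np.asarray(y_true).astype(bool)
    y_hat = np.asarray(p_pred, dtype=float) >= threshold
    tp = int(np.sum(y_hat & y_true))
    fp = int(np.sum(y_hat & ~y_true))
    fn = int(np.sum(~y_hat & y_true))
    tn = int(np.sum(~y_hat & ~y_true))

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Synthetic stand-in for model output; these are NOT study data.
rng = np.random.default_rng(0)


def simulate(n, prevalence=0.05):
    y = rng.random(n) < prevalence
    # Cases tend to receive higher scores than non-cases.
    p = np.clip(rng.normal(np.where(y, 0.6, 0.3), 0.15), 0.0, 1.0)
    return y, p


y_train, p_train = simulate(945)
y_test, p_test = simulate(237)

# The cutoff is chosen on training data only, then frozen.
threshold = pick_threshold(y_train, p_train)
train_metrics = classification_metrics(y_train, p_train, threshold)
test_metrics = classification_metrics(y_test, p_test, threshold)

print(f"threshold = {threshold:.3f}")
print("train:", train_metrics)
print("test: ", test_metrics)
```

Because the cutoff is fixed before the held-out data are scored, the test-set sensitivity, specificity, PPV and NPV are not optimistically tuned to that cohort. With a rare outcome, a low PPV alongside a high NPV is the expected pattern at a sensitivity-prioritizing cutoff.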
Early Prediction of 90-Day Periprosthetic Joint Infection After Hip Arthroplasty for Proximal Femur Fracture Using Machine Learning: Development and Temporal Validation of a Predictive Model / Biavardi, N. G.; Pezone, F.; Morlini, F.; Alessio-Mazzola, M.; Pace, V.; Antinolfi, P.; Placella, G.; Salini, V.. - In: JOURNAL OF CLINICAL MEDICINE. - ISSN 2077-0383. - 15:4(2026). [10.3390/jcm15041668]
Early Prediction of 90-Day Periprosthetic Joint Infection After Hip Arthroplasty for Proximal Femur Fracture Using Machine Learning: Development and Temporal Validation of a Predictive Model
Biavardi N. G.;Pezone F.;Placella G.;Salini V.
2026-01-01


