Decoding the renal-cochlear axis: explainable machine learning and phenotype clustering reveal high-risk hearing loss subtypes in CKD.

DOI	10.1080/0886022x.2026.2649658
Authors	Chen L, Wang J, Liu G, Zhao Y, Zhou Z, Li Q.
Journal	MED
Source	External record

This study develops a dual-level machine learning framework for risk stratification and phenotyping of hearing loss (HL) in patients with chronic kidney disease (CKD), using data from the National Health and Nutrition Examination Survey (NHANES). In a cohort of 3,402 CKD patients, univariate and multivariate logistic regression were used to select key predictors, which were then used to build predictive models with nine machine learning algorithms. The eXtreme Gradient Boosting (XGBoost) model performed best, with mean area under the curve (AUC) values of 0.984 (training), 0.984 (validation), and 0.939 (testing). SHapley Additive exPlanations (SHAP) analysis identified age as the predominant risk determinant. Gaussian mixture modeling (GMM) then clustered patients into two distinct subtypes: a low-risk subgroup (n = 1,075) with a 1.58% HL prevalence, and a high-risk subgroup (n = 2,316) characterized by older age, elevated blood urea nitrogen and bicarbonate levels, and a 48.2% HL prevalence. A classifier trained on these subtypes achieved near-perfect discrimination (AUC = 0.99974). A web-based clinical tool was also developed from the six most influential features. The findings establish a dual-level predictive framework that integrates explainable ML and unsupervised clustering for HL risk assessment in CKD. This approach provides a strategy for precision screening of high-risk subpopulations and supports integrating hearing assessments into routine CKD care.
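To make the AUC metric and the two-subtype Gaussian mixture step described above more concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the record does not state which libraries or settings were used, so the function names (`auc_rank`, `fit_gmm_1d`), the single synthetic "age" feature, and the crude initialisation are all assumptions made for illustration. The data are randomly generated and are not NHANES data.

```python
import math
import random


def auc_rank(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(xs, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sigmas), each ordered by increasing mean.
    """
    lo, hi = min(xs), max(xs)
    mus = [lo, hi]                      # assumed crude initialisation
    spread = (hi - lo) / 4.0 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    n = len(xs)
    for _ in range(n_iter):
        # E-step: responsibility of the second component for each point
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, mus[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, mus[1], sigmas[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weights, means and standard deviations
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break                       # one component collapsed; stop
        mu0 = sum((1 - r) * x for r, x in zip(resp, xs)) / n0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        var0 = sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1
        mus = [mu0, mu1]
        sigmas = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        weights = [n0 / n, n1 / n]
    order = sorted(range(2), key=lambda k: mus[k])
    return ([weights[k] for k in order],
            [mus[k] for k in order],
            [sigmas[k] for k in order])


# Synthetic ages for two hypothetical subgroups (illustration only).
rng = random.Random(0)
ages = ([rng.gauss(30, 5) for _ in range(300)]
        + [rng.gauss(70, 5) for _ in range(300)])
weights, means, sigmas = fit_gmm_1d(ages)
print("component means:", means)

# Toy AUC check: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_rank([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In practice a multi-feature mixture model and a gradient-boosted classifier would come from established libraries; the sketch only shows what the two reported quantities measure: how well scores rank cases above non-cases, and how a mixture model splits a cohort into components.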