We read with interest the article by Kulkarni et al., which highlights advances in EKG pattern recognition to help screen for and identify patients with early-stage chronic diseases, particularly type 2 diabetes (T2D).
We recognize the rationale behind the inclusion criteria of the DISFIN study, whose population has a high prevalence of insulin resistance and diabetes. However, it is important to note that the participants have uniformly low A1c values, and that the duration of exposure to hyperglycemia and the treatment regimens are unknown. These variables are essential to consider, since they have been described to have a direct relationship with micro- and macrovascular complications that could affect EKG features and, thus, the performance of this model.
Kulkarni et al. do not report the time since diagnosis of pre-diabetes or T2D, or the treatment regimen each patient is receiving. We wonder whether classifying patients by time since diagnosis, level of hyperglycemia, and treatment regimen could help us better understand the onset of, and the biological mechanisms behind, the EKG feature changes, and so better identify subjects across the full spectrum of hyperglycemia, from pre-diabetes to T2D.
Regarding the choice of ML technique, we noticed that the authors used K-Fold Cross-Validation to compare the six candidates. In our opinion, this may produce inconsistent and skewed class proportions across the “K” folds, mainly because the dataset is imbalanced. We would favor Stratified K-Fold Cross-Validation, an extension of regular K-Fold Cross-Validation. This technique avoids such inconsistencies by maintaining the class ratio of the data while generating the “K” subsets. Thus, each fold reflects the class distribution of the complete dataset. In addition, using the Synthetic Minority Oversampling Technique (SMOTE) may increase the overlap between classes and introduce additional noise. To address this, we suggest combining SMOTE with an undersampling technique, specifically Edited Nearest Neighbour (ENN), which removes samples whose class disagrees with that of most of their nearest neighbours (typically points near the class boundary), increasing the separation between classes and reducing possible bias.
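The difference between regular and stratified folds can be illustrated with a minimal sketch. It uses only the Python standard library, and everything in it is an assumption made for illustration: the function names (plain_kfold, stratified_kfold, minority_share) are hypothetical, and the 90/10 label vector is invented and is not the DISFIN class distribution. In practice a library routine such as scikit-learn's StratifiedKFold would normally be used instead.

```python
import random
from collections import defaultdict


def plain_kfold(labels, k, seed=0):
    """Shuffle all indices and cut them into k folds, ignoring class labels."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stratified_kfold(labels, k, seed=0):
    """Deal each class's shuffled indices round-robin across the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(j + offset) % k].append(i)
        offset += len(members)
    return folds


def minority_share(labels, fold, minority=1):
    """Fraction of samples in a fold that belong to the minority class."""
    return sum(labels[i] == minority for i in fold) / len(fold)


# Hypothetical imbalanced labels: 90 majority (0) and 10 minority (1) samples.
labels = [0] * 90 + [1] * 10

for name, splitter in [("plain", plain_kfold), ("stratified", stratified_kfold)]:
    shares = [minority_share(labels, fold) for fold in splitter(labels, 5)]
    print(name, [round(s, 2) for s in shares])
```

With these invented labels, every stratified fold holds exactly 10% minority samples, whereas the minority share of the plain folds depends on the shuffle and can drift away from 10%.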
1. Kulkarni AR, Patel AA, Pipal KV, et al. Machine-learning algorithm to non-invasively detect diabetes and pre-diabetes from electrocardiogram. BMJ Innovations 2023;9:32-42.
2. Stratton IM, Adler AI, Neil HA, et al. Association of glycaemia with macrovascular and microvascular complications of type 2 diabetes (UKPDS 35): prospective observational study. BMJ 2000;321(7258):405-412. https://doi.org/10.1136/bmj.321.7258.405