Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students


2. Literature Review

Many studies in the literature have utilized machine learning techniques for forecasting the risk of obesity/overweight and the associated health conditions.

In a prior work, the researchers presented a novel, fair machine learning approach for predicting the risk of cardiovascular diseases (CVDs) and type 2 diabetes (T2DM) based on a number of easily accessible exposome factors [3]. They assessed their model using multi-center cohorts with internal and external validation groups. From the UK Biobank, they identified 5348 and 1534 individuals who had been diagnosed with T2DM and CVD, respectively, within 13 years of their initial visit. As the control group, a comparable number of individuals who did not experience these medical conditions were randomly chosen. From each individual’s baseline visit, 109 simple-to-access exposure variables from six distinct groupings (physical measurements, environmental, lifestyle, mental wellness events, sociodemographics, and early-life characteristics) were considered. To predict those who were likely to develop the diseases, they used the XGBoost ensemble model. The model was compared with an integrative machine learning model that considered biological, clinical, physical, and sociodemographic variables, and with the Framingham risk score for CVD. Additionally, they examined the proposed model for sex-, race-, and age-related biases. Finally, they used SHAP, an explainability method, to analyze the model’s outcomes. Despite only utilizing exposome data, the proposed ML model achieved results comparable to those of the integrative ML model, obtaining ROC-AUC values of 0.78 ± 0.01 and 0.77 ± 0.01 for CVD and T2DM, respectively. Furthermore, the exposome-based approach outperformed the conventional Framingham risk score in predicting CVD risk. They also found that exposome characteristics such as daytime naps, prior cigarette use, and frequency of fatigue/lack of excitement, among other factors, are crucial in identifying individuals who are at risk of developing CVD and T2DM.
Along the same line, a study used the retinal scans of 3000 residents of Qatar to build deep learning models. The researchers examined certain factors, like age, sex, blood pressure, smoking habits, blood sugar levels, lipid levels, sex hormones, and body composition measurements, to predict the risks related to CMDs by analyzing pictures of the back of the eye [4]. They also investigated how age and sex influence the accuracy of predicting these health risks using eye pictures. They used deep learning models based on the MobileNet-V2 architecture to combine information from images of both eyes’ optic discs and maculae and make individual-level predictions. They could accurately predict age and sex with a small error in age prediction (2.78 years) and high accuracy in sex prediction (area under the curve: 0.97). On the other hand, the predictions of systolic blood pressure, diastolic blood pressure, hemoglobin A1c, relative fat mass, and testosterone had acceptable levels of accuracy (errors: 8.96mmHg, 6.84mmHg, 0.61%, 5.68 units, and 3.76 nmol/L, respectively). The researchers concluded that age and sex can be accurately predicted from an eye picture and that certain data related to blood pressure, hemoglobin A1c, and body fat composition can be identified in the retina.
Electronic healthcare record (EHR) data of children up to the age of two years were used to create seven machine learning models to predict pediatric obesity (2 to 7 years). The Children’s Hospital of Philadelphia provided EHR information for 860,510 patients with 11,194,579 clinical visits. After applying strict quality control measures to remove unrealistic growth values and including only participants who had all recommended health checkups by the age of seven years, 27,203 individuals (50.78% male) were chosen for model development. The goal was to predict obesity based on the Centers for Disease Control and Prevention’s definition, which considers a BMI greater than the 95th percentile, adjusted for age and sex, as obese [5]. The performance of the seven machine learning models was evaluated using various metrics commonly used for classifiers. The Cochran’s Q test and post hoc pairwise testing were used to compare the performance of the different models. The XGBoost model achieved the highest area under the curve (AUC) score of 0.81 (0.001), outperforming all of the other models. It also statistically outperformed all other models in terms of precision (30.90%), F1-score (44.60%), accuracy (66.14%), and specificity (63.27%) when the sensitivity was set at 80%.
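
Reporting precision, specificity, and accuracy "when the sensitivity was set at 80%" implies that a decision threshold was first chosen to reach that sensitivity and the remaining metrics were then read off at that threshold. The following is a minimal sketch of that procedure in Python, not the cited study's code; the function name, the toy labels, and the toy scores are all hypothetical.

```python
def metrics_at_sensitivity(y_true, scores, target=0.80):
    """Return metrics at the highest threshold whose sensitivity >= target.

    y_true: list of 0/1 labels; scores: predicted probabilities (same length).
    """
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    # Try thresholds from strictest to most lenient.
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 1)
        sensitivity = tp / positives
        if sensitivity >= target:
            fp = sum(1 for p, y in zip(pred, y_true) if p == 1 and y == 0)
            tn = sum(1 for p, y in zip(pred, y_true) if p == 0 and y == 0)
            precision = tp / (tp + fp)
            return {
                "threshold": thr,
                "sensitivity": sensitivity,
                "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
                "precision": precision,
                "accuracy": (tp + tn) / len(y_true),
                "f1": 2 * precision * sensitivity / (precision + sensitivity),
            }
    raise ValueError("target sensitivity cannot be reached")


# Toy example (hypothetical data, five positives and five negatives).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.5, 0.2, 0.1, 0.05]
result = metrics_at_sensitivity(y_true, scores, target=0.80)
```

Fixing sensitivity first makes models comparable at the same miss rate, which is one reason the low precision reported above can coexist with a good AUC when the positive class is rare.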
In a study conducted between 2017 and 2018, the relationship between cardiometabolic (CM) risk factors and blood pressure was examined among 284 young male university students from Saudi Arabia’s Eastern Province [6]. Various measurements were taken, including the waist-to-hip ratio, blood pressure, body mass index, body adiposity index, waist-to-height ratio, body fat percentage, waist circumference, and basal metabolic rate. The United States of America Sixth Joint National Committee guidelines were used to classify blood pressure. The results indicated that blood pressure was significantly correlated with CM risk factors among young Saudi males. The prevalence of prehypertension was 31.3%, and that of hypertension was 16.2%. Furthermore, the study found that 28.5% of participants were classified as being overweight, and 14.1% were classified as being obese. Additionally, the study highlighted the strong association between a sedentary lifestyle, obesity, and cardiovascular morbidity and mortality. Unfortunately, young students tend not to consider the future risk of cardiovascular diseases associated with a sedentary lifestyle.
Waleed et al. (2021) conducted a study to assess the occurrence of adiposity and evaluate the risk of CMD among university students in the Eastern Province of Saudi Arabia. A total of 310 students (127 males and 183 females) were examined using standardized instruments to measure various adiposity indicators, including Mass of Body Fat (MBF), body fat percentage (BFP), BMI, visceral fat area (VFA), waist circumference (WC), waist-to-hip ratio (WHR), Fat Mass Index (FMI), and A Body Shape Index (ABSI). Indicators of CMD risk, such as the Conicity Index (C index), WC, and WHR, were also calculated. The results showed that most students were either classified as being overweight or obese, with males having higher levels of adiposity compared to females. Additionally, male students had significantly higher percentages of CMD risk indicators than females. Positive correlations were observed between the C index quartiles and BMI with other CMD risk indicators [2]. These findings highlight the need for the early prediction and prevention of adiposity-related health issues and for policymakers to raise awareness about healthy eating habits and the link between physical inactivity and chronic diseases among university students.
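
Several of the indicators named above are simple functions of waist circumference, weight, and height. The sketch below uses the commonly cited definitions, C index = WC / (0.109 × sqrt(weight / height)) and ABSI = WC / (BMI^(2/3) × height^(1/2)), with WC and height in metres and weight in kilograms. The function names and the example values are illustrative assumptions and are not taken from the cited study.

```python
import math


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def conicity_index(waist_m, weight_kg, height_m):
    """Conicity Index (C index); waist and height in metres, weight in kg."""
    return waist_m / (0.109 * math.sqrt(weight_kg / height_m))


def absi(waist_m, weight_kg, height_m):
    """A Body Shape Index (ABSI); waist and height in metres, weight in kg."""
    return waist_m / (bmi(weight_kg, height_m) ** (2 / 3) * math.sqrt(height_m))


def waist_to_hip_ratio(waist, hip):
    """WHR; both arguments in the same unit."""
    return waist / hip


# Hypothetical participant: 0.90 m waist, 1.00 m hip, 80 kg, 1.75 m tall.
example = {
    "BMI": bmi(80, 1.75),
    "C index": conicity_index(0.90, 80, 1.75),
    "ABSI": absi(0.90, 80, 1.75),
    "WHR": waist_to_hip_ratio(0.90, 1.00),
}
```

Mixing units (for example, a waist in centimetres with a height in metres) is the most common source of error with these formulas, so the unit convention is stated in each docstring.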
Furthermore, a previous study sought to accurately identify different subtypes of heart failure (HF), which could support personalized treatment approaches. Although machine learning has been utilized in previous research to investigate the subtypes of HF, such approaches have been limited to small datasets and have not comprehensively addressed the diverse causes and presentations of HF. Moreover, validation through multiple machine learning methods and large, independent, population-based datasets had yet to be conducted. To address these limitations, the researchers used published libraries to identify and validate the HF subtypes. They applied four unsupervised machine learning (clustering) algorithms and compared the results: K-means, hierarchical clustering, K-medoids, and mixture modeling [7]. However, the study did not focus on how the machine learning algorithms are implemented but on the subtypes of heart failure themselves and on discovering hidden relations with other diseases, including cardiometabolic diseases, in addition to clinical diagnostics and laboratory tests. The researchers identified five subtypes of heart failure and could distinguish between them with high accuracy both within and across datasets. These subtypes were also found to have good predictive accuracy for mortality within one year.
The research article by O’Sullivan et al. (2020) included pediatric studies on the relationships between whole-fat and reduced-fat dairy consumption and adiposity measurements, as well as indicators of the risk of cardiometabolic illness. Most of the research, which was primarily observational, revealed in the review that whole-fat dairy products were not linked to higher levels of weight gain or adiposity. Also, there is insufficient evidence supporting switching from whole-fat to reduced-fat dairy for better results for specific risk factors. However, whole-fat dairy intake was typically not linked to an elevated cardiometabolic risk. The analysis drew attention to the absence of randomized controlled studies comparing the health effects of whole-fat dairy to reduced-fat dairy in children, which would have produced more trustworthy data. The authors contend that obtaining improved quality data in this area requires high-quality randomized controlled studies among kids. Also, the authors emphasize the necessity to consider the type of dairy product ingested, any production or processing processes, and any potential impact change due to a person’s sex, stage of puberty, or level of body fat. Lastly, the authors stress the significance of evidence-based dietary recommendations for a child’s dairy fat consumption to address the rising public health issue of childhood obesity and lower the risk of developing chronic illnesses [8].
Arisaka O. et al. (2020) published a study aiming to assess the latest research on the association between rapid early growth and the subsequent risk of developing obesity and related conditions later in life. The research specifically draws attention to the varying degrees of relevance associated with rapid weight gain in infancy and early childhood. The authors assess infantile obesity, adiposity rebound, catch-up growth (CUG), sexual dimorphism, the early prediction of future cardiometabolic risk, and the evaluation of rapid weight gain and adiposity rebound, noting that both early and late rapid weight gain throughout infancy and the early years of childhood may portend a future risk of obesity. An infant’s weight often drops in the first 7 to 14 days after birth and then rapidly increases over the next six months. Early infancy was noted as a potential predictor of future obesity, particularly for people who were born underweight and experienced rapid catch-up growth. In addition, rapid weight gain in toddlers throughout the first three years is strongly linked to cardiometabolic risk [9].
Tsai T. et al. (2020) conducted a cross-sectional study aimed at clustering cardiometabolic risk factors and sedentary behavior using a factor analysis. The study involved 210 adults aged 20–65 years who were recruited from a community in South Korea. The researchers collected data on the subjects’ anthropometric and biochemical measurements, sedentary behavior, and physical activity. A factor analysis was used to identify the patterns of cardiometabolic risk factors and sedentary behavior. The study found that sedentary behavior and cardiometabolic risk factors were positively correlated and that the clustering of these factors could be used to discover individuals who are at risk of developing cardiometabolic diseases. The study highlights the importance of reducing sedentary behavior and addressing multiple cardiometabolic risk factors to prevent cardiometabolic diseases [10].
Berkowitz S. et al. (2019) [11] investigated the association between access to social service resources and CMD risk factors using machine learning and multilevel modeling analysis. 11,638 people from the American NHANES were included in the study. To categorize individuals according to their access to resources for social services including health insurance, housing help, and food assistance, the researchers employed machine learning algorithms. The potential relationship between the availability of social service resources and the prevalence of risk factors for cardiometabolic disorders, such as but not limited to obesity, diabetes, and hypertension, was then examined using multilevel modeling analysis. The research discovered a link between social service resources and a reduced risk of obesity, diabetes, and high blood pressure. The researchers propose that increasing access to social service resources might be a successful method for lowering cardiometabolic risk variables in the populace.
Shang X. et al. (2020) used machine learning methods and a healthy diet score to examine the primary dietary determinants of changes in cardiometabolic risk factors in children. A total of 1550 children between the ages of 5 and 12 participated in the study. The researchers used machine learning to determine the main dietary factors that affect the risk of heart and metabolic problems over time. They also developed a score based on healthy eating guidelines to assess how diet quality affects these risk factors. The study found that items such as processed foods and sugary drinks greatly affect children’s risk factors. The healthy diet score was also a good predictor of changes in these risk factors. The researchers believe that helping children eat better can reduce their risk of heart and metabolic problems and improve their long-term health. The study shows that machine learning and a healthy diet score can help identify the most important dietary factors that contribute to these health issues in children, which is crucial for preventing heart and metabolic diseases [12].
Taghiyev A. et al. (2020) conducted a study that used machine learning techniques to identify the causes of obesity. They developed a hybrid model in two separate stages: the first stage includes feature selection, while the second stage includes classification. They also compared the hybrid model with other classifiers, such as Decision Trees and Logistic Regression. The hybrid model designed by the researchers gives a more accurate classification of obese people as well as a valuable technique for measuring obesity-related characteristics. They achieved 91.4% accuracy, 90.4% recall, and 92.9% specificity, which turned out to be better than DT and LR [13].
Chatterjee A. et al. (2020) published a paper titled “Identification of Risk Factors Associated with Obesity and Overweight”, which also provides an overview of the topic. The dataset used contains 500 records with four parameters: gender, height, weight, and index. The index attribute takes five values, each indicating a different obesity level. In the preprocessing phase, they added a new feature, “BMI”, to the dataset. Because of its high correlation, this feature was later removed during model training. Moreover, they developed five classifiers, including Support Vector Machine, Naïve Bayes, Decision Tree, and K-Nearest Neighbor. The results show that the Support Vector Machine (SVM) provided the best classification, with 95% accuracy [14].
Ferdowsy F. et al. (2021) worked on a machine learning model that predicts obesity risk. The dataset used in this approach contains 1000 records covering both obese and non-obese people of different ages. They used different classification algorithms: K-Nearest Neighbor (KNN), Logistic Regression, multilayer perceptron (MLP), SVM, Naïve Bayes (NB), Adaptive Boosting (ADA Boosting), Decision Tree, and Gradient Boosting. Moreover, they used performance metrics to measure the performance of each classifier individually. The Logistic Regression classifier showed the best accuracy among all the classifiers, with 97.09%, while the Gradient Boosting classifier achieved the lowest accuracy, 64.08% [15].
A very large study analyzed magnetic resonance imaging (MRI) data of 40,032 participants from the UK Biobank. The researchers used previously collected data on three types of adipose tissue volume from up to 9041 participants to train convolutional neural networks (CNNs) to calculate deviations in the adipose tissue depots of the participants. These deviations were calculated independently of BMI and were used to uncover relationships with cardiometabolic diseases. The study found that CNNs using two-dimensional projected images were highly accurate in predicting the adipose tissue volumes. However, there was significant heterogeneity in the associations between local adiposity measures and cardiometabolic diseases. The study therefore concluded that deep learning models applied to MRI data can provide highly accurate estimates of adipose tissue volumes and that local adiposity measures have varying associations with cardiometabolic diseases at different BMIs [16].
Machine learning techniques have been used to predict obesity by analyzing publicly available health data. In this regard, various classifiers, including LR, CART, and NB, utilized the Synthetic Minority Oversampling Technique to account for data imbalance and predict overweight status based on risk factors. The dataset included BMI as one of its main features, and the researchers applied some preprocessing techniques, such as eliminating missing values, before utilizing it. Their findings indicated that Logistic Regression was the most effective classifier for predicting obesity with the highest performance [17]. On the other hand, a systematic literature review of 93 papers was conducted to determine the machine learning models suitable for detecting obesity. The review found that obesity is closely linked to co-morbidities like CVD and chronic conditions, underscoring the significance of using machine learning techniques for early detection. The researchers noted that the most commonly used approach in the literature for detecting obesity is Artificial Neural Networks (ANN) [18].
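
The Synthetic Minority Oversampling Technique mentioned above balances a dataset by creating new minority-class points on the line segments between existing minority samples and their nearest minority neighbours. The following is a simplified sketch of that idea using only the Python standard library; it is not the cited study's code, the function name and toy points are hypothetical, and in practice a maintained implementation (for example, the SMOTE class in the imbalanced-learn package) would normally be used instead.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class points.

    Each synthetic point lies between a randomly chosen minority sample and
    one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (hypothetical values).
minority_points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
new_points = smote_sketch(minority_points, n_new=6)
```

Oversampling of this kind should be applied to the training split only; applying it before the train/test split leaks synthetic copies of test-set neighbours into training and inflates the reported performance.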
Kerkadi A. et al. (2020) conducted a study that aimed to analyze several techniques for measuring adiposity to identify persons from Qatar who were at risk of developing CMDs. A total of 558 healthy Qatari adults between the ages of 20 and 50 were randomly chosen from survey data from the Qatar Biobank. The researchers also collected anthropometric information, such as height, along with information obtained from dual-energy X-ray absorptiometry (DXA) and CMD risk markers. They employed three measurements to evaluate the accuracy of adiposity indicators as predictors of CMD risk factors: the Receiver Operating Characteristic (ROC) curve, the area under the curve (AUC), and the Spearman partial correlation coefficient. The study found that DXA-derived adiposity indicators were superior to conventional anthropometric indicators as predictors of CMD risk. In particular, CMD risk variables, including blood pressure, fasting glucose, triglyceride, HDL cholesterol, and HbA1c levels, were strongly correlated with DXA-derived markers such as visceral adipose tissue, trunk fat, android fat, gynoid fat, and total body fat mass. These findings have important implications for public health interventions in Qatar, where more than 70% of adults are overweight or obese. Identifying individuals at higher risk of developing CMDs using more accurate adiposity assessment methods such as DXA scans, rather than relying on traditional anthropometric measures alone, can help healthcare providers develop targeted interventions to prevent or manage these diseases [19].
Research by Ashraf S. et al. (2021) sought to create an anthropometric prediction equation for visceral adiposity in people with spinal cord injury (SCI). The study highlights the important physical and SCI-related elements that affect how visceral adipose tissue (VAT) is distributed in people with SCI. It examines the variables that affect visceral adiposity in this population and suggests that WC can serve as a surrogate marker for central obesity, CMD, and associated illnesses. Prior research relied on expensive imaging methods, such as computed tomography (CT), MRI, DXA, and ultrasound scanning. Due to their high price and restricted availability, these procedures are usually not practicable for routine clinical use. The study shows that WC can be a helpful tool for healthcare providers to identify those at risk of developing central obesity and related health complications, even though there is currently no SCI-specific WC or AC cut-off value to predict VAT and diagnose people at risk of central obesity, CMS, and cardiovascular disease after SCI [20].
Research was conducted in 2022 to evaluate the potential association between adult persons’ body MRI-based measurements of adipose tissue distribution and brain ages. The study used cross-sectional and longitudinal methodologies to investigate the relationships between follow-up adipose measurements and brain age gap (BAG) measurements. A subgroup of 286 people, aged between 19 and 86, who made up the study’s total of 790 participants, supplied cross-sectional body MRI data. The estimation of tissue-specific brain aging at two time periods and research into the relationships between adipose measurements and BAG were carried out using Bayesian multilevel modeling. The study also examined the cross-sectional relationships among tissue-specific BAG, comprehensive measurements of adipose tissue (body composition), and traditional anthropometric measurements (BMI and WHR) that were applied in a previous investigation. The study’s findings suggest that there is a relationship between adipose tissue distribution measurements and brain aging. However, the study’s follow-up sample size was somewhat small, which reduced the study’s statistical power. This must be taken into consideration. The body MRI data were only collected at the follow-up examination, which further reduced the statistical power of the inquiry. The subsequent loss of statistical power is demonstrated by the posterior distributions for the breadth of the body MRI, which exhibit higher levels of uncertainty than the BMI and WHR, both of which were available longitudinal measures with larger sample sizes. In conclusion, this research sheds important light on the potential relationship between adult individuals’ body MRI-determined adipose tissue distributions and brain ages [21].
To improve prediction accuracy, a new study evaluated the drawbacks of the current risk prediction models (RPMs) for CVDs and suggested using alternative machine learning-based RPMs. The research involved testing multiple machine learning models and comparing them with the traditional logistic regression analysis (LRA) model using a dataset of 460 participants in Pakistan. In addition to identifying a significantly different order of features, the results demonstrate that ML-based RPMs, such as artificial neural networks and linear Support Vector Machines, outperform LRA in terms of prediction accuracy and discriminating capacity. The study concluded that nonlaboratory characteristics can be good substitutes for low-cost RPMs in low–middle-income nations and that tailored and localized RPMs should be favored for the exact assessment of CVD risk. However, a significant increase in the performance metrics would require bigger and more complex datasets. Overall, the study’s findings imply that ML-based RPMs can enhance the functionality of current models and uncover hidden feature behavior [22].
Another study by Guarneros-Nolasco L.R. et al. (2021) explored the application of machine learning algorithms (MLAs) to the detection and prognosis of CVDs. The study compared the performance of ten distinct machine learning algorithms using two datasets for CVD diagnosis and two for CVD prediction. Using the train–test split approach and k-fold cross-validation, the study concentrated on the top two and top four attributes/features of the datasets and on five performance measures: accuracy, precision, recall, F1-score, and ROC-AUC. The findings demonstrate that MLAs perform appropriately in terms of classification and prediction, particularly when it comes to the top two features, which indicate three key risk factors, such as arrhythmia and tachycardia, and that they may be utilized to enhance existing CVD diagnostic efforts. The findings of the study reveal that age, heart rate, and blood pressure are the most significant factors, while weight, cholesterol levels, smoking status, serum creatinine levels, ejection fraction, type of chest discomfort, number of affected arteries, platelet count, and obesity are ranked as secondary and tertiary factors in terms of their associations with the outcome of interest. According to the study, the risk variables can be employed for follow-up in the early detection of CVDs, such as arrhythmia or tachycardia, and for prompt and effective treatment when required. The report suggests that other medical databases should be used to replicate the study and that mobile applications for heart disease monitoring should be created utilizing the discovered risk variables [23].
Machine learning has been applied to predict the risk of heart disease using classifiers. The Cleveland Heart dataset was utilized for training ten distinct ML classifiers from various categories, and three attribute evaluators were used to choose the most essential features. A 10-fold cross-validation testing option was used to assess the classifiers’ performance, and the hyperparameter “k” was tuned to increase precision. Using the chi-squared attribute evaluator, the SMO classifier had the greatest prediction performance, with an accuracy of 86.468%. The maximum ROC area of 0.91 was given by the meta-classifier bagging with Logistic Regression. The study concluded that proper attribute selection and hyperparameter tuning may greatly enhance machine learning classifier performance when predicting the risk of heart disease. However, the researchers acknowledge the study’s limitations, namely the small dataset and the limited number of feature selection techniques and machine learning algorithms considered. As a result, they recommend further study that integrates several datasets to enhance the classifiers’ prediction performance [24].
We reviewed previous related work and summarized their key points in Table 1.

Based on the existing literature, there is a need for models that predict the risk of obesity and overweight among adolescents in Saudi Arabia with high accuracy.

None of the current studies have explored the use of the C index to predict the risk factors for CMD that contribute to the probability of developing overweight and obesity.

Integrating the fuzzy logic approach is crucial for predicting the “risk level”, as it possesses the ability to handle uncertainty in a manner closely resembling human reasoning. This integration offers a natural way to express the risk level, thereby improving the interpretability and applicability of the model in practical scenarios.
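
To illustrate how fuzzy logic can express a "risk level", the sketch below maps a predicted probability to overlapping "low", "moderate", and "high" fuzzy sets using shoulder and triangular membership functions. It is only an illustration under stated assumptions: the breakpoints (0.2, 0.5, 0.8), the three labels, and all function names are hypothetical and do not describe the design used in this study.

```python
def left_shoulder(x, a, b):
    """Membership is 1 up to a, falls linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def right_shoulder(x, a, b):
    """Membership is 0 up to a, rises linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangular(x, a, b, c):
    """Membership is 0 at a and c, and peaks at 1 when x equals b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_risk_level(score):
    """Return (label, memberships) for a risk score in [0, 1]."""
    memberships = {
        "low": left_shoulder(score, 0.2, 0.5),        # assumed breakpoints
        "moderate": triangular(score, 0.2, 0.5, 0.8),  # assumed breakpoints
        "high": right_shoulder(score, 0.5, 0.8),       # assumed breakpoints
    }
    label = max(memberships, key=memberships.get)
    return label, memberships


label, degrees = fuzzy_risk_level(0.7)
```

A score of 0.7 belongs partly to "moderate" and partly to "high" rather than being forced into a single crisp class, which is the property that makes the output read more like a clinician's graded judgment.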

Therefore, in the current study, we aimed to build an artificial intelligence model to predict the likelihood of CMD among university students in Saudi Arabia who are overweight or obese based on various obesity indicators. The dataset used to train the model will consist of information pertaining to participants who are enrolled as students at the university. The model is intended to identify the most significant obesity indicators and CMD risk factors that contribute to the likelihood of overweight and obesity and to serve as a predictive tool for screening university students for these health conditions. The model could also be used to create personalized intervention plans that promote healthy lifestyles and physical activity for university students in Saudi Arabia who have a high risk of being overweight or obese.

