Search Results

  • Article
    Post-Hoc Mixture Models to eBLUPs from Linear Mixed-Effects Models: A Tractable Approach for Clustering Irregular Longitudinal Data
    (Taylor & Francis Ltd, 2026) Balakrishnan, N.; Hossain, Md Jobayer
    Clustering longitudinal data with irregular and sparse measurement schedules has become important in analyzing medical data and in the associated decision-making. These datasets often involve observation times that vary across individuals, making trajectory-based analysis essential for uncovering meaningful patterns. Mixture-based linear mixed-effects models, such as heterogeneous linear mixed-effects models and growth mixture modeling, are commonly used for this purpose. While theoretically powerful, these methods often suffer from convergence issues and computational inefficiency in large-scale applications. This study introduces a computationally efficient two-step approach that applies a post-hoc mixture model to empirical Best Linear Unbiased Predictors (eBLUPs), derived from a fitted (piecewise) linear mixed-effects model under homogeneity assumptions. The method is then demonstrated with real clinical data, in which it effectively identified distinct growth trajectories in early childhood data involving 3,365 children across 51,711 clinic visits. The optimal number of clusters is then selected using the BIC, likelihood ratio tests, and model-based validation, achieving the best balance of model fit, classification stability, and interpretability. Simulation studies have shown that eBLUPs preserve individual-level heterogeneity and that post-hoc mixture modeling outperforms HLME across varying separability. Overall, this approach offers a robust, interpretable, and scalable alternative to traditional clustering methods for irregular longitudinal data.
  • Article
    Predicting Stroke Risk Using Machine Learning: A Data-Driven Approach to Early Detection and Prevention
    (Wiley, 2025) Sutcu, Muhammed; Jouda, Dana; Yildiz, Baris; Katrib, Juliano; Almustafa, Khaled Mohamad
    Stroke is a major global health concern and a leading cause of disability and mortality, emphasizing the need for early risk prediction and intervention. This study leverages statistical analysis, machine learning (ML) classification, clustering, and survival modeling to identify key stroke predictors using a dataset of 5110 records. Descriptive statistics reveal that age, glucose levels, BMI, hypertension, and heart disease are the most influential risk factors. Stroke prevalence is notably higher among hypertensive (13.25%) and heart disease patients (17.03%), as well as among former (7.91%) and current smokers (5.32%). Clustering analysis using PCA and t-SNE highlights high-risk groups with elevated glucose levels and advanced age. Among ML models, XGBoost offers the best trade-off between precision and recall, while naïve Bayes achieves the highest recall (0.404), detecting more stroke cases despite higher false positives. Feature importance analysis ranks glucose, BMI, and age as dominant predictors, with XGBoost emphasizing cardiovascular conditions. Survival analysis confirms increasing stroke risk beyond age 60, with the Kaplan-Meier and Cox models showing a 31.9% risk increase linked to hypertension. These findings underscore the importance of early screening, lifestyle intervention, and targeted care. Future research should explore data-balancing methods like SMOTE and develop real-time tools to support clinical decision-making.
  • Article
    Citation - WoS: 1
    Citation - Scopus: 1
    Classifying the WHO European Countries by Noncommunicable Diseases and Risk Factors
    (Elsevier Ireland Ltd, 2025) Bulut, Tevfik
    Background: In the twenty-first century, noncommunicable diseases (NCDs) are a major obstacle to global development and the accomplishment of the Sustainable Development Goals set forth by the United Nations. The WHO (World Health Organization) European Region lacks a comprehensive understanding of NCD risk factors, the NCDs they trigger, and the more disadvantaged countries. Objective: This study aims to classify the countries in the European Region at the country level based on NCDs and their key risk factors. Methods: The Ward method, a hierarchical clustering technique based on Manhattan and Euclidean distance measures, was used. The study's dataset comes from the WHO's publicly available NCDs and key risk factors dataset. Results: The European Region's countries have been categorized into two clusters based on key NCD risk factors. The second cluster consists of countries with high income levels. On the other hand, in the European Region, countries fall into three clusters based on NCDs. Countries in the third cluster, which consists of low- and upper-middle-income countries, have lower average values in four variables compared to other countries, resulting in lower overall disease prevalence. Conclusions: The prevalence of NCDs varies among clusters, with high-income countries having lower disease prevalence, particularly in diabetes and hypertension. Addressing risk factors and improving healthcare access and infrastructure are crucial in reducing the burden of NCDs in the European Region.
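
As a rough illustration of the Ward method named in the last abstract, the sketch below agglomerates a handful of points by repeatedly merging the two clusters whose union adds the least within-cluster variance. It is not the author's code: it assumes squared Euclidean distance only (the Manhattan variant mentioned in the abstract is not covered), the names `ward_cost` and `ward_clusters` are hypothetical, and the six "countries" and their values are made up purely for demonstration.

```python
from itertools import combinations

def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(rows, k):
    """Agglomerate rows (dict: label -> tuple of values) into k label clusters."""
    clusters = [[label] for label in rows]
    while len(clusters) > k:
        # Pick the pair of clusters that is cheapest to merge under Ward's criterion.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(
                [rows[l] for l in clusters[ij[0]]],
                [rows[l] for l in clusters[ij[1]]],
            ),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Made-up, already standardized indicator values for six hypothetical countries.
toy = {
    "A": (0.1, 0.2), "B": (0.0, 0.3), "C": (0.2, 0.1),
    "D": (2.0, 2.1), "E": (2.2, 1.9), "F": (2.1, 2.0),
}
groups = ward_clusters(toy, k=2)
print(groups)
```

This naive version recomputes every pairwise merge cost at each step, which is fine for a few dozen countries; a library routine would be the usual choice for larger data.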