Search Results

  • Article
    Predicting Stroke Risk Using Machine Learning: A Data-Driven Approach to Early Detection and Prevention
    (Wiley, 2025) Sutcu, Muhammed; Jouda, Dana; Yildiz, Baris; Katrib, Juliano; Almustafa, Khaled Mohamad
    Stroke is a major global health concern and a leading cause of disability and mortality, emphasizing the need for early risk prediction and intervention. This study leverages statistical analysis, machine learning (ML) classification, clustering, and survival modeling to identify key stroke predictors using a dataset of 5110 records. Descriptive statistics reveal that age, glucose levels, BMI, hypertension, and heart disease are the most influential risk factors. Stroke prevalence is notably higher among hypertensive (13.25%) and heart disease patients (17.03%), as well as among former (7.91%) and current smokers (5.32%). Clustering analysis using PCA and t-SNE highlights high-risk groups with elevated glucose levels and advanced age. Among ML models, XGBoost offers the best trade-off between precision and recall, while naïve Bayes achieves the highest recall (0.404), detecting more stroke cases despite higher false positives. Feature importance analysis ranks glucose, BMI, and age as dominant predictors, with XGBoost emphasizing cardiovascular conditions. Survival analysis confirms increasing stroke risk beyond age 60, with the Kaplan-Meier and Cox models showing a 31.9% risk increase linked to hypertension. These findings underscore the importance of early screening, lifestyle intervention, and targeted care. Future research should explore data-balancing methods like SMOTE and develop real-time tools to support clinical decision-making.
  • Article
    Afthd: Bayesian Accelerated Failure Time Model for High-Dimensional Time-To-Event Data
    (Springernature, 2025) Kumari, Pragya; Bhattacharjee, Atanu; Vishwakarma, Gajendra K.; Tank, Fatih
    Analyzing high-dimensional (HD) data with time-to-event outcomes poses a formidable challenge. The accelerated failure time (AFT) model, an alternative to the Cox proportional hazard model in survival analysis, lacks sufficient R packages for HD time-to-event data under the Bayesian paradigm. To address this gap, we develop the R package afthd. This tool facilitates advanced AFT modeling, offering Bayesian analysis for univariate and multivariable scenarios. This work includes diagnostic plots and an open-source R code for working with HD data, extending the conventional AFT model to the Bayesian framework of log-normal, Weibull, and log-logistic AFT models. The methodology is rigorously validated through simulation techniques, yielding consistent results across parametric AFT models. The application part is also performed on two different real HD liver cancer datasets, which reveals the proposed method's significance by obtaining inferences for survival estimates for the disease. Our developed package afthd is competent in working with HD time-to-event data using the conventional AFT model along with the Bayesian paradigm. Other aspects, like missing values in covariates within HD data and competing risk analysis, are also covered in this article.