Search Results

  • Article
    Predicting Stroke Risk Using Machine Learning: A Data-Driven Approach to Early Detection and Prevention
    (Wiley, 2025) Sutcu, Muhammed; Jouda, Dana; Yildiz, Baris; Katrib, Juliano; Almustafa, Khaled Mohamad
    Stroke is a major global health concern and a leading cause of disability and mortality, emphasizing the need for early risk prediction and intervention. This study leverages statistical analysis, machine learning (ML) classification, clustering, and survival modeling to identify key stroke predictors using a dataset of 5110 records. Descriptive statistics reveal that age, glucose levels, BMI, hypertension, and heart disease are the most influential risk factors. Stroke prevalence is notably higher among hypertensive (13.25%) and heart disease patients (17.03%), as well as among former (7.91%) and current smokers (5.32%). Clustering analysis using PCA and t-SNE highlights high-risk groups with elevated glucose levels and advanced age. Among ML models, XGBoost offers the best trade-off between precision and recall, while naïve Bayes achieves the highest recall (0.404), detecting more stroke cases despite higher false positives. Feature importance analysis ranks glucose, BMI, and age as dominant predictors, with XGBoost emphasizing cardiovascular conditions. Survival analysis confirms increasing stroke risk beyond age 60, with the Kaplan-Meier and Cox models showing a 31.9% risk increase linked to hypertension. These findings underscore the importance of early screening, lifestyle intervention, and targeted care. Future research should explore data-balancing methods like SMOTE and develop real-time tools to support clinical decision-making.
  • Article
    A Proportional Hazards Mixture Cure Model for Subgroup Analysis: Inferential Method and an Application to Colon Cancer Data
    (MDPI, 2025) Liu, Kai; Balakrishnan, Narayanaswamy; Peng, Yingwei
    When determining subgroups with heterogeneous treatment effects in cancer clinical trials, the threshold of a variable that defines subgroups is often pre-determined by physicians based on their experience, and the optimality of the threshold is not well studied, particularly when the mixture cure rate model is considered. We propose a mixture cure model that allows optimal subgroups to be estimated for both the time to event for uncured subjects and the cure status. We develop a smoothed maximum likelihood method for the estimation of model parameters. An extensive simulation study shows that the proposed smoothed maximum likelihood method provides accurate estimates. Finally, the proposed mixture cure model is applied to a colon cancer study to evaluate the potential differences in the treatment effect of levamisole plus fluorouracil therapy versus levamisole alone therapy between younger and older patients. The model suggests that the difference in the treatment effect on the time to cancer recurrence for uncured patients is significant between patients younger than 67 and patients older than 67, and the younger patient group benefits more from the combined therapy than the older patient group.
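
For readers unfamiliar with the model class named in the second abstract, the generic textbook form of a proportional hazards mixture cure model can be sketched as follows. This is not taken from the article: the symbols ($\pi$, $S_u$, $S_0$, $\beta$, $\gamma$, $w$, $c$, $h$), the logistic link for the cure part, and the normal-CDF smoothing of the threshold indicator are all assumptions chosen for illustration, and the authors' exact specification may differ.

```latex
% Population survival: a cured fraction 1 - \pi(z) never experiences the event,
% while the uncured fraction \pi(z) follows the latency survival S_u.
S_{\mathrm{pop}}(t \mid x, z) = 1 - \pi(z) + \pi(z)\, S_u(t \mid x)

% Incidence part (assumed logistic link for the probability of being uncured).
\pi(z) = \frac{\exp(\gamma^{\top} z)}{1 + \exp(\gamma^{\top} z)}

% Latency part under proportional hazards, with baseline survival S_0.
S_u(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)}

% A subgroup defined by a threshold c on a covariate w (for example, age)
% enters through the indicator I(w > c); one common smoothing (assumed here)
% replaces it with a normal CDF with bandwidth h, so the likelihood is
% differentiable in c.
I(w > c) \approx \Phi\!\left(\frac{w - c}{h}\right)
```

Under this reading, a threshold such as the age of 67 reported in the abstract would correspond to an estimated value of $c$ in the latency part.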