
Monthly Archives: December 2024

Discussion & Findings

Discussion. Key findings: My project used the Cleveland and VA Long Beach datasets from the “Heart Disease” database, which was donated to the UCI Machine Learning Repository, to explore binary classification of heart disease presence using the available demographic and clinical features. Through exploratory data analysis (EDA), data cleaning, transformation experiments, and model […]


ASCVD (Atherosclerotic Cardiovascular Disease) Risk Score (Cleveland And VA Long Beach)

Atherosclerotic Cardiovascular Disease Risk Calculation on the Cleveland Dataset: The 2013 ASCVD (Atherosclerotic Cardiovascular Disease) risk score was evaluated on the Cleveland dataset. The score achieved an accuracy of 69.64%, meaning that approximately 70% of predictions matched actual outcomes. Precision was 63.58%, reflecting the proportion of correctly identified positive cases among all […]
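As a rough illustration of how accuracy and precision like those quoted above are derived from binary predictions, here is a minimal sketch. The function name `binary_metrics` and the toy labels are assumptions made up for this example; they are not the project's code or data.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for binary labels (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels, invented purely for illustration (not from the Cleveland data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Accuracy counts every correct prediction, while precision looks only at the cases flagged as positive, which is why the two figures can differ.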


Male Vs. Female

Are the Best-Performing Models More Effective for the Male or the Female Population? The highest-performing models were identified in Experiment 2. The Random Forest classifier was the top performer, achieving a mean accuracy of 88.33%, a mean precision of 91.79%, a mean recall of 82.00%, and a mean F1-score of 83.67%. […]
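One simple way to compare a model across the male and female subgroups is to split its predictions by sex and score each group separately. The sketch below assumes a hypothetical helper, `accuracy_by_group`, and made-up labels; it is not the evaluation code used in the post.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy}, scoring predictions separately per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


# Toy values, invented for illustration only.
sex = ["male", "male", "male", "male", "female", "female"]
y_true = [1, 0, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0]
per_sex = accuracy_by_group(y_true, y_pred, sex)
```

With groups of very different sizes, as in a dataset that is roughly two-thirds male, the smaller group's score is noisier and should be read with that in mind.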


Transformation 3: Cleveland Only

Optimizing Feature Engineering: This third experiment refines the feature engineering component, using targeted transformations to improve model performance. The following transformations were applied: (1) a logarithmic transformation for Resting Blood Pressure (trestbps) and Cholesterol (chol) to reduce skewness and stabilize variance; (2) a squared transformation of Maximum Heart Rate (thalach), […]


Transformation 2: Cleveland and VA Long Beach

Optimizing Feature Engineering: This second experiment refines the feature engineering component, using targeted transformations to improve model performance. The following transformations were applied: (1) a logarithmic transformation for Resting Blood Pressure and Cholesterol to reduce skewness and stabilize variance; (2) a squared transformation of Maximum Heart Rate, which emphasizes […]


Transformation 1: Cleveland Only

Optimizing Feature Engineering: Two custom transformers are applied in the first transformation. The custom transformers, “Log Transformer” and “Square Transformer”, are defined using BaseEstimator and TransformerMixin so that they can perform specialized transformations within a preprocessing pipeline. The “Log Transformer” applies a logarithmic transformation (log1p, which calculates log(x+1)) to specified columns, helping to reduce skewness and handle wide-ranging […]
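The two transformers described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the post's actual code: the class names, the `columns` parameter, and the sample values are illustrative, and the column choices (trestbps, chol, thalach) are borrowed from the other transformation posts.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class LogTransformer(BaseEstimator, TransformerMixin):
    """Apply log1p, i.e. log(x + 1), to the named DataFrame columns."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn from the data

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        for col in self.columns or []:
            X[col] = np.log1p(X[col])
        return X


class SquareTransformer(BaseEstimator, TransformerMixin):
    """Square the named DataFrame columns."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns or []:
            X[col] = X[col] ** 2
        return X


# Sample values invented for illustration (not rows from the dataset).
df = pd.DataFrame({"trestbps": [120, 140], "chol": [200, 250], "thalach": [150, 170]})

preprocess = Pipeline([
    ("log", LogTransformer(columns=["trestbps", "chol"])),
    ("square", SquareTransformer(columns=["thalach"])),
])
out = preprocess.fit_transform(df)
```

Inheriting from BaseEstimator and TransformerMixin supplies `get_params`, `set_params`, and `fit_transform`, which is what lets these classes slot into a scikit-learn Pipeline alongside built-in steps.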


Exploratory Data Analysis (EDA on X_train, Cleveland Only)

Basic descriptives of the training set. Univariate analysis of the training set: The dataset reveals several key patterns about the participants and their heart health indicators. Most participants are middle-aged, falling between 55 and 65 years old, with males making up roughly two-thirds of the dataset. When it comes to those who experience chest pain, […]


Materials and Methods

Data: The UCI Machine Learning Repository is a comprehensive resource that provides databases, domain theories, and data generators widely used by the machine learning community for evaluating models. For this project, I used the database titled “Heart Disease”, available in the UCI Machine Learning Repository. The “Heart Disease” database from the UCI Machine Learning […]


Literature Review

Introduction: The central focus of my capstone project is to explore the effectiveness of machine learning models in predicting heart disease and to assess their ability to generalize across different cities and biological sexes. This research highlights the importance of building models that not only achieve high accuracy within a specific dataset or geographic location but […]


Abstract

For this capstone project, I investigated how well machine learning models can predict heart disease, how the patient’s gender affects these predictions, and how well the same model performs across different regions. This project uses two clinical datasets from the publicly accessible UCI Machine Learning Repository under the collection […]
