Author Archives: Jonathan Asanjarani

List of Tables

Table 1 Basic Descriptives of the Cleveland training data
Table 2 Variable descriptives based on heart disease presence
Table 3 Correlation Statistic between individual variables and heart disease presence
Table 4 Model Results
Figure 1

Data Management Plan Overview

Our data management plan ensures the organized, secure, and ethical handling of all project data. We will acquire datasets from the UCI Machine Learning Repository and follow their terms of use. The data will be stored securely on a personal computer. We will document all data processing steps, including cleaning, transformation, and analysis, ensuring transparency […]

Digital References

Sex-Specific and Regional Analysis of Heart Disease Prediction Using Machine Learning Algorithms: Insights from the UCI Irvine Public Heart Disease Datasets (Cleveland and Long Beach)
Jonathan Asanjarani
City University of New York Graduate Center
DATA 79000: Capstone Project and Thesis
Advisor: Johanna Devaney
November 25th, 2024

Software and Tools Used
Datasets
Guidelines and Methodological References
Additional Resources for Citing Software […]

A Note on Technical Specifications

This project used Google Colab as the development environment. Google Colab is a cloud-based Python platform that provides access to GPUs for accelerated computation. Python (version 3.8) was used in the Colab environment, together with additional libraries and frameworks such as Scikit-learn, XGBoost, Pandas, NumPy, Matplotlib, and Seaborn, as detailed in the References section. The […]

Data Dictionary

Significant Variables

Digital Manifest

Project Components
1. Capstone Report (Print and Digital)
2. Exploratory Data Analysis (EDA) Notebook
3. Machine […]

Discussion & Findings

Discussion

Key findings: My project used the Cleveland and VA Long Beach datasets from the “Heart Disease” database, which was donated to the UCI Machine Learning Repository, to explore the binary classification of heart disease presence from the available demographic and clinical features. Through exploratory data analysis (EDA), data cleaning, transformation experiments, and model […]

ASCVD (Atherosclerotic Cardiovascular Disease) Risk Score (Cleveland and VA Long Beach)

Atherosclerotic Cardiovascular Disease Risk Calculation on the Cleveland Dataset

The 2013 ASCVD (Atherosclerotic Cardiovascular Disease) risk score was evaluated on the Cleveland dataset. The score achieved an accuracy of 69.64%, meaning that approximately 70% of predictions matched actual outcomes. Precision was 63.58%, reflecting the proportion of correctly identified positive cases among all […]
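
For readers who want to see how accuracy and precision figures of this kind are derived, here is a minimal sketch in plain Python. The function names and the label lists are hypothetical and made up for illustration; they are not the Cleveland data, and this is not necessarily how the project computed its metrics.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels, where 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that match the actual outcome."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def precision(y_true, y_pred):
    """Share of predicted positives that are actually positive."""
    tp, _, fp, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Illustrative labels only (assumption): 8 made-up patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print(f"accuracy  = {accuracy(y_true, y_pred):.2%}")
print(f"precision = {precision(y_true, y_pred):.2%}")
```

In this toy example both values come out to 75%, because there are three true positives, three true negatives, one false positive, and one false negative.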

Male Vs. Female

Are the Best-Performing Models More Effective for the Male or the Female Population?

The highest-performing models were identified in Experiment 2. The Random Forest classifier emerged as the top performer, achieving a mean accuracy of 88.33%, a mean precision of 91.79%, a mean recall of 82.00%, and a mean F1-score of 83.67%. […]
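
One way to compare a model across the male and female populations is to split the predictions by the sex column and compute the same metrics for each subgroup. The sketch below assumes the UCI coding of sex (1 = male, 0 = female); the helper name `metrics_by_group` and the data are hypothetical, and it is not the author's evaluation code.

```python
def metrics_by_group(y_true, y_pred, groups):
    """Return accuracy, precision, recall, and F1 for each distinct group value."""
    results = {}
    for g in sorted(set(groups)):
        # Keep only the rows that belong to this group.
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, groups) if s == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        tn = sum(1 for t, p in rows if t == 0 and p == 0)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        fn = sum(1 for t, p in rows if t == 1 and p == 0)
        results[g] = {
            "accuracy": (tp + tn) / len(rows),
            "precision": tp / (tp + fp) if (tp + fp) else 0.0,
            "recall": tp / (tp + fn) if (tp + fn) else 0.0,
            # F1 written in terms of counts: 2TP / (2TP + FP + FN).
            "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
        }
    return results


# Illustrative data only (assumption): 4 male and 4 female made-up patients.
sex    = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

by_sex = metrics_by_group(y_true, y_pred, sex)
for code, label in ((1, "male"), (0, "female")):
    print(label, {k: round(v, 3) for k, v in by_sex[code].items()})
```

In this toy data both subgroups have the same accuracy but different precision and recall, which is why a subgroup comparison usually looks at more than one metric.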

Transformation 3: Cleveland Only

Optimizing Feature Engineering

In this third experiment, the focus is on enhancing the feature engineering component to improve model performance through targeted transformations. The following transformations were applied: (1) a logarithmic transformation for Resting Blood Pressure (trestbps) and Cholesterol (chol) to reduce skewness and stabilize variance; (2) a squared transformation of Maximum Heart Rate (thalach), […]
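
The first two transformations can be sketched as follows. This is a minimal illustration in plain Python, assuming a natural logarithm (the excerpt does not say which base was used); the output column names `trestbps_log`, `chol_log`, and `thalach_sq` and the sample rows are hypothetical.

```python
import math


def transform_row(row):
    """Return a copy of one patient record with the transformed features added."""
    out = dict(row)
    # (1) Log transform to reduce right skew. Values must be strictly
    # positive: math.log raises ValueError for zero or negative input.
    out["trestbps_log"] = math.log(row["trestbps"])
    out["chol_log"] = math.log(row["chol"])
    # (2) Squared transform of maximum heart rate.
    out["thalach_sq"] = row["thalach"] ** 2
    return out


# Illustrative records only (assumption), not rows from the Cleveland data.
rows = [
    {"trestbps": 130, "chol": 250, "thalach": 150},
    {"trestbps": 120, "chol": 200, "thalach": 170},
]

transformed = [transform_row(r) for r in rows]
for r in transformed:
    print({k: round(v, 4) for k, v in r.items()})
```

The original columns are kept alongside the new ones so that models trained with and without the transformations can be compared on the same records.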
