Heart Failure Analysis

Overview

This project analyzes heart failure patient data and predicts patient outcomes from clinical features. Using machine learning and data visualization techniques, we explore the factors most strongly associated with those outcomes.

Step 1: Understanding the Dataset

The dataset comes from the UCI Machine Learning Repository, which provides open-source datasets for various machine learning tasks.

Dataset Overview

The dataset consists of 299 patient records with 13 attributes (12 clinical features plus the target), collected during the patients' follow-up period.

Clinical Features:
  • Age: Patient’s age (years)
  • Anaemia: Decrease in red blood cells or hemoglobin (boolean)
  • High Blood Pressure: Whether the patient has hypertension (boolean)
  • Creatinine Phosphokinase (CPK): Level of the CPK enzyme in the blood (mcg/L)
  • Diabetes: Whether the patient has diabetes (boolean)
  • Ejection Fraction: Percentage of blood leaving the heart per contraction (percentage)
  • Platelets: Platelet count in the blood (kiloplatelets/mL)
  • Sex: Patient’s gender (binary)
  • Serum Creatinine: Level of serum creatinine in the blood (mg/dL)
  • Serum Sodium: Level of serum sodium in the blood (mEq/L)
  • Smoking: Whether the patient smokes (boolean)
  • Time: Follow-up period (days)
  • [Target] Death Event: Whether the patient died during the follow-up (boolean)
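As a minimal sketch, the feature list above can be checked against the loaded data before any analysis. The column names below are assumed to match the public CSV release of the dataset, and the two sample rows are made up for illustration; in practice you would pass your own file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Column names assumed to match the public CSV release of the dataset.
EXPECTED_COLUMNS = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
    "DEATH_EVENT",
]

# Two made-up rows for illustration only (not real patient records).
SAMPLE_CSV = """age,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,sex,smoking,time,DEATH_EVENT
60,0,250,1,35,0,260000,1.1,137,1,0,120,0
72,1,580,0,25,1,210000,1.9,132,0,1,30,1
"""


def load_and_check(source):
    """Read the CSV and confirm every expected column is present."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    # Return the columns in a fixed, documented order.
    return df[EXPECTED_COLUMNS]


df = load_and_check(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.dtypes)
```

Failing fast on a schema mismatch keeps later steps from silently working on the wrong columns.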

Step 2: Data Preprocessing

Before training models, the dataset needs to be cleaned and prepared.

Preprocessing Steps:

  • Handle missing values and standardize data types.
  • Normalize numerical features for better model performance.
  • Encode categorical features (in this dataset, the boolean and binary columns) as 0/1 integers.
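The preprocessing steps above could be sketched with pandas as follows. This is one possible approach, not necessarily the one used in this project: the median imputation and z-score scaling are assumptions, the rows are made up, and only a few columns are shown to keep the example short.

```python
import pandas as pd

# Made-up rows for illustration; column names are assumptions.
df = pd.DataFrame({
    "age": [60.0, 72.0, None, 55.0],
    "ejection_fraction": [35, 25, 40, 60],
    "serum_creatinine": [1.1, 1.9, 1.3, 0.9],
    "sex": [1, 0, 1, 0],
    "DEATH_EVENT": [0, 1, 0, 0],
})

numeric_cols = ["age", "ejection_fraction", "serum_creatinine"]
binary_cols = ["sex", "DEATH_EVENT"]

# 1. Missing values: fill numeric gaps with the column median.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 2. Data types: store the 0/1 flags as small integers.
df[binary_cols] = df[binary_cols].astype("int8")

# 3. Normalization: z-score each numeric column (mean 0, std 1).
means = df[numeric_cols].mean()
stds = df[numeric_cols].std(ddof=0)
df[numeric_cols] = (df[numeric_cols] - means) / stds

print(df)
```

When a train/test split is used, the means and standard deviations should be computed on the training portion only and then applied to the test portion, so that no information from the test set leaks into training.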

Step 3: Exploratory Data Analysis (EDA)

This step uses summary statistics and visualizations to understand patterns in the dataset.

Key Insights:

  • Investigate relationships between clinical features and the death event.
  • Analyze correlations to identify high-risk factors.
  • Create summary statistics and graphical representations.
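A minimal sketch of the summary-statistics and correlation steps is shown below, assuming a pandas DataFrame with a `DEATH_EVENT` target column. The six rows are made-up values chosen only to make the example run; they are not results from the dataset.

```python
import pandas as pd

# Made-up values for illustration only (not real patient records).
df = pd.DataFrame({
    "age": [45, 50, 60, 65, 70, 80],
    "ejection_fraction": [60, 50, 38, 35, 25, 20],
    "serum_creatinine": [0.9, 1.0, 1.2, 1.1, 1.8, 2.5],
    "DEATH_EVENT": [0, 0, 0, 1, 1, 1],
})

# Summary statistics (count, mean, std, quartiles) for every column.
print(df.describe())

# Pearson correlation of each feature with the target,
# ordered by absolute strength.
corr_with_target = (
    df.corr(method="pearson")["DEATH_EVENT"]
    .drop("DEATH_EVENT")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Correlation only captures linear association, so a weak coefficient does not rule out a feature as a risk factor; plots of each feature split by outcome are a useful complement.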

Step 4: Model Training & Evaluation

This step applies machine learning models to predict the death event, i.e. whether a heart failure patient died during follow-up.

Models Used:

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine (SVM)
  • Neural Networks (optional)
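The models listed above could be compared with scikit-learn along these lines. This is a sketch under stated assumptions: the data is synthetic (random features with an artificial label), and the hyperparameters, 75/25 split, and scaling pipeline are illustrative choices rather than this project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 299 rows, 4 features, artificial binary label.
rng = np.random.default_rng(0)
n = 299
X = rng.normal(size=(n, 4))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale inside a pipeline so the scaler is fit on training data only.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: accuracy={scores[name]:.2f}")
```

With only 299 records, a single split gives a noisy estimate; cross-validation would be a more stable way to compare the models.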

Performance Metrics:

  • Accuracy, Precision, Recall, and F1-score
  • ROC Curve and AUC score for model evaluation
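To make these metrics concrete, the sketch below computes them by hand for a small made-up set of labels and predicted scores. The function names and the 0.5 decision threshold are illustrative assumptions; in practice `sklearn.metrics` provides equivalents such as `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score`.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_score(y_true, y_score):
    """AUC as the chance a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35, 0.05, 0.15]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, y_score)
print(metrics)
print(f"AUC = {auc:.3f}")
```

In this example accuracy is 0.70 while recall is only 0.50, which shows why accuracy alone can be misleading when missed deaths are the costly error. AUC is computed from the scores rather than the thresholded predictions, so it does not depend on the 0.5 cut-off.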

Step 5: Data Visualization

Visual analytics make the data and the model results easier to interpret.

Tools Used:

  • Matplotlib & Seaborn: Data distribution and feature importance
  • Pandas & NumPy: Data preprocessing and statistical analysis
  • Plotly: Interactive visualizations for better insights

Conclusion

This project analyzes patient data to identify risk factors in heart failure and to predict whether a patient dies during follow-up. By applying machine learning techniques, we aim to identify the clinical features that best predict that outcome.

Future Improvements

  • Expand dataset with real-time hospital data.
  • Implement deep learning models for enhanced accuracy.
  • Explore personalized treatment recommendations based on patient profiles.

Contributions are welcome! Feel free to fork this project, suggest improvements, or apply new machine learning models.