Can we predict heart disease? Yes!
Knowing the risk factors associated with heart disease helps health care professionals identify patients at high risk. The main objective of this project, which I led during weeks 4, 5 and 6 at Metis New Economy Skills Training in New York, is to develop an Intelligent Heart Disease Prediction System that uses patients' diagnosis data to make the prediction.
The dataset I looked at is publicly available from the University of California. It consists of 4 databases, from the Hungarian Institute of Cardiology in Budapest, the University Hospitals of Zurich and Basel in Switzerland, the V.A. Medical Center in Long Beach, and the Cleveland Clinic Foundation in the USA.
Risk factors associated with heart disease include age, blood pressure, smoking habits, total cholesterol, diabetes, family history of heart disease, obesity and lack of physical activity, among others. The attributes that I considered for each patient are described in this file and will be detailed in the code section below.
To build my prediction model, I tried a wide range of supervised machine learning classifiers: Logistic Regression, K Nearest Neighbors, Decision Trees, Random Forests, various Naive Bayes implementations, Support Vector Machines and Generalized Linear Models (using Poisson and Ordinal regressions). I also tried deep learning techniques such as Neural Networks and the Restricted Boltzmann Machine. In addition, I applied feature selection and feature extraction techniques to improve my model.
The metrics that I wanted to optimize are Precision and Recall. Precision is the fraction of people who actually develop heart disease among those the model predicts will develop it. A Precision of 50% means that only half of the people the model flags actually develop heart disease. We need a high Precision to avoid predicting heart disease for healthy people!
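To make the definition of Precision concrete, here is a minimal sketch in plain Python. The function name `precision` and the two label lists are made up for illustration (they are not taken from the project's data); 1 stands for "heart disease" and 0 for "healthy".

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive: TP / (TP + FP)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    predicted_pos = true_pos + false_pos
    if predicted_pos == 0:
        # Convention chosen for this sketch: no positive predictions gives 0.0
        return 0.0
    return true_pos / predicted_pos


# Hypothetical labels, for illustration only: 1 = heart disease, 0 = healthy
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]

# The model flags 4 patients; 2 of them actually have heart disease
example_precision = precision(y_true, y_pred)
print(example_precision)
```

In this toy example the model flags four patients and is right about two of them, which is the 50% Precision case described above.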