Can we predict if a person will develop diabetes based on their Pregnancy Count, Glucose level, Blood Pressure, Skin Thickness, Insulin, BMI, DiabetesPedigreeFunction and Age?
Summary of the diabetes data
There are 768 data items in the diabetes data.
We will need to split the 768 data items into a training set and a testing set.
The "Outcome" column allows us to see which people have been diagnosed with diabetes.
I removed the Outcome column from the training set (using "pop").
Splitting data into training and testing dataframes
Then I normalized the feature columns.
(Notice that one way to normalize numeric data is to calculate the mean of each column and then replacing the original value in each cell with the number of standard deviations the original value was from the mean of that column.)
The training data X
I trained a sklearn model using the normalized features in the training set and the labels in the training set.
The actual and predicted results are shown below
Training the sklearn model and predicting results