• Now, its time to explore the data to gain insights about it.
  1. Validation set— used to evaluate the model during training, tune model hyper-parameters (optimization technique, regularization etc.), and pick the best version of the model. Picking a good validation set is essential for training models that generalize well.
  2. Test set — used to compare different models or approaches and report the model’s final accuracy. For many datasets, test sets are provided separately. The test set should reflect the kind of data the model will encounter in the real-world, as closely as feasible.
  1. Scaling numerical features using the `scaler` created earlier
  2. Encoding categorical features using the `encoder` created earlier if need be

Results and Conclusion

  1. The logistic regression model accuracy score is 0.97067. So, the model does a very good job in predicting whether a tumor is “benign” (noncancerous) or “malignant” (cancerous)
  2. The model shows no signs of over-fitting.


The work done in this project is inspired from the following:

  1. The dataset was gotten from kaggle —



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store