Regression Problems – Diabetes Dataset

This repository contains a regression analysis on the Diabetes dataset from scikit-learn, implemented as part of a university assignment.

Project Overview

This project compares four regression models for predicting disease progression from tabular medical data. Each model is evaluated with 6-fold cross-validation and four regression performance metrics.

Models Implemented

- Random Forest Regressor
- Support Vector Regressor (SVR)
- k-Nearest Neighbors (KNN) Regressor
- Gaussian Process Regressor

Evaluation Methodology

- 6-fold cross-validation (K-Fold)
- Models are compared on test-fold scores only
- Performance metrics:
  - RMSE (Root Mean Squared Error)
  - MAE (Mean Absolute Error)
  - Max Error
  - MAPE (Mean Absolute Percentage Error)
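As an illustration of this methodology, here is a minimal sketch of a 6-fold evaluation loop that scores the four model families on each test fold. It is not the notebook's code: the seed value, the default hyperparameters, and the Gaussian Process kernel (RBF plus a WhiteKernel noise term) are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import (
    max_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
)
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

SEED = 42  # assumed value; the notebook's actual seed is not stated in this README

X, y = load_diabetes(return_X_y=True)

# Hyperparameters below are assumptions, not the notebook's settings.
models = {
    "RandomForest": RandomForestRegressor(random_state=SEED),
    "SVR": SVR(),
    "KNN": KNeighborsRegressor(),
    "GaussianProcess": GaussianProcessRegressor(
        kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=SEED
    ),
}

kfold = KFold(n_splits=6, shuffle=True, random_state=SEED)
scores = {name: [] for name in models}

for train_idx, test_idx in kfold.split(X):
    for name, model in models.items():
        # fit() retrains from scratch, so reusing the estimator per fold is safe.
        model.fit(X[train_idx], y[train_idx])
        y_true = y[test_idx]
        y_pred = model.predict(X[test_idx])
        scores[name].append({
            "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
            "MAE": float(mean_absolute_error(y_true, y_pred)),
            "MaxError": float(max_error(y_true, y_pred)),
            # scikit-learn returns MAPE as a fraction, not a percentage.
            "MAPE": float(mean_absolute_percentage_error(y_true, y_pred)),
        })

# Average each metric over the six test folds.
for name, folds in scores.items():
    means = {m: round(float(np.mean([f[m] for f in folds])), 3) for m in folds[0]}
    print(name, means)
```

Only test-fold predictions are scored here, which mirrors the comparison rule stated above.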

Explainability (SHAP)

SHAP (SHapley Additive exPlanations) is used to interpret the predictions of the selected models.

- SHAP summary plots
- SHAP waterfall plots (for representative test samples)

How to Run

Recommended: Run on Google Colab

The easiest way to run this project is through Google Colab, as the notebook is fully self-contained and requires no local setup.

- Click the “Open in Colab” button at the top of this README.
- Run all cells sequentially.
- (Optional) To export the results locally, uncomment the CSV download lines at the end of the notebook and re-run it.

This option is recommended for quick experimentation and reproducibility.

Alternative: Run Locally

You may also download the notebook and run it locally using Jupyter Notebook.

Steps:

- Download the file diabetes_regression.ipynb from this repository.
- Ensure the required Python libraries are installed (e.g. numpy, pandas, scikit-learn, matplotlib, shap).
- Open the notebook and run all cells.
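Before opening the notebook locally, a quick check like the one below can report which of the libraries named above are missing. This is a small helper sketch, not part of the repository; the package list is taken from the examples above and may not be exhaustive.

```python
from importlib.util import find_spec

# pip name -> import name; scikit-learn is imported as "sklearn".
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "shap": "shap",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages; install with: pip install " + " ".join(missing))
else:
    print("All listed packages are available.")
```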

Note: Running locally may require additional setup compared to Google Colab.

Repository Structure

- diabetes_regression.ipynb
- regression_results.csv
- README.md

The file regression_results.csv contains the aggregated results for all folds, models, and data splits (train/test).

Note: The CSV file is generated by the notebook. To regenerate it, uncomment the last lines at the end of diabetes_regression.ipynb and re-run the notebook.

Generated Results (CSV)

The file regression_results.csv contains the aggregated evaluation results for all regression models, including both training and test sets across all folds.

This file is generated automatically by the notebook during execution.
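To make the shape of such a results table concrete, the sketch below collects per-fold train and test scores into a long-format table and serializes it as CSV. The column names (model, fold, split, rmse, mae), the seed, and the single KNN model are hypothetical choices for illustration; the actual layout of regression_results.csv may differ.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
kfold = KFold(n_splits=6, shuffle=True, random_state=42)  # assumed seed

rows = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X), start=1):
    model = KNeighborsRegressor().fit(X[train_idx], y[train_idx])
    # Score the same fitted model on both the training and the test portion.
    for split, idx in (("train", train_idx), ("test", test_idx)):
        y_pred = model.predict(X[idx])
        rows.append({
            "model": "KNN",
            "fold": fold,
            "split": split,
            "rmse": float(np.sqrt(mean_squared_error(y[idx], y_pred))),
            "mae": float(mean_absolute_error(y[idx], y_pred)),
        })

results = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text; pass a filename to write a file.
csv_text = results.to_csv(index=False)

# Mean score per model and split, averaged over folds.
summary = results.groupby(["model", "split"])[["rmse", "mae"]].mean()
print(summary)
```

Keeping one row per model, fold, and split makes it straightforward to filter to test rows only when comparing models.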

Reproducibility

A fixed random seed is used to ensure reproducible results. Feature scaling and preprocessing are applied within each fold to prevent data leakage. The dataset is loaded directly from the scikit-learn library.
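One common way to get both properties in scikit-learn is to wrap the scaler and the model in a Pipeline and pass a seeded KFold to the cross-validation helper, so the scaler is refit on the training portion of every fold. The sketch below assumes StandardScaler, SVR, and a seed of 42; the notebook's actual preprocessing and seed are not specified here.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

SEED = 42  # assumed value

X, y = load_diabetes(return_X_y=True)


def fold_rmse(seed):
    """Return the per-fold test RMSE of a scaler + SVR pipeline."""
    # Inside cross_val_score the whole pipeline is fit on each training fold,
    # so scaling statistics never see the corresponding test fold.
    pipeline = make_pipeline(StandardScaler(), SVR())
    kfold = KFold(n_splits=6, shuffle=True, random_state=seed)
    neg_rmse = cross_val_score(
        pipeline, X, y, cv=kfold, scoring="neg_root_mean_squared_error"
    )
    return -neg_rmse


first_run = fold_rmse(SEED)
second_run = fold_rmse(SEED)
print("Per-fold RMSE:", np.round(first_run, 2))
print("Identical across runs:", bool(np.allclose(first_run, second_run)))
```

Scaling the full dataset before splitting would let test-fold statistics influence training, which is the leakage the per-fold approach avoids.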

Academic Context

This project was developed as part of a university assignment for the Machine Learning course at the University of Macedonia.

Disclaimer

This repository is intended strictly for educational purposes. The implementation and results should not be considered medical or clinical advice.
