ScoreCast – Cricket Score Prediction
A Flask web application that predicts the final score of the batting team in a T20 cricket match from the current match situation. It is built with Python, scikit-learn, and XGBoost, and served through Flask.
- Select batting team, bowling team, and venue (city) via dropdowns.
- Input current score, overs completed (e.g., 18.2 for 18 overs and 2 balls), wickets fallen, and runs scored in the last 5 overs.
- Dynamic country flags update when teams are selected.
- Real-time prediction of final team score using a trained XGBoost regression model.
- Backend: Python 3.x, Flask
- Data Processing & ML: Pandas, NumPy, scikit-learn, XGBoost
- Templating: Jinja2 (Flask)
- Model Persistence: pickle
- Frontend: HTML, CSS, JavaScript
- Source: Collected ball-by-ball data from T20 international matches (post-2021 World Cup up to 2024 venues).
- File: T20_iore_info.csv
- Columns:
  - match_id: Unique identifier
  - batting_team, bowling_team: Team names
  - ball: Over.ball format (e.g., 5.3)
  - runs: Runs scored on that delivery
  - player_dismissed: Dismissal info (if any)
  - city: Venue city (missing for some rows)
- City Imputation & Filtering: Fill missing city values by extracting from venue, then keep cities with ≥600 match entries.
- Current Score: Cumulative sum of runs grouped by match_id.
- Balls Left: From ball, parse overs and balls; compute balls_left = 120 - (overs_completed*6 + balls_in_over).
- Wickets Left: Cumulative count of dismissals; then wickets_left = 10 - wickets_fallen.
- Current Run Rate: current_run_rate = (current_score * 6) / balls_bowled.
- Last 5 Overs Runs: Rolling sum over last 30 balls per match.
- Final Runs: Total runs per match as the target variable.
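The per-delivery steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name add_match_features and the window parameter are hypothetical, and it assumes one innings per match_id, rows already in delivery order, and a null player_dismissed whenever no wicket fell. City imputation and filtering are left out.

```python
import pandas as pd


def add_match_features(df: pd.DataFrame, window: int = 30) -> pd.DataFrame:
    """Return a copy of a ball-by-ball frame with the derived columns added.

    Assumptions: one innings per match_id, rows in delivery order, and
    player_dismissed is null when no wicket fell on that delivery.
    """
    out = df.copy()
    runs_by_match = out.groupby("match_id")["runs"]

    # Current score: cumulative runs within each match.
    out["current_score"] = runs_by_match.cumsum()

    # Balls left: split the over.ball value (e.g. 5.3) into its two parts.
    ball = out["ball"].astype(float)
    overs_completed = ball.astype(int)
    balls_in_over = ((ball - overs_completed) * 10).round().astype(int)
    balls_bowled = overs_completed * 6 + balls_in_over
    out["balls_left"] = 120 - balls_bowled

    # Wickets left: 10 minus the cumulative dismissals within each match.
    is_wicket = out["player_dismissed"].notna().astype(int)
    out["wickets_left"] = 10 - is_wicket.groupby(out["match_id"]).cumsum()

    # Current run rate: runs per over so far.
    out["current_run_rate"] = out["current_score"] * 6 / balls_bowled

    # Last 5 overs: rolling sum over the previous `window` deliveries
    # (NaN until that many deliveries have been bowled in the match).
    out["last_five"] = runs_by_match.transform(lambda s: s.rolling(window).sum())

    # Target: total runs scored in the match.
    out["final_score"] = runs_by_match.transform("sum")
    return out
```

Rows where last_five is still NaN (the first 29 deliveries of each match) would typically be dropped before training, since the rolling window is not yet full.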
Final features:
[batting_team, bowling_team, city,
current_score, balls_left, wickets_left,
current_run_rate, last_five]
Target: final_score (total runs).
- Train/Test Split: 80% train, 20% test (random_state=42).
- Pipeline:
  - ColumnTransformer:
    - OneHotEncode: batting_team, bowling_team, city
    - Passthrough numeric features
  - StandardScaler for numeric columns
  - XGBRegressor with:
    - n_estimators=1000
    - learning_rate=0.2
    - max_depth=12
    - random_state=1
- Evaluation:
  - R² Score: ~0.98
  - Mean Absolute Error: ~1.7 runs
- Artifact: Serialized pipeline saved to pipeline.pkl via pickle.
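One way the training setup described above might be wired together with scikit-learn is sketched below. The helper names (build_pipeline, train_pipeline, save_pipeline, load_pipeline) are hypothetical, and the regressor is passed in as an argument so the sketch does not depend on a particular XGBoost version. Setting sparse_threshold=0 is an assumption added here so that the one-hot output is dense and StandardScaler can centre it; the original notebook may handle this differently.

```python
import pickle

from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["batting_team", "bowling_team", "city"]


def build_pipeline(regressor) -> Pipeline:
    """One-hot encode the categorical columns, scale, then regress.

    `regressor` is any scikit-learn compatible estimator, e.g.
    XGBRegressor(n_estimators=1000, learning_rate=0.2, max_depth=12,
    random_state=1) to mirror the settings listed above.
    """
    encode = ColumnTransformer(
        [("teams_city", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        remainder="passthrough",  # numeric features pass through unchanged
        sparse_threshold=0,       # assumption: always return a dense array
    )
    return Pipeline(
        [("encode", encode), ("scale", StandardScaler()), ("model", regressor)]
    )


def train_pipeline(X, y, regressor):
    """Fit on an 80/20 split and return the pipeline with its test-set MAE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    pipe = build_pipeline(regressor).fit(X_train, y_train)
    mae = mean_absolute_error(y_test, pipe.predict(X_test))
    return pipe, mae


def save_pipeline(pipe: Pipeline, path: str = "pipeline.pkl") -> None:
    """Serialize the fitted pipeline with pickle."""
    with open(path, "wb") as f:
        pickle.dump(pipe, f)


def load_pipeline(path: str = "pipeline.pkl") -> Pipeline:
    """Load a pipeline saved by save_pipeline; only unpickle trusted files."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Keeping the encoder, scaler, and model in a single Pipeline means the pickled artifact carries its own preprocessing, so the web app only has to supply a frame with the eight feature columns.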
- Clone the repository:
    git clone https://github.com/namaniisc/ScoreCast.git
    cd ScoreCast
- Create a virtual environment & activate:
    python3 -m venv venv
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Ensure dataset and model:
  - T20_iore_info.csv in project root
  - pipeline.pkl in project root
- Run the Flask app:
    python app.py
- Access in browser: Navigate to http://localhost:5000.
- Make Predictions:
  - Select teams and city.
  - Enter current match stats.
  - Click Predict Score to see the forecast.
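Before the model can score a submission, the form inputs have to be turned into the feature columns it was trained on. The sketch below shows one plausible conversion; the function names overs_to_balls and build_feature_row are hypothetical, and the validation rules (0 to 5 balls after the decimal point, at most 120 balls) are assumptions rather than the app's documented behaviour.

```python
def overs_to_balls(overs: str) -> int:
    """Convert an over.ball string such as '18.2' into balls bowled."""
    whole, _, part = overs.strip().partition(".")
    balls_in_over = int(part) if part else 0
    if not 0 <= balls_in_over <= 5:
        raise ValueError("balls in the current over must be between 0 and 5")
    return int(whole) * 6 + balls_in_over


def build_feature_row(
    batting_team: str,
    bowling_team: str,
    city: str,
    current_score: int,
    overs: str,
    wickets_fallen: int,
    last_five: int,
) -> dict:
    """Map the form inputs to the eight feature columns used in training."""
    balls_bowled = overs_to_balls(overs)
    if not 0 < balls_bowled <= 120:
        raise ValueError("overs must be greater than 0 and at most 20")
    return {
        "batting_team": batting_team,
        "bowling_team": bowling_team,
        "city": city,
        "current_score": current_score,
        "balls_left": 120 - balls_bowled,
        "wickets_left": 10 - wickets_fallen,
        "current_run_rate": current_score * 6 / balls_bowled,
        "last_five": last_five,
    }
```

Taking the overs value as a string avoids floating-point surprises when splitting 18.2 into 18 overs and 2 balls. The resulting dict can be wrapped in a one-row pandas DataFrame and passed to the loaded pipeline's predict method.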
cricket-score-predictor/
├── app.py # Flask application
├── pipeline.pkl # Serialized ML pipeline
├── T20_iore_info.csv # Raw dataset
├── models/ # (Optional) model artifacts
├── notebooks/ # Jupyter notebooks
│   └── train_model.ipynb
├── requirements.txt # Python dependencies
├── static/
│   └── images/ # Team flag images
└── templates/
    └── index.html # Frontend template
- Integrate real-time data APIs (e.g., Cricbuzz) for live inputs.
- Experiment with advanced architectures (LSTM sequences, ensemble methods).
- Add confidence intervals to predictions.
- Dockerize for containerized deployment.
Contributions are welcome! Please:
- Fork the repo
- Create a feature branch (git checkout -b feature/YourFeature)
- Commit your changes (git commit -m 'Add some feature')
- Push to branch (git push origin feature/YourFeature)
- Open a Pull Request
