In this project, we applied machine learning techniques to predict athletic performance outcomes from physical metrics in a synthetic dataset of 10,000 athlete evaluation records. The objective was to correctly identify high-value athletes, defined as top performers in Class A or Class B, and to optimize the prediction process to maximize financial outcomes for an athletic management company.
We used XGBoost, an efficient and flexible gradient boosting algorithm, as our primary model. We chose it for its ability to handle large datasets, its tunable hyperparameters, and the high accuracy it can achieve by combining many decision trees into an ensemble. Our goal was not only to predict athlete classifications but also to maximize the financial value of correct identifications for Class A and Class B, which represent the most promising candidates in terms of future performance and recruitment potential.
The project involved multiple stages: feature engineering to refine the input data, followed by model training and validation. To further improve the model, we tuned its hyperparameters and experimented with adjusting the classification threshold to improve the balance between precision and recall for the high-value classes. These techniques allowed us to fine-tune the model's predictions, particularly for top performers, who carry significant monetary value.
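The effect of threshold adjustment on precision and recall can be illustrated with a minimal sketch. It assumes the multi-class problem has been reduced to a binary view (high-value, meaning Class A or Class B, versus everything else) and that each athlete has a predicted probability of being high-value. The function name, the toy labels, and the scores below are hypothetical and are not taken from the project's data or results.

```python
from typing import Sequence, Tuple


def precision_recall(
    is_high_value: Sequence[bool],
    scores: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (precision, recall) when athletes with score >= threshold
    are flagged as high-value (Class A or Class B)."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for truth, flag in zip(is_high_value, flagged) if truth and flag)
    fp = sum(1 for truth, flag in zip(is_high_value, flagged) if not truth and flag)
    fn = sum(1 for truth, flag in zip(is_high_value, flagged) if truth and not flag)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: true labels and predicted high-value probabilities.
is_high_value = [True, True, False, True, False, False]
scores = [0.90, 0.60, 0.55, 0.40, 0.30, 0.10]

# Lower thresholds flag more athletes (higher recall, usually lower precision);
# higher thresholds flag fewer (higher precision, usually lower recall).
for threshold in (0.35, 0.50, 0.70):
    p, r = precision_recall(is_high_value, scores, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

In practice the scores would come from the trained model's predicted class probabilities, and the threshold would be chosen on a validation set rather than on the data used for the final evaluation.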
The final phase of the analysis was a detailed profit evaluation that compared the financial outcomes of identifying Class A and Class B athletes with the machine learning model against the traditional approach of evaluation by professional coaches. This comparison indicated that the model is a cost-effective and scalable alternative to high-cost human evaluations, and that a data-driven approach can yield substantial financial gains without sacrificing accuracy.
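One way to structure such a profit comparison is sketched below. The cost model is an assumption made for illustration: each correct high-value identification earns a fixed value, each false positive incurs a fixed cost, and every evaluated athlete incurs a per-evaluation cost that differs between the model and human coaches. The `Economics` class, the `net_profit` function, and every number are hypothetical placeholders, not figures from the project.

```python
from dataclasses import dataclass


@dataclass
class Economics:
    """Hypothetical unit economics for one evaluation method."""
    value_per_correct_id: float     # value of a correctly identified Class A/B athlete
    cost_per_false_positive: float  # cost of wrongly flagging an athlete as high-value
    cost_per_evaluation: float      # cost of evaluating one athlete


def net_profit(
    true_positives: int,
    false_positives: int,
    n_evaluated: int,
    econ: Economics,
) -> float:
    """Value of correct identifications minus false-positive and evaluation costs."""
    revenue = true_positives * econ.value_per_correct_id
    penalty = false_positives * econ.cost_per_false_positive
    evaluation_cost = n_evaluated * econ.cost_per_evaluation
    return revenue - penalty - evaluation_cost


# Placeholder values: only the per-evaluation cost differs between methods.
model_econ = Economics(1000.0, 200.0, 1.0)
coach_econ = Economics(1000.0, 200.0, 150.0)

# The same confusion counts are used for both methods purely to isolate the
# effect of evaluation cost; real counts would differ per method.
model_profit = net_profit(3, 1, 6, model_econ)
coach_profit = net_profit(3, 1, 6, coach_econ)
print(f"model: {model_profit:.2f}  coaches: {coach_profit:.2f}")
```

A fuller analysis would plug in each method's own true-positive and false-positive counts, so that differences in accuracy as well as cost show up in the comparison.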
This study illustrates how machine learning can support sports performance evaluation and decision-making, offering a scalable, high-accuracy alternative to traditional methods. By systematically refining the model and focusing on key metrics such as precision, recall, and profitability, the project highlights how data science can deliver real-world impact in athletic performance assessment.