Amazon ML Hackathon 2025 | Team AVIS | Product Price Prediction | Rank 1693 | Total Registered: 82,790 | SMAPE 51.4
This repository contains Team AVIS's solution for the Unstop ML Hackathon 2025, where the task was to predict product prices from structured and unstructured catalog data.
Team Name: AVIS
Final SMAPE: 51.4
Leaderboard Rank: #1693
Frameworks: LightGBM + Sentence Transformers (MiniLM-L6-v2)
Hardware: GPU-accelerated training on Google Colab
Project Overview
The goal was to build a regression model that accurately predicts product prices using both structured features (brand, quantity, unit) and unstructured text (titles, bullet points, and product descriptions).
| Component | Description |
| --- | --- |
| Text Encoder | SentenceTransformer (all-MiniLM-L6-v2) |
| Model | LightGBM (GPU, regression_l1 objective) |
| Metric | SMAPE (Symmetric Mean Absolute Percentage Error) |
| Optimization | Early stopping, feature scaling, lemmatization, unit normalization |

Model Architecture
Text Cleaning: remove emojis, punctuation, and stopwords
Embedding Generation: encode the cleaned text with SentenceTransformer (MiniLM-L6-v2)
Feature Fusion: combine embeddings with categorical and numeric features
Training: GPU-based LightGBM regressor
Evaluation: SMAPE metric
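The first and last steps above can be illustrated without the trained model. The sketch below is not the repository's code: it uses only the standard library, a deliberately tiny hypothetical stopword set (the project itself uses NLTK), a rough emoji range filter, and the common 0 to 200 form of SMAPE, which is assumed here to match the leaderboard definition.

```python
import re
import string

# Hypothetical, deliberately tiny stopword set; the project itself uses NLTK.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "with", "in", "on", "to"}

# Rough emoji filter (assumption): drops common emoji/symbol code-point ranges.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")

# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text: str) -> str:
    """Lowercase, then strip emojis, punctuation, and stopwords."""
    text = EMOJI_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


def smape(y_true, y_pred) -> float:
    """SMAPE in percent: mean of |pred - true| / ((|true| + |pred|) / 2) * 100."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = 0.0
    for actual, pred in zip(y_true, y_pred):
        denom = (abs(actual) + abs(pred)) / 2
        if denom:  # a pair where both values are zero contributes no error
            total += abs(pred - actual) / denom
    return 100.0 * total / len(y_true)
```

Under this definition a perfect prediction scores 0 and the worst case scores 200, so lower is better.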
Results

| Metric | Score |
| --- | --- |
| Validation SMAPE | 47.43 |
| Public Leaderboard SMAPE | 51.4 |
| Final Rank | #1693 / 82,790 |

Tech Stack
Python 3.12
LightGBM (GPU)
SentenceTransformers (MiniLM-L6-v2)
NLTK for text preprocessing
scikit-learn, pandas, numpy, joblib
How to Run
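Install the dependencies from a notebook cell (drop the leading ! when running in a regular shell):
    !pip install -q lightgbm sentence-transformers emoji nltk
Open amazon_price_prediction.ipynb in Google Colab or Jupyter and run all cells. A notebook cannot be run with the python command directly; to execute it from the command line, use:
    jupyter nbconvert --to notebook --execute amazon_price_prediction.ipynb
Generate predictions for a test file:
    python inference_script.py --input test.csv --output predicted_prices.csv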
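The contents of inference_script.py are not shown in this README. As a guess at the interface the last command implies, a script accepting those two flags could parse them as follows; `parse_args` and the help strings are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the --input/--output flags used in the command above (assumed interface)."""
    parser = argparse.ArgumentParser(
        description="Predict product prices for a catalog CSV."
    )
    parser.add_argument("--input", required=True, help="path to the input CSV")
    parser.add_argument("--output", required=True, help="path for the predictions CSV")
    return parser.parse_args(argv)
```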
Highlights
Preprocessed 95K+ records combining structured and unstructured data
Generated 384-dimensional text embeddings using MiniLM
Optimized LightGBM with GPU acceleration
Achieved SMAPE 51.4, ranking 1693 out of 82,790 registered participants (roughly the top 2%)
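For readers who want to reproduce the training setup, the LightGBM settings named in this README (GPU, regression_l1 objective, early stopping) correspond roughly to a parameter dictionary like the one below. Only the objective and the GPU device come from this README. Every numeric value is a placeholder, not the team's tuned configuration, and `train_set` and `valid_set` in the comment stand for LightGBM datasets you would build yourself.

```python
# Hypothetical LightGBM parameter set; only the objective and GPU device
# come from this README, every numeric value is a placeholder.
params = {
    "objective": "regression_l1",  # L1 (MAE) loss, as named in the README
    "metric": "l1",                # track MAE on the validation set
    "device_type": "gpu",          # requires a GPU-enabled LightGBM build
    "learning_rate": 0.05,         # placeholder
    "num_leaves": 63,              # placeholder
    "verbose": -1,                 # silence per-iteration logging
}

# Early stopping is passed as a callback at training time, for example:
#   import lightgbm as lgb
#   booster = lgb.train(
#       params,
#       train_set,
#       num_boost_round=5000,
#       valid_sets=[valid_set],
#       callbacks=[lgb.early_stopping(stopping_rounds=100)],
#   )
```

An L1 objective is less sensitive to a few very expensive products than squared error, which is one plausible reason to choose it for price data.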