13 Dec 2020
Predict Future Sales Week 2
Current Competition:
- Predict Future Sales
- Timeline: 2 months (01/20/2021)
- Current progress (leaderboard): 0.986 RMSE
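The leaderboard number above is an RMSE. The competition's evaluation page states that true target values are clipped into the [0, 20] range, so a local scorer should clip the same way before comparing against the leaderboard. Below is a minimal standard-library sketch; clipping the predictions as well is a common practice assumed here, not something these notes specify.

```python
import math


def rmse(y_true, y_pred, clip=(0.0, 20.0)):
    """Root mean squared error between two equal-length sequences.

    Targets and predictions are first clipped into `clip` (assumed here to
    match the competition's [0, 20] range); pass clip=None to score raw values.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if clip is not None:
        lo, hi = clip
        y_true = [min(max(v, lo), hi) for v in y_true]
        y_pred = [min(max(v, lo), hi) for v in y_pred]
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `rmse([1, 30], [1, 20])` returns `0.0`, because the target 30 is clipped to 20 before scoring.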
DOING
- ETA: 12/20/20
- Re-implement the “Future Sales 3” notebook
  - Future Sales 3
  - Things to do:
    - Data Cleaning
      - Same shop with different IDs (USED)
      - OPTIONAL: Misspellings in shop names and item categories
      - OPTIONAL: Item name cleaning
    - Feature Engineering
      - Extract city/category from shop_name
      - Lagged item_cnt_month (mean), grouped by:
        - month/shop (USED)
        - month/item (USED)
        - month/shop/item (USED)
        - month/city (USED)
        - month/city/item (USED)
        - month/shop/type (USED)
      - Price
        - average price per item
        - lagged price per month/item
        - delta … lag_price/avg_price (USED)
      - Revenue
        - lagged total revenue per month/shop
        - delta … lag_revenue/avg_revenue (USED)
      - Date:
        - Month (USED)
        - Days (USED)
        - Months since first sale of item/shop (USED)
        - Months since first sale of item (USED)
      - Basic features:
        - item_id (USED)
        - shop_id (USED)
        - date_block_num (USED)
Note: Switching from Kaggle Notebooks to Google Colab for more consistent performance.
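For the "same shop with different IDs" cleaning step, one way to surface candidate duplicates is to normalize shop names (case, punctuation, whitespace) and group IDs that collapse to the same string. This is a sketch under assumptions: the rule for which ID to keep (prefer IDs in a given set such as the test-set shops, else the largest) is an arbitrary choice made here, and `find_duplicate_shops` / `remap_shop_ids` are hypothetical helper names. Names that differ by more than punctuation or case still need manual inspection.

```python
import re
from collections import defaultdict


def normalize_shop_name(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def find_duplicate_shops(shops, prefer=()):
    """Map duplicate shop_ids onto one canonical id per normalized name.

    shops:  {shop_id: shop_name}
    prefer: ids to keep when possible (e.g. shop_ids present in the test set);
            otherwise the largest id in each group is kept (arbitrary choice).
    Returns {duplicate_id: canonical_id}.
    """
    prefer = set(prefer)
    groups = defaultdict(list)
    for shop_id, name in shops.items():
        groups[normalize_shop_name(name)].append(shop_id)
    mapping = {}
    for ids in groups.values():
        if len(ids) < 2:
            continue
        # Sort key: ids in `prefer` win first, then the larger id wins.
        keep = max(ids, key=lambda i: (i in prefer, i))
        for i in ids:
            if i != keep:
                mapping[i] = keep
    return mapping


def remap_shop_ids(rows, mapping):
    """Return copies of rows (dicts with 'shop_id') with duplicate ids replaced."""
    return [{**r, "shop_id": mapping.get(r["shop_id"], r["shop_id"])} for r in rows]
```

Whatever mapping is used, applying it to both the training sales and the test rows keeps the shop IDs aligned between the two.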
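The "extract city/category from shop_name" feature can be sketched as splitting each shop name into words. The sketch assumes the first word is the city and the second is the shop type (for example a shopping-centre abbreviation such as "ТЦ"); the example name in the usage note is illustrative, and `split_shop_name` / `label_encode` are hypothetical helpers. The resulting codes are what a month/city grouping would key on.

```python
def split_shop_name(shop_name):
    """Return (city, shop_category) from a shop name.

    Assumes the first word is the city and the second word is the shop type.
    A leading "!" is dropped; missing parts become "unknown".
    """
    parts = shop_name.lstrip("!").split()
    city = parts[0] if parts else "unknown"
    category = parts[1] if len(parts) > 1 else "unknown"
    return city, category


def label_encode(values):
    """Encode strings as ints in sorted order; returns (codes, vocabulary)."""
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return [vocab[v] for v in values], vocab
```

For example, `split_shop_name('Москва ТЦ "Семеновский"')` returns `("Москва", "ТЦ")`.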
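Each "lagged item_cnt_month (mean)" feature is the mean monthly count for a group (month/shop, month/item, and so on) taken from an earlier month, so the current month's target never leaks into its own features. A minimal standard-library sketch follows. It assumes `rows` is the full monthly grid (one dict per month/shop/item, zero-filled where nothing sold) with any extra grouping columns such as a city or type code already attached; filling missing history with 0.0 is also an assumption. `add_lagged_mean` is a hypothetical helper name.

```python
from collections import defaultdict


def monthly_group_mean(rows, keys):
    """Mean item_cnt_month per (date_block_num, *keys)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        k = (r["date_block_num"],) + tuple(r[c] for c in keys)
        sums[k] += r["item_cnt_month"]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def add_lagged_mean(rows, keys, lags, name):
    """Return copies of rows with `<name>_lag_<l>` columns added.

    Each new column holds the group mean from `l` months earlier; groups with
    no rows in that month get 0.0.
    """
    means = monthly_group_mean(rows, keys)
    out = []
    for r in rows:
        r = dict(r)
        for lag in lags:
            k = (r["date_block_num"] - lag,) + tuple(r[c] for c in keys)
            r[f"{name}_lag_{lag}"] = means.get(k, 0.0)
        out.append(r)
    return out
```

For example, `add_lagged_mean(rows, ["shop_id", "item_id"], [1, 2, 3], "shop_item_avg")` would cover the month/shop/item group; the other groups only change `keys`.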
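The price and revenue "delta" features compare a lagged monthly value with a longer-run average. The sketch below uses the relative form (lag - avg) / avg, which is the ratio lag/avg shifted by one; that form, the "first lag with data" rule for price, and the use of the mean of monthly totals as the revenue baseline are all assumptions. It expects daily sales rows with `date_block_num`, `shop_id`, `item_id`, `item_price` and `item_cnt_day`, and recomputes aggregates on every call for readability rather than speed.

```python
from collections import defaultdict


def group_agg(rows, key_fn, value_fn, how="mean"):
    """Sum or mean of value_fn(row) for each key_fn(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        k = key_fn(r)
        sums[k] += value_fn(r)
        counts[k] += 1
    if how == "sum":
        return dict(sums)
    return {k: sums[k] / counts[k] for k in sums}


def relative_delta(lagged, base):
    """(lagged - base) / base, or 0.0 when either is missing or base is 0."""
    if lagged is None or not base:
        return 0.0
    return (lagged - base) / base


def price_delta(sales, month, item_id, lags=(1, 2, 3)):
    """Relative gap between the item's lagged monthly price and its average price.

    Uses the first lag in `lags` for which the item has sales (an assumption).
    """
    avg = group_agg(sales, lambda r: r["item_id"], lambda r: r["item_price"])
    monthly = group_agg(sales, lambda r: (r["date_block_num"], r["item_id"]),
                        lambda r: r["item_price"])
    for lag in lags:
        lagged = monthly.get((month - lag, item_id))
        if lagged is not None:
            return relative_delta(lagged, avg.get(item_id))
    return 0.0


def revenue_delta(sales, month, shop_id, lag=1):
    """Relative gap between the shop's lagged monthly revenue and its average."""
    monthly = group_agg(sales, lambda r: (r["date_block_num"], r["shop_id"]),
                        lambda r: r["item_price"] * r["item_cnt_day"], how="sum")
    # Baseline: mean of this shop's monthly revenue totals.
    avg = group_agg(monthly.items(), lambda kv: kv[0][1], lambda kv: kv[1])
    return relative_delta(monthly.get((month - lag, shop_id)), avg.get(shop_id))
```

Because the averages use every row passed in, passing only training months keeps validation data out of the baselines.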
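The date features can be derived from `date_block_num`, which counts months starting from January 2013 (block 0) in this competition's data. The sketch assumes "Days" means the number of calendar days in the month, an interpretation the notes do not spell out, and returns 0 for keys with no sales history (for example new items in the test month). All helper names are hypothetical.

```python
import calendar


def month_of_year(date_block_num):
    """0 = January ... 11 = December (date_block_num 0 is January 2013)."""
    return date_block_num % 12


def days_in_month(date_block_num):
    """Number of calendar days in the month identified by date_block_num."""
    year = 2013 + date_block_num // 12
    month = date_block_num % 12 + 1
    return calendar.monthrange(year, month)[1]


def first_sale_month(sales, keys):
    """Earliest date_block_num per key tuple, e.g. keys=["shop_id", "item_id"]."""
    first = {}
    for r in sales:
        k = tuple(r[c] for c in keys)
        first[k] = min(first.get(k, r["date_block_num"]), r["date_block_num"])
    return first


def months_since_first_sale(first, key, date_block_num):
    """Months elapsed since the first sale of `key`; 0 if it has never sold."""
    if key not in first:
        return 0
    return date_block_num - first[key]
```

For example, `days_in_month(1)` is 28 (February 2013) and `month_of_year(34)` is 10 (November).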
Result:
- Current baseline model: ? (public score)
  - Model: single XGBoost, no hyperparameter tuning … 0.911 (public leaderboard score)
  - Completed: 12/18/2020
Next:
- Re-implement the “top 2…” notebook
- Two-step model (classification → regression) … no improvement
- Error analysis
- Ensemble
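For reference, one common way to wire the two-step model: a classifier predicts whether an item sells at all, a regressor trained only on positive rows predicts how many, and the outputs are combined. The sketch shows only the data split and the combination (the models themselves, e.g. XGBoost classifier/regressor, are omitted), and the helper names are hypothetical. Multiplying p by r gives the expected count because E[y] = P(y > 0) · E[y | y > 0] for a non-negative (clipped) target; hard thresholding is included as the alternative.

```python
def two_step_targets(features, y):
    """Split one training set into the two sub-problems.

    Returns (labels, X_pos, y_pos): sold/not-sold labels for the classifier over
    all rows, and the rows with a positive count for the regressor.
    """
    labels = [1 if t > 0 else 0 for t in y]
    X_pos = [x for x, t in zip(features, y) if t > 0]
    y_pos = [t for t in y if t > 0]
    return labels, X_pos, y_pos


def combine_two_step(p_sale, count_if_sold, threshold=None):
    """Merge classifier probabilities with regressor outputs.

    Default: expected value p * r. With a threshold, rows with p < threshold
    are set to 0.0 and the rest keep the regressor output r.
    """
    if threshold is None:
        return [p * r for p, r in zip(p_sale, count_if_sold)]
    return [r if p >= threshold else 0.0 for p, r in zip(p_sale, count_if_sold)]
```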
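A simple starting point for the ensemble step is a weighted average of two models' predictions, with the weight picked by grid search on a held-out month. This is a sketch, not a plan from these notes: the two-model setup, the 0.1 grid step and the helper names are assumptions, and scoring here is plain RMSE without clipping.

```python
import math


def blend(preds_a, preds_b, w):
    """Weighted average w * a + (1 - w) * b, element by element."""
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]


def _rmse(y, p):
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y, p)) / len(y))


def best_blend_weight(y_val, preds_a, preds_b, steps=10):
    """Try w = 0, 1/steps, ..., 1 and return (best_w, best_rmse) on validation."""
    best_w, best_score = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        score = _rmse(y_val, blend(preds_a, preds_b, w))
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

Choosing the weight on the same rows the models were trained on would overstate the gain, so a validation month the models have not seen is the safer target.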