13 Dec 2020

Predict Future Sales Week 2

Current Competition:

  • Predict Future Sales
  • Timeline: 2 months (01/20/2021)
  • Current progress (leaderboard): 0.986 RMSE.

DOING

  • ETA: 12/20/20
  • Re-implement the “Future Sales 3” notebook
    • Future Sales 3
    • Things to do:
      1. Data cleaning
        • Same shop listed under different ids (USED)
        • OPTIONAL: Misspellings in shop names and item categories
        • OPTIONAL: Item name cleaning
      2. Feature engineering
        • Extract city/category from shop_name
        • Lag item_cnt_month (mean)
          1. month/shop (USED)
          2. month/item (USED)
          3. month/shop/item (USED)
          4. month/city (USED)
          5. month/city/item (USED)
          6. month/shop/type (USED)
        • price
          1. average price for item
          2. lag price for month/item
          3. delta … lag_price/avg_price (USED)
        • revenue
          1. lag total revenue for month/shop
          2. delta … lag_revenue/avg_revenue (USED)
        • Date:
          1. Month (USED)
          2. Days (USED)
          3. Months of first item/shop (USED)
          4. Months of first item (USED)
      3. Basic features:
        • Item_id (USED)
        • Shop_id (USED)
        • Date_block_num (USED)

Note: Switching to Google Colab from Kaggle notebooks for more consistent performance.
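
The lagged group means listed above (month/shop, month/item, and so on) can be sketched roughly as follows. This is a minimal illustration with pandas, not the code from the notebook: the helper name `add_group_mean_lag`, the generated column name, and the toy data are assumptions, and it only assumes the frame has `date_block_num` and `item_cnt_month` columns.

```python
import pandas as pd

def add_group_mean_lag(df, group_cols, lag, target="item_cnt_month"):
    """Add the per-group mean of `target` from `lag` months earlier.

    `group_cols` must start with "date_block_num" (hypothetical convention).
    """
    name = "_".join(["avg"] + group_cols[1:] + [f"lag{lag}"])
    # Mean of the target for each (month, group) combination.
    agg = (
        df.groupby(group_cols, as_index=False)[target]
        .mean()
        .rename(columns={target: name})
    )
    # Shift the month forward so month m's mean is joined onto month m + lag.
    agg["date_block_num"] += lag
    return df.merge(agg, on=group_cols, how="left")

# Toy data: one shop, two items, two months.
sales = pd.DataFrame({
    "date_block_num": [0, 0, 1, 1],
    "shop_id":        [5, 5, 5, 5],
    "item_id":        [10, 11, 10, 11],
    "item_cnt_month": [2.0, 4.0, 1.0, 3.0],
})
out = add_group_mean_lag(sales, ["date_block_num", "shop_id"], lag=1)
```

Rows in the first month have no earlier data, so their lag value is NaN; how to fill those is a separate choice. The same helper covers the other groupings by changing `group_cols`, e.g. `["date_block_num", "item_id"]`.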

Result:

  • Current baseline model: ? (public score)
    • model: single XGBoost with no hyperparameter tuning … 0.911 pls
    • completed: 12/18/2020
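
For reference, the leaderboard metric (RMSE) can be checked locally on a hold-out month with a small helper like the one below. This is only a sketch with NumPy; the function name and the toy values are assumptions, not output from the actual model.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy validation targets and predictions (made up for illustration).
y_val = [0, 1, 3]
y_hat = [0, 2, 1]
score = rmse(y_val, y_hat)  # sqrt((0 + 1 + 4) / 3)
```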

Next:

  • Re-implement the “top 2…” notebook
    • Two-step model (classification -> regression) … no improvement
    • Error analysis
    • Ensemble
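
One way to read the two-step idea above is: a classifier estimates whether an item sells at all, and the regressor's prediction is kept only where the classifier says yes. The sketch below shows just that combining step with NumPy. The function name, the 0.5 threshold, and the hard gating rule are assumptions for illustration; the referenced notebook may combine the two models differently.

```python
import numpy as np

def two_step_predict(p_sold, count_pred, threshold=0.5):
    """Combine classifier probabilities with regression predictions.

    p_sold:     estimated probability that the monthly count is non-zero
    count_pred: regressor's predicted monthly count
    Rows with p_sold below `threshold` are predicted as 0.
    """
    p_sold = np.asarray(p_sold, dtype=float)
    count_pred = np.asarray(count_pred, dtype=float)
    return np.where(p_sold >= threshold, count_pred, 0.0)

# Toy inputs standing in for the outputs of the two fitted models.
pred = two_step_predict([0.1, 0.8, 0.5], [2.0, 3.5, 1.0])
```

A softer variant would multiply `p_sold * count_pred` instead of gating; which works better is something the error analysis step could check.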
