{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "40e0a986",
   "metadata": {},
   "source": [
    "# 🏆 Mitsui Commodity Prediction Challenge - Complete Template\n",
    "\n",
    "This notebook provides a comprehensive template that combines both **TensorFlow** and **XGBoost** approaches for commodity price prediction.\n",
    "\n",
    "## 🎯 Competition Overview\n",
    "This is a **time series prediction challenge** where I try to predict commodity price movements using:\n",
    "- **Commodity futures data** (Gold, Platinum, Copper, etc.)\n",
    "- **FX exchange rates** (USD/JPY, EUR/USD, etc.)  \n",
    "- **US Stock prices** (Energy, Materials sectors)\n",
    "- **Lagged target values** (previous predictions)\n",
    "\n",
    "## 🧠 Model Strategy\n",
    "1. **XGBoost**: Excellent for structured/tabular data, highly interpretable\n",
    "2. **TensorFlow**: Powerful for sequence patterns and complex interactions\n",
    "3. **Ensemble**: Combine both models for better performance"
   ]
  },
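  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f2a1b9c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hedged sketch of the ensemble idea above: a simple weighted average of\n",
    "# the two models' predictions. `xgb_pred`, `tf_pred` and the weight are\n",
    "# illustrative placeholders; the actual blend uses the tuned\n",
    "# ENSEMBLE_XGB_WEIGHT loaded later in this notebook.\n",
    "import numpy as np\n",
    "\n",
    "def blend_predictions(xgb_pred: np.ndarray, tf_pred: np.ndarray, xgb_weight: float = 0.5) -> np.ndarray:\n",
    "    \"\"\"Weighted average of XGBoost and TensorFlow predictions.\"\"\"\n",
    "    return xgb_weight * xgb_pred + (1.0 - xgb_weight) * tf_pred\n",
    "\n",
    "# Toy usage with dummy prediction arrays\n",
    "print(blend_predictions(np.array([0.01, -0.02]), np.array([0.02, -0.01]), xgb_weight=0.685))"
   ]
  },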
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a5e147d5",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2025-08-13 10:32:13.309013: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
      "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
      "E0000 00:00:1755073933.328074   25286 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
      "E0000 00:00:1755073933.333572   25286 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
      "2025-08-13 10:32:13.355227: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
      "To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TensorFlow version: 2.18.1\n",
      "[name: \"/device:CPU:0\"\n",
      "device_type: \"CPU\"\n",
      "memory_limit: 268435456\n",
      "locality {\n",
      "}\n",
      "incarnation: 7255049176489413147\n",
      "xla_global_id: -1\n",
      ", name: \"/device:GPU:0\"\n",
      "device_type: \"GPU\"\n",
      "memory_limit: 3758096384\n",
      "locality {\n",
      "  bus_id: 1\n",
      "  links {\n",
      "  }\n",
      "}\n",
      "incarnation: 16489760313110061802\n",
      "physical_device_desc: \"device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6\"\n",
      "xla_global_id: 416903419\n",
      "]\n",
      "XGBoost version: 2.1.4\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "I0000 00:00:1755073938.858514   25286 gpu_device.cc:2022] Created device /device:GPU:0 with 3584 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6\n",
      "/home/asier/miniconda3/envs/tf-gpu/lib/python3.10/site-packages/xgboost/core.py:158: UserWarning: [10:32:18] WARNING: /home/conda/feedstock_root/build_artifacts/xgboost-split_1744329155408/work/src/common/error_msg.cc:45: `gpu_id` is deprecated since2.0.0, use `device` instead. E.g. device=cpu/cuda/cuda:0\n",
      "  warnings.warn(smsg, UserWarning)\n",
      "/home/asier/miniconda3/envs/tf-gpu/lib/python3.10/site-packages/xgboost/core.py:158: UserWarning: [10:32:18] WARNING: /home/conda/feedstock_root/build_artifacts/xgboost-split_1744329155408/work/src/common/error_msg.cc:27: The tree method `gpu_hist` is deprecated since 2.0.0. To use GPU training, set the `device` parameter to CUDA instead.\n",
      "\n",
      "    E.g. tree_method = \"hist\", device = \"cuda\"\n",
      "\n",
      "  warnings.warn(smsg, UserWarning)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "GPU support: True\n",
      "CUDA support in build: True\n",
      "NCCL support in build: True\n",
      "Available tree methods: ['exact', 'hist', 'approx', 'gpu_hist']\n",
      "✅ Libraries imported successfully!\n",
      "TensorFlow version: 2.18.1\n",
      "XGBoost version: 2.1.4\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/asier/miniconda3/envs/tf-gpu/lib/python3.10/site-packages/xgboost/core.py:158: UserWarning: [10:32:19] WARNING: /home/conda/feedstock_root/build_artifacts/xgboost-split_1744329155408/work/src/common/error_msg.cc:27: The tree method `gpu_hist` is deprecated since 2.0.0. To use GPU training, set the `device` parameter to CUDA instead.\n",
      "\n",
      "    E.g. tree_method = \"hist\", device = \"cuda\"\n",
      "\n",
      "  warnings.warn(smsg, UserWarning)\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "\n",
    "from keras import layers, regularizers\n",
    "from tensorflow import keras\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import os\n",
    "import warnings\n",
    "import logging\n",
    "from pathlib import Path\n",
    "import polars as pl\n",
    "from sklearn.metrics import mean_absolute_error, mean_squared_error\n",
    "\n",
    "from sklearn.preprocessing import  RobustScaler\n",
    "import xgboost as xgb\n",
    "import joblib\n",
    "\n",
    "\n",
    "print(\"TensorFlow version:\", tf.__version__)\n",
    "\n",
    "from tensorflow.python.client import device_lib\n",
    "print(device_lib.list_local_devices())\n",
    "\n",
    "\n",
    "print(\"XGBoost version:\", xgb.__version__)\n",
    "\n",
    "# Proper GPU support check\n",
    "def check_xgboost_gpu_support():\n",
    "    \"\"\"Check if XGBoost has GPU support available\"\"\"\n",
    "    try:\n",
    "        # Method 1: Try to create a DMatrix and use gpu_hist\n",
    "        dtrain = xgb.DMatrix(np.array([[1, 2], [3, 4]]), label=[1, 0])\n",
    "        params = {'tree_method': 'gpu_hist', 'gpu_id': 0, 'objective': 'binary:logistic'}\n",
    "        \n",
    "        # Try to train a small model with GPU\n",
    "        xgb.train(params, dtrain, num_boost_round=1, verbose_eval=False)\n",
    "        return True\n",
    "        \n",
    "    except Exception as e:\n",
    "        # Check specific error messages\n",
    "        error_str = str(e).lower()\n",
    "        if 'gpu_hist' in error_str or 'cuda' in error_str or 'gpu' in error_str:\n",
    "            return False\n",
    "        return False\n",
    "\n",
    "# Check GPU support\n",
    "gpu_available = check_xgboost_gpu_support()\n",
    "print(\"GPU support:\", gpu_available)\n",
    "\n",
    "# Alternative method - check build info\n",
    "try:\n",
    "    build_info = xgb.build_info()\n",
    "    if 'USE_CUDA' in build_info:\n",
    "        print(\"CUDA support in build:\", build_info['USE_CUDA'])\n",
    "    if 'USE_NCCL' in build_info:\n",
    "        print(\"NCCL support in build:\", build_info['USE_NCCL'])\n",
    "except:\n",
    "    print(\"Build info not available\")\n",
    "\n",
    "# Check available tree methods\n",
    "try:\n",
    "    # This will show available tree methods\n",
    "    dtrain = xgb.DMatrix(np.array([[1, 2]]), label=[1])\n",
    "    \n",
    "    available_methods = []\n",
    "    for method in ['exact', 'hist', 'approx', 'gpu_hist']:\n",
    "        try:\n",
    "            params = {'tree_method': method, 'objective': 'reg:squarederror'}\n",
    "            xgb.train(params, dtrain, num_boost_round=1, verbose_eval=False)\n",
    "            available_methods.append(method)\n",
    "        except:\n",
    "            pass\n",
    "    \n",
    "    print(\"Available tree methods:\", available_methods)\n",
    "    \n",
    "except Exception as e:\n",
    "    print(\"Could not check tree methods:\", e)\n",
    "# Import Libraries and Setup\n",
    "\n",
    "\n",
    "# Set up logging and suppress warnings\n",
    "logging.basicConfig(level=logging.INFO)\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Set random seeds for reproducibility\n",
    "np.random.seed(42)\n",
    "tf.random.set_seed(42)\n",
    "\n",
    "# Set pandas display options\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('display.max_rows', 100)\n",
    "\n",
    "print(\"✅ Libraries imported successfully!\")\n",
    "print(f\"TensorFlow version: {tf.__version__}\")\n",
    "print(f\"XGBoost version: {xgb.__version__}\")\n",
    "\n",
    "# Define paths\n",
    "# DATA_PATH = Path(\"/kaggle/input/mitsui-commodity-prediction-challenge\")  # Kaggle path\n",
    "DATA_PATH = Path(\".\")  # Adjust this path as needed\n",
    "TRAIN_CSV = DATA_PATH / \"train.csv\"\n",
    "TEST_CSV = DATA_PATH / \"test.csv\"\n",
    "TRAIN_LABELS_CSV = DATA_PATH / \"train_labels.csv\"\n",
    "TARGET_PAIRS_CSV = DATA_PATH / \"target_pairs.csv\"\n",
    "LAGGED_DIR = DATA_PATH / \"lagged_test_labels\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec417e00",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "▶ Execution mode: fast\n"
     ]
    }
   ],
   "source": [
    "# Execution mode flag: set to 'fast' for quick iteration or 'full' for submission training\n",
    "RUN_MODE = 'fast'  # choices: 'fast', 'full'\n",
    "assert RUN_MODE in ('fast', 'full')\n",
    "print(f\"▶ Execution mode: {RUN_MODE}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a7daad9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🚀 LOADING MODEL VERSION: V2\n",
      "==================================================\n",
      "🎯 V2: HYPERPARAMETER TUNED (JSON load)\n",
      "Source: Optuna optimization artifacts from 01 notebook\n",
      "\n",
      "✅ Loaded tuned hyperparameters from optimized_hyperparameters_V3_20250812_102901.json\n",
      "   Ensemble tuned XGB weight: 0.685\n",
      "🎯 ACTIVE CONFIGURATION - V2\n",
      "========================================\n",
      "🧠 TensorFlow Settings:\n",
      "  Architecture: 122 → 96 → 28 → 424 outputs\n",
      "  Dropout: 0.494\n",
      "  Learning Rate: 0.001673\n",
      "  L2 Regularization: 0.002534\n",
      "  Batch Size: 64\n",
      "  Epochs: 100\n",
      "  Architecture Type: CONSERVATIVE (Low overfitting risk)\n",
      "\n",
      "🌳 XGBoost Settings:\n",
      "  Max Depth: 5\n",
      "  Learning Rate: 0.100819\n",
      "  N Estimators: 192\n",
      "  Subsample: 0.659\n",
      "  ColSample ByTree: 0.919\n",
      "  L1 Regularization (alpha): 0.953074\n",
      "  L2 Regularization (lambda): 0.907725\n",
      "\n",
      "✅ Model Version V2 loaded successfully!\n",
      "\n",
      "📊 VERSION COMPARISON\n",
      "============================================================\n",
      "│ Metric          │ V1 (Conservative) │ V2 (Tuned)    │ V3 (Experimental) │\n",
      "├─────────────────┼───────────────────┼───────────────┼───────────────────┤\n",
      "│ TF Learning Rate│ 0.0001            │ 0.000242      │ 0.0001            │\n",
      "│ TF Dropout      │ 0.5               │ 0.258         │ 0.3               │\n",
      "│ TF Layer 1      │ 1024              │ 768           │ 1536              │\n",
      "│ TF Epochs       │ 50                │ 100           │ 100               │\n",
      "│ XGB Max Depth   │ 4                 │ 3             │ 5                 │\n",
      "│ XGB Learn Rate  │ 0.05              │ 0.0208        │ 0.01              │\n",
      "│ XGB N Trees     │ 200               │ 400           │ 1000              │\n",
      "│ Focus           │ Stability         │ Performance   │ Max Capacity      │\n",
      "└─────────────────┴───────────────────┴───────────────┴───────────────────┘\n",
      "\n",
      "🎮 TO SWITCH VERSIONS:\n",
      "Change MODEL_VERSION = \"V2\" to:\n",
      "  • \"V1\" for Conservative (stable, less overfitting)\n",
      "  • \"V2\" for Tuned (best known performance JSON-loaded)\n",
      "  • \"V3\" for Experimental (maximum capacity)\n",
      "Then restart the kernel and run all cells.\n"
     ]
    }
   ],
   "source": [
    "# 🔧 MODEL VERSION CONTROL SYSTEM\n",
    "MODEL_VERSION = \"V2\"  # Options: \"V1\" (Conservative), \"V2\" (Tuned), \"V3\" (Future versions)\n",
    "\n",
    "print(f\"🚀 LOADING MODEL VERSION: {MODEL_VERSION}\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "# Proper loading of tuned hyperparameters: use JSON artifacts produced by optimization (01 notebook)\n",
    "# Format: flattened keys like xgb_max_depth, tf_units_1, ensemble_xgb_weight, etc.\n",
    "# Files live in ./hparams: optimized_hyperparameters_<LABEL>.json, versioned timestamped copies, plus latest.json pointer.\n",
    "\n",
    "def _load_best_params(label: str):\n",
    "    from pathlib import Path\n",
    "    try:\n",
    "        from src.hparams import (\n",
    "            resolve_hparams_by_label,\n",
    "            resolve_latest_hparams,\n",
    "            load_hparams,\n",
    "        )\n",
    "    except ImportError:\n",
    "        print(\"⚠️ hparams module not found; falling back to static defaults.\")\n",
    "        return None\n",
    "    hp_dir = Path(\"./hparams\")\n",
    "    # Try label-specific file first\n",
    "    path = resolve_hparams_by_label(hp_dir, label) if hp_dir.exists() else None\n",
    "    if path is None:\n",
    "        # Fallback: latest.json pointer\n",
    "        path = resolve_latest_hparams(hp_dir)\n",
    "    if path and path.exists():\n",
    "        try:\n",
    "            params = load_hparams(path)\n",
    "            print(f\"✅ Loaded tuned hyperparameters from {path.name}\")\n",
    "            return params\n",
    "        except Exception as e:\n",
    "            print(f\"⚠️ Failed to load hyperparameters ({e}); using static defaults.\")\n",
    "    else:\n",
    "        print(\"ℹ️ No tuned hyperparameter file found (expected in ./hparams). Using static defaults.\")\n",
    "    return None\n",
    "\n",
    "# --- VERSION BRANCHES ---------------------------------------------------------\n",
    "if MODEL_VERSION == \"V1\":\n",
    "    print(\"📊 V1: CONSERVATIVE APPROACH\")\n",
    "    print(\"Recommended for: 2720 features → 424 targets\")\n",
    "    print(\"Focus: Stability, reduced overfitting risk\")\n",
    "    print()\n",
    "    \n",
    "    # 🧠 V1 TensorFlow Parameters (Conservative)\n",
    "    SHUFFLE_BUFFER = 50000\n",
    "    BATCH_SIZE = 64\n",
    "    EPOCHS = 50\n",
    "    DROPOUT = 0.5         # Higher dropout for regularization\n",
    "    L2_STRENGTH = 0.001   # Stronger L2 regularization\n",
    "    LEARNING_RATE = 0.0001  # Lower learning rate for stability\n",
    "    UNITS_DENSE_LAYER = 1024  # Conservative size\n",
    "    DENSE_LAYERS = 3      # Three hidden layers\n",
    "    UNITS_LAYER_1 = 1024  # Reduced from potential overfitting\n",
    "    UNITS_LAYER_2 = 512   # Tapering down\n",
    "    UNITS_LAYER_3 = 424   # Close to output size\n",
    "    \n",
    "    # 🌳 V1 XGBoost Parameters\n",
    "    XG_MAX_DEPTH = 4\n",
    "    XG_LEARNING_RATE = 0.05\n",
    "    XG_N_ESTIMATORS = 200\n",
    "    XG_SUBSAMPLE = 0.8\n",
    "    XG_COLSAMPLE_BYTREE = 0.8\n",
    "    XG_REG_ALPHA = 0.1\n",
    "    XG_REG_LAMBDA = 0.1\n",
    "\n",
    "elif MODEL_VERSION == \"V2\":\n",
    "    print(\"🎯 V2: HYPERPARAMETER TUNED (JSON load)\")\n",
    "    print(\"Source: Optuna optimization artifacts from 01 notebook\")\n",
    "    print()\n",
    "    best_params = _load_best_params(\"V2\")\n",
    "\n",
    "    # Static defaults (will be overwritten if best_params present)\n",
    "    defaults_tf = dict(\n",
    "        SHUFFLE_BUFFER=50000,\n",
    "        BATCH_SIZE=32,\n",
    "        EPOCHS=100,\n",
    "        DROPOUT=0.2575200354740318,\n",
    "        L2_STRENGTH=0.00663529191505376,\n",
    "        LEARNING_RATE=0.00024154541141515264,\n",
    "        UNITS_LAYER_1=768,\n",
    "        UNITS_LAYER_2=512,\n",
    "        UNITS_LAYER_3=192,\n",
    "    )\n",
    "    defaults_xgb = dict(\n",
    "        XG_MAX_DEPTH=3,\n",
    "        XG_LEARNING_RATE=0.02078118701762872,\n",
    "        XG_N_ESTIMATORS=400,\n",
    "        XG_SUBSAMPLE=0.6043956756975085,\n",
    "        XG_COLSAMPLE_BYTREE=0.7105039159566321,\n",
    "        XG_REG_ALPHA=0.07564898361977737,\n",
    "        XG_REG_LAMBDA=0.07177598190394643,\n",
    "    )\n",
    "    if best_params:\n",
    "        # Map flattened keys → notebook variable names\n",
    "        # TensorFlow\n",
    "        BATCH_SIZE = int(best_params.get('tf_batch_size', defaults_tf['BATCH_SIZE']))\n",
    "        EPOCHS = int(best_params.get('tf_epochs', defaults_tf['EPOCHS']))  # rarely tuned; fallback\n",
    "        DROPOUT = float(best_params.get('tf_dropout', defaults_tf['DROPOUT']))\n",
    "        L2_STRENGTH = float(best_params.get('tf_l2_reg', defaults_tf['L2_STRENGTH']))\n",
    "        LEARNING_RATE = float(best_params.get('tf_learning_rate', defaults_tf['LEARNING_RATE']))\n",
    "        UNITS_LAYER_1 = int(best_params.get('tf_units_1', defaults_tf['UNITS_LAYER_1']))\n",
    "        UNITS_LAYER_2 = int(best_params.get('tf_units_2', defaults_tf['UNITS_LAYER_2']))\n",
    "        UNITS_LAYER_3 = int(best_params.get('tf_units_3', defaults_tf['UNITS_LAYER_3']))\n",
    "        UNITS_LAYER_4 = int(best_params.get('tf_units_4', 0))\n",
    "        # Derive layer count\n",
    "        if 'tf_units_4' in best_params:\n",
    "            DENSE_LAYERS = 4\n",
    "        else:\n",
    "            DENSE_LAYERS = 3\n",
    "        SHUFFLE_BUFFER = defaults_tf['SHUFFLE_BUFFER']  # not tuned currently\n",
    "        UNITS_DENSE_LAYER = UNITS_LAYER_1\n",
    "        \n",
    "        # XGBoost\n",
    "        XG_MAX_DEPTH = int(best_params.get('xgb_max_depth', defaults_xgb['XG_MAX_DEPTH']))\n",
    "        XG_LEARNING_RATE = float(best_params.get('xgb_learning_rate', defaults_xgb['XG_LEARNING_RATE']))\n",
    "        XG_N_ESTIMATORS = int(best_params.get('xgb_n_estimators', defaults_xgb['XG_N_ESTIMATORS']))\n",
    "        XG_SUBSAMPLE = float(best_params.get('xgb_subsample', defaults_xgb['XG_SUBSAMPLE']))\n",
    "        XG_COLSAMPLE_BYTREE = float(best_params.get('xgb_colsample_bytree', defaults_xgb['XG_COLSAMPLE_BYTREE']))\n",
    "        XG_REG_ALPHA = float(best_params.get('xgb_reg_alpha', defaults_xgb['XG_REG_ALPHA']))\n",
    "        XG_REG_LAMBDA = float(best_params.get('xgb_reg_lambda', defaults_xgb['XG_REG_LAMBDA']))\n",
    "        ENSEMBLE_XGB_WEIGHT = float(best_params.get('ensemble_xgb_weight', 0.5))\n",
    "        print(f\"   Ensemble tuned XGB weight: {ENSEMBLE_XGB_WEIGHT:.3f}\")\n",
    "    else:\n",
    "        # Use defaults\n",
    "        SHUFFLE_BUFFER = defaults_tf['SHUFFLE_BUFFER']\n",
    "        BATCH_SIZE = defaults_tf['BATCH_SIZE']\n",
    "        EPOCHS = defaults_tf['EPOCHS']\n",
    "        DROPOUT = defaults_tf['DROPOUT']\n",
    "        L2_STRENGTH = defaults_tf['L2_STRENGTH']\n",
    "        LEARNING_RATE = defaults_tf['LEARNING_RATE']\n",
    "        UNITS_LAYER_1 = defaults_tf['UNITS_LAYER_1']\n",
    "        UNITS_LAYER_2 = defaults_tf['UNITS_LAYER_2']\n",
    "        UNITS_LAYER_3 = defaults_tf['UNITS_LAYER_3']\n",
    "        DENSE_LAYERS = 3\n",
    "        UNITS_DENSE_LAYER = UNITS_LAYER_1\n",
    "        XG_MAX_DEPTH = defaults_xgb['XG_MAX_DEPTH']\n",
    "        XG_LEARNING_RATE = defaults_xgb['XG_LEARNING_RATE']\n",
    "        XG_N_ESTIMATORS = defaults_xgb['XG_N_ESTIMATORS']\n",
    "        XG_SUBSAMPLE = defaults_xgb['XG_SUBSAMPLE']\n",
    "        XG_COLSAMPLE_BYTREE = defaults_xgb['XG_COLSAMPLE_BYTREE']\n",
    "        XG_REG_ALPHA = defaults_xgb['XG_REG_ALPHA']\n",
    "        XG_REG_LAMBDA = defaults_xgb['XG_REG_LAMBDA']\n",
    "        ENSEMBLE_XGB_WEIGHT = 0.5\n",
    "\n",
    "elif MODEL_VERSION == \"V3\":\n",
    "    print(\"🚀 V3: EXPERIMENTAL HIGH PERFORMANCE\")\n",
    "    print(\"Focus: Maximum capacity with advanced regularization\")\n",
    "    print(\"⚠️  Higher resource requirements\")\n",
    "    print()\n",
    "    \n",
    "    # 🧠 V3 TensorFlow Parameters (Experimental)\n",
    "    SHUFFLE_BUFFER = 100000\n",
    "    BATCH_SIZE = 16\n",
    "    EPOCHS = 100\n",
    "    DROPOUT = 0.3\n",
    "    L2_STRENGTH = 0.0001\n",
    "    LEARNING_RATE = 0.0001\n",
    "    UNITS_DENSE_LAYER = 1536\n",
    "    DENSE_LAYERS = 4\n",
    "    UNITS_LAYER_1 = 1536\n",
    "    UNITS_LAYER_2 = 1024\n",
    "    UNITS_LAYER_3 = 512\n",
    "    UNITS_LAYER_4 = 256\n",
    "    \n",
    "    # 🌳 V3 XGBoost Parameters (Experimental)\n",
    "    XG_MAX_DEPTH = 5\n",
    "    XG_LEARNING_RATE = 0.01\n",
    "    XG_N_ESTIMATORS = 1000\n",
    "    XG_SUBSAMPLE = 0.7\n",
    "    XG_COLSAMPLE_BYTREE = 0.6\n",
    "    XG_REG_ALPHA = 0.01\n",
    "    XG_REG_LAMBDA = 0.01\n",
    "\n",
    "elif MODEL_VERSION == \"V4\":\n",
    "    print(\"🎯 V4: HYPERPARAMETER TUNED\")\n",
    "    print(\"Source: Optuna optimization results\")\n",
    "    print(\"Best ensemble MAE achieved: 0.010905\")\n",
    "    print()\n",
    "    \n",
    "    # (Currently mirrors V2 defaults; could be pointed to a different RUN_LABEL later)\n",
    "    SHUFFLE_BUFFER = 50000\n",
    "    BATCH_SIZE = 32\n",
    "    EPOCHS = 100\n",
    "    DROPOUT = 0.2575200354740318\n",
    "    L2_STRENGTH = 0.00663529191505376\n",
    "    LEARNING_RATE = 0.00024154541141515264\n",
    "    UNITS_DENSE_LAYER = 768\n",
    "    DENSE_LAYERS = 3\n",
    "    UNITS_LAYER_1 = 768\n",
    "    UNITS_LAYER_2 = 512\n",
    "    UNITS_LAYER_3 = 192\n",
    "    XG_MAX_DEPTH = 3\n",
    "    XG_LEARNING_RATE = 0.02078118701762872\n",
    "    XG_N_ESTIMATORS = 400\n",
    "    XG_SUBSAMPLE = 0.6043956756975085\n",
    "    XG_COLSAMPLE_BYTREE = 0.7105039159566321\n",
    "    XG_REG_ALPHA = 0.07564898361977737\n",
    "    XG_REG_LAMBDA = 0.07177598190394643\n",
    "else:\n",
    "    print(f\"❌ Unknown MODEL_VERSION: {MODEL_VERSION}\")\n",
    "    print(\"Available versions: V1 (Conservative), V2 (Tuned), V3 (Experimental), V4 (Tuned)\")\n",
    "    raise ValueError(f\"Invalid MODEL_VERSION: {MODEL_VERSION}\")\n",
    "\n",
    "# 📋 DISPLAY CURRENT CONFIGURATION\n",
    "print(f\"🎯 ACTIVE CONFIGURATION - {MODEL_VERSION}\")\n",
    "print(\"=\" * 40)\n",
    "print(\"🧠 TensorFlow Settings:\")\n",
    "print(f\"  Architecture: {UNITS_LAYER_1} → {UNITS_LAYER_2} → {UNITS_LAYER_3} → {424} outputs\")\n",
    "print(f\"  Dropout: {DROPOUT:.3f}\")\n",
    "print(f\"  Learning Rate: {LEARNING_RATE:.6f}\")\n",
    "print(f\"  L2 Regularization: {L2_STRENGTH:.6f}\")\n",
    "print(f\"  Batch Size: {BATCH_SIZE}\")\n",
    "print(f\"  Epochs: {EPOCHS}\")\n",
    "\n",
    "if 'UNITS_LAYER_1' in globals() and UNITS_LAYER_1 >= 1536:\n",
    "    capacity = \"HIGH CAPACITY (Experimental)\"\n",
    "elif 'UNITS_LAYER_1' in globals() and UNITS_LAYER_1 >= 1024:\n",
    "    capacity = \"MODERATE CAPACITY (Balanced)\"  \n",
    "else:\n",
    "    capacity = \"CONSERVATIVE (Low overfitting risk)\"\n",
    "print(f\"  Architecture Type: {capacity}\")\n",
    "\n",
    "print(\"\\n🌳 XGBoost Settings:\")\n",
    "print(f\"  Max Depth: {XG_MAX_DEPTH}\")\n",
    "print(f\"  Learning Rate: {XG_LEARNING_RATE:.6f}\")\n",
    "print(f\"  N Estimators: {XG_N_ESTIMATORS}\")\n",
    "print(f\"  Subsample: {XG_SUBSAMPLE:.3f}\")\n",
    "print(f\"  ColSample ByTree: {XG_COLSAMPLE_BYTREE:.3f}\")\n",
    "print(f\"  L1 Regularization (alpha): {XG_REG_ALPHA:.6f}\")\n",
    "print(f\"  L2 Regularization (lambda): {XG_REG_LAMBDA:.6f}\")\n",
    "\n",
    "print(f\"\\n✅ Model Version {MODEL_VERSION} loaded successfully!\")\n",
    "\n",
    "# 🔧 VERSION COMPARISON FUNCTION\n",
    "def compare_versions():\n",
    "    \"\"\"Compare key differences between versions\"\"\"\n",
    "    print(\"\\n📊 VERSION COMPARISON\")\n",
    "    print(\"=\" * 60)\n",
    "    print(\"│ Metric          │ V1 (Conservative) │ V2 (Tuned)    │ V3 (Experimental) │\")\n",
    "    print(\"├─────────────────┼───────────────────┼───────────────┼───────────────────┤\")\n",
    "    print(\"│ TF Learning Rate│ 0.0001            │ 0.000242      │ 0.0001            │\")\n",
    "    print(\"│ TF Dropout      │ 0.5               │ 0.258         │ 0.3               │\")\n",
    "    print(\"│ TF Layer 1      │ 1024              │ 768           │ 1536              │\")\n",
    "    print(\"│ TF Epochs       │ 50                │ 100           │ 100               │\")\n",
    "    print(\"│ XGB Max Depth   │ 4                 │ 3             │ 5                 │\")\n",
    "    print(\"│ XGB Learn Rate  │ 0.05              │ 0.0208        │ 0.01              │\")\n",
    "    print(\"│ XGB N Trees     │ 200               │ 400           │ 1000              │\")\n",
    "    print(\"│ Focus           │ Stability         │ Performance   │ Max Capacity      │\")\n",
    "    print(\"└─────────────────┴───────────────────┴───────────────┴───────────────────┘\")\n",
    "\n",
    "compare_versions()\n",
    "\n",
    "print(f\"\\n🎮 TO SWITCH VERSIONS:\")\n",
    "print(f\"Change MODEL_VERSION = \\\"{MODEL_VERSION}\\\" to:\")\n",
    "print(f\"  • \\\"V1\\\" for Conservative (stable, less overfitting)\")\n",
    "print(f\"  • \\\"V2\\\" for Tuned (best known performance JSON-loaded)\")\n",
    "print(f\"  • \\\"V3\\\" for Experimental (maximum capacity)\")\n",
    "print(f\"Then restart the kernel and run all cells.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db9bf1bb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✅ Using 3 callbacks for MODEL_VERSION V2\n",
      "ℹ EarlyStopping min_delta=0.0001, ReduceLROnPlateau min_delta=0.0005\n"
     ]
    }
   ],
   "source": [
    "# 🔧 MODEL VERSION CONTROL SYSTEM\n",
    "MODEL_CALLBACKS_VERSION = \"V2\"  # Your current choice\n",
    "\n",
    "# Unified thresholds to avoid chasing tiny improvements\n",
    "EARLY_MIN_DELTA = 1e-4   # require at least this improvement in val_loss to reset patience\n",
    "LR_MIN_DELTA = 5e-4      # require this improvement to avoid LR reduction\n",
    "\n",
    "if MODEL_CALLBACKS_VERSION == \"V1\":\n",
    "    # Conservative callbacks\n",
    "    XG_EARLY_STOPPING_ROUNDS = 50\n",
    "    training_callbacks = [\n",
    "        tf.keras.callbacks.EarlyStopping(\n",
    "            patience=5,\n",
    "            restore_best_weights=True,\n",
    "            monitor='val_loss',\n",
    "            mode='min',\n",
    "            min_delta=EARLY_MIN_DELTA,\n",
    "            verbose=1\n",
    "        ),\n",
    "        tf.keras.callbacks.ReduceLROnPlateau(\n",
    "            monitor='val_loss',\n",
    "            factor=0.5,\n",
    "            patience=3,\n",
    "            min_lr=1e-7,\n",
    "            min_delta=LR_MIN_DELTA,\n",
    "            cooldown=1,\n",
    "            verbose=1\n",
    "        )\n",
    "    ]\n",
    "\n",
    "elif MODEL_CALLBACKS_VERSION == \"V2\":\n",
    "    # Tuned: earlier exit if only marginal gains; faster LR decay; protect against overfitting\n",
    "    XG_EARLY_STOPPING_ROUNDS = 50\n",
    "    training_callbacks = [\n",
    "        tf.keras.callbacks.EarlyStopping(\n",
    "            monitor='val_loss',\n",
    "            patience=4,            # slightly shorter\n",
    "            restore_best_weights=True,\n",
    "            mode='min',\n",
    "            min_delta=EARLY_MIN_DELTA,  # ignore micro improvements\n",
    "            verbose=1\n",
    "        ),\n",
    "        tf.keras.callbacks.ReduceLROnPlateau(\n",
    "            monitor='val_loss',\n",
    "            factor=0.3,\n",
    "            patience=2,            # reduce LR sooner\n",
    "            min_lr=1e-7,\n",
    "            min_delta=LR_MIN_DELTA,\n",
    "            cooldown=1,\n",
    "            verbose=1\n",
    "        ),\n",
    "        tf.keras.callbacks.ModelCheckpoint(\n",
    "            'best_model.keras',\n",
    "            monitor='val_loss',\n",
    "            save_best_only=True,\n",
    "            mode='min'\n",
    "        )\n",
    "    ]\n",
    "\n",
    "elif MODEL_CALLBACKS_VERSION == \"V3\":\n",
    "    # Experimental: still add min_delta constraints\n",
    "    XG_EARLY_STOPPING_ROUNDS = 50\n",
    "    training_callbacks = [\n",
    "        tf.keras.callbacks.EarlyStopping(\n",
    "            monitor='val_loss',\n",
    "            patience=10,\n",
    "            restore_best_weights=True,\n",
    "            mode='min',\n",
    "            min_delta=EARLY_MIN_DELTA,\n",
    "            verbose=1\n",
    "        ),\n",
    "        tf.keras.callbacks.ReduceLROnPlateau(\n",
    "            monitor='val_loss',\n",
    "            factor=0.2,\n",
    "            patience=3,\n",
    "            min_lr=1e-8,\n",
    "            min_delta=LR_MIN_DELTA,\n",
    "            cooldown=1,\n",
    "            verbose=1\n",
    "        ),\n",
    "        tf.keras.callbacks.ModelCheckpoint(\n",
    "            'best_model_v3.keras',\n",
    "            monitor='val_loss',\n",
    "            save_best_only=True,\n",
    "            mode='min'\n",
    "        )\n",
    "    ]\n",
    "\n",
    "print(f\"✅ Using {len(training_callbacks)} callbacks for MODEL_VERSION {MODEL_VERSION}\")\n",
    "print(f\"ℹ EarlyStopping min_delta={EARLY_MIN_DELTA}, ReduceLROnPlateau min_delta={LR_MIN_DELTA}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "33a9bbcb",
   "metadata": {},
   "source": [
    "## 📊 Data Loading\n",
    "(Exploration removed – see 00_data_exploration.ipynb for full EDA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08e0f5de",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "📥 Loading data files...\n",
      "✅ Training data: (1917, 558)\n",
      "✅ Training labels: (1917, 425)\n",
      "✅ Target pairs: (424, 3)\n",
      "✅ Test data: (90, 559)\n",
      "✅ lag_1: (90, 108)\n",
      "✅ lag_2: (90, 108)\n",
      "✅ lag_3: (90, 108)\n",
      "✅ lag_4: (90, 108)\n",
      "\n",
      "🎯 Data loaded successfully!\n",
      "Total unique dates in test: 90\n",
      "Total targets to predict: 424\n"
     ]
    }
   ],
   "source": [
    "# Load all data files\n",
    "print(\"📥 Loading data files...\")\n",
    "\n",
    "# Load training data\n",
    "try:\n",
    "    train_df = pd.read_csv(TRAIN_CSV)\n",
    "    train_labels_df = pd.read_csv(TRAIN_LABELS_CSV)\n",
    "    target_pairs_df = pd.read_csv(TARGET_PAIRS_CSV)\n",
    "    test_df = pd.read_csv(TEST_CSV)\n",
    "    \n",
    "    print(f\"✅ Training data: {train_df.shape}\")\n",
    "    print(f\"✅ Training labels: {train_labels_df.shape}\")  \n",
    "    print(f\"✅ Target pairs: {target_pairs_df.shape}\")\n",
    "    print(f\"✅ Test data: {test_df.shape}\")\n",
    "    \n",
    "except FileNotFoundError as e:\n",
    "    print(f\"❌ Error loading files: {e}\")\n",
    "    print(\"Make sure you're in the correct directory with the data files\")\n",
    "\n",
    "# Load lagged test labels\n",
    "lagged_files = {\n",
    "    'lag_1': LAGGED_DIR / 'test_labels_lag_1.csv',\n",
    "    'lag_2': LAGGED_DIR / 'test_labels_lag_2.csv', \n",
    "    'lag_3': LAGGED_DIR / 'test_labels_lag_3.csv',\n",
    "    'lag_4': LAGGED_DIR / 'test_labels_lag_4.csv'\n",
    "}\n",
    "\n",
    "lagged_data = {}\n",
    "for lag_name, file_path in lagged_files.items():\n",
    "    try:\n",
    "        lagged_data[lag_name] = pd.read_csv(file_path)\n",
    "        print(f\"✅ {lag_name}: {lagged_data[lag_name].shape}\")\n",
    "    except FileNotFoundError:\n",
    "        print(f\"⚠️ Warning: {file_path} not found\")\n",
    "\n",
    "print(\"\\n🎯 Data loaded successfully!\")\n",
    "print(f\"Total unique dates in test: {test_df['date_id'].nunique()}\")\n",
    "print(f\"Total targets to predict: {len([col for col in train_labels_df.columns if col.startswith('target_')])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30791dc1",
   "metadata": {},
   "source": [
    "## 🔧 Feature Engineering\n",
    "Using src.features.create_technical_features_fast for consistency (full EDA version lives in 00 notebook)."
   ]
  },
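  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e9c4d2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hedged sketch of what a \"fast\" technical-feature pass might compute:\n",
    "# percentage returns plus a short rolling mean/std on a capped number of\n",
    "# numeric columns. Illustrative only; the real logic lives in src.features.\n",
    "import pandas as pd\n",
    "\n",
    "def technical_features_sketch(df: pd.DataFrame, limit_cols: int = 40, window: int = 5) -> pd.DataFrame:\n",
    "    out = df.copy()\n",
    "    num_cols = [c for c in df.select_dtypes('number').columns if c != 'date_id'][:limit_cols]\n",
    "    for c in num_cols:\n",
    "        out[f'{c}_ret1'] = df[c].pct_change()          # 1-step percentage return\n",
    "        out[f'{c}_roll_mean_{window}'] = df[c].rolling(window).mean()\n",
    "        out[f'{c}_roll_std_{window}'] = df[c].rolling(window).std()\n",
    "    return out"
   ]
  },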
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b9efa46",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔧 Applying lightweight feature engineering (training)\n",
      "(1917, 678) (90, 679)\n"
     ]
    }
   ],
   "source": [
    "from src.features import create_technical_features_fast\n",
    "print(\"🔧 Applying lightweight feature engineering (training)\")\n",
    "train_engineered = create_technical_features_fast(train_df, limit_cols=40)\n",
    "test_engineered = create_technical_features_fast(test_df, limit_cols=40)\n",
    "print(train_engineered.shape, test_engineered.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75a7e20e",
   "metadata": {},
   "source": [
    "## 🔄 Data Preprocessing\n",
    "\n",
    "Prepare the data for both XGBoost and TensorFlow models:\n",
    "- Handle missing values appropriately\n",
    "- Scale features for neural networks\n",
    "- Create proper train/validation splits"
   ]
  },
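  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91b7c3e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hedged aside: the preprocessing below uses a single chronological 80/20\n",
    "# cut. An alternative (not used in this notebook) is sklearn's\n",
    "# TimeSeriesSplit, which yields several expanding-window folds. Toy demo:\n",
    "import numpy as np\n",
    "from sklearn.model_selection import TimeSeriesSplit\n",
    "\n",
    "toy_dates = np.arange(20)  # stand-in for sorted date_id values\n",
    "for fold, (tr_idx, va_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(toy_dates)):\n",
    "    print(f\"fold {fold}: train 0..{toy_dates[tr_idx].max()}, val {toy_dates[va_idx].min()}..{toy_dates[va_idx].max()}\")"
   ]
  },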
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c968d0c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔄 Starting data preprocessing...\n",
      "Training data after merge: (1917, 1102)\n",
      "Feature columns: 677\n",
      "Target columns: 424\n",
      "🔧 Handling missing values...\n",
      "Missing values in X_train: 0\n",
      "Missing values in X_test: 0\n",
      "Missing values in y_train: 0\n",
      "📏 Scaling features for TensorFlow...\n",
      "📅 Creating time-based train/validation split...\n",
      "Training set: 1534 samples\n",
      "Validation set: 383 samples\n",
      "✅ Preprocessing completed!\n",
      "Ready for modeling with 677 features and 424 targets\n"
     ]
    }
   ],
   "source": [
    "# Data Preprocessing Pipeline\n",
    "def preprocess_data(train_df, train_labels_df, test_df):\n",
    "    \"\"\"\n",
    "    Comprehensive preprocessing for both XGBoost and TensorFlow\n",
    "    \"\"\"\n",
    "    print(\"🔄 Starting data preprocessing...\")\n",
    "\n",
    "    # 0. Sort by date before any fill\n",
    "    train_df = train_df.sort_values('date_id').reset_index(drop=True)\n",
    "    test_df = test_df.sort_values('date_id').reset_index(drop=True)\n",
    "\n",
    "    # 1. Merge training data with labels\n",
    "    #train_full = train_df.merge(train_labels_df, on='date_id', how='inner')\n",
    "\n",
    "    train_full = train_df.merge(train_labels_df, on='date_id', how='inner') \\\n",
    "                         .sort_values('date_id').reset_index(drop=True)\n",
    "    \n",
    "    \n",
    "    print(f\"Training data after merge: {train_full.shape}\")\n",
    "    \n",
    "    # 2. Identify feature columns (exclude date_id and targets)\n",
    "    target_cols = [col for col in train_full.columns if col.startswith('target_')]\n",
    "    feature_cols = [col for col in train_full.columns if col not in ['date_id'] + target_cols]\n",
    "    \n",
    "    print(f\"Feature columns: {len(feature_cols)}\")\n",
    "    print(f\"Target columns: {len(target_cols)}\")\n",
    "    \n",
    "    # 3. Handle missing values\n",
    "    print(\"🔧 Handling missing values...\")\n",
    "    \n",
    "    # For features: use forward fill then median\n",
    "    X_train = train_full[feature_cols].copy()\n",
    "    X_test = test_df[feature_cols].copy()\n",
    "    \n",
    "    # Forward fill for time series continuity\n",
    "    X_train = X_train.fillna(method='ffill')\n",
    "    X_test = X_test.fillna(method='ffill')\n",
    "    \n",
    "    # Then use median for remaining NAs\n",
    "    medians = X_train.median()\n",
    "    X_train = X_train.fillna(medians)\n",
    "    X_test = X_test.fillna(medians)\n",
    "    \n",
    "    # For targets: forward fill only (preserve time series nature)\n",
    "    y_train = train_full[target_cols].copy()\n",
    "    y_train = y_train.fillna(method='ffill')\n",
    "    y_train = y_train.fillna(0)  # If still NAs, use 0\n",
    "    \n",
    "    print(f\"Missing values in X_train: {X_train.isnull().sum().sum()}\")\n",
    "    print(f\"Missing values in X_test: {X_test.isnull().sum().sum()}\")\n",
    "    print(f\"Missing values in y_train: {y_train.isnull().sum().sum()}\")\n",
    "    \n",
    "    # 4. Feature scaling for TensorFlow (XGBoost doesn't need it)\n",
    "    print(\"📏 Scaling features for TensorFlow...\")\n",
    "    \n",
    "    scaler = RobustScaler()  # More robust to outliers than StandardScaler\n",
    "    X_train_scaled = scaler.fit_transform(X_train)\n",
    "    X_test_scaled = scaler.transform(X_test)\n",
    "    \n",
    "    # Convert back to DataFrames with proper column names\n",
    "    X_train_scaled = pd.DataFrame(X_train_scaled, columns=feature_cols, index=X_train.index)\n",
    "    X_test_scaled = pd.DataFrame(X_test_scaled, columns=feature_cols, index=X_test.index)\n",
    "    \n",
    "    # 5. Time-based train/validation split (important for time series!)\n",
    "    print(\"📅 Creating time-based train/validation split...\")\n",
    "    \n",
    "    # Sort by date_id to ensure proper time ordering\n",
    "    train_dates = train_full['date_id'].sort_values()\n",
    "    split_idx = int(len(train_dates) * 0.8)  # 80% train, 20% validation\n",
    "    \n",
    "    train_date_threshold = train_dates.iloc[split_idx]\n",
    "    \n",
    "    train_mask = train_full['date_id'] <= train_date_threshold\n",
    "    val_mask = train_full['date_id'] > train_date_threshold\n",
    "    \n",
    "    # Create splits\n",
    "    X_train_split = X_train[train_mask]\n",
    "    X_val_split = X_train[val_mask]\n",
    "    X_train_scaled_split = X_train_scaled[train_mask]\n",
    "    X_val_scaled_split = X_train_scaled[val_mask]\n",
    "    \n",
    "    y_train_split = y_train[train_mask]\n",
    "    y_val_split = y_train[val_mask]\n",
    "    \n",
    "    print(f\"Training set: {X_train_split.shape[0]} samples\")\n",
    "    print(f\"Validation set: {X_val_split.shape[0]} samples\")\n",
    "    \n",
    "    return {\n",
    "        'X_train': X_train,\n",
    "        'X_test': X_test,\n",
    "        'y_train': y_train,\n",
    "        'X_train_scaled': X_train_scaled,\n",
    "        'X_test_scaled': X_test_scaled,\n",
    "        'X_train_split': X_train_split,\n",
    "        'X_val_split': X_val_split,\n",
    "        'X_train_scaled_split': X_train_scaled_split,\n",
    "        'X_val_scaled_split': X_val_scaled_split,\n",
    "        'y_train_split': y_train_split,\n",
    "        'y_val_split': y_val_split,\n",
    "        'scaler': scaler,\n",
    "        'feature_cols': feature_cols,\n",
    "        'target_cols': target_cols,\n",
    "        'train_dates': train_full['date_id'].values,\n",
    "        'test_dates': test_df['date_id'].values\n",
    "    }\n",
    "\n",
    "# Apply preprocessing\n",
    "data = preprocess_data(train_engineered, train_labels_df, test_engineered)\n",
    "\n",
    "print(\"✅ Preprocessing completed!\")\n",
    "print(f\"Ready for modeling with {len(data['feature_cols'])} features and {len(data['target_cols'])} targets\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58a9590a",
   "metadata": {},
   "source": [
    "## 🌳 XGBoost Model Training\n",
    "\n",
    "XGBoost is excellent for structured data and provides:\n",
    "- **High performance** on tabular data\n",
    "- **Feature importance** for interpretability  \n",
    "- **Robust to outliers** and missing values\n",
    "- **Fast training** and prediction"
   ]
  },
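  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ac8d1f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hedged sketch of the multi-target pattern: one XGBRegressor per target\n",
    "# column. This illustrates the idea only; it is not the actual\n",
    "# src.models_xgb.MultiTargetXGB implementation used below.\n",
    "import pandas as pd\n",
    "import xgboost as xgb\n",
    "from sklearn.metrics import mean_absolute_error\n",
    "\n",
    "class SimpleMultiTargetXGB:\n",
    "    def __init__(self, params: dict):\n",
    "        self.params = params\n",
    "        self.models = {}\n",
    "\n",
    "    def train(self, X_tr, y_tr, X_va, y_va, target_cols):\n",
    "        \"\"\"Fit one regressor per target; return per-target validation MAE.\"\"\"\n",
    "        metrics = {}\n",
    "        for col in target_cols:\n",
    "            model = xgb.XGBRegressor(**self.params)\n",
    "            model.fit(X_tr, y_tr[col])\n",
    "            self.models[col] = model\n",
    "            metrics[col] = mean_absolute_error(y_va[col], model.predict(X_va))\n",
    "        return metrics\n",
    "\n",
    "    def predict(self, X):\n",
    "        return pd.DataFrame({col: m.predict(X) for col, m in self.models.items()})"
   ]
  },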
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b69e32f1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialized MultiTargetXGB with params.\n"
     ]
    }
   ],
   "source": [
    "# Use src.models_xgb.MultiTargetXGB\n",
    "from src.models_xgb import MultiTargetXGB\n",
    "xgb_params = dict(max_depth=XG_MAX_DEPTH, learning_rate=XG_LEARNING_RATE, n_estimators=XG_N_ESTIMATORS,\n",
    "                  subsample=XG_SUBSAMPLE, colsample_bytree=XG_COLSAMPLE_BYTREE,\n",
    "                  reg_alpha=XG_REG_ALPHA, reg_lambda=XG_REG_LAMBDA, objective='reg:squarederror', eval_metric='mae',\n",
    "                  random_state=42, n_jobs=-1, verbosity=0)\n",
    "\n",
    "xgb_model = MultiTargetXGB(params=xgb_params)\n",
    "print(\"Initialized MultiTargetXGB with params.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e3dd74e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[FAST] Training XGBoost on 20 targets (subset)\n",
      "Training XGB 0/20 -> target_0\n",
      "[FAST] Average MAE (subset): 0.010872\n"
     ]
    }
   ],
   "source": [
    "# Train XGBoost (multi-target) subset (fast mode only)\n",
    "if RUN_MODE == 'fast':\n",
    "    subset_targets = data['target_cols'][:20]\n",
    "    print(f\"[FAST] Training XGBoost on {len(subset_targets)} targets (subset)\")\n",
    "    xgb_metrics = xgb_model.train(\n",
    "        data['X_train_split'], data['y_train_split'][subset_targets],\n",
    "        data['X_val_split'], data['y_val_split'][subset_targets],\n",
    "        target_cols=subset_targets, verbose=True)\n",
    "    print(f\"[FAST] Average MAE (subset): {np.mean(list(xgb_metrics.values())):.6f}\")\n",
    "else:\n",
    "    # Define subset_targets for downstream overlap logic even in full mode\n",
    "    subset_targets = data['target_cols'][:20]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5beb1f6c",
   "metadata": {},
   "source": [
    "## 🧠 TensorFlow Model Training\n",
    "\n",
    "TensorFlow excels at:\n",
    "- **Learning complex patterns** and interactions\n",
    "- **Handling high-dimensional data** \n",
    "- **Capturing non-linear relationships**\n",
    "- **Multi-target learning** with shared representations"
   ]
  },
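  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5e0a8d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hedged sketch of a dense multi-target regressor consistent with the\n",
    "# hyperparameters loaded above (UNITS_LAYER_1/2/3, DROPOUT, L2_STRENGTH,\n",
    "# LEARNING_RATE). The builder actually used for training may differ.\n",
    "from tensorflow import keras\n",
    "from tensorflow.keras import layers, regularizers\n",
    "\n",
    "def build_dense_regressor(n_features: int, n_targets: int,\n",
    "                          units=(768, 512, 192), dropout=0.258,\n",
    "                          l2_strength=0.0066, learning_rate=2.4e-4):\n",
    "    inputs = keras.Input(shape=(n_features,))\n",
    "    x = inputs\n",
    "    for u in units:\n",
    "        x = layers.Dense(u, activation='relu',\n",
    "                         kernel_regularizer=regularizers.l2(l2_strength))(x)\n",
    "        x = layers.Dropout(dropout)(x)\n",
    "    outputs = layers.Dense(n_targets)(x)  # linear head for the 424 regression targets\n",
    "    model = keras.Model(inputs, outputs)\n",
    "    model.compile(optimizer=keras.optimizers.Adam(learning_rate),\n",
    "                  loss='mse', metrics=['mae'])\n",
    "    return model"
   ]
  },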
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e82144f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[FAST] Building TensorFlow dense regressor (subset 50 targets)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "I0000 00:00:1755074099.467244   25286 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3584 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 1/100\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "I0000 00:00:1755074102.802499   25907 service.cc:148] XLA service 0x7f42040063c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n",
      "I0000 00:00:1755074102.802584   25907 service.cc:156]   StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6\n",
      "2025-08-13 10:35:02.892983: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n",
      "I0000 00:00:1755074103.175167   25907 cuda_dnn.cc:529] Loaded cuDNN version 90101\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m19/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━\u001b[0m\u001b[37m━━━━━\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - loss: 2.1467 - mae: 0.8376"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "I0000 00:00:1755074106.280912   25907 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m10s\u001b[0m 186ms/step - loss: 2.1028 - mae: 0.8272 - val_loss: 0.6564 - val_mae: 0.0540 - learning_rate: 0.0017\n",
      "Epoch 2/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 1.4224 - mae: 0.6565 - val_loss: 0.5839 - val_mae: 0.0567 - learning_rate: 0.0017\n",
      "Epoch 3/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 1.0656 - mae: 0.5110 - val_loss: 0.5450 - val_mae: 0.0503 - learning_rate: 0.0017\n",
      "Epoch 4/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.8623 - mae: 0.4145 - val_loss: 0.5064 - val_mae: 0.0384 - learning_rate: 0.0017\n",
      "Epoch 5/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 16ms/step - loss: 0.7099 - mae: 0.3292 - val_loss: 0.4675 - val_mae: 0.0305 - learning_rate: 0.0017\n",
      "Epoch 6/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 16ms/step - loss: 0.6086 - mae: 0.2730 - val_loss: 0.4285 - val_mae: 0.0264 - learning_rate: 0.0017\n",
      "Epoch 7/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.5186 - mae: 0.2208 - val_loss: 0.3906 - val_mae: 0.0228 - learning_rate: 0.0017\n",
      "Epoch 8/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.4525 - mae: 0.1843 - val_loss: 0.3566 - val_mae: 0.0196 - learning_rate: 0.0017\n",
      "Epoch 9/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 16ms/step - loss: 0.3988 - mae: 0.1529 - val_loss: 0.3244 - val_mae: 0.0178 - learning_rate: 0.0017\n",
      "Epoch 10/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.3524 - mae: 0.1284 - val_loss: 0.2949 - val_mae: 0.0172 - learning_rate: 0.0017\n",
      "Epoch 11/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.3140 - mae: 0.1091 - val_loss: 0.2683 - val_mae: 0.0155 - learning_rate: 0.0017\n",
      "Epoch 12/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.2816 - mae: 0.0908 - val_loss: 0.2434 - val_mae: 0.0148 - learning_rate: 0.0017\n",
      "Epoch 13/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 13ms/step - loss: 0.2512 - mae: 0.0767 - val_loss: 0.2210 - val_mae: 0.0140 - learning_rate: 0.0017\n",
      "Epoch 14/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.2256 - mae: 0.0651 - val_loss: 0.2006 - val_mae: 0.0140 - learning_rate: 0.0017\n",
      "Epoch 15/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - loss: 0.2028 - mae: 0.0546 - val_loss: 0.1820 - val_mae: 0.0133 - learning_rate: 0.0017\n",
      "Epoch 16/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - loss: 0.1827 - mae: 0.0468 - val_loss: 0.1650 - val_mae: 0.0131 - learning_rate: 0.0017\n",
      "Epoch 17/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - loss: 0.1645 - mae: 0.0394 - val_loss: 0.1495 - val_mae: 0.0131 - learning_rate: 0.0017\n",
      "Epoch 18/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.1483 - mae: 0.0331 - val_loss: 0.1354 - val_mae: 0.0129 - learning_rate: 0.0017\n",
      "Epoch 19/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.1337 - mae: 0.0286 - val_loss: 0.1225 - val_mae: 0.0127 - learning_rate: 0.0017\n",
      "Epoch 20/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.1207 - mae: 0.0256 - val_loss: 0.1108 - val_mae: 0.0127 - learning_rate: 0.0017\n",
      "Epoch 21/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.1088 - mae: 0.0225 - val_loss: 0.1000 - val_mae: 0.0127 - learning_rate: 0.0017\n",
      "Epoch 22/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0981 - mae: 0.0203 - val_loss: 0.0903 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 23/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0884 - mae: 0.0183 - val_loss: 0.0814 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 24/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0796 - mae: 0.0173 - val_loss: 0.0733 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 25/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0717 - mae: 0.0163 - val_loss: 0.0660 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 26/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0645 - mae: 0.0158 - val_loss: 0.0594 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 27/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0580 - mae: 0.0152 - val_loss: 0.0533 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 28/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0521 - mae: 0.0148 - val_loss: 0.0479 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 29/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0468 - mae: 0.0146 - val_loss: 0.0430 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 30/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0419 - mae: 0.0144 - val_loss: 0.0385 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 31/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0376 - mae: 0.0143 - val_loss: 0.0345 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 32/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0337 - mae: 0.0142 - val_loss: 0.0309 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 33/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0302 - mae: 0.0142 - val_loss: 0.0277 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 34/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0270 - mae: 0.0141 - val_loss: 0.0248 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 35/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0242 - mae: 0.0141 - val_loss: 0.0222 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 36/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0216 - mae: 0.0141 - val_loss: 0.0198 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 37/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0194 - mae: 0.0141 - val_loss: 0.0177 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 38/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0173 - mae: 0.0141 - val_loss: 0.0158 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 39/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0155 - mae: 0.0141 - val_loss: 0.0141 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 40/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0139 - mae: 0.0141 - val_loss: 0.0126 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 41/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - loss: 0.0124 - mae: 0.0141 - val_loss: 0.0113 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 42/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0111 - mae: 0.0141 - val_loss: 0.0101 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 43/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0099 - mae: 0.0141 - val_loss: 0.0090 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 44/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0089 - mae: 0.0141 - val_loss: 0.0081 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 45/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0080 - mae: 0.0141 - val_loss: 0.0072 - val_mae: 0.0125 - learning_rate: 0.0017\n",
      "Epoch 46/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0071 - mae: 0.0141 - val_loss: 0.0065 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 47/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0064 - mae: 0.0141 - val_loss: 0.0058 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 48/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 9ms/step - loss: 0.0057 - mae: 0.0141 - val_loss: 0.0052 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 49/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0052 - mae: 0.0141 - val_loss: 0.0047 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 50/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0046 - mae: 0.0141 - val_loss: 0.0042 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 51/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0042 - mae: 0.0141 - val_loss: 0.0038 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 52/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0038 - mae: 0.0141 - val_loss: 0.0034 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 53/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0034 - mae: 0.0141 - val_loss: 0.0031 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 54/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0031 - mae: 0.0141 - val_loss: 0.0028 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 55/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0028 - mae: 0.0141 - val_loss: 0.0025 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 56/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0025 - mae: 0.0141 - val_loss: 0.0023 - val_mae: 0.0125 - learning_rate: 0.0017\n",
      "Epoch 57/100\n",
      "\u001b[1m20/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m━━━━\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - loss: 0.0023 - mae: 0.0141\n",
      "Epoch 57: ReduceLROnPlateau reducing learning rate to 0.0005019418196752667.\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0023 - mae: 0.0141 - val_loss: 0.0021 - val_mae: 0.0126 - learning_rate: 0.0017\n",
      "Epoch 58/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m-1s\u001b[0m -59987us/step - loss: 0.0022 - mae: 0.0141 - val_loss: 0.0020 - val_mae: 0.0126 - learning_rate: 5.0194e-04\n",
      "Epoch 59/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0021 - mae: 0.0140 - val_loss: 0.0020 - val_mae: 0.0125 - learning_rate: 5.0194e-04\n",
      "Epoch 60/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0020 - mae: 0.0140 - val_loss: 0.0019 - val_mae: 0.0125 - learning_rate: 5.0194e-04\n",
      "Epoch 61/100\n",
      "\u001b[1m19/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━\u001b[0m\u001b[37m━━━━━\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - loss: 0.0020 - mae: 0.0140\n",
      "Epoch 61: ReduceLROnPlateau reducing learning rate to 0.00015058254939503968.\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 12ms/step - loss: 0.0020 - mae: 0.0140 - val_loss: 0.0018 - val_mae: 0.0126 - learning_rate: 5.0194e-04\n",
      "Epoch 62/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0019 - mae: 0.0140 - val_loss: 0.0018 - val_mae: 0.0126 - learning_rate: 1.5058e-04\n",
      "Epoch 63/100\n",
      "\u001b[1m19/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━\u001b[0m\u001b[37m━━━━━\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - loss: 0.0019 - mae: 0.0140\n",
      "Epoch 63: ReduceLROnPlateau reducing learning rate to 4.51747648185119e-05.\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0019 - mae: 0.0140 - val_loss: 0.0018 - val_mae: 0.0126 - learning_rate: 1.5058e-04\n",
      "Epoch 64/100\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 10ms/step - loss: 0.0019 - mae: 0.0140 - val_loss: 0.0018 - val_mae: 0.0126 - learning_rate: 4.5175e-05\n",
      "Epoch 65/100\n",
      "\u001b[1m18/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━\u001b[0m\u001b[37m━━━━━\u001b[0m \u001b[1m0s\u001b[0m 6ms/step - loss: 0.0019 - mae: 0.0140\n",
      "Epoch 65: ReduceLROnPlateau reducing learning rate to 1.35524296638323e-05.\n",
      "\u001b[1m24/24\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 11ms/step - loss: 0.0019 - mae: 0.0140 - val_loss: 0.0018 - val_mae: 0.0126 - learning_rate: 4.5175e-05\n",
      "Epoch 65: early stopping\n",
      "Restoring model weights from the end of the best epoch: 61.\n",
      "[FAST] TensorFlow subset training complete\n"
     ]
    }
   ],
   "source": [
    "\n",
    "# TensorFlow subset training (fast mode only)\n",
    "if RUN_MODE == 'fast':\n",
    "    from src.models_tf import DenseRegressor\n",
    "    tf_params = dict(units_1=UNITS_LAYER_1, units_2=UNITS_LAYER_2, units_3=UNITS_LAYER_3,\n",
    "                     dropout=DROPOUT, l2_reg=L2_STRENGTH, learning_rate=LEARNING_RATE)\n",
    "    tf_model = DenseRegressor(input_dim=len(data['feature_cols']), output_dim=50, params=tf_params)\n",
    "    print(\"[FAST] Building TensorFlow dense regressor (subset 50 targets)\")\n",
    "    tf_model.build()\n",
    "    subset_targets_tf = data['target_cols'][:50]\n",
    "    history = tf_model.train(\n",
    "        data['X_train_scaled_split'], data['y_train_split'][subset_targets_tf],\n",
    "        data['X_val_scaled_split'], data['y_val_split'][subset_targets_tf],\n",
    "        epochs=EPOCHS, batch_size=BATCH_SIZE, callbacks=training_callbacks, verbose=1)\n",
    "    print(\"[FAST] TensorFlow subset training complete\")\n",
    "else:\n",
    "    # Ensure variables exist for later code referencing overlap logic\n",
    "    from src.models_tf import DenseRegressor\n",
    "    tf_params = dict(units_1=UNITS_LAYER_1, units_2=UNITS_LAYER_2, units_3=UNITS_LAYER_3,\n",
    "                     dropout=DROPOUT, l2_reg=L2_STRENGTH, learning_rate=LEARNING_RATE)\n",
    "    subset_targets_tf = data['target_cols'][:50]\n",
    "    tf_model = None  # Placeholder; full model built later"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e44a6b77",
   "metadata": {},
   "source": [
    "## 🎭 Model Ensemble Strategy\n",
    "\n",
    "Combining XGBoost and TensorFlow predictions often yields better results than either model alone:\n",
    "- **XGBoost**: Great at capturing feature interactions and robust patterns\n",
    "- **TensorFlow**: Excellent at learning complex non-linear relationships\n",
    "- **Ensemble**: Combines the strengths of both approaches"
   ]
  },
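  {
   "cell_type": "markdown",
   "id": "b2e41c7a",
   "metadata": {},
   "source": [
    "To make the blending mechanics concrete, the next cell sketches what a weighted ensemble like `src.ensemble.WeightedEnsemble` might do. The class is a hypothetical stand-in for illustration only, not the project's implementation; it assumes both models expose a `predict()` that returns a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f3a9d21",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sketch of a weighted blend; NOT the src.ensemble implementation.\n",
    "import pandas as pd\n",
    "\n",
    "class SimpleWeightedEnsemble:\n",
    "    \"\"\"Blend two models' DataFrame predictions with a fixed weight.\"\"\"\n",
    "    def __init__(self, model_a, model_b, a_weight=0.5):\n",
    "        self.model_a, self.model_b = model_a, model_b\n",
    "        self.a_weight = a_weight\n",
    "\n",
    "    def predict(self, X_a, X_b):\n",
    "        # Each model predicts from its own feature matrix (e.g. raw vs. scaled).\n",
    "        pred_a = self.model_a.predict(X_a)\n",
    "        pred_b = self.model_b.predict(X_b)\n",
    "        # Average only the columns both models produced.\n",
    "        cols = pred_a.columns.intersection(pred_b.columns)\n",
    "        return self.a_weight * pred_a[cols] + (1 - self.a_weight) * pred_b[cols]"
   ]
  },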
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20940688",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[FAST] Initialized simple weighted ensemble (subset phase)\n"
     ]
    }
   ],
   "source": [
    "# Use src.ensemble.WeightedEnsemble (subset ensemble only in fast mode)\n",
    "from src.ensemble import WeightedEnsemble\n",
    "if RUN_MODE == 'fast':\n",
    "    ensemble = WeightedEnsemble(xgb_model, tf_model, xgb_weight=0.5)\n",
    "    print(\"[FAST] Initialized simple weighted ensemble (subset phase)\")\n",
    "else:\n",
    "    ensemble = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e9212aa",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[FAST] Subset validation competition-style score on 20 targets: 0.078722\n"
     ]
    }
   ],
   "source": [
    "# Optional: quick validation Sharpe-style competition score on subset targets (fast mode only)\n",
    "if RUN_MODE == 'fast':\n",
    "    from src.metrics import make_validation_dataframe, rank_correlation_sharpe_ratio\n",
    "    subset_overlap = [t for t in subset_targets_tf if t in subset_targets]\n",
    "    if subset_overlap:\n",
    "        xgb_val_preds = xgb_model.predict(data['X_val_split'])[subset_overlap]\n",
    "        tf_val_preds = tf_model.predict(data['X_val_scaled_split'])[subset_overlap]\n",
    "        val_blend = (xgb_val_preds.add(tf_val_preds, fill_value=0)) / 2\n",
    "        y_val_true = data['y_val_split'][subset_overlap]\n",
    "        merged = make_validation_dataframe(y_val_true, val_blend)\n",
    "        val_score = rank_correlation_sharpe_ratio(merged)\n",
    "        print(f\"[FAST] Subset validation competition-style score on {len(subset_overlap)} targets: {val_score:.6f}\")\n",
    "    else:\n",
    "        print(\"[FAST] No overlapping subset targets found for validation score.\")"
   ]
  },
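  {
   "cell_type": "markdown",
   "id": "a91c3e5d",
   "metadata": {},
   "source": [
    "For reference, the next cell sketches what a rank-correlation \"Sharpe\" score typically computes: the mean of per-date Spearman rank correlations between predictions and true targets, divided by their standard deviation. This is an assumption-labelled stand-in; the score above comes from `src.metrics.rank_correlation_sharpe_ratio`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e8d2b4f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hedged sketch: assumes the competition-style score is\n",
    "# mean(per-date Spearman corr) / std(per-date Spearman corr).\n",
    "# The metric actually used above lives in src.metrics.\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy.stats import spearmanr\n",
    "\n",
    "def sharpe_of_daily_rank_corr(y_true, y_pred):\n",
    "    daily = []\n",
    "    for idx in y_true.index:  # one row per date\n",
    "        corr, _ = spearmanr(y_true.loc[idx], y_pred.loc[idx])\n",
    "        daily.append(corr)\n",
    "    daily = np.asarray(daily)\n",
    "    return float(daily.mean() / daily.std())\n",
    "\n",
    "# Tiny demo on random data: 4 dates x 6 targets\n",
    "rng = np.random.default_rng(0)\n",
    "yt = pd.DataFrame(rng.normal(size=(4, 6)))\n",
    "yp = pd.DataFrame(rng.normal(size=(4, 6)))\n",
    "print(f\"demo score: {sharpe_of_daily_rank_corr(yt, yp):.4f}\")"
   ]
  },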
  {
   "cell_type": "markdown",
   "id": "05db735d",
   "metadata": {},
   "source": [
    "## 🔮 Prediction Generation\n",
    "\n",
    "This section handles the competition's specific requirements:\n",
    "- Process test data in **batches by date_id**\n",
    "- Incorporate **lagged predictions** from previous periods\n",
    "- Follow the exact **prediction format** required by `MitsuiGateway`"
   ]
  },
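  {
   "cell_type": "markdown",
   "id": "c3f7a1d9",
   "metadata": {},
   "source": [
    "To illustrate the per-date batching requirement, the cell below mimics the gateway loop on toy data. It is a self-contained sketch with hypothetical columns and a trivial stand-in \"model\"; in the real harness, `predict()` is called once per `date_id` together with four lag-label batches and must return exactly one row of `target_*` predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d6b8c0e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy sketch of the per-date_id prediction loop (hypothetical data, not the gateway).\n",
    "import pandas as pd\n",
    "\n",
    "toy_test = pd.DataFrame({\n",
    "    'date_id': [0, 0, 1, 1],\n",
    "    'feat_a': [1.0, 2.0, 3.0, 4.0],\n",
    "})\n",
    "\n",
    "rows = []\n",
    "for date_id, batch in toy_test.groupby('date_id', sort=True):\n",
    "    # The competition harness would call predict(batch, lags_1, ..., lags_4) here.\n",
    "    row = pd.DataFrame({'target_0': [batch['feat_a'].mean()]})\n",
    "    row['date_id'] = date_id\n",
    "    rows.append(row)\n",
    "\n",
    "all_preds = pd.concat(rows, ignore_index=True)\n",
    "print(all_preds)"
   ]
  },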
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c4fad57",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full for submit - to be run on submition mode only\n",
    "if RUN_MODE == 'full':\n",
    "    full_tf_model = DenseRegressor(input_dim=len(data['feature_cols']), output_dim=len(data['target_cols']), params=tf_params)\n",
    "    full_tf_model.build()\n",
    "    print(\"[FULL] Training full TensorFlow model on all targets\")\n",
    "    full_tf_model.train(\n",
    "        data['X_train_scaled'], data['y_train'][data['target_cols']],\n",
    "        data['X_val_scaled_split'], data['y_val_split'],\n",
    "        epochs=EPOCHS, batch_size=BATCH_SIZE, callbacks=training_callbacks, verbose=1)\n",
    "else:\n",
    "    full_tf_model = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1d90917",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full for submit - to be run on submition mode only\n",
    "if RUN_MODE == 'full':\n",
    "    full_xgb_model = MultiTargetXGB(params=xgb_params)\n",
    "    print(\"[FULL] Training XGBoost on all targets\")\n",
    "    full_xgb_model.train(\n",
    "        data['X_train'], data['y_train'],\n",
    "        target_cols=data['target_cols'], verbose=True)\n",
    "else:\n",
    "    full_xgb_model = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3267339c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full for submit - to be run on submition mode only\n",
    "if RUN_MODE == 'full':\n",
    "    from src.ensemble import WeightedEnsemble as _FullWeightedEnsemble\n",
    "    final_ensemble = _FullWeightedEnsemble(full_xgb_model, full_tf_model, xgb_weight=0.5)\n",
    "    print(\"[FULL] Final ensemble ready\")\n",
    "else:\n",
    "    final_ensemble = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80150080",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model selection: choose best of XGB, TF, Ensemble based on validation Sharpe (full mode)\n",
    "if RUN_MODE == 'full':\n",
    "    from src.metrics import make_validation_dataframe, rank_correlation_sharpe_ratio\n",
    "    # Compute validation predictions for each model\n",
    "    val_targets = data['y_val_split'].columns\n",
    "    xgb_val_pred = full_xgb_model.predict(data['X_val_split'])[val_targets]\n",
    "    tf_val_pred = full_tf_model.predict(data['X_val_scaled_split'])[val_targets]\n",
    "    ens_val_pred = (xgb_val_pred.add(tf_val_pred, fill_value=0)) / 2\n",
    "    y_val_true = data['y_val_split'][val_targets]\n",
    "    def _score(pred_df):\n",
    "        merged = make_validation_dataframe(y_val_true, pred_df[val_targets])\n",
    "        return rank_correlation_sharpe_ratio(merged)\n",
    "    scores = {\n",
    "        'xgb': _score(xgb_val_pred),\n",
    "        'tf': _score(tf_val_pred),\n",
    "        'ensemble': _score(ens_val_pred)\n",
    "    }\n",
    "    best_name = max(scores, key=scores.get)\n",
    "    if best_name == 'xgb':\n",
    "        submission_model = full_xgb_model\n",
    "    elif best_name == 'tf':\n",
    "        submission_model = full_tf_model\n",
    "    else:\n",
    "        submission_model = final_ensemble\n",
    "    print(f\"[FULL] Validation Sharpe scores -> XGB: {scores['xgb']:.5f} | TF: {scores['tf']:.5f} | Ensemble: {scores['ensemble']:.5f}\")\n",
    "    print(f\"[FULL] Selected submission model: {best_name}\")\n",
    "else:\n",
    "    submission_model = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9af21f16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full for submit - to be run on submition mode only\n",
    "if RUN_MODE == 'full':\n",
    "    final_predictions = final_ensemble.predict(data['X_test'], data['X_test_scaled'])\n",
    "    final_predictions['date_id'] = test_df['date_id'].values\n",
    "    print(\"[FULL] Generated predictions shape:\", final_predictions.shape)\n",
    "    print(final_predictions.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcf0eb8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full for submit - to be run on submition mode only\n",
    "if RUN_MODE == 'full':\n",
    "    import joblib, json\n",
    "    models_to_save = {\n",
    "        'xgb_model': full_xgb_model,\n",
    "        'tf_model': full_tf_model,\n",
    "        'ensemble': final_ensemble,\n",
    "        'submission_model': submission_model,\n",
    "        'scaler': data['scaler'],\n",
    "        'feature_cols': data['feature_cols'],\n",
    "        'target_cols': data['target_cols'],  # <-- added for robust predict()\n",
    "    }\n",
    "    joblib.dump(models_to_save, 'trained_models.pkl')\n",
    "    meta = {'selected_model': type(submission_model).__name__}\n",
    "    with open('model_selection_meta.json','w') as f:\n",
    "        json.dump(meta, f)\n",
    "    print(f\"[FULL] Artifacts saved -> trained_models.pkl (selected {meta['selected_model']})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a50e05c",
   "metadata": {},
   "source": [
    "## 📤 Submission Preparation & `predict()` Function\n",
    "\n",
    "This section creates the **`predict()` function** that will be called by the competition system. This is the most important part for Kaggle submission!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "490c4b1f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[FAST] Skipping predict() definition (full submission artifacts not built in fast mode)\n"
     ]
    }
   ],
   "source": [
    "# Full for submit - to be run on submition mode only\n",
    "if RUN_MODE == 'full':\n",
    "    from src.features import create_technical_features_fast\n",
    "    _ARTIFACTS = None\n",
    "\n",
    "    def _load_artifacts():\n",
    "        import joblib\n",
    "        global _ARTIFACTS, _SUBMISSION_MODEL, _TARGET_COLS\n",
    "        if _ARTIFACTS is None:\n",
    "            _ARTIFACTS = joblib.load('trained_models.pkl')\n",
    "            _SUBMISSION_MODEL = (\n",
    "                _ARTIFACTS.get('submission_model')\n",
    "                or _ARTIFACTS.get('ensemble')\n",
    "                or _ARTIFACTS.get('xgb_model')\n",
    "                or _ARTIFACTS.get('tf_model')\n",
    "            )\n",
    "            _TARGET_COLS = _ARTIFACTS.get('target_cols')\n",
    "        return _ARTIFACTS, _SUBMISSION_MODEL, _TARGET_COLS\n",
    "\n",
    "    def _ensure_full_targets(preds_df, target_cols):\n",
    "        \"\"\"Return single-row DataFrame with all target_* columns (ordered).\"\"\"\n",
    "        if len(preds_df.index) != 1:\n",
    "            preds_df = preds_df.iloc[[0]]\n",
    "        out = pd.DataFrame(index=preds_df.index)\n",
    "        for col in target_cols:\n",
    "            if col in preds_df.columns:\n",
    "                out[col] = preds_df[col].astype('float32')\n",
    "            else:\n",
    "                out[col] = 0.0\n",
    "        return out[target_cols]\n",
    "\n",
    "    def predict(test_batch, label_lags_1_batch, label_lags_2_batch, label_lags_3_batch, label_lags_4_batch):\n",
    "        try:\n",
    "            artifacts, submit_model, target_cols = _load_artifacts()\n",
    "            if target_cols is None:\n",
    "                # Fallback to file header or generic schema\n",
    "                try:\n",
    "                    header = pd.read_csv('train_labels.csv', nrows=1).columns\n",
    "                    target_cols = [c for c in header if c.startswith('target_')]\n",
    "                except Exception:\n",
    "                    target_cols = [f'target_{i}' for i in range(424)]\n",
    "\n",
    "            scaler = artifacts['scaler']\n",
    "            feature_cols = artifacts['feature_cols']\n",
    "            test_df_local = test_batch.to_pandas() if hasattr(test_batch, 'to_pandas') else test_batch\n",
    "\n",
    "            engineered = create_technical_features_fast(test_df_local, limit_cols=40)\n",
    "            # TODO: integrate lag label batches into engineered features if needed\n",
    "\n",
    "            available = [c for c in feature_cols if c in engineered.columns]\n",
    "            X = engineered[available].copy()\n",
    "            for c in feature_cols:\n",
    "                if c not in X.columns:\n",
    "                    X[c] = 0.0\n",
    "            X = X[feature_cols].ffill().fillna(0)\n",
    "            X_scaled = pd.DataFrame(scaler.transform(X), columns=feature_cols, index=X.index)\n",
    "\n",
    "            preds = None\n",
    "            errors = []\n",
    "            # Attempt different signatures (ensemble may accept (X, X_scaled))\n",
    "            for attempt in (\n",
    "                lambda: submit_model.predict(X, X_scaled),\n",
    "                lambda: submit_model.predict(X_scaled),\n",
    "                lambda: submit_model.predict(X),\n",
    "            ):\n",
    "                try:\n",
    "                    preds = attempt()\n",
    "                    break\n",
    "                except Exception as e:\n",
    "                    errors.append(str(e))\n",
    "                    continue\n",
    "            if preds is None:\n",
    "                raise RuntimeError(f\"All prediction attempts failed: {errors}\")\n",
    "\n",
    "            if not isinstance(preds, pd.DataFrame):\n",
    "                # Try to wrap numpy array\n",
    "                if hasattr(preds, 'shape') and len(preds.shape) == 2 and preds.shape[0] >= 1:\n",
    "                    # If number of columns matches targets, map directly\n",
    "                    if preds.shape[1] == len(target_cols):\n",
    "                        preds = pd.DataFrame(preds, columns=target_cols, index=X.index)\n",
    "                    else:\n",
    "                        # Create placeholder DataFrame with whatever columns we can infer\n",
    "                        preds = pd.DataFrame(preds, index=X.index)\n",
    "                else:\n",
    "                    raise ValueError(\"Prediction output not understood.\")\n",
    "\n",
    "            full_preds = _ensure_full_targets(preds, target_cols)\n",
    "            return full_preds.iloc[[0]]\n",
    "        except Exception as e:\n",
    "            print(\"predict() error:\", e)\n",
    "            # Robust fallback: all-zero schema\n",
    "            try:\n",
    "                header = pd.read_csv('train_labels.csv', nrows=1).columns\n",
    "                fallback_cols = [c for c in header if c.startswith('target_')]\n",
    "            except Exception:\n",
    "                fallback_cols = [f'target_{i}' for i in range(424)]\n",
    "            return pd.DataFrame([[0.0]*len(fallback_cols)], columns=fallback_cols)\n",
    "\n",
    "    print(\"[FULL] Robust predict() defined (full target schema guaranteed).\")\n",
    "else:\n",
    "    print(\"[FAST] Skipping predict() definition (full submission artifacts not built in fast mode)\")"
   ]
  },
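  {
   "cell_type": "markdown",
   "id": "e4a2c6b8",
   "metadata": {},
   "source": [
    "Before packaging a submission, a quick local smoke test can confirm that `predict()` returns exactly one well-formed row. The cell below is a hedged sketch: it assumes a local `test.csv` with a `date_id` column and passes empty placeholder frames for the four lag-label arguments, which the current `predict()` ignores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b5d7f30",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional smoke test (full mode only); assumes test.csv exists locally.\n",
    "if RUN_MODE == 'full':\n",
    "    import pandas as pd\n",
    "    test_local = pd.read_csv('test.csv')\n",
    "    first_batch = test_local[test_local['date_id'] == test_local['date_id'].min()]\n",
    "    empty = pd.DataFrame()  # placeholder for the unused lag-label batches\n",
    "    out = predict(first_batch, empty, empty, empty, empty)\n",
    "    assert len(out) == 1, 'predict() must return exactly one row'\n",
    "    print('Smoke test OK - prediction columns:', len(out.columns))"
   ]
  },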
  {
   "cell_type": "markdown",
   "id": "83a131ca",
   "metadata": {},
   "source": [
    "## 🎉 Completion\n",
    "Training + artifact export done. For exploration refer to 00 notebook; for tuning refer to 1_mitsui_hyperparameter_optimization.ipynb."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "tf-gpu",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
