Gaussian processes (GPs) are flexible probabilistic models: function values at any finite set of inputs are jointly Gaussian
\[f(\mathbf{X}) = \left[f(\mathbf{x}_1), f(\mathbf{x}_2), \dots, f(\mathbf{x}_n)\right]^T \sim \mathcal{N}\left(\boldsymbol{\mu}, K\right)\] \[K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j; \theta)\]
- Predictions \(f_*\) at new point(s) \(X_*\), given observations \(\mathbf{y}\) at \(X\) with noise standard deviation \(\sigma\) \[f_* \sim \mathcal{N}\left(\bar{f_*}, K_{f_*, f_*}\right)\] \[\bar{f_*} = \boldsymbol{\mu}_{X_*} + K_{X_*, X}\left[K_{X,X}+\sigma^2 I\right]^{-1} \left(\mathbf{y} - \boldsymbol{\mu}_X\right)\] \[K_{f_*,f_*} = K_{X_*, X_*} - K_{X_*, X}\left[K_{X,X}+\sigma^2 I\right]^{-1} K_{X,X_*}\]
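
The predictive equations can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the squared-exponential kernel, the constant prior mean, the toy sine data, and the names `rbf_kernel` and `gp_predict` are all assumptions made for the example. A Cholesky factorization stands in for the explicit inverse of \(K_{X,X}+\sigma^2 I\), which is the usual numerically safer choice.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A and B (assumed kernel choice)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gp_predict(X, y, X_star, noise_std=0.1, mean=0.0):
    """Predictive mean and covariance of f_* under a constant prior mean (assumption)."""
    K = rbf_kernel(X, X) + noise_std**2 * np.eye(len(X))  # K_{X,X} + sigma^2 I
    K_s = rbf_kernel(X_star, X)                           # K_{X_*,X}
    K_ss = rbf_kernel(X_star, X_star)                     # K_{X_*,X_*}

    # Solve with a Cholesky factor instead of forming the inverse explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))

    f_mean = mean + K_s @ alpha      # predictive mean
    V = np.linalg.solve(L, K_s.T)
    f_cov = K_ss - V.T @ V           # predictive covariance
    return f_mean, f_cov


# Toy data: noiseless samples of sin(x) on [0, 5].
X = np.linspace(0.0, 5.0, 8)[:, None]
y = np.sin(X).ravel()

# One test point inside the data range, one far outside it.
X_star = np.array([[2.5], [10.0]])
f_mean, f_cov = gp_predict(X, y, X_star)
```

Under these assumptions, the predictive variance at the point inside the data range is much smaller than at the distant point, where the prediction falls back to the prior mean and the prior variance.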