Training solution

Optimizing with respect to the unknown parameters \( \theta_j \), we get

$$ \boldsymbol{X}^T\boldsymbol{y} = \boldsymbol{X}^T\boldsymbol{X}\boldsymbol{\theta}, $$

and if the matrix \( \boldsymbol{X}^T\boldsymbol{X} \) is invertible, we obtain the optimal parameter values

$$ \hat{\boldsymbol{\theta}} =\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. $$
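As a rough numerical illustration of the two equations above, the following NumPy sketch builds a small synthetic data set and computes \( \hat{\boldsymbol{\theta}} \). The data, the second-order polynomial design matrix, the noise level, and the names `theta_true`, `theta_hat`, and `theta_inv` are assumptions made for this example only; they do not come from the text. The sketch solves the linear system \( \boldsymbol{X}^T\boldsymbol{X}\boldsymbol{\theta} = \boldsymbol{X}^T\boldsymbol{y} \) directly and, for comparison, also evaluates the explicit inverse formula.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption for this sketch, not from the text):
# n samples of y = 2 - x + 3x^2 plus a small amount of Gaussian noise.
n = 100
x = rng.uniform(0.0, 1.0, n)
theta_true = np.array([2.0, -1.0, 3.0])

# Design matrix with columns [1, x, x^2]
X = np.column_stack([np.ones(n), x, x**2])
y = X @ theta_true + 0.01 * rng.standard_normal(n)

# Solve X^T X theta = X^T y as a linear system (no explicit inverse)
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same result via the explicit inverse, mirroring the closed-form expression
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# At the optimum the residual is orthogonal to the columns of X,
# which is the first equation rewritten as X^T (y - X theta) = 0
residual = y - X @ theta_hat

print("theta_hat:", theta_hat)
print("max |X^T residual|:", np.abs(X.T @ residual).max())
```

Solving the linear system is usually preferable to forming the inverse explicitly, since it is cheaper and less sensitive to rounding errors. When \( \boldsymbol{X}^T\boldsymbol{X} \) is singular or nearly so, `np.linalg.solve` raises an error or becomes unreliable, and a routine such as `np.linalg.lstsq` or `np.linalg.pinv` is a common alternative.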