Suppose we have multiple outputs \(Y_1, Y_2, ..., Y_K\) that we wish to predict from our inputs \(X_0, X_1, ..., X_p\). We assume a linear model for each output (3.34, 3.35):
\[ \begin{align} Y_k &= \beta_{0k} + \sum_{j=1}^{p} X_j\beta_{jk} + \varepsilon_k\\ &= f_k(X) + \varepsilon_k \end{align} \]
We can write the model in matrix notation:
\[ \mathbf{Y} = \mathbf{XB} + \mathbf{E} \]
Here \(\mathbf{Y}\) is the \(N\times K\) response matrix, \(\mathbf{X}\) is the \(N\times(p+1)\) input matrix, \(\mathbf{B}\) is the \((p+1)\times K\) matrix of parameters, and \(\mathbf{E}\) is the \(N\times K\) matrix of errors.
A straightforward generalization of the univariate loss function is (3.37, 3.38): \[ \begin{align} RSS(\mathbf{B}) &= \sum_{k=1}^K{\sum_{i=1}^N (y_{ik} - f_k(x_i))^2}\\ &= \operatorname{tr}\left[(\mathbf{Y}-\mathbf{XB})^T(\mathbf{Y}-\mathbf{XB})\right] \end{align} \]
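As a quick numerical sanity check of the trace identity, here is a minimal NumPy sketch; the sizes \(N\), \(p\), \(K\) and the simulated data are arbitrary illustrative choices, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 50, 3, 4                                             # arbitrary illustrative sizes
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])  # N x (p+1), intercept column first
B = rng.standard_normal((p + 1, K))                            # (p+1) x K coefficient matrix
Y = X @ B + 0.1 * rng.standard_normal((N, K))                  # N x K simulated responses

R = Y - X @ B                                  # N x K matrix of residuals
rss_double_sum = np.sum(R ** 2)                # sum_k sum_i (y_ik - f_k(x_i))^2
rss_trace = np.trace(R.T @ R)                  # tr[(Y - XB)^T (Y - XB)]
assert np.isclose(rss_double_sum, rss_trace)
```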
The least squares estimate is (3.39): \[\hat{\mathbf{B}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\]
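Note that the \(k\)th column of \(\hat{\mathbf{B}}\) is \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}_k\), so multiple outputs do not affect one another's least squares estimates: fitting all \(K\) outputs jointly is the same as running \(K\) separate regressions. A small NumPy sketch on simulated data (all names and sizes are illustrative) verifies this:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 50, 3, 4
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])  # N x (p+1), intercept included
Y = rng.standard_normal((N, K))                                # N x K responses

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (3.39): (X^T X)^{-1} X^T Y, all K outputs at once

# Column k of B_hat equals the separate univariate least squares fit for output k
for k in range(K):
    b_k, *_ = np.linalg.lstsq(X, Y[:, k], rcond=None)
    assert np.allclose(B_hat[:, k], b_k)
```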
If the errors \(\varepsilon=(\varepsilon_1,...,\varepsilon_K)\) are correlated, with \(\operatorname{Cov}(\varepsilon)=\mathbf{\Sigma}\), then the multivariate weighted criterion (3.40):
\[ RSS(\mathbf{B}; \mathbf{\Sigma})=\sum_{i=1}^N(y_i - f(x_i))^T\mathbf{\Sigma}^{-1}(y_i - f(x_i)) \]
arises naturally from multivariate Gaussian theory. Here \(f(x)\) is the vector function \((f_1(x),...,f_K(x))\) and \(y_i\) is the vector of \(K\) responses for the \(i\)th observation. Perhaps surprisingly, the solution to (3.40) is again given by (3.39): with a common \(\mathbf{\Sigma}\) across observations, the correlations have no effect on the least squares estimates.
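To make that last point concrete, the following sketch fits the stacked system \(\operatorname{vec}(\mathbf{Y}) = (\mathbf{I}_K \otimes \mathbf{X})\operatorname{vec}(\mathbf{B}) + \operatorname{vec}(\mathbf{E})\) by generalized least squares, using the fact that i.i.d. rows of \(\mathbf{E}\) with covariance \(\mathbf{\Sigma}\) give \(\operatorname{Cov}(\operatorname{vec}(\mathbf{E})) = \mathbf{\Sigma} \otimes \mathbf{I}_N\), and confirms the result matches the ordinary estimate (3.39). NumPy is assumed and all data, including \(\mathbf{\Sigma}\), are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, K = 40, 2, 3
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])
Y = rng.standard_normal((N, K))

# An arbitrary positive-definite error covariance Sigma shared by all observations
A = rng.standard_normal((K, K))
Sigma = A @ A.T + K * np.eye(K)

# Generalized least squares on the stacked system vec(Y) = (I_K kron X) vec(B) + vec(E)
XX = np.kron(np.eye(K), X)                     # NK x (p+1)K stacked design
W = np.kron(np.linalg.inv(Sigma), np.eye(N))   # inverse of Cov(vec(E)) = Sigma kron I_N
y = Y.reshape(-1, order="F")                   # vec(Y): stack the columns
b_gls = np.linalg.solve(XX.T @ W @ XX, XX.T @ W @ y)
B_gls = b_gls.reshape(p + 1, K, order="F")     # unstack vec(B) back into (p+1) x K

# Ordinary least squares (3.39), ignoring Sigma entirely
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(B_gls, B_ols)               # Sigma drops out of the solution
```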