Suppose we have multiple outputs \(Y_1, Y_2, ..., Y_K\) that we wish to predict from our inputs \(X_0, X_1, ..., X_p\). We assume a linear model for each output (3.34, 3.35):
\[ \begin{align} Y_k &= \beta_{0k} + \sum_{j=1}^{p} X_j\beta_{jk} + \varepsilon_k\\ &= f_k(X) + \varepsilon_k \end{align} \]
We can write the model in matrix notation:
\[ \mathbf{Y} = \mathbf{XB} + \mathbf{E} \]
Here \(\mathbf{Y}\) is the \(N\times K\) response matrix, \(\mathbf{X}\) is the \(N\times(p+1)\) input matrix, \(\mathbf{B}\) is the \((p+1)\times K\) matrix of parameters, and \(\mathbf{E}\) is the \(N\times K\) matrix of errors.
A straightforward generalization of the univariate loss function is (3.37, 3.38): \[ \begin{align} RSS(\mathbf{B}) &= \sum_{k=1}^K{\sum_{i=1}^N (y_{ik} - f_k(x_i))^2}\\ &= \operatorname{tr}\left[(\mathbf{Y}-\mathbf{XB})^T(\mathbf{Y}-\mathbf{XB})\right] \end{align} \]
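As a quick numerical sanity check of the trace identity, here is a minimal NumPy sketch; the sizes \(N\), \(p\), \(K\) and the simulated data are arbitrary illustrative choices, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 50, 3, 4                                             # arbitrary illustrative sizes
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])  # N x (p+1), intercept column first
B = rng.standard_normal((p + 1, K))                            # (p+1) x K coefficient matrix
Y = X @ B + 0.1 * rng.standard_normal((N, K))                  # N x K simulated responses

R = Y - X @ B                                  # N x K matrix of residuals
rss_double_sum = np.sum(R ** 2)                # sum_k sum_i (y_ik - f_k(x_i))^2
rss_trace = np.trace(R.T @ R)                  # tr[(Y - XB)^T (Y - XB)]
assert np.isclose(rss_double_sum, rss_trace)
```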
The least squares estimate is (3.39): \[\hat{\mathbf{B}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}\]
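Note that the \(k\)th column of \(\hat{\mathbf{B}}\) is \((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}_k\), so multiple outputs do not affect one another's least squares estimates: fitting all \(K\) outputs jointly is the same as running \(K\) separate regressions. A small NumPy sketch on simulated data (all names and sizes are illustrative) verifies this:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 50, 3, 4
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])  # N x (p+1), intercept included
Y = rng.standard_normal((N, K))                                # N x K responses

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (3.39): (X^T X)^{-1} X^T Y, all K outputs at once

# Column k of B_hat equals the separate univariate least squares fit for output k
for k in range(K):
    b_k, *_ = np.linalg.lstsq(X, Y[:, k], rcond=None)
    assert np.allclose(B_hat[:, k], b_k)
```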
If the errors \(\varepsilon=(\varepsilon_1,...,\varepsilon_K)\) are correlated, with \(\operatorname{Cov}(\varepsilon)=\mathbf{\Sigma}\), then the multivariate weighted criterion (3.40):
\[ RSS(\mathbf{B}; \mathbf{\Sigma})=\sum_{i=1}^N(y_i - f(x_i))^T\mathbf{\Sigma}^{-1}(y_i - f(x_i)) \]
arises naturally from multivariate Gaussian theory. Here \(f(x)\) is the vector function \((f_1(x),...,f_K(x))\) and \(y_i\) is the vector of \(K\) responses for the \(i\)th observation. Perhaps surprisingly, the solution to (3.40) is again given by (3.39): with a common \(\mathbf{\Sigma}\) across observations, the correlations have no effect on the least squares estimates.
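To make that last point concrete, the following sketch fits the stacked system \(\operatorname{vec}(\mathbf{Y}) = (\mathbf{I}_K \otimes \mathbf{X})\operatorname{vec}(\mathbf{B}) + \operatorname{vec}(\mathbf{E})\) by generalized least squares, using the fact that i.i.d. rows of \(\mathbf{E}\) with covariance \(\mathbf{\Sigma}\) give \(\operatorname{Cov}(\operatorname{vec}(\mathbf{E})) = \mathbf{\Sigma} \otimes \mathbf{I}_N\), and confirms the result matches the ordinary estimate (3.39). NumPy is assumed and all data, including \(\mathbf{\Sigma}\), are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, K = 40, 2, 3
X = np.hstack([np.ones((N, 1)), rng.standard_normal((N, p))])
Y = rng.standard_normal((N, K))

# An arbitrary positive-definite error covariance Sigma shared by all observations
A = rng.standard_normal((K, K))
Sigma = A @ A.T + K * np.eye(K)

# Generalized least squares on the stacked system vec(Y) = (I_K kron X) vec(B) + vec(E)
XX = np.kron(np.eye(K), X)                     # NK x (p+1)K stacked design
W = np.kron(np.linalg.inv(Sigma), np.eye(N))   # inverse of Cov(vec(E)) = Sigma kron I_N
y = Y.reshape(-1, order="F")                   # vec(Y): stack the columns
b_gls = np.linalg.solve(XX.T @ W @ XX, XX.T @ W @ y)
B_gls = b_gls.reshape(p + 1, K, order="F")     # unstack vec(B) back into (p+1) x K

# Ordinary least squares (3.39), ignoring Sigma entirely
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)
assert np.allclose(B_gls, B_ols)               # Sigma drops out of the solution
```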