4.4.4 L1 Regularized Logistic Regression

To apply the L1 penalty of the lasso to logistic regression, we maximize a penalized version of (4.20): \[ \max_{\beta_0, \beta} \left\{ \sum_{i=1}^N \left[y_i(\beta_0+\beta^Tx_i) - \log\bigl(1+e^{\beta_0+\beta^Tx_i}\bigr) \right] -\gamma\sum_{j=1}^p |\beta_j| \right\}. \tag{4.31} \]
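As a concrete illustration, here is a minimal proximal-gradient (ISTA-style) sketch for maximizing (4.31). The function names, the warm-start arguments, and the step-size choice are our own assumptions, not from the text; the intercept is handled by a plain gradient step since it is unpenalized.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_logistic(X, y, gamma, beta0=0.0, beta=None, n_iter=5000):
    """Maximize criterion (4.31) by proximal gradient ascent.

    beta0 is updated by an ordinary gradient step (it is not
    penalized); the soft-threshold handles the L1 term on beta.
    """
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    # The logistic variance p(1-p) is at most 1/4, so this step size
    # is safe: 1 / L with L = (||X||_2^2 + n) / 4.
    step = 4.0 / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        pvec = 1.0 / (1.0 + np.exp(-(beta0 + X @ beta)))  # fitted probs
        resid = y - pvec                                   # score residuals
        beta0 = beta0 + step * resid.sum()
        beta = soft_threshold(beta + step * (X.T @ resid), step * gamma)
    return beta0, beta
```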

As with the lasso, we do not penalize the intercept term, and we standardize the predictors for the penalty to be meaningful. Criterion (4.31) is concave, and a solution can be found using nonlinear programming methods. Alternatively, using the same quadratic approximations that were used in the Newton algorithm, we can solve (4.31) by repeated application of a weighted lasso algorithm. Interestingly, the score equations for the variables with nonzero coefficients have the form \[ \mathbf{x}_j^T(\mathbf{y}-\mathbf{p})=\gamma \cdot \operatorname{sign}(\beta_j), \tag{4.32} \] where $\mathbf{p}$ is the vector of fitted probabilities: the active variables are tied in their generalized correlation with the residuals.
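The score equations (4.32) can be checked numerically at a solution. The sketch below reuses the hypothetical l1_logistic helper above on simulated, standardized data; for active coordinates the printed score should be close to $\gamma \cdot \operatorname{sign}(\beta_j)$, while inactive coordinates should satisfy $|\mathbf{x}_j^T(\mathbf{y}-\mathbf{p})| \le \gamma$.

```python
# Quick numerical check of (4.32); assumes l1_logistic from above.
rng = np.random.default_rng(0)
n, p, gamma = 500, 5, 10.0
X = rng.standard_normal((n, p))
X = (X - X.mean(0)) / X.std(0)            # standardize the predictors
prob = 1 / (1 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))
y = (rng.random(n) < prob).astype(float)

beta0, beta = l1_logistic(X, y, gamma, n_iter=20000)
pvec = 1 / (1 + np.exp(-(beta0 + X @ beta)))
score = X.T @ (y - pvec)
for j in range(p):
    # Active: score ~= gamma * sign(beta_j).  Inactive: |score| <= gamma.
    print(f"j={j}  beta_j={beta[j]: .3f}  x_j'(y-p)={score[j]: .3f}")
```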

Path algorithms such as LAR for the lasso are more difficult here, because the coefficient profiles are piecewise smooth rather than piecewise linear. Nevertheless, progress can be made using quadratic approximations.
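In practice a common substitute for an exact path algorithm is to trace an approximate path on a grid of penalty values with warm starts, as coordinate-descent solvers such as glmnet do. A sketch under that assumption, again reusing l1_logistic and the simulated X, y from the check above:

```python
def l1_logistic_path(X, y, gammas, n_iter=2000):
    """Approximate coefficient path: solve (4.31) over a decreasing
    grid of penalties, warm-starting each fit from the previous one."""
    beta0, beta = 0.0, np.zeros(X.shape[1])
    path = []
    for g in sorted(gammas, reverse=True):    # strong penalty -> weak
        beta0, beta = l1_logistic(X, y, g, beta0, beta, n_iter)
        path.append((g, beta0, beta.copy()))
    return path

# Smallest penalty at which all coefficients are zero: with beta = 0
# (and the intercept at its optimum), the score condition requires
# max_j |x_j^T(y - p)| <= gamma.
gamma_max = np.abs(X.T @ (y - y.mean())).max()
gammas = gamma_max * np.logspace(0, -2, 20)   # geometric grid
path = l1_logistic_path(X, y, gammas)
```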