In a recent post we used *best subset* selection as a way of reducing unnecessary
model complexity; this time we are going to use the *ridge regression* technique.

Both the *lasso* and *ridge regression* are called shrinkage methods. The best
subset method uses least squares to fit a model with a subset of the predictors.
Shrinkage methods, by contrast, use all the predictors but constrain and
regularise their coefficients towards zero. One major difference between them is
that *ridge* ends up using all the predictors, while the *lasso* shrinks some of
them all the way to zero.

Again we will use the classic `swiss` data set provided with the R `datasets` package.

And again we are interested in predicting the infant mortality of a hypothetical commune using a multiple linear regression model. In the previous post we saw a quick exploratory analysis of the correlations between the different variables.

The `glmnet` package provides methods to perform *ridge regression* and the
*lasso*. The main function in the package is `glmnet()`. This function has
a different syntax from other model-fitting functions in R: we must
pass in an `x` matrix as well as a `y` vector, and we do not use the familiar
`y ~ x` formula syntax.
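A minimal sketch of that setup, using `model.matrix()` to build the predictor matrix from `swiss` (the object names `x` and `y` are just conventions):

```r
library(glmnet)

# Build the predictor matrix; model.matrix() adds an intercept
# column, which we drop with [, -1]
x <- model.matrix(Infant.Mortality ~ ., data = swiss)[, -1]
y <- swiss$Infant.Mortality
```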

A quick look at the first rows of the matrix shows that it basically contains the values of the 5 predictors for each of the communes.
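For example, with the `x` matrix defined above:

```r
head(x)
```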

The `glmnet()` function takes an `alpha` argument that determines which method is
used: if `alpha=0` then *ridge regression* is used, while if `alpha=1` then the
*lasso* is used. We will start with the former.
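A sketch of the ridge fit, reusing the `x` and `y` objects from above (the name `ridge.mod` is an arbitrary choice):

```r
# alpha = 0 requests ridge regression
ridge.mod <- glmnet(x, y, alpha = 0)
```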

By default the `glmnet()` function performs *ridge regression* for an automatically
selected range of λ values (the shrinkage parameter), based on the
`nlambda` and `lambda.min.ratio` arguments. Associated with each value of λ is a vector
of regression coefficients. For example, the 100th value of λ, a very small
one, is close to the ordinary least squares fit:
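Something along these lines, assuming the default `nlambda = 100`, which stores the sequence of λ values in decreasing order:

```r
ridge.mod$lambda[100]   # the smallest lambda in the default sequence
coef(ridge.mod)[, 100]  # coefficients barely shrunk from least squares
```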

While the 1st one, due to the shrinkage of all the predictor coefficients, is essentially the null model containing just the intercept:
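Again as a sketch:

```r
ridge.mod$lambda[1]   # the largest lambda in the sequence
coef(ridge.mod)[, 1]  # predictor coefficients shrunk towards zero
```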

But it would be better to use cross-validation to choose λ. We can do this with
`cv.glmnet()`. By default, the function performs ten-fold cross-validation:
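A sketch of the cross-validation step (the seed value is arbitrary, and `cv.out` is just an assumed name):

```r
set.seed(1)  # fold assignment is random, so fix the seed for reproducibility
cv.out <- cv.glmnet(x, y, alpha = 0)
best.lambda <- cv.out$lambda.min  # lambda with the lowest cross-validated error
best.lambda
```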

Once we have the best λ, we can use `predict()` to obtain the coefficients.
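For instance, reusing the objects from the previous sketches:

```r
# Coefficients of the ridge model at the cross-validated lambda
predict(ridge.mod, type = "coefficients", s = best.lambda)
```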

Next time, the *lasso*.