In a recent post we used best subset selection as a way of reducing unnecessary model complexity; this time we are going to use the ridge regression technique.
Both the lasso and ridge regression are called shrinkage methods. The best subset method uses least squares to fit a model with a subset of the predictors. Shrinkage methods, by contrast, use all the predictors but constrain and regularise their coefficients towards zero. One major difference between them is that ridge regression ends up using all the predictors, while the lasso shrinks some of their coefficients all the way to zero.
Again we will use the classic
swiss data set provided with the R datasets package.
And again we are interested in predicting the infant mortality of a hypothetical commune using a multiple linear model. The previous post included a quick exploratory analysis of the correlations between the different variables.
The glmnet package provides methods to perform ridge regression and the
lasso. The main function in the package is
glmnet(). This function has
a different syntax from other model-fitting functions in R. This time we must
pass in an
x matrix as well as a
y vector, and we do not use the familiar
y ∼ x syntax.
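A minimal sketch of that setup, assuming we predict Infant.Mortality from the remaining swiss variables (model.matrix is one convenient way to build the x matrix; the variable names here are my own choice):

```r
# The swiss data set ships with base R (package datasets)
data(swiss)

# Build the predictor matrix x and the response vector y.
# model.matrix() expands the formula into a numeric matrix;
# we drop the first column (the intercept).
x <- model.matrix(Infant.Mortality ~ ., swiss)[, -1]
y <- swiss$Infant.Mortality

head(x)
```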
A quick look at the first rows of the matrix shows that it basically contains the values of the 5 predictors for each of the communes.
The glmnet() function takes an
alpha argument that determines which method is fitted: if
alpha=0 then ridge regression is used, while if
alpha=1 then the
lasso is used. We will start with the former.
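A ridge fit along those lines might look like this (a sketch assuming x is the predictor matrix and y the infant mortality vector built from swiss):

```r
library(glmnet)

# x: matrix of the 5 predictors, y: Infant.Mortality response
x <- model.matrix(Infant.Mortality ~ ., swiss)[, -1]
y <- swiss$Infant.Mortality

# alpha = 0 selects ridge regression
ridge.mod <- glmnet(x, y, alpha = 0)

# One column of coefficients per lambda value (100 by default)
dim(coef(ridge.mod))
```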
By default the
glmnet function performs ridge regression for an automatically
selected range of λ values (λ being the shrinkage parameter). The values are based on
lambda.min.ratio. Associated with each value of λ is a vector
of regression coefficients. For example, the 100th value of λ, a very small
one, is close to the ordinary least squares fit:
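For instance (glmnet stores the λ values in decreasing order, so the 100th is the smallest):

```r
library(glmnet)
x <- model.matrix(Infant.Mortality ~ ., swiss)[, -1]
y <- swiss$Infant.Mortality
ridge.mod <- glmnet(x, y, alpha = 0)

# The 100th (smallest) lambda and its coefficient vector
ridge.mod$lambda[100]
coef(ridge.mod)[, 100]

# Compare with the ordinary least squares coefficients
coef(lm(Infant.Mortality ~ ., swiss))
```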
While the 1st one corresponds to the null model containing just the intercept, due to the shrinkage of all the predictor coefficients towards zero:
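That is (same fit as before; the 1st λ is the largest of the path):

```r
library(glmnet)
x <- model.matrix(Infant.Mortality ~ ., swiss)[, -1]
y <- swiss$Infant.Mortality
ridge.mod <- glmnet(x, y, alpha = 0)

# The 1st (largest) lambda: predictor coefficients shrunk towards zero,
# leaving essentially just the intercept
ridge.mod$lambda[1]
coef(ridge.mod)[, 1]
```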
But it would be better to use cross-validation to choose λ. We can do this using
cv.glmnet. By default, the function performs ten-fold cross-validation:
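A sketch of the cross-validation step (the seed value is my own choice, set only so the random fold assignment is reproducible):

```r
library(glmnet)
set.seed(1)  # CV folds are assigned at random

x <- model.matrix(Infant.Mortality ~ ., swiss)[, -1]
y <- swiss$Infant.Mortality

# Ten-fold cross-validation is the default
cv.out <- cv.glmnet(x, y, alpha = 0)
plot(cv.out)  # CV error as a function of log(lambda)

bestlam <- cv.out$lambda.min
bestlam
```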
Once we have the best lambda, we can use
predict to obtain the coefficients.
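Something along these lines, assuming bestlam holds the cross-validated λ from the previous step:

```r
library(glmnet)
set.seed(1)

x <- model.matrix(Infant.Mortality ~ ., swiss)[, -1]
y <- swiss$Infant.Mortality

ridge.mod <- glmnet(x, y, alpha = 0)
cv.out <- cv.glmnet(x, y, alpha = 0)
bestlam <- cv.out$lambda.min

# Ridge coefficients at the cross-validated lambda
predict(ridge.mod, type = "coefficients", s = bestlam)
```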
Next time, the lasso.