nn.ipynb
must be included. Then submit it to THU's Web Learning page.

Some students are confused about where to put the L2 regularization of the linear layers. You might think of adding the regularization term when defining the MSELoss; however, we cannot access the weights of the linear layers inside the loss layer.
Actually, L2 regularization can be applied in the backward propagation of the linear layer when updating the weights. With the L2 regularization term, our final loss function becomes:

$$\tilde{E} = E + \frac{\lambda}{2} \sum_i \lVert W_i \rVert_2^2$$
Here $W_i$ is the weight of the $i$-th layer, $E$ is the original (MSE) loss, and $\lambda$ is the regularization coefficient. So the gradient of each layer's weight becomes:

$$\frac{\partial \tilde{E}}{\partial W_i} = \frac{\partial E}{\partial W_i} + \lambda W_i$$
The first term, $\frac{\partial E}{\partial W_i}$, is computed through backward propagation, which is what we have already done in the linear layer. The second term, $\lambda W_i$, depends only on the weight of the $i$-th layer, so no backward propagation is needed for the regularization. It can therefore be applied in the backward function when updating the weights, e.g. with plain SGD: $W_i \leftarrow W_i - \eta \left( \frac{\partial E}{\partial W_i} + \lambda W_i \right)$.
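As a concrete illustration, here is a minimal NumPy sketch of a linear layer that applies the L2 term inside its backward/update step. The class name Linear and the parameter names lr (learning rate $\eta$) and lam (coefficient $\lambda$) are assumptions for this sketch, not the assignment's actual API.

```python
import numpy as np

class Linear:
    """Sketch of a linear layer with L2 regularization applied during
    the weight update in backward (names lr/lam are assumptions)."""

    def __init__(self, in_dim, out_dim, lr=0.01, lam=1e-4):
        self.W = np.random.randn(in_dim, out_dim) * 0.01
        self.b = np.zeros(out_dim)
        self.lr = lr    # learning rate (eta)
        self.lam = lam  # L2 coefficient (lambda)

    def forward(self, x):
        self.x = x  # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        # First term: dE/dW from ordinary backward propagation.
        grad_W = self.x.T @ grad_out
        grad_b = grad_out.sum(axis=0)
        grad_x = grad_out @ self.W.T  # gradient passed to the previous layer

        # Second term: lambda * W from the L2 penalty. It depends only on
        # this layer's own weights, so no extra backprop is needed.
        grad_W += self.lam * self.W

        # Weight update: W <- W - eta * (dE/dW + lambda * W).
        self.W -= self.lr * grad_W
        self.b -= self.lr * grad_b
        return grad_x


# Quick usage check with dummy data.
layer = Linear(4, 2)
x = np.random.randn(8, 4)
y = layer.forward(x)
grad_x = layer.backward(np.ones_like(y))
```

Note that the gradient returned to the previous layer (grad_x) is unaffected by the regularization term: only this layer's own weight update changes.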