Weight decay is already accounted for and calculated inside the `SGD` operation, so there is no need to apply it again in the `SGD` optimizer.
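To illustrate why a second application is redundant, here is a minimal sketch in plain Python. It assumes the common formulation in which weight decay is added to the gradient as `weight_decay * param` inside the update step; the function `sgd_step` and the numeric values are hypothetical and are not taken from any particular framework.

```python
def sgd_step(param, grad, lr, weight_decay=0.0):
    """One plain SGD update with weight decay folded into the step.

    Hypothetical helper for illustration only: it assumes the decay term
    is weight_decay * param added to the gradient.
    """
    return param - lr * (grad + weight_decay * param)


# Illustrative values, chosen arbitrarily.
param, grad, lr, wd = 1.0, 0.5, 0.1, 0.01

# Decay applied once, inside the update itself.
once = sgd_step(param, grad, lr, weight_decay=wd)

# Decay applied twice: added to the gradient beforehand,
# then applied again inside the update.
twice = sgd_step(param, grad + wd * param, lr, weight_decay=wd)

print(once, twice)
```

Under these assumptions, the second call shrinks the parameter more than intended, because the decay term enters the update twice.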