# Notes

19 Dec 2015

So today I was waiting for an experiment to finish. Since it usually takes a couple of hours to run one of these hefty simulations, I decided to revisit some concepts from a couple of years ago. While browsing through my textbook I came across an exercise that I could have done with my eyes closed during my taught master's days, but out of context today, even these simple derivations felt somewhat difficult.

As the title suggests, the exercise was to derive the expression for the gradient of a two-class (binary) logistic regression model. So without any more blabbering, let

$$\text{NLL}(w) = -\sum_{i=1}^{N} \left[ y_{i} \log u_{i} + (1 - y_{i}) \log (1 - u_{i}) \right]$$

be the negative log-likelihood of the logistic regression, where $w$ is the weight vector, $u_{i} = \text{sigmoid}(w^{T}x_{i})$ and $y_{i} \in \{0,1\}$.
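Before differentiating anything, it helps to have the NLL in code. Here is a minimal NumPy sketch (not from the original notebook; the function names `sigmoid` and `nll` are my own) that evaluates the expression above directly:

```python
import numpy as np

def sigmoid(a):
    # logistic function sigma(a) = 1 / (1 + e^{-a})
    return 1.0 / (1.0 + np.exp(-a))

def nll(w, X, y):
    # negative log-likelihood of binary logistic regression:
    # -sum_i [ y_i log u_i + (1 - y_i) log(1 - u_i) ],  u_i = sigmoid(w^T x_i)
    u = sigmoid(X @ w)
    return -np.sum(y * np.log(u) + (1 - y) * np.log(1 - u))
```

For example, with $w = 0$ every $u_i = 0.5$, so the NLL is $N \log 2$ regardless of the labels, which is a handy quick check.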

For calculating the gradient of the NLL we will need the derivative of the sigmoid function. Writing $\sigma(a) = 1/(1 + e^{-a})$,

$$\frac{d\sigma(a)}{da} = \sigma(a)\,(1 - \sigma(a)).$$
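The identity $\sigma'(a) = \sigma(a)(1-\sigma(a))$ is easy to confirm numerically with a central finite difference; a small sketch (my own check, not part of the original exercise):

```python
import numpy as np

def sigmoid(a):
    # logistic function sigma(a) = 1 / (1 + e^{-a})
    return 1.0 / (1.0 + np.exp(-a))

a = 0.7  # an arbitrary test point
analytic = sigmoid(a) * (1 - sigmoid(a))
eps = 1e-6
# central finite-difference approximation of d sigma / da
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
```

The two values agree to roughly eight decimal places, which is what one expects from a central difference with this step size.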

Now, by the chain rule, the gradient of the NLL simply becomes

$$\nabla_{w}\,\text{NLL}(w) = -\sum_{i=1}^{N} \left[ \frac{y_{i}}{u_{i}} - \frac{1 - y_{i}}{1 - u_{i}} \right] \nabla_{w}\, u_{i}$$

And we know that

$$\nabla_{w}\, u_{i} = u_{i}\,(1 - u_{i})\,x_{i}.$$

Therefore,

$$\nabla_{w}\,\text{NLL}(w) = -\sum_{i=1}^{N} \left[ y_{i}(1 - u_{i}) - (1 - y_{i})\,u_{i} \right] x_{i} = \sum_{i=1}^{N} (u_{i} - y_{i})\,x_{i}.$$

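To make sure I hadn't fumbled the algebra, I like to compare the closed-form gradient $\sum_i (u_i - y_i)\,x_i = X^{T}(u - y)$ against a finite-difference gradient of the NLL. A self-contained NumPy sketch (the data here is random and purely illustrative):

```python
import numpy as np

def sigmoid(a):
    # logistic function sigma(a) = 1 / (1 + e^{-a})
    return 1.0 / (1.0 + np.exp(-a))

def nll(w, X, y):
    # negative log-likelihood of binary logistic regression
    u = sigmoid(X @ w)
    return -np.sum(y * np.log(u) + (1 - y) * np.log(1 - u))

def nll_grad(w, X, y):
    # closed-form gradient: sum_i (u_i - y_i) x_i  ==  X^T (u - y)
    return X.T @ (sigmoid(X @ w) - y)

# random toy problem for the check
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
w = rng.normal(size=3)

g = nll_grad(w, X, y)
# central finite-difference gradient, one coordinate at a time
eps = 1e-6
g_num = np.array([(nll(w + eps * e, X, y) - nll(w - eps * e, X, y)) / (2 * eps)
                  for e in np.eye(3)])
```

If the derivation above is right, `g` and `g_num` should agree to several decimal places, and they do.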
In case you are interested in the notebook version of this, click here.