Notes

19 Dec 2015

1.1: Logistic Regression: Gradients

So today I was waiting for an experiment to finish, and since one of these hefty simulations usually takes a couple of hours, I decided to revisit some concepts from a couple of years ago. While browsing through my textbook I came across an exercise that I could have done with my eyes closed during my taught master's days, but out of context today, even these simple derivations felt somewhat difficult.

As the title suggests, the exercise was to derive the expression for the gradient of a binary (two-class) logistic regression model. So without any more blabbering, let

\[\text{NLL}(w)= - \sum_{i=1}^{N} [ y_{i} \log(u_{i}) + (1-y_{i}) \log (1-u_{i}) ]\]

be the negative log-likelihood of the logistic regression, where \(w\) is the weight vector, \(u_{i} = \text{sigmoid}(w^{T}x_{i})\) and \(y_{i} \in \{0,1\}\).
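
As a quick reference, here is a minimal NumPy sketch of this NLL; the names `sigmoid`, `nll`, `X` (the \(N \times D\) design matrix), `y` and `w` are my own, not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    # X: (N, D) design matrix, y: (N,) labels in {0, 1}, w: (D,) weights
    u = sigmoid(X.dot(w))
    return -np.sum(y * np.log(u) + (1 - y) * np.log(1 - u))
```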

For calculating the gradient of the NLL we will need the derivative of the sigmoid function.

\[\frac{d (\text{sigmoid}(x))}{dx} = \text{sigmoid}(x)(1 - \text{sigmoid}(x))\]
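
A quick finite-difference sanity check of this identity, reusing the `sigmoid` from the sketch above:

```python
z, eps = 0.3, 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic))  # should be tiny, on the order of 1e-10
```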

Now the gradient of the NLL simply becomes,

\[\sum_{i=1}^{N} \left[ \frac{1}{1-u_{i}} - \frac{y_{i}}{1-u_{i}} - \frac{y_{i}}{u_{i}} \right]\frac{d (\text{sigmoid}(w^{T}x_{i}))}{dw}\]

And we know that

\[\frac{d (\text{sigmoid}(w^{T}x_{i}))}{dw} =\text{sigmoid}(w^{T}x_{i})(1 - \text{sigmoid}(w^{T}x_{i}))\,x_{i}\]

Therefore,

\[\frac{\partial \text{NLL}(w) }{\partial w} = \sum_{i=1}^{N} (u_{i} - y_{i})\,x_{i}\]
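
Written in matrix form this sum is just \(X^{T}(u - y)\), which makes for a one-line implementation. Here is a sketch that checks it against a numerical gradient of the `nll` defined earlier (the data below is random and purely illustrative):

```python
def nll_grad(w, X, y):
    # gradient of the NLL: sum_i (u_i - y_i) x_i  ==  X^T (u - y)
    u = sigmoid(X.dot(w))
    return X.T.dot(u - y)

# compare against a central-difference numerical gradient on random data
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = (rng.rand(50) < 0.5).astype(float)
w = rng.randn(3)

eps = 1e-6
numeric = np.array([
    (nll(w + eps * e, X, y) - nll(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, nll_grad(w, X, y), atol=1e-5))  # expect True
```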

In case you are interested in the notebook version of this, Click Here.