Deriving the Hessian

Define the individual terms as $f_k=y_k\log(1+e^{-\eta_k}) + (N-y_k)\log(1+e^{\eta_k})$, so that $f=\sum_{k=1}^n f_k$. We know that $\partial\eta_k/\partial\theta_1\equiv 1$ and $\partial\eta_k/\partial\theta_2=k$. Then the derivatives with respect to $\theta_1$ and $\theta_2$ can be obtained with the chain rule:

$$
\begin{align*}
\frac{df_k}{d\theta_i} &= \frac{\partial\eta_k}{\partial\theta_i} \frac{\partial f_k}{\partial\eta_k} \\
&= \frac{\partial \eta_k}{\partial \theta_i} \left[ y_k \frac{-e^{-\eta_k}}{1+e^{-\eta_k}} + (N-y_k) \frac{e^{\eta_k}}{1+e^{\eta_k}} \right] \\
&= \frac{\partial\eta_k}{\partial\theta_i} \left[ y_k \frac{-e^{-\eta_k/2}}{e^{\eta_k/2}+e^{-\eta_k/2}} + (N-y_k) \frac{e^{\eta_k/2}}{e^{-\eta_k/2}+e^{\eta_k/2}} \right] \\
&= \frac{\partial\eta_k}{\partial\theta_i} \left[ - y_k \frac{e^{-\eta_k/2}+e^{\eta_k/2}}{e^{\eta_k/2}+e^{-\eta_k/2}} + N \frac{1}{e^{-\eta_k}+1} \right] \\
&= \frac{\partial\eta_k}{\partial\theta_i} \left[ \frac{N}{e^{-\eta_k}+1} - y_k \right]
\end{align*}
$$

Since the derivatives of $\eta_k$ with respect to $\theta_1$ and $\theta_2$ do not depend on the values of $\theta_1$ and $\theta_2$, the second order derivatives are

$$
\begin{align*}
\frac{d^2f_k}{d\theta_i d\theta_j} &= \frac{\partial\eta_k}{\partial\theta_i} \frac{\partial\eta_k}{\partial\theta_j} \frac{\partial}{\partial\eta_k}\left[ \frac{N}{e^{-\eta_k}+1} - y_k \right] \\
&= \frac{\partial\eta_k}{\partial\theta_i} \frac{\partial\eta_k}{\partial\theta_j} \frac{N e^{-\eta_k}}{(e^{-\eta_k}+1)^2} \\
&= \frac{\partial\eta_k}{\partial\theta_i} \frac{\partial\eta_k}{\partial\theta_j} \frac{N}{(e^{-\eta_k/2}+e^{\eta_k/2})^2} \\
&= \frac{\partial\eta_k}{\partial\theta_i} \frac{\partial\eta_k}{\partial\theta_j} \frac{N}{4\cosh(\eta_k/2)^2}
\end{align*}
$$

Plugging in the $\theta$-derivatives gives the Hessian contribution for term $f_k$ as

$$
\frac{N}{4\cosh(\eta_k/2)^2} \begin{bmatrix}1 & k \\ k & k^2\end{bmatrix} .
$$
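The closed-form gradient and Hessian can be checked numerically against central finite differences. The sketch below is only an illustration: it assumes the linear predictor $\eta_k=\theta_1+k\theta_2$ (the form consistent with the stated partial derivatives), and the data `y`, the count `N`, the point `theta`, and all function names are hypothetical choices made for the check.

```python
import math

def eta(theta, k):
    # Assumed linear predictor: d eta/d theta1 = 1 and d eta/d theta2 = k.
    return theta[0] + k * theta[1]

def f(theta, y, N):
    # Objective f = sum over k = 1..n of f_k, as defined in the text.
    total = 0.0
    for k, yk in enumerate(y, start=1):
        e = eta(theta, k)
        total += yk * math.log1p(math.exp(-e)) + (N - yk) * math.log1p(math.exp(e))
    return total

def gradient(theta, y, N):
    # Closed form: sum_k (d eta_k/d theta_i) * (N / (exp(-eta_k) + 1) - y_k).
    g = [0.0, 0.0]
    for k, yk in enumerate(y, start=1):
        r = N / (math.exp(-eta(theta, k)) + 1.0) - yk
        g[0] += r
        g[1] += k * r
    return g

def hessian(theta, y, N):
    # Closed form: sum_k N / (4 cosh(eta_k / 2)^2) * [[1, k], [k, k^2]].
    H = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(1, len(y) + 1):
        d = N / (4.0 * math.cosh(eta(theta, k) / 2.0) ** 2)
        u = (1.0, float(k))
        for i in range(2):
            for j in range(2):
                H[i][j] += d * u[i] * u[j]
    return H

def numeric_gradient(theta, y, N, h=1e-6):
    # Central differences of f in each coordinate.
    g = []
    for i in range(2):
        tp, tm = list(theta), list(theta)
        tp[i] += h
        tm[i] -= h
        g.append((f(tp, y, N) - f(tm, y, N)) / (2.0 * h))
    return g

def numeric_hessian(theta, y, N, h=1e-5):
    # Central differences of the closed-form gradient; row j holds d g / d theta_j.
    H = []
    for j in range(2):
        tp, tm = list(theta), list(theta)
        tp[j] += h
        tm[j] -= h
        gp, gm = gradient(tp, y, N), gradient(tm, y, N)
        H.append([(gp[i] - gm[i]) / (2.0 * h) for i in range(2)])
    return H

# Hypothetical example values, chosen only for the check.
theta = [0.3, -0.2]
y = [2, 5, 1, 4]
N = 6

print(gradient(theta, y, N), numeric_gradient(theta, y, N))
print(hessian(theta, y, N), numeric_hessian(theta, y, N))
```

The two printed pairs should agree to several decimal places. Note that the Hessian does not depend on `y`, in line with the derivation.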

Positive definite Hessian

To show that the total Hessian for $f$ is positive definite for $n \geq 2$, we define vectors $\boldsymbol{u}_k=\begin{bmatrix}1 \\ k\end{bmatrix}$ and $d_k=\frac{N}{4\cosh(\eta_k/2)^2}$. Define the $2\times n$ matrix $\boldsymbol{U}=\begin{bmatrix}\boldsymbol{u}_1 & \boldsymbol{u}_2 & \cdots & \boldsymbol{u}_n\end{bmatrix}$ and a diagonal matrix $\boldsymbol{D}$ with $D_{ii}=d_i$. The product $\boldsymbol{U}\boldsymbol{D}\boldsymbol{U}^\top$ is then another way of writing the Hessian for $f$. Since all the vectors $\boldsymbol{u}_k$ are non-parallel, and the $d_i$ are strictly positive for all combinations of $\theta_1$ and $\theta_2$, this matrix has full rank (rank 2) for $n \geq 2$, and is positive definite.
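The step from full rank to positive definiteness can be spelled out through the quadratic form. The test vector $\boldsymbol{v}=(v_1,v_2)^\top$ is notation introduced here for illustration:

$$
\boldsymbol{v}^\top\boldsymbol{U}\boldsymbol{D}\boldsymbol{U}^\top\boldsymbol{v}
= \sum_{k=1}^n d_k\,(\boldsymbol{u}_k^\top\boldsymbol{v})^2
= \sum_{k=1}^n d_k\,(v_1 + k v_2)^2 \geq 0 .
$$

Because every $d_k$ is strictly positive, the sum is zero only if $v_1 + k v_2 = 0$ for all $k$. For $n \geq 2$, the cases $k=1$ and $k=2$ give $v_2=0$ and then $v_1=0$, so the quadratic form is strictly positive for every non-zero $\boldsymbol{v}$.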

Alternative reasoning

The Hessian for each $f_k$ is positive semi-definite, since it can be written $d_k\boldsymbol{u}_k\boldsymbol{u}_k^\top$ for some $d_k > 0$ and vector $\boldsymbol{u}_k$. The sum of a positive definite matrix and a positive semi-definite matrix is positive definite, so it’s sufficient to prove that the sum of the first two terms is positive definite. Up to the positive factor $d_1$, that sum is $\boldsymbol{u}_1\boldsymbol{u}_1^\top + w\,\boldsymbol{u}_2\boldsymbol{u}_2^\top$ with $w=d_2/d_1$. For any positive scaling constant $w$, the determinant of

$$
\begin{bmatrix}1 & 1\\1 & 1\end{bmatrix}+w\begin{bmatrix}1 & 2\\2 & 4\end{bmatrix}
$$

is $(1+w)(1+4w)-(1+2w)^2=1+5w+4w^2-1-4w-4w^2=w > 0$. This means that any (positively) weighted sum of those two matrices is positive definite (since each is positive semi-definite, and a positive determinant rules out the sum being only positive semi-definite). This proves that the total Hessian is positive definite for $n\geq 2$.
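As a quick sanity check of the determinant identity, the weighted sum can be evaluated for a few values of $w$. This is a minimal sketch: the helper names are hypothetical, and the positive-definiteness test uses Sylvester's criterion for a symmetric $2\times 2$ matrix (positive leading entry and positive determinant), which is a standard result and not the argument used in the text.

```python
def weighted_sum(w):
    # u1 u1^T + w * u2 u2^T with u1 = (1, 1) and u2 = (1, 2).
    return [[1.0 + w, 1.0 + 2.0 * w],
            [1.0 + 2.0 * w, 1.0 + 4.0 * w]]

def det2(M):
    # Determinant of a 2x2 matrix.
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def is_positive_definite_2x2(M):
    # Sylvester's criterion for a symmetric 2x2 matrix.
    return M[0][0] > 0.0 and det2(M) > 0.0

for w in (0.01, 0.5, 1.0, 20.0):
    M = weighted_sum(w)
    # The determinant should match w up to floating-point rounding.
    print(w, det2(M), is_positive_definite_2x2(M))
```

With $w=0$ only the first rank-one term remains, the determinant is zero, and the check fails, which matches that term being positive semi-definite but not positive definite.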

Remark

Note that in the first proof of positive definiteness, we didn’t actually need to know the specific values of the $\boldsymbol{u}_k$ vectors, which are proportional to the gradients of $\eta_k$ with respect to $\boldsymbol{\theta}=(\theta_1,\theta_2)$; it was sufficient that they were non-parallel, ensuring that $\boldsymbol{U}$ had full rank. This means that any linear model for $\eta_k$ in a set of parameters $\boldsymbol{\theta}=(\theta_1,\dots,\theta_p)$ leads to a positive definite Hessian for this model, provided the collection of gradient vectors of $(\eta_1,\dots,\eta_n)$ with respect to $\boldsymbol{\theta}$ has rank $p$.
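The remark can be illustrated for $p=3$ with a small numerical experiment. Everything in the sketch below is an assumption made for illustration: the two design matrices, the parameter values, the function names, and the use of a Cholesky factorisation with a small pivot tolerance as the positive-definiteness test.

```python
import math

def hessian_from_gradients(X, theta, N):
    # X[k] is the gradient of eta_k with respect to theta, so eta_k = X[k] . theta.
    p = len(theta)
    H = [[0.0] * p for _ in range(p)]
    for x in X:
        e = sum(xi * ti for xi, ti in zip(x, theta))
        d = N / (4.0 * math.cosh(e / 2.0) ** 2)
        for i in range(p):
            for j in range(p):
                H[i][j] += d * x[i] * x[j]
    return H

def is_positive_definite(H, tol=1e-8):
    # Attempt a Cholesky factorisation; report failure when a pivot is not
    # clearly positive (tol guards against floating-point rounding).
    p = len(H)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Hypothetical example values.
N = 6
theta = [0.1, -0.2, 0.05]
# Gradients (1, k, k^2): three linearly independent columns, rank 3.
X_full = [[1.0, float(k), float(k * k)] for k in range(1, 5)]
# Gradients (1, k, 2k + 1): third column is a combination of the first two, rank 2.
X_deficient = [[1.0, float(k), 2.0 * k + 1.0] for k in range(1, 5)]

print(is_positive_definite(hessian_from_gradients(X_full, theta, N)))
print(is_positive_definite(hessian_from_gradients(X_deficient, theta, N)))
```

The first design should pass the check and the second should fail it, since a rank-deficient set of gradient vectors leaves a direction in which the quadratic form is zero.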