Deriving the Hessian
Define the individual terms as , so that .
We know that and . Then the derivatives with respect to and can be obtained with the chain rule:
Since the derivatives of with respect to and do not depend on the values of and , the second-order derivatives are
Plugging in the -derivatives gives the Hessian contribution for term as
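As a concrete instance of this chain-rule structure (the particular per-term function below, an exponential of a linear combination, is an assumption for illustration; only the weighted outer-product form of the contribution matters), the Hessian contribution of a term can be checked numerically against finite differences:

```python
import numpy as np

# Hypothetical single term: f(theta) = exp(v . theta) for a fixed vector v.
# The chain rule gives grad = exp(v.theta) * v, and since the derivatives of
# the inner linear expression do not depend on the parameters, the Hessian
# contribution is the weighted outer product  w * v v^T  with w = exp(v.theta).
v = np.array([1.0, -2.0])
theta = np.array([0.3, 0.1])

def f(t):
    return np.exp(v @ t)

# Closed-form contribution: w * v v^T, with w > 0
H_closed = f(theta) * np.outer(v, v)

# Finite-difference second derivatives for comparison
eps = 1e-4
H_fd = np.zeros((2, 2))
for j in range(2):
    for k in range(2):
        e_j, e_k = np.eye(2)[j], np.eye(2)[k]
        H_fd[j, k] = (f(theta + eps * e_j + eps * e_k) - f(theta + eps * e_j)
                      - f(theta + eps * e_k) + f(theta)) / eps**2

print(np.allclose(H_closed, H_fd, atol=1e-2))  # True
```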
Positive definite Hessian
To show that the total Hessian for is positive definite for , we define vectors and .
Define the two-column matrix and a diagonal matrix with .
The product is then another way of writing the Hessian for . Since all the vectors are non-parallel, and the are strictly positive for all combinations of and , this matrix has full rank (rank 2) for , and is positive definite.
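The factorisation argument can be sketched numerically. The vectors and weights below are made-up placeholders; the only properties used are that the rows of the two-column matrix are non-parallel and that the diagonal weights are strictly positive:

```python
import numpy as np

# Stack the (assumed) gradient vectors as rows of a two-column matrix X,
# and put the strictly positive weights on the diagonal of D.
X = np.array([[1.0,  0.5],
              [0.2,  1.0],
              [1.0, -1.0]])        # rows pairwise non-parallel -> rank(X) = 2
D = np.diag([0.7, 1.3, 0.4])       # strictly positive weights

# The Hessian written as the product X^T D X
H = X.T @ D @ X

# Full column rank of X plus a positive diagonal D gives positive definiteness
print(np.linalg.matrix_rank(X), np.linalg.eigvalsh(H).min() > 0)  # 2 True
```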
Alternative reasoning
The Hessian for each is positive semi-definite, since it can be written for some and vector .
The sum of a positive definite matrix and a positive semi-definite matrix is positive definite, so it’s sufficient to prove that the sum of the first two terms is positive definite. For any positive scaling constant , the determinant of is .
This means that any (positively) weighted sum of those two matrices is positive definite (since each is positive semi-definite, and a positive determinant rules out the sum being only positive semi-definite). This proves that the total Hessian is positive definite for .
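The determinant step can be illustrated in two dimensions, where the determinant of a weighted sum of two rank-one terms has a closed form. The vectors and weights below are illustrative assumptions; only non-parallelism and positivity are used:

```python
import numpy as np

# Two rank-one terms w_i * v_i v_i^T: each is positive semi-definite with
# determinant zero, but for non-parallel v_1, v_2 and positive weights the
# determinant of the sum is strictly positive, which (together with
# semi-definiteness) forces the sum to be positive definite.
v1 = np.array([1.0, 0.5])
v2 = np.array([0.2, 1.0])
w1, w2 = 0.7, 1.3

S = w1 * np.outer(v1, v1) + w2 * np.outer(v2, v2)

# In 2D:  det(S) = w1 * w2 * (v1 x v2)^2
det_formula = w1 * w2 * (v1[0] * v2[1] - v1[1] * v2[0])**2
print(np.isclose(np.linalg.det(S), det_formula),
      np.all(np.linalg.eigvalsh(S) > 0))  # True True
```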
Note that in the first proof of positive definiteness, we didn’t actually need to know the specific values of the vectors, which are proportional to the gradients of with respect to ; it was sufficient that they were non-parallel, ensuring that had full rank. This means that any linear model for in a set of parameters leads to a positive definite Hessian for this model, if the collection of gradient vectors of with respect to has collective rank at least .
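The closing observation generalises to any number of parameters, and can be sketched with arbitrary data (the random gradient vectors and weights below are purely illustrative; only their collective rank and positivity matter):

```python
import numpy as np

# For a linear model the per-term Hessians are w_i * v_i v_i^T, so the total
# Hessian is X^T D X with the gradient vectors v_i as rows of X.  Whenever
# those rows have collective rank equal to the number of parameters, the
# total is positive definite, regardless of the particular v_i.
rng = np.random.default_rng(0)
n_terms, n_params = 20, 4
X = rng.standard_normal((n_terms, n_params))   # gradient vectors as rows
w = rng.uniform(0.1, 1.0, size=n_terms)        # strictly positive weights

H = X.T @ (w[:, None] * X)                     # sum_i w_i * v_i v_i^T

full_rank = np.linalg.matrix_rank(X) == n_params
print(full_rank, np.linalg.eigvalsh(H).min() > 0)  # True True
```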