# Gradient Computation for a 2-Layer Neural Network

Let $\ell : \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}$ be a smooth convex loss function. We consider the following architecture of a 2-layer neural network (i.e. with a single hidden layer):

- **Input:** composed of $n$ neurons representing the input $x \in \mathbb{R}^n$.
- **Hidden layer:** composed of $d$ neurons. It first computes linear combinations of the input via a weight matrix $W^{(1)} \in M_{d,n}(\mathbb{R})$ to obtain $z^{(1)} = W^{(1)} x \in \mathbb{R}^d$, then applies the activation function componentwise to obtain $a^{(1)} := f_{W^{(1)},\sigma}(x) = \sigma(z^{(1)})$.
- **Output:** composed of $k$ neurons modelling $k$ classes. It simply computes linear combinations of $a^{(1)}$ via a weight matrix $W^{(2)} \in M_{k,d}(\mathbb{R})$ to produce the vector $h_W(x) = a^{(2)} := f_{W^{(2)}}(a^{(1)}) = W^{(2)} a^{(1)}$.

1. To train our neural network, we aim to minimise (over $W = (W^{(1)}, W^{(2)})$, using gradient descent) the global loss function. To achieve this, we need to compute the gradient of the loss function associated with a single fixed labelled training example $(x, y) \in \mathbb{R}^n \times \mathbb{R}^k$:
   $$g : (W^{(1)}, W^{(2)}) \mapsto \ell(h_W(x), y) =: \ell_y(h_W(x)).$$
   Please follow the computational steps as instructed below and do not quote the formulas obtained in the lecture.

   (a) Let $\beta_2 = (\beta_{2,i})_{1 \le i \le k} = \nabla_{a^{(2)}} \ell_y(a^{(2)}) \in \mathbb{R}^k$ and $\varphi_2 : W^{(2)} \mapsto \ell_y(W^{(2)} a^{(1)})$. Show that for all $1 \le i \le k$ and $1 \le j \le d$,
   $$\frac{\partial \varphi_2}{\partial W^{(2)}_{ij}}\bigl(W^{(2)}\bigr) = a^{(1)}_j \, \beta_{2,i}.$$
   Write the identity above in matrix form.

   (b) Let $\beta_1 = (\beta_{1,i})_{1 \le i \le d} = \nabla_{a^{(1)}} (\ell_y \circ f_{W^{(2)}})(a^{(1)}) \in \mathbb{R}^d$. Show that for all $1 \le i \le d$,
   $$\beta_{1,i} = \sum_{j=1}^{k} W^{(2)}_{ji} \, \beta_{2,j}.$$
   Write the identity above in matrix form.

   (c) Let $\varphi_1 : W^{(1)} \mapsto (\ell_y \circ f_{W^{(2)}} \circ f_{W^{(1)},\sigma})(x)$. Show that for all $1 \le i \le d$ and $1 \le j \le n$,
   $$\frac{\partial \varphi_1}{\partial W^{(1)}_{ij}}\bigl(W^{(1)}\bigr) = \beta_{1,i} \, \sigma'\bigl(z^{(1)}_i\bigr) \, x_j.$$
   Write the identity above in matrix form. You may denote by $\odot$ the componentwise product of vectors (i.e. $(a \odot b)_i = a_i b_i$).
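As a sanity check on the identities that parts (a)–(c) ask you to derive, the sketch below compares them against central finite differences. It assumes a concrete choice not fixed by the exercise: the quadratic loss $\ell_y(a) = \tfrac{1}{2}\lVert a - y\rVert^2$ (so $\beta_2 = a^{(2)} - y$) and $\sigma = \tanh$; all variable names are illustrative.

```python
# Numerical check of the backprop identities from (a)-(c).
# Assumptions (not part of the exercise): l_y(a) = 0.5*||a - y||^2 and sigma = tanh.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 4, 5, 3
W1 = rng.standard_normal((d, n))   # W^(1) in M_{d,n}(R)
W2 = rng.standard_normal((k, d))   # W^(2) in M_{k,d}(R)
x = rng.standard_normal(n)
y = rng.standard_normal(k)

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2   # sigma'

def loss(W1_, W2_):
    z1 = W1_ @ x            # hidden pre-activation z^(1)
    a1 = sigma(z1)          # hidden activation a^(1)
    a2 = W2_ @ a1           # output h_W(x) = a^(2)
    return 0.5 * np.sum((a2 - y) ** 2)

# Forward pass and the beta vectors of the exercise.
z1 = W1 @ x
a1 = sigma(z1)
a2 = W2 @ a1
beta2 = a2 - y                              # grad of l_y at a^(2) (quadratic loss)
beta1 = W2.T @ beta2                        # matrix form of (b)
grad_W2 = np.outer(beta2, a1)               # matrix form of (a): beta2 (a^(1))^T
grad_W1 = np.outer(beta1 * dsigma(z1), x)   # matrix form of (c): (beta1 ⊙ sigma'(z^(1))) x^T

# Central finite-difference gradient of f with respect to matrix M.
def num_grad(f, M, eps=1e-6):
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        Mp, Mm = M.copy(), M.copy()
        Mp[idx] += eps
        Mm[idx] -= eps
        G[idx] = (f(Mp) - f(Mm)) / (2 * eps)
    return G

err2 = np.max(np.abs(grad_W2 - num_grad(lambda M: loss(W1, M), W2)))
err1 = np.max(np.abs(grad_W1 - num_grad(lambda M: loss(M, W2), W1)))
print(err2, err1)  # both should be tiny (finite-difference error only)
```

If the derived formulas are correct, both reported errors are on the order of the finite-difference truncation error, far below $10^{-6}$.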
Jul 12, 2022