Hi,
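eqn (1) : $z_1^{(i)} = w_1 x_1^{(i)} + w_2 x_2^{(i)} + \dots + w_{12288} x_{12288}^{(i)} + b$
eqn (2) : $\hat{y}^{(i)} = a_1^{(i)} = g(z_1^{(i)})$
eqn (3) : $L^{(i)}(a_1^{(i)}, y^{(i)}) = -y^{(i)} \log(a_1^{(i)}) - (1 - y^{(i)}) \log(1 - a_1^{(i)})$
eqn (4) : $J = \frac{1}{m} \sum_{i=1}^{m} L^{(i)}(a_1^{(i)}, y^{(i)})$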
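To make eqns (1) to (4) concrete, here is a minimal loop-based sketch in plain Python. It assumes g is the logistic sigmoid (the equations only say g, so that is my assumption), and the function names and the tiny 2-feature data set are made up for illustration; the real input has 12288 features.

```python
import math

def sigmoid(z):
    # Assumption: the activation g in eqn (2) is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

def forward_and_cost(X, y, w, b):
    """X: list of m examples, each a list of n features. y: list of m labels (0 or 1)."""
    m = len(X)
    activations = []
    total_loss = 0.0
    for i in range(m):
        # eqn (1): weighted sum of the features plus the bias
        z = b
        for j in range(len(w)):
            z += w[j] * X[i][j]
        # eqn (2): prediction for example i
        a = sigmoid(z)
        activations.append(a)
        # eqn (3): loss for example i
        total_loss += -y[i] * math.log(a) - (1 - y[i]) * math.log(1 - a)
    # eqn (4): average the loss over the m examples
    return activations, total_loss / m

# Hypothetical toy data: 3 examples with 2 features each
X = [[0.5, 1.5], [1.0, -1.0], [-0.5, 0.2]]
y = [1, 0, 1]
w = [0.0, 0.0]
b = 0.0

activations, cost = forward_and_cost(X, y, w, b)
print(activations, cost)
```

With all weights and the bias at zero, every prediction is 0.5, so the cost equals log(2) regardless of the labels. That is a handy sanity check before training starts.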
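eqn (6) : $\partial w_1 = \frac{1}{m} \sum_{i=1}^{m} (a_1^{(i)} - y^{(i)}) x_1^{(i)}$
Eqn (6) should be applied to all the weights. Therefore,
$\partial w_2 = \frac{1}{m} \sum_{i=1}^{m} (a_1^{(i)} - y^{(i)}) x_2^{(i)}$ ...
$\partial w_{12288} = \frac{1}{m} \sum_{i=1}^{m} (a_1^{(i)} - y^{(i)}) x_{12288}^{(i)}$
eqn (7) : $\partial b = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a_1^{(i)} - y^{(i)})$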
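The gradient equations (6) and (7) can be sketched with the same kind of nested for loops. The function name `gradients` and the toy numbers are only illustrative, and the activations are passed in directly so that the block runs on its own.

```python
def gradients(X, y, activations):
    """Return (dw, db) following eqns (6) and (7), using explicit loops."""
    m = len(X)
    n = len(X[0])
    dw = [0.0] * n
    db = 0.0
    for i in range(m):
        error = activations[i] - y[i]      # the (a - y) term shared by both equations
        for j in range(n):
            dw[j] += error * X[i][j]       # summand of eqn (6) for weight j
        db += error                        # summand of eqn (7)
    dw = [value / m for value in dw]       # divide the sums by m
    db /= m
    return dw, db

# Hypothetical toy data: 3 examples, 2 features, all predictions at 0.5
X = [[0.5, 1.5], [1.0, -1.0], [-0.5, 0.2]]
y = [1, 0, 1]
activations = [0.5, 0.5, 0.5]

dw, db = gradients(X, y, activations)
print(dw, db)
```

Note that the inner loop over j is what "apply eqn (6) to all the weights" means in code: one accumulator per weight, all sharing the same error term.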
eqn (8) : $w_1 = w_1 - \alpha \, \partial w_1$
Eqn (8) should be applied to all the weights. Therefore,
$w_2 = w_2 - \alpha \, \partial w_2$ ...
$w_{12288} = w_{12288} - \alpha \, \partial w_{12288}$
eqn (9) : $b = b - \alpha \, \partial b$
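The update step in eqns (8) and (9) is short enough to sketch directly. The function name `update_parameters`, the gradient values, and the learning rate below are placeholders I picked for illustration, not values from my training run.

```python
def update_parameters(w, b, dw, db, alpha):
    """Apply one gradient descent step: eqn (8) for every weight, eqn (9) for the bias."""
    new_w = []
    for j in range(len(w)):
        new_w.append(w[j] - alpha * dw[j])   # eqn (8), repeated for each weight
    new_b = b - alpha * db                   # eqn (9)
    return new_w, new_b

# Hypothetical values, just to show the arithmetic
w, b = update_parameters(w=[0.0, 0.0], b=0.0, dw=[0.5, -0.25], db=-0.5, alpha=0.1)
print(w, b)
```

A positive gradient pushes the parameter down and a negative gradient pushes it up, which is why the first weight ends up negative and the second weight and the bias end up positive here.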
I want to discuss a few points about my code.
- If you run the code, you will notice that the training accuracy and the test accuracy are very low, even though the cost is decreasing. The reason is that I am training the network for only a small number of iterations (100), which is not enough. You can check this by increasing the number of iterations.
- If you run the code, you will notice that it takes a considerable amount of time to train the network (more than 5 minutes on my laptop). The reason is that I have used for loops throughout the code, which slow down its execution. In the next blog, I will show you how to speed up the training of the network using Vectorization.
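The first point can be tried out on a small scale. The sketch below is not my actual training code: it is a self-contained toy version that assumes a sigmoid activation, uses a made-up one-feature data set, and picks an arbitrary learning rate and iteration counts. It trains the same loop-based model twice and compares the results.

```python
import math

def sigmoid(z):
    # Assumption: g is the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, alpha, num_iterations):
    """Loop-based gradient descent using eqns (1), (2), (6), (7), (8) and (9)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(num_iterations):
        dw, db = [0.0] * n, 0.0
        for i in range(m):
            z = b + sum(w[j] * X[i][j] for j in range(n))
            error = sigmoid(z) - y[i]
            for j in range(n):
                dw[j] += error * X[i][j] / m
            db += error / m
        w = [w[j] - alpha * dw[j] for j in range(n)]
        b -= alpha * db
    return w, b

def evaluate(X, y, w, b):
    """Return (cost, accuracy) on the given data, predicting 1 when a > 0.5."""
    m = len(X)
    cost, correct = 0.0, 0
    for i in range(m):
        a = sigmoid(b + sum(w[j] * X[i][j] for j in range(len(w))))
        cost += (-y[i] * math.log(a) - (1 - y[i]) * math.log(1 - a)) / m
        correct += int((a > 0.5) == (y[i] == 1))
    return cost, correct / m

# Hypothetical toy data: label is 1 when the single feature is above 0.5
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_short, b_short = train(X, y, alpha=0.5, num_iterations=10)
w_long, b_long = train(X, y, alpha=0.5, num_iterations=1000)

cost_short, acc_short = evaluate(X, y, w_short, b_short)
cost_long, acc_long = evaluate(X, y, w_long, b_long)
print("10 iterations:  cost", cost_short, "accuracy", acc_short)
print("1000 iterations: cost", cost_long, "accuracy", acc_long)
```

On this toy problem the cost after the longer run is lower than after the short run, and both are below the starting cost of log(2). How quickly the accuracy catches up depends on the data and the learning rate, so treat the numbers printed here as an illustration rather than a prediction for the image data set.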
Please feel free to raise any concerns/suggestions on this blog post. Let's meet in the next post.
My previous blogs,
How did I learn Machine Learning : part 1 - Create the coding environment
How did I learn Machine Learning : part 2 - Setup conda environment in PyCharm
How did I learn Machine Learning : part 3 - Implement a simple neural network from scratch I
How did I learn Machine Learning : part 3 - Implement a simple neural network from scratch II
References:
Coursera