When gradients become extremely small during the backpropagation process, the parameter updates in the earlier layers become negligible, so those layers learn very slowly or stop learning altogether.
Since the derivative of the sigmoid is at most $0.25$, repeatedly multiplying these derivative factors during backpropagation makes the gradients extraordinarily small, approaching $0$. This phenomenon, known as gradient saturation, particularly affects sigmoid activations when they are used across multiple layers of a deep neural network. For instance, in a deep network with many sigmoid activation layers, the repeated multiplication of small gradients leads to the vanishing gradient problem, hindering effective training of the model.
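To make the numbers concrete, here is a minimal sketch (the depth of $10$ and the scalar ``layers'' are illustrative assumptions, not taken from the text): since $\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le 0.25$, each chained sigmoid layer multiplies the backpropagated gradient by a factor of at most $0.25$.
\begin{verbatim}
# Minimal sketch: the gradient through a chain of scalar sigmoid "layers"
# shrinks roughly like 0.25^depth (a depth of 10 is an illustrative choice).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.0      # input where the sigmoid derivative is largest (0.25)
grad = 1.0   # gradient flowing back from the loss
for layer in range(10):
    y = sigmoid(x)
    grad *= y * (1.0 - y)   # multiply by the local derivative sigma'(x)
    x = y
print(grad)  # on the order of 1e-7 after only 10 layers
\end{verbatim}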
\subsection{What is meant by exploding gradients? Why do we not want the gradients to explode? When can sigmoid activations have an exploding gradient?}
\begin{itemize}
\item Exploding gradients are the opposite of vanishing gradients: the gradients become too large. More precisely, large error gradients accumulate and result in very large updates to the neural network's weights during training.
\item Such large weight updates at each iteration cause gradient descent to overshoot and prevent it from converging to a minimum; in extreme cases the weights overflow to NaN and training diverges.
\item Sigmoid activations can cause exploding gradients when the initial weights are very large, as illustrated in the sketch below.
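As a rough, hand-constructed sketch (the specific weight and bias values below are assumptions chosen for illustration, not from the source), the backpropagation factor contributed by each scalar layer is $w\,\sigma'(z)$; since $\sigma' \le 0.25$, a weight larger than $4$ can push this factor above $1$ while the unit stays unsaturated, so the gradient grows geometrically with depth.
\begin{verbatim}
# Minimal sketch: with a large weight the per-layer backprop factor
# w * sigma'(z) exceeds 1, so the gradient grows geometrically.
# The weight/bias values (20, -10) are hand-picked assumptions that keep
# the pre-activations near 0, i.e. the units stay unsaturated.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w, b = 20.0, -10.0
a = 0.5      # activation entering the first layer
grad = 1.0
for layer in range(10):
    z = w * a + b               # stays near 0, so sigma'(z) is about 0.25
    a = sigmoid(z)
    grad *= w * a * (1.0 - a)   # per-layer factor w * sigma'(z) = 5
print(grad)  # about (20 * 0.25)^10 = 5^10, roughly 1e7
\end{verbatim}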