diff --git a/exercises/exercise1.tex b/exercises/exercise1.tex
index e22529d3948ac5b673d5947bb73ea7f6a2b03205..455c4b5244712bb57086574222c7f7b65c261647 100644
--- a/exercises/exercise1.tex
+++ b/exercises/exercise1.tex
@@ -331,6 +331,10 @@ When gradients become extremely small during the backpropogation process, the pa
 When the value is $0.25$, frequent multiplication of sigmoid activation results in gradients that tend to become extraordinary small, approaching $0$. This phenomenon, known as gradient saturation, particularly effects sigmoid activation when used across multiple layers in deep neural networks. For instance, if you have a deep neural network with multiple sigmoid activation layers, the repeated multiplications of small gradients can lead to the vanishing gradient problem hindering the effective training of the model.
 
 \subsection{What is meant by exploding gradients? Why do we not want the gradients to explode? When can sigmoid activations have an exploding gradient?}
-
+\begin{itemize}
+  \item Exploding gradients are the opposite of vanishing gradients: the gradients become too large. More precisely, large error gradients accumulate during backpropagation and result in very large updates to the neural network's weights during training.
+  \item Such large weight changes at each iteration prevent gradient descent from converging to a minimum.
+  \item Sigmoid activations can cause exploding gradients when the initial weights are very large.
+\end{itemize}
 \end{document}
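
To make the last bullet concrete, here is a minimal worked bound, assuming a scalar chain of sigmoid layers; the notation ($a_l = \sigma(z_l)$, $z_l = w_l a_{l-1}$, depth $L$) is introduced only for this illustration and is not part of the patched exercise.

% Sketch, under the assumption of a scalar chain a_l = sigma(z_l), z_l = w_l a_{l-1}.
% Backpropagation multiplies one factor sigma'(z_l) * w_l per layer:
\[
  \frac{\partial \mathcal{L}}{\partial a_0}
  \;=\;
  \frac{\partial \mathcal{L}}{\partial a_L}
  \prod_{l=1}^{L} \sigma'(z_l)\, w_l,
  \qquad
  \sigma'(z_l) \le \tfrac{1}{4}.
\]
% Each factor has magnitude at most |w_l|/4, so with moderate weights the product
% decays toward 0 with depth (vanishing gradients), while initial weights with
% |w_l| much larger than 4 allow individual factors to exceed 1, so the product
% can grow exponentially with depth (exploding gradients).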