My thesis investigates the "large depth degeneracy" phenomenon in deep neural networks, where very deep networks struggle to distinguish between different inputs at initialization. This degeneracy occurs because inputs tend to become more correlated layer by layer as they travel through the network, so networks with many layers may map all inputs to effectively the same output. My thesis develops an accurate method to predict the distribution of the angle between two inputs at any layer of an initialized network. I then use these predictions to demonstrate how this type of degeneracy can negatively affect the training performance of the network. Along the way, I develop an explicit formula for the joint moments of the ReLU function applied to correlated Gaussian variables.
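To make the degeneracy concrete, here is a minimal sketch (not the thesis code linked below) that passes two nearly orthogonal inputs through a randomly initialized fully connected ReLU network and records the angle between their hidden representations at each layer. The width, depth, and He initialization are illustrative assumptions, but the angle typically collapses toward zero as depth grows.

```python
import numpy as np

def angle_through_depth(x, y, width=512, depth=50, seed=0):
    """Track the angle between two inputs as they pass through a
    randomly initialized fully connected ReLU network (He init)."""
    rng = np.random.default_rng(seed)
    hx, hy = x.astype(float), y.astype(float)
    angles = []
    for _ in range(depth):
        # He-initialized weights: variance 2/fan_in keeps ReLU activations stable
        W = rng.normal(0.0, np.sqrt(2.0 / hx.size), size=(width, hx.size))
        hx = np.maximum(W @ hx, 0.0)
        hy = np.maximum(W @ hy, 0.0)
        cos = hx @ hy / (np.linalg.norm(hx) * np.linalg.norm(hy))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)

# Two nearly orthogonal inputs: the angle between their hidden
# representations shrinks toward 0 as depth increases.
x = np.array([1.0] + [0.0] * 99)
y = np.array([0.0, 1.0] + [0.0] * 98)
print(angle_through_depth(x, y)[[0, 9, 24, 49]])
```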
My thesis can also be found on the University of Guelph Atrium website.
Sample code used to produce Figure 1.1 can be found at this link.
Sample code used to produce Figure 3.1 can be found at this link.
