Master's Thesis
Learning Dynamics of Neural Networks
Miguel José De Azevedo Moreira
This thesis investigates the learning dynamics of neural networks by combining tools from statistical mechanics and dynamical systems, within the controlled setting of the teacher–student framework. In this setup, a minimal student network is trained via Stochastic Gradient Descent (SGD) on a mean-squared error loss to reproduce the outputs of a fixed teacher network.
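The sketch below illustrates the kind of teacher–student training loop this setup refers to; the widths, tanh activation, Gaussian inputs, and learning rate are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a minimal one-hidden-layer teacher and student.
d_in, d_hidden = 2, 3
W_teacher = rng.normal(size=(d_hidden, d_in))  # fixed teacher weights
a_teacher = rng.normal(size=d_hidden)

def teacher(x):
    return a_teacher @ np.tanh(W_teacher @ x)

# Student with the same architecture, independently initialized.
W = rng.normal(size=(d_hidden, d_in))
a = rng.normal(size=d_hidden)

eta, n_steps, batch = 0.05, 5000, 32
for _ in range(n_steps):
    X = rng.normal(size=(batch, d_in))   # fresh Gaussian inputs
    H = np.tanh(X @ W.T)                 # hidden activations, (batch, d_hidden)
    y_hat = H @ a                        # student outputs
    y = np.array([teacher(x) for x in X])
    err = y_hat - y
    # Gradients of the half mean-squared error L = mean(err**2) / 2.
    grad_a = H.T @ err / batch
    grad_W = ((err[:, None] * a) * (1 - H**2)).T @ X / batch
    # SGD update.
    a -= eta * grad_a
    W -= eta * grad_W
```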
By analyzing the Hessian of the loss function, we characterize the local curvature of the landscape at and near optima, revealing how overparameterization and the choice of activation function shape the eigenvalue spectrum and, hence, the rates of convergence. To probe transient chaotic behavior during training, we compute Local Lyapunov Spectra and observe that, even in low-dimensional teacher–student tasks, locally chaotic dynamics can in principle drive nearby parameter trajectories to diverge exponentially before they settle into stable minima or flat manifolds.
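As a hedged illustration of how these two diagnostics connect, the sketch below uses a hypothetical two-parameter quadratic loss (not a model from the thesis). For full-batch gradient descent, theta -> theta - eta * grad L(theta), the one-step Jacobian is J = I - eta * H, so the local Lyapunov exponents follow directly from the Hessian eigenvalues mu_i as log|1 - eta * mu_i|.

```python
import numpy as np

def hessian(loss_grad, theta, eps=1e-5):
    """Finite-difference Hessian of a loss from its gradient."""
    n = theta.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        H[:, j] = (loss_grad(theta + e) - loss_grad(theta - e)) / (2 * eps)
    return 0.5 * (H + H.T)  # symmetrize

# Hypothetical quadratic loss with one stiff and one flat direction.
A = np.diag([6.0, 0.01])
loss_grad = lambda th: A @ th

theta = np.array([1.0, 1.0])
eta = 0.4
H = hessian(loss_grad, theta)
mu = np.linalg.eigvalsh(H)            # local curvature spectrum
lle = np.log(np.abs(1.0 - eta * mu))  # local Lyapunov exponents of the GD map

print("Hessian eigenvalues:", mu)
# A positive exponent along the stiff direction signals local instability:
# trajectories diverge there until they reach a flatter region.
print("Local Lyapunov exponents:", lle)
```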
Principal Component Analysis of these trajectories further uncovers a marked reduction in effective dimensionality over the course of training, with the majority of variance confined to leading modes that coincide with directions of minimal curvature in the Hessian.
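A minimal sketch of this analysis is given below; the synthetic trajectory stands in for weights logged during training, and the participation ratio is one common choice of effective-dimensionality measure, assumed here rather than taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_params = 500, 50

# Hypothetical trajectory: slow drift along 2 modes plus small noise,
# mimicking parameters recorded over T training steps.
modes = rng.normal(size=(2, n_params))
t = np.linspace(0, 1, T)[:, None]
trajectory = np.hstack([t, t**2]) @ modes + 0.01 * rng.normal(size=(T, n_params))

X = trajectory - trajectory.mean(axis=0)     # center over time
# PCA via SVD of the centered trajectory matrix.
_, s, _ = np.linalg.svd(X, full_matrices=False)
var = s**2 / np.sum(s**2)                    # explained variance ratios

# Effective dimensionality as the participation ratio of the variances.
d_eff = 1.0 / np.sum(var**2)
print("Top-5 explained variance:", np.round(var[:5], 3))
print("Effective dimensionality:", round(d_eff, 2))
```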
Finally, extending the framework to a network trained on the MNIST classification task, we find that entropy measures derived from positive Local Lyapunov Exponents do not correlate with generalization performance, highlighting the need for alternative complexity metrics in realistic, high-dimensional settings.
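For concreteness, such an entropy measure can be sketched in the spirit of the Pesin identity, h = sum of the positive exponents; whether this matches the thesis's exact estimator is an assumption.

```python
import numpy as np

def lle_entropy(local_lyapunov_exponents):
    """Entropy proxy: sum of positive Local Lyapunov Exponents.

    Assumed Pesin-style estimator, used here purely for illustration.
    """
    lle = np.asarray(local_lyapunov_exponents)
    return lle[lle > 0].sum()

print(lle_entropy([0.34, -0.02, -1.10]))  # -> 0.34
```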