Ard Louis: Simplicity Bias in Machine Learning

Talk given by Ard Louis to the Formal Languages and Neural Networks Discord on the 19th of September 2022. Thank you, Ard!

Papers and resources mentioned during the talk and following discussion:
Main Paper:
Deep learning generalizes because the parameter-function map is biased towards simple functions (Valle-Pérez et al, 2019): https://arxiv.org/abs/1805.08522
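
For context, a minimal sketch (not from the talk; network size, sample count, and the entropy-based complexity proxy are all illustrative choices) of the kind of experiment behind the main paper and the Mingard et al. Boolean-functions paper below: sample random weights for a small ReLU network over all n-bit Boolean inputs, record which Boolean function each sample implements, and note that the functions that appear most often tend to be the simple (low-entropy) ones.

import numpy as np
from collections import Counter

n_bits = 5          # input dimension (2**5 = 32 possible inputs)
hidden = 32         # width of the single hidden layer
n_samples = 20000   # number of random parameter draws

# Enumerate all 2^n Boolean inputs as rows of a {0, 1} matrix.
X = np.array([[(i >> b) & 1 for b in range(n_bits)]
              for i in range(2 ** n_bits)], dtype=float)

def entropy(bits: str) -> float:
    """Shannon entropy of the 0/1 output string, a crude stand-in for descriptional complexity."""
    p = bits.count("1") / len(bits)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

rng = np.random.default_rng(0)
counts = Counter()
for _ in range(n_samples):
    # i.i.d. Gaussian parameters as a simple prior over weights.
    W1 = rng.normal(0, 1, size=(n_bits, hidden))
    b1 = rng.normal(0, 1, size=hidden)
    W2 = rng.normal(0, 1, size=hidden)
    b2 = rng.normal(0, 1)
    # Forward pass: one hidden ReLU layer, then threshold the output to get a Boolean function.
    out = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2
    f = "".join("1" if o > 0 else "0" for o in out)  # the function as a 2^n-bit string
    counts[f] += 1

# Frequently sampled functions tend to have low entropy; high-entropy functions are rare.
for f, c in counts.most_common(5):
    print(f"P(f) ~ {c / n_samples:.4f}   entropy = {entropy(f):.2f}   f = {f}")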

Slides:
A meeting with Enrico Fermi (Dyson, 2004): https://www.nature.com/articles/427297a
Drawing an elephant with four complex parameters (Mayer et al, 2010): https://aapt.scitation.org/doi/10.111...
Approximating Continuous Functions by ReLU Nets of Minimal Width (Hanin and Sellke, 2018): https://arxiv.org/abs/1710.11278
Understanding deep learning requires rethinking generalization (Zhang et al, 2017): https://arxiv.org/abs/1611.03530
Approximation by superpositions of a sigmoidal function (Cybenko, 1989): https://link.springer.com/article/10....
Approximation capabilities of multilayer feedforward networks (Hornik, 1991): https://www.sciencedirect.com/science...
Reflections After Refereeing Papers for NIPS (Breiman, 1995): https://www.gwern.net/docs/ai/scaling...
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation (Belkin, 2021): https://arxiv.org/abs/2105.14368
To understand deep learning we need to understand kernel learning (Belkin et al, 2018): https://arxiv.org/abs/1802.01396
Double-descent curves in neural networks: a new perspective using Gaussian processes (El Harzli et al, 2022): https://arxiv.org/abs/2102.07238
Deep learning generalizes because the parameter-function map is biased towards simple functions (Valle-Pérez et al, 2019): https://arxiv.org/abs/1805.08522
Neural networks are a priori biased towards Boolean functions with low entropy (Mingard et al, 2020): https://arxiv.org/abs/1909.11522
Unbounded Spigot Algorithms for the Digits of Pi (Gibbons, 2005): https://www.cs.ox.ac.uk/people/jeremy...
The Limits of Understanding (World Science Festival, 2014)
A Preliminary Report on a General Theory of Inductive Inference (Solomonoff, 1960): https://citeseerx.ist.psu.edu/viewdoc...
Laws of Information Conservation (Nongrowth) and Aspects of the Foundation of Probability Theory (Levin, 1974): http://alexander.shen.free.fr/library...
Input–output maps are strongly biased towards simple outputs (Dingle et al, 2018): https://www.nature.com/articles/s4146...
Generic predictions of output probability based on complexities of inputs and outputs (Dingle et al, 2020): https://pubmed.ncbi.nlm.nih.gov/32157...
Biologie : au cœur de l’évolution, un processus algorithmique favoriserait les formes « simples » [Biology: at the heart of evolution, an algorithmic process may favour “simple” forms] (Laurens, 2022): https://www.lemonde.fr/sciences/artic...
Life’s Preference for Symmetry Is Like ‘A New Law of Nature’ (Golembiewski, 2022): https://www.nytimes.com/2022/03/24/sc...
Symmetry and simplicity spontaneously emerge from the algorithmic nature of evolution (Johnston et al, 2022): https://www.pnas.org/doi/10.1073/pnas...
Deep Information Propagation (Schoenholz et al, 2017): https://arxiv.org/abs/1611.01232
A Fine-Grained Spectral Perspective on Neural Networks (Yang and Salman, 2020): https://arxiv.org/abs/1907.10599
Some PAC-Bayesian Theorems (McAllester, 1999): https://link.springer.com/article/10....
Generalization for deep learning (Valle-Pérez and Louis, 2020): https://arxiv.org/abs/2012.04115
Is SGD a Bayesian sampler? Well, almost (Mingard et al, 2020): https://arxiv.org/abs/2006.15191
The Arrival of the Frequent: How Bias in Genotype-Phenotype Maps Can Steer Populations to Local Optima (Schaper and Louis, 2014): https://journals.plos.org/plosone/art...
A Closer Look at Memorization in Deep Networks (Arpit et al, 2017): https://arxiv.org/abs/1706.05394

From the discussion after:
Coding-theorem Like Behaviour and Emergence of the Universal Distribution from Resource-bounded Algorithmic Probability (Zenil et al, 2018): https://arxiv.org/abs/1711.01711
On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al, 2018): https://arxiv.org/abs/1805.04908
A Formal Hierarchy of RNN Architectures (Merrill et al, 2020): https://aclanthology.org/2020.acl-mai...
DisturbLabel: Regularizing CNN on the Loss Layer (Xie et al, 2016): https://arxiv.org/abs/1605.00055
