Why deep learning works so well

"Deep learning was in some ways an accidental discovery,"

"We still do not understand why it works. A theoretical framework is taking form, and I believe that we are now close to a satisfactory theory. It is time to stand back and review recent insights."

Our current era is marked by a superabundance of data—data from inexpensive sensors of all types, text, the internet, and large amounts of genomic data being generated in the life sciences. Computers nowadays ingest these multidimensional datasets, creating a set of problems dubbed the "curse of dimensionality" by the late mathematician Richard Bellman.

One of these problems is that representing a smooth, high-dimensional function requires an astronomically large number of parameters. We know that deep neural networks are particularly good at learning how to represent, or approximate, such complex data, but why? Understanding why could potentially help advance deep learning applications.
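To get a feel for the scale, here is a rough sketch (an illustration, not something taken from the paper) that counts the nodes of a regular grid used to sample a function on a d-dimensional cube. The helper name `grid_points` and the choice of 10 sample points per axis are assumptions made only for this example.

```python
def grid_points(points_per_axis: int, dims: int) -> int:
    """Number of nodes in a regular grid covering a dims-dimensional cube."""
    return points_per_axis ** dims


# With a fixed resolution per axis, the node count grows exponentially in d.
for d in (1, 2, 3, 10, 100):
    print(f"d={d:>3}: {grid_points(10, d):.3e} grid points")
```

Even at this coarse resolution, the count for d = 100 exceeds any dataset or parameter budget we could realistically store, which is the practical face of the curse of dimensionality.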

"Deep learning is like electricity after Volta discovered the battery, but before Maxwell," explains Poggio, who is the founding scientific advisor of The Core, MIT Quest for Intelligence, and an investigator in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. "Useful applications were certainly possible after Volta, but it was Maxwell's theory of electromagnetism, this deeper understanding that then opened the way to the radio, the TV, the radar, the transistor, the computers, and the internet."

https://www.youtube.com/watch?v=IIjI87Z9C7Y

A theory of deep learning that explains why and how deep networks work, and what their limitations are, will likely enable the development of far more powerful learning approaches.

The theoretical treatment by Poggio, Andrzej Banburski, and Qianli Liao points to why deep learning might overcome data problems such as "the curse of dimensionality." Their approach starts with the observation that many natural structures are hierarchical. Modeling the growth and development of a tree doesn't require specifying the location of every twig. Instead, a model can use local rules to drive branching hierarchically. The primate visual system appears to do something similar when processing complex data. When we look at natural images (trees, cats, and faces, for example), the brain successively integrates local image patches, then small collections of patches, and then collections of collections of patches.

"The physical world is compositional—in other words, composed of many local physical interactions,"

The intuition is that a hierarchical neural network should be better at approximating a compositional function than a single "layer" of neurons, even if the total number of neurons is the same.
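The gap can be made concrete with the scaling rates from the approximation-theory results this line of work builds on. For a target accuracy ε, a shallow network approximating a generic function of d variables with smoothness m needs on the order of ε^(-d/m) units, while a deep network whose layout matches a binary-tree compositional function needs on the order of (d-1)·ε^(-2/m). The sketch below only evaluates those two expressions. Constants are dropped, and the function names and the sample values of d, m, and ε are assumptions chosen for illustration.

```python
def shallow_units(eps: float, d: int, m: int) -> float:
    """Order-of-magnitude unit count for a shallow network: eps^(-d/m)."""
    return eps ** (-d / m)


def deep_units(eps: float, d: int, m: int) -> float:
    """Order-of-magnitude unit count for a binary-tree deep network:
    (d - 1) * eps^(-2/m)."""
    return (d - 1) * eps ** (-2 / m)


eps, m = 0.1, 2  # hypothetical accuracy target and smoothness
for d in (2, 4, 8, 16):
    print(f"d={d:>2}  shallow ~ {shallow_units(eps, d, m):.3g}"
          f"  deep ~ {deep_units(eps, d, m):.3g}")
```

The shallow estimate grows exponentially with the input dimension, whereas the deep estimate grows only linearly, provided the target function really is compositional in the way the network assumes.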

There is a second puzzle about what is sometimes called the unreasonable effectiveness of deep networks. Deep network models often have far more parameters than data points to fit them, despite the mountains of data we produce these days. This situation ought to lead to what is called "overfitting," where the model fits your current data well but fits any new data terribly. In conventional models, this is known as poor generalization, and the conventional solution is to constrain some aspect of the fitting procedure. However, deep networks do not seem to require this explicit constraint. Poggio and his colleagues prove that, in many cases, the process of training a deep network implicitly "regularizes" the solution, providing the constraints on its own.
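A much simpler linear analogue shows what implicit regularization by the training process can look like. This is not the paper's argument for deep networks, only a toy illustration. One equation in three unknowns has infinitely many exact solutions, yet plain gradient descent started from zero, with no penalty term, settles on the one with the smallest norm. The data values, the step size, and the helper names `dot` and `norm` are assumptions made for this sketch.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


# One data point, three parameters: x . w = y is heavily underdetermined.
x = [1.0, 2.0, 2.0]
y = 9.0

# Gradient descent on 0.5 * (x . w - y)**2, started at zero, no explicit penalty.
w = [0.0, 0.0, 0.0]
step = 0.05  # stable because it is below 2 / ||x||^2 = 2/9
for _ in range(200):
    residual = dot(x, w) - y
    w = [wi - step * residual * xi for wi, xi in zip(w, x)]

# Closed-form minimum-norm solution of x . w = y: w* = x * y / ||x||^2.
w_min_norm = [xi * y / dot(x, x) for xi in x]

# Another exact solution, chosen by hand, with a larger norm.
w_other = [9.0, 0.0, 0.0]

print("gradient descent:", w, "norm", norm(w))
print("minimum norm    :", w_min_norm, "norm", norm(w_min_norm))
print("other solution  :", w_other, "norm", norm(w_other))
```

Every update is a multiple of `x`, so the iterates never leave the direction of `x` and can only converge to the minimum-norm solution. The constraint comes from the optimizer and its starting point rather than from anything added to the loss.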

Reference

Tomaso Poggio et al., "Theoretical issues in deep networks," Proceedings of the National Academy of Sciences (2020). DOI: 10.1073/pnas.1907369117