# Universal approximation theorem

Universal approximation theorem (Redirected from Cybenko theorem) Jump to navigation Jump to search In the mathematical theory of artificial neural networks, universal approximation theorems are results[1][2] that establish the density of an algorithmically generated class of functions within a given function space of interest. Typically, these results concern the approximation capabilities of the feedforward architecture on the space of continuous functions between two Euclidean spaces, and the approximation is with respect to the compact convergence topology.

However, there are also a variety of results between non-Euclidean spaces[3] and other commonly used architectures and, more generally, algorithmically generated sets of functions, such as the convolutional neural network (CNN) architecture,[4][5] radial basis-functions,[6] or neural networks with specific properties.[7] Most universal approximation theorems can be parsed into two classes. The first quantifies the approximation capabilities of neural networks with an arbitrary number of artificial neurons ("arbitrary width" case) and the second focuses on the case with an arbitrary number of hidden layers, each containing a limited number of artificial neurons ("arbitrary depth" case).

Universal approximation theorems imply that neural networks can represent a wide variety of interesting functions when given appropriate weights. On the other hand, they typically do not provide a construction for the weights, but merely state that such a construction is possible.

Contents 1 History 2 Arbitrary-width case 3 Arbitrary-depth case 4 Graph input 5 Quantum computing 6 See also 7 References History One of the first versions of the arbitrary width case was proven by George Cybenko in 1989 for sigmoid activation functions.[8] Kurt Hornik, Maxwell Stinchcombe, and Halbert White showed in 1989 that multilayer feed-forward networks with as few as one hidden layer are universal approximators.[1] Hornik also showed in 1991[9] that it is not the specific choice of the activation function but rather the multilayer feed-forward architecture itself that gives neural networks the potential of being universal approximators. Moshe Leshno et al in 1993[10] and later Allan Pinkus in 1999[11] showed that the universal approximation property is equivalent to having a nonpolynomial activation function.

The arbitrary depth case was also studied by a number of authors, such as Zhou Lu et al in 2017,[12] Boris Hanin and Mark Sellke in 2018,[13] and Patrick Kidger and Terry Lyons in 2020.[14] The result minimal width per layer was refined in 2020[15][16] for residual networks.

Several extensions of the theorem exist, such as to discontinuous activation functions,[10] noncompact domains,[14] certifiable networks,[17] random neural networks,[18] and alternative network architectures and topologies.[14][19] Arbitrary-width case The classical form of the universal approximation theorem for arbitrary width and bounded depth is as follows.[8][9][20][21] It extends[11] the classical results of George Cybenko and Kurt Hornik.

Universal approximation theorem: Let {displaystyle C(X,Y)} denote the set of continuous functions from {displaystyle X} to {displaystyle Y} . Let {displaystyle sigma in C(mathbb {R} ,mathbb {R} )} . Note that {displaystyle (sigma circ x)_{i}=sigma (x_{i})} , so {displaystyle sigma circ x} denotes {displaystyle sigma } applied to each component of {displaystyle x} .

Then {displaystyle sigma } is not polynomial if and only if for every {displaystyle nin mathbb {N} } , {displaystyle min mathbb {N} } , compact {displaystyle Ksubseteq mathbb {R} ^{n}} , {displaystyle fin C(K,mathbb {R} ^{m}),varepsilon >0} there exist {displaystyle kin mathbb {N} } , {displaystyle Ain mathbb {R} ^{ktimes n}} , {displaystyle bin mathbb {R} ^{k}} , {displaystyle Cin mathbb {R} ^{mtimes k}} such that {displaystyle sup _{xin K}|f(x)-g(x)|

Together, the central result of [14] yields the following universal approximation theorem for networks with bounded width.

Universal approximation theorem (non-affine activation, arbitrary depth, constrained width). Let {displaystyle {mathcal {X}}} be a compact subset of {displaystyle mathbb {R} ^{d}} . Let {displaystyle sigma :mathbb {R} to mathbb {R} } be any non-affine continuous function which is continuously differentiable at at least one point, with nonzero derivative at that point. Let {displaystyle {mathcal {N}}_{d,D:d+D+2}^{sigma }} denote the space of feed-forward neural networks with {displaystyle d} input neurons, {displaystyle D} output neurons, and an arbitrary number of hidden layers each with {displaystyle d+D+2} neurons, such that every hidden neuron has activation function {displaystyle sigma } and every output neuron has the identity as its activation function, with input layer {displaystyle phi } , and output layer {displaystyle rho } . Then given any {displaystyle varepsilon >0} and any {displaystyle fin C({mathcal {X}},mathbb {R} ^{D})} , there exists {displaystyle {hat {f}}in {mathcal {N}}_{d,D:d+D+2}^{sigma }} such that {displaystyle sup _{xin {mathcal {X}}},left|{hat {f}}(x)-f(x)right|

Si quieres conocer otros artículos parecidos a **Universal approximation theorem** puedes visitar la categoría **Artificial neural networks**.

Deja una respuesta