# Reproducing kernel Hilbert space

It is not entirely straightforward to construct a Hilbert space of functions which is not an RKHS.[1] Some examples, however, have been found.[2][3] Note that $L^2$ spaces are not Hilbert spaces of functions (and hence not RKHSs), but rather Hilbert spaces of equivalence classes of functions (for example, the functions $f$ and $g$ defined by $f(x)=0$ and $g(x)=1_{\mathbb{Q}}$ are equivalent in $L^2$). However, there are RKHSs in which the norm is an $L^2$-norm, such as the space of band-limited functions (see the example below).

An RKHS is associated with a kernel that reproduces every function in the space in the sense that for every $x$ in the set on which the functions are defined, "evaluation at $x$" can be performed by taking an inner product with a function determined by the kernel. Such a reproducing kernel exists if and only if every evaluation functional is continuous.

The reproducing kernel was first introduced in the 1907 work of Stanisław Zaremba concerning boundary value problems for harmonic and biharmonic functions. James Mercer simultaneously examined functions which satisfy the reproducing property in the theory of integral equations. The idea of the reproducing kernel then lay dormant for nearly twenty years until it appeared in the dissertations of Gábor Szegő, Stefan Bergman, and Salomon Bochner. The subject was eventually systematically developed in the early 1950s by Nachman Aronszajn and Stefan Bergman.[4] These spaces have wide applications, including complex analysis, harmonic analysis, and quantum mechanics. Reproducing kernel Hilbert spaces are particularly important in the field of statistical learning theory because of the celebrated representer theorem, which states that every function in an RKHS that minimises an empirical risk functional can be written as a linear combination of the kernel function evaluated at the training points. This is a practically useful result as it effectively simplifies the empirical risk minimization problem from an infinite-dimensional to a finite-dimensional optimization problem.
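The practical upshot of the representer theorem can be sketched in a few lines of Python. The example below is kernel ridge regression with an assumed Gaussian kernel and made-up training data (all values are illustrative): the regularized empirical risk minimizer is found by solving a finite linear system, and the fitted function is exactly a linear combination of kernel sections at the training points.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=0.5):
    # Gaussian (RBF) kernel K(x, y) = exp(-|x - y|^2 / (2 sigma^2))
    return np.exp(-(x - y) ** 2 / (2.0 * sigma ** 2))

# Hypothetical 1-D training data (illustrative values only)
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 3.0, 20)
y_train = np.sin(2.0 * x_train) + 0.05 * rng.standard_normal(20)

# By the representer theorem, regularized least squares over the RKHS
# reduces to solving for coefficients alpha in
#   f(x) = sum_i alpha_i K(x, x_i),   alpha = (K + lam * I)^{-1} y
K = gaussian_kernel(x_train[:, None], x_train[None, :])
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)

def f_hat(x):
    # The minimizer is a finite linear combination of kernel sections
    return gaussian_kernel(x, x_train) @ alpha

print(abs(f_hat(1.0) - np.sin(2.0)))  # small fit error near the data
```

The infinite-dimensional search over $H$ has been replaced by solving for 20 coefficients, which is the simplification the theorem promises.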

For ease of understanding, we provide the framework for real-valued Hilbert spaces. The theory can be easily extended to spaces of complex-valued functions and hence include the many important examples of reproducing kernel Hilbert spaces that are spaces of analytic functions.[5]

## Definition

Let $X$ be an arbitrary set and $H$ a Hilbert space of real-valued functions on $X$, equipped with pointwise addition and pointwise scalar multiplication. The evaluation functional over the Hilbert space of functions $H$ is a linear functional that evaluates each function at a point $x$:

$$L_x : f \mapsto f(x) \qquad \forall f \in H.$$

We say that $H$ is a reproducing kernel Hilbert space if, for all $x$ in $X$, $L_x$ is continuous at every $f$ in $H$ or, equivalently, if $L_x$ is a bounded operator on $H$, i.e.
there exists some $M_x > 0$ such that

$$|L_x(f)| := |f(x)| \leq M_x \, \|f\|_H \qquad \forall f \in H. \qquad (1)$$

Although $M_x < \infty$ is assumed for all $x \in X$, it may still be the case that $\sup_x M_x = \infty$.

By the Riesz representation theorem, the continuity of each $L_x$ implies that there exists a unique element $K_x \in H$ with the reproducing property,

$$f(x) = L_x(f) = \langle f, K_x \rangle_H \qquad \forall f \in H,$$

and the reproducing kernel of $H$ is the function $K : X \times X \to \mathbb{R}$ defined by $K(x,y) = \langle K_y, K_x \rangle_H$.

## Integral operators and Mercer's theorem

Suppose now that $X$ is compact and $K$ is a continuous, symmetric, positive-definite kernel, and consider the integral operator $T_K : L_2(X) \to L_2(X)$ defined by $[T_K f](\cdot) = \int_X K(\cdot, t)\, f(t)\,dt$. Then $T_K$ is a compact, positive, self-adjoint operator, so the spectral theorem provides an orthonormal system of eigenfunctions $\varphi_i$ with eigenvalues $\sigma_i \geq 0$ for all $i$. One can also show that $T_K$ maps continuously into the space of continuous functions $C(X)$ and therefore we may choose continuous functions as the eigenvectors, that is, $\varphi_i \in C(X)$ for all $i$. Then by Mercer's theorem $K$ may be written in terms of the eigenvalues and continuous eigenfunctions as

$$K(x,y) = \sum_{j=1}^{\infty} \sigma_j \, \varphi_j(x) \, \varphi_j(y)$$

for all $x, y \in X$ such that

$$\lim_{n \to \infty} \sup_{u,v} \left| K(u,v) - \sum_{j=1}^{n} \sigma_j \, \varphi_j(u) \, \varphi_j(v) \right| = 0.$$

This series representation is referred to as a Mercer kernel or Mercer representation of $K$.
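In a discrete setting the Mercer picture can be checked numerically: on a finite grid the integral operator becomes a symmetric Gram matrix, and its eigendecomposition plays the role of the Mercer series. A sketch, with an illustrative Laplacian-type kernel (the grid and kernel choice are arbitrary):

```python
import numpy as np

# Discrete analogue of Mercer's theorem: on a finite grid the integral
# operator T_K becomes a symmetric PSD Gram matrix whose eigendecomposition
# K = sum_j sigma_j phi_j phi_j^T mirrors the Mercer series.
x = np.linspace(0.0, 1.0, 50)
K = np.exp(-np.abs(x[:, None] - x[None, :]))   # illustrative Laplacian-type kernel

sigma, phi = np.linalg.eigh(K)                 # eigenvalues (ascending), orthonormal eigenvectors
assert sigma.min() > -1e-8                     # positive semi-definite up to round-off

def mercer_sum(m):
    # Truncated "Mercer sum" built from the m largest eigenpairs
    top = slice(len(x) - m, None)
    return (phi[:, top] * sigma[top]) @ phi[:, top].T

err = [np.linalg.norm(K - mercer_sum(m)) for m in (3, 10, 50)]
print(err)                                     # error shrinks as more terms are kept
assert err[0] > err[1] > err[2]
```

Keeping all 50 eigenpairs reconstructs the Gram matrix exactly (up to floating-point error), matching the uniform convergence statement above.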

Furthermore, it can be shown that the RKHS $H$ of $K$ is given by

$$H = \left\{ f \in L_2(X) \,\middle|\, \sum_{i=1}^{\infty} \frac{\langle f, \varphi_i \rangle_{L_2}^2}{\sigma_i} < \infty \right\},$$

with inner product $\langle f, g \rangle_H = \sum_{i=1}^{\infty} \frac{\langle f, \varphi_i \rangle_{L_2} \langle g, \varphi_i \rangle_{L_2}}{\sigma_i}$.

## Common examples

### Radial basis function kernels

Gaussian kernel:

$$K(x,y) = e^{-\frac{\|x-y\|^2}{2\sigma^2}}, \qquad \sigma > 0$$

Laplacian kernel:

$$K(x,y) = e^{-\frac{\|x-y\|}{\sigma}}, \qquad \sigma > 0$$

The squared norm of a function $f$ in the RKHS $H$ with this kernel is:[9]

$$\|f\|_H^2 = \int f(x)^2\,dx + \int f'(x)^2\,dx.$$

### Bergman kernels

We also provide examples of Bergman kernels. Let $X$ be finite and let $H$ consist of all complex-valued functions on $X$. Then an element of $H$ can be represented as an array of complex numbers. If the usual inner product is used, then $K_x$ is the function whose value is $1$ at $x$ and $0$ everywhere else, and $K(x,y)$ can be thought of as an identity matrix since

$$K(x,y) = \begin{cases} 1 & x = y \\ 0 & x \neq y \end{cases}$$

In this case, $H$ is isomorphic to $\mathbb{C}^n$.
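This finite-dimensional example can be written out in a few lines of Python as a sanity check (the array values are arbitrary illustrations): the kernel sections are standard basis vectors, the Gram matrix is the identity, and evaluation is literally an inner product.

```python
import numpy as np

# Finite X with the usual inner product: H is C^n, the kernel sections
# K_x are standard basis vectors, and the Gram matrix [K(x, y)] is the
# identity, so evaluation at a point is an inner product with K_x.
n = 5
f = np.array([2.0 + 1j, -0.5, 3.0, 0.0, 1.0 - 2j])   # an element of H = C^n
K = np.eye(n)                                         # K(x, y) = 1 if x == y else 0

x = 2                                                 # evaluate f at the point x
K_x = K[:, x]                                         # kernel section: 1 at x, 0 elsewhere
value = np.vdot(K_x, f)                               # <f, K_x> in the usual inner product
assert value == f[x]                                  # the reproducing property
print(value)
```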

The case of $X = \mathbb{D}$ (where $\mathbb{D}$ denotes the unit disc) is more sophisticated. Here the Bergman space $H^2(\mathbb{D})$ is the space of square-integrable holomorphic functions on $\mathbb{D}$. It can be shown that the reproducing kernel for $H^2(\mathbb{D})$ is

$$K(x,y) = \frac{1}{\pi} \frac{1}{(1 - x\overline{y})^2}.$$

Lastly, the space of band-limited functions in $L^2(\mathbb{R})$ with bandwidth $2a$ is an RKHS with reproducing kernel

$$K(x,y) = \frac{\sin a(x-y)}{\pi(x-y)}.$$

## Extension to vector-valued functions

In this section we extend the definition of the RKHS to spaces of vector-valued functions, as this extension is particularly important in multi-task learning and manifold regularization. The main difference is that the reproducing kernel $\Gamma$ is a symmetric function that is now a positive semi-definite matrix for every $x, y$ in $X$. More formally, we define a vector-valued RKHS (vvRKHS) as a Hilbert space of functions $f : X \to \mathbb{R}^T$ such that for all $c \in \mathbb{R}^T$ and $x \in X$

$$\Gamma_x c(y) = \Gamma(x,y)c \in H \quad \text{for } y \in X$$

and

$$\langle f, \Gamma_x c \rangle_H = f(x)^\intercal c.$$

This second property parallels the reproducing property for the scalar-valued case. This definition can also be connected to integral operators, bounded evaluation functionals, and feature maps, as we saw for the scalar-valued RKHS. We can equivalently define the vvRKHS as a vector-valued Hilbert space with a bounded evaluation functional and show that this implies the existence of a unique reproducing kernel by the Riesz representation theorem. Mercer's theorem can also be extended to address the vector-valued setting and we can therefore obtain a feature-map view of the vvRKHS.
Lastly, it can also be shown that the closure of the span of $\{\Gamma_x c : x \in X, c \in \mathbb{R}^T\}$ coincides with $H$, another property similar to the scalar-valued case.

We can gain intuition for the vvRKHS by taking a component-wise perspective on these spaces. In particular, we find that every vvRKHS is isometrically isomorphic to a scalar-valued RKHS on a particular input space. Let $\Lambda = \{1, \dots, T\}$. Consider the space $X \times \Lambda$ and the corresponding reproducing kernel

$$\gamma : X \times \Lambda \times X \times \Lambda \to \mathbb{R}. \qquad (4)$$

As noted above, the RKHS associated to this reproducing kernel is given by the closure of the span of $\{\gamma_{(x,t)} : x \in X, t \in \Lambda\}$, where $\gamma_{(x,t)}(y,s) = \gamma((x,t),(y,s))$ for every pair $(x,t), (y,s) \in X \times \Lambda$.

The connection to the scalar-valued RKHS can then be made by the fact that every matrix-valued kernel can be identified with a kernel of the form of (4) via

$$\Gamma(x,y)_{(t,s)} = \gamma((x,t),(y,s)).$$

Moreover, every kernel of the form of (4) defines a matrix-valued kernel with the above expression. Now letting the map $D : H_\Gamma \to H_\gamma$ be defined as

$$(Df)(x,t) = \langle f(x), e_t \rangle_{\mathbb{R}^T},$$

where $e_t$ is the $t$-th vector of the canonical basis of $\mathbb{R}^T$, one can show that $D$ is bijective and an isometry between $H_\Gamma$ and $H_\gamma$.
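The identification $\Gamma(x,y)_{(t,s)} = \gamma((x,t),(y,s))$ can be made concrete on a finite sample. The sketch below assumes an illustrative matrix-valued kernel (a sum of two separable terms, with made-up PSD output matrices $A$ and $B$) and builds the Gram matrix of the corresponding scalar kernel on the pairs $(x_i, t)$, which inherits symmetry and positive semi-definiteness:

```python
import numpy as np

# Illustrative matrix-valued kernel on X = R with T = 2 outputs:
# Gamma(x, y) = k1(x, y) * A + k2(x, y) * B, with A, B symmetric PSD.
T = 2
A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 3.0]])
k1 = lambda x, y: np.exp(-(x - y) ** 2)
k2 = lambda x, y: np.exp(-np.abs(x - y))

def gamma(x, t, y, s):
    # Scalar kernel on the product space X x {0, ..., T-1}:
    # gamma((x, t), (y, s)) = Gamma(x, y)[t, s]
    return k1(x, y) * A[t, s] + k2(x, y) * B[t, s]

x = np.linspace(0.0, 1.0, 5)
n = len(x)
# Gram matrix of the scalar kernel over the n * T index pairs (x_i, t)
G = np.array([[gamma(x[i], t, x[j], s)
               for j in range(n) for s in range(T)]
              for i in range(n) for t in range(T)])

assert np.allclose(G, G.T)                      # symmetry
assert np.linalg.eigvalsh(G).min() > -1e-8      # positive semi-definite
print("scalar kernel on X x Lambda inherits PSD from the matrix-valued kernel")
```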

While this view of the vvRKHS can be useful in multi-task learning, this isometry does not reduce the study of the vector-valued case to that of the scalar-valued case. In fact, this isometry procedure can make both the scalar-valued kernel and the input space too difficult to work with in practice, as properties of the original kernels are often lost.[10][11][12]

An important class of matrix-valued reproducing kernels are separable kernels, which can be factorized as the product of a scalar-valued kernel and a $T$-dimensional symmetric positive semi-definite matrix. In light of our previous discussion these kernels are of the form

$$\gamma((x,t),(y,s)) = K(x,y)\, K_T(t,s)$$

for all $x, y$ in $X$ and $t, s$ in $\Lambda$. As the scalar-valued kernel encodes dependencies between the inputs, we can observe that the matrix-valued kernel encodes dependencies among both the inputs and the outputs.
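For a separable kernel, the Gram matrix on a finite sample factors exactly as a Kronecker product of the input Gram matrix and the output matrix. A short sketch with an illustrative Laplacian input kernel and a made-up PSD output matrix $K_T$:

```python
import numpy as np

# Separable matrix-valued kernel: gamma((x, t), (y, s)) = K(x, y) * K_T(t, s).
# On a finite sample the full Gram matrix is the Kronecker product of the
# input Gram K and the output-dependency matrix K_T.
x = np.linspace(0.0, 2.0, 4)
K = np.exp(-np.abs(x[:, None] - x[None, :]))     # scalar-kernel Gram on inputs
K_T = np.array([[1.0, 0.5], [0.5, 2.0]])         # PSD output-dependency matrix

G = np.kron(K, K_T)                              # Gram on X x Lambda

# Entry check: G[i*T + t, j*T + s] == K[i, j] * K_T[t, s]
T = K_T.shape[0]
i, t, j, s = 2, 1, 0, 0
assert np.isclose(G[i * T + t, j * T + s], K[i, j] * K_T[t, s])
# The Kronecker product of PSD factors is PSD
assert np.linalg.eigvalsh(G).min() > -1e-8
print(G.shape)
```

The Kronecker structure is what makes separable kernels cheap to work with in multi-task learning: linear algebra on $G$ factors into operations on $K$ and $K_T$ separately.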

We lastly remark that the above theory can be further extended to spaces of functions with values in function spaces, but obtaining kernels for these spaces is a more difficult task.[13]

## Connection between RKHSs and the ReLU function

The ReLU function is commonly defined as $f(x) = \max\{0, x\}$ and is a mainstay in the architecture of neural networks, where it is used as an activation function. One can construct a ReLU-like nonlinear function using the theory of reproducing kernel Hilbert spaces. Below, we derive this construction and show how it implies the representation power of neural networks with ReLU activations.

We will work with the Hilbert space $\mathcal{H} = L_2^1(0)[0,\infty)$ of absolutely continuous functions with $f(0) = 0$ and square-integrable (i.e. $L_2$) derivative. It has the inner product

$$\langle f, g \rangle_{\mathcal{H}} = \int_0^{\infty} f'(x)\, g'(x)\,dx.$$

To construct the reproducing kernel it suffices to consider a dense subspace, so let $f \in C^1[0,\infty)$ and $f(0) = 0$. The fundamental theorem of calculus then gives

$$f(y) = \int_0^y f'(x)\,dx = \int_0^{\infty} G(x,y)\, f'(x)\,dx = \langle K_y, f \rangle,$$

where

$$G(x,y) = \begin{cases} 1, & x < y \\ 0, & \text{otherwise,} \end{cases}$$

and $K_y'(x) = G(x,y)$ with $K_y(0) = 0$, so that $K_y(x) = \int_0^x G(t,y)\,dt = \min(x,y)$. Hence the reproducing kernel of $\mathcal{H}$ is $K(x,y) = \min(x,y)$. Since $\min(x,y) = x - \max\{0, x - y\} = y - \max\{0, y - x\}$, each kernel section is, up to a linear term, a translated ReLU function, which is what connects this RKHS to the representation power of networks built from ReLU activations.
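The reproducing identity $\langle f, K_y \rangle_{\mathcal{H}} = f(y)$ can be checked numerically. The sketch below assumes the illustrative choice $f(x) = 1 - e^{-x}$ (which vanishes at $0$ and has square-integrable derivative) and approximates the inner-product integral with a trapezoidal rule:

```python
import numpy as np

# Check <f, K_y>_H = int_0^inf f'(x) K_y'(x) dx = f(y) for K_y(x) = min(x, y).
# K_y'(x) = 1 for x < y and 0 for x > y, so the integral reduces to [0, y].
# Illustrative test function: f(x) = 1 - exp(-x), so f(0) = 0, f'(x) = exp(-x).
y = 1.3
xs = np.linspace(0.0, y, 100_001)
fp = np.exp(-xs)                                  # f'(x) on the grid
dx = xs[1] - xs[0]
inner = dx * (fp.sum() - 0.5 * (fp[0] + fp[-1]))  # trapezoidal rule for int_0^y f'(x) dx
f_y = 1.0 - np.exp(-y)

print(inner, f_y)                                 # the two values agree
assert abs(inner - f_y) < 1e-8
```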
