# Kosambi–Karhunen–Loève theorem

Kosambi–Karhunen–Loève theorem (Redirected from Karhunen–Loève theorem) Jump to navigation Jump to search In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem[1][2] is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as Hotelling transform and eigenvector transform, and is closely related to principal component analysis (PCA) technique widely used in image processing and in data analysis in many fields.[3] Stochastic processes given by infinite series of this form were first considered by Damodar Dharmananda Kosambi.[4][5] There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L2([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.

In contrast to a Fourier series where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think that the Karhunen–Loève transform adapts to the process in order to produce the best possible basis for its expansion.

In the case of a centered stochastic process {Xt}t ∈ [a, b] (centered means E[Xt] = 0 for all t ∈ [a, b]) satisfying a technical continuity condition, X admits a decomposition {displaystyle X_{t}=sum _{k=1}^{infty }Z_{k}e_{k}(t)} where Zk are pairwise uncorrelated random variables and the functions ek are continuous real-valued functions on [a, b] that are pairwise orthogonal in L2([a, b]). It is therefore sometimes said that the expansion is bi-orthogonal since the random coefficients Zk are orthogonal in the probability space while the deterministic functions ek are orthogonal in the time domain. The general case of a process Xt that is not centered can be brought back to the case of a centered process by considering Xt − E[Xt] which is a centered process.

Moreover, if the process is Gaussian, then the random variables Zk are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0, 1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.

The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.

Contents 1 Formulation 2 Statement of the theorem 3 Proof 4 Properties of the Karhunen–Loève transform 4.1 Special case: Gaussian distribution 4.2 The Karhunen–Loève transform decorrelates the process 4.3 The Karhunen–Loève expansion minimizes the total mean square error 4.4 Explained variance 4.5 The Karhunen–Loève expansion has the minimum representation entropy property 5 Linear Karhunen–Loève approximations 6 Non-Linear approximation in bases 6.1 Non-optimality of Karhunen–Loève bases 7 Principal component analysis 7.1 Covariance matrix 7.2 Principal component transform 8 Examples 8.1 The Wiener process 8.2 The Brownian bridge 9 Applications 9.1 Applications in signal estimation and detection 9.1.1 Detection of a known continuous signal S(t) 9.1.2 Signal detection in white noise 9.1.3 Signal detection in colored noise 9.1.3.1 How to find k(t) 9.1.3.2 Test threshold for Neyman–Pearson detector 9.1.3.3 Prewhitening 9.1.4 Detection of a Gaussian random signal in Additive white Gaussian noise (AWGN) 10 See also 11 Notes 12 References 13 External links Formulation Throughout this article, we will consider a square-integrable zero-mean random process Xt defined over a probability space (Ω, F, P) and indexed over a closed interval [a, b], with covariance function KX(s, t). We thus have: {displaystyle forall tin [a,b]qquad X_{t}in L^{2}(Omega ,F,mathbf {P} ),} {displaystyle forall tin [a,b]qquad mathbf {E} [X_{t}]=0,} {displaystyle forall t,sin [a,b]qquad K_{X}(s,t)=mathbf {E} [X_{s}X_{t}].} We associate to KX a linear operator TKX defined in the following way: {displaystyle {begin{aligned}&T_{K_{X}}&:L^{2}([a,b])&to L^{2}([a,b])\&&:fmapsto T_{K_{X}}f&=int _{a}^{b}K_{X}(s,cdot )f(s),dsend{aligned}}} Since TKX is a linear operator, it makes sense to talk about its eigenvalues λk and eigenfunctions ek, which are found solving the homogeneous Fredholm integral equation of the second kind {displaystyle int _{a}^{b}K_{X}(s,t)e_{k}(s),ds=lambda _{k}e_{k}(t)} Statement of the theorem Theorem. Let Xt be a zero-mean square-integrable stochastic process defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function KX(s, t).

Then KX(s,t) is a Mercer kernel and letting ek be an orthonormal basis on L2([a, b]) formed by the eigenfunctions of TKX with respective eigenvalues λk, Xt admits the following representation {displaystyle X_{t}=sum _{k=1}^{infty }Z_{k}e_{k}(t)} where the convergence is in L2, uniform in t and {displaystyle Z_{k}=int _{a}^{b}X_{t}e_{k}(t),dt} Furthermore, the random variables Zk have zero-mean, are uncorrelated and have variance λk {displaystyle mathbf {E} [Z_{k}]=0,~forall kin mathbb {N} qquad {mbox{and}}qquad mathbf {E} [Z_{i}Z_{j}]=delta _{ij}lambda _{j},~forall i,jin mathbb {N} } Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.

Proof The covariance function KX satisfies the definition of a Mercer kernel. By Mercer's theorem, there consequently exists a set λk, ek(t) of eigenvalues and eigenfunctions of TKX forming an orthonormal basis of L2([a,b]), and KX can be expressed as {displaystyle K_{X}(s,t)=sum _{k=1}^{infty }lambda _{k}e_{k}(s)e_{k}(t)} The process Xt can be expanded in terms of the eigenfunctions ek as: {displaystyle X_{t}=sum _{k=1}^{infty }Z_{k}e_{k}(t)} where the coefficients (random variables) Zk are given by the projection of Xt on the respective eigenfunctions {displaystyle Z_{k}=int _{a}^{b}X_{t}e_{k}(t),dt} We may then derive {displaystyle {begin{aligned}mathbf {E} [Z_{k}]&=mathbf {E} left[int _{a}^{b}X_{t}e_{k}(t),dtright]=int _{a}^{b}mathbf {E} [X_{t}]e_{k}(t)dt=0\[8pt]mathbf {E} [Z_{i}Z_{j}]&=mathbf {E} left[int _{a}^{b}int _{a}^{b}X_{t}X_{s}e_{j}(t)e_{i}(s),dt,dsright]\&=int _{a}^{b}int _{a}^{b}mathbf {E} left[X_{t}X_{s}right]e_{j}(t)e_{i}(s),dt,ds\&=int _{a}^{b}int _{a}^{b}K_{X}(s,t)e_{j}(t)e_{i}(s),dt,ds\&=int _{a}^{b}e_{i}(s)left(int _{a}^{b}K_{X}(s,t)e_{j}(t),dtright),ds\&=lambda _{j}int _{a}^{b}e_{i}(s)e_{j}(s),ds\&=delta _{ij}lambda _{j}end{aligned}}} where we have used the fact that the ek are eigenfunctions of TKX and are orthonormal. Let us now show that the convergence is in L2. Let {displaystyle S_{N}=sum _{k=1}^{N}Z_{k}e_{k}(t).} Then: {displaystyle {begin{aligned}mathbf {E} left[left|X_{t}-S_{N}right|^{2}right]&=mathbf {E} left[X_{t}^{2}right]+mathbf {E} left[S_{N}^{2}right]-2mathbf {E} left[X_{t}S_{N}right]\&=K_{X}(t,t)+mathbf {E} left[sum _{k=1}^{N}sum _{l=1}^{N}Z_{k}Z_{ell }e_{k}(t)e_{ell }(t)right]-2mathbf {E} left[X_{t}sum _{k=1}^{N}Z_{k}e_{k}(t)right]\&=K_{X}(t,t)+sum _{k=1}^{N}lambda _{k}e_{k}(t)^{2}-2mathbf {E} left[sum _{k=1}^{N}int _{a}^{b}X_{t}X_{s}e_{k}(s)e_{k}(t),dsright]\&=K_{X}(t,t)-sum _{k=1}^{N}lambda _{k}e_{k}(t)^{2}end{aligned}}} which goes to 0 by Mercer's theorem. Properties of the Karhunen–Loève transform Special case: Gaussian distribution Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude: Theorem. The variables Zi have a joint Gaussian distribution and are stochastically independent if the original process {Xt}t is Gaussian.

In the Gaussian case, since the variables Zi are independent, we can say more: {displaystyle lim _{Nto infty }sum _{i=1}^{N}e_{i}(t)Z_{i}(omega )=X_{t}(omega )} almost surely.

The Karhunen–Loève transform decorrelates the process This is a consequence of the independence of the Zk.

The Karhunen–Loève expansion minimizes the total mean square error In the introduction, we mentioned that the truncated Karhunen–Loeve expansion was the best approximation of the original process in the sense that it reduces the total mean-square error resulting of its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.

More specifically, given any orthonormal basis {fk} of L2([a, b]), we may decompose the process Xt as: {displaystyle X_{t}(omega )=sum _{k=1}^{infty }A_{k}(omega )f_{k}(t)} where {displaystyle A_{k}(omega )=int _{a}^{b}X_{t}(omega )f_{k}(t),dt} and we may approximate Xt by the finite sum {displaystyle {hat {X}}_{t}(omega )=sum _{k=1}^{N}A_{k}(omega )f_{k}(t)} for some integer N.

Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).

Proof Consider the error resulting from the truncation at the N-th term in the following orthonormal expansion: {displaystyle varepsilon _{N}(t)=sum _{k=N+1}^{infty }A_{k}(omega )f_{k}(t)} The mean-square error εN2(t) can be written as: {displaystyle {begin{aligned}varepsilon _{N}^{2}(t)&=mathbf {E} left[sum _{i=N+1}^{infty }sum _{j=N+1}^{infty }A_{i}(omega )A_{j}(omega )f_{i}(t)f_{j}(t)right]\&=sum _{i=N+1}^{infty }sum _{j=N+1}^{infty }mathbf {E} left[int _{a}^{b}int _{a}^{b}X_{t}X_{s}f_{i}(t)f_{j}(s),ds,dtright]f_{i}(t)f_{j}(t)\&=sum _{i=N+1}^{infty }sum _{j=N+1}^{infty }f_{i}(t)f_{j}(t)int _{a}^{b}int _{a}^{b}K_{X}(s,t)f_{i}(t)f_{j}(s),ds,dtend{aligned}}} We then integrate this last equality over [a, b]. The orthonormality of the fk yields: {displaystyle int _{a}^{b}varepsilon _{N}^{2}(t),dt=sum _{k=N+1}^{infty }int _{a}^{b}int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s),ds,dt} The problem of minimizing the total mean-square error thus comes down to minimizing the right hand side of this equality subject to the constraint that the fk be normalized. We hence introduce βk, the Lagrangian multipliers associated with these constraints, and aim at minimizing the following function: {displaystyle Er[f_{k}(t),kin {N+1,ldots }]=sum _{k=N+1}^{infty }int _{a}^{b}int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s),ds,dt-beta _{k}left(int _{a}^{b}f_{k}(t)f_{k}(t),dt-1right)} Differentiating with respect to fi(t) (this is a functional derivative) and setting the derivative to 0 yields: {displaystyle {frac {partial Er}{partial f_{i}(t)}}=int _{a}^{b}left(int _{a}^{b}K_{X}(s,t)f_{i}(s),ds-beta _{i}f_{i}(t)right),dt=0} which is satisfied in particular when {displaystyle int _{a}^{b}K_{X}(s,t)f_{i}(s),ds=beta _{i}f_{i}(t).} In other words, when the fk are chosen to be the eigenfunctions of TKX, hence resulting in the KL expansion.

Explained variance An important observation is that since the random coefficients Zk of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of Xt is simply the sum of the variances of the individual components of the sum: {displaystyle operatorname {var} [X_{t}]=sum _{k=0}^{infty }e_{k}(t)^{2}operatorname {var} [Z_{k}]=sum _{k=1}^{infty }lambda _{k}e_{k}(t)^{2}} Integrating over [a, b] and using the orthonormality of the ek, we obtain that the total variance of the process is: {displaystyle int _{a}^{b}operatorname {var} [X_{t}],dt=sum _{k=1}^{infty }lambda _{k}} In particular, the total variance of the N-truncated approximation is {displaystyle sum _{k=1}^{N}lambda _{k}.} As a result, the N-truncated expansion explains {displaystyle {frac {sum _{k=1}^{N}lambda _{k}}{sum _{k=1}^{infty }lambda _{k}}}} of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an {displaystyle Nin mathbb {N} } such that {displaystyle {frac {sum _{k=1}^{N}lambda _{k}}{sum _{k=1}^{infty }lambda _{k}}}geq 0.95.} The Karhunen–Loève expansion has the minimum representation entropy property Given a representation of {displaystyle X_{t}=sum _{k=1}^{infty }W_{k}varphi _{k}(t)} , for some orthonormal basis {displaystyle varphi _{k}(t)} and random {displaystyle W_{k}} , we let {displaystyle p_{k}=mathbb {E} [|W_{k}|^{2}]/mathbb {E} [|X_{t}|_{L^{2}}^{2}]} , so that {displaystyle sum _{k=1}^{infty }p_{k}=1} . We may then define the representation entropy to be {displaystyle H({varphi _{k}})=-sum _{i}p_{k}log(p_{k})} . Then we have {displaystyle H({varphi _{k}})geq H({e_{k}})} , for all choices of {displaystyle varphi _{k}} . That is, the KL-expansion has minimal representation entropy.

Proof: Denote the coefficients obtained for the basis {displaystyle e_{k}(t)} as {displaystyle p_{k}} , and for {displaystyle varphi _{k}(t)} as {displaystyle q_{k}} .

Choose {displaystyle Ngeq 1} . Note that since {displaystyle e_{k}} minimizes the mean squared error, we have that {displaystyle mathbb {E} left|sum _{k=1}^{N}Z_{k}e_{k}(t)-X_{t}right|_{L^{2}}^{2}leq mathbb {E} left|sum _{k=1}^{N}W_{k}varphi _{k}(t)-X_{t}right|_{L^{2}}^{2}} Expanding the right hand size, we get: {displaystyle mathbb {E} left|sum _{k=1}^{N}W_{k}varphi _{k}(t)-X_{t}right|_{L^{2}}^{2}=mathbb {E} |X_{t}^{2}|_{L^{2}}+sum _{k=1}^{N}sum _{ell =1}^{N}mathbb {E} [W_{ell }varphi _{ell }(t)W_{k}^{*}varphi _{k}^{*}(t)]_{L^{2}}-sum _{k=1}^{N}mathbb {E} [W_{k}varphi _{k}X_{t}^{*}]_{L^{2}}-sum _{k=1}^{N}mathbb {E} [X_{t}W_{k}^{*}varphi _{k}^{*}(t)]_{L^{2}}} Using the orthonormality of {displaystyle varphi _{k}(t)} , and expanding {displaystyle X_{t}} in the {displaystyle varphi _{k}(t)} basis, we get that the right hand size is equal to: {displaystyle mathbb {E} [X_{t}]_{L^{2}}^{2}-sum _{k=1}^{N}mathbb {E} [|W_{k}|^{2}]} We may perform identical analysis for the {displaystyle e_{k}(t)} , and so rewrite the above inequality as: {displaystyle {displaystyle mathbb {E} [X_{t}]_{L^{2}}^{2}-sum _{k=1}^{N}mathbb {E} [|Z_{k}|^{2}]}leq {displaystyle mathbb {E} [X_{t}]_{L^{2}}^{2}-sum _{k=1}^{N}mathbb {E} [|W_{k}|^{2}]}} Subtracting the common first term, and dividing by {displaystyle mathbb {E} [|X_{t}|_{L^{2}}^{2}]} , we obtain that: {displaystyle sum _{k=1}^{N}p_{k}geq sum _{k=1}^{N}q_{k}} This implies that: {displaystyle -sum _{k=1}^{infty }p_{k}log(p_{k})leq -sum _{k=1}^{infty }q_{k}log(q_{k})} Linear Karhunen–Loève approximations Consider a whole class of signals we want to approximate over the first M vectors of a basis. These signals are modeled as realizations of a random vector Y[n] of size N. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are Karhunen–Loeve bases that diagonalize the covariance matrix of Y. The random vector Y can be decomposed in an orthogonal basis {displaystyle left{g_{m}right}_{0leq mleq N}} as follows: {displaystyle Y=sum _{m=0}^{N-1}leftlangle Y,g_{m}rightrangle g_{m},} where each {displaystyle leftlangle Y,g_{m}rightrangle =sum _{n=0}^{N-1}{Y[n]}g_{m}^{*}[n]} is a random variable. The approximation from the first M ≤ N vectors of the basis is {displaystyle Y_{M}=sum _{m=0}^{M-1}leftlangle Y,g_{m}rightrangle g_{m}} The energy conservation in an orthogonal basis implies {displaystyle varepsilon [M]=mathbf {E} left{left|Y-Y_{M}right|^{2}right}=sum _{m=M}^{N-1}mathbf {E} left{left|leftlangle Y,g_{m}rightrangle right|^{2}right}} This error is related to the covariance of Y defined by {displaystyle R[n,m]=mathbf {E} left{Y[n]Y^{*}[m]right}} For any vector x[n] we denote by K the covariance operator represented by this matrix, {displaystyle mathbf {E} left{left|langle Y,xrangle right|^{2}right}=langle Kx,xrangle =sum _{n=0}^{N-1}sum _{m=0}^{N-1}R[n,m]x[n]x^{*}[m]} The error ε[M] is therefore a sum of the last N − M coefficients of the covariance operator {displaystyle varepsilon [M]=sum _{m=M}^{N-1}{leftlangle Kg_{m},g_{m}rightrangle }} The covariance operator K is Hermitian and Positive and is thus diagonalized in an orthogonal basis called a Karhunen–Loève basis. The following theorem states that a Karhunen–Loève basis is optimal for linear approximations.

Theorem (Optimality of Karhunen–Loève basis). Let K be a covariance operator. For all M ≥ 1, the approximation error {displaystyle varepsilon [M]=sum _{m=M}^{N-1}leftlangle Kg_{m},g_{m}rightrangle } is minimum if and only if {displaystyle left{g_{m}right}_{0leq m

Non-optimality of Karhunen–Loève bases To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis. Processes whose realizations have a random translation are stationary. The Karhunen–Loève basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is random shift modulo N of a deterministic signal f[n] of zero mean {displaystyle sum _{n=0}^{N-1}f[n]=0} {displaystyle Y[n]=f[(n-p){bmod {N}}]} The random shift P is uniformly distributed on [0, N − 1]: {displaystyle Pr(P=p)={frac {1}{N}},qquad 0leq p

Si quieres conocer otros artículos parecidos a **Kosambi–Karhunen–Loève theorem** puedes visitar la categoría **Probability theorems**.

Deja una respuesta