# Cayley–Hamilton theorem

In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over a commutative ring (such as the real or complex numbers or the integers) satisfies its own characteristic equation.

Arthur Cayley, F.R.S. (1821–1895) is widely regarded as Britain's leading pure mathematician of the 19th century. In 1848 Cayley went to Dublin to attend lectures on quaternions by Hamilton, their discoverer; he later impressed Hamilton by being only the second person to publish work on them.[1] Cayley verified the theorem for matrices of dimension 3 and less, but published a proof only for the two-dimensional case.[2][3] As for general n × n matrices, Cayley stated: "..., I have not thought it necessary to undertake the labor of a formal proof of the theorem in the general case of a matrix of any degree".

William Rowan Hamilton (1805–1865), an Irish physicist, astronomer, and mathematician, was the first foreign member of the American National Academy of Sciences. While the two maintained opposing positions about how geometry should be studied, Hamilton always remained on the best terms with Cayley.[1] Hamilton proved that for a linear function of quaternions there exists a certain equation, depending on the linear function, that is satisfied by the linear function itself.[4][5][6]

If A is a given n × n matrix and $I_n$ is the n × n identity matrix, then the characteristic polynomial of A is defined as[7]

$$p_A(\lambda) = \det(\lambda I_n - A),$$

where det is the determinant operation and λ is a variable for a scalar element of the base ring. Since the entries of the matrix $\lambda I_n - A$ are (linear or constant) polynomials in λ, the determinant is also a degree-n monic polynomial in λ:

$$p_A(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0.$$

One can create an analogous polynomial $p_A(A)$ in the matrix A instead of the scalar variable λ, defined as

$$p_A(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I_n.$$

The Cayley–Hamilton theorem states that this polynomial expression is equal to the zero matrix, which is to say that $p_A(A) = \mathbf{0}$. The theorem allows $A^n$ to be expressed as a linear combination of the lower matrix powers of A. When the ring is a field, the Cayley–Hamilton theorem is equivalent to the statement that the minimal polynomial of a square matrix divides its characteristic polynomial.

The theorem was first proved in 1853[8] in terms of inverses of linear functions of quaternions, a non-commutative ring, by Hamilton.[4][5][6] This corresponds to the special case of certain 4 × 4 real or 2 × 2 complex matrices.
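The defining identity $p_A(A) = \mathbf{0}$ is easy to check numerically. The sketch below uses NumPy (`np.poly` returns the characteristic-polynomial coefficients of a square matrix, leading term first) and evaluates $p_A(A)$ by a Horner-style recursion on matrices; the sample matrix is arbitrary.

```python
import numpy as np

# Verify p_A(A) = 0 for a sample 3x3 matrix.
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)
n = A.shape[0]

# Coefficients [1, c_{n-1}, ..., c_0] of det(lambda*I_n - A).
coeffs = np.poly(A)

# Horner evaluation: p_A(A) = A^n + c_{n-1} A^{n-1} + ... + c_0 I_n.
p_of_A = np.zeros_like(A)
for c in coeffs:
    p_of_A = p_of_A @ A + c * np.eye(n)

print(np.allclose(p_of_A, 0))  # True, up to floating-point round-off
```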
The theorem holds for general quaternionic matrices.[9][nb 1] Cayley in 1858 stated it for 3 × 3 and smaller matrices, but only published a proof for the 2 × 2 case.[2] The general case was first proved by Ferdinand Frobenius in 1878.[10]

## Examples

### 1 × 1 matrices

For a 1 × 1 matrix A = (a), the characteristic polynomial is given by p(λ) = λ − a, and so p(A) = (a) − a(1) = 0 is trivial.

### 2 × 2 matrices

As a concrete example, let

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$

Its characteristic polynomial is given by

$$p(\lambda) = \det(\lambda I_2 - A) = \det\begin{pmatrix} \lambda - 1 & -2 \\ -3 & \lambda - 4 \end{pmatrix} = (\lambda - 1)(\lambda - 4) - (-2)(-3) = \lambda^2 - 5\lambda - 2.$$

The Cayley–Hamilton theorem claims that, if we define

$$p(X) = X^2 - 5X - 2I_2,$$

then

$$p(A) = A^2 - 5A - 2I_2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

We can verify by computation that indeed,

$$A^2 - 5A - 2I_2 = \begin{pmatrix} 7 & 10 \\ 15 & 22 \end{pmatrix} - \begin{pmatrix} 5 & 10 \\ 15 & 20 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

For a generic 2 × 2 matrix,

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

the characteristic polynomial is given by p(λ) = λ² − (a + d)λ + (ad − bc), so the Cayley–Hamilton theorem states that

$$p(A) = A^2 - (a + d)A + (ad - bc)I_2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

which is indeed always the case, as is evident by working out the entries of A².
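The 2 × 2 computation above can be reproduced in a few lines of NumPy (a sketch, not part of the original text):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# p(A) = A^2 - 5A - 2I, which the theorem says must vanish.
result = A @ A - 5 * A - 2 * np.eye(2, dtype=int)

print(result)  # the 2x2 zero matrix
```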

## Applications

### Determinant and inverse matrix

See also: Determinant § Relation to eigenvalues and trace, and Characteristic polynomial § Properties

As indicated above, the Cayley–Hamilton theorem amounts to the identity

$$p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + (-1)^n \det(A) I_n = 0.$$

For a general n × n invertible matrix A, i.e., one with nonzero determinant, A⁻¹ can thus be written as an (n − 1)-th order polynomial expression in A. The coefficients $c_i$ are given by the elementary symmetric polynomials of the eigenvalues of A. Using Newton's identities, the elementary symmetric polynomials can in turn be expressed in terms of power sum symmetric polynomials of the eigenvalues:

$$s_k = \sum_{i=1}^{n} \lambda_i^k = \operatorname{tr}(A^k),$$

where tr(Aᵏ) is the trace of the matrix Aᵏ. Thus, we can express the $c_i$ in terms of the traces of powers of A.

In general, the formula for the coefficients $c_i$ is given in terms of complete exponential Bell polynomials as[nb 2]

$$c_{n-k} = \frac{(-1)^k}{k!} B_k(s_1, -1!\,s_2, 2!\,s_3, \ldots, (-1)^{k-1}(k-1)!\,s_k).$$

In particular, the determinant of A equals (−1)ⁿc₀. Thus, the determinant can be written as the trace identity

$$\det(A) = \frac{1}{n!} B_n(s_1, -1!\,s_2, 2!\,s_3, \ldots, (-1)^{n-1}(n-1)!\,s_n).$$

Likewise, the characteristic polynomial can be written as

$$-(-1)^n \det(A) I_n = A(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n),$$

and, by multiplying both sides by A⁻¹ (note −(−1)ⁿ = (−1)ⁿ⁻¹), one is led to an expression for the inverse of A as a trace identity,

$$\begin{aligned} A^{-1} &= \frac{(-1)^{n-1}}{\det A}\left(A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n\right) \\[5pt] &= \frac{1}{\det A} \sum_{k=0}^{n-1} (-1)^{n+k-1} \frac{A^{n-k-1}}{k!} B_k(s_1, -1!\,s_2, 2!\,s_3, \ldots, (-1)^{k-1}(k-1)!\,s_k). \end{aligned}$$

Another method for obtaining these coefficients $c_k$ for a general n × n matrix, provided no root is zero, relies on the following alternative expression for the determinant:

$$p(\lambda) = \det(\lambda I_n - A) = \lambda^n \exp\left(\operatorname{tr}\left(\log(I_n - A/\lambda)\right)\right).$$

Hence, by virtue of the Mercator series,

$$p(\lambda) = \lambda^n \exp\left(-\operatorname{tr} \sum_{m=1}^{\infty} \frac{(A/\lambda)^m}{m}\right),$$

where the exponential only needs to be expanded to order λ⁻ⁿ, since p(λ) is of order n, the net negative powers of λ automatically vanishing by the C–H theorem. (Again, this requires a ring containing the rational numbers.)
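The polynomial expression for A⁻¹ derived above can be coded directly. The sketch below (the 3 × 3 example matrix is ours; `np.poly` supplies the coefficients $c_i$) evaluates $A^{n-1} + c_{n-1}A^{n-2} + \cdots + c_1 I_n$ by Horner's scheme and rescales by $(-1)^{n-1}/\det A$:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
n = A.shape[0]

coeffs = np.poly(A)              # [1, c_{n-1}, ..., c_0]
det_A = (-1) ** n * coeffs[-1]   # det(A) = (-1)^n c_0

# Horner evaluation of A^{n-1} + c_{n-1} A^{n-2} + ... + c_1 I_n.
M = np.zeros_like(A)
for c in coeffs[:-1]:
    M = M @ A + c * np.eye(n)

A_inv = (-1) ** (n - 1) / det_A * M
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```

Note that only one division occurs, by det A at the end, so the same scheme works over any field once the determinant is known to be nonzero.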
Differentiation of this expression with respect to λ allows one to express the coefficients of the characteristic polynomial for general n as determinants of m × m matrices,[nb 3]

$$c_{n-m} = \frac{(-1)^m}{m!} \begin{vmatrix} \operatorname{tr} A & m-1 & 0 & \cdots & \\ \operatorname{tr} A^2 & \operatorname{tr} A & m-2 & \cdots & \\ \vdots & \vdots & & & \vdots \\ \operatorname{tr} A^{m-1} & \operatorname{tr} A^{m-2} & \cdots & \cdots & 1 \\ \operatorname{tr} A^m & \operatorname{tr} A^{m-1} & \cdots & \cdots & \operatorname{tr} A \end{vmatrix}~.$$

#### Examples

For instance, the first few Bell polynomials are $B_0 = 1$, $B_1(x_1) = x_1$, $B_2(x_1, x_2) = x_1^2 + x_2$, and $B_3(x_1, x_2, x_3) = x_1^3 + 3x_1x_2 + x_3$.

Using these to specify the coefficients $c_i$ of the characteristic polynomial of a 2 × 2 matrix yields

$$\begin{aligned} c_2 &= B_0 = 1, \\[4pt] c_1 &= \frac{-1}{1!} B_1(s_1) = -s_1 = -\operatorname{tr}(A), \\[4pt] c_0 &= \frac{1}{2!} B_2(s_1, -1!\,s_2) = \frac{1}{2}(s_1^2 - s_2) = \frac{1}{2}\bigl((\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)\bigr). \end{aligned}$$

The coefficient c₀ gives the determinant of the 2 × 2 matrix and c₁ the negative of its trace, while its inverse is given by

$$A^{-1} = \frac{-1}{\det A}(A + c_1 I_2) = \frac{-2(A - \operatorname{tr}(A) I_2)}{(\operatorname{tr}(A))^2 - \operatorname{tr}(A^2)}.$$

It is apparent from the general formula for $c_{n-k}$, expressed in terms of Bell polynomials, that the expressions

$$-\operatorname{tr}(A) \quad \text{and} \quad \tfrac{1}{2}\bigl(\operatorname{tr}(A)^2 - \operatorname{tr}(A^2)\bigr)$$

always give the coefficients $c_{n-1}$ of λⁿ⁻¹ and $c_{n-2}$ of λⁿ⁻² in the characteristic polynomial of any n × n matrix, respectively. So, for a 3 × 3 matrix A, the statement of the Cayley–Hamilton theorem can also be written as

$$A^3 - (\operatorname{tr} A)A^2 + \tfrac{1}{2}\bigl((\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\bigr)A - \det(A)I_3 = O,$$

where the right-hand side designates a 3 × 3 matrix with all entries reduced to zero. Likewise, this determinant in the n = 3 case is now

$$\begin{aligned} \det(A) &= \frac{1}{3!} B_3(s_1, -1!\,s_2, 2!\,s_3) = \frac{1}{6}\bigl(s_1^3 + 3s_1(-s_2) + 2s_3\bigr) \\[5pt] &= \tfrac{1}{6}\bigl((\operatorname{tr} A)^3 - 3\operatorname{tr}(A^2)(\operatorname{tr} A) + 2\operatorname{tr}(A^3)\bigr). \end{aligned}$$

This expression gives the negative of the coefficient $c_{n-3}$ of λⁿ⁻³ in the general case, as seen below.
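For the 2 × 2 case, the inverse trace identity just derived is short enough to test directly. A sketch, using the same running example matrix as earlier:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
t1 = np.trace(A)        # tr(A)
t2 = np.trace(A @ A)    # tr(A^2)

# A^{-1} = -2 (A - tr(A) I_2) / ((tr A)^2 - tr(A^2))
A_inv = -2 * (A - t1 * np.eye(2)) / (t1 ** 2 - t2)

print(np.allclose(A_inv @ A, np.eye(2)))  # True
```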

Similarly, one can write for a 4 × 4 matrix A,

$$A^4 - (\operatorname{tr} A)A^3 + \tfrac{1}{2}\bigl((\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\bigr)A^2 - \tfrac{1}{6}\bigl((\operatorname{tr} A)^3 - 3\operatorname{tr}(A^2)(\operatorname{tr} A) + 2\operatorname{tr}(A^3)\bigr)A + \det(A)I_4 = O,$$

where, now, the determinant is $c_{n-4}$,

$$\tfrac{1}{24}\bigl((\operatorname{tr} A)^4 - 6\operatorname{tr}(A^2)(\operatorname{tr} A)^2 + 3(\operatorname{tr}(A^2))^2 + 8\operatorname{tr}(A^3)\operatorname{tr}(A) - 6\operatorname{tr}(A^4)\bigr),$$

and so on for larger matrices. The increasingly complex expressions for the coefficients $c_k$ are deducible from Newton's identities or the Faddeev–LeVerrier algorithm.
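The Faddeev–LeVerrier algorithm mentioned above produces all of the coefficients $c_{n-1}, \ldots, c_0$ with one matrix product per step. A minimal sketch (the function name is ours):

```python
import numpy as np

def faddeev_leverrier(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(lambda*I - A), via
    M_1 = I;  c_{n-k} = -tr(A M_k)/k;  M_{k+1} = A M_k + c_{n-k} I."""
    n = A.shape[0]
    M = np.eye(n)
    coeffs = [1.0]
    for k in range(1, n + 1):
        c = float(-np.trace(A @ M) / k)
        coeffs.append(c)
        M = A @ M + c * np.eye(n)
    return coeffs

# For the 2x2 running example this recovers lambda^2 - 5*lambda - 2:
print(faddeev_leverrier(np.array([[1.0, 2.0], [3.0, 4.0]])))  # [1.0, -5.0, -2.0]
```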

### n-th power of matrix

The Cayley–Hamilton theorem always provides a relationship between the powers of A (though not always the simplest one), which allows one to simplify expressions involving such powers and to evaluate them without having to compute the power $A^n$ or any higher powers of A.

As an example, for

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

the theorem gives

$$A^2 = 5A + 2I_2.$$

Then, to calculate A⁴, observe

$$A^3 = (5A + 2I_2)A = 5A^2 + 2A = 5(5A + 2I_2) + 2A = 27A + 10I_2,$$

$$A^4 = A^3 A = (27A + 10I_2)A = 27A^2 + 10A = 27(5A + 2I_2) + 10A = 145A + 54I_2.$$

Likewise,

$$A^{-1} = \frac{A - 5I_2}{2}~,$$

$$A^{-2} = A^{-1}A^{-1} = \frac{A^2 - 10A + 25I_2}{4} = \frac{(5A + 2I_2) - 10A + 25I_2}{4} = \frac{-5A + 27I_2}{4}~.$$

Notice that we have been able to write the matrix power as the sum of two terms. In fact, a matrix power of any order k can be written as a matrix polynomial of degree at most n − 1, where n is the size of the square matrix. This is an instance where the Cayley–Hamilton theorem can be used to express a matrix function, which we will discuss below systematically.
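The reduction shown above can be automated: every power of this A has the form αA + βI₂, and the pair (α, β) updates by one application of A² = 5A + 2I₂ per step. A sketch (the names alpha and beta are ours):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])

alpha, beta = 1.0, 0.0   # A^1 = 1*A + 0*I_2
for _ in range(3):       # raise the exponent from 1 up to 4
    # A^(m+1) = alpha*A^2 + beta*A = alpha*(5A + 2I_2) + beta*A
    alpha, beta = 5 * alpha + beta, 2 * alpha

print(alpha, beta)  # 145.0 54.0, i.e. A^4 = 145A + 54I_2
print(np.allclose(alpha * A + beta * np.eye(2), np.linalg.matrix_power(A, 4)))  # True
```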