Matrix inversion is a handy procedure for solving a linear system of equations. It is based on the notion of the identity matrix $I_n \in \mathbb{R}^{n \times n}$, whose entries are all zero except those along the main diagonal, which are all one.

For a linear system $Ax = b$ where $A \in \mathbb{R}^{m \times n}$ is a known matrix, $b \in \mathbb{R}^m$ is a known vector, and $x \in \mathbb{R}^n$ is a vector of unknown values, one can find a solution if a matrix $B \in \mathbb{R}^{n \times m}$ can be found such that $BA = I_n$, since for all $x \in \mathbb{R}^n$, $I_n x = x$. We refer to $B$ as the inverse of $A$ (commonly denoted as $A^{-1}$). Mathematically,

$$Ax = b \implies A^{-1}Ax = A^{-1}b \implies I_n x = A^{-1}b \implies x = A^{-1}b$$

Of course, such a solution can be found only if $A$ is invertible, that is, if it is possible to find $A^{-1}$. I am writing this post to summarize the possible scenarios one may face when inverting a matrix.
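
To make this concrete, here is a minimal sketch in NumPy; the $2 \times 2$ matrix and vector are arbitrary values chosen for illustration.

```python
import numpy as np

# An invertible 2x2 system; the values here are arbitrary illustrations.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

A_inv = np.linalg.inv(A)  # A^{-1} exists because A is invertible
x = A_inv @ b             # x = A^{-1} b

assert np.allclose(A @ x, b)  # A x reproduces b up to floating-point error
```

In practice, `np.linalg.solve(A, b)` is preferred over forming $A^{-1}$ explicitly, but the inverse makes the algebra above concrete.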

Before delving into the details, I intend to present a geometric view of linear systems of equations. This will help us understand why we could have exactly one solution, infinitely many, or none at all, and subsequently why some matrices are invertible and others are not.

Problem Setup

One can think of $b$ as a point in the $m$-dimensional space. Furthermore, consider $Ax$ as a linear combination of $A$'s column vectors, where the $i$th column $A_{:,i}$ is scaled by the $i$th entry of $x$, $x_i$. In other words, when we say that $b = Ax$, we mean that $b$ is reachable from the origin ($0 \in \mathbb{R}^m$) by moving $x_1$ units along the vector $A_{:,1}$, $x_2$ units along the vector $A_{:,2}$, …, and $x_n$ units along the vector $A_{:,n}$. For different $x \in \mathbb{R}^n$, we may reach different points in the $m$-dimensional space. The set of such points is referred to as the span of $A$'s columns (or the range of $A$, or the column space of $A$): the set of points reachable by a linear combination of $A$'s columns. Mathematically, $\text{span}(A\text{'s columns}) = \{b \in \mathbb{R}^m \mid b = Ax,\ x \in \mathbb{R}^n\}$.
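
A quick NumPy sketch of this column view (the matrix and vector values are arbitrary):

```python
import numpy as np

# Arbitrary example: A in R^{3x2}, so its two columns are vectors in R^3.
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.0]])
x = np.array([2.0, -1.0])

# b = Ax is reached by moving x_1 units along A[:, 0] and x_2 units along A[:, 1].
b_as_matmul = A @ x
b_as_columns = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(b_as_matmul, b_as_columns)
```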

Therefore, a solution exists for $Ax = b$ if $b \in \text{span}(A\text{'s columns})$. A necessary and sufficient condition for a solution to exist for all possible values of $b \in \mathbb{R}^m$ is that $A$'s columns contain at least one set of $m$ linearly independent vectors. Note that no set of $m$-dimensional vectors has more than $m$ linearly independent vectors. The following examples provide an illustration; a short NumPy sketch after the list reproduces them.

  1. $A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. In this example, $A$'s span is the line along the vector $[1, 0]^T$, and no matter how $x$ is set, we cannot reach $b$, because $b$ does not lie on that line. With regard to the remark made above, we notice that $m = 2$, but we do not have 2 linearly independent vectors among $A$'s columns.
  2. $A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. In this example, $A$ is the same as in (1.), but we moved $b$ to make it reachable. One may note that the effective dimension of the problem at hand is one (the second entry of all the columns is zero). In other words, we effectively have a $1 \times 2$ matrix with $m = 1$ and 2 linearly dependent vectors of length 1, which can reach $b$ through any solution satisfying $x_1 + 2x_2 = 1$, of which there are infinitely many. This example fulfills the necessary and sufficient condition mentioned above, but here we have more than $m$ columns. Thus, there are infinitely many solutions rather than exactly one.
  3. $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $b = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$. In this example, we made $A$ span $\mathbb{R}^2$ with exactly $m = 2$ linearly independent vectors, and so we can reach $b$ with exactly one particular $x$. This example fulfills the necessary and sufficient condition with exactly $n = m$ columns.
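
Here is the promised sketch, assuming NumPy; `has_solution` is a hypothetical helper name implementing the standard rank test (appending $b$ as a column does not raise the rank exactly when the system is solvable):

```python
import numpy as np

def has_solution(A, b):
    # Ax = b is solvable iff appending b as a column does not raise the rank.
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

# Example 1: the columns [1,0] and [2,0] only span the x-axis; b = [1,1] is unreachable.
A = np.array([[1.0, 2.0],
              [0.0, 0.0]])
print(has_solution(A, np.array([1.0, 1.0])))      # False: no solution

# Example 2: same A, but b = [1,0] lies on the x-axis; every x with
# x1 + 2*x2 = 1 is a solution, so there are infinitely many.
print(has_solution(A, np.array([1.0, 0.0])))      # True

# Example 3: A = I_2 spans R^2; exactly one solution, x = b.
print(np.linalg.solve(np.eye(2), np.array([1.0, 1.0])))  # [1. 1.]
```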

Based on the above, one can conclude the following:

  1. When $A$ is a square matrix ($m = n$) with $m$ linearly independent column vectors, there is exactly one solution for any $b \in \mathbb{R}^m$. Thus, $A$ is invertible.
  2. When $A$ is a square matrix ($m = n$) with linearly dependent column vectors, there exists some $b$ for which no solution exists. Thus, $A$ is non-invertible, and we say that $A$ is a singular matrix.

  3. When $A$ is a rectangular matrix ($m > n$), there exists some $b$ for which no solution exists. Thus, $A$ is non-invertible, and we say that the system is overdetermined.

  4. When $A$ is a rectangular matrix ($m < n$) with $m$ linearly independent column vectors, there exist infinitely many solutions for any $b \in \mathbb{R}^m$. Thus, $A$ is non-invertible, and we say that the system is underdetermined.

  5. When $A$ is a rectangular matrix ($m < n$) with fewer than $m$ linearly independent column vectors, it is possible that there exists some $b$ for which no solution exists. Thus, $A$ is non-invertible. We still say the system is underdetermined. (The sketch after this list checks these five cases numerically.)
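
A minimal sketch of these five cases, assuming NumPy; the `classify` helper is a hypothetical name introduced for illustration:

```python
import numpy as np

def classify(A):
    # Hypothetical helper: map A's shape and rank to the five cases above.
    m, n = A.shape
    r = np.linalg.matrix_rank(A)
    if m == n:
        return "invertible (case 1)" if r == m else "singular (case 2)"
    if m > n:
        return "overdetermined (case 3)"
    return ("underdetermined (case 4)" if r == m
            else "underdetermined, rank-deficient (case 5)")

print(classify(np.eye(3)))                           # invertible (case 1)
print(classify(np.array([[1.0, 2.0], [2.0, 4.0]])))  # singular (case 2)
print(classify(np.ones((3, 2))))                     # overdetermined (case 3)
print(classify(np.array([[1.0, 0.0, 1.0],
                         [0.0, 1.0, 1.0]])))         # underdetermined (case 4)
```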

There are several procedures to compute $A^{-1}$ for invertible matrices, such as Gaussian elimination (some others are listed in the figure at the bottom of this page).
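
As an illustration, here is a teaching-oriented sketch of Gauss-Jordan elimination (row-reducing the augmented matrix $[A \mid I_n]$ until the left half becomes $I_n$); in practice, one would call a library routine such as `np.linalg.inv` instead:

```python
import numpy as np

def gauss_jordan_inverse(A):
    # Row-reduce the augmented matrix [A | I] to [I | A^{-1}].
    # A teaching sketch; prefer np.linalg.inv or np.linalg.solve in practice.
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for i in range(n):
        p = i + np.argmax(np.abs(M[i:, i]))  # partial pivoting for stability
        if np.isclose(M[p, i], 0.0):
            raise ValueError("matrix is singular")
        M[[i, p]] = M[[p, i]]                # swap the pivot row into place
        M[i] /= M[i, i]                      # normalize the pivot to 1
        for j in range(n):
            if j != i:
                M[j] -= M[j, i] * M[i]       # zero out column i in row j
    return M[:, n:]

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A))
```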

Non-Invertible Matrices

So what do we do when $A$ is non-invertible and we are still interested in finding a solution for the system? Well, we can use a pseudo-inverse matrix, with some compromise on the solution's quality, as follows.

  • If there is no solution, that is, we cannot reach $b$ and the system is overdetermined, then we can choose the $x$ with the least squared error. Mathematically,

$$\min_x \|Ax - b\|_2^2$$

Differentiating w.r.t. $x$ and setting the derivative to zero,

$$0 = \nabla_x \big( (Ax - b)^T (Ax - b) \big) = 2A^T A x - 2A^T b$$

yields $x = (A^T A)^{-1} A^T b$. Therefore, the pseudo-inverse matrix is $(A^T A)^{-1} A^T$.
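
A minimal sketch of this least-squares pseudo-inverse, assuming NumPy (the overdetermined system below is an arbitrary example):

```python
import numpy as np

# An arbitrary overdetermined system: m = 3 equations, n = 2 unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Least-squares solution via the pseudo-inverse (A^T A)^{-1} A^T.
x = np.linalg.inv(A.T @ A) @ A.T @ b

# It matches NumPy's dedicated least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```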

  • If there are infinitely many solutions, that is, the system is underdetermined, then one can choose the solution with the minimum norm. Mathematically,

$$\min_x \|x\|_2^2 \quad \text{s.t.} \quad Ax = b$$

With the Lagrange multiplier trick, the above can be solved as follows.

Differentiating w.r.t. $x$ and setting the derivative to zero,

$$0 = \nabla_x \big( x^T x - \lambda^T (Ax - b) \big) = 2x - A^T \lambda$$

$A^T$ is not invertible, but $AA^T$ is.¹ Therefore, we can solve for $\lambda$ as follows.

$$\lambda = 2(AA^T)^{-1}Ax = 2(AA^T)^{-1}b$$

Thus, substituting back into $x = \frac{1}{2} A^T \lambda$, we get $x = A^T (AA^T)^{-1} b$.
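
And a sketch of this minimum-norm pseudo-inverse, reusing the underdetermined example from (2.) above:

```python
import numpy as np

# The underdetermined example from (2.): every x with x1 + 2*x2 = 1 is a solution.
A = np.array([[1.0, 2.0]])
b = np.array([1.0])

# Minimum-norm solution via the pseudo-inverse A^T (A A^T)^{-1}.
x = A.T @ np.linalg.inv(A @ A.T) @ b
print(x)                       # [0.2 0.4]: the smallest-norm point on the line
assert np.allclose(A @ x, b)   # still an exact solution
```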

Interestingly, both of the above pseudo-inverses are equivalent to the Moore-Penrose pseudo-inverse, which can be computed from the singular value decomposition (SVD).
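
A quick numerical check of this equivalence, assuming NumPy (whose `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse via the SVD):

```python
import numpy as np

A_over = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])   # m > n, full column rank
A_under = np.array([[1.0, 2.0]])  # m < n, full row rank

# For full column rank, pinv(A) == (A^T A)^{-1} A^T; for full row rank,
# pinv(A) == A^T (A A^T)^{-1}.
assert np.allclose(np.linalg.pinv(A_over),
                   np.linalg.inv(A_over.T @ A_over) @ A_over.T)
assert np.allclose(np.linalg.pinv(A_under),
                   A_under.T @ np.linalg.inv(A_under @ A_under.T))
```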

Ill-Conditioned Matrices

Let's motivate this issue with an example. Assume you are reading a measurement, which you have to scale by dividing by a very small number. If your reading deviates a little, the absolute error will be extremely large. Such large oscillations might be undesirable, and we'd like to dampen them. In the matrix world, dividing by a small number corresponds to multiplying by the inverse of an ill-conditioned matrix. An ill-conditioned matrix is still invertible, but it is numerically unstable. In addition to the measurement error (here the measurement is captured by $b$), inverting an ill-conditioned matrix also suffers from loss of precision in floating-point arithmetic.

To cope with this instability, we compromise on the quality of the solution and use regularization techniques to obtain a stable numerical outcome. The bulk of these techniques aim to increase (or replace) the value of the smallest eigenvalue relative to the greatest eigenvalue. Common techniques, sketched in code after the list, are:

  1. Truncated SVD
  2. Tikhonov Regularization (which can also be used for non-invertible matrices).
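
Here is the promised sketch of both techniques, assuming NumPy; the matrix, the truncation threshold, and the regularization weight `alpha` are arbitrary choices for illustration:

```python
import numpy as np

# An ill-conditioned matrix: its two columns are almost parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
b = np.array([2.0, 2.0])
print(np.linalg.cond(A))  # a huge condition number

# Tikhonov regularization: solve (A^T A + alpha * I) x = A^T b,
# where alpha is an arbitrary tuning knob chosen here for illustration.
alpha = 1e-6
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ b)

# Truncated SVD: drop singular values below a tolerance before "inverting".
U, s, Vt = np.linalg.svd(A)
keep = s > 1e-8 * s.max()
x_tsvd = Vt.T[:, keep] @ ((U.T[keep] @ b) / s[keep])

# Both return roughly [1, 1]: the unstable direction is damped, trading
# the exact (but fragile) solution [2, 0] for a numerically stable one.
print(x_tik, x_tsvd)
```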

It is important to consider the regularization's effect on the final solution and whether it makes sense for our system or not.

Matrix Inversion Summary

Below is a summary of matrix inversion.



[Figure: Summary of Matrix Inverse Techniques.]

  1. $AA^T$ is a square matrix. What is left to show is that the columns of $AA^T$ are linearly independent, and thus $AA^T$ is invertible. We know that the columns of $A^T$ (the rows of $A$) are linearly independent since $A$ has rank $m$, which means $A^T$'s null space is $\{0\}$. For contradiction, assume that $v$ is a non-zero vector in $AA^T$'s null space, i.e., $AA^Tv = 0$. Multiplying both sides by $v^T$ yields $(A^Tv)^T(A^Tv) = 0$, i.e., $A^Tv = 0$. This contradicts the fact that $A^T$'s null space contains only the zero vector. Therefore, no such non-zero $v$ exists: $AA^T$'s null space has no non-zero vector, and all of its columns are linearly independent.