Linear Algebra


Author: Samuel Peterson


Date Published
2021-12-05 (ISO 8601)
76-12-05 (PB)


From time to time I enjoy doing a methodical review of mathematical topics which I was introduced to during my studies in university. Sometimes these forays involve so much time and effort that I feel it necessary to write about them when I'm finished. Indeed, that's just what I did following a study of differential geometry three years ago. Now I'm writing about the book "Linear Algebra", 2nd edition, by Kenneth Hoffman and Ray Kunze.

This is a classic book on the subject. For example, it was listed as the primary reference for the linear algebra section of the qualifying exams at OSU when I was a student there, and it holds the same place for the same exam at the University of Maryland right now (which is just a mile from where I live).

It is easy to see why the book enjoys such success: clear explanations, good structure, and fun examples and problems. But why was I reading this now, you might ask? Apart from it being a while since I formally covered the material, I was also curious about the nooks and crannies of the textbook that are often glossed over during a course. I only took an introduction to this material, as my main topics of study were probability and analysis. There was therefore much to cover with which I was only passingly familiar: cyclic decompositions, sesquilinear forms, etc. I also wanted to see what the book would say about things I'd covered in the book "Tensor Analysis on Manifolds", such as exterior products and determinants. Before I go into such matters, I suppose a definition of linear algebra would be appreciated by the layperson.

I could just be a jerk and say it's the algebra of linear spaces... duh. It is in fact worth taking a moment to understand what algebra is before talking about linear algebra.

What is Algebra?

Algebra is the study of objects and operations on these objects. Invariably when one is considering an algebraic system, one has a set (or sets) of items and another set (or sets) of operations between them. For example, a person's first introduction to algebra is typically one where the set of items is the natural numbers, or the integers, or some other subset of the real numbers, and the operators (multivariate functions taking these same elements as arguments and returning another of these items) being addition and multiplication. Another example of an algebraic system would be the set of functions $$ f: \mathbb{R} \longrightarrow \mathbb{R} $$ with the operations being addition and composition.

The above description of algebra is likely not how people are introduced to the subject. For example, I asked my wife what she thought was meant by algebra (and she is a mathematically literate person) and she said something to the effect of: "it's where you solve for x". But consider the fact that solving for x in, say, the expression: $$ \sqrt{x^2 - 5x + 8} = 42, \text{ } x \in \mathbb{R} $$ involves an understanding of the real number line and the mathematical operations of multiplication, addition, and exponentiation.
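(As an aside, and purely as a sketch of my own: a computer algebra system will happily do that symbol-pushing. The snippet below uses sympy, which has nothing to do with the book; it is just here to make the example concrete.)

```python
# A sketch (my choice of tool, not the book's): solving for x in
# sqrt(x^2 - 5x + 8) = 42 over the reals with sympy.
from sympy import symbols, sqrt, solve

x = symbols("x", real=True)
print(solve(sqrt(x**2 - 5*x + 8) - 42, x))
# Two real roots: 5/2 - sqrt(7049)/2 and 5/2 + sqrt(7049)/2
```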

The reader might correctly observe that the definition of algebra as a set of elements and operations is very broad; indeed, it is one of those fundamental topics which show up in various manifestations in a bunch of other mathematical settings. The particular setting of interest in this textbook is the algebra of linear spaces and linear operators.

What is Linear Algebra?

I have two answers to that question, as there is something of an overloading of the term linear algebra. As a topic, we can think of linear algebra as the algebra of linear spaces, or vector spaces, and of the linear functions defined on these spaces. A suitable formal definition of a vector space can be found here. Informally, the key idea of vector spaces is the existence of two sets of entities, vectors and scalars, for which the following hold:

  • The scalars form a field: Think rationals, reals, complex numbers -- closed under addition and multiplication, with additive and multiplicative identities and inverses.
  • An operation of addition is defined over vectors.
  • Multiplication of a scalar and a vector is defined, and this multiplication distributes over the vector addition: \(c(\vec{x} + \vec{y}) = c\vec{x} + c\vec{y} \).

The notion of linearity of a function is an algebraic one:
For \(X\) and \(Y\) vector spaces over the same scalar field \(\mathbb{K}\), a linear function $$ f: X \longrightarrow Y $$ obeys the following rules: $$ f(x + y) = f(x) + f(y) $$ $$ cf(x) = f(cx) $$
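(A quick numerical spot-check of those two rules, as a sketch of my own using numpy; the matrix and vectors are arbitrary examples.)

```python
# Sketch: numerically spot-checking the two linearity rules for the map
# f(v) = A @ v on R^3. Any matrix gives a linear map, so both checks pass
# up to floating point error.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))        # a linear map f: R^3 -> R^3
x, y = rng.standard_normal(3), rng.standard_normal(3)
c = 2.5                                 # a scalar

f = lambda v: A @ v
print(np.allclose(f(x + y), f(x) + f(y)))   # f(x + y) = f(x) + f(y)
print(np.allclose(c * f(x), f(c * x)))      # c f(x) = f(c x)
```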

The term Linear Algebra is also used to describe a structure. A Linear Algebra is a vector space where, in addition to the operations of vector addition and scalar multiplication, there is a multiplication of vectors with the following properties: for \(\mathbb{X}\) a linear algebra:

  • \(\mathbb{X}\) is closed under vector multiplication: for \(x,y \in \mathbb{X}\), \(xy \in \mathbb{X}\).
  • Vector multiplication is associative: for \(x,y,z \in \mathbb{X}\), \((xy)z = x(yz)\).
  • Vector multiplication distributes over addition: for \(x,y,z \in \mathbb{X}\), \(x(y + z) = xy + xz\), \((x + y)z = xz + yz\).
  • Scalar multiplication distributes over vector multiplication: for \(c\) a scalar, \(x,y \in \mathbb{X}\), \(c(xy) = (cx)y = x(cy)\).

An example of a Linear Algebra is the space of linear operators -- linear functions mapping a vector space of dimension n to itself -- in which case the vector multiplication is composition. Perhaps the most common form in which these entities are introduced is with square matrices; when square matrices are accompanied by matrix multiplication, serving both as composition between operators and as evaluation on vectors (in the form of column matrices), this is indeed equivalent to the Linear Algebra of linear operators.
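(Here is a minimal sketch of my own, using numpy, of that equivalence: applying two operators in sequence to a column vector agrees with applying the single operator given by their matrix product.)

```python
# Sketch: composition of linear operators corresponds to matrix multiplication.
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((3, 3))   # matrix of operator S
T = rng.standard_normal((3, 3))   # matrix of operator T
v = rng.standard_normal(3)        # a vector (as a column matrix)

# Apply T, then S ...             ... versus applying the product matrix S T.
print(np.allclose(S @ (T @ v), (S @ T) @ v))   # True: the same operator
```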

What a Linear Algebra is, then, is a structure you can form polynomials in. By that I mean objects like the following: \(a_0 I + a_1 X + \dots + a_n X^n\), where \(a_0, \dots, a_n\) are scalars, \(X\) is a linear operator, and \(I\) is the identity operator. If you are only familiar with polynomials over, say, the real numbers, i.e. where the variable is a real number, it might be of interest to realize that this is just a special case of polynomials over linear operators on a vector space of dimension 1.
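(To make that concrete, a small sketch of my own with numpy: the polynomial \(3I + 2X + X^2\) evaluated at an arbitrary matrix \(X\) is itself another operator in the same algebra.)

```python
# Sketch: the polynomial 3 I + 2 X + X^2 evaluated at a matrix X.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3))
I = np.eye(3)

p_of_X = 3 * I + 2 * X + X @ X   # itself a linear operator on R^3
print(p_of_X.shape)              # (3, 3): a new element of the same algebra
```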

Core takeaways from Linear Algebra

Part of the impetus for writing this post was to put down in writing the big takeaways from this book. I have here a selection (probably not exhaustive) of such things.

1. \(A^{-1}\) exists if and only if \(det(A) \ne 0\).

i.e. You can't divide by 0

Every treatment of linear algebra ought to at least cover this one. I plan on talking more about determinants later, but it will be enough for now to state that the determinant is a particular scalar-valued function on the space of operators. To begin to see how elementary this is, let us consider the simplest of linear algebraic settings: where the dimension of our space is 1. In this case the vectors and operators are indistinguishable from the scalars which the vector space is over. Consider, then, the equation: $$ Ax = b $$ where A is an operator on \(\mathbb{C}^1\), and \(x\) and \(b\) are vectors in \(\mathbb{C}^1\). Now in the case where the dimension of the space is 1, an operator is just a scalar, and its operation on a vector (again just a scalar in the 1 dimensional case) is just scalar multiplication. In this case, the determinant of the operator \(A\) is just the scalar value: \(A\), and the inverse (provided it exists: i.e. provided the determinant of A is not 0): \(A^{-1}\) is just the reciprocal: \(\frac{1}{A}\), and the equation is solved by simply dividing by the scalar, A: $$ A^{-1}Ax = A^{-1}b $$ $$ \frac{A}{A} x = \frac{b}{A} $$ $$ x = \frac{b}{A} $$ Our statement about determinants and their relation to inverses is then equivalent to the following idea with which everyone ought to be familiar: you cannot divide by 0.

This fact shows up in all the linear algebra introductions that I'm aware of because they all start by trying to answer the multidimensional case of the question: "how does one solve for x?" Typically this begins with problems much like those grappled with 300 years ago: how to solve systems of multiple linear equations simultaneously. For example, finding \(x_1\text{, } x_2 \in \mathbb{R}\) such that the following equations hold simultaneously: $$ \alpha x_1 + \beta x_2 = b_1 $$ $$ \gamma x_1 + \delta x_2 = b_2 $$ Now this system of equations may or may not have a solution depending on the coefficients in the problem. The typical linear algebra course will at the very least develop the notion of matrices, and matrix multiplication, to abstract the problem so that instead of a system of equations, we consider it as a single equation: $$ A\vec x = \vec b $$ Where A is now a matrix, and x and b are also matrices (typically with 1 column, called column-vectors). In this language now, the system given as an example above becomes: $$ \begin{pmatrix} \alpha & \beta \\ \gamma & \delta\end{pmatrix} \begin{pmatrix} x_1 \\ x_2\end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$$

When put into such a light, the question "when can I solve for \(\vec x\) in the equation \(A\vec x = \vec b\)?" becomes "when can I 'divide by', or invert, A?" Just as the answer is: whenever \(A \ne 0\) in the case of one equation with one unknown, the analogous answer for the case of multiple equations is: whenever \(det(A) \ne 0\).
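(A sketch of my own with numpy, using made-up values for \(\alpha, \beta, \gamma, \delta\): the determinant check is exactly the "can I divide by A?" question.)

```python
# Sketch: solving the 2x2 system above numerically, guarded by the
# determinant check ("can I divide by A?"). Coefficients are example values.
import numpy as np

A = np.array([[2.0, 1.0],    # alpha, beta
              [1.0, 3.0]])   # gamma, delta
b = np.array([5.0, 10.0])    # b_1, b_2

if not np.isclose(np.linalg.det(A), 0.0):
    x = np.linalg.solve(A, b)        # x = A^{-1} b
    print(x, np.allclose(A @ x, b))  # the solution and a sanity check
else:
    print("det(A) = 0: A is not invertible, no unique solution")
```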

2. Polynomials and the decomposition of linear operators

(Note: in this section, \(T\) will be a linear operator on a vector space of dimension \(n\).) Scientific applications of linear algebra often make use of eigenvalue decompositions of operators (or characteristic value decompositions, in this book's terminology). However, in most science courses I was exposed to, the existence of such decompositions was rarely in doubt, as the typical operators discussed were Hermitian. This is not always the case. However, even when an operator is not diagonalizable, there are some nice ways to decompose it into simpler pieces.

When I stated what a Linear Algebra is, I summed it up by saying a Linear Algebra is a structure you can form polynomials in. I emphasized that because polynomials are incredibly important. In the context of this section, they are instrumental in understanding the decomposition of linear operators. The major decompositions covered in the book are the "Primary Decomposition Theorem" and the "Cyclic Decomposition Theorem". But before giving a summary of these two theorems, some terminology regarding polynomials of linear operators needs to be brought up.

2.1 Polynomials

First, the polynomials we use for decompositions have linear operators as the variable argument. Multiplication of two operators is just composition, and when we are considering a specific basis, this composition is equivalent to matrix multiplication between the matrix representations of the two operators. Scalar multiplication works as expected. So in short, if \(f\) is a polynomial of degree \(p\) on the space of operators \(L(V)\), then $$ f: L(V) \longrightarrow L(V) $$ $$ f(T) = a_0I + a_1T + \dots + a_pT^p $$ where \(\{a_0, \dots, a_p\}\) are scalars from the field over which the underlying vector space \(V\) is defined, and \(I\) is the identity operator (the operator equivalent of 1, the multiplicative identity).

Polynomials have prime factors (irreducible multiplicative components) just like integers. So for a given polynomial \(p\) if $$ p = p_1^{d_1}p_2^{d_2} \dots p_r^{d_r} $$ Where \(p_i\) are irreducible polynomials, then we say that the collection \(\{p_1, \dots, p_r\}\) are the prime factors of \(p\) and \(d_i\) is the multiplicity of prime factor \(i\).
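(For an ordinary polynomial over the rationals, sympy will exhibit the prime factors and their multiplicities; this is a sketch of my own, not something from the book.)

```python
# Sketch: prime (irreducible) factors and multiplicities of a polynomial
# over the rationals, using sympy.
from sympy import symbols, factor_list

x = symbols("x")
p = (x - 1)**2 * (x**2 + 1)     # p = p_1^2 * p_2^1
print(factor_list(p))
# (1, [(x - 1, 2), (x**2 + 1, 1)]): the factors p_i with multiplicities d_i
```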

\(T\)-annihilator of a subspace \(W\): this is the monic polynomial \(q\) of least degree such that \(q(T)\gamma = 0\) for all \(\gamma \in W\).

The characteristic polynomial: For a given operator \(T\), its characteristic polynomial, \(f\), can be obtained formally by computing \(det(xI - T)\) and then considering the variable x as ranging over the space of linear operators.

Finally, the minimal polynomial: For a given operator \(T\), its minimal polynomial, \(p\), is the lowest-degree monic (leading coefficient equal to 1) polynomial such that \(p(T) = 0\). The minimal polynomial and the characteristic polynomial are related in a very close way. That \(p\) divides \(f\) (equivalently, that \(f(T) = 0\) as well) is the Cayley-Hamilton theorem; moreover, the two polynomials have the same prime factors (with potentially different multiplicities).
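(As a hedged sanity check of my own, using sympy: compute the characteristic polynomial of a small matrix and verify that substituting the matrix back in gives the zero operator.)

```python
# Sketch: verifying f(T) = 0 (Cayley-Hamilton) for a concrete matrix,
# by computing det(xI - T) symbolically and substituting T back in.
from sympy import Matrix, symbols, zeros

x = symbols("x")
T = Matrix([[2, 0, 0],
            [0, 2, 0],
            [0, 0, 3]])

f = T.charpoly(x)          # characteristic polynomial det(xI - T)
coeffs = f.all_coeffs()    # highest degree first
n = len(coeffs) - 1

f_of_T = zeros(3, 3)
for k, a in enumerate(coeffs):
    f_of_T += a * T**(n - k)       # evaluate f at the operator T

print(f_of_T == zeros(3, 3))       # True
# Here f = (x - 2)^2 (x - 3), while the minimal polynomial is (x - 2)(x - 3):
# the same prime factors, with a lower multiplicity for (x - 2).
```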

2.2 Decompositions

Now we are ready to summarize the main decompositions of linear operators. Note that these hold for arbitrary operators on a finite dimensional vector space.

Primary decomposition: \(T\) can be decomposed into a direct sum of operators on independent subspaces. In this case, the subspaces are the null spaces of \(p_i(T)^{d_i}\), where the \(p_i\) are the prime factors of the minimal polynomial of \(T\) and the \(d_i\) their multiplicities. The projections onto these subspaces are themselves polynomials in \(T\).

Cyclic decomposition: \(T\) can be separated into a direct sum of its restrictions to \(T\)-invariant cyclic subspaces \(Z_1, \dots, Z_r\), whose \(T\)-annihilators \(q_1, \dots, q_r\) satisfy: \(q_i\) divides \(q_{i-1}\) for \(i \gt 1\), with \(q_1\) being the minimal polynomial of \(T\); each \(q_i\) is built from prime factors of the minimal polynomial.
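(As an aside of my own, and not the book's cyclic/rational form: sympy's Jordan form over \(\mathbb{C}\) gives a concrete picture of a non-diagonalizable operator still splitting into simple blocks on invariant subspaces.)

```python
# Sketch (an aside, not the book's cyclic/rational form): the Jordan form of
# a non-diagonalizable matrix, as a picture of an operator splitting into
# simple blocks on invariant subspaces.
from sympy import Matrix

T = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])

P, J = T.jordan_form()        # T = P J P^{-1}
print(J)                      # Jordan blocks for the eigenvalues 2 and 3
print(P * J * P.inv() == T)   # True
```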

2.3 Polynomials again

Before I close on this topic, I will note that the proofs of these theorems show that these results are in large part due to the way in which polynomials factor. They feel like corollaries to some very profound abstractions which the book only hints at. This is one of my favorite parts of mathematics as a field of study: when you see that useful statements are the result of something far more fundamental. The desire to know more about polynomials is one of the core motivators I have to pursue the topic of abstract algebra in the future.

3. The Grassman product, the Grassman Ring, and alternating linear functionals

The Grassman product, otherwise known as the exterior product or wedge product, was also introduced in the book "Tensor Analysis on Manifolds", about which I have written before. This book, "Linear Algebra", was the first place I've read a description of the "Grassman ring".

First, a brief summary of the exterior product, if you don't want to read the Wikipedia link above. Let's start with tensors, or multilinear scalar-valued functions. For a vector space \(V\) of dimension \(n \lt \infty\), an alternating linear functional of rank \(k\) is a functional such that permuting any two arguments changes the sign of the value. We'll denote the space of alternating functionals of rank \(k\) by $$\bigwedge^kV^*$$.

Our description above can then be stated as: $$\text{if } w \in \bigwedge^kV^* \text{, then }$$ $$w(v_1, \dots, v_s, \dots, v_p, \dots, v_k) = -w(v_1, \dots, v_p, \dots, v_s, \dots, v_k).$$

The reason for denoting the space of alternating linear functionals with \(\bigwedge^k\) is that a basis for \(\bigwedge^k\) can be constructed by means of an exterior product: \(\wedge\), which is also known as the Grassman product:



If \(M\) is an alternating linear functional of rank r and \(N\) is an alternating linear functional of rank s, then $$ \wedge: \bigwedge^r \times \bigwedge^s \longrightarrow \bigwedge^{r+s} $$ $$ M \wedge N = \frac{1}{r!s!} \sum_{\sigma} sgn(\sigma)\, (M \otimes N) \circ \sigma $$ where the sum is over all permutations \(\sigma\) of \(r+s\) items, and $$ sgn(\sigma) = (-1)^{d_{\sigma}} $$ where \(d_{\sigma}\) is the number of transpositions (switches of two elements) needed to transform \(\{1, 2, 3, \dots, r+s\}\) into \(\sigma\).
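(A small Python sketch of my own of the formula above, representing a rank-k alternating functional as a plain function of k vectors; the \(\frac{1}{r!s!}\) normalization follows the formula as written here, and conventions vary between books.)

```python
# Sketch of the Grassman (wedge) product formula above. A rank-k alternating
# functional is represented as a Python function of k vectors; the 1/(r! s!)
# normalization matches the formula in the text.
import itertools
import math

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def wedge(M, r, N, s):
    """Return M ^ N, where M has rank r and N has rank s."""
    def MN(*vectors):  # expects r + s vectors
        total = 0.0
        for sigma in itertools.permutations(range(r + s)):
            permuted = [vectors[i] for i in sigma]
            total += sign(sigma) * M(*permuted[:r]) * N(*permuted[r:])
        return total / (math.factorial(r) * math.factorial(s))
    return MN

# Two rank-1 functionals (covectors) on R^2, wedged together:
f = lambda v: v[0]          # "dx"
g = lambda v: v[1]          # "dy"
fg = wedge(f, 1, g, 1)

e1, e2 = (1.0, 0.0), (0.0, 1.0)
print(fg(e1, e2), fg(e2, e1))   # 1.0 -1.0: the result is alternating
```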



It so happens that given a basis \(\{f_1, \dots, f_n\}\) for the dual space \(V^*\), a basis for \(\bigwedge^r V^*\) is the set of Grassman products of the form \(f_{i_1} \wedge f_{i_2} \wedge \dots \wedge f_{i_r}\) where \(\{i_1, \dots, i_r\}\) is an increasing subsequence of \(\{1, \dots, n\}\). So for a vector space of dimension \(n\), the dimension of the space \(\bigwedge^r V^*\) is the number of ways to choose r items out of n: \(\binom{n}{r} = \frac{n!}{r!(n-r)!}\). If we also adopt the convention that \(\bigwedge^0V^*\) is just the scalar field which V is defined over, then we observe that the dimensions of the spaces \(\bigwedge^rV^*\) follow the pattern in Pascal's triangle for row \(n+1\). In particular, for \(r=n\) the dimension is 1 (which will be mentioned in just a moment when we go back to determinants).
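(A tiny sketch of my own of that counting: the increasing index sequences are exactly Python's combinations, and their count reproduces the row of Pascal's triangle.)

```python
# Sketch: the basis of wedge^r V* is indexed by increasing subsequences
# {i_1 < ... < i_r} of {1, ..., n}; counting them gives binomial(n, r).
from itertools import combinations
from math import comb

n = 4
for r in range(n + 1):
    indices = list(combinations(range(1, n + 1), r))
    print(r, len(indices), comb(n, r))   # dimensions 1, 4, 6, 4, 1 for n = 4
```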

With all this said, then, we can define a space which takes all of the n+1 non-trivial spaces of the form \(\bigwedge^rV^*\), puts them in a tuple, and defines a ring on it, using the Grassman product defined above. The chapter which culminates in the introduction of this particular mathematical structure simply states: "The Grassman ring is useful in many areas of mathematics." Exactly which areas, they do not elaborate. I can think of one area which I have already written about. In my post on Tensor Analysis on Manifolds I write, among other things, about the space of differential elements. These are alternating tensor fields which define the basic differential elements for integration on tangent fields. They also form a Grassman ring.

Before finishing this section on alternating functions, I would like to share with you a truly marvelous explanation for the fact about the determinant which I mentioned above: that a linear operator, \(A\), is invertible if and only if \(det(A) \neq 0\). You see, in introductions to linear algebra, this fact is often proved in a rather unedifying manner involving a computation on a matrix in a given basis, induction, and then proving that the determinant is invariant under coordinate transformations. I have gone over and grokked these sorts of proofs a bunch of times and then have promptly forgotten them -- because they are so un-illuminating. I can't even fill in the details of this kind of proof for you now, and I read one yesterday.

We get a far simpler explanation when we consider the space \(\bigwedge^nV^*\), where V is a vector space of dimension \(n\). First let us note that the dimension of \(\bigwedge^nV^*\) is 1. That is to say that any two alternating linear functionals on n elements are scalar multiples of each other. Given a linear operator, \(A\), consider the map \(A_*: \bigwedge^nV^* \longrightarrow \bigwedge^nV^*\) induced by \(A\), defined in the following way: $$A_* f(v_1, \dots, v_n) = f(Av_1, \dots, Av_n).$$ Such a map is a linear map from a 1-dimensional space to itself, and any such map is multiplication by a scalar, i.e. \(A_*f = cf\). This scalar is the determinant of A.
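(A numerical sketch of my own of that statement: take \(f(v_1, \dots, v_n)\) to be the determinant of the matrix whose columns are the arguments -- a nonzero element of \(\bigwedge^nV^*\) -- and check that \(A_*f = det(A)f\) on arbitrary inputs.)

```python
# Sketch: with f(v_1, ..., v_n) = det([v_1 ... v_n]) as a nonzero element of
# wedge^n V*, the induced map A_* f(v_1, ..., v_n) = f(A v_1, ..., A v_n)
# is just multiplication by the scalar det(A).
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))
vs = [rng.standard_normal(n) for _ in range(n)]

f = lambda *vecs: np.linalg.det(np.column_stack(vecs))
lhs = f(*[A @ v for v in vs])        # (A_* f)(v_1, ..., v_n)
rhs = np.linalg.det(A) * f(*vs)      # det(A) * f(v_1, ..., v_n)
print(np.allclose(lhs, rhs))         # True
```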

Now for the theorem that A is invertible if and only if \(det(A) \neq 0\).

  • Suppose \(det(A) \neq 0\)
    • A key fact to prove that A is invertible in this case is that for an alternating linear functional, \(f\), \(f(v_1, \dots, v_n) \neq 0\) can only happen if \(\{v_1, \dots, v_n\}\) is linearly independent.
    • Using this fact, and the supposition that \(A_*f = det(A)f \neq 0\) for a non-zero \(f\): by the definition of \(A_*\), \(A_*f(v_1, \dots, v_n) = f(Av_1, \dots, Av_n)\), so A must map some linearly independent set of n vectors to a linearly independent set. A is therefore invertible.
  • Suppose A is invertible.
    • Take a non-zero alternating functional \(f\) and vectors \(v_1, \dots, v_n\) such that \(f(v_1, \dots, v_n) \neq 0\), and consider \(A_*f\) applied to the collection \(\{A^{-1}v_1, \dots, A^{-1}v_n\}\) (which exists when A is invertible). Then \(A_*f(A^{-1}v_1, \dots, A^{-1}v_n) = f(v_1, \dots, v_n) \neq 0\), while, since \(A_*f = det(A)f\), this same quantity equals \(det(A)f(A^{-1}v_1, \dots, A^{-1}v_n)\). This can only happen if \(det(A) \neq 0\).

You might raise the following objection: "hey, you only asserted that this constant associated with \(A_*\) is the determinant of A. Is that in fact so?" Short answer is yes, but let's show it. Suppose \(f\) is a non-zero element of \(\bigwedge^nV^*\). Consider \(A_*f(e_1,e_2,\dots,e_n)\), where \(\{e_1, \dots, e_n\}\) is an independent collection of n elements in \(V\). We will take this as our basis with respect to which we demonstrate the equivalence of the determinant. Note that although the determinant of an operator is independent of the choice of basis, the formula for calculating the determinant is often given with respect to a particular basis.

$$ A_*f(e_1, e_2, \dots, e_n) = f(Ae_1, Ae_2, \dots, Ae_n) $$ $$ = f(\sum_j A_{1,j} e_j, \sum_j A_{2,j}e_j, \dots, \sum_j A_{n,j}e_j) \tag{1}$$ Notice in this last equality, when we distribute the sums, any term where we have two of the same basis vectors as arguments will be 0. Therefore, all non-zero terms in this expansion lie in the set of permutations of \(\{e_1, \dots, e_n\}\). Note too that any individual term in the expansion of (1) can be written as $$ A_{1,\sigma_1}A_{2,\sigma_2}\dots A_{n,\sigma_n}f(e_{\sigma_1}, \dots, e_{\sigma_n}) $$ where \(\sigma\) is a specific permutation: \(\{\sigma_1, \dots, \sigma_n\}\). Note too that since \(f\) is alternating, $$ f(e_{\sigma_1}, \dots, e_{\sigma_n}) = sgn(\sigma)f(e_1, \dots, e_n) \tag{2}$$ Using (1) and (2) we obtain the following result: $$ A_*f(e_1, e_2, \dots, e_n) = \sum_\sigma sgn(\sigma)A_{1,\sigma_1} A_{2,\sigma_2}\dots A_{n,\sigma_n} f(e_1, \dots, e_n)$$ Now, since we also know that $$ A_*f(e_1, e_2, \dots, e_n) = c f(e_1, \dots, e_n)\text{, }\forall f \in \bigwedge^nV^*$$ for some c, then $$ c = \sum_\sigma sgn(\sigma)A_{1,\sigma_1} A_{2,\sigma_2}\dots A_{n,\sigma_n} $$ but this is just the formula for the determinant of A in the basis \(\{e_1, \dots, e_n\}\).
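(And to close the loop numerically, a sketch of my own: evaluate the permutation-sum formula for c directly and compare it against a library determinant.)

```python
# Sketch: the permutation-sum formula for the determinant,
# c = sum_sigma sgn(sigma) * A[1,sigma_1] * ... * A[n,sigma_n],
# compared against numpy's determinant.
import itertools
import numpy as np

def perm_sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det_by_permutations(A):
    n = A.shape[0]
    return sum(
        perm_sign(sigma) * np.prod([A[i, sigma[i]] for i in range(n)])
        for sigma in itertools.permutations(range(n))
    )

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
print(np.isclose(det_by_permutations(A), np.linalg.det(A)))   # True
```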

Before I close on this topic, I'd like to urge the reader to look up Hermann Grassmann, who invented many of the abstractions which comprise our notion of linear spaces, but the importance of whose work was not appreciated for well over 50 years. It would be hard to overstate how important linear spaces are to modern science and mathematics: just about all of classical mechanics is stated within such an abstraction of space, the theory of relativity describes space as a manifold which is locally approximated by a linear space, differential equations (and by extension quantum mechanics) make use of function spaces which are infinite-dimensional generalizations of linear spaces, and the list goes on. Given how thoroughly I rely on these algebraic abstractions when I think about spatial problems, I find it incredible that they were only invented approximately 200 years ago, and only appreciated around the turn of the 20th century.

Some remarks on linear algebra and pedagogy

So who should learn linear algebra? Where does it belong in the curriculum of education? I'm tempted to adapt a line from Plato who, perhaps apocryphally, put the words "let no one ignorant of geometry enter" over the entrance of his academy in Athens. So I might say "let no one ignorant of algebra enter". I'd admit I'm being a bit hyperbolic, but it gets the point across that some foundational knowledge of algebra should exist before seriously beginning a study of almost anything. I will also add that I replaced geometry rather than simply adding algebra to this exhortation since, if I had to guess, algebra is closer to the core of the universal source-code. Plato might not have found fault with my change since, as far as I understand it, geometry was about as abstract as it got for the mathematics of his time. Furthermore, when we consider linear spaces, algebra is a powerful lens through which to understand geometry.

First off, I'll start by defining the target audience. In an enlightened civilization I think mathematics would be universally regarded as important without regard to its applicability, but I understand that attitude is just not where we're at on the whole. So I'll just be focusing on people involved in disciplines which use anything more involved than arithmetic. That said, I'll make one blanket statement: if you cannot recognize what a mathematically rigorous argument looks like, you are not an educated person.

In my experience teaching recitations and classes in mathematics and physics during my undergraduate and graduate studies, I do think that some reform of how mathematics is taught, or of which branches of mathematics are emphasized in the high school to early college years, should be considered. I think a major deficiency in US secondary education is in the abstractions of mathematics. For example, I find that in most cases when people coming out of high school are asked about linear algebra, they might say "isn't that matrices?" rather than, say, "vector spaces, right?" (please note that I'm still setting a pretty low bar here). The point is that their takeaway is more a mechanistic form of computation than a conception of space. This tendency to see math as a collection of recipes or tools rather than a collection of useful abstractions to house models in shows up in other branches; I'll just keep to the above example, as the point of this post is not math education specifically but linear algebra (plus the diagnosis I have is admittedly not the result of systematic research, but of casual observation, which could well be mistaken in many key ways).

One reason why we might be under-teaching abstractions is that our math education has an incorrect focus. In particular: calculus. It seems to me that high school mathematical curricula (and the first two years of college, even) have this conception that mathematics is a monotonic path upward in sophistication, building inexorably toward limits, with any algebra beyond simply "solving for x" filed under the label pre-calculus. This misses the mark considerably. The truth is that a solid understanding of sets, spaces, and functions between them is really what you want your students to know. Limits are a pretty minor addition to these ideas. It's also worth pointing out that limits are in no way employed in the material in this book. In fact, I'd say that if you could teach the material in "Linear Algebra" well to a high school student, then that person would be better equipped for, say, an introductory physics course than a typical AP high school graduate.

It could also be that there is just insufficient value placed on mastery. I understand that perspective if one doesn't know whether to focus on A or B; however, there is no doubt that the time taken to grok the contents of the book "Linear Algebra" is not a waste, and few items take a higher priority.