In this module you will learn how to break a linear transformation into a composition of simpler transformations, and how composition of linear transformations corresponds to matrix multiplication.
In life, we encounter situations where we do one thing after another. For example, you put on your socks and then your shoes. We might call this whole operation (of first putting on your socks and then your shoes) “getting dressed”, and it is an example of function composition.
Definition 10.0.1. Composition of Functions.
Let \(f:A\to B\) and \(g:B\to C\text{.}\) The composition of \(g\) and \(f\text{,}\) denoted \(g\circ f\), is the function \(h:A\to C\) defined by
\begin{equation*}
h(x)=(g\circ f)(x) = g\Big(f(x)\Big).
\end{equation*}
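The definition translates directly into code. Here is a minimal sketch; the helper `compose` and the sample functions are hypothetical illustrations, not part of the text:

```python
def compose(g, f):
    """Return h = g o f, the function defined by h(x) = g(f(x))."""
    return lambda x: g(f(x))

# Hypothetical sample functions for illustration:
f = lambda x: x + 1   # f: A -> B
g = lambda x: 2 * x   # g: B -> C

h = compose(g, f)     # h: A -> C, with h(x) = g(f(x)) = 2 * (x + 1)
```

Note that `compose(g, f)` applies `f` first and `g` second, matching the right-to-left reading of \(g\circ f\).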
We can formalize the shoes-socks example with mathematical notation.
\begin{equation*}
\text{getting dressed}= (\text{putting on shoes})\circ (\text{putting on socks})
\end{equation*}
Or, if \(X\) represents putting on socks, \(S\) represents putting on shoes, and \(D\) represents getting dressed, then \(D=S\circ X\text{.}\)
This real-life example has utility when talking to children. Getting dressed is a complicated operation, but by breaking it up into simpler operations, even a young child can understand the process. In this vein, we can understand complicated linear transformations by breaking them up into the composition of simpler ones.
For example, define
\begin{equation*}
\mathcal{T}:\R^{2}\to\R^{2}\qquad\text{by}\qquad \mathcal{T}(\vec x)=\matc{\sqrt{2}&-\sqrt{2}\\{\sqrt{2}}/{2} & {\sqrt{2}}/{2}}\vec x.
\end{equation*}
It’s hard to understand what \(\mathcal{T}\) does just by looking at inputs and outputs. However, if we were keen enough to notice that
\begin{equation*}
\mathcal{T}=\mathcal{S}\circ \mathcal{R}
\end{equation*}
where \(\mathcal{R}\) was rotation counter-clockwise by \(45^{\circ}\) and \(\mathcal{S}\) was a stretch in the \(\xhat\) direction by a factor of \(2\text{,}\) suddenly \(\mathcal{T}\) wouldn’t be such a mystery.
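This decomposition can be sanity-checked numerically. The sketch below uses NumPy (an assumption; any numerics library would do) to verify that applying \(\mathcal{R}\) and then \(\mathcal{S}\) to a sample vector gives the same result as applying \(\mathcal{T}\) directly:

```python
import numpy as np

s2 = np.sqrt(2)
T = np.array([[s2, -s2], [s2 / 2, s2 / 2]])          # matrix of T, as defined above
R = np.array([[s2 / 2, -s2 / 2], [s2 / 2, s2 / 2]])  # rotation CCW by 45 degrees
S = np.array([[2.0, 0.0], [0.0, 1.0]])               # stretch in the x direction by 2

# Applying R first and then S matches applying T directly.
x = np.array([1.0, 2.0])
assert np.allclose(S @ (R @ x), T @ x)
```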
How does one figure out the “simple transformations” that, when composed, give the full transformation? In truth, there are many methods, and whole books have been written about how to find these decompositions efficiently. We will encounter two algorithms for this, but for now our methods will be ad hoc.
Example 10.0.2.
Let \(\mathcal{U}:\R^{2}\to\R^{2}\) be the transformation given by \(M=\matc{\sqrt{2}/2&-\sqrt{2}/2\\0&0}\text{,}\) let \(\mathcal{R}:\R^{2}\to\R^{2}\) be rotation counter-clockwise by \(45^{\circ}\text{,}\) and let \(\mathcal{P}:\R^{2}\to\R^{2}\) be projection onto the \(x\)-axis. Write \(\mathcal{U}\) as the composition (in some order) of \(\mathcal{R}\) and \(\mathcal{P}\text{.}\)
Solution.
We will use \(\xhat\) and \(\yhat\) to determine whether \(\mathcal{U}\) is \(\mathcal{R}\circ\mathcal{P}\) or \(\mathcal{P}\circ \mathcal{R}\text{.}\)
Computing,
\begin{equation*}
\mathcal{U}(\xhat)=\mat{\frac{\sqrt{2}}{2}\\0}\qquad\text{and}\qquad \mathcal{U}(\yhat)=\mat{-\frac{\sqrt{2}}{2}\\0}.
\end{equation*}
For \(\mathcal{R}\circ \mathcal{P}\text{:}\)
\begin{align*}
\mathcal{R}\circ \mathcal{P}(\xhat)=\mathcal{R}(\mathcal{P}(\xhat))&=\mathcal{R}\mat{1\\0}=\mat{\frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2}}\\
\mathcal{R}\circ \mathcal{P}(\yhat)=\mathcal{R}(\mathcal{P}(\yhat))&=\mathcal{R}\mat{0\\0}=\mat{0\\0}.
\end{align*}
For \(\mathcal{P}\circ \mathcal{R}\text{:}\)
\begin{align*}
\mathcal{P}\circ \mathcal{R}(\xhat)=\mathcal{P}(\mathcal{R}(\xhat))&=\mathcal{P}\mat{\frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2}}=\mat{\frac{\sqrt{2}}{2}\\0}\\
\mathcal{P}\circ \mathcal{R}(\yhat)=\mathcal{P}(\mathcal{R}(\yhat))&=\mathcal{P}\mat{-\frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2}}=\mat{-\frac{\sqrt{2}}{2}\\0}.
\end{align*}
Since \(\mathcal{P}\circ \mathcal{R}\) agrees with \(\mathcal{U}\) on the standard basis (i.e., \(\mathcal{P}\circ\mathcal{R}\) and \(\mathcal{U}\) output the same vectors when \(\xhat\) and \(\yhat\) are input), and a linear transformation is completely determined by its values on a basis, the two transformations must agree for all vectors. Therefore \(\mathcal{U}=\mathcal{P}\circ \mathcal{R}\text{.}\)
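The computations in this solution can be mirrored numerically. Below is a NumPy sketch (not part of the text) checking that \(\mathcal{P}\circ\mathcal{R}\), but not \(\mathcal{R}\circ\mathcal{P}\), agrees with \(\mathcal{U}\) on the standard basis:

```python
import numpy as np

s2 = np.sqrt(2)
M = np.array([[s2 / 2, -s2 / 2], [0.0, 0.0]])        # matrix of U
R = np.array([[s2 / 2, -s2 / 2], [s2 / 2, s2 / 2]])  # rotation CCW by 45 degrees
P = np.array([[1.0, 0.0], [0.0, 0.0]])               # projection onto the x-axis

xhat, yhat = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for v in (xhat, yhat):
    assert np.allclose(P @ (R @ v), M @ v)        # P o R agrees with U
assert not np.allclose(R @ (P @ xhat), M @ xhat)  # R o P does not
```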
Section 10.1 Compositions and Matrix Products
Let \(\mathcal{A}:\R^{2}\to\R^{2}\) and \(\mathcal{B}:\R^{2}\to\R^{2}\) be matrix transformations with matrices
\begin{equation*}
M_{A}=\mat{1&2\\0&2}\qquad\text{and}\qquad M_{B}=\mat{-1&-1\\-2&0}.
\end{equation*}
(Make sure you understand why \(\mathcal{A}\neq M_{A}\) before continuing!)
Define \(\mathcal{T}=\mathcal{A}\circ \mathcal{B}\text{.}\) Since \(\mathcal{T}:\R^{2}\to\R^{2}\) is a linear transformation, we know \(\mathcal{T}\) has a matrix \(M_{T}\) which is \(2\times 2\text{.}\) We can find \(M_{T}\) by the usual methods. First, compute some input-output pairs for \(\mathcal{T}\text{.}\)
\begin{equation*}
\mathcal{T}\mat{1\\0}= \mathcal{A}\left( \mathcal{B}\mat{1\\0}\right) = \mathcal{A}\mat{-1\\-2}=\mat{-5\\-4}\qquad\text{and}\qquad \mathcal{T}\mat{0\\1}= \mathcal{A}\left( \mathcal{B}\mat{0\\1}\right) = \mathcal{A}\mat{-1\\0}=\mat{-1\\0}
\end{equation*}
Letting \(M_{T}=\mat{a&b\\c&d}\text{,}\) we use the input-output pairs to see
\begin{equation*}
\mat{a\\c}=M_{T}\mat{1\\0}=\mat{-5\\-4}\qquad \text{and}\qquad \mat{b\\d}=M_{T}\mat{0\\1}=\mat{-1\\0},
\end{equation*}
and so
\begin{equation*}
M_{T}=\mat{-5&-1\\-4&0}.
\end{equation*}
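The column-by-column computation of \(M_T\) above can be reproduced numerically. Here is a NumPy sketch (an illustration, not part of the text):

```python
import numpy as np

MA = np.array([[1, 2], [0, 2]])    # matrix of A
MB = np.array([[-1, -1], [-2, 0]]) # matrix of B
e1, e2 = np.array([1, 0]), np.array([0, 1])

# The columns of M_T are T(e1) and T(e2), where T = A o B:
col1 = MA @ (MB @ e1)   # should be [-5, -4]
col2 = MA @ (MB @ e2)   # should be [-1, 0]
MT = np.column_stack([col1, col2])
```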
We found \(M_{T}\text{,}\) the matrix for \(\mathcal{T}\text{,}\) using traditional techniques, but could we have used \(M_{A}\) and \(M_{B}\) to somehow find \(M_{T}\text{?}\) As it turns out, yes, we could have!
By definition,
\begin{equation*}
\mathcal{A}\vec x=M_{A}\vec x\qquad\text{and}\qquad \mathcal{B}\vec x=M_{B}\vec x,
\end{equation*}
since \(\mathcal{A}\) and \(\mathcal{B}\) are matrix transformations. Therefore,
\begin{equation*}
\mathcal{A}(\mathcal{B}\vec x) = M_{A}(M_{B}\vec x).
\end{equation*}
But, matrix multiplication is associative, and so
\begin{equation*}
M_{A}(M_{B}\vec x)=(M_{A}M_{B})\vec x.
\end{equation*}
Thus \(M_{A}M_{B}\) must be a matrix for \(\mathcal{A}\circ \mathcal{B}=\mathcal{T}\text{.}\) Indeed, computing the matrix product, we see
\begin{equation*}
M_{A}M_{B}=\mat{1&2\\0&2}\mat{-1&-1\\-2&0}=\mat{-5&-1\\-4&0}=M_{T}.
\end{equation*}
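This matrix-product check is a one-liner in code. A NumPy sketch (not part of the text):

```python
import numpy as np

MA = np.array([[1, 2], [0, 2]])      # matrix of A
MB = np.array([[-1, -1], [-2, 0]])   # matrix of B
MT = np.array([[-5, -1], [-4, 0]])   # matrix of T, found above

# A single matrix product gives the matrix of the composition A o B.
assert np.array_equal(MA @ MB, MT)
```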
The fact that matrix multiplication corresponds to function composition is no coincidence. It is the very reason matrix multiplication is defined the way it is. This correspondence is made precise in the following theorem.
Theorem 10.1.1.
If \(\mathcal{P}:\R^{a}\to\R^{b}\) and \(\mathcal{Q}:\R^{c}\to \R^{a}\) are matrix transformations with matrices \(M_{P}\) and \(M_{Q}\text{,}\) then \(\mathcal{P}\circ \mathcal{Q}\) is a matrix transformation whose matrix is given by the matrix product \(M_{P}M_{Q}\text{.}\)
It should now be clear why the order of matrix multiplication matters. The order of function composition matters (you must put on your socks before your shoes!), and since matrix multiplication corresponds to function composition, the order of matrix multiplication must matter.
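To see the non-commutativity concretely, compare the two products for the matrices from this section (a NumPy sketch, not part of the text):

```python
import numpy as np

MA = np.array([[1, 2], [0, 2]])    # matrix of A
MB = np.array([[-1, -1], [-2, 0]]) # matrix of B

AB = MA @ MB   # matrix of A o B: [[-5, -1], [-4, 0]]
BA = MB @ MA   # matrix of B o A: [[-1, -4], [-2, -4]]

# The two compositions are genuinely different transformations.
assert not np.array_equal(AB, BA)
```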