Functions of several variables: derivatives and integrals


The derivative of a function of several variables

We progress to the $n$-dimensional case.

First let's look at the point-slope form of linear functions: $$l(x_1,...,x_n)=p+m_1(x_1-a_1)+...+m_n(x_n-a_n),$$ where $p$ is the $z$-intercept, $m_1,...,m_n$ are the chosen slopes of the plane along the axes, and $a_1,...,a_n$ are the coordinates of the chosen point in ${\bf R}^n$. Let's rewrite this expression, as before, in terms of the dot product with the increment of the independent variable: $$l(X)=p+M\cdot (X-A),$$ where

  • $M=<m_1,...,m_n>$ is the vector of slopes,
  • $A=(a_1,...,a_n)$ is the point in ${\bf R}^n$, and
  • $X=(x_1,...,x_n)$ is our variable point in ${\bf R}^n$.

Then we can say that the vector $N=<m_1,...,m_n,-1>$ is perpendicular to the graph of this function, a “plane” in ${\bf R}^{n+1}$. The conclusion holds independently of any choice of a coordinate system!
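
Here is a quick computational sketch of the point-slope form (assuming Python with NumPy; the numbers $p$, $M$, $A$ below are made up for illustration):

```python
import numpy as np

# Data for a linear function of n = 3 variables.
p = 5.0                           # z-intercept
M = np.array([2.0, -1.0, 0.5])    # vector of slopes <m_1, m_2, m_3>
A = np.array([1.0, 0.0, 2.0])     # base point (a_1, a_2, a_3)

def l(X):
    """Point-slope form: l(X) = p + M . (X - A)."""
    return p + np.dot(M, X - A)

print(l(A))                           # at X = A the value is the intercept: 5.0
print(l(np.array([2.0, 1.0, 2.0])))   # 5 + 2*1 + (-1)*1 + 0.5*0 = 6.0
```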

The linear approximations of a function $z=f(X)$ at $X=A$ in ${\bf R}^n$ are linear functions with $n$ slopes; the slopes in the directions of the axes come from the difference quotients: $$\frac{\Delta f}{\Delta x_k}=\frac{f(a_1,...,a_{k-1},x_k,a_{k+1},...,a_n)-f(a_1,...,a_n)}{x_k-a_k}.$$ Their limits are what we are interested in.

Definition. The partial derivative of $z=f(X)=f(x_1,...,x_n)$ with respect to $x_k$ at $X=A=(a_1,...,a_n)$ is defined to be the limit of the difference quotient with respect to $x_k$ at $x_k=a_k$, denoted by: $$\frac{\partial f}{\partial x_k}(A) \text{ or } f_k'(A).$$

The following is an obvious conclusion.

Theorem. The partial derivative of $z=f(X)$ at $X=A=(a_1,...,a_n)$ is found as the derivative of $z=f(x_1,...,x_n)$ with respect to $x_k$ with the other variables fixed at $x_i=a_i$: $$\frac{\partial f}{\partial x_k}(A) = \frac{d}{dx_k}f(a_1,...,a_{k-1},x_k,a_{k+1},...,a_n)\bigg|_{x_k=a_k}.$$
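
As a numerical illustration of the definition (a minimal sketch, assuming Python with NumPy; the function $f$ below is just an example), the difference quotient with a small increment approximates the partial derivative:

```python
import numpy as np

def f(X):
    # Example function of two variables: f(x, y) = x^2 * y + y^3
    x, y = X
    return x**2 * y + y**3

def partial(f, A, k, h=1e-6):
    """Difference quotient of f at A with respect to the k-th variable (0-based)."""
    X = np.array(A, dtype=float)
    X[k] += h
    return (f(X) - f(A)) / h

A = np.array([1.0, 2.0])
print(partial(f, A, 0))   # d/dx (x^2 y + y^3) = 2xy       ->  4 at (1, 2)
print(partial(f, A, 1))   # d/dy (x^2 y + y^3) = x^2 + 3y^2 -> 13 at (1, 2)
```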

Definition. Suppose $z=f(X)$ is defined at $X=A$ and $$l(X)=f(A)+M\cdot (X-A)$$ is any of its linear approximations at that point. Then, $z=l(X)$ is called the best linear approximation of $f$ at $X=A$ if the following is satisfied: $$\lim_{X\to A} \frac{ f(X) -l(X) }{||X-A||}=0.$$ In that case, the function $f$ is called differentiable at $X=A$. The vector $M$ is then called the gradient or the derivative of $f$ at $A$.

In other words, we stick to the functions whose graphs look like planes, i.e., copies of ${\bf R}^n$, on a small scale!

Notation. There are multiple ways. First the Lagrange style: $$f'(A),$$ and the Leibniz style: $$\frac{df}{dX}(A) \text{ and }\frac{dz}{dX}(A).$$ The following is very common in science and engineering: $$\nabla f(A) \text{ and } \operatorname{grad}f(A).$$ Note that the gradient notation is to be read as: $$\bigg(\nabla f\bigg)(A),\ \bigg(\operatorname{grad}f\bigg)(A),$$ i.e., the gradient is computed first and then evaluated at $X=A$.

Theorem. For a function differentiable at $X=A$, there is only one best linear approximation at $A$.

Proof. By contradiction.... $\blacksquare$

Below is a visualization of a differentiable function of three variables given by its level surfaces:

Function of three variables zoomed in.png

Not only do these surfaces look like planes when we zoom in, but they also progress in a parallel and uniform fashion.

Theorem. If $$l(X)=f(A)+M\cdot (X-A)$$ is the best linear approximation of $z=f(X)$ at $X=A$, then $$M=\operatorname{grad}f(A)=\left<\frac{\partial f}{\partial x_1}(A),..., \frac{\partial f}{\partial x_n}(A)\right>.$$

Exercise. Prove the theorem.
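
The theorem can also be tested numerically (a sketch, assuming Python with NumPy; the function and the point are chosen arbitrarily): the vector of partial difference quotients gives a linear function whose error is small relative to $||X-A||$.

```python
import numpy as np

def f(X):
    x, y = X
    return np.sin(x) + x * y**2            # example function

def grad(f, A, h=1e-6):
    """Approximate gradient: the vector of partial difference quotients."""
    A = np.array(A, dtype=float)
    g = np.zeros_like(A)
    for k in range(len(A)):
        X = A.copy(); X[k] += h
        g[k] = (f(X) - f(A)) / h
    return g

A = np.array([1.0, 2.0])
M = grad(f, A)                             # approximately <cos(1) + 4, 4>
l = lambda X: f(A) + np.dot(M, X - A)      # candidate best linear approximation

X = A + np.array([0.01, -0.02])
print(abs(f(X) - l(X)) / np.linalg.norm(X - A))   # small: the error is o(||X - A||)
```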

The gradient serves as the derivative of a differentiable function. When the function is not differentiable, combining its variables, $x$ and $y$, into one, $X=(x,y)$, may be a bad idea, but the partial derivatives might still make sense.

Non-differentiable.png

Warning: Even though the derivative of a parametric curve in ${\bf R}^n$ at a point and the derivative of a function of $n$ variables at a point are both vectors in ${\bf R}^n$, don't read anything into it.


Just as in dimension $1$, differentiation is a special kind of function too, a function of functions: $$ \newcommand{\ra}[1]{\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{ccccccccccccccc} f & \mapsto & \begin{array}{|c|}\hline\quad \frac{d}{dX} \quad \\ \hline\end{array} & \mapsto & g=f' \end{array}$$ We need to understand how this function operates.

Sum Rule for differentials.png

Here, the bars are stacked on top of each other, then the heights are added to each other and so are the height differences.

Theorem (Sum Rule). The sum of two functions differentiable at a point is differentiable at that point and its derivative is equal to the sum of their derivatives; i.e., for any two functions $f,g$ differentiable at $X=A$, we have: $$(f+ g)'(A)= f'(A) + g'(A).$$

Proof. $\blacksquare$

Theorem (Constant Multiple Rule). A multiple of a function differentiable at a point is differentiable at that point and its derivative is equal to the multiple of the function's derivative; i.e., for any function $f$ differentiable at $X=A$ and any real $c$, we have: $$(c\cdot f)'(A) = c\cdot f'(A).$$

Proof. $\blacksquare$

Theorem (Product Rule). The product of two functions differentiable at a point is differentiable at that point and its derivative is found as a combination of these functions and their derivatives; specifically, given two functions $f,g$ differentiable at $X=A$, we have: $$(f \cdot g)'(A) = f(A)\cdot g'(A) + f'(A)\cdot g(A).$$

Proof. $\blacksquare$

Theorem (Quotient Rule). The quotient of two functions differentiable at a point is differentiable at that point and its derivative is found as a combination of these functions and their derivatives; specifically, given two functions $f,g$ differentiable at $X=A$, we have: $$\left( \frac{f}{g} \right)'(A) = \frac{f'(A)\cdot g(A) - f(A)\cdot g'(A)}{g(A)^2},$$ provided $g(A) \ne 0$.

Proof. $\blacksquare$

Theorem (Algebra of Derivatives). Suppose $f$ and $g$ are differentiable at $X=A$. Then, at $X=A$, we have: $$\begin{array}{|ll|ll|} \hline \text{SR: }& (f+g)'=f'+g' & \text{CMR: }& (cf)'=cf'& \text{ for any real }c\\ \text{PR: }& (fg)'=f'g+fg'& \text{QR: }& (f/g)'=\frac{f'g-fg'}{g^2} &\text{ provided }g\ne 0\\ \hline \end{array}$$ $$\begin{array}{|ll|ll|} \hline \text{SR: }& \nabla(f+g)=\nabla f+\nabla g & \text{CMR: }& \nabla (cf)=c\nabla f& \text{ for any real }c\\ \text{PR: }& \nabla(fg)=\nabla f\, g+f\nabla g& \text{QR: }& \nabla (f/g)=\frac{\nabla f\, g-f \nabla g}{g^2} &\text{ provided }g\ne 0\\ \hline \end{array}$$
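
These rules are easy to test numerically; below is a hedged sketch of the Product Rule for gradients (assuming Python with NumPy; the functions $f,g$ and the point $A$ are arbitrary examples):

```python
import numpy as np

def num_grad(f, A, h=1e-6):
    """Gradient via difference quotients along the coordinate axes."""
    A = np.array(A, dtype=float)
    return np.array([(f(A + h * e) - f(A)) / h for e in np.eye(len(A))])

f = lambda X: X[0]**2 + X[1]      # f(x, y) = x^2 + y
g = lambda X: X[0] * X[1]         # g(x, y) = x y
A = np.array([1.0, 3.0])

lhs = num_grad(lambda X: f(X) * g(X), A)               # grad(f g)
rhs = num_grad(f, A) * g(A) + f(A) * num_grad(g, A)    # (grad f) g + f (grad g)
print(lhs, rhs)                   # both approximately [18, 7]
```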

What about the compositions? There are two...

Recall first how we create a trip plan as a parametric curve $X=F(t)$: the times and the places are put on a simple automotive map:

Trip map.png

We then bring in the terrain map of the area, a function of two variables $z=f(x,y)$:

Function of two variables -- surface 1.png

In order to answer how fast we will be climbing, we form their composition: $$\begin{array}{|ccccc|} \hline &\text{trip map} & & \bigg|\\ \hline t&\longrightarrow & (x,y) &\longrightarrow & z\\ \hline &\bigg| & &\text{terrain map}\\ \hline \end{array}$$

Map and terrain.png
  • If we double our horizontal speed (with the same terrain), the climb will be twice as fast.
  • If we double the steepness of the terrain (with the same horizontal speed), the climb will be twice as fast.

It follows that the speed of the climb is proportional to both our horizontal speed and the steepness of the terrain. This number is computed as the dot product of:

  • the derivative of the parametric curve $F$ of the trip, i.e., the horizontal velocity $\left< \frac{dx}{dt}, \frac{dy}{dt} \right>$, and
  • the gradient of the terrain function $f$, i.e., $\left< \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} \right> $.

Notice how the intermediate variable ($X$ in the first case, $z$ in the second) is “cancelled” in the Leibniz notation: $$\frac{dz}{\not{dX}}\cdot\frac{\not{dX}}{dt}=\frac{dz}{dt}.$$ And $$\frac{du}{\not{dz}}\cdot\frac{\not{dz}}{dX}=\frac{du}{dX}.$$

Theorem (Chain Rule). The composition of a function differentiable at a point and a function differentiable at the image of that point is differentiable at that point and its derivative is found as a product of the two derivatives.

  • Part I: If $X=F(t)$ is a parametric curve differentiable at $t=a$ and $z=f(X)$ is a function of several variables differentiable at $X=F(a)$, then we have:

$$(f\circ F)'(a) = f'(F(a))\cdot F'(a).$$

  • Part II: If $z=f(X)$ is a function of several variables differentiable at $X=A$ and $u=g(z)$ is a numerical function differentiable at $z=f(A)$, then we have:

$$(g\circ f)'(A) = g'(f(A))\cdot f'(A).$$

Proof. $\blacksquare$
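
Part I can be checked numerically (a sketch, assuming Python with NumPy; the terrain $f$ and the trip $F$ are invented for illustration): the derivative of the composition matches the dot product of the gradient and the velocity.

```python
import numpy as np

f = lambda X: X[0]**2 + 3 * X[1]              # "terrain": f(x, y) = x^2 + 3y
F = lambda t: np.array([np.cos(t), t**2])     # "trip": F(t) = (cos t, t^2)

a, h = 1.0, 1e-6

# Left-hand side: derivative of the composition f(F(t)) at t = a.
lhs = (f(F(a + h)) - f(F(a))) / h

# Right-hand side: gradient of f at F(a) dotted with the velocity F'(a).
grad_f = np.array([2 * F(a)[0], 3.0])         # <2x, 3> evaluated at X = F(a)
velocity = (F(a + h) - F(a)) / h              # approximately <-sin t, 2t> at t = a
rhs = np.dot(grad_f, velocity)

print(lhs, rhs)                               # both approximately 6 - sin(2)
```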

What is the meaning of the composition in the second part of the Chain Rule?



The choice of the variable may change. If we are exploring the terrain represented by a function of two or more variables, going north-south or east-west gives just two of the possibilities.




Where do matrices come from?

Matrices can appear as representations of linear functions as we saw above. Matrices can also appear in systems of linear equations.

Problem 1: Suppose we have coffee that costs $\$3$ per pound. How much do we get for $\$60$?

Solution: $$3x=60\ \Longrightarrow\ x=\frac{60}{3}.$$

Problem 2: Given: Kenyan coffee - $\$2$ per pound, Colombian coffee - $\$3$ per pound. How much of each do you need to have $6$ pounds of blend with the total price of $\$14$?

The setup is the following. Let $x$ be the weight of the Kenyan coffee and let $y$ be the weight of the Colombian coffee. Then the total weight of the blend is $6$ pounds and its total price is $\$ 14$. Therefore, we have a system: $$\begin{cases} x&+y &= 6 ,\\ 2x&+3y &= 14. \end{cases}$$

Solution: From the first equation, we derive: $y=6-x$. Then substitute into the second equation: $2x+3(6-x)=14$. Solve the new equation: $-x=-4$, or $x=4$. Substitute this back into the first equation: $(4)+y=6$, then $y=2$.

But it was so much simpler for Problem 1! How can we mimic that equation and get a single equation for the system in Problem 2? The only difference seems to be whether we make a blend of one ingredient or two.

Let's collect the data in tables first: $$\begin{array}{|ccc|} \hline 1\cdot x&+1\cdot y &= 6 \\ \hline 2\cdot x&+3\cdot y &= 14\\ \hline \end{array}\leadsto \begin{array}{|c|c|c|c|c|c|c|} \hline 1&\cdot& x&+&1&\cdot& y &=& 6 \\ 2&\cdot& x&+&3&\cdot&y &=& 14\\ \hline \end{array}\leadsto \begin{array}{|c|c|c|c|c|c|c|} \hline 1& & & &1& & & & 6 \\ &\cdot& x&+& &\cdot& y &=& \\ 2& & & &3& & & & 14\\ \hline \end{array}$$ We see tables starting to appear... We call these tables matrices.

The four coefficients of $x,y$ form the first table: $$A = \left[\begin{array}{} 1 & 1 \\ 2 & 3 \end{array}\right].$$ It has two rows and two columns. In other words, this is a $2 \times 2$ matrix.

The second table is on the right; it consists of the two “free” terms in the right hand side: $$B = \left[\begin{array}{} 6 \\ 14 \end{array}\right].$$ This is a $2 \times 1$ matrix.

The third table is less visible; it is made of the two unknowns: $$X = \left[\begin{array}{} x \\ y \end{array}\right].$$ This is a $2 \times 1$ matrix.

How does this construction help? Both $X$ and $B$ are column-vectors in dimension $2$ and matrix $A$ makes the latter from the former. This is very similar to multiplication of numbers; after all they are column-vectors in dimension $1$... Let's align the two problems: $$\begin{array}{ccc} \dim 1:&a\cdot x=b& \Longrightarrow & x = \frac{b}{a}& \text{ provided } a \ne 0,\\ \dim 2:&A\cdot X=B& \Longrightarrow & X = \frac{B}{A}& \text{ provided } A \ne 0, \end{array}$$ if we can just make sense of the new algebra...

Here $AX=B$ is a matrix equation and it's supposed to capture the system of equations in Problem 2. Let's compare the original system of equations to $AX=B$: $$\begin{array}{} x&+y &= 6 \\ 2x&+3y &=14 \end{array}\ \leadsto\ \left[ \begin{array}{} 1 & 1 \\ 2 & 3 \end{array} \right] \cdot \left[ \begin{array}{} x \\ y \end{array} \right] = \left[ \begin{array}{} 6 \\ 14 \end{array} \right]$$ We can see these equations in the matrices. First: $$1 \cdot x + 1 \cdot y = 6\ \leadsto\left[ \begin{array}{} 1 & 1 \end{array} \right] \cdot\left[ \begin{array}{} x \\ y \end{array} \right] = \left[ \begin{array}{} 6 \end{array} \right].$$ Second: $$2 x+3 y=14\ \leadsto\ \left[ \begin{array}{} 2 & 3 \end{array} \right] \left[ \begin{array}{} x \\ y \end{array} \right] = \left[ \begin{array}{} 14 \end{array} \right].$$ This suggests what the meaning of $AX$ should be. We “multiply” the rows in $A$ by column(s) in $X$!
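
In computational practice this matrix equation is handed to a linear solver, which plays the role of “dividing by $A$” (a sketch, assuming Python with NumPy):

```python
import numpy as np

A = np.array([[1.0, 1.0],    # the weights add up to 6 pounds
              [2.0, 3.0]])   # the prices add up to $14
B = np.array([6.0, 14.0])

X = np.linalg.solve(A, B)    # solves A X = B
print(X)                     # [4. 2.]: 4 pounds of Kenyan, 2 pounds of Colombian
```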

Before we study matrix multiplication in general, let's see what insight our new approach provides for our original problem...

The initial solution has the following geometric meaning. We can think of the two equations $$\begin{array}{ll} x&+y &= 6 ,\\ 2x&+3y &= 14, \end{array}$$ as representations of two lines on the plane. Then the solution $(x,y)=(4,2)$ is the point of their intersection:

Solution of linear system.png

The new point of view is very different: instead of the locations, we are after the directions.

We are to solve a vector equation; i.e., to find these unknown coefficients: $$x\cdot\left[\begin{array}{c}1\\2\end{array}\right]+y\cdot\left[\begin{array}{c}1\\3\end{array}\right]=\left[\begin{array}{c}6\\14\end{array}\right].$$ In other words, we need to find a way to stretch these two vectors so that the resulting combination is the vector on the right:

Solution of linear system 2.png

Matrix multiplication

Idea: Match a row of $A$ and a column of $B$, pairwise multiply, then add the results. The sum is a single entry in $AB$!

MultiplyingMatrices.png

This is the simplest case: $$\left[ \begin{array}{} 1 & 2 & 0 & 1 & -1 \end{array} \right] \left[ \begin{array}{} 1 \\ 0 \\ 2 \\ 3 \\ 2 \end{array} \right] = \left[ \begin{array}{} 2 \end{array} \right]$$ In the above, we have a $1 \times 5$ multiplied with a $5 \times 1$. We think of the $5$'s as cancelling to yield a $1 \times 1$ matrix.

Let's see where $2$ comes from. Here's the algebra: $$\begin{array}{} & 1 & 2 & 0 & 1 & -1 \\ \times & 1 & 0 & 2 & 3 & 2 \\ \hline \\ & 1 +& 0 +& 0 +& 3 -& 2 =2 \end{array}$$ We multiply vertically then add the results horizontally.

Note: this is the product of a $1 \times 5$ matrix and a $5 \times 1$ matrix. If we think of these as vectors, the result is their dot product.

Example. Even though these are the same two matrices, this is on the opposite end of the spectrum (the columns and rows are very short):

Matrix multiplication 1.png

$$\left[ \begin{array}{} 1 \\ 0 \\ 2 \\ 3 \\ 2 \end{array} \right] \left[ \begin{array}{} 1 & 2 & 0 & 1 & -1 \end{array} \right] = \left[ \begin{array}{} 1 & 2 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 2 & 4 & 0 & 2 & -2 \\ \vdots \end{array} \right]$$ Here the left hand side is $5 \times 1$ multiplied with a $1 \times 5$ yielding a $5 \times 5$. It's very much like the multiplication table. $\square$

Example. $$\left[ \begin{array}{} 1 & 2 & 0 \\ 3 & 4 & 1 \end{array} \right] \left[ \begin{array}{} 1 & 2 \\ 0 & 1 \\ 1 & 1 \end{array} \right] = \left[ \begin{array}{} 1\cdot 1 + 2 \cdot 0 + 0\cdot 1 & ... \\ \vdots \end{array} \right]$$ Here we have a $2 \times 3$ multiplied with a $3 \times 2$ to yield a $2 \times 2$. $\square$

So, an $m \times n$ matrix is a table of real numbers with $m$ rows and $n$ columns:

RowsAndColumns.png.

Notation: $A = \{a_{ij}\}$ (or $[a_{ij}]$). Here $a_{ij}$ represents the position in the matrix:

RowsAndColumns2.png

Where is $a_{2,1}$? It's the entry at the $2^{\rm nd}$ row, $1^{\rm st}$ column: $$a_{21}=3\ \longleftrightarrow\ A = \left[ \begin{array}{} (*) & (*) \\ 3 & (*) \\ (*) & (*) \end{array} \right]$$

One can also think of $a_{ij}$ as a function of two variables, when the entries are given by a formula.

Example. $A = \{i+j\}$, $3 \times 3$, what is it? $$a_{ij} = i+j.$$ To find the entries, plug in $i=1,2,3$ and $j=1,2,3$: $$\begin{array}{} a_{11} &= 1 + 1 &= 2 \\ a_{12} &= 1+2 &= 3 \\ a_{13} &= 1+3 &= 4 \\ {\rm etc} \end{array}$$ Now form a matrix: $$\left[ \begin{array}{c|ccc} i \setminus j & 1 & 2 & 3 \\ \hline \\ 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \end{array} \right]$$ What's the difference from a function? Compare $a_{ij}$ with $f(x,y)=x+y$. First, $i,j$ are positive integers, while $x,y$ are real. Second, the function is defined on the $(x,y)$-plane:

Lines parallel to axes.png

For the matrix it looks the same, but, in fact, it's “transposed” and flipped, just as in a spreadsheet: $$\begin{array}{lll} &1&2&3&...&n\\ 1\\ 2\\ ...\\ m \end{array}$$
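
Building such a matrix from its formula is a two-line computation (a sketch, assuming Python with NumPy; the indices are kept $1$-based as in the text):

```python
import numpy as np

n = 3
A = np.array([[i + j for j in range(1, n + 1)]    # a_ij = i + j
              for i in range(1, n + 1)])
print(A)
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]
```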

Algebra of matrices...

Let's concentrate on a single entry in the product $C=\{c_{pq}\}$ of matrices $A=\{a_{ij}\},B=\{b_{ij}\}$. Observe, $c_{pq}$ is the "inner product" of $p^{\rm th}$ row of $A$ and $q^{\rm th}$ column of $B$. Consider:

  • the $p^{\rm th}$ row of $A$ is (first index $p$)

$\left[ \begin{array}{} a_{p1} & a_{p2} & ... & a_{pn} \end{array} \right]$

  • $q^{\rm th}$ column of $B$ is (second index, $q$)

$$\left[ \begin{array}{} b_{1q} \\ b_{2q} \\ \vdots \\ b_{nq} \end{array} \right]$$ So, $$c_{pq} = a_{p1}b_{1q} + a_{p2}b_{2q} + ...+ a_{pn}b_{nq}.$$ Now we can write the whole thing: the matrix product of an $n \times m$ matrix $\{a_{ij}\}$ and an $m \times k$ matrix $\{b_{ij}\}$ is $C = \{c_{pq}\}$, an $n \times k$ matrix with $$c_{pq} = \sum_{i=1}^m a_{pi}b_{iq}.$$
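
The entry formula translates directly into code (a sketch, assuming Python with NumPy; the built-in product `A @ B` is used only as a check):

```python
import numpy as np

def matmul(A, B):
    """Matrix product from the entry formula: c_pq = sum over i of a_pi * b_iq."""
    n, m = A.shape
    m2, k = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, k))
    for p in range(n):
        for q in range(k):
            C[p, q] = sum(A[p, i] * B[i, q] for i in range(m))
    return C

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 1.0]])
B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 1.0]])
print(matmul(A, B))    # [[1. 4.] [4. 11.]], the same as A @ B
```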

When can we both add and multiply? Given $m \times n$ matrix $A$ and $p \times q$ matrix $B$,

  • 1. $A+B$ makes sense only if $m=p$ and $n=q$.
  • 2. $AB$ makes sense only if $n=p$.

Both make sense only if both $A,B$ are $n \times n$. These are called square matrices.

Linear functions are matrices and matrices are linear functions

Linear functions on the plane


Example (collapse on axis). The latter is the “degenerate” case such as the following. Let's consider this very simple function: $$\begin{cases}u&=2x,\\ v&=0,\end{cases} \ \leadsto\ \left[ \begin{array}{} u \\ v \end{array}\right]= \left[\begin{array}{} 2 & 0 \\ 0 & 0 \end{array} \right] \cdot \left[ \begin{array}{} x \\ y \end{array} \right]$$ Below, one can see how this function collapses the whole plane to the $x$-axis:

Linear function dim 2 eigens -- projection on x-axis.png

Meanwhile, the $x$-axis is stretched by a factor of $2$. $\square$

Example (stretch-shrink along axes). Let's consider this function: $$\begin{cases}u&=2x,\\ v&=4y.\end{cases}$$ Here, this linear function is given by the matrix: $$F=\left[ \begin{array}{ll}2&0\\0&4\end{array}\right].$$

Linear function dim 2 eigens -- stretch-shrink along axes 1.png

What happens to the rest of the plane? Since the stretching is non-uniform, the vectors turn. However, this is not a rotation but rather “fanning out”:

Linear function dim 2 eigens -- stretch-shrink along axes 2.png

$\square$

Example (stretch-shrink along axes). A slightly different function is: $$\begin{cases}u&=-x,\\ v&=4y.\end{cases} $$ It is simple because the two variables are fully separated.

Linear function dim 2 eigens -- stretch-shrink along axes.png

The slight change to the function produces a similar but different pattern: we see the reversal of the direction of the ellipse around the origin. Here, the matrix of $F$ is diagonal: $$F=\left[ \begin{array}{ll}-1&0\\0&4\end{array}\right].$$ $\square$

Example (collapse). Let's consider a more general linear function: $$\begin{cases}u&=x&+2y,\\ v&=2x&+4y,\end{cases} \ \Longrightarrow\ F=\left[ \begin{array}{cc}1&2\\2&4\end{array}\right].$$

Linear function dim 2 eigens -- projection.png

It appears that the function has a stretching in one direction and a collapse in another.

The determinant is zero: $$\det F=\det\left[ \begin{array}{cc}1&2\\2&4\end{array}\right]=1\cdot 4-2\cdot 2=0.$$ That's why there is a whole line of points $X$ with $FX=0$. To find it, we solve this equation: $$\begin{cases}x&+2y&=0,\\ 2x&+4y&=0,\end{cases} \ \Longrightarrow\ x=-2y.$$ $\square$
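
A quick check of this example (a sketch, assuming Python with NumPy): the determinant vanishes and a point on the line $x=-2y$ is sent to the origin.

```python
import numpy as np

F = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.det(F))             # 0.0 (up to rounding): the matrix is singular
print(F @ np.array([-2.0, 1.0]))    # [0. 0.]: the point (-2, 1), which satisfies x = -2y, collapses
```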

Example (stretch-shrink). Let's consider this function: $$\begin{cases}u&=-x&-2y,\\ v&=x&-4y.\end{cases} $$ Here, the matrix of $F$ is not diagonal: $$F=\left[ \begin{array}{cc}-1&-2\\1&-4\end{array}\right].$$

Linear function dim 2 eigens -- stretch-shrink 1.png

$\square$

Example (stretch-shrink). Let's consider this linear function: $$\begin{cases}u&=x&+2y,\\ v&=3x&+2y.\end{cases} $$ Here, the matrix of $F$ is not diagonal: $$F=\left[ \begin{array}{cc}1&2\\3&2\end{array}\right].$$

Linear function dim 2 eigens -- stretch-shrink.png

$\square$

Example (skewing-shearing). Consider a matrix with repeated (and, therefore, real) eigenvalues: $$F=\left[ \begin{array}{cc}-1&2\\0&-1\end{array}\right].$$ Below, we replace a circle with an ellipse to see what happens to it under such a function:

Linear function dim 2 eigens -- improper.png

There is still angular stretch-shrink but this time it is between the two ends of the same line. To see this more clearly, consider what happens to a square:

Linear function dim 2 eigens -- improper -- square.png

The plane is skewed, like a deck of cards:

Linear function dim 2 eigens -- skewed deck.png

Another example is wind blowing against a wall: its main force pushes the top in the direction of the wind while the bottom is held in place by the foundation. Such a skewing can be carried out with any image editing software such as MS Paint. $\square$

Example (rotation). Consider a rotation through $90$ degrees: $$\begin{cases}u&=& -y,\\ v&=x&&,\end{cases} \ \leadsto\ \left[ \begin{array}{} u \\ v \end{array}\right] = \left[\begin{array}{} 0&-1\\1&0 \end{array} \right] \cdot \left[ \begin{array}{} x \\ y \end{array} \right]$$

Linear function dim 2 eigens -- rotation 90.png


Consider a rotation through $45$ degrees: $$\begin{cases}u&=\cos \pi/4\ x& -\sin \pi/4\ y,\\ v&=\sin \pi/4\ x&+\cos \pi/4\ y&.\end{cases} \ \leadsto\ \left[ \begin{array}{} u \\ v \end{array}\right] = \left[\begin{array}{rrr} \cos \pi/4 & -\sin \pi/4 \\ \sin \pi/4 & \cos \pi/4 \end{array} \right] \cdot \left[ \begin{array}{} x \\ y \end{array} \right]$$

Linear function dim 2 eigens -- rotation.png

$\square$

Example (rotation with stretch-shrink). Let's consider a more complex function: $$\begin{cases}u&=3x&-13y,\\ v&=5x&+y.\end{cases} $$ Here, the matrix of $F$ is not diagonal: $$F=\left[ \begin{array}{cc}3&-13\\5&1\end{array}\right].$$

Linear function dim 2 eigens -- rotation and stretch.png

$\square$

Matrix operations

The properties of matrix multiplication are very much like the ones for numbers.

What's not working is commutativity: $$AB \neq BA,$$ generally. Indeed, let's put some numbers in two matrices and see if they “commute”: $$\begin{array}{lll} \left[\begin{array}{lll} 1&2\\ 2&3 \end{array}\right] \cdot \left[\begin{array}{lll} 2&1\\ 2&3 \end{array}\right] = \left[\begin{array}{lll} 1\cdot 2+2\cdot 2=6&\square\\ \square&\square \end{array}\right] &\text{ Now in reverse...} \\ \left[\begin{array}{lll} 2&1\\ 2&3 \end{array}\right] \cdot \left[\begin{array}{lll} 1&2\\ 2&3 \end{array}\right] = \left[\begin{array}{lll} 2\cdot 1+1\cdot 2=4&\square\\ \square&\square \end{array}\right] &\text{ ...already different!} \end{array}$$
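
The same two matrices in code (a sketch, assuming Python with NumPy):

```python
import numpy as np

A = np.array([[1, 2],
              [2, 3]])
B = np.array([[2, 1],
              [2, 3]])

print(A @ B)   # [[ 6  7]
               #  [10 11]]
print(B @ A)   # [[ 4  7]
               #  [ 8 13]]  -- different already in the first entry
```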

To understand the idea of non-commutativity, one might think about matrices as operations carried out in manufacturing. Suppose $A$ is polishing and $B$ is painting. Clearly, $$AB \neq BA.$$

Mathematically, we have: $$\begin{array}{} x \stackrel{A}{\rightarrow} y \stackrel{B}{\rightarrow} z\\ x \stackrel{B}{\rightarrow} y \stackrel{A}{\rightarrow} z \end{array}$$ Do the results have to be the same? Not in general. Let's put them together: $$ \newcommand{\ra}[1]{\!\!\!\!\!\!\!\xrightarrow{\quad#1\quad}\!\!\!\!\!} \newcommand{\da}[1]{\left\downarrow{\scriptstyle#1}\vphantom{\displaystyle\int_0^1}\right.} % \begin{array}{llllllllllll} x & \ra{A} & y \\ \da{B} & \searrow ^C & \da{B} \\ y & \ra{A} & z \end{array} $$ This is called a commutative diagram if $C$ makes sense, i.e., $C=AB=BA$.

Let's, again, think of matrices as functions. Then non-commutativity isn't surprising. Indeed, take $$f(x)=\sin x,\ g(x)=e^x.$$ Then $$\begin{array}{} gf &\neq fg ;\\ e^{\sin x} &\neq \sin e^x \end{array}$$ Functions don't commute, in general, and neither do matrices.

But some do commute. For example, $A^2=A \cdot A$ makes sense, as well as: $$A^n = A \cdot ... \cdot A$$ (where there are $n$ $A$'s). Then $$A^m \cdot A^n = A^n A^m,$$ i.e., powers commute.

Also, consider diagonal matrices $A = \{a_{ij}\}$ with $a_{ij}=0$ if $i \neq j$. Such matrices commute with each other. In particular, define $$I_n = \left[ \begin{array}{} 1 & 0 & 0 & ... \\ 0 & 1 & 0 & ... \\ \vdots \\ 0 & 0 & ... & 1 \end{array} \right]$$ or, $$a_{ij}=\left\{ \begin{array}{} 1 & i = j \\ 0 & {\rm otherwise}. \end{array} \right.$$ It is called the identity matrix.

Proposition. $IA=A$ and $AI=A$.

Provided, of course, $A$ has the appropriate dimensions:

  • $I_nA=A$ makes sense if $A$ is $n \times m$.
  • $AI_n=A$ makes sense if $A$ is $m \times n$.

Proof. Go back to the definition, for $AI_n=A$. Suppose $AI_n = \{c_{pq}\}$ and $I_n=\{b_{ij}\}$: $$c_{pq} = a_{p1}b_{1q} + a_{p2}b_{2q} + ... + a_{pn}b_{nq}.$$ But $$b_{iq} = \left\{ \begin{array}{} 1 & i=q \\ 0 & i \neq q \end{array} \right. ,$$ so the sequence $b_{1q},b_{2q},...,b_{nq}$ is $0,0,...,0,1,0,...,0$, with the $1$ in the $q^{\rm th}$ place. Substitute these: $$c_{pq}=a_{p1}\cdot 0 + a_{p2} \cdot 0 + ... + a_{p,q-1}\cdot 0 + a_{pq}\cdot 1 + a_{p,q+1} \cdot 0 + ... + a_{pn} \cdot 0 = a_{pq}.$$ The left hand side is the $pq$ entry of $AI_n$ and the right hand side is the $pq$ entry of $A$, so $AI_n = A$. The proof of the other identity $I_nA=A$ is similar. $\blacksquare$

So, $I_n$ behaves like $1$ among numbers.

Both addition and multiplication can be carried out within the set of $n \times n$ matrices, $n$ fixed.

You can take this idea quite far. We can define the exponential of a matrix $A$: $$e^A=I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + ... + \frac{1}{n!}A^n + ... $$
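
Truncating this series gives a crude way to compute $e^A$ (a sketch, assuming Python with NumPy; for serious use one would rely on a library routine instead):

```python
import numpy as np
from math import factorial

def expm_series(A, terms=20):
    """Truncated series: I + A + A^2/2! + ... + A^terms/terms!."""
    n = A.shape[0]
    result = np.eye(n)
    power = np.eye(n)
    for k in range(1, terms + 1):
        power = power @ A
        result = result + power / factorial(k)
    return result

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(expm_series(A))   # [[1. 1.] [0. 1.]]; since A^2 = 0, the series stops after two terms
```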

Theorem (Properties of matrix multiplication).

  • Associative Law 1: $(AB)C=A(BC)$.
  • Associative Law 2: $r(AB)=A(rB)=(rA)B$.
  • Distributive Law 1: $A(B+C)=AB+AC$.
  • Distributive Law 2: $(B+C)A = BA + CA$.

Proof. Suppose $$A = \{a_{ij}\},\ B=\{b_{ij}\},\ C=\{c_{ij}\},$$ and $$B+C=D=\{d_{ij}\},\ S=(B+C)A = \{s_{ij}\}.$$ Consider: $$s_{pq} = d_{p1}a_{1q} + d_{p2}a_{2q} + ... + d_{pn}a_{nq},$$ where $d_{ij}=b_{ij}+c_{ij}$. Substitute $$s_{pq} = (b_{p1}+c_{p1})a_{1q} + (b_{p2}+c_{p2})a_{2q} + ... + (b_{pn}+c_{pn})a_{nq},$$ expand and rearrange: $$\begin{array}{} &= b_{p1}a_{1q} + c_{p1}a_{1q} + ... + b_{pn}a_{nq} + c_{pn}a_{nq} \\ &= (b_{p1}a_{1q}+...+b_{pn}a_{nq}) + (c_{p1}a_{1q} + ... + c_{pn}a_{nq}) \\ &= pq \text{ entry of } BA + pq \text{ entry of } CA. \end{array}$$ So $(B+C)A = BA + CA$. $\blacksquare$

Let's rewrite this computation with $\Sigma$-notation. $$\begin{array}{} s_{pq} &= \sum_{i=1}^n d_{pi}a_{iq} \\ &= \sum_{i=1}^n(b_{pi}+c_{pi})a_{iq} \\ &= \sum_{i=1}^n(b_{pi}a_{iq} + c_{pi}a_{iq}) \\ &= \sum_{i=1}^nb_{pi}a_{iq} + \sum_{i=1}^n c_{pi}a_{iq}. \end{array}$$

Back to systems of linear equations. Recall

  • equation: $ax=b$, $a \neq 0 \rightarrow$ solve $x=\frac{b}{a}$
  • system: $AX=B$, ?? $\rightarrow$ solve $X = \frac{B}{A}$.

To do that we need to define division of matrices!

Specifically, define an analogue of the reciprocal $\frac{1}{a}$, the inverse $A^{-1}$. Both notation and meaning are of the inverse function!

But is it $X=BA^{-1}$ or $A^{-1}B$? We know these don't have to be the same! We define division via multiplication. Recall:

  • $x=\frac{1}{a}$ if $ax=1$ or $xa=1$.

Similarly,

  • $X=A^{-1}$ if $AX=I$ or $XA=I$.

Question: Which one is it? Answer: Both!

The first one is called the "right-inverse" of $A$ and the second one is called the "left-inverse" of $A$.

Definition. Given an $n \times n$ matrix $A$, its inverse is a matrix $B$ that satisfies $AB=I$, $BA=I$.

Note: "its" is used to hide the issue. It should be "an inverse" or "the inverse".

Question: Is it well defined? Existence: not always.

How do we know? As simple as this. Go to $n=1$; then matrices are numbers, and $A=0$ does not have an inverse.

Definition. If the inverse exists, $A$ is called invertible. Otherwise, $A$ is singular.

Theorem. If $A$ is invertible, then its inverse is unique.

Proof. Suppose $B,B'$ are inverses of $A$. Then (compact proof): $$B=BI=B(AB')=(BA)B'=IB'=B'.$$ With more details, take (1) $BA=I$ and (2) $AB'=I$. Multiply equation (1) by $B'$ on the right: $$(BA)B'=IB'.$$ Use associativity on the left and the property of the identity on the right: $$B(AB')=B'.$$ Use the second equation: $$BI=B'.$$ Use the property of the identity on the left: $$B=B'.$$ $\blacksquare$

So, if $A$ is invertible, then $A^{-1}$ is the inverse.

Example. What about $I$? $I^{-1}=I$, because $II=I$. $\square$

Example. Verify: $$\left[ \begin{array}{} 3 & 0 \\ 0 & 3 \end{array} \right]^{-1} = \left[ \begin{array}{} \frac{1}{3} & 0 \\ 0 & \frac{1}{3} \end{array} \right]$$ It can be easily justified without computation: $$(3I)^{-1} = \frac{1}{3}I.$$ $\square$

Generally, given $$A = \left[ \begin{array}{} a & b \\ c & d \end{array} \right],$$ find the inverse.

Start with: $$A^{-1} = \left[ \begin{array}{} x & y \\ u & v \end{array} \right].$$ Then $$\left[ \begin{array}{} a & b \\ c & d \end{array} \right] \left[ \begin{array}{} x & y \\ u & v \end{array} \right] = \left[ \begin{array}{} 1 & 0 \\ 0 & 1 \end{array} \right]$$ Solve this matrix equation. Expand: $$\left[ \begin{array}{} ax+bu & ay+bv \\ cx+du & cy+dv \end{array} \right] = \left[ \begin{array}{} 1 & 0 \\ 0 & 1 \end{array} \right]$$ Break apart: $$\begin{array}{} ax+bu=1, & ay+bv=0 \\ cx+du=0, & cy+dv=1 \end{array}$$ The equations are linear.

Exercise. Solve the system.

A shortcut formula: $$A^{-1}=\frac{1}{ad-bc} \left[ \begin{array}{} d & -b \\ -c & a \end{array} \right]$$

Let's verify: $$\begin{array}{} AA^{-1} &= \left[ \begin{array}{} a & b \\ c & d \end{array} \right] \frac{1}{ad-bc} \left[ \begin{array}{} d & -b \\ -c & a \end{array} \right] \\ &= \frac{1}{ad-bc}\left[ \begin{array}{} a & b \\ c & d \end{array} \right] \left[ \begin{array}{} d & -b \\ -c & a \end{array} \right] \\ &= \frac{1}{ad-bc} \left[ \begin{array}{} ad-bc & ab-ba \\ cd - dc & -bc + da \end{array} \right] \\ &= \left[ \begin{array}{} 1 & 0 \\ 0 & 1 \end{array} \right] = I \end{array}$$ $\square$

Exercise. Verify also $A^{-1}A=I$.
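
The shortcut formula in code, checked against the library inverse (a sketch, assuming Python with NumPy):

```python
import numpy as np

def inverse_2x2(A):
    """Shortcut formula: A^{-1} = 1/(ad - bc) * [[d, -b], [-c, a]]."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: ad - bc = 0")
    return (1.0 / det) * np.array([[d, -b],
                                   [-c, a]])

A = np.array([[1.0, 1.0],
              [2.0, 3.0]])
print(inverse_2x2(A))          # [[ 3. -1.] [-2.  1.]], the same as np.linalg.inv(A)
print(A @ inverse_2x2(A))      # the identity matrix, as it should be
```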

To find $A^{-1}$, solve a system. And it might not have a solution: $\frac{1}{ad-bc} = ?$.

Fact: For $$A=\left[ \begin{array}{} a & b \\ c & d \end{array} \right],$$ the inverse $A^{-1}$ exists if and only if $ad-bc \neq 0$. Here, $D=ad-bc$ is called the determinant of $A$.

Example. $$0 = \left[ \begin{array}{} 0 & 0 \\ 0 & 0 \end{array} \right]$$ is singular, by formula. $\square$

In dimension $n$, the zero matrix is singular: $0 \cdot B = 0$, not $I$. Then, just like with real numbers, $\frac{1}{0}$ is undefined.

Example. Other singular matrices: $$\left[ \begin{array}{} 1 & 1 \\ 1 & 1 \end{array} \right], \left[ \begin{array}{} 0 & 0 \\ 0 & 1 \end{array} \right], \left[ \begin{array}{} 1 & 3 \\ 1 & 3 \end{array} \right]$$

What do they have in common?

What if we look at them as pairs of vectors: $$\left\{ \left[ \begin{array}{} 1 \\ 1 \end{array} \right], \left[ \begin{array}{} 1 \\ 1 \end{array} \right] \right\}, \left\{ \left[ \begin{array}{} 0 \\ 0 \end{array} \right], \left[ \begin{array}{} 0 \\ 1 \end{array} \right] \right\}, \left\{ \left[ \begin{array}{} 1 \\ 1 \end{array} \right], \left[ \begin{array}{} 3 \\ 3 \end{array} \right] \right\}$$

What's so special about them as opposed to this: $$\left\{ \left[ \begin{array}{} 1 \\ 0 \end{array} \right], \left[ \begin{array}{} 0 \\ 1 \end{array} \right] \right\}$$ The vectors are linearly dependent! $\square$

Theorem. If $A$ is invertible, then so is $A^{-1}$. And $({A^{-1}})^{-1}=A$.

Proof. $\blacksquare$

Theorem. If $A,B$ are invertible, then so is $AB$, and $$(AB)^{-1} = B^{-1}A^{-1}.$$

This can be explained by thinking about matrices as functions:

InverseOfProduct.png

Proof. $\blacksquare$

Determinants

Given a matrix or a linear operator $A$, it is either singular or non-singular: $${\rm ker \hspace{3pt}}A \neq 0 {\rm \hspace{3pt} or \hspace{3pt}} {\rm ker \hspace{3pt}}A=0$$

If the kernel is zero, then $A$ is invertible. Recall, matrix $A$ is singular when its column vectors are linearly dependent ($i^{\rm th}$ column of $A = A(e_i)$).

The goal is to find a function that determines whether $A$ is singular. It is called the determinant.

Specifically, we want: $${\rm det \hspace{3pt}}A = 0 {\rm \hspace{3pt} iff \hspace{3pt}} A {\rm \hspace{3pt} is \hspace{3pt} singular}.$$

Start with dimension $2$. $A = \left[ \begin{array}{} a & b \\ c & d \end{array} \right]$ is singular when $\left[ \begin{array}{} a \\ c \end{array} \right]$ is a multiple of $\left[ \begin{array}{} b \\ d \end{array} \right]$

Let's consider that: $$\left[ \begin{array}{} a \\ c \end{array} \right] = x\left[ \begin{array}{} b \\ d \end{array} \right] \rightarrow \begin{array}{} a = bx \\ c = dx \\ \end{array}$$ So $x$ has to exist. Then, $$x=\frac {a}{b} = \frac{c}{d}.$$ That's the condition. (Is there a division by $0$ here?)

Rewrite as: $ad-bc=0 \longleftrightarrow A$ is singular, and we want $\det A =0$. This suggests that $${\rm det \hspace{3pt}}A = ad-bc.$$

Let's make this the definition.

Theorem. A $2\times 2$ matrix $A$ is singular iff $\det A=0$.

Proof. ($\Rightarrow$) Suppose $\left[ \begin{array}{} a & b \\ c & d \end{array} \right]$ is singular, then $\left[ \begin{array}{} a \\ c \end{array} \right] = x \left[ \begin{array}{} b \\ d \end{array} \right]$, then $a=xb, c=xd$, then substitute:

$${\rm det \hspace{3pt}}A = ad-bc = (xb)d - b (xd)=0.$$

($\Leftarrow$) Suppose $ad-bc=0$, then let's find $x$, the multiple.

  • Case 1: assume $b \neq 0$, then choose $x = \frac{a}{b}$. Then

$$\begin{array}{} xb &= \frac{a}{b}b &= a \\ xd &= \frac{a}{b}d &= \frac{ad}{b} = \frac{bc}{b} = c. \end{array}$$ So $$x \left[ \begin{array}{} b \\ d \end{array} \right] = \left[ \begin{array}{} a \\ c \end{array} \right].$$

  • Case 2: assume $a \neq 0$, etc. $\blacksquare$

We defined $ \det A$ with the requirement that $ \det A = 0$ iff $A$ is singular.

But $ \det A$ could be $k(ad-bc)$, $k \neq 0$. Why $ad-bc$?

Because...

  • Observation 1: $ \det I_2=1$
  • Observation 2: Each entry of $A$ appears only once.

What if we interchange rows or columns? $$ \det \left[ \begin{array}{} c & d \\ a & b \end{array} \right] = cb-ad = - \det \left[ \begin{array}{} a & b \\ c & d \end{array} \right].$$ Then the sign changes. So

  • Observation 3: $ \det A=0$ is preserved under this elementary row operation.

Moreover, $ \det A$ is preserved up to a sign! (later)

  • Observation 4: If $A$ has a zero row or column, then $ \det A=0$.

To be expected -- we are talking about linear independence!

  • Observation 5: $ \det A^T= \det A$.

$$ \det \left[ \begin{array}{} a & c \\ b & d \end{array} \right] = ad-bc$$

  • Observation 6: Signs of the terms alternate: $+ad-bc$.
  • Observation 7: $ \det A \colon {\bf M}(2,2) \rightarrow {\bf R}$ is linear... NOT!

$$\begin{array}{} \det 3\left[ \begin{array}{} a & b \\ c & d \end{array} \right] &= \det \left[ \begin{array}{} 3a & 3b \\ 3c & 3d \end{array} \right] \\ &= 3a3d-3b3c \\ &= 9(ad-bc) \\ &= 9 \det \left[ \begin{array}{} a & b \\ c & d \end{array} \right] \end{array}$$ not linear, but quadratic. Well, not everything in linear algebra has to be linear...

Let's try to step up from dimension 2 with this simple matrix:

$$B = \left[ \begin{array}{} e & 0 & 0 \\ 0 & a & b \\ 0 & c & d \end{array} \right]$$

We, again, want to define ${\rm det \hspace{3pt}}B$ so that ${\rm det \hspace{3pt}}B=0$ if and only if $B$ is singular, i.e., the columns are linearly dependent.

Question: What does the value of $e$ do to the linear dependence?

  • Case 1, $e=0 \rightarrow B$ is singular. So, $e=0 \rightarrow {\rm det \hspace{3pt}}B=0$.
  • Case 2, $e \neq 0 \rightarrow$ the columns are the vectors:

$$v_1=\left[ \begin{array}{} e \\ 0 \\ 0 \end{array} \right],\ v_2=\left[ \begin{array}{} 0 \\ a \\ c \end{array} \right],\ v_3=\left[ \begin{array}{} 0 \\ b \\ d \end{array} \right],$$ and $v_1$ is not a linear combination of the other two, $v_2,v_3$.

So, we only need to consider the linear independence of those two, $v_2,v_3$!

Observe that $v_2,v_3$ are linearly independent if and only if ${\rm det \hspace{3pt}}A \neq 0$, where $$A = \left[ \begin{array}{} a & b \\ c & d \end{array} \right]$$

Two cases together: $e=0$ or ${\rm det \hspace{3pt}}A=0 \longleftrightarrow B$ is singular.

So it makes sense to define: $${\rm det \hspace{3pt}}B = e \cdot {\rm det \hspace{3pt}}A$$ With that we have: ${\rm det \hspace{3pt}}B=0$ if and only if $B$ is singular.

Let's review the observations above...

  • (1) $\det I_3 = 1$.
  • (4), (5) still hold.
  • (7) still not linear: $\det (2B) = 8 \det B.$

It's cubic.

So far so good...

Now let's give the definition of ${\rm det}$ in dimension three.

Definition in dimension 3 via that in dimension 2...

We define via "expansion along the first row."

$A = $ 3DDeterminant.png

$$\det A = a \det \left[ \begin{array}{} e & f \\ h & i \end{array} \right] - b \det \left[ \begin{array}{} d & f \\ g & i \end{array} \right] + c \det \left[ \begin{array}{} d & e \\ g & h \end{array} \right]$$ Observe that the first term is familiar from before.

3DDeterminantExplained.png

Expand further using the formula for the $2 \times 2$ determinant: $$\begin{array}{} {\det} _1 A &= a(ei-fh) - b(di-fg)+c(dh-eg) \\ &= (aei+bfg+cdh)-(afh+bdi+ceg) \end{array}$$

Observe:

  • each entry appears twice -- once with $+$, once with $-$.

3DDeterminantExplained2.png

Prior observation:

  • in each term (in determinant) every column appears exactly once, as does every row.

The determinant helps us with problems from before.

Example. Given $(1,2,3),(-1,0,2),(2,-2,1)$. Are they linearly independent? We need just yes or no; no need to find the actual dependence.

Note: this is similar to discriminant that tells us how many solutions a quadratic equation has:

  • $D<0 \rightarrow 0$ real solutions,
  • $D=0 \rightarrow 1$ solution,
  • $D>0 \rightarrow 2$ solutions.

Consider: $$\begin{array}{} \det \left[ \begin{array}{} 1 & -1 & 2 \\ 2 & 0 & -2 \\ 3 & 2 & 1 \end{array} \right] &= 1 \cdot 0 \cdot 1 + (-1)(-2)3 + 2\cdot 2 \cdot 2 - 3 \cdot 0 \cdot 2 - 1(-2)2 - 2(-1)1 \\ &= 6 + 8 + 4 + 2 \\ &= 20 \neq 0. \end{array}$$ So, they are linearly independent.
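
The same computation via the expansion along the first row (a sketch, assuming Python with NumPy):

```python
import numpy as np

def det2(M):
    """Determinant of a 2 x 2 matrix: ad - bc."""
    return M[0, 0] * M[1, 1] - M[0, 1] * M[1, 0]

def det3_first_row(A):
    """Expansion along the first row with alternating signs."""
    minor = lambda j: np.delete(np.delete(A, 0, axis=0), j, axis=1)
    return sum((-1) ** j * A[0, j] * det2(minor(j)) for j in range(3))

A = np.array([[1.0, -1.0,  2.0],
              [2.0,  0.0, -2.0],
              [3.0,  2.0,  1.0]])
print(det3_first_row(A))   # 20.0, nonzero: the columns are linearly independent
```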

Back to matrices.

Recall the definition, for:

3DDeterminant.png

$A = \left[ \begin{array}{} a & b & c \\ d & e & f \\ g & h & i \end{array} \right]$ of expansion along the first row:

(*) $$\begin{array}{} {\det} _1A &= a \det \left[ \begin{array}{} e & f \\ h & i \end{array} \right] - b \det \left[ \begin{array}{} d & f \\ g & i \end{array} \right]+ c \det \left[ \begin{array}{} d & e \\ g & h \end{array} \right] \\ &=(aei+bfg+cdh)-(afh+bdi+ceg) \end{array}$$ Let's try the expansion along the second row.

3DDeterminantExplained3.png

(**) $$\begin{array}{} {\det} _2A &= d \det \left[ \begin{array}{} b & c \\ h & i \end{array} \right] - e \det \left[ \begin{array}{} a & c \\ g & i \end{array} \right] + f \det \left[ \begin{array}{} a & b \\ g & h \end{array} \right] \\ &=d(bi-ch) - e(ai-cg) + f(ah-bg) \\ &=dbi+\ldots \end{array}$$ Let's try to match $(*)$ and $(**)$.

Wrong sign! But it's wrong for all terms, fortunately!

To fix the formula, flip the signs, $$\begin{array}{} {\det}_{2} A &= -d \det \left[ \begin{array}{} b & c \\ h & i \end{array} \right] + e \det \left[ \begin{array}{} a & c \\ g & i \end{array} \right] - f \det \left[ \begin{array}{} a & b \\ g & h \end{array} \right] \end{array}$$

Conclusion: the expansion formula depends on the row chosen.

But it's the same procedure:

  • $3$ determinants, $2 \times 2$,
  • with coefficients from the chosen row, each paired with the determinant of what's left after cutting out that row and column,
  • signs alternate.

The difference is where the alternating signs start: with $+$ or $-$, depending on the row chosen.


Vector fields

Work along a path.png

Gravity

A familiar problem about a ball thrown in the air has a solution: its trajectory is a parabola. However, we also know that if we throw really, really hard (like a rocket), the ball will start to orbit the Earth following an ellipse.

Ball parabola or ellipse.png

The motion of two planets (or a star and a planet, or a planet and a satellite, etc.) is governed by a single force: the gravity. Recall how this force operates.

Newton's Law of Gravity: The force of gravity between two objects is given by the formula: $$F = G \frac{mM}{r^2},$$ where:

  • $F$ is the force between the objects;
  • $G$ is the gravitational constant;
  • $m$ is the mass of the first object;
  • $M$ is the mass of the second object;
  • $r$ is the distance between the centers of the masses.

or, in the vector form (with the first object located at the origin): $$F=-G mM\frac{X}{||X||^3}.$$
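
The vector form is a one-line function (a sketch, assuming Python with NumPy; the value of $G$ is the approximate SI constant and the masses and position below are made up):

```python
import numpy as np

G = 6.674e-11   # gravitational constant, N m^2 / kg^2 (approximate)

def gravity(m, M, X):
    """Force on the second object at position X, with the first object at the origin:
    F = -G m M X / ||X||^3, pointing back toward the origin."""
    r = np.linalg.norm(X)
    return -G * m * M * X / r**3

X = np.array([3.0e6, 4.0e6, 0.0])                         # r = ||X|| = 5e6 meters
F = gravity(10.0, 5.0e24, X)
print(np.linalg.norm(F))                                  # magnitude agrees with G m M / r^2
print(np.linalg.norm(F) - G * 10.0 * 5.0e24 / 5.0e6**2)   # essentially 0
```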

This is what we know.

  • When the Earth is seen as “large” in comparison to the size of the trajectory, the gravity forces are assumed to be parallel in all locations (the orbit is a parabola).
  • When the Earth is seen as “small” in comparison to the size of the trajectory, the gravity forces are assumed to point radially toward that point; the orbit may be an ellipse, or a hyperbola, or a parabola.
Ball parabola or ellipse -- gravity.png

When the size and, therefore, the shape of the Earth matter, things get complicated...

This substitution, for the simplest case of a perfectly spherical Earth, is justified below.