Affine approximation

From Mathematics Is A Science

Consider the idea of "linear approximation", or more precisely, "affine approximation", in dimension $1$ and then see how it translates into dimension $2$ (see Functions of several variables).

As the picture shows, a curve is approximated by a straight line while a surface by a plane.

Affine approximation dim 1 and 2.jpg

Dim $1$: given a function $f: {\bf R} \rightarrow {\bf R}$,

an affine function $l: {\bf R} {\rightarrow} {\bf R}$ is its approximation.

Dim $n$ parametric curve: given a function $f: {\bf R} {\rightarrow} {\bf R}^n$,

an affine function $l: {\bf R} {\rightarrow} {\bf R}^n$ is its approximation.

Dim $n$ function of several variables: given a function $f: {\bf R}^n {\rightarrow} {\bf R}$, with graph $\subset {\bf R}^n \times {\bf R} = {\bf R}^{n+1}$,

an affine function $l: {\bf R}^n {\rightarrow} {\bf R}$ is its approximation.

The graph of the last item is a plane for $n = 2$ and, in general, a hyperplane in ${\bf R}^n \times {\bf R}$.

The idea comes from the fact that if you zoom in on the graph of a differentiable function, it looks like a straight line.

Affine approximation zooming in.jpg
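
The zooming-in idea can be checked numerically. The following is a minimal sketch, assuming the illustrative choice $f = \sin$ at $a = 1$ (neither is from the text above) and a hypothetical helper `tangent_line`. It measures the gap between the graph and its tangent line relative to the width $h$ of the window.

```python
import math

def tangent_line(f, df, a):
    """Return the affine function l(x) = f(a) + df(a) * (x - a)."""
    return lambda x: f(a) + df(a) * (x - a)

# Illustrative choice (an assumption, not from the text): f = sin, a = 1.
f, df, a = math.sin, math.cos, 1.0
l = tangent_line(f, df, a)

# Gap between graph and line, relative to the window width h.
ratios = [abs(f(a + h) - l(a + h)) / h for h in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The ratios shrink as $h$ does, which is what "looks like a straight line" means: the gap vanishes faster than the window.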

Example. Let

$$f(x) = | x |.$$

Then

$${\nabla}_1 f(0) = 1,$$

$${\nabla}_{-1} f(0) = 1,$$

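as one-sided directional derivatives, i.e., they are not aligned: if $f$ were differentiable at $0$, we would have ${\nabla}_{-1} f(0) = -{\nabla}_1 f(0)$. Hence $f'(0)$ does not exist as there is no affine approximation.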
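
The two one-sided quotients for $f(x) = |x|$ can be computed directly. This is a small sketch, assuming a hypothetical helper `one_sided_quotient` with a positive step $h$; for a function differentiable at $0$ the quotient in the direction $-1$ would approach the negative of the one in the direction $1$.

```python
def one_sided_quotient(f, a, v, h):
    """Difference quotient (f(a + h*v) - f(a)) / h for a step h > 0 in direction v."""
    return (f(a + h * v) - f(a)) / h

right = one_sided_quotient(abs, 0.0, 1.0, 1e-6)   # direction +1
left = one_sided_quotient(abs, 0.0, -1.0, 1e-6)   # direction -1
print(right, left)  # both equal 1.0, so left is not -right
```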

Recall, an affine function $L: {\bf R}^n {\rightarrow} {\bf R}^m$ (graph is a hyperplane) can be written as

$$L(x) = u_0 + A(x),$$

where $u_0 \in {\bf R}^m$, $x \in {\bf R}^n$, and $A: {\bf R}^n {\rightarrow} {\bf R}^m$ is linear. The map $A$ is given by a matrix of dimension $m \times n$: $A(x) = Ax$.

Example. Consider the above case with $m = 1$. Then

$$L(x) = u_0 + A(x), $$

where $A$ is of dimension $1 \times n$, $x \in {\bf R}^n$ is of dimension $n \times 1$, and $u_0 \in {\bf R}$ is a number. In this special case, $A$ is a vector and

$$Ax = A \cdot x {\rm \hspace{3pt} (inner \hspace{3pt} product)}.$$


Example. Consider the above case with $m = 1$ and $n = 3$. Let further $A = ( 1, 2, 3 )$ be a linear map and $u_0 = 5$. Then

$$\begin{array}{rl} L(x) &= 5 + ( 1, 2, 3 ) \cdot ( x_1, x_2, x_3 ) \\ &= 5 + x_1 + 2x_2 + 3x_3, \end{array}$$

whose graph is a hyperplane in ${\bf R}^3 \times {\bf R}$.
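
The form $L(x) = u_0 + Ax$ can be sketched in code. The helper `affine` below is hypothetical and stores $A$ as a list of $m$ rows of length $n$ and $u_0$ as a list of length $m$; it is evaluated on the example with $u_0 = 5$ and $A = ( 1, 2, 3 )$, read as a $1 \times 3$ matrix.

```python
def affine(u0, A):
    """Return L(x) = u0 + A x, with A a list of m rows of length n and u0 a list of length m."""
    def L(x):
        return [u + sum(a_ij * x_j for a_ij, x_j in zip(row, x))
                for u, row in zip(u0, A)]
    return L

# The example above: u0 = 5, A = (1, 2, 3) as a 1 x 3 matrix.
L = affine([5], [[1, 2, 3]])
print(L([1, 1, 1]))  # 5 + 1 + 2 + 3
print(L([0, 0, 0]))  # the constant term u0
```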

Definition. Let $f: {\bf R}^n {\rightarrow} {\bf R}^m$ and let $a$ be an interior point of the domain $D(f)$. The best affine approximation of $f$ at $a$ is an affine function $T: {\bf R}^n {\rightarrow} {\bf R}^m$ satisfying the conditions:

  1. $f(a) = T(a)$;
  2. $\displaystyle\lim_{x \rightarrow a} \frac{f(x) - T(x)}{|| x - a ||} = 0.$

Note 1: The best affine approximation is well defined. Why?

Note 2: Condition (2) implies

$$\displaystyle\lim_{x \rightarrow a} ( f(x) - T(x) ) = 0,$$

since the numerator in (2) must vanish faster than the denominator. Then

$$f(a) - T(a) = 0$$

if $f$ and $T$ are continuous at $a$, hence

$$f(a) = T(a),$$

which is condition (1).


Affine approximations as tangents.jpg

Let $f: {\bf R}^n {\rightarrow} {\bf R}^m$. What is the form of $T$?

$$T(x) = z + L( x - a ),$$

where $z$ is constant and $L$ is linear. Further,

$$\begin{array}{} f(a) &= T(a) {\rm \hspace{3pt} (by \hspace{3pt}} (1) ) \\ &= z + L( a - a ) \\ &= z + L(0) = z, \end{array}$$

so

$$T(x) = f(a) + L( x - a ) {\rm \hspace{3pt} for \hspace{3pt} each \hspace{3pt}} a,$$

or, writing $L_a$ to indicate that the linear map depends on $a$,

$$T(x) = f(a) + L_a( x - a ) {\rm \hspace{3pt} for \hspace{3pt} each \hspace{3pt}} a,$$

where the first term is constant and the second is a linear map evaluated at $x-a$. This linear map $L_a$ is called the total derivative of $f$ at $x = a$.

Affine approximations for paraboloid.jpg

Example. Let

$$f( x_1, x_2 ) = x_1^2 + x_2^2 {\rm \hspace{3pt} and \hspace{3pt}} a = ( 1, 2 ). $$

The graph is a paraboloid. Check that

$$T(x) = T( x_1, x_2 ) = 5 + 2 ( x_1 - 1 ) + 4 ( x_2 - 2 )$$

is the best affine approximation. We verify both conditions of the definition:

$$\begin{array}{rl} (1) \quad T(a) &= T( 1, 2 ) \\ &= 5 + 2 ( 1 - 1 ) + 4 ( 2 - 2 ) \\ &= 5 + 0 + 0 \\ &= 5 \\ &= f( 1, 2 ), {\rm \hspace{3pt} as} \end{array}$$

$$f( 1, 2 ) = 1^2 + 2^2 = 5,$$

$$(2) \quad \frac{f(x) - T(x)}{|| x - a ||} = \frac{( x_1^2 + x_2^2 ) - ( 5 + 2 ( x_1 - 1 ) + 4 ( x_2 - 2 ) )}{(( x_1 - 1 )^2 + ( x_2 - 2 )^2 )^{\frac{1}{2}}},$$

where both the numerator and the denominator approach $0$ as $x {\rightarrow} a$.

Then, rewriting $x_1^2 = ( x_1 - 1 )^2 + 2x_1 - 1$ and $x_2^2 = ( x_2 - 2 )^2 + 4x_2 - 4$ in the numerator and simplifying, we obtain

$$\begin{array}{rl} \frac{f(x) - T(x)}{|| x - a ||} &= \frac{( ( x_1 - 1 )^2 + 2x_1 - 1 ) + ( ( x_2 - 2 )^2 + 4x_2 - 4 ) - 5 - 2 ( x_1 - 1 ) - 4 ( x_2 - 2 )}{\sqrt{( x_1 - 1 )^2 + ( x_2 - 2 )^2}} \\ &= \frac{( x_1 - 1 )^2 + 2x_1 - 1 + ( x_2 - 2 )^2 + 4x_2 - 4 - 5 - 2x_1 + 2 - 4x_2 + 8}{\sqrt{( x_1 - 1 )^2 + ( x_2 - 2 )^2}} \\ &= \frac{( x_1 - 1 )^2 + ( x_2 - 2 )^2}{( ( x_1 - 1 )^2 + ( x_2 - 2 )^2 )^{\frac{1}{2}}} \\ &= ( ( x_1 - 1 )^2 + ( x_2 - 2 )^2 )^{\frac{1}{2}} \\ &{\rightarrow} 0 {\rm \hspace{3pt} as \hspace{3pt}} x_1 {\rightarrow} 1, x_2 {\rightarrow} 2. \end{array}$$

So, $T(x) = 5 + 2 ( x_1 - 1 ) + 4 ( x_2 - 2 )$ is the best affine approximation of $f( x_1, x_2 ) = x_1^2 + x_2^2$ around $a = ( 1, 2 )$.
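
Condition (2) for this example can also be checked numerically. This is a minimal sketch under an assumed, arbitrary direction of approach $( 1, -1 )$; it evaluates the ratio $\frac{f(x) - T(x)}{|| x - a ||}$ at points closer and closer to $a = ( 1, 2 )$.

```python
import math

def f(x1, x2):
    return x1**2 + x2**2

def T(x1, x2):
    return 5 + 2 * (x1 - 1) + 4 * (x2 - 2)

a = (1.0, 2.0)
direction = (1.0, -1.0)  # arbitrary direction of approach, chosen for illustration

dists, ratios = [], []
for h in (1e-1, 1e-2, 1e-3):
    x1, x2 = a[0] + h * direction[0], a[1] + h * direction[1]
    dist = math.hypot(x1 - a[0], x2 - a[1])        # ||x - a||
    dists.append(dist)
    ratios.append((f(x1, x2) - T(x1, x2)) / dist)  # the ratio in condition (2)

for dist, ratio in zip(dists, ratios):
    print(dist, ratio)
```

Up to rounding, each ratio equals the distance $|| x - a ||$ itself, in agreement with the computation above, and so tends to $0$.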

Then the total derivative satisfies

$$L_a( x - a ) = L_{(1,2)} ( x_1 - 1, x_2 - 2 ) = 2 ( x_1 - 1 ) + 4 ( x_2 - 2 ).$$

Hence,

$$L_{(1,2)}( u_1, u_2 ) = 2u_1 + 4u_2$$

is the total derivative.

Notation: We write $f'(a)$ for $L_a$ and, with $a = ( 1, 2 )$,

$f'(a)( u_1, u_2 )$ for $L_{(1,2)}( u_1, u_2 )$, where $u_1, u_2$ are the variables of $L_a$.
Total derivative gradient.jpg

Note that $f'(a)( u_1, u_2 )$ is linear with respect to $u_1, u_2$, not to $a$, and

$f'(a)$ is the name of the function,
$u_1, u_2$ are the variables of the function.
Partial derivatives as directional.jpg

Further,

$$u_1 = x_1 - 1,$$

$$u_2 = x_2 - 2.$$

Then with $f( x_1, x_2 ) = x_1^2 + x_2^2$, we obtain

$$\frac{\partial f}{\partial x_1} ( x_1, x_2 ) \Big|_{(1,2)} = 2x_1 \Big|_{(1,2)} = 2 \cdot 1 = 2,$$

$$\frac{\partial f}{\partial x_2} ( x_1, x_2 ) \Big|_{(1,2)} = 2x_2 \Big|_{(1,2)} = 2 \cdot 2 = 4.$$

With partial derivatives equal to $2$ and $4$, we obtain

$$\begin{array}{} f'( 1, 2) ( u_1, u_2 ) &= 2u_1 + 4u_2 \\ &= ( 2, 4 ) \cdot ( u_1 , u_2 ) \\ &= {\nabla}f( 1, 2 ) \cdot ( u_1 , u_2 ) {\rm \hspace{3pt} (as \hspace{3pt}} {\nabla}f( 1, 2 ) = ( 2, 4 )) \end{array}$$

which is linear ($f'( 1, 2)$ is a linear map).

Claim: $f'(a)(u) = {\nabla}f(a) \cdot u$, i.e. a computational formula for the total derivative. Note that $f'(a)(u)$ is defined via properties (1) and (2), and ${\nabla}f(a)$ is the gradient.
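
The claimed formula suggests a way to compute the total derivative. The sketch below assumes that the gradient is estimated by central differences (a numerical stand-in, not the definition via properties (1) and (2)); the helpers `gradient` and `total_derivative` are hypothetical names.

```python
def gradient(f, a, h=1e-6):
    """Central-difference estimate of the gradient of f at the point a."""
    grad = []
    for k in range(len(a)):
        up = list(a)
        up[k] += h
        down = list(a)
        down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def total_derivative(f, a):
    """Return the linear map u -> grad f(a) . u (the claimed formula)."""
    g = gradient(f, a)
    return lambda u: sum(g_k * u_k for g_k, u_k in zip(g, u))

f = lambda x: x[0]**2 + x[1]**2
grad_at_a = gradient(f, [1.0, 2.0])        # approximately (2, 4)
dfa = total_derivative(f, [1.0, 2.0])
print(grad_at_a, dfa([1.0, 1.0]))          # dfa(1, 1) is approximately 2 + 4
```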

Partial derivatives and tangent plane.jpg

Here we have a function

$$z = f(x), \quad x \in {\bf R}^n, \quad z \in {\bf R}.$$

For $n = 2$, its partial derivatives are computed and interpreted as the slopes of tangent lines to the graph of $f$ within the corresponding vertical planes. Together, these two lines span a plane, called the tangent plane.

Differentiability.jpg

Theorem. Suppose $a$ is an interior point of $D(f)$ with $f: {\bf R}^n {\rightarrow} {\bf R}$. Further suppose the partial derivatives $\frac{\partial f}{\partial x_k}$, $k = 1, \ldots, n$, exist and are continuous on an open ball centered at $a$. Then $f'(a)$ exists, i.e., $f$ is differentiable at $a$.

So, if the tangent line exists, it is the best affine approximation. If no tangent line exists, as for $f(x) = | x |$ at $0$, there is no best affine approximation.

Tangent plane and line.jpg

Example. Let $f( x, y ) = x^2$.

Consider $g(x) = x^2$.

What is the relation between $f'( x, y )$ and $g'(x)$?

$$\frac{\partial f}{\partial x} = g'(x),$$

$$\frac{\partial f}{\partial y} = 0.$$
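
The two relations can be confirmed at a sample point. This is a small sketch, assuming an arbitrary point $( 3, -1 )$ and a hypothetical central-difference helper `partial`.

```python
def partial(f, k, point, h=1e-6):
    """Central-difference estimate of the k-th partial derivative of f at point."""
    up = list(point)
    up[k] += h
    down = list(point)
    down[k] -= h
    return (f(*up) - f(*down)) / (2 * h)

f = lambda x, y: x**2       # does not depend on y
g_prime = lambda x: 2 * x   # derivative of g(x) = x^2

x0, y0 = 3.0, -1.0          # arbitrary sample point (an assumption)
df_dx = partial(f, 0, (x0, y0))
df_dy = partial(f, 1, (x0, y0))
print(df_dx, g_prime(x0), df_dy)
```

Since $f$ ignores $y$, the second partial derivative vanishes and the first agrees with $g'(x)$ up to rounding.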