Chain rule of differentiation
The derivative of the composition is equal to the product of the derivatives
For vector functions: the derivative of the composition is equal to the composition of the derivatives (as linear maps), i.e., the product of their matrices.
Theorem (Chain Rule). Suppose $h(x) = g( f(x) )$, where
$$f: {\bf R}^n {\rightarrow} {\bf R}^m, g: {\bf R}^m {\rightarrow} {\bf R}^q, h: {\bf R}^n {\rightarrow} {\bf R}^q. $$
Suppose further that
$f$ is differentiable at $x = a$ and
$g$ is differentiable at $y = f(a)$.
Then
$h'(a) = g'( f(a) ) f'(a)$.
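The matrix form of the theorem can be checked numerically. The sketch below is only an illustration: the maps `f` and `g`, the point `a`, and the helpers `jacobian` and `matmul` are hypothetical choices that do not come from the text, and the derivatives are estimated by central differences rather than computed exactly.

```python
import math

# Hypothetical example maps (not from the text): f: R^2 -> R^2, g: R^2 -> R^1.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def g(y):
    return [y[0] ** 2 + y[1]]

def h(x):
    return g(f(x))

def jacobian(func, p, eps=1e-6):
    """Central-difference estimate of the Jacobian matrix of func at p."""
    rows = len(func(p))
    J = [[0.0] * len(p) for _ in range(rows)]
    for j in range(len(p)):
        hi, lo = list(p), list(p)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = func(hi), func(lo)
        for i in range(rows):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2 * eps)
    return J

def matmul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

a = [1.0, 2.0]
lhs = jacobian(h, a)                              # estimate of h'(a), a 1x2 matrix
rhs = matmul(jacobian(g, f(a)), jacobian(f, a))   # estimate of g'(f(a)) f'(a)

# Largest entrywise difference between the two sides of the Chain Rule.
max_diff = max(abs(l - r)
               for row_l, row_r in zip(lhs, rhs)
               for l, r in zip(row_l, row_r))
print(max_diff)
```

Both sides are finite-difference estimates, so they agree only up to a small numerical error, not exactly.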
Proof
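Let
$f: {\bf R}^n {\rightarrow} {\bf R}^m$ be differentiable at $x = a$,
$g: {\bf R}^m {\rightarrow} {\bf R}^q$ be differentiable at $y = b = f(a)$.
Then
$h = g {\circ} f$ is differentiable at $x = a$,
and the derivative of the composition is equal to the composition of the derivatives.
By definition, these three differentiability statements mean the existence of the best affine approximations: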
(1) $T_f(x) = f(a) + f'(a) ( x - a), \frac{f(x) - T_f(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$,
(2) $T_g(y) = g(b) + g'(b) ( y - b), \frac{g(y) - T_g(y)}{|| y - b ||} {\rightarrow} 0$ as $y {\rightarrow} b$.
Now we want to find the best affine approximation of the composition:
(3) $T_h(x) = h(a) + h'(a) ( x - a ), \frac{h(x) - T_h(x)}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$.
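The defining property of a best affine approximation, the error ratio tending to zero, can be observed numerically. The sketch below does this for relation (1) only. The map `f`, the point `a`, and the hand-computed matrix `df_a` are hypothetical choices for illustration and are not taken from the text.

```python
import math

# Hypothetical example map (not from the text): f: R^2 -> R^2.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

a = [1.0, 2.0]
# f'(a) as a matrix, computed by hand: each row is the gradient of one component at a.
df_a = [[a[1], a[0]],
        [math.cos(a[0]), 0.0]]

def T_f(x):
    """Affine approximation T_f(x) = f(a) + f'(a)(x - a)."""
    d = [x[0] - a[0], x[1] - a[1]]
    fa = f(a)
    return [fa[i] + df_a[i][0] * d[0] + df_a[i][1] * d[1] for i in range(2)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def error_ratio(x):
    """|| f(x) - T_f(x) || / || x - a ||, the quantity in relation (1)."""
    diff = [p - q for p, q in zip(f(x), T_f(x))]
    d = [x[0] - a[0], x[1] - a[1]]
    return norm(diff) / norm(d)

# Approach a along one direction with shrinking step sizes.
ratios = [error_ratio([a[0] + t, a[1] + t]) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios should shrink as the step shrinks. This samples a single direction of approach, so it illustrates the limit in (1) but does not prove it.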
Idea: Prove that $T_g {\circ} T_f$ satisfies (3), i.e., that it is the best affine approximation of $h$.
(1) => Let $E_f(x) = \frac{f(x) - T_f(x)}{|| x - a ||}, E_f(x) {\rightarrow} 0$ as $x {\rightarrow} a$. Then
$$f(x) = T_f(x) + E_f(x) || x - a ||.$$
(2) => Let $E_g(y) = \frac{g(y) - T_g(y)}{|| y - b ||}, E_g(y) {\rightarrow} 0$ as $y {\rightarrow} b$. Then
$$g(y) = T_g(y) + E_g(y) || y - b ||.$$
Consider
$$h(x) = g( f(x) ) = T_g( f(x) ) + E_g( f(x) ) || f(x) - b ||,$$
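where the last term is a small term: $\frac{E_g( f(x) ) || f(x) - b ||}{|| x - a ||} {\rightarrow} 0$ as $x {\rightarrow} a$. Indeed, $f(x) {\rightarrow} b$ gives $E_g( f(x) ) {\rightarrow} 0$ (set $E_g(b) = 0$), while $\frac{|| f(x) - b ||}{|| x - a ||}$ stays bounded near $a$ by (1).
Then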
$$\begin{array}{} h(x) &= T_g( T_f(x) + E_f(x) || x - a || ) + {\rm \hspace{3pt} small \hspace{3pt} term} \\ &= g(b) + g'(b) ( y - b ) + {\rm \hspace{3pt} small \hspace{3pt} term}. \end{array}$$
With
$$y = T_f(x) + E_f(x) || x - a ||$$
we get
$$\begin{array}{} h(x) &= g(b) + g'(b) ( T_f(x) + E_f(x) || x - a || - b ) + {\rm \hspace{3pt} small \hspace{3pt} term} \\ &= g(b) + g'(b) ( T_f(x) - b ) + g'(b) ( E_f(x) || x - a || ) + {\rm \hspace{3pt} small \hspace{3pt} term} \\ &= g(b) + g'(b) ( T_f(x) - b ) + {\rm \hspace{3pt} small \hspace{3pt} term} \end{array}$$
since
$\frac{g'(b) ( E_f(x) || x - a || )}{|| x - a ||} = g'(b) ( E_f(x) ) {\rightarrow} 0$ as $x {\rightarrow} a$, because $g'(b)$ is linear (hence continuous) and $E_f(x) {\rightarrow} 0$.
Now
$$\begin{array}{} h(x) &= g(b) + g'(b) ( f(a) + f'(a) ( x - a ) - b ) + {\rm \hspace{3pt} small \hspace{3pt} terms} \\ &= g( f(a) ) + g'( f(a) ) ( f'(a) ( x - a ) ) + {\rm \hspace{3pt} small \hspace{3pt} terms} \\ &= h(a) + g'( f(a) ) f'(a) ( x - a ) + {\rm \hspace{3pt} small \hspace{3pt} terms}. \end{array}$$
Define
$$T_h(x) := h(a) + g'( f(a) ) f'(a) ( x - a ),$$
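then $h(x) - T_h(x)$ is small in the sense that
$$\frac{h(x) - T_h(x)}{|| x - a ||} {\rightarrow} 0 {\rm \hspace{3pt} as \hspace{3pt}} x {\rightarrow} a.$$
So $T_h$ is the best affine approximation of $h$ at $a$, and its linear part is the derivative of $h$: $h'(a) = g'( f(a) ) {\circ} f'(a)$, which is the Chain Rule.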
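The conclusion of the proof can be illustrated with a small numerical experiment: the composition of the two affine approximations coincides with $T_h$, and the error ratio for $h$ shrinks near $a$. This is a sketch under assumptions. The maps `f` and `g`, the point `a`, and the hand-computed derivative matrices `df_a` and `dg_b` are hypothetical and are not taken from the text.

```python
import math

# Hypothetical example maps (not from the text): f: R^2 -> R^2, g: R^2 -> R.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def g(y):
    return y[0] ** 2 + y[1]

def h(x):
    return g(f(x))

a = [1.0, 2.0]
b = f(a)
df_a = [[a[1], a[0]], [math.cos(a[0]), 0.0]]   # f'(a), computed by hand (2x2)
dg_b = [2.0 * b[0], 1.0]                       # g'(b), computed by hand (1x2)

def T_f(x):
    """T_f(x) = f(a) + f'(a)(x - a)."""
    d = [x[0] - a[0], x[1] - a[1]]
    return [b[i] + df_a[i][0] * d[0] + df_a[i][1] * d[1] for i in range(2)]

def T_g(y):
    """T_g(y) = g(b) + g'(b)(y - b)."""
    return g(b) + dg_b[0] * (y[0] - b[0]) + dg_b[1] * (y[1] - b[1])

def T_h(x):
    """T_h(x) = h(a) + g'(f(a)) f'(a) (x - a), with the matrix product written out."""
    dh_a = [dg_b[0] * df_a[0][j] + dg_b[1] * df_a[1][j] for j in range(2)]
    return h(a) + dh_a[0] * (x[0] - a[0]) + dh_a[1] * (x[1] - a[1])

def ratio(x):
    """| h(x) - T_h(x) | / || x - a ||."""
    return abs(h(x) - T_h(x)) / math.hypot(x[0] - a[0], x[1] - a[1])

points = [[a[0] + t, a[1] - t] for t in (1e-1, 1e-2, 1e-3)]

# T_g(T_f(x)) and T_h(x) should agree up to floating-point rounding.
compose_gap = max(abs(T_g(T_f(x)) - T_h(x)) for x in points)
ratios = [ratio(x) for x in points]
print(compose_gap, ratios)
```

The gap between $T_g {\circ} T_f$ and $T_h$ should be at rounding level, while the ratios should decrease as the sample points approach $a$. As before, only one direction of approach is sampled.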