This site is devoted to mathematics and its applications. Created and run by Peter Saveliev.

# Differentiation

## Differentiation over addition and constant multiple: the linearity

In this chapter, we will be taking a broader look at how we compute the rate of change.


What happens to the output function of differentiation as we perform algebraic operations with the input functions?

The idea of addition of the change is illustrated below:

Here, the bars that represent the change of the output variable are stacked on top of each other, then the heights are added to each other and so are the height differences. The algebra behind this geometry is very simple: $$(A+B)-(a+b)=(A-a)+(B-b).$$ The idea leads to the Sum Rule for Differences from Chapter 1: the difference of the sum of two sequences is the sum of their differences. Below is its analog.

Theorem (Sum Rule). (A) The difference quotient of the sum of two functions is the sum of their difference quotients; i.e., for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the difference quotients (defined at the corresponding secondary node) satisfy: $$\frac{\Delta(f+g)}{\Delta x}=\frac{\Delta f}{\Delta x}+\frac{\Delta g}{\Delta x}.$$ (B) The sum of two functions differentiable at a point is differentiable at that point and its derivative is equal to the sum of their derivatives; i.e., for any two functions $f,g$ differentiable at $x$, we have at $x$: $$\frac{d(f+g)}{d x}=\frac{d f}{d x}+\frac{d g}{d x}.$$

Proof. Applying the definition to the function $f+g$, we have: $$\begin{array}{lll} \Delta(f+g)(c)&=(f+g)(x+\Delta x)-(f+g)(x)\\ &=f(x+\Delta x)+g(x+\Delta x)-f(x)-g(x)\\ &=\big( f(x+\Delta x)-f(x) \big) +\big(g(x+\Delta x)-g(x) \big)\\ &=\Delta f(c)+\Delta g(c). \end{array}$$ Now, the limit with $c=x$: $$\begin{array}{lll} \frac{\Delta(f+g)}{\Delta x}(x)&=\frac{\Delta f}{\Delta x}(x)+\frac{\Delta g}{\Delta x}(x)&\text{ ...by SR...}\\ &\to\frac{d f}{d x}+\frac{d g}{d x} &\text{ as } \Delta x\to 0.\\ \end{array}$$ $\blacksquare$

In terms of motion, if two runners are running away from each other starting from a common location, then the distance between them is the sum of the distances they have covered.

The formula in the Lagrange notation is as follows: $$(f + g)'(x)= f'(x) + g'(x).$$

The same proof applies to subtraction of the change.

Exercise. State the Difference Rule.

In terms of motion, if two runners are running along with each other starting from a common location, then the distance between them is the difference of the distances they have covered.

The idea of proportion of the change is illustrated below:

Here, if the heights triple then so do the height differences. The algebra behind this geometry is very simple: $$kA-ka=k(A-a).$$ The idea leads to the Constant Multiple Rule for Differences from Chapter 1: the difference of a multiple of a sequence is the multiple of the sequence's difference. Below is its analog.

Theorem (Constant Multiple Rule). (A) The difference quotient of a multiple of a function is the multiple of the function's difference quotient; i.e., for any function $f$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition and any real $k$, the difference quotients (defined at the corresponding secondary node) satisfy: $$\frac{\Delta(kf)}{\Delta x}=k\frac{\Delta f}{\Delta x}.$$ (B) A multiple of a function differentiable at a point is differentiable at that point and its derivative is equal to the multiple of the function's derivative; i.e., for any function $f$ differentiable at $x$ and any real $k$, we have at $x$: $$\frac{d(kf)}{dx}=k\frac{d f}{dx}.$$

Proof. Applying the definition to the function $k\,f$, we have: $$\begin{array}{lll} \Delta(k\cdot f)(c)&=(k\cdot f)(x+\Delta x)-(k\cdot f)(x)\\ &=k\cdot f(x+\Delta x)-k\cdot f(x)\\ &=k\cdot \big( f(x+\Delta x)-f(x) \big)\\ &=k\cdot \Delta f\, (c). \end{array}$$ Now, the limit with $c=x$: $$\begin{array}{lll} \frac{\Delta(kf)}{\Delta x}(x)&=\frac{k\Delta f}{\Delta x}(x)\\ &=k\frac{\Delta f}{\Delta x}(x)&\text{ ...by CMR...}\\ &\to k\frac{d f}{d x}(x)&\text{ as } \Delta x\to 0.\\ \end{array}$$ $\blacksquare$

In terms of motion, if the distance is re-scaled, such as from miles to kilometers, then so is the velocity -- at the same proportion.

The formula in the Lagrange notation is as follows: $$(k\cdot f)'(x) = k\cdot f'(x).$$ Here is another way to write these formulas in the Leibniz notation. This is the Sum Rule: $$\frac{d}{dx}\big( u+v \big) = \frac{du}{dx} + \frac{dv}{dx},$$ and the Constant Multiple Rule: $$\frac{d}{dx}\big( cu \big) = c\frac{du}{dx}.$$

The two theorems can be combined into one. It relies on the following idea: given two functions $f,g$, their linear combination is a new function $pf+qg$, where $p,q$ are two constant numbers.

Theorem (Linearity of Differentiation). (A) The difference quotient of a linear combination of two functions is the linear combination of their difference quotients; i.e., for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the difference quotients (defined at the corresponding secondary node) satisfy: $$\frac{\Delta(pf+qg)}{\Delta x}=p\frac{\Delta f}{\Delta x}+q\frac{\Delta g}{\Delta x}.$$ (B) A linear combination of two functions differentiable at a point is differentiable at that point and its derivative is equal to the linear combination of their derivatives; i.e., for any two functions $f,g$ differentiable at $x$, we have at $x$: $$\frac{d(pf+qg)}{d x}=p\frac{d f}{d x}+q\frac{d g}{d x}.$$
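Part (A) of the theorem can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `difference_quotient` and `linear_combination`, the sample functions, the constants, and the nodes are all illustrative assumptions.

```python
import math

def difference_quotient(f, x, dx):
    """Difference quotient of f over the edge [x, x + dx]."""
    return (f(x + dx) - f(x)) / dx

def linear_combination(p, f, q, g):
    """Return the function p*f + q*g."""
    return lambda x: p * f(x) + q * g(x)

# Hypothetical sample data: any functions, constants, and nodes would do.
p, q = 2.0, -3.0
x, dx = 0.7, 0.25

lhs = difference_quotient(linear_combination(p, math.sin, q, math.exp), x, dx)
rhs = p * difference_quotient(math.sin, x, dx) + q * difference_quotient(math.exp, x, dx)

# The two sides agree up to floating-point rounding.
print(lhs, rhs)
```

Setting $p=q=1$ recovers the Sum Rule and setting $q=0$ recovers the Constant Multiple Rule.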


The hierarchy of polynomials and their derivatives was used in Chapter 7 to model free fall.

• The derivative of a constant polynomial is zero:

$$(c)'=0.$$

• The derivative of a linear polynomial is constant:

$$(mx+b)'=(mx)'+(b)'=m(x)'+0=m\cdot 1=m.$$

• The derivative of a quadratic polynomial is linear:

$$(ax^2+bx+c)'=(ax^2)'+(bx)'+(c)'=a(x^2)'+b(x)'+0=a\cdot 2x+b\cdot 1=2ax+b.$$ And so on: combined with the Power Formula, the two rules above allow us to differentiate all polynomials. Every time, the degree goes down by $1$! The general result is as follows.

Theorem. The derivative of a polynomial of degree $n>0$, $$f(x)=a_nx^n+a_{n-1}x^{n-1}+...+a_{2}x^2+a_{1}x+a_0,\ a_n\ne 0,$$ is a polynomial of degree $n-1$, $$f'(x)=na_nx^{n-1}+(n-1)a_{n-1}x^{n-2}+...+2a_{2}x+a_{1},\ a_n\ne 0.$$
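The theorem amounts to a simple operation on the list of coefficients. Below is a minimal sketch under an assumed convention: the polynomial is stored as a list `[a_0, a_1, ..., a_n]`, from the constant term up, and the function name `differentiate_polynomial` is hypothetical.

```python
def differentiate_polynomial(coefficients):
    """Coefficients [a_0, a_1, ..., a_n] -> coefficients of the derivative."""
    # The term a_k x^k becomes k a_k x^(k-1); the constant term disappears.
    derivative = [k * a for k, a in enumerate(coefficients)][1:]
    return derivative or [0]   # the derivative of a constant is zero

# 88 - x^2 + 6x^3, listed from the constant term up
print(differentiate_polynomial([88, 0, -1, 6]))  # [0, -2, 18], i.e., -2x + 18x^2
```

The output list is one entry shorter than the input: the degree goes down by $1$.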

Exercise. Prove the theorem.

## Differentiation over compositions: the Chain Rule

How does one express the derivative of the composition of two functions in terms of their derivatives?

Example. Treating functions as transformations suggests an easy answer.

• If the first transformation is a stretch by a factor of $2$, i.e., the derivative is $2$, and
• the second transformation is a stretch by a factor of $3$, i.e., the derivative is $3$, then
• the composition of the two transformations is a stretch by a factor of $3\cdot 2=6$, i.e., the derivative is $6$:

We multiply the derivatives. $\square$

Example. Let's confirm this idea with a very simple example. Consider two linear polynomials: $$\begin{array}{lllll} x&=qt&\Longrightarrow & \frac{\Delta x}{\Delta t}=\frac{dx}{dt}&=q\\ \quad\quad\circ&&&&\ \ \times\\ y&=mx&\Longrightarrow& \frac{\Delta y}{\Delta x}=\frac{dy}{dx}&=m\\ \hline y&=m(qt)=mqt&\Longrightarrow& \frac{\Delta y}{\Delta t}=\frac{dy}{dt}&=m\cdot q&=\frac{\Delta x}{\Delta t}\cdot\frac{\Delta y}{\Delta x}=\frac{dx}{dt}\cdot\frac{dy}{dx} \end{array}$$ We see their derivatives and, which is the same thing for linear polynomials, their difference quotients. In either case, we see how the intermediate variable, whether it is the difference $\Delta x$ or the differential $dx$, is “cancelled”: $$\frac{\tiny{\Delta x}}{\Delta t}\cdot\frac{\Delta y}{\tiny{\Delta x}}=\frac{\Delta y}{\Delta t},\quad \frac{\tiny{dx}}{dt}\cdot\frac{dy}{\tiny{dx}}=\frac{dy}{dt}.$$

$\square$

Example. We pose the following problem. Suppose a car is driven through a mountain terrain. Its location and its speed, as seen on a map, are known. The grade of the road is also known. How fast is the car climbing?

We set up two functions, for the location and the altitude. Then their composition is what we are interested in:

The graph of the second function is literally the profile of the road.

We already know that if the location, $f$, depends on time continuously and the altitude, $g$, depends continuously on location, then the altitude depends on time continuously as well, $g\circ f$. We shall also see that the differentiability of both functions implies the differentiability of the composition.

However, let's first dispose of the “Naive Composition Rule”: $$(g \circ f)' \neq g'\circ f'.$$ We carry out, again, a “unit analysis” to show that such a formula simply cannot be true. Suppose

• $t$ is time measured in $\text{hr}$,
• $x=f(t)$ is the location of the car as a function of time -- measured in $\text{mi}$,
• $y=g(x)$ is the altitude of the road as a function of (horizontal) location -- measured in $\text{ft}$, and
• $y=h(t)=g(f(t))$ is the altitude of the road as a function of time -- measured in $\text{ft}$.

Then,

• $f'(t)$ is the (horizontal) velocity of the car on the road -- measured in $\frac{\text{mi}}{\text{hr}}$, and
• $g'(x)$ is the rate of incline (slope) of the road -- measured in $\frac{\text{ft}}{\text{mi}}$, with the input still measured in $\text{mi}$.

It doesn't even matter now what $h'$ is measured in; just try to compose these two functions... It is impossible because the units of the output of the former and the input of the latter don't match! However, this is possible:

• $f'(t)\cdot g'(x)$ is their product -- measured in $\frac{\text{mi}}{\text{hr}}\cdot \frac{\text{ft}}{\text{mi}}=\frac{\text{ft}}{\text{hr}}$; compare to
• $h'(t)$ is the rate of change of the altitude with respect to time, i.e., how fast the car is climbing -- measured in $\frac{\text{ft}}{\text{hr}}$.

Why does this make sense?

• 1. How fast you are climbing is proportional to your horizontal speed.
• 2. How fast you are climbing is proportional to the slope of the road.

$\square$

Thus, the derivative of the composition of two linear functions is the product of the two derivatives! Considering the fact that, as far as derivatives at a fixed point are concerned, all functions are linear, we have strong evidence in support of this conjecture.

Unfortunately, derivatives aren't fractions! But difference quotients are: $$\frac{\Delta y}{\Delta x}\cdot\frac{\Delta x}{\Delta t}=\frac{\Delta y}{\Delta t}.$$ The only difference from the other rules we have considered is that there are two partitions and $f$ must map the partition for $t$ to the partition of $x$:

Theorem (Chain Rule). (A) The difference quotient of the composition of two functions is found as the product of the two difference quotients; i.e., for any function $x=f(t)$ defined at two adjacent nodes $t$ and $t+\Delta t$ of a partition and any function $y=g(x)$ defined at the two adjacent nodes $x=f(t)$ and $x+\Delta x=f(t+\Delta t)$ of a partition, the difference quotients (defined at the secondary nodes $c$ and $q=f(c)$ within these edges of the two partitions respectively) satisfy, provided $\Delta x\ne 0$: $$\frac{\Delta (g\circ f)}{\Delta t}(c)= \frac{\Delta g}{\Delta x}(q) \cdot \frac{\Delta f}{\Delta t}(c).$$ (B) The composition of a function differentiable at a point and a function differentiable at the image of that point is differentiable at that point and its derivative is found as a product of the two derivatives; specifically, if $x=f(t)$ is differentiable at $t=c$ and $y=g(x)$ is differentiable at $x=q=f(c)$, then we have: $$\frac{d (g\circ f)}{dt}(c)= \frac{dg}{dx}(q) \cdot \frac{df}{dt}(c).$$

Proof. The formula for difference quotients is deduced as follows: $$\begin{array}{lll} \frac{\Delta (g\circ f)}{\Delta t}(c)&=\frac{(g\circ f)(t+\Delta t)-(g\circ f)(t)}{\Delta t}\\ &=\frac{g(f(t+\Delta t))-g(f(t))}{f(t+\Delta t)-f(t)}\frac{f(t+\Delta t)-f(t)}{\Delta t}\\ &=\frac{g(x+\Delta x)-g(x)}{\Delta x}\frac{f(t+\Delta t)-f(t)}{\Delta t}\\ &=\frac{\Delta g}{\Delta x}(q) \cdot \frac{\Delta f}{\Delta t}(c). \end{array}$$ Now we are to take the limit of the formula, with $c=t$, as $$\Delta t \to 0.$$ Since $x=f(t)$, as a differentiable function, is continuous, we conclude that we also have: $\Delta x \to 0$. Therefore, we have: $$\begin{array}{lllll} \ \frac{\Delta (g\circ f)}{\Delta t}(t) &=&\ \frac{\Delta g}{\Delta x}(f(t))&\cdot&\ \ \frac{\Delta f}{\Delta t}(t)\\ \quad \downarrow&&\quad \downarrow&&\quad \downarrow\\ \ \frac{d(g\circ f)}{dt}(t) & = &\ \frac{dg}{dx}(f(t))&\cdot&\ \ \frac{df}{dt}(t) \end{array}$$ The idea seems to have worked out... The trouble is, we assumed that $\Delta x \neq 0$! What if $x=f(t)$ is constant in the vicinity of $t$? A complete proof will be provided later. $\blacksquare$

Exercise. Find another, non-constant, example of a function $x=f(t)$ such that $\Delta f$ may be zero even for small values of $\Delta t$.

The formula in the Lagrange notation is as follows: $$(g\circ f)'(t) = g'(f(t))\cdot f'(t).$$
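Both parts of the Chain Rule can be observed numerically. The sketch below is only an illustration: the inner function $f(t)=3t+1$, the outer function $g(x)=\sqrt{x}$, the helper name `chain_rule_sides`, and the sample values of $t$ and $\Delta t$ are assumptions made for the example.

```python
import math

def f(t):
    """Inner function x = f(t); a hypothetical choice for illustration."""
    return 3 * t + 1

def g(x):
    """Outer function y = g(x); a hypothetical choice for illustration."""
    return math.sqrt(x)

def chain_rule_sides(t, dt):
    """Return (difference quotient of g o f, product of the two difference quotients)."""
    x, dx = f(t), f(t + dt) - f(t)          # f maps the t-edge to the x-edge
    composite = (g(f(t + dt)) - g(f(t))) / dt
    product = ((g(x + dx) - g(x)) / dx) * (dx / dt)
    return composite, product

# Part (A): the two sides agree for any increment with dx != 0.
print(chain_rule_sides(1.0, 0.5))

# Part (B): as dt shrinks, the quotient approaches g'(f(1)) * f'(1) = 1/(2*sqrt(4)) * 3 = 0.75.
for dt in (0.1, 0.001, 0.00001):
    print(dt, chain_rule_sides(1.0, dt)[0])
```

Note that the function divides by `dx`, so it fails exactly in the case the proof leaves open: $\Delta x=0$.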

Example. Find the derivative of: $$y = (1 + 2x)^{2}.$$ The function is computed in two consecutive steps (that's how we know this is a composition):

• step 1: from $x$ we compute $1+2x$, and then
• step 2: we square the outcome of the first step.

We then introduce an additional, disposable, variable in order to store the outcome of step 1: $$u=1+2x.$$ Then step 2 becomes: $$y=u^2.$$ This is our decomposition: $x \mapsto u \mapsto y$. Now the derivatives: $$\begin{array}{llll} u & = 1 + 2x &\Longrightarrow&\frac{du}{dx} &= 2 \\ y & = u^{2} &\Longrightarrow&\frac{dy}{du} &= 2u \\ \text{CR } & &\Longrightarrow&\frac{dy}{dx} & = \frac{dy}{du}\cdot\frac{du}{dx} = 2u\cdot 2 = 4u. \end{array}$$ Done. But the answer must be in terms of $x$! Last step: substitute $u = 1 + 2x$. Then the answer is $4(1+2x)$. To verify, expand, $1 + 4x + 4x^{2}$, then use PF. $\square$

Example. Now a very simple example that doesn't allow us to circumvent CR. Let $$y=\sqrt{3x+1}.$$ This is the abbreviated computation (decomposition, the derivatives, CR): $$\begin{array}{llll} x \mapsto u=3x+1 \mapsto y=\sqrt{u}\\ \underbrace{x \mapsto u=3x+1} \\ \qquad \frac{du}{dx} = 3 \\ \qquad\qquad\qquad\underbrace{u \mapsto y=\sqrt{u}}\\ \underbrace{ \qquad\qquad\qquad \frac{dy}{du}= \frac{1}{2\sqrt{u}} } \\ \frac{dy}{dx} = \frac{du}{dx}\cdot\frac{dy}{du} = 3\cdot \frac{1}{2\sqrt{u}}= 3\cdot \frac{1}{2\sqrt{3x+1}}. \end{array}$$ $\square$

Example. Find the derivative of: $$z = e^{\sqrt{3x+1}}$$ Three functions this time: $$x \mapsto u = 3x+1 \ \mapsto y = \sqrt{u} \ \mapsto z = e^{y}.$$ Fortunately, we already know the derivative of the exponent from the last example. We just append that solution with one extra step: $$\begin{array}{llll} x \mapsto u=3x+1 \mapsto y=\sqrt{u} \mapsto z = e^{y}\\ \underbrace{x \mapsto u=3x+1} \\ \qquad \frac{du}{dx} = 3 \\ \qquad\qquad\qquad\underbrace{u \mapsto y=\sqrt{u}}\\ \underbrace{ \qquad\qquad\qquad \frac{dy}{du}= \frac{1}{2\sqrt{u}} } \\ \frac{dy}{dx} = \frac{du}{dx}\cdot\frac{dy}{du} = 3\cdot \frac{1}{2\sqrt{u}} \\ \qquad\qquad\qquad\qquad\qquad\qquad \underbrace{ y \mapsto z = e^{y} }\\ \underbrace{ \qquad\qquad\qquad\qquad\qquad\qquad \frac{dz}{dy}=e^y }\\ \frac{dz}{dx} = \left( \frac{du}{dx}\cdot\frac{dy}{du} \right) \cdot\frac{dz}{dy} =3\cdot \frac{1}{2\sqrt{u}}\cdot e^y=3\frac{1}{2\sqrt{3x+1}} e^{\sqrt{3x+1}}. \end{array}$$ We have applied CR twice! $\square$

The lesson we have learned is: three functions -- three derivatives -- multiply them: $$\begin{array}{rrr} &x &\mapsto u&\mapsto y&\mapsto z \\ \frac{dz}{dx} & = \frac{du}{dx} &\cdot \frac{dy}{du} &\cdot \frac{dz}{dy} \end{array}$$ These “fractions” appear to cancel again... $$\frac{dz}{dx} = \frac{\not{du}}{dx} \cdot \frac{\not{dy}}{\not{du}} \cdot \frac{dz}{\not{dy}}.$$ This is the Generalized Chain Rule about the derivative of the composition (a “chain”!) of $n$ functions.
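The three-step computation for $z=e^{\sqrt{3x+1}}$ can be sanity-checked against a difference quotient. This is a minimal sketch: the symmetric difference quotient `central_difference`, its step size, and the sample point $x=1$ are assumptions chosen for the check, not part of the method.

```python
import math

def z(x):
    """The composition x -> u = 3x+1 -> y = sqrt(u) -> z = exp(y)."""
    return math.exp(math.sqrt(3 * x + 1))

def dz_dx(x):
    """Product of the three derivatives: du/dx * dy/du * dz/dy."""
    u = 3 * x + 1
    y = math.sqrt(u)
    return 3 * (1 / (2 * math.sqrt(u))) * math.exp(y)

def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient, used here only as a numerical check."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.0   # hypothetical sample point; here u = 4 and y = 2
print(dz_dx(x), central_difference(z, x))
```

The two printed numbers should agree to several decimal places.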

The short version of the Chain Rule says:

• the derivative of the composition is the product of the derivatives,

as functions.

Example. However, if we fix the location $x=a$, we can make sense of the derivative of the composition as the composition of the derivatives, after all. Indeed, suppose at point $a$ we have the derivative $$\frac{dy}{dx}=m.$$ What if we, again, think of the differentials $dx$ and $dy$ as two new variables -- related to each other by the above equation?

Then we think of the derivative, $m$, not as a number but as a linear function: $$dy=m\cdot dx.$$ If now there is another variable with $$\frac{dx}{dt}=q,$$ we think of $q$ as a linear function: $$dx=q\cdot dt.$$ Then, we have to substitute the latter into the former: $$\begin{array}{lllll} x=x(t)&=qt&\Longrightarrow& dx&=q\cdot dt\\ \quad\quad\circ&\quad\circ&&&\quad\quad\circ\\ y=y(x)&=mx&\Longrightarrow& dy&=m\cdot dx\\ \hline y=y(x(t))&=m(qt)&\Longleftrightarrow& dy&=m\cdot (q\cdot dt) \end{array}$$ We have the composition! $\square$

We can use the Chain Rule to find formulas for other important functions.

Theorem. For any $a>0$, we have: $$\left( a^x\right)'=a^x\ln a.$$

Proof. We represent this exponential function in terms of the natural exponential function: $$a^x=e^{\ln a^x}=e^{x\ln a}.$$ Then, $$\left( a^x\right)'=\left( e^{x\ln a} \right)'\ \overset{\text{CR}}{=\! =\! =}\ e^{x\ln a} \cdot (x\ln a)'=a^x\cdot \ln a.$$ $\blacksquare$
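The formula can be compared with a difference quotient for several bases, including one with $a<1$, for which $\ln a<0$ and the function is decreasing. A minimal sketch; the helper names, the step size, and the sample point are assumptions for illustration.

```python
import math

def power_derivative(a, x):
    """The formula from the theorem: (a^x)' = a^x * ln(a)."""
    return a ** x * math.log(a)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, used only as a numerical check."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5   # hypothetical sample point
for a in (0.5, 2.0, 10.0):
    numeric = central_difference(lambda s: a ** s, x)
    print(a, power_derivative(a, x), numeric)
```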

Exercise. Use the idea from the proof above to find the derivative of $x^x$.

## Differentiation over multiplication and division

What happens to the output function of differentiation as we perform such an algebraic operation as multiplication with the input functions?

We already know that if the width and the height ($f$ and $g$) of a rectangle are changing continuously then so is its area ($f\cdot g$):

We shall also see that the differentiability of both dimensions implies the differentiability of the area.

However, let's first make sure we avoid the so-called “Naive Product Rule”: $$(f\cdot g)' \neq f'\cdot g'.$$ The formula is extrapolated from the Sum Rule but it simply cannot be true. Let's recast the problem in the terms of motion and take a good look at the units. Suppose

• $x$ is time measured in $\text{sec}$,
• $y=f(x)$ is the location of the first person -- measured in $\text{ft}$, and
• $y=g(x)$ is the location of the second person -- measured in $\text{ft}$.

Then

• $f'(x)$ is the velocity of the first person -- measured in $\frac{\text{ft}}{\text{sec}}$, and
• $g'(x)$ is the velocity of the second person -- measured in $\frac{\text{ft}}{\text{sec}}$.

Suppose they are running in two perpendicular directions (east and north), then

• $y=f(x)\cdot g(x)$ is the area of the rectangle enclosed by the two persons -- measured in $\text{ft}^2$.

Therefore,

• $y=\left( f(x)\cdot g(x) \right)'$ is the rate of change of the area -- measured in $\frac{\text{ft}^2}{\text{sec}}$.

Meanwhile,

• $f(x)'\cdot g(x)'$ is an unknown quantity -- measured in $\frac{\text{ft}}{\text{sec}}\cdot \frac{\text{ft}}{\text{sec}}=\frac{\text{ft}^2}{\text{sec}^2}$!

We do notice now that the product of the location and velocity gives the right units: $$f'f,\ g'g \text{ and also } f'g,\ g'f.$$ Which one(s)?

The correct idea -- cross-multiplication -- is illustrated below:

As the width and the depth are increasing, so is the area of the rectangle. But the increase of the area cannot be expressed entirely in terms of the increases of the width and depth! This increase is split into two parts corresponding to the two terms in the right-hand side of the formula below. It is based on the Product Rule for Differences from Chapter 1: $$\Delta (f \cdot g)(c)=f(x+\Delta x) \cdot \Delta g(c) + \Delta f(c) \cdot g(x).$$

Theorem (Product Rule). (A) The difference quotient of the product of two functions is found as a combination of these functions and their difference quotients. In other words, for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the difference quotients (defined at the corresponding secondary node $c$) satisfy: $$\frac{\Delta (f\cdot g)}{\Delta x}(c)=f(x+\Delta x) \cdot \frac{\Delta g}{\Delta x}(c) + \frac{\Delta f}{\Delta x}(c) \cdot g(x).$$ (B) The product of two functions differentiable at a point is differentiable at that point and its derivative is found as a combination of these functions and their derivatives; specifically, given two functions $f,g$ differentiable at $x$, we have: $$\frac{d (f\cdot g)}{dx}(x)=f(x) \cdot \frac{dg}{dx}(x) + \frac{df}{dx}(x) \cdot g(x).$$

Proof. $$\begin{array}{lll} \Delta (f \cdot g)(c)&=(f \cdot g)(x+\Delta x)- (f \cdot g)(x)\\ &=f(x+\Delta x) \cdot g(x+\Delta x)- f(x) \cdot g(x)\\ &=f(x+\Delta x) \cdot g(x+\Delta x)- f(x+\Delta x) \cdot g(x) +f(x+\Delta x) \cdot g(x)- f(x) \cdot g(x)\\ &=f(x+\Delta x) \cdot (g(x+\Delta x)- g(x)) +(f(x+\Delta x) - f(x)) \cdot g(x)\\ &=f(x+\Delta x) \cdot \Delta g(c) + \Delta f(c) \cdot g(x). \end{array}$$ Now, the limit with $c=x$: $$\begin{array}{lll} \frac{\Delta (f \cdot g)(x)}{\Delta x}&=f(x+\Delta x) \cdot \frac{\Delta g}{\Delta x} (c)&+ \frac{\Delta f}{\Delta x}(c) \cdot g(x)\\ &\quad\quad \downarrow\quad\quad \quad\ \downarrow&\quad\ \downarrow\quad \quad \quad \\ &\quad\ f(x)\quad \quad \cdot\frac{d g}{d x}(x)&+\ \frac{d f}{d x}(x)\ \cdot g(x)&\text{ as } \Delta x\to 0.\\ \end{array}$$ The first limit is justified by the fact that $f$, as a differentiable function, is continuous. $\blacksquare$

In terms of motion, it is as if two runners are unfurling a flag while running east and north respectively.

The formula in the Lagrange notation is as follows: $$(f \cdot g)'(x) = f(x)\cdot g'(x) + f'(x)\cdot g(x).$$
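The Product Rule for Differences, and the failure of the naive rule, can be seen on a concrete rectangle. In this sketch the width $f(x)=2+x$, the height $g(x)=1+x^2$, and the nodes are hypothetical choices; any others would do.

```python
def delta(func, x, dx):
    """Difference of func over the edge [x, x + dx]."""
    return func(x + dx) - func(x)

def f(x):
    """Hypothetical width of the rectangle at time x."""
    return 2 + x

def g(x):
    """Hypothetical height of the rectangle at time x."""
    return 1 + x * x

def area(x):
    return f(x) * g(x)

x, dx = 1.0, 0.5
lhs = delta(area, x, dx)                                      # actual change of the area
rhs = f(x + dx) * delta(g, x, dx) + delta(f, x, dx) * g(x)    # the two parts of the increase
naive = delta(f, x, dx) * delta(g, x, dx)                     # product of the two changes
print(lhs, rhs, naive)  # 5.375 5.375 0.625
```

The product of the two increments alone accounts for only a small corner of the new area.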

Example. Let $$y = xe^{x}.$$ Then, $$\begin{array}{lllll} u & = x & \Longrightarrow &\frac{du}{dx} &= (x)' = 1, \\ v & = e^{x} & \Longrightarrow &\frac{dv}{dx} &= (e^{x})' = e^{x}. \end{array}$$ Apply PR via “cross-multiplication”, the idea of which comes from the picture above: $$\frac{dy}{dx} = x\cdot e^{x} + e^{x}\cdot 1 = e^{x}(x + 1).$$ $\square$

Next, the derivatives under division? We already know that if the width and the height ($f$ and $g$) of a triangle are changing continuously then so is the tangent of its base angle ($f/g$):

We shall also see that the differentiability of both dimensions implies the differentiability of the tangent.

However, let's first make sure we avoid the so-called “Naive Quotient Rule”: $$(f/ g)' \neq f'/ g'.$$ We can repeat the “unit analysis” to show that such a formula simply cannot be true. The runners still are running in two perpendicular directions, and we have:

• $y=f(x)/ g(x)$ is unitless, and then
• $y=\left( f(x)/ g(x) \right)'$ is measured in $\frac{1}{\text{sec}}$, while
• $f(x)'/ g(x)'$ is unitless!

The following is based on the Quotient Rule for Differences from Chapter 1: $$\Delta (f / g)(c)=\frac{\Delta f(c) \cdot g(x) - f(x) \cdot \Delta g(c)}{g(x)g(x+\Delta x)}.$$

Theorem (Quotient Rule). (A) The difference quotient of the quotient of two functions is found as a combination of these functions and their difference quotients. In other words, for any two functions $f,g$ defined at the adjacent nodes $x$ and $x+\Delta x$ of a partition, the difference quotients (defined at the corresponding secondary node $c$) satisfy: $$\frac{\Delta (f/ g)}{\Delta x}(c)=\frac{\frac{\Delta f}{\Delta x}(c) \cdot g(x) - f(x) \cdot \frac{\Delta g}{\Delta x}(c)}{g(x)g(x+\Delta x)},$$ provided $g(x),g(x+\Delta x) \ne 0$. (B) The quotient of two functions differentiable at a point is differentiable at that point and its derivative is found as a combination of these functions and their derivatives; specifically, given two functions $f,g$ differentiable at $x$, we have: $$\frac{d (f/ g)}{dx}(x)=\frac{\frac{df}{dx}(x) \cdot g(x) - f(x) \cdot \frac{dg}{dx}(x)}{g(x)^2},$$ provided $g(x) \ne 0$.

Proof. We start with the case $f=1$. Then we have: $$\begin{array}{lll} \frac{\Delta (1/g)(x)}{\Delta x}&=\frac{\frac{1}{g(x+\Delta x)}- \frac{1}{g(x)}}{\Delta x}\\ &=\frac{g(x)- g(x+\Delta x)}{\Delta x g(x+\Delta x)g(x)} \\ &=-\frac{g(x+\Delta x)- g(x)}{\Delta x}\cdot \frac{1}{g(x+\Delta x)\cdot g(x)} \\ &=-\frac{\Delta g}{\Delta x}(c)\cdot \frac{1}{g(x+\Delta x)\cdot g(x)} &\text{ with }c=x\\ &\to -\frac{dg}{dx}(x)\cdot\frac{1}{g(x) \cdot g(x)}&\text{ as } \Delta x\to 0. \end{array}$$ The limit of the second fraction is justified by the fact that $g$, as a differentiable function, is continuous. Alternatively, we represent the reciprocal of $g$ as a composition: $$z=\frac{1}{g(x)}\ \Longrightarrow\ z=\frac{1}{y},\ y=g(x)\ \Longrightarrow\ \frac{dz}{dy}=-\frac{1}{y^2},\ \frac{dy}{dx}=g'(x)\ \Longrightarrow\ \frac{dz}{dx}=-\frac{1}{g(x)^2}g'(x),$$ by the Chain Rule. Now the general formula follows from the Product Rule. $\blacksquare$

The formula is similar to the Product Rule in the sense that it also involves cross-multiplication.

The formula in the Lagrange notation is as follows: $$\left( \frac{f(x)}{g(x)} \right)' = \frac{f'(x)\cdot g(x) - f(x)\cdot g'(x)}{g(x)^2}.$$

Example. The tangent: $$\begin{aligned} (\tan x)' & = \left( \frac{\sin x}{\cos x} \right)'\\ & \ \overset{\text{QR}}{=\! =\! =} \frac{(\sin x)' \cos x - \sin x (\cos x)'}{(\cos x)^{2}} \\ & = \frac{\cos x \cos x - \sin x (-\sin x)}{\cos^{2} x} \\ & = \frac{\cos^{2}x + \sin^{2}x}{\cos^{2}x} \quad \text{...use the Pythagorean Theorem...} \\ & = \sec^{2}x. \end{aligned}$$ $\square$
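The computation for the tangent can be replayed numerically with the Lagrange form of the Quotient Rule. A minimal sketch; the helper names `quotient_rule` and `tan_derivative` and the sample points are assumptions for illustration.

```python
import math

def quotient_rule(f, df, g, dg, x):
    """(f/g)'(x) = (f'(x) g(x) - f(x) g'(x)) / g(x)^2, assuming g(x) != 0."""
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

def tan_derivative(x):
    """Derivative of sin/cos assembled from the Quotient Rule."""
    return quotient_rule(math.sin, math.cos, math.cos, lambda s: -math.sin(s), x)

for x in (0.0, 0.5, 1.0):
    sec_squared = 1 / math.cos(x) ** 2
    print(x, tan_derivative(x), sec_squared)
```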

In the Leibniz notation, this is the form of the Product Rule: $$\frac{d}{dx} \left(uv \right) = \dfrac{du}{dx}\cdot v + \dfrac{dv}{dx}\cdot u,$$ and the Quotient Rule: $$\frac{d}{dx} \left(\frac{u}{v}\right) = \dfrac{\dfrac{du}{dx}\cdot v - \dfrac{dv}{dx}\cdot u}{v^{2}}.$$

More examples of differentiation...

Example. Find $$(x^{2} + x^{3})' = \lim_{h \to 0} \frac{(x + h)^{2} +(x + h)^{3} - x^{2} - x^{3}}{h}=...$$ Seems like a lot of work... Instead use SR and PF: $$\begin{array}{lllll} (x^{2} + x^{3})' & = (x^{2})' + (x^{3})' \\ & = 2x +3x^{2}. \end{array}$$ $\square$

Example. We can differentiate any polynomial easily now: $$\begin{array}{lllll} (x^{77} + & 5x^{18} + 6x^{3} - x^{2} + 88)'& \text{ ...try to expand } (x+h)^{77} !\\ & \ \overset{\text{SR}}{=\! =\! =} (x^{77})' + (5x^{18})' + (6x^{3})' - (x^{2})' + (88)' \\ & \ \overset{\text{CMR}}{=\! =\! =}(x^{77})' + 5(x^{18})' + 6(x^{3})' - (x^{2})' + 0 \\ & \ \overset{\text{PF}}{=\! =\! =} 77x^{77 - 1} + 5\cdot 18x^{18 - 1} + 6\cdot 3x^{3 - 1} - 2x^{2 - 1} \\ & = 77x^{76} + 90x^{17} + 18x^{2} - 2x. \end{array}$$ $\square$

Example. Find $$\left( \frac{\sqrt{x}}{x^{2} + 1} \right)'.$$ Consider: $$\begin{array}{lllll} u & = \sqrt{x} &\Longrightarrow &\frac{du}{dx} &= \frac{1}{2\sqrt{x}}, \\ v & = x^{2} + 1 &\Longrightarrow &\frac{dv}{dx} &= 2x. \end{array}$$ Then, $$\frac{d}{dx} \left( \frac{u}{v} \right) = \frac{ \dfrac{1}{2\sqrt{x}} (x^2 + 1) - \sqrt{x}\cdot 2x}{( x^2 + 1)^2}.$$ No need to simplify. $\square$

Example. This is a different kind of example. Evaluate: $$\lim_{x \to 5} \frac{2^{x} - 32}{x - 5}.$$ It's just a limit. But we recognize that this is the derivative of some function. We compare the expression to the formula in the definition: $$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a},$$ and match. So, we have here: $$a = 5 ,\ f(x) = 2^{x}, \ f(5) = 2^{5} = 32.$$ Therefore, our limit is equal to $f'(5)$ for $f(x) = 2^{x}$. Compute: $$f'(x) = (2^{x})' = 2^{x} \ln 2,$$ so $$f'(5) = 2^{5} \ln 2 = 32 \ln 2.$$ $\square$
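The answer can be cross-checked by evaluating the expression under the limit at points approaching $5$ from both sides. A minimal sketch; the function name `quotient` and the chosen step sizes are assumptions for illustration.

```python
import math

def quotient(x):
    """The expression under the limit, defined for x != 5."""
    return (2 ** x - 32) / (x - 5)

target = 32 * math.log(2)   # f'(5) for f(x) = 2^x
for step in (0.1, 0.01, 0.001, 0.0001):
    # Values from the right and from the left of 5, next to the claimed limit.
    print(step, quotient(5 + step), quotient(5 - step), target)
```

The values from the right lie above $32\ln 2$ and the values from the left below, closing in as the step shrinks.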


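The Sum Rule can be read off a diagram in which differentiation goes right and addition goes down. Starting with a pair of functions, we can proceed in two ways:

• right: differentiate, then down: add the results; or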
• down: add them, then right: differentiate the result.

The result is the same! (Neither the Product Rule nor the Quotient Rule has such an interpretation.)

## The rate of change of the rate of change

If a function is known at the nodes of a partition, its difference quotient is also a function -- known at the secondary nodes. Can we treat the latter as a function too? What is the partition then? We saw in Chapter 7 how this idea is implemented in order to derive the acceleration from the velocity.

What can we say about the rate of change of this change? If we know only three values of a function (first line), at the ends of two adjacent intervals, we compute the difference quotients along the two intervals (second line) and place the results at the corresponding edges: $$\begin{array}{ccccccc} -&f(x_1)&---&f(x_2)&---&f(x_3)&-&\\ -&-\bullet-&\frac{\Delta f}{\Delta x_2}&-\bullet-&\frac{\Delta f}{\Delta x_3}&-\bullet-&-\\ -&-\bullet-&---&\frac{\frac{\Delta f}{\Delta x_3} -\frac{\Delta f}{\Delta x_2}}{c_3-c_2}&---&-\bullet-&-&\\ &x_1&c_2&x_2&c_3&x_3&\\ \end{array}$$ To find the change of this new function, we carry out the same operation and place the result in the middle (third line).

Let's review the construction of the difference quotient.

First, we have an augmented partition of an interval $[a,b]$. We partition it into $n$ intervals with the help of the nodes (the end-points of the intervals): $$a=x_{0},\ x_{1},\ x_{2},\ ... ,\ x_{n-1},\ x_{n}=b;$$ and also provide secondary nodes: $$c_{1} \text{ in } [x_{0},x_{1}], \ c_{2} \text{ in } [x_{1},x_{2}],\ ... ,\ c_{n} \text{ in } [x_{n-1},x_{n}].$$

If a function $y=f(x)$ is defined at the nodes $x_k,\ k=0,1,2,...,n$, the difference quotient of $f$ is defined at the secondary nodes of the partition by: $$\frac{\Delta f}{\Delta x}(c_{k})=\frac{f(x_{k})-f(x_{k-1})}{x_{k}-x_{k-1}},\ k=1,2,...,n.$$

The function represents the slopes of the secant lines over the nodes of the partition. In particular, when the location is represented by a function known only at the nodes of the partition, the velocity is then found in this manner. It is now especially important that we have utilized the secondary nodes as the inputs of the new function. Indeed, we can now carry out a similar construction with this function and find the acceleration!

We have now a new augmented partition, of what? The interval is $$[p,q],\ \text{ with } p=c_1 \text{ and } q=c_n.$$ We partition it into $n-1$ intervals with the help of the nodes that used to be the secondary nodes in the last partition: $$p=c_{1},\ c_{2},\ c_{3},\ ... ,\ c_{n-1},\ c_{n}=q.$$ Then the increments are: $$\Delta c_k=c_{k+1}-c_k.$$ Now, what are the secondary nodes? The primary nodes of the last partition of course! Indeed, we have: $$x_{1} \text{ in } [c_{1},c_{2}], \ x_{2} \text{ in } [c_{2},c_{3}],\ ... ,\ x_{n-1} \text{ in } [c_{n-1},c_{n}].$$

We apply the same construction on this partition to the function $g=\frac{\Delta f}{\Delta x}$. The difference quotient function of $g$ is defined at the secondary nodes of the new partition by: $$\frac{\Delta g}{\Delta c}(x_{k})=\frac{g(c_{k+1})-g(c_k)}{c_{k+1}-c_k},\ k=1,2,...,n-1.$$

Definition. The second difference quotient of $f$ is defined at the interior nodes of the partition and denoted by: $$\frac{\Delta^2 f}{\Delta x^2}(x_{k})=\frac{\frac{\Delta f}{\Delta x}(c_{k+1})-\frac{\Delta f}{\Delta x}(c_k)}{c_{k+1}-c_k},\ k=1,2,...,n-1.$$

Note that there are:

• $n+1$ values of $f$ (at the nodes),
• $n$ values of $\frac{\Delta f}{\Delta x}$ (at the secondary nodes), and
• $n-1$ values of $\frac{\Delta^2 f}{\Delta x^2}$ (at the nodes except $a$ and $b$).

We will often omit the subscripts for the simplified notation: $$\frac{\Delta^2 f}{\Delta x^2}(x)=\frac{\frac{\Delta f}{\Delta x}(c+\Delta c)-\frac{\Delta f}{\Delta x}(c)}{\Delta c}.$$
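The two-stage construction can be sketched in a few lines. The code below is an illustration under stated assumptions: each secondary node $c_k$ lies in $[x_{k-1},x_k]$ (taken to be the mid-point here), lists are indexed from $0$ as is usual in Python, and the function names are hypothetical.

```python
def difference_quotients(values, nodes):
    """First difference quotients, one per edge [nodes[k-1], nodes[k]]."""
    return [(values[k] - values[k - 1]) / (nodes[k] - nodes[k - 1])
            for k in range(1, len(nodes))]

def second_difference_quotients(f, nodes, secondary):
    """Second difference quotients at the interior nodes of the partition."""
    first = difference_quotients([f(x) for x in nodes], nodes)
    # The secondary nodes serve as the nodes of the new partition.
    return difference_quotients(first, secondary)

nodes = [0.0, 0.5, 1.0, 1.5, 2.0]                              # n = 4 intervals
secondary = [(a + b) / 2 for a, b in zip(nodes, nodes[1:])]    # mid-points, an assumption
print(second_difference_quotients(lambda x: x * x, nodes, secondary))  # [2.0, 2.0, 2.0]
```

With $n=4$ there are five values of the function, four first difference quotients, and three second difference quotients, matching the count above.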

Notice that a higher value of the second difference quotient means a higher curvature of the graph of $y=f(x)$. As another way to see this, imagine yourself driving along a straight part of the road and seeing the trees ahead remain in place (no curvature); then, as you start to turn, the trees start to pass across your field of vision from right to left (curvature):

This construction will be repeatedly used for approximations and simulations. It will be followed, when necessary, by taking its limit.

Let's differentiate $\sin x$ for the second time. In Chapter 7, we found its difference quotient over a mid-point partition with a single interval. This time we will need at least two intervals:

• three nodes $x$: $a-h$, $a$, and $a+h$, and
• two secondary nodes $c$: $a-h/2$ and $a+h/2$.

We use the two formulas for the difference quotients of $\sin x$ and $\cos x$ from Chapter 7. We write the former for the two secondary nodes, but we re-write the latter for the partition with two nodes $a-h/2,\ a+h/2$ and a single secondary node $x=a$: $$\begin{array}{lllll} \frac{\Delta}{\Delta x}(\sin x)&=\frac{ \sin (h/2)}{h/2}\cdot\cos c,& \frac{\Delta }{\Delta x}(\cos x)=-\frac{ \sin (h/2)}{h/2}\cdot\sin a,\\ \end{array}$$ Therefore, we have at $a$: $$\begin{array}{lllll} \frac{\Delta^2}{\Delta x^2}(\sin x)&=\frac{\Delta }{\Delta x}\left(\frac{\Delta}{\Delta x}( \sin x)\right)(a)\\ &=\frac{\Delta}{\Delta x}\left(\frac{ \sin (h/2)}{h/2}\cdot\cos c\right)&\text{ ...by the first formula... }\\ &=\frac{ \sin (h/2)}{h/2}\frac{\Delta \cos}{\Delta x}(a)&\text{ ...by CMR... }\\ &=\frac{ \sin (h/2)}{h/2}\left(-\frac{ \sin (h/2)}{h/2}\cdot\sin a\right)&\text{ ...by the second formula }\\ &=-\left(\frac{ \sin (h/2)}{h/2}\right)^2\cdot\sin a. \end{array}$$

Similarly, we find: $$\frac{\Delta }{\Delta x}(\cos x)=-\frac{ \sin (h/2)}{h/2}\cdot\sin c\ \Longrightarrow\ \frac{\Delta^2}{\Delta x^2}(\cos x)=-\left(\frac{ \sin (h/2)}{h/2}\right)^2\cdot\cos a.$$
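The two formulas can be checked numerically. Below is a small Python sketch; the helper `second_dq` and the sample values of $a$ and $h$ are chosen for illustration only.

```python
import math


def second_dq(f, a, h):
    """Second difference quotient of f at a, with nodes a-h, a, a+h
    and secondary nodes a-h/2, a+h/2."""
    left = (f(a) - f(a - h)) / h     # difference quotient at a - h/2
    right = (f(a + h) - f(a)) / h    # difference quotient at a + h/2
    return (right - left) / h


a, h = 1.0, 0.1
factor = (math.sin(h / 2) / (h / 2)) ** 2

# Each pair of numbers should agree up to round-off.
print(second_dq(math.sin, a, h), -factor * math.sin(a))
print(second_dq(math.cos, a, h), -factor * math.cos(a))
```

Note that the agreement holds for this fixed $h$, not only in the limit: the formulas are exact identities for the difference quotients.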

For the exponential function, we need a left-end partition with two intervals:

• three nodes $x$: $a-h$, $a$, and $a+h$, and
• two secondary nodes $c$: $a-h$ and $a$.

Then, we find at $a$: $$\frac{\Delta }{\Delta x}(e^x)=\frac{ e^h-1}{h}\cdot e^{c}\ \Longrightarrow\ \frac{\Delta^2}{\Delta x^2}(e^x)=\left(\frac{ e^h-1}{h}\right)^2\cdot e^{a-h}.$$
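The result for the exponential function can be checked in the same way. In the sketch below, the helper `second_dq` is illustrative; it uses the three-node expression that the definition reduces to when both intervals have length $h$ and the two secondary nodes are $h$ apart.

```python
import math


def second_dq(f, a, h):
    """Second difference quotient of f at a for the nodes a-h, a, a+h."""
    return (f(a + h) - 2 * f(a) + f(a - h)) / h ** 2


a, h = 0.5, 0.1
closed_form = ((math.exp(h) - 1) / h) ** 2 * math.exp(a - h)

# The two numbers should agree up to round-off.
print(second_dq(math.exp, a, h), closed_form)
```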

## Repeated differentiation

Example. Let's continue to differentiate the sine: $$\begin{array}{lll} (\sin x)' & = \cos x &\\ (\cos x)' & = -\sin x & \Longrightarrow &(\sin x)' ' &=-\sin x\\ (-\sin x)' & = -\cos x & \Longrightarrow &(\sin x)' ' ' &=-\cos x\\ (-\cos x)' & = \sin x & \Longrightarrow &(\sin x )' ' ' ' &= \sin x. \end{array}$$ And we are back where we started, i.e., the differentiation process for this particular function is cyclic! $\square$

We use the following terminology and notation for the consecutive derivatives of function $f$: $$\begin{array}{|l|l|l|l|} \hline \text{function } & f & f^{(0)}&\\ \text{first derivative } & f' & f^{(1)}&\frac{df}{dx}\\ \text{second derivative } & f' '=(f')' & f^{(2)}=\left(f^{(1)}\right)'&\frac{d^2f}{dx^2}=\frac{d}{dx}\left( \frac{df}{dx} \right)\\ \text{third derivative } & f' ' '=(f' ')'& f^{(3)}=\left(f^{(2)}\right)'&\frac{d^3f}{dx^3}=\frac{d}{dx}\left( \frac{d^2f}{dx^2} \right)\\ ...&&...&...\\ n\text{th derivative } & & f^{(n)}=\left(f^{(n-1)}\right)'&\frac{d^nf}{dx^n}=\frac{d}{dx}\left( \frac{d^{n-1}f}{dx^{n-1}} \right)\\ ...&&...&...\\ \hline \end{array}$$


Note that, for a fixed $x$, the sequence of numbers: $$f(x),\ f'(x),\ f' '(x),\ ...,\ f^{(n)}(x),\ ...$$ is just that, a sequence, a concept familiar from Chapter 7. However, as the example of $\sin x$ shows, this sequence doesn't have to converge: $$\left( \sin x \right)^{(n)}\Big|_{x=0},\ n=0,1,2,3,...\ \leadsto\ 0,1,0,-1,0,...$$ We will see in Chapter 15 that some “linear combinations” of the derivatives produce a sequence convergent to the function...

Let's try to compute as many consecutive derivatives as possible, or even all of them, for the functions below.

Example. The positive integer powers. The PF applies: $$(x^{n})' = nx^{n-1}.$$ The power decreases by $1$ every time. Therefore, $$(x^{n})^{ (n+1)} = 0.$$ Then, it stays $0$: $$(x^{n})^{ (n+1)} = (x^{n})^{ (n+2)}=...=0.$$ The powers in the sequence of derivatives decrease to $0$ and then remain constant. $\square$
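The way the powers run out can be traced with a few lines of Python. This is only a sketch: a polynomial is stored as a list of coefficients (an assumption made for this example), and `derivative` applies the Power Formula term by term.

```python
def derivative(coeffs):
    """Differentiate a polynomial; coeffs[k] is the coefficient of x**k.
    The empty list stands for the zero polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


p = [0, 0, 0, 1]          # the polynomial x**3
sequence = [p]
for _ in range(5):
    p = derivative(p)
    sequence.append(p)

print(sequence)
# [[0, 0, 0, 1], [0, 0, 3], [0, 6], [6], [], []]
```

For $n=3$, the third derivative is the constant $6$, and the fourth and all later derivatives are zero.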

Example. The exponential function. Since $$(e^{x})' = e^{x},$$ we have: $$(e^{x})^{(n)} = e^{x}.$$ The function remains the same! The sequence of derivatives is constant. $\square$

Example. The trig functions. Same for both sine and cosine: $$\begin{aligned} (\sin x)^{(4n)} & = \sin x \\ (\cos x)^{(4n)} & = \cos x \end{aligned}$$ The sequence of derivatives is cyclic for both functions. $\square$

Example. The negative integer powers. We apply PF again: $$\begin{aligned} (x^{-1})' & = -x^{-2}, \\ (-x^{-2})' & = 2x^{-3},\\ ... \end{aligned}$$ The power goes down by $1$ every time and, as a result, tends to $-\infty$. The sequence doesn't stop. $\square$

Exercise. Show that the same happens with all non-integer powers.


Warning: Starting in Chapter 17, we will see that the function and its derivative are two animals of very different breeds. As a result, the dynamics discussed above will disappear in higher dimensions.

The repeated differentiation process may fail to continue when the $(k-1)$st derivative is not differentiable, i.e., when the following limit does not exist: $$f^{(k)}(a)=\lim_{h\to 0} \frac{f^{(k-1)}(a+h)-f^{(k-1)}(a)}{h}.$$

Definition. A function $f$ is called twice, thrice, ..., $n$ times differentiable when $f' ',f' ' ',..., f^{(n)}$, respectively, exist. When the $n$th derivative exists for all $n$, we call the function smooth.

The functions that we have treated above are smooth inside their domains.

Example. This function is differentiable but not twice differentiable: $$f(x)=\begin{cases} -x^2&\text{ if } x<0;\\ x^2&\text{ if } x\ge 0. \end{cases}$$ Its graph looks smooth:

There is no doubt in which direction a beam of light would bounce off such a surface. However, let's compute the derivatives. It is easy for $x\ne 0$ because there is only one formula on each side: $$f'(x)=\begin{cases} -2x&\text{ if } x<0;\\ 2x&\text{ if } x> 0. \end{cases}$$ For the case of $x=0$, we consider the two one-sided limits: $$\lim_{h\to 0^-}\frac{f(0+h)-f(0)}{h}=\lim_{h\to 0^-}\frac{f(h)}{h}=\lim_{h\to 0^-}\frac{-h^2}{h}=\lim_{h\to 0^-}(-h)=0;$$ $$\lim_{h\to 0^+}\frac{f(0+h)-f(0)}{h}=\lim_{h\to 0^+}\frac{f(h)}{h}=\lim_{h\to 0^+}\frac{h^2}{h}=\lim_{h\to 0^+}h=0.$$ They match! Therefore, $$f'(0)=0.$$ We have discovered that $f'(x)=2|x|$. It's not differentiable at $0$! $\square$
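To see the failure of the second derivative at $0$ numerically, we can compare the one-sided difference quotients of $f'(x)=2|x|$. The following Python sketch uses illustrative names and step sizes.

```python
def f_prime(x):
    # the first derivative found in the example above
    return 2 * abs(x)


def one_sided_quotients(h):
    """Difference quotients of f' at 0 over [-h, 0] and over [0, h]."""
    left = (f_prime(0) - f_prime(-h)) / h
    right = (f_prime(h) - f_prime(0)) / h
    return left, right


for h in (0.1, 0.01, 0.001):
    print(h, one_sided_quotients(h))
```

The left quotients stay near $-2$ and the right ones near $2$ no matter how small $h$ is, so the two one-sided limits cannot match.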

Example. More examples of this kind:

• $\sin\frac{1}{x}$ is discontinuous at $x=0$;
• $x\sin\frac{1}{x}$ is continuous at $x=0$ but not differentiable;
• $x^2\sin\frac{1}{x}$ is differentiable at $x=0$ but not twice differentiable.

$\square$

Exercise. Prove the above statements.

Below we visualize the relation between these classes of functions:

What is the geometric meaning of these higher derivatives for a given function?

Let's consider the first derivative. It represents the slopes of the function. Then the second derivative represents the rate of change of these slopes. Notice how changing slopes are seen as rotating tangents:

Specifically, we see:

• decreasing slopes = tangents rotate clockwise;
• increasing slopes = tangents rotate counter-clockwise.

This matches our convention from trigonometry that counter-clockwise is the positive direction for rotations.

Even though we typically have functions with the $n$th derivative for each positive integer $n$, only the first two reveal something visible about the graph of the original function.

Above we compare

• the shapes of the patches of the graph of the function $f$ to the sign of the values of the first derivative $f'$; and
• the shapes of the patches of the graph of the function $f$ to the sign of the values of the second derivative $f' '$.

There are three main levels of analysis of a function:

• Analysis at level $0$: the values of $f$. We ask, how large? The findings are about the values, $x$- and $y$-intercepts, asymptotes and other large-scale behavior, periodicity, etc.
• Analysis at level $1$: the slopes of $f$. We ask, up or down? The findings are about the angles, increasing/decreasing behavior, critical points, etc.
• Analysis at level $2$: the rate of change of the slopes of $f$. We ask, concave up or down? The findings are about the change of steepness, concavity, telling a maximum from a minimum, etc.

We can go on and continue to discover more and more subtle but less and less significant properties of the function...

This three-level analysis also applies to our study of motion, below.

The derivative of the velocity and, therefore, the second derivative of the position, is called the acceleration. The concept allows one to add another level of analysis of motion:

• Analysis at level $0$: the location, where?
• Analysis at level $1$: the velocity, how fast? forward or back?
• Analysis at level $2$: the acceleration, how large is the force?

Suppose $t$ is time and $y$ is the vertical dimension, the height. Now the specific case of free fall... These are the initial conditions:

• $y_0$ is the initial height, $y_0=y\Big|_{t=0}$, and
• $v_y$ is the initial vertical component of velocity, $\frac{dy}{dt}\Big|_{t=0}$.

Then, we have: $$y=y_0+v_yt-\tfrac{1}{2}gt^2\ \Longrightarrow\ \frac{dy}{dt}=v_y-gt\ \Longrightarrow\ \frac{d^2y}{dt^2}=-g.$$ Now, from the point of view of the physics of the situation, the derivation should go in the opposite direction:

• when there is no force, the velocity is constant;
• when the force is constant, the velocity is linear in time, etc.

However, at this point we are still unable to answer these questions:

• How do we know that only the derivatives of constant functions and none others are zero?
• How do we know that only the derivatives of linear functions and none others are constant?
• How do we know that only the derivatives of quadratic functions and none others are linear?

This reversed process is called antidifferentiation. So far, we cannot justify even this simplest conclusion: $$f'=0 \Longrightarrow f=c,\ \text{ for some real number }c.$$ We will study these and related questions in Chapter 9.

## Change of variables and the derivative

If the distance is measured in miles and time in hours, the velocity is measured in miles per hour. If the distance is measured in kilometers and time in minutes, the velocity is measured in kilometers per minute. In either case, we are dealing with the same functions just measured in different units. If the two distance functions match, do the velocity functions too?

Let's recall that we can interpret every composition as a change of variables. We are especially interested in a change of units because we often measure quantities in multiple ways:

• length and distance: inches, miles, kilometers, light years;
• time: minutes, seconds, hours, years;
• weight: pounds, kilograms, carats;
• temperature: degrees Celsius, degrees Fahrenheit;
• etc.

How does such a change affect calculus as we know it?

If $$y=f(x)$$ is a relation between two quantities $x$ and $y$, then either one may be replaced with a new variable. Let's call them $t$ and $z$ respectively and suppose these replacements are given by some functions:

• case 1: $x=g(t)$;
• case 2: $z=h(y)$.

These substitutions create new relations:

• case 1: $y=k(t)=f(g(t))$;
• case 2: $z=k(x)=h(f(x))$.

The Chain Rule gives us the rate of change for each pair:

• case 1:

$$\frac{dk}{dt}=\frac{df}{dx}\frac{dg}{dt};$$

• case 2:

$$\frac{dk}{dx}=\frac{dh}{dy}\frac{df}{dx}.$$

Most often, the conversion formula of a change of units is linear.

This is for Case 1.

Theorem (Linear Chain Rule I). If $$g(t)=mt+b$$ and $y=f(x)$ is differentiable, then the derivative of $y=k(t)=f(g(t))$ is given by: $$k'(t)=mf'(mt+b).$$

Example. What if $x$ is time and we change the moment from which we start measuring time, e.g., the “daylight savings time”? We have: $$g(t)=t+t_0\ \Longrightarrow\ k'(t)=f'(t+t_0).$$ $\square$

Example. Suppose $x$ is time and $y$ is the location, then function $g$ may represent the change of units of time, such as to seconds, $x$, from minutes, $t$: $$x=g(t)=60t.$$ Then, the change of the units won't change a lot about our calculus:

• if $f$ is the location as a function of seconds, $k$ is the location as a function of minutes, and $k(t)=f(60t)$;
• also $f'$ is the velocity as a function of seconds, $k'$ is the velocity as a function of minutes, and $k'(t)=60f'(60t)$;
• also $f' '$ is the acceleration as a function of seconds, then $k' '$ is the acceleration as a function of minutes, and $k' '(t)=60^2f' '(60t)$.

Thus, the graphs of the new quantities describing motion are simply re-scaled versions of the graphs of the old ones. $\square$
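A quick numerical check of the Linear Chain Rule I for this change of units is sketched below in Python. The location function $f=\sin$ is a hypothetical choice made only for this example, and `diff` is a plain central difference quotient.

```python
import math


def diff(func, t, h=1e-6):
    """Central difference quotient, used as an approximation of the derivative."""
    return (func(t + h) - func(t - h)) / (2 * h)


f = math.sin            # hypothetical location as a function of seconds
f_prime = math.cos      # its derivative


def k(t):
    """The same location as a function of minutes: k(t) = f(60 t)."""
    return f(60 * t)


t = 0.02
# The two numbers should be close: k'(t) versus 60 f'(60 t).
print(diff(k, t), 60 * f_prime(60 * t))
```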

This is for Case 2.

Theorem (Linear Chain Rule II). If $$h(y)=my+b,$$ and $y=f(x)$ is differentiable, then the derivative of $z=k(x)=h(f(x))$ is given by: $$k'(x)=mf'(x).$$

Example. What if $y$ is the location and we change the place from which we start measuring, e.g., the Greenwich meridian? We have: $$h(y)=y+y_0\ \Longrightarrow\ k'(x)=f'(x).$$ We can also change the direction of the $y$-axis: $$h(y)=-y\ \Longrightarrow\ k'(x)=-f'(x).$$ $\square$

Example. Suppose $x$ is time and $y$ is the location, then function $h$ may represent the change of units of length, such as from miles, $y$, to kilometers, $z$: $$z=h(y)=1.6y.$$ Then, the change of the units will change very little about the calculus that we have developed; the coefficient, $m=1.6$, is the only adjustment necessary. Furthermore,

• if $f$ is the location in miles, then $k$ is the location in kilometers: $k(x)=1.6f(x)$;
• also $f'$ is the velocity measured in miles per unit of time, $k'$ is the velocity measured in kilometers per unit of time, and $k'(x)=1.6f'(x)$;
• also $f' '$ is the acceleration measured in miles, $k' '$ is the acceleration measured in kilometers, and $k' '(x)=1.6f' '(x)$.

Thus, the quantities describing motion are simply replaced with their multiples. The new graphs are the vertically stretched versions of the old ones. $\square$

Example. Recall the example in which a function $f$ that records the temperature, in Fahrenheit, as a function of time, in minutes, is replaced with another function $g$ that records the temperature in Celsius as a function of time in seconds:

• $s$ time in seconds;
• $m$ time in minutes;
• $F$ temperature in Fahrenheit;
• $C$ temperature in Celsius.

The conversion formulas are: $$m=s/60,$$ and $$C=(F-32)/1.8.$$

These are the relations between the four quantities: $$g:\quad s \xrightarrow{\quad s/60 \quad} m \xrightarrow{\quad f\quad} F \xrightarrow{\quad (F-32)/1.8\quad} C.$$ And this is the new function: $$C=g(s)=(f(s/60)-32)/1.8.$$ Then, by the Chain Rule, we have: $$\frac{dC}{ds}=\frac{dC}{dF}\frac{dF}{dm}\frac{dm}{ds}=\frac{1}{1.8}\cdot f'(m)\cdot \frac{1}{60}.$$ $\square$
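The combined effect of the two conversions can be tested on a made-up temperature record. In the Python sketch below, the quadratic function `f` is purely hypothetical; only the two conversion formulas come from the example.

```python
def f(m):
    """Hypothetical temperature in Fahrenheit as a function of minutes."""
    return 50 + 2 * m + 0.1 * m ** 2


def f_prime(m):
    return 2 + 0.2 * m


def g(s):
    """Temperature in Celsius as a function of seconds."""
    return (f(s / 60) - 32) / 1.8


def diff(func, t, h=1e-3):
    """Central difference quotient."""
    return (func(t + h) - func(t - h)) / (2 * h)


s = 120.0                                  # i.e., m = 2 minutes
predicted = f_prime(s / 60) / (1.8 * 60)   # product of the three rates
print(diff(g, s), predicted)               # the two should be close
```

The rate of change of the Celsius temperature per second is the Fahrenheit rate per minute divided by $1.8\cdot 60=108$.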

Exercise. Provide a similar analysis for the sizes of shoes and clothing.

Example. The conversion of the number of degrees $y$ to the number of radians $x$ is: $$x=\frac{\pi}{180}y.$$ Then, $$\frac{dx}{dy}=\frac{\pi}{180}.$$ Therefore, the trigonometric differentiation formulas, such as $\left( \sin x \right)'=\cos x$, don't hold anymore! Indeed, let's denote sine and cosine for degrees by $\sin_dy$ and $\cos_dy$ respectively: $$\sin_dy=\sin \left( \frac{\pi}{180}y \right) \text{ and } \cos_dy=\cos \left( \frac{\pi}{180}y \right).$$ Then, $$\begin{array}{lll} \frac{d}{dy}\sin_d y&=\frac{d}{dy}\sin \left( \frac{\pi}{180}y \right)\\ &=\frac{\pi}{180}\cos \left( \frac{\pi}{180}y \right)\\ &=\frac{\pi}{180}\cos_dy. \end{array}$$ $\square$
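The extra factor $\frac{\pi}{180}$ is easy to observe numerically. In this Python sketch, `sin_d` and `cos_d` follow the definitions above, while `diff` and the sample angle are assumptions made for the illustration.

```python
import math


def sin_d(y):
    """Sine of an angle given in degrees."""
    return math.sin(math.pi / 180 * y)


def cos_d(y):
    """Cosine of an angle given in degrees."""
    return math.cos(math.pi / 180 * y)


def diff(func, y, h=1e-4):
    """Central difference quotient."""
    return (func(y + h) - func(y - h)) / (2 * h)


y = 30.0
print(diff(sin_d, y))               # approximate derivative of sin_d at 30 degrees
print(math.pi / 180 * cos_d(y))     # the formula derived above
print(cos_d(y))                     # what the "naive" formula would give
```

The first two numbers agree, while the third is larger by the factor of $\frac{180}{\pi}\approx 57.3$.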

Example. What if we are to change our unit to a logarithmic scale? For example, $$x=10^t.$$ Then, for any function $y=f(x)$, we have by the Chain Rule: $$\frac{dy}{dt}=\frac{dy}{dx}\Bigg|_{x=10^t}\cdot \left( 10^t \right)'=\frac{dy}{dx}\Bigg|_{x=10^t}10^t\ln 10.$$ The effect on the derivative is not proportional! $\square$


## Implicit differentiation and related rates

We differentiate functions; can we differentiate relations?

Recall from Chapter 2 that relations are represented by equations, but not the kind we are used to: $$\underbrace{x^{2}}_{\text{a number}} - \underbrace{1}_{\text{a number}}=0\quad \leadsto \text{ find a particular number } x.$$ After the substitution, the equation should be true. The equations we are interested in are equations of functions, such as the familiar equation of the circle: $$\underbrace{x^2}_{\text{a function}}+\underbrace{y^{2}}_{\text{a composition of two functions}} =1 \quad \leadsto \text{ find a particular function } y=y(x).$$ After the substitution, the equation should be true for all $x$.

The equation implicitly defines this function. As we have done in the past, we can make the function $y=y(x)$ explicit by solving the equation for $y$: $$y = \sqrt{1 - x^{2}} \text{ or } y=-\sqrt{1 - x^{2}}.$$

However, what if we want only the rate of change of this, unknown, function?

We will rely on the following fact: if two functions are equal for all nodes $x$ of a partition, then so are their difference quotients for all secondary nodes $c$: $$f(x)=g(x) \text{ for all } x\ \Longrightarrow\ \frac{\Delta f}{\Delta x}(c)=\frac{\Delta g}{\Delta x}(c) \text{ for all } c.$$

Example (circle). Find the secant line through the two points on the circle of radius $1$ centered at $0$: $$(0,1) \text{ and } \left( \tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2} \right).$$

Typically, a curve has been the graph of a function $y = x^{2}$, $y = \sin x$, etc., given explicitly. This time the equation is: $$x^{2} + y^{2} = 1.$$ To find the slope of the secant line, we need the difference quotient of the function but there is no, explicit, function!

The idea is to consider the above equation as a relation between the two variables. In fact, we think of $y=y(x)$ as a function of $x$, i.e.: $$x^{2} + y(x)^{2} = 1.$$ We will also assume that

• the two $x$-values $x_0 =0$ and $x_1=\frac{\sqrt{2}}{2}$ are nodes of a partition of the $x$-axis, and
• the two $y$-values $y_0 =1$ and $y_1= \frac{\sqrt{2}}{2}$ are nodes of a partition of the $y$-axis.

We apply the Chain Rule to both sides of the equation: $$\begin{array}{rll} \frac{\Delta }{\Delta x} \left( x^{2} + y^{2} \right) & = \frac{\Delta }{\Delta x} (1) &\Longrightarrow\\ \frac{\Delta }{\Delta x} x^{2} + \frac{\Delta }{\Delta x}y^{2} &= 0 &\Longrightarrow\\ (x_0+x_1) + (y_0+y_1) \frac{\Delta y}{\Delta x} &= 0 &\Longrightarrow\\ \frac{\Delta y}{\Delta x} &= -\frac{x_0+x_1}{y_0+y_1} &\text{ for } y_0+y_1\ne 0. \end{array}$$ We have found a formula for the difference quotient but it is still implicit -- because we don't have a formula for $y=y(x)$. Fortunately, we don't need the whole function, just those two points on its graph. We substitute these into the formula above to find: $$\frac{\Delta y}{\Delta x}= -\frac{0+\frac{\sqrt{2}}{2}}{1+\frac{\sqrt{2}}{2}}= -\frac{\sqrt{2}}{2+\sqrt{2}}=1-\sqrt{2}.$$ Finally, from the point-slope formula we obtain the answer: $$y - \frac{\sqrt{2}}{2} = \left(1-\sqrt{2}\right)\left( x - \frac{\sqrt{2}}{2}\right).$$ We can automate this formula and find more secant lines:

$\square$
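One possible way to automate the formula for the secant slopes is sketched below in Python. The function name `secant_slope` is made up for this example; the code compares the implicit formula with the slope computed directly from the two points.

```python
import math


def secant_slope(p0, p1):
    """Slope of the secant line through two points of the unit circle,
    from the implicit formula -(x0 + x1) / (y0 + y1)."""
    (x0, y0), (x1, y1) = p0, p1
    if y0 + y1 == 0:
        raise ValueError("the formula requires y0 + y1 != 0")
    return -(x0 + x1) / (y0 + y1)


p0 = (0.0, 1.0)
p1 = (math.sqrt(2) / 2, math.sqrt(2) / 2)

implicit = secant_slope(p0, p1)
direct = (p1[1] - p0[1]) / (p1[0] - p0[0])   # rise over run
print(implicit, direct)                      # the two should agree
```

The same function can be applied to any other pair of points on the circle, as long as $y_0+y_1\ne 0$.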

What about the derivative? We will rely on the following fact: if the values of two functions are equal for all $x$ then so are the values of their derivatives: $$f(x)=g(x) \text{ for all } x\ \Longrightarrow\ f'(x)=g'(x) \text{ for all } x.$$ We can put it simply as: if two functions are equal then so are their derivatives; i.e., $$\begin{array}{|c|}\hline\quad f=g \ \Longrightarrow\ f'=g' \quad \\ \hline\end{array}$$

Differentiating an equation of functions and finding the derivative of a function defined by this equation is called implicit differentiation.

Let's consider two examples of how this idea may help us with finding tangents to implicit curves.

Example (circle). Find the tangent line for the circle of radius $1$ centered at $0$ at the point $\left( \tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2} \right)$.

Typically, a curve has been the graph of a function $y = x^{2}$, $y = \sin x$, etc., given explicitly. This time the equation is: $$x^{2} + y^{2} = 1.$$ To find the slope of the tangent line, we need the derivative, but there is no function to differentiate!

Our approach is to differentiate the equation above as a relation between the two variables. As we differentiate, we think of $y=y(x)$ as a function of $x$, i.e.: $$x^{2} + y(x)^{2} = 1.$$ This is the result, via the Chain Rule: $$\begin{array}{rll} \frac{d}{dx} \left( x^{2} + y^{2} \right) & = \frac{d}{dx} (1) &\Longrightarrow\\ \frac{d}{dx} x^{2} + \frac{d}{dx}y^{2} &= 0 &\Longrightarrow\\ 2x + 2y \frac{dy}{dx} &= 0 &\Longrightarrow\\ \frac{dy}{dx} &= -\frac{x}{y} &\text{ for } y\ne 0. \end{array}$$

We have found a formula for the derivative, but it is still implicit -- because we don't have a formula for $y=y(x)$. Fortunately, we don't need the whole function, just a single point on its graph: $$x = \frac{\sqrt{2}}{2},\ y = \frac{\sqrt{2}}{2}$$ We substitute these into the formula above to find: $$\frac{dy}{dx}\Bigg|_{x = \frac{\sqrt{2}}{2},\ y = \frac{\sqrt{2}}{2}}= -\frac{x}{y}\Bigg|_{x = \frac{\sqrt{2}}{2},\ y = \frac{\sqrt{2}}{2}}= -1.$$ Finally, from the point-slope formula we obtain the answer: $$y - \frac{\sqrt{2}}{2} = -1\left( x - \frac{\sqrt{2}}{2}\right).$$

Note that we could use the explicit formula $y = \sqrt{1 - x^{2}}$ with the same result: $$\frac{dy}{dx} \overset{\text{CR}}{=} \frac{-2x}{2\sqrt{1 - x^{2}}} = -\frac{x}{\sqrt{1 - x^{2}}},$$ after we substitute $x = \frac{\sqrt{2}}{2}$. However, it's only explicit for the upper half of the circle. For a point below the $x$-axis, we'd need to start over and use the other formula, $y = -\sqrt{1 - x^{2}}$.

Observe also that the derivative $\frac{dy}{dx}$ is undefined at $x= \pm 1$ (implicit or explicit) because the denominator is $0$. How do we find the tangent? From the formula we can proceed in two directions: $$x^{2} + y^{2} = 1 \leadsto \begin{cases} y \text{ depends on } x,\\ x \text{ depends on } y. \end{cases}$$ Then, we can try implicit differentiation of the same equation -- but with respect to $y$ this time. The computation is very similar, and the result is: $$\frac{dx}{dy} = -\frac{y}{x}.$$ The formula is defined for $y = 0$, at the points $(-1,0),\ (1, 0)$. Then, $\frac{dx}{dy} = 0$ at these points. Therefore, the tangent line at $(1,0)$ is $x - 1 = 0 (y-0)$, or $x = 1$; similarly, the tangent line at $(-1,0)$ is $x=-1$. $\square$

Example (Folium of Descartes). This curve is given by: $$x^{3} + y^{3} = 6xy.$$

We differentiate the equation as before: $$\frac{d}{dx} \left( x^{3} + y^{3} \right) = \frac{d}{dx} (6xy).$$ Using CR, we notice that every time we differentiate an expression containing $y$, the factor $\frac{dy}{dx}$ also appears: $$\begin{array}{rll} \frac{d}{dx} (x^{3}) + \frac{d}{dx} (y^{3}) & = 6\frac{d}{dx} (xy) \\ 3x^{2} + 3y^{2}\cdot \frac{dy}{dx} &= 6 \left(y + x\frac{dy}{dx} \right) . \end{array}$$ Solve for $\frac{dy}{dx}$: $$\begin{array}{rll} 3x^{2} + 3y^{2} \frac{dy}{dx} & = 6y + 6x \frac{dy}{dx} \\ (3y^{2} - 6x) \frac{dy}{dx} & = 6y - 3x^{2} \\ \frac{dy}{dx} & = \underbrace{\frac{6y - 3x^{2}}{3y^{2} - 6x}}_{\text{Fails at } (0,0)!} \end{array}$$ The end result is: if we know the location $(x, y)$, we know the slope of the tangent at that point. For example, at the tip of the curve we have $x=y$. Therefore, the slope is $\frac{dy}{dx}=-1$. $\square$
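The slope formula can be turned into a small Python function. This is a sketch only: the names `on_folium` and `folium_slope` are hypothetical, exact fractions are used to avoid round-off, and the second sample point was chosen simply because it satisfies the equation of the curve.

```python
from fractions import Fraction


def on_folium(x, y):
    """Check that the point satisfies x^3 + y^3 = 6xy."""
    return x ** 3 + y ** 3 == 6 * x * y


def folium_slope(x, y):
    """Slope of the tangent from the implicit formula above."""
    denominator = 3 * y ** 2 - 6 * x
    if denominator == 0:
        raise ZeroDivisionError("the formula fails at this point")
    return Fraction(6 * y - 3 * x ** 2) / denominator


tip = (3, 3)                                  # the tip of the curve, x = y
other = (Fraction(4, 3), Fraction(8, 3))      # another point of the curve

for point in (tip, other):
    assert on_folium(*point)
    print(point, folium_slope(*point))
```

At the tip the function returns $-1$, in agreement with the computation above; at $(0,0)$ it raises an error because the denominator vanishes.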

Note that in either example, we can cut the curve into pieces each of which is the graph of a function:

Now, implicit differentiation also helps with situations when several quantities depend on each other implicitly as well as on time. If we differentiate this dependence equation, we get a dependence between their derivatives. The result is related rates.

Example (air balloon). Suppose we have an air balloon, spherical in shape. Air is pumped into it at the rate of $5 {}^{\text{in }^{3}}/_{\text{sec}}$. What is the rate of growth of the radius at different radii?

Step one in word problems: introduce variables; let

• $t$ be time,
• $V$ be the volume, and
• $r$ be the radius.

Next, $V$ depends on $t$ and at that moment we have $$\frac{dV}{dt} = 5,$$ according to the condition. Furthermore, this is a sphere, so $$V = \frac{4}{3}\pi r^{3}.$$ Here we see that $V$ also depends on $r$; altogether, these are the dependencies we face: $$\begin{array}{cccc} t &\to &r\\ &\searrow&\downarrow\\ &&V \end{array}$$ We could reverse the last arrow by finding the inverse: $r = \sqrt[3]{\frac{3}{4\pi}V}$. Instead, we differentiate the equation itself. Thus, if two variables are related (via an equation), then so are their derivatives, i.e., the rates of change (hence, “related rates”).

Keeping in mind that both $V$ and $r$ are functions of time, we differentiate the relation with respect to $t$: $$V= \frac{4}{3} \pi r^{3}.$$ The left-hand side is very simple: $$\frac{d}{dt}V=\frac{dV}{dt},$$ but on the right-hand side $r(t)^{3}$ is a composition: $$\frac{d}{dt}\left( \frac{4}{3} \pi r^{3} \right) = \frac{4}{3} \pi \cdot 3r^{2} \frac{dr}{dt}.$$ Thus, we have: $$\frac{dV}{dt} = \frac{4}{3} \pi \cdot 3r^{2} \frac{dr}{dt}.$$

Recall that the rate of change of $V$ is $5$ (at a given moment), so: $$5 = 4\pi r^{2} \frac{dr}{dt},$$ or $$\frac{dr}{dt} = \frac{5}{4\pi r^{2}}.$$

Next, what's the rate of growth of $r$ when $r = 1,\ r = 2,\ r = 3$? $$\begin{array}{lll} r = 1: & \frac{dr}{dt} = \frac{5}{4\pi}; \\ r = 2: & \frac{dr}{dt} = \frac{5}{16\pi}; \\ r = 3: & \frac{dr}{dt} = \frac{5}{36\pi}. \end{array}$$ Indeed, we see a slow-down. $\square$
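The slow-down can be tabulated with a short Python sketch of the formula $\frac{dr}{dt} = \frac{5}{4\pi r^{2}}$; the function name and the default argument are chosen for this illustration.

```python
import math


def radius_growth_rate(r, dV_dt=5.0):
    """Rate of growth of the radius (in/sec) of a sphere of radius r (in)
    when its volume grows at dV_dt (cubic in/sec)."""
    return dV_dt / (4 * math.pi * r ** 2)


for r in (1, 2, 3):
    print(r, round(radius_growth_rate(r), 4))
```

Doubling the radius cuts the rate of growth by a factor of $4$, since the rate is inversely proportional to $r^2$.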

Example (sliding ladder). Suppose a $10$ ft. ladder stands against the wall and its bottom is sliding away from the wall at $2$ ft/sec. How fast is the top moving when it is $6$ ft from the floor?

Introduce variables:

• $x$ the distance of the bottom from the wall,
• $y$ the distance of the top from the floor, both functions of
• $t$ the time.

Translate the information, as well as the question, about the variables into equations: $$\begin{array}{ll|l} &\text{quantities:}&\text{functions:}\\ \hline \text{always}& x^{2} + y^{2} = 10^{2}&(x(t))^{2} + (y(t))^{2} = 10^{2}\\ \text{now}&\frac{dx}{dt} = 2&x'(t_0)=2\\ \text{now}&y = 6&y(t_0)=6 \\ \text{now}&\frac{dy}{dt} = ?& y'(t_0)=? \end{array}$$ That's the purely mathematical problem to be solved.

We differentiate the equation with respect to the independent variable, $t$: $$\begin{array}{rlll} \frac{d}{dt}\left( x^{2} + y^{2} \right) & = \frac{d}{dt}\left(100\right) \\ 2x\frac{dx}{dt} + 2y\frac{dy}{dt} & = 0,& \text{ solve for } \frac{dy}{dt} \\ \frac{dy}{dt} &= - \frac{x}{y}\frac{dx}{dt},& \text{ substitute } \\ &= -\frac{x}{6} 2, & \text{ now } x=8 \text{ comes from } x^{2} + y^{2} = 100, \\ &= -\frac{8}{6} 2 \\ & = -\frac{8}{3}. \end{array}$$ It is going down! $\square$
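The same computation can be written as a short Python sketch; the function name `top_velocity` and its parameters are assumptions made for this example.

```python
import math


def top_velocity(length, y, dx_dt):
    """Vertical velocity of the top of a ladder of the given length when the
    top is at height y and the bottom moves away from the wall at dx_dt."""
    x = math.sqrt(length ** 2 - y ** 2)   # from x^2 + y^2 = length^2
    return -x / y * dx_dt                 # from 2x x' + 2y y' = 0


print(top_velocity(10, 6, 2))   # about -2.667 ft/sec, i.e., -8/3
```

The function cannot be evaluated at $y=0$, the moment when the ladder hits the floor, because the formula divides by $y$.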

Exercise. Solve the problem for the moment when the ladder hits the floor.

Problem. Suppose you are driving at a speed of $80$ mph when you see a police car positioned $40$ feet off the road. What is the radar gun's reading?

How does the radar gun work? In fact, how does a radar work? A signal is sent, it bounces off an object, and, when it comes back, the time lapse is recorded. Then, the distance to the object is computed as: $$S = \underbrace{\text{ signal's speed }}_{\text{ known }} \cdot \underbrace{\text{ time passed }}_{\text{ measured }}.$$

A radar gun does this twice.

A signal is sent, it comes back, the time is measured. Then the second time:

• $S_{1} =$ speed $\cdot$ time, at time $t=t_1$,
• $S_{2} =$ speed $\cdot$ time, at time $t=t_2$.

Then, the reading is computed as: $$\text{ estimated speed }= \frac{\text{ change of distance }}{\text{ time between signals }}.$$ No radar gun can do better than that!

To summarize: $$\frac{dS}{dt}\approx \frac{\Delta S}{\Delta t},$$ where

• $\Delta S=S_{2} - S_{1}$ is the change of distance between the two cars,
• $\Delta t=t_2-t_1$ is the time between signals.

Now the question: is the reading of the radar gun $80$ mph?

To get an idea of what can happen, consider this extreme example: what if you are just passing in front of the police car, like this?

It is conceivable that at time $t_1$ your car is the same distance from the intersection as it is past the intersection at time $t_2$. Then $\Delta S=0$! So, the reading can be off by a lot...

These are the variables:

• $S$, the distance from the police car to yours,
• $P$, the distance from your car to the intersection,
• $t$, the time, the independent variable, and also
• $D=40$, the distance from the police car to the road.

Since $80$ mph is your speed, $\frac{dP}{dt} = 80$. That's what the radar gun is meant to detect. But what the radar measures in reality is $\frac{dS}{dt}$!

How good an approximation of the real velocity $\frac{dP}{dt}$ is the perceived velocity $\frac{dS}{dt}$? The spreadsheet contains a column of locations $P$ of your car (distances to the intersection), the next one is for the distance $S$ to the police car (plotted first), and finally the average rate of change of $S$.

As we can see, the approximation is best away from the intersection. But, within $75$ feet from the intersection, the reading will be less than $70$ mph!

Next, we establish a functional relation between the two via the Pythagorean Theorem: $$P^{2} + D^{2} = S^{2} \gets \text{These aren't numbers, but variables, i.e., functions.}$$ This connects $P$ and $S$, but not $\frac{dP}{dt}$ and $\frac{dS}{dt}$ yet. We differentiate the equation with respect to $t$ to get there: $$\begin{aligned} \frac{d}{dt}\left(P^{2} + D^{2}\right) & = \frac{d}{dt}\left(S^{2}\right)\ \Longrightarrow \\ 2P\cdot \frac{dP}{dt} + 2D\underbrace{\frac{dD}{dt}}_{=0} &= 2S\cdot \frac{dS}{dt}\ \Longrightarrow \\ P\cdot\frac{dP}{dt} &= S\cdot \frac{dS}{dt}\ \Longrightarrow \\ \frac{dS}{dt} &= \frac{P}{S}\frac{dP}{dt}. \end{aligned}$$ Thus, we finally have a relation between these functions. In fact, this is what the radar gun shows: $$\frac{dS}{dt} = \frac{P}{\sqrt{P^2+D^2}}\cdot 80.$$ We plot this relation below, to confirm our earlier conclusions:
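The relation can also be tabulated. The Python sketch below evaluates the formula for a few distances $P$ (in feet); the function name `radar_reading` and the sample distances are chosen for illustration, with $D=40$ feet and the speed of $80$ mph taken from the problem.

```python
import math


def radar_reading(P, D=40.0, speed=80.0):
    """The rate dS/dt (in the units of speed) when the car is at the
    distance P from the intersection and the police car is D off the road."""
    return P / math.sqrt(P ** 2 + D ** 2) * speed


for P in (1000, 300, 150, 75, 30, 0):
    print(P, round(radar_reading(P), 1))
```

Since the ratio $P/S$ has no units, it does not matter that the distances are in feet while the speed is in miles per hour. The readings approach $80$ far from the intersection and drop to $0$ at the intersection.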

Furthermore, we can simplify this relation: $$\cos \alpha = \frac{P}{S},$$ where $\alpha$ is the angle between the road ahead of you and the direction to the police car.

How does $\alpha$ change as you drive?

• Early: $\alpha$ is close to $0$, so $\cos \alpha$ close to $1$, and, therefore, $\frac{dS}{dt}$ is close to $80$.
• Then, as $\alpha$ increases, $\cos \alpha$ decreases toward $0$, and so does $\frac{dS}{dt}$.
• In the middle, we have $\alpha = \frac{\pi}{2}$, $\cos \alpha = 0$, $\frac{dS}{dt} = 0$.
• As $\alpha$ passes $\frac{\pi}{2}$, $\cos \alpha$ decreases to negative values, and so does $\frac{dS}{dt}$.
• Late: $\alpha$ approaches $\pi$, and $\cos \alpha$ approaches $-1$, and, therefore, $\frac{dS}{dt}$ approaches $-80$.

Conclusion: The radar gun always underestimates your speed: $$\left| \frac{dS}{dt} \right| < 80.$$ Unless the police car is on the road! In that case, what can you do to “improve” the reading? What do you want $\alpha$ to be? As large as possible!

## The derivative of the inverse function

Let's recall from Chapter 3 that for a given one-to-one and onto function $y=f(x)$, its inverse is the function, $x=f^{-1}(y)$, that satisfies $$f^{-1}(y)=x \text{ if and only if }f(x)=y.$$ An idea is that a function and its inverse represent the same relation:

• $x$ and $y$ are related when $y=f(x)$, or
• $x$ and $y$ are related when $x=f^{-1}(y)$.

For example, these are pairs of functions inverse to each other: $$\begin{array}{rl} y=x+2& \text{ vs. } & x=y-2,\\ y=3x&\text{ vs. } & x=\frac{1}{3}y,\\ y=x^2&\text{ vs. } & x=\sqrt{y} \quad\text{ for }x,y\ge 0,\\ y=e^x&\text{ vs. } & x=\ln y \quad\text{ for }y > 0. \end{array}$$

Can we express the derivative of the inverse of a function in terms of the derivative of the function?


Exercise. We can make any function one-to-one by restricting its domain. How?