The Chain Rule

A version of Taylor's theorem says that if a function g is differentiable at a number t, then

g ( t+h ) = g ( t ) + g' ( t ) h + o( h )

where o(h)/h approaches 0 as h approaches 0. Let's apply this to z = f ( x,y ), where x = x ( t ) and y = y ( t ) are differentiable functions of t and f has continuous second partial derivatives. To begin with,

z ( t+h ) = f ( x ( t+h ), y ( t+h ) )

Writing y ( t+h ) = y ( t ) + dy/dt h + o( h ) and applying Taylor's theorem to the 2nd argument of f then yields

z ( t+h ) = f ( x ( t+h ), y ( t ) ) + f[y] ( x ( t+h ), y ( t ) ) dy/dt h + o( h )

Applying the same expansion to the first argument of f and of f[y] then yields

z ( t+h ) = f ( x ( t ), y ( t ) ) + f[x] ( x ( t ), y ( t ) ) dx/dt h + f[y] ( x ( t ), y ( t ) ) dy/dt h + f[xy] ( x ( t ), y ( t ) ) dx/dt dy/dt h^2 + o( h )

and gathering powers of h then yields

z ( t+h ) = z ( t ) + ( f[x] dx/dt + f[y] dy/dt ) h + o1( h )

where o1( h ) collects the h^2 term and the o( h ) remainders, so that o1( h )/h also approaches 0 as h approaches 0. Subtracting z ( t ), dividing by h, and letting h approach 0 thus gives

dz/dt = f[x] dx/dt+f[y] dy/dt

which is known as the chain rule for 2 variables.
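
Both the starting expansion and the resulting rule can be checked symbolically. The following is a minimal sketch, assuming the names g, F, X and Y (chosen here only for illustration) are still unassigned in the session. Note that "taylor" reports its remainder as O(h^2), which presumes more smoothness than the o( h ) statement above requires.

> taylor(g(t+h),h=0,2); #first-order expansion of an unspecified function g
diff(F(X(t),Y(t)),t); #Maple applies the chain rule for 2 variables

In the second result, Maple writes the partial derivatives f[x] and f[y] in its operator notation, as D[1](F) and D[2](F) evaluated at ( X(t), Y(t) ).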

In practice, it is often easier to simply substitute x ( t ) and y ( t ) and differentiate directly. However, the chain rule for 2 variables is a valuable theoretical tool, as we will soon see. For now, let's look at an example.

First, let's define a function f and apply the chain rule. In doing so, we use the inert "Diff" differential operator because it keeps track of where we will need to evaluate a derivative later.

> f:=x^2+y^4;
dz_dt:=diff(f,x)*Diff(x,t)+diff(f,y)*Diff(y,t);

Now let's define functions x ( t ) and y ( t ) and simplify the derivatives above. The "value" function will cause the "Diff" operators to be evaluated.

> x:=t^2;
y:=t^3;
dz_dt:=dz_dt;
dz/dt=value(%);

Notice that we could have obtained the same thing by simply substituting for x and y :

> f; #now evaluates in terms of t, since x and y were assigned above
dz/dt=diff(f,t);
x:='x':y:='y':

So what then is the value of the chain rule? First, it relates derivatives of functions of 2 variables to gradients and Hessians. Indeed, notice that

dz/dt = f[x] dx/dt+f[y] dy/dt = <f[x],f[y]> . <dx/dt,dy/dt>

Since v (t) = <dx/dt,dy/dt> is the velocity of the vector-valued function r ( t ) = <x(t),y(t)> , the chain rule can be written

dz/dt = grad( f ) . v
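
As a quick check of this form, the sketch below recomputes the earlier example as a dot product. It assumes x and y are unassigned (as they are after the reset above) and uses plain lists rather than a vector package; the names gradf and v are chosen here only for illustration.

> f:=x^2+y^4:
gradf:=[diff(f,x),diff(f,y)]; #gradient of f as a list
v:=[diff(t^2,t),diff(t^3,t)]; #velocity of r(t) = <t^2,t^3>
dz_dt:=subs(x=t^2,y=t^3,gradf[1]*v[1]+gradf[2]*v[2]); #grad(f) . v along the curve
expand(dz_dt-diff(subs(x=t^2,y=t^3,f),t)); #difference from direct differentiation

The last line should simplify to 0, since both routes give 4*t^3+12*t^11.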

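Moreover, applying the chain rule to f[x] and f[y] (and using f[yx] = f[xy], which holds when the second partial derivatives are continuous) implies that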

df[x]/dt = f[xx] dx/dt + f[xy] dy/dt and df[y]/dt = f[xy] dx/dt + f[yy] dy/dt

which in turn implies that

d/dt grad( f ) = d/dt <f[x],f[y]> = < f[xx] dx/dt + f[xy] dy/dt , f[xy] dx/dt + f[yy] dy/dt > = H[f] v

where H[f] is the Hessian matrix of f. Thus, applying the product rule to dz/dt = grad( f ) . v, the second derivative is

d^2z/dt^2 = d(grad( f ))/dt . v + grad( f ) . dv/dt

which simplifies to

d^2z/dt^2 = ( H[f] v ) . v + grad( f ) . a

where a is the acceleration vector of r ( t ).
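
The second-derivative formula can be checked on the same example. This is a sketch only: it assumes x and y are unassigned, writes out the entries of H[f] v by hand instead of calling a matrix package, and uses helper names (xt, yt, fxx, fxy, fyy, vx, vy, ax, ay, formula, direct) chosen here for illustration.

> f:=x^2+y^4:
xt:=t^2: yt:=t^3: #the curve r(t) = <t^2,t^3>
fxx:=diff(f,x,x): fxy:=diff(f,x,y): fyy:=diff(f,y,y): #entries of H[f]
vx:=diff(xt,t): vy:=diff(yt,t): #velocity components
ax:=diff(vx,t): ay:=diff(vy,t): #acceleration components
formula:=subs(x=xt,y=yt,(fxx*vx+fxy*vy)*vx+(fxy*vx+fyy*vy)*vy+diff(f,x)*ax+diff(f,y)*ay);
direct:=diff(subs(x=xt,y=yt,f),t,t); #substitute first, then differentiate twice
expand(formula-direct);

Both expressions should reduce to 12*t^2+132*t^10, so the final difference should be 0.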