The Chain Rule
A version of Taylor's theorem says that if a function $g$ is twice differentiable at a number $t$, then
$$g(t+h) = g(t) + g'(t)\,h + o(h)$$
where $o(h)/h$ approaches 0 as $h$ approaches 0. Let's apply this to $z = f(x,y)$, where $x = p(t)$ and $y = q(t)$. To begin with,
$$z(t+h) = f(x(t+h),\, y(t+h))$$
Taylor's theorem and the single-variable chain rule, applied to the second argument of $f$, yield
$$z(t+h) = f(x(t+h),\, y(t)) + \frac{\partial f}{\partial y}(x(t+h),\, y(t))\,\frac{dy}{dt}\,h + o(h)$$
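Applying the same expansion to the first argument of $f$ and of $\partial f/\partial y$ then yields
$$z(t+h) = f(x(t), y(t)) + \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt}\,h + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}\,h + \frac{\partial^2 f}{\partial x\,\partial y}(x(t), y(t))\,\frac{dx}{dt}\,\frac{dy}{dt}\,h^2 + o(h)$$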
and gathering powers of h then yields
$$z(t+h) = z(t) + \left(\frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt}\right) h + o_1(h)$$
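where $o_1(h)$ collects the $h^2$ term together with the $o(h)$ remainder, so that $o_1(h)/h$ also approaches 0 as $h$ approaches 0. Taylor's theorem thus implies that
$$\frac{dz}{dt} = \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt}$$
which is known as the chain rule for 2 variables.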
In practice, it is often easier to simply substitute x ( t ) and y ( t ) and differentiate directly. However, the chain rule for 2 variables is a valuable theoretical tool, as we will soon see. For now, let's look at an example.
First, let's define a function f and apply the chain rule. In doing so, we use the inert "Diff" differential operator because it keeps track of where we will need to evaluate a derivative later.
>
f:=x^2+y^4;
dz_dt:=diff(f,x)*Diff(x,t)+diff(f,y)*Diff(y,t);
>
Now let's define functions x ( t ) and y ( t ) and simplify the derivatives above. The "value" function will cause the "Diff" operators to be evaluated.
>
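x:=t^2;
y:=t^3;
dz_dt:=dz_dt; # re-evaluate so that x and y are replaced by t^2 and t^3
dz/dt=value(%); # the left side is only a display label; value evaluates each Diff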
>
Notice that we could have obtained the same thing by simply substituting for x and y :
>
f; #redefined by definitions above
dz/dt=diff(f,t);
x:='x':y:='y':
>
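The same result can be checked by hand. The following is a short worked version for the f, x ( t ) and y ( t ) defined above, namely $f = x^2 + y^4$, $x = t^2$ and $y = t^3$:
$$\frac{dz}{dt} = \frac{\partial f}{\partial x}\,\frac{dx}{dt} + \frac{\partial f}{\partial y}\,\frac{dy}{dt} = (2x)(2t) + (4y^3)(3t^2) = (2t^2)(2t) + (4t^9)(3t^2) = 4t^3 + 12t^{11}$$
Substituting first gives $z = t^4 + t^{12}$, whose derivative is again $4t^3 + 12t^{11}$, so the two approaches agree.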
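So what then is the value of the chain rule? First, it relates derivatives of functions of 2 variables to gradients and Hessians. Indeed, notice that
$$\frac{dz}{dt} = \left\langle \frac{\partial f}{\partial x},\, \frac{\partial f}{\partial y} \right\rangle \cdot \left\langle \frac{dx}{dt},\, \frac{dy}{dt} \right\rangle.$$
Since $\mathbf{v}(t) = \left\langle \frac{dx}{dt},\, \frac{dy}{dt} \right\rangle$ is the velocity of the vector-valued function $\mathbf{r}(t) = \langle x(t),\, y(t) \rangle$, the chain rule can be written
$$\frac{dz}{dt} = \operatorname{grad}(f) \cdot \mathbf{v}$$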
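Moreover, the chain rule also implies that
$$\frac{d}{dt}\,\frac{\partial f}{\partial x} = \frac{\partial^2 f}{\partial x^2}\,\frac{dx}{dt} + \frac{\partial^2 f}{\partial x\,\partial y}\,\frac{dy}{dt}$$
and
$$\frac{d}{dt}\,\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial x\,\partial y}\,\frac{dx}{dt} + \frac{\partial^2 f}{\partial y^2}\,\frac{dy}{dt}$$
which in turn implies that
$$\frac{d}{dt}\operatorname{grad}(f) = \left\langle \frac{d}{dt}\,\frac{\partial f}{\partial x},\, \frac{d}{dt}\,\frac{\partial f}{\partial y} \right\rangle = \left\langle \frac{\partial^2 f}{\partial x^2}\,\frac{dx}{dt} + \frac{\partial^2 f}{\partial x\,\partial y}\,\frac{dy}{dt},\; \frac{\partial^2 f}{\partial x\,\partial y}\,\frac{dx}{dt} + \frac{\partial^2 f}{\partial y^2}\,\frac{dy}{dt} \right\rangle = H\,\mathbf{v}$$
where
$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[2ex] \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2} \end{pmatrix}$$
is the Hessian matrix of $f$.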
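Thus, the second derivative is
$$\frac{d^2 z}{dt^2} = \left( \frac{d}{dt}\operatorname{grad}(f) \right) \cdot \mathbf{v} + \operatorname{grad}(f) \cdot \frac{d\mathbf{v}}{dt}$$
which simplifies to
$$\frac{d^2 z}{dt^2} = (H\,\mathbf{v}) \cdot \mathbf{v} + \operatorname{grad}(f) \cdot \mathbf{a}$$
where $H$ is again the Hessian matrix of $f$ and $\mathbf{a}$ is the acceleration vector of $\mathbf{r}(t)$.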
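As a check on the second-derivative formula $\frac{d^2 z}{dt^2} = (H\,\mathbf{v}) \cdot \mathbf{v} + \operatorname{grad}(f) \cdot \mathbf{a}$, here is a worked sketch that reuses the earlier example $f = x^2 + y^4$, $x = t^2$, $y = t^3$. The symbol $H$ for the Hessian matrix is a naming choice made for this illustration. The ingredients are
$$\operatorname{grad}(f) = \langle 2x,\, 4y^3 \rangle = \langle 2t^2,\, 4t^9 \rangle, \qquad H = \begin{pmatrix} 2 & 0 \\ 0 & 12y^2 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 12t^6 \end{pmatrix}, \qquad \mathbf{v} = \langle 2t,\, 3t^2 \rangle, \qquad \mathbf{a} = \langle 2,\, 6t \rangle$$
so that
$$(H\,\mathbf{v}) \cdot \mathbf{v} = \langle 4t,\, 36t^8 \rangle \cdot \langle 2t,\, 3t^2 \rangle = 8t^2 + 108t^{10}, \qquad \operatorname{grad}(f) \cdot \mathbf{a} = 4t^2 + 24t^{10}$$
and therefore
$$\frac{d^2 z}{dt^2} = 12t^2 + 132t^{10}$$
Differentiating $z = t^4 + t^{12}$ twice directly gives the same polynomial, $12t^2 + 132t^{10}$.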