# Directional Derivatives

The partial derivatives we studied in the last lecture allow us to compute how fast a multivariable function changes along the x and y (and possibly z) directions. But what if we want to know how fast a function changes along an arbitrary direction? Directional derivatives answer exactly this question.

When we want to know the rate of change of a function in a given direction, we must specify what direction we're talking about. Clearly, vectors are perfectly suited to our purposes.

The directional derivative is defined just as you might expect: pick a unit vector $\vec{u}$ pointing in the direction you want to differentiate in, and define

$\partial_{\vec{u}} f(\vec{p}) = \lim_{h\to 0} \frac{f(\vec{p}+h\vec{u})-f(\vec{p})}{h}$

where $\vec{p}=x\vec{i}+y\vec{j}$, i.e., $f(\vec{p}) = f(x,y)$.

Is there an easier way to compute these kinds of derivatives, without evaluating a limit each time? There is. But note first that only vectors of unit length are useful to us here; other lengths would scale our rates of change.

So, first we pick a vector $\vec{u}$ pointing in the direction we want. Now we normalize it to give it unit length: this is easy enough; defining a new vector $\vec{w}=\vec{u}/|\vec{u}|$ does the trick.

Our directional derivative is now quite easy to compute:

$\partial_{\vec{u}} f = \begin{pmatrix}\partial_x f\\\partial_y f\end{pmatrix} \cdot \vec{w}$
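We can check numerically that the dot-product formula agrees with the limit definition. This is a sketch with arbitrary choices not taken from the text: a hypothetical test function $f(x,y)=x^2 y$, the point $(1,2)$, and the direction $(3,4)$.

```python
import math

# Hypothetical test function f(x, y) = x^2 * y, with gradient (2xy, x^2).
def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return (2 * x * y, x**2)

# Direction u = (3, 4): normalize to unit length first.
ux, uy = 3.0, 4.0
norm = math.hypot(ux, uy)          # |u| = 5
wx, wy = ux / norm, uy / norm      # w = (0.6, 0.8)

# Directional derivative at p = (1, 2) via the dot-product formula.
px, py = 1.0, 2.0
gx, gy = grad_f(px, py)            # (4, 1)
via_gradient = gx * wx + gy * wy   # analytically 4*0.6 + 1*0.8 = 3.2

# Compare with the limit definition, using a small step h along w.
h = 1e-6
via_limit = (f(px + h * wx, py + h * wy) - f(px, py)) / h

print(via_gradient)
print(abs(via_gradient - via_limit) < 1e-4)  # True
```

The two computations agree to within the step size of the finite-difference approximation, as expected.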

We've discussed partial derivatives, but it seems there should be a way to find a derivative which is in some sense "complete." We'll address this concept for vector fields in the next lecture, but for now let's examine the gradient, a differential operator for scalar functions. At every point of a scalar field, the gradient is a vector pointing in the direction of greatest increase, with length equal to the rate of change in that direction (i.e., the directional derivative).

$\operatorname{grad} f = \vec{i}\partial_x f+ \vec{j}\partial_y f + \vec{k}\partial_z f$.

If we want a more concise version of this, we can form the "vector" often called del or nabla:

$\nabla = \vec{i}\frac{\partial}{\partial x} + \vec{j}\frac{\partial}{\partial y} + \vec{k}\frac{\partial}{\partial z}$.

Then, if we just apply this operator to f, we get the gradient: $\operatorname{grad} f = \nabla f$. Remember this operator; we'll encounter it again.
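As a numerical sketch of what $\nabla$ does, we can mimic it with central differences. The function below, $f(x,y,z)=xy+z^2$, is a hypothetical example chosen for illustration; its exact gradient is $(y,\,x,\,2z)$.

```python
# Numerical gradient via central differences, mimicking nabla applied to a
# hypothetical scalar function f(x, y, z) = x*y + z**2 (exact grad: (y, x, 2z)).
def f(x, y, z):
    return x * y + z**2

def grad(f, x, y, z, h=1e-6):
    # One central difference per coordinate direction.
    return ((f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
            (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
            (f(x, y, z + h) - f(x, y, z - h)) / (2 * h))

gx, gy, gz = grad(f, 1.0, 2.0, 3.0)
print(gx, gy, gz)  # ≈ (2, 1, 6)
```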

# Parametric Curves and Surfaces

Earlier, we described lines in three-dimensional space by referring to a non-coordinate variable, t. In the challenging exercises, we also explored using this method to create curves in space. This method of describing geometric objects is called a parametric representation. Parametric representations can describe more varied and complex figures than the kinds of curves captured by equations like y = f(x) or z = f(x,y), and the parametric equations often yield more insight into the structure of the object than a representation using only coordinates.

Just as it is customary to use x,y,z for coordinates, so it is customary to use t as the parameter for curves, and u,v as the parameters for surfaces. Occasionally, we use (θ,φ) when the parameters are easily envisioned as angles.

## Curves

In the plane, we frequently encounter curves which cannot be described in y = f(x) form. The circle is the simplest example: it takes two such equations to represent a circle, $y=\sqrt{1-x^2}$ and $y=-\sqrt{1-x^2}$, and these equations fail to be differentiable at the points (-1,0) and (1,0). This violates our intuition about what a circle is: it's not made of two pieces stuck together, and no point on it is different from any other.

Let's take a look at a parametric representation of the circle: x(t) = cos(t),y(t) = sin(t). What does this mean? If we picture a point traveling in the plane, these equations give its x and y coordinates at some time t. At time t=0, the point is at (1,0). If you've memorized your essential trig function values, it won't be difficult for you to complete a graph of this and you'll find this does indeed describe a circle.

Frequently, we describe a parametric representation as a vector function: $\vec{r}(t) = \vec{i}\cos(t) + \vec{j}\sin(t)$. This gives us one tidy function describing a circle, which is differentiable everywhere. Quite an improvement over our previous representation of the circle!
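A quick check (a sketch, not part of the text) confirms that every sample of this parametrization lies on the unit circle:

```python
import math

# Sample the parametric circle r(t) = (cos t, sin t) and confirm every
# sample satisfies x^2 + y^2 = 1.
for k in range(8):
    t = 2 * math.pi * k / 8
    x, y = math.cos(t), math.sin(t)
    assert abs(x**2 + y**2 - 1) < 1e-12
print("all samples lie on the unit circle")
```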

It is just as simple to describe curves in space, and students completing the challenging exercises will already have some familiarity with this. In fact, this method is used to describe curves so often, we even call lines "curves" - they're just straight curves. Below is a curve you're probably familiar with in space, the helix:

$\vec{r}(t) = \vec{i}\cos(t) + \vec{j}\sin(t) + \vec{k}t$

Now take a look at these two curves:

$\vec{r}(t) = \vec{i}2t - \vec{j}t/2 + \vec{k}t$

$\vec{r}(t) = \vec{i}2t^3 - \vec{j}t^3/2 + \vec{k}t^3$.

Compute some points along these curves, and you'll see that they're the same curve. A single geometric object can have many different parametric representations.
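We can confirm this claim with a short script (the sample values of t are arbitrary): the second curve is the first re-parametrized by $s = t^3$, so every point of one lies on the other.

```python
# Two parametrizations of the same line: r2(t) = r1(t^3), so substituting
# s = t^3 maps one representation onto the other.
def r1(t):
    return (2 * t, -t / 2, t)

def r2(t):
    return (2 * t**3, -t**3 / 2, t**3)

for t in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert r2(t) == r1(t**3)
print("r2(t) == r1(t^3) at every sample")
```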


## Surfaces in Space

We also frequently describe surfaces in space with parametric representations. Let's take a look at an illuminating example, the unit sphere.

$x =\sin(\phi) \cos(\theta)$

$y =\sin(\phi) \sin(\theta)$

$z =\cos(\phi)$

Before continuing, make sure you understand why these equations describe a sphere. For example, set φ = π / 2; then these describe a circle in three dimensions. What happens as we vary φ?

Now, there are several important points to make here. Remember that sin and cos are both periodic, with sin(u) = sin(u + 2π) for any u, and similarly for cos, so many different pairs (θ,φ) are mapped to the same point on the sphere. To limit this, let's impose the restrictions $0\leq \theta <2\pi, 0\leq \phi \leq \pi$. The second point to note is that this doesn't completely solve the problem: every point (θ,0) is taken to (0,0,1), regardless of the value of θ, and similarly every point (θ,π) is taken to (0,0,−1).
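Both facts are easy to see numerically. This sketch (sample values chosen arbitrarily) checks that every parameter pair lands on the unit sphere, and that φ = 0 collapses to the north pole no matter what θ is:

```python
import math

def sphere(theta, phi):
    return (math.sin(phi) * math.cos(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(phi))

# Every parameter pair lands on the unit sphere...
for theta in [0.0, 1.0, 2.5, 5.0]:
    for phi in [0.0, 0.7, math.pi / 2, 3.0]:
        x, y, z = sphere(theta, phi)
        assert abs(x * x + y * y + z * z - 1) < 1e-12
# ...and phi = 0 collapses to the north pole (0, 0, 1) for every theta.
for theta in [0.0, 1.0, 2.5, 5.0]:
    assert sphere(theta, 0.0) == (0.0, 0.0, 1.0)
print("on the sphere; phi = 0 always maps to the north pole")
```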

Although this sort of thing happens often, it doesn't actually cause much of a difficulty in the analysis of the surface, which we turn to now.

# Extrema of Multivariate Functions

In single variable calculus, we found that extrema, that is, maximum and minimum values of a function, could only occur at points where the derivative was 0. Dealing with multivariate functions, it should come as no surprise, then, that extrema can only occur where all of the partial derivatives are 0, or, to say the same thing, where the gradient is the zero vector $\vec{0}$.

As in the single variable case, having $\nabla f(x_0,y_0,z_0)=\vec{0}$ isn't sufficient to conclude that f has an extreme value there. It is necessary to rule out the possibility of a non-extremal stationary point, and, as in the single variable case, this is done by examining the second-order derivatives.

In the single variable case, we recall, a function $y=f(x)$ has an extremum at a point $x=x_0$ if $f'(x_0)=0$ and $f''(x_0)\neq0$. If instead $f'(x_0)=f''(x_0)=0$, the test is inconclusive: the point may be an inflection point (as for $y=x^3$ at 0) or still an extremum (as for $y=x^4$ at 0). See the illustration at right.

It should come as no surprise, then, that we examine second-order derivatives of a multivariate function to classify its stationary points. (The full second-derivative test uses the matrix of all second partials, the Hessian; here we introduce a closely related object built from second partials.) It is one of the most important operators in higher mathematics, the Laplacian: $\Delta = \nabla^2$.

In Cartesian coordinates, we will have

$\Delta f = \frac{\partial^2 f}{\partial x^2}+\frac{\partial^2 f}{\partial y^2}$
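As a hedged numerical sketch of this formula, we can approximate the Cartesian Laplacian with central second differences. The test function $f(x,y)=x^3+y^3$ is a hypothetical choice, not from the text; its exact Laplacian is $6x+6y$.

```python
# Finite-difference check of the Cartesian Laplacian for a hypothetical
# test function f(x, y) = x^3 + y^3, whose exact Laplacian is 6x + 6y.
def f(x, y):
    return x**3 + y**3

def laplacian_fd(f, x, y, h=1e-4):
    # Central second differences in x and y.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    return fxx + fyy

x, y = 1.5, -0.5
print(laplacian_fd(f, x, y))  # ≈ 6*1.5 + 6*(-0.5) = 6.0
```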

and in three dimensions, we will of course add on the term $\partial_{zz}f$. In polar coordinates, the Laplacian is

$\Delta f = {1 \over r} {\partial \over \partial r} \left( r {\partial f \over \partial r} \right) + {1 \over r^2} {\partial^2 f \over \partial \theta^2}$

and in spherical coordinates,

$\Delta f = {1 \over r^2} {\partial \over \partial r} \left( r^2 {\partial f \over \partial r} \right) + {1 \over r^2 \sin \phi} {\partial \over \partial \phi} \left( \sin \phi {\partial f \over \partial \phi} \right) + {1 \over r^2 \sin^2 \phi} {\partial^2 f \over \partial \theta^2}$

where we maintain our convention of using θ for the azimuth and φ for the zenith angle. For more on how these different forms of the Laplacian were calculated, see the Laplacian article.

# Optimization with Constraints: Lagrange Multipliers

We will introduce Lagrange multipliers with functions of only two variables, so that we can visualize what we're talking about easily. First, we will develop the theory, and then we will work a concrete example to demonstrate the technique.

## The Gradient and the Level Curve

The first order of business will be to show that the direction of the gradient of a function is perpendicular to its level curve.

Let's pick a point $\vec{x}_0 = (x_0,y_0)$ and consider the level curve through it, $\left\{\vec{x}:f(\vec{x})=f(\vec{x}_0) \right\}$. We will parametrize this curve with a function $\vec{\gamma}(t)$, arranged so that $\vec{\gamma}(0)=\vec{x}_0$. Then note that $f(\vec{\gamma}(0))=f(\vec{x}_0)$.

Let us investigate the derivative of $f ( \vec{\gamma}(t))$. By the chain rule, it is simply $\nabla f (\vec{\gamma}(t))\cdot\vec{\gamma}'(t)$. Since $\vec{\gamma}(0)=\vec{x}_0$, when we evaluate the derivative of $f ( \vec{\gamma}(t))$ at $t=0$ we get $\nabla f(\vec{x}_0) \cdot \vec{\gamma}'(0)$.

Since $\vec{\gamma}$ parametrizes the level curve of f, its derivative $\vec{\gamma}'(0)$ is a vector tangent to the curve at $\vec{x}_0$. But remember, $\vec{\gamma}$ parametrizes a level set of f: for some constant c, it satisfies $f(\vec{\gamma}(t))=c$ by definition, and hence the derivative of $f(\vec{\gamma}(t))$ is zero for every t. In particular, $\nabla f(\vec{x}_0)\cdot\vec{\gamma}'(0)=0$, and so long as neither the gradient nor $\vec{\gamma}'(0)$ is itself $\vec{0}$, the definition of the dot product tells us that the gradient of a function is perpendicular to its level curves.
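Here is a small numerical illustration (the function and radius are arbitrary choices, not from the text): for $f(x,y)=x^2+y^2$, whose level curves are circles, the gradient is orthogonal to the level-curve tangent at every sampled point.

```python
import math

# Level curve of f(x, y) = x^2 + y^2 at value R^2 is the circle of radius R,
# parametrized by gamma(t) = (R cos t, R sin t).
R = 2.0

def grad_f(x, y):
    return (2 * x, 2 * y)

def gamma_prime(t):
    return (-R * math.sin(t), R * math.cos(t))

# grad f . gamma' should vanish at every point of the level curve.
for k in range(12):
    t = 2 * math.pi * k / 12
    x, y = R * math.cos(t), R * math.sin(t)
    gx, gy = grad_f(x, y)
    vx, vy = gamma_prime(t)
    assert abs(gx * vx + gy * vy) < 1e-12
print("gradient is perpendicular to the level-curve tangent")
```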

## Extrema with Regard to a Constraint


Suppose we want to know the greatest (or least) value a multivariate function takes, but only along a curve. For example, a person might wish to know the greatest wind speed at a point on Earth. The wind speed at different points on Earth can be described as a function of two variables, longitude and latitude, but suppose this person doesn't care what the greatest wind speed on all of Earth is, just the greatest wind speed along, say, a route taken by an airplane.

The first step is to describe the function whose extremum you wish to find (in the above example, wind speed) as f(x,y). Next, you want to describe the curve along which you wish to maximize f as g(x,y) = c for some constant c.

Let $\vec{p}=(x_p,y_p)$ be some point on g(x,y) = c. If $\nabla f (\vec{p})$ and $\nabla g(\vec{p})$ do not point along the same line, then $\nabla f(\vec{p})$ has a nonzero component tangent to the curve, and we can travel along g(x,y) = c in that direction to get values of f larger than $f(\vec{p})$, or in the opposite direction to get values of f smaller than $f(\vec{p})$. Hence, $\vec{p}$ cannot be a place where f achieves a local extremum on g if $\nabla f(\vec{p})$ and $\nabla g(\vec{p})$ do not point along the same line. The logical conclusion, then, is that the only points $\vec{u}$ on g(x,y) = c that could possibly be local extrema for f are those points $\vec{u}$ where $\nabla f(\vec{u}) \propto \nabla g(\vec{u})$, or, stated another way, where there exists some λ for which $\nabla f(\vec{u}) = \lambda \nabla g(\vec{u})$.

By solving this equation, together with the constraint g(x,y) = c, for x, y, and λ, we obtain the candidate points; plugging them into f gives the value of f at each. We can then check whether each is an extremum, and which kind.

## An Example Problem

Let's work a problem which would be listed in the "challenging" problem section, one which will require us to draw on a number of different things we've learned. This problem is typical of what one would find on a multivariable calculus exam.

Problem: Of the points of intersection of $\left\{{\vec{u}\in\mathbb{R}^3: \vec{u}=x\vec{i}+y\vec{j}+z\vec{k},\ x^2+y^2 =1}\right\}$ and $\left\{{\vec{v}=x\vec{i}+y\vec{j}+z\vec{k}\in\mathbb{R}^3:z=3x-2y}\right\}$, find the one with the greatest z-coordinate.

This is a formidable problem! Let's break it down. The first set described begins with the statement $\vec{u}\in\mathbb{R}^3$. What does this mean? Well, these are vectors in three-dimensional space we're talking about, since it describes them as being in $\mathbb{R}^3$. But they are also required to satisfy $x^2 + y^2 = 1$, which is the equation of the unit circle in the xy-plane. Since these vectors do not have their z-coordinate restricted, the first set describes a cylinder in space.

The second set is just a plane, that is, the graph of z = 3x − 2y. So the problem is asking us: of the points where this plane and this cylinder intersect, which has the highest value of z? A natural question to ask now might be, what does this intersection look like? It turns out to be an oval slanted at an angle in space, but that's not terribly important. What's important is noticing that finding the highest z value among the points of intersection of this cylinder and this plane amounts to finding the largest value achieved by the function 3x − 2y over the set $x^2 + y^2 = 1$.

So, using the method of Lagrange multipliers, we set f(x,y) = 3x − 2y and $g(x,y) = x^2 + y^2$ with the constraint g(x,y) = 1, and compute their gradients:

$\nabla f = \vec{i}\frac{\partial}{\partial x} (3x-2y) + \vec{j}\frac{\partial}{\partial y}(3x-2y) = 3\vec{i}-2\vec{j}$

$\nabla g = \vec{i}\frac{\partial}{\partial x} (x^2+y^2) + \vec{j}\frac{\partial}{\partial y}(x^2+y^2) = 2x\vec{i}+2y\vec{j}$

Now we must simply solve the equation

$\begin{pmatrix}3\\-2\end{pmatrix} = \lambda \begin{pmatrix}2x\\2y\end{pmatrix}$

This gives $x = 3/(2\lambda)$ and $y = -1/\lambda = -2/(2\lambda)$. We already know $x^2 + y^2 = 1$, so substituting, we get

$\frac{9}{4\lambda^2}+\frac{4}{4\lambda^2}=1$

or $13/4 = \lambda^2 \implies \lambda = \pm \frac{1}{2}\sqrt{13}$.

Plugging this into our formulas $x = 3/(2\lambda)$ and $y = -1/\lambda$ gives $x=\pm3/\sqrt{13}$ and $y=\mp2/\sqrt{13}$, with opposite signs. These two points, $(3/\sqrt{13},-2/\sqrt{13})$ and $(-3/\sqrt{13},2/\sqrt{13})$, are the candidates for the highest and lowest values of f along g=1. So by plugging these into f, we should be able to figure out which is the highest point of intersection. First let's check that these points satisfy g(x,y) = 1:

$g(\pm\tfrac{3}{\sqrt{13}},\mp\tfrac{2}{\sqrt{13}}) = \frac{9}{13}+\frac{4}{13}=\frac{13}{13}=1$

Now we can finally answer the question: what is the highest z value in the intersection of this cylinder and this plane? We have two points to investigate: $(3/\sqrt{13},-2/\sqrt{13})$ and $(-3/\sqrt{13},2/\sqrt{13})$. Let's examine the second one first:

$f(-\tfrac{3}{\sqrt{13}},\tfrac{2}{\sqrt{13}}) = \frac{-9}{\sqrt{13}} - \frac{4}{\sqrt{13}} = -\frac{13}{\sqrt{13}} = -\sqrt{13}$

and now the first:

$f(\tfrac{3}{\sqrt{13}},-\tfrac{2}{\sqrt{13}}) = \frac{9}{\sqrt{13}} + \frac{4}{\sqrt{13}} = \frac{13}{\sqrt{13}} = \sqrt{13}$

Since $\sqrt{13}>-\sqrt{13}$, the point of intersection of the cylinder and the plane with the highest z value is

$\vec{w}=\begin{pmatrix}3/\sqrt{13}\\-2/\sqrt{13}\\\sqrt{13}\end{pmatrix}$
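As a sanity check (a brute-force sketch, not part of the Lagrange method itself), we can sample the constraint circle $x^2+y^2=1$ densely and maximize f(x,y) = 3x − 2y directly:

```python
import math

# Brute-force cross-check: sample the constraint circle x^2 + y^2 = 1 and
# maximize f(x, y) = 3x - 2y over the samples.
best_t, best_val = 0.0, -float("inf")
n = 100_000
for k in range(n):
    t = 2 * math.pi * k / n
    val = 3 * math.cos(t) - 2 * math.sin(t)
    if val > best_val:
        best_t, best_val = t, val

print(best_val)                             # ≈ sqrt(13) ≈ 3.6056
print(math.cos(best_t), math.sin(best_t))   # ≈ (3/sqrt(13), -2/sqrt(13))
```

The numerical maximum matches the analytic answer to within the sampling resolution.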