Courses and textbooks, at least at the undergraduate level, often gloss over the details of the Legendre transformation, which converts a convex function of one variable into a convex function of the “conjugate” variable. It is used throughout physics, from thermodynamics to quantum field theory, and it connects some of the most fundamental concepts, yet for a long time it was a mysterious black-box procedure to me. In this post I attempt to illuminate the technique.

Let’s begin with the technical definition:

Definition Given a convex function $f: I \subset \mathbb{R} \to \mathbb{R}$ of a variable $x$, the Legendre transform, or convex conjugate, $f^* : I^* \to \mathbb{R}$ in terms of the conjugate variable $x^*$ is given by

\begin{equation} f^*(x^*) = \sup_{x \in I} \big( x^* x - f(x)\big), \quad x^* \in I^* \end{equation}

where

$$ I^* = \left\{ x^* \in \mathbb{R} : \sup_{x \in I} \big( x^* x - f(x)\big) < \infty \right\} $$

This definition generalizes to convex functions of higher dimensions by replacing $x^{*}x$ with $\langle x^*, x \rangle$, the appropriate inner product.
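The supremum in (1) can be approximated by brute force, which is a useful sanity check before doing any calculus. The sketch below is only an illustration: the function name `legendre_sup`, the finite grid on $[-5, 5]$, and the test function $f(x) = x^4/4$ are all assumptions and do not come from the definition itself. For this $f$ the conjugate is known in closed form, $f^*(p) = \frac{3}{4}|p|^{4/3}$, so the grid estimate can be compared against it.

```python
def legendre_sup(f, p, lo=-5.0, hi=5.0, n=10001):
    """Approximate f*(p) = sup_x (p*x - f(x)) by a maximum over a uniform grid.

    Assumption: the maximizer lies inside [lo, hi]; a finite grid can only
    underestimate the true supremum.
    """
    step = (hi - lo) / (n - 1)
    return max(p * (lo + i * step) - f(lo + i * step) for i in range(n))


def quartic(x):
    """Hypothetical test function f(x) = x^4 / 4 (convex on all of R)."""
    return x**4 / 4.0


def quartic_conjugate(p):
    """Closed-form conjugate of x^4 / 4, namely (3/4) |p|^(4/3)."""
    return 0.75 * abs(p) ** (4.0 / 3.0)


for p in (-2.0, 0.5, 2.0):
    print(p, legendre_sup(quartic, p), quartic_conjugate(p))
```

The grid estimate and the closed form should agree to roughly the square of the grid spacing, since the objective is flat near its maximum.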

Below is an interactive plot (made with Julia) demonstrating how $f^*(p)$ depends on the maximum of the transformed function $px - f(x)$ for a given function $f(x) = \frac{1}{2}ax^2 + c$, where here $a = 1$ and $c = 4$. The derivation of these functions will be made clear below, but this might give some intuition into the transformation.

[Interactive plot, shown here at $p = -6.0$]

Notice that the maximum of the orange curve always lies on the green curve $f^*$.
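For this particular example the maximum can also be found by hand, using nothing beyond definition (1) and completing the square. A short sketch, assuming $a > 0$:

\begin{align*}
px - f(x) &= px - \tfrac{1}{2} a x^2 - c \\
&= -\tfrac{a}{2}\left(x - \tfrac{p}{a}\right)^2 + \frac{p^2}{2a} - c \\
\implies f^*(p) &= \sup_{x} \big( px - f(x) \big) = \frac{p^2}{2a} - c, \quad \text{attained at } x = \frac{p}{a}.
\end{align*}

With $a = 1$ and $c = 4$ this gives $f^*(p) = \frac{1}{2}p^2 - 4$, so at $p = -6$ the curve $px - f(x)$ peaks at $x = -6$ with height $14$.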

a more explicit definition

For a fixed $x^*$ we can find $f^*(x^*)$ by finding $\bar x$ such that the expression in (1) is maximized. A linear function minus a convex function is concave, so a stationary point is a maximum and the standard method from calculus applies. Here we assume $f$ is differentiable and strictly convex, so that $f'$ is invertible.

\begin{align}
0 &= \frac{d}{dx} \big( x^* x - f(x) \big)\bigg\rvert_{x=\bar x} \nonumber \\
&= x^* - f'(\bar x) \nonumber \\
\implies \bar x &= \big(f'\big)^{-1}(x^*)
\end{align}

Keeping in mind that $\bar x$ depends on $x^*$, we can now write

\begin{equation} \boxed{ f^*(x^{*}) = x^{*} \bar x - f(\bar x) } \end{equation}

which is our first explicit definition of the Legendre transform of $f$. Then, taking the derivative with respect to $x^*$, we find that

\begin{align*}
\big(f^*(x^{*})\big)' &= \bar x + x^{*} \frac{d \bar x}{dx^{*}} - f'(\bar x) \frac{d \bar x}{dx^{*}} \\
&= \bar x.
\end{align*}

This is true since (2) implies $f'(\bar x) = x^*$, so the last two terms cancel, which then implies

\begin{equation} \boxed{ \big( f^* \big)' = \big( f' \big)^{-1}. } \end{equation}

Thus (4) is an equivalent way to specify $f^*$, up to an additive constant, by integrating both sides of the expression with respect to $x^*$.
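Relations (2), (3), and (4) can be checked numerically on a function that is not quadratic. The sketch below assumes the example $f(x) = e^x$, which is not taken from the text above; its conjugate is $f^*(p) = p \ln p - p$ for $p > 0$, and $(f')^{-1}(p) = \ln p$. The helper names (`invert_increasing`, `conjugate`, `conjugate_derivative`) are hypothetical, and the inverse of $f'$ is found by bisection rather than analytically.

```python
import math


def f(x):
    """Assumed example: f(x) = exp(x), strictly convex on R."""
    return math.exp(x)


def f_prime(x):
    """Derivative of the assumed example, f'(x) = exp(x)."""
    return math.exp(x)


def invert_increasing(g, y, lo=-50.0, hi=50.0, iters=200):
    """Solve g(x) = y by bisection, assuming g is increasing with a root in [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conjugate(p):
    """Equation (3): f*(p) = p * xbar - f(xbar), with xbar = (f')^{-1}(p) from (2)."""
    xbar = invert_increasing(f_prime, p)
    return p * xbar - f(xbar)


def conjugate_derivative(p, h=1e-5):
    """Central finite-difference estimate of (f*)'(p), to compare with (4)."""
    return (conjugate(p + h) - conjugate(p - h)) / (2.0 * h)


p = 3.0
print(conjugate(p), p * math.log(p) - p)        # (3) against the closed form
print(conjugate_derivative(p), math.log(p))     # (4): (f*)' against (f')^{-1}
```

Each printed pair should agree closely; the second pair agrees only up to the finite-difference error.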

mechanics

Considering a mechanical system, we can specify its dynamics through the Lagrangian evaluated along a given path $q(t)$, which for our purposes can be written as:

$$ \mathcal{L}[q(t)] = \mathcal{L}(q, \dot q; t) \equiv \frac{1}{2}m \dot q^2 - V(q) $$

This defines a function on the system's velocity space (technically the tangent bundle $T\mathcal{M}$ of the configuration manifold $\mathcal{M}$), with coordinates $(q, \dot q)$. Given starting and ending coordinates, the physical path is found by extremizing the action functional $S[q(t)] = \int dt \ \mathcal{L}[q]$, resulting in the Euler-Lagrange equation:

$$ \frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot q} = \frac{\partial \mathcal{L}}{\partial q} $$
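As a quick check, applying the Euler-Lagrange equation to the Lagrangian written above recovers Newton's second law:

\begin{align*}
\frac{\partial \mathcal{L}}{\partial \dot q} &= m \dot q, \qquad \frac{\partial \mathcal{L}}{\partial q} = -V'(q) \\
\implies m \ddot q &= -V'(q).
\end{align*}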

an aside on conjugates

Before proceeding, I must comment that there seem to be inconsistent definitions of conjugate variables, or at least inconsistent usage of the term. For example, the classical definition of a conjugate variable in mechanics is the result of differentiating the action with respect to the original variable, e.g. schematically for $q$

\begin{align*}
\frac{\partial S}{\partial q} &= \int dt \frac{\partial \mathcal{L}}{\partial q} \\
&= \int dt \frac{d}{dt} \frac{\partial \mathcal{L}}{\partial \dot q} \\
&= \frac{\partial \mathcal{L}}{\partial \dot q} \\
&\equiv p
\end{align*}

So $p$ is conjugate to $q$, but the literature (Wikipedia) seems to say that $p$ is conjugate to $\dot q$ in the context of Legendre transforms. I am looking out for clarification regarding this point.

For fixed $q$, we can view $\mathcal{L}$ as a convex function of $\dot q$, so

$$ \mathcal{L}'(\dot q) = m \dot q \implies \big(\mathcal{L}'\big)^{-1}(p) = \frac{p}{m}. $$

We can now find the Legendre transform of $\mathcal{L}$ taking $\dot q \to p$, using (3):

\begin{align*}
\mathcal{L}^*(q, p) &= \big( p \dot q - \mathcal{L}(q, \dot q) \big)\bigg\rvert_{\dot q = \big(\mathcal{L}'\big)^{-1}(p)} \\
&= p \left( \frac{p}{m} \right) - \mathcal{L}\left( q, \frac{p}{m} \right) \\
\implies \mathcal{H}(q, p) &\equiv \frac{p^2}{2m} + V(q) = \mathcal{L}^*(q, p)
\end{align*}

Here, $\mathcal{H}$ is the Hamiltonian, a function on the system’s phase space (technically the cotangent bundle $T^*\mathcal{M}$).
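The same result can be reproduced numerically by maximizing $p \dot q - \mathcal{L}(q, \dot q)$ over $\dot q$ directly, without inverting $\mathcal{L}'$ by hand. The sketch below is an illustration under stated assumptions: the harmonic potential $V(q) = \frac{1}{2}kq^2$, the values $m = 2$ and $k = 3$, the search interval for $\dot q$, and all function names are hypothetical choices, not part of the derivation above. It uses a ternary search, which works because the objective is concave in $\dot q$.

```python
def lagrangian(q, qdot, m=2.0, k=3.0):
    """L = (1/2) m qdot^2 - V(q), with the assumed potential V(q) = (1/2) k q^2."""
    return 0.5 * m * qdot**2 - 0.5 * k * q**2


def hamiltonian_numeric(q, p, m=2.0, k=3.0, lo=-100.0, hi=100.0, iters=200):
    """Legendre transform in qdot: maximize p*qdot - L(q, qdot) by ternary search.

    Assumption: the maximizing velocity p/m lies inside [lo, hi].
    """
    def objective(qdot):
        return p * qdot - lagrangian(q, qdot, m, k)

    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a  # the maximum cannot lie to the left of a
        else:
            hi = b  # the maximum cannot lie to the right of b
    return objective(0.5 * (lo + hi))


def hamiltonian_exact(q, p, m=2.0, k=3.0):
    """Closed form H = p^2 / (2m) + V(q) for the same assumed potential."""
    return p**2 / (2.0 * m) + 0.5 * k * q**2


print(hamiltonian_numeric(1.5, 4.0), hamiltonian_exact(1.5, 4.0))
```

The two printed values should agree to near machine precision, because the objective is flat around its maximum and small errors in the located velocity barely change its value.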

For completeness’ sake, let’s see how this works with the alternative definition (4). With $q$ fixed

\begin{align*}
\mathcal{H}(q,p) &= \int dp \ \left(\frac{\partial \mathcal{L}}{\partial \dot q}\right)^{-1} \\
&= \int dp \ \frac{p}{m} \\
&= \frac{p^2}{2m} + C
\end{align*}

which matches the Hamiltonian obtained from (3) when we identify the constant of integration $C = V(q)$.