Can I derive the Boltzmann distribution by an invariance argument?


In statistical mechanics, the Boltzmann distribution gives the probability of a system being in state $i$ as

$\displaystyle \frac{e^{- \beta E_i}}{\sum_j e^{-\beta E_j}}$

where $E_i$ is the energy of state $i$ . I have generally seen this demonstrated, starting with some reasonable physical assumptions, via a heat bath argument (as exposited e.g. by Terence Tao) involving interactions between the system and a larger external system. For me, an unsatisfying aspect of the heat bath argument is that it doesn't give me a strong reason to expect that a fundamental function like the exponential should appear at the end.

Here is what I think could be an argument which accomplishes that. By inspection, the Boltzmann distribution only depends on the relative energies of the different states. Under some mild assumptions this actually characterizes the Boltzmann distribution. Let us suppose there is a non-negative function $f(E)$ such that WLOG $f(0) = 1$ and such that the probability of a system being in state $i$ is

$\displaystyle \frac{f(E_i)}{\sum_j f(E_j)}.$

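Let us suppose that the system has two states. Then the statement that the Boltzmann distribution only depends on the relative energies turns out to be equivalent to the functional equation $f(x + y) = f(x) f(y)$: taking energies $x$ and $0$ and shifting both by $y$, invariance requires $\frac{f(x)}{f(x) + 1} = \frac{f(x+y)}{f(x+y) + f(y)}$, which rearranges to $f(x+y) = f(x) f(y)$. Under any kind of continuity assumption whatsoever this gives $f(x) = e^{ax}$ for some constant $a$.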
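The two-state claim can be checked numerically. The following is only a minimal sketch: the constant `a`, the comparison weight `rational`, and the chosen energies and shift are all hypothetical values picked for illustration, not part of the original argument. It compares how much the state probabilities move when every energy is shifted by the same amount.

```python
import math

def probabilities(weight, energies):
    """Normalize weight(E_i) over all states to obtain probabilities."""
    weights = [weight(e) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

def max_shift_error(weight, energies, shift):
    """Largest change in any probability when every energy is shifted by `shift`."""
    before = probabilities(weight, energies)
    after = probabilities(weight, [e + shift for e in energies])
    return max(abs(p - q) for p, q in zip(before, after))

a = -0.7  # hypothetical constant; a = -beta in the notation of the question

def exponential(e):
    return math.exp(a * e)  # satisfies f(x + y) = f(x) f(y)

def rational(e):
    return 1.0 / (1.0 + e * e)  # also has f(0) = 1, but is not multiplicative

energies = [0.0, 1.0]  # a two-state system
exp_error = max_shift_error(exponential, energies, shift=2.5)
rat_error = max_shift_error(rational, energies, shift=2.5)
print(exp_error, rat_error)
```

The exponential weight should leave the probabilities unchanged up to rounding, while the non-multiplicative weight should not.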

Question 1: How can this argument be fleshed out? In particular, what physical principle would suggest that the Boltzmann distribution only depends on the relative energies of the states? (I seem to recall from my high-school physics lessons that energies are only well-defined up to an additive constant, but I would really appreciate some clarification on this issue.)

Question 2: How does this argument relate to the heat bath argument or the combinatorial argument given, for example, at Wikipedia?

(Motivation: some important functions in mathematics, like the Jones polynomial and various zeta functions, can be interpreted as partition functions of certain statistical-mechanical systems, and I am trying to sharpen my physical intuition about these constructions.)

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Qiaochu Yuan

asked Jul 11, 2010 in Theoretical Physics by Qiaochu Yuan (385 points) [ revision history ]
edited Aug 19, 2015 by Dilaton

Hi Qiaochu, have you read about the variational method? I liked it. I wrote a post about this question, prompted by a question from a friend of mine. Unfortunately it is written in Portuguese. If you want to take a look at this approach, the link is: leandromat.wordpress.com/2010/07/04/… It is a very basic text, written with the help of these books: Thermodynamic Formalism (David Ruelle); Entropy, Large Deviations, and Statistical Mechanics (Richard Ellis); Equilibrium States in Ergodic Theory (Gerhard Keller).

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Leandro

commented Jul 11, 2010 by Leandro (155 points) [ no revision ]

The argument you were seeking is possibly detailed balance. The line of thinking you provided above has a flavor of it. You can look it up in David Tong's lecture note here.

commented Aug 24, 2015 by Sheng-Jie Huang (50 points) [ no revision ]


6 Answers

Like Andreas, I find a maximum entropy argument to be intellectually appealing. However, he says the solution can be found by Lagrange multipliers and I don't know the justification for using Lagrange multipliers. That is, in the space of all probability distributions on the particles, how do you know the maximum entropy solution is really accessible to variational methods?

For a derivation not using Lagrange multipliers, see the bottom of page 9 through page 11 at http://www.math.uconn.edu/~kconrad/blurbs/analysis/entropypost.pdf.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user KConrad

answered Jul 11, 2010 by KConrad (60 points) [ no revision ]

Hi KConrad, is your question about non-finite state spaces?

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Leandro

commented Jul 11, 2010 by Leandro (155 points) [ no revision ]

Thanks! That paper was very helpful. Is it correct to say that the dependence on relative energies comes from the mean-energy constraint?

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Qiaochu Yuan

commented Jul 11, 2010 by Qiaochu Yuan (385 points) [ no revision ]

The question is in part about the non-finite case, but even in the finite case how do you know in advance that the max. entropy distr. does not lie on the boundary of the convex space of prob. distr., where one of the particle probabilities is 0? You need to rule out the answer being located there to know that the answer by variational methods is max. over all the possibilities. (In fact the max. entropy distr. is on the boundary for a finite state space if you want the avg. energy to be min or max of the $E_i$'s, so there is something to show in the other "non-degenerate" situations.)

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user KConrad

commented Jul 11, 2010 by KConrad (60 points) [ no revision ]

Qiaochu, yes, if you read Theorem 4.9 you will see there is a mean-energy constraint $\sum q_jE_j = \langle E\rangle$. That is the only condition imposed, along with the necessary condition that your choice of $\langle E\rangle$ has to lie in the closed interval between the min and max of the $E_i$'s. (If you want $\langle E\rangle$ to be outside that range then of course there's no answer.)

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user KConrad

commented Jul 11, 2010 by KConrad (60 points) [ no revision ]

Since I asked a question in my answer about why the variational method is justifiable even though the space of prob. distributions has a boundary, I should add that the variational method does have the virtue of telling us what form the answer ought to be! A downside to the nonvariational proof in the link I give is that it doesn't explain where the family of Boltzmann distr. comes from. I see two parts: (1) variational methods tell us what kind of answer to expect and (2) we then need a proof taking the whole space, incl. the boundary, into account. I don't know how to do (2) variationally.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user KConrad

commented Jul 11, 2010 by KConrad (60 points) [ no revision ]

@KConrad, the boundary problem was taken into account in the link I provided above. But there are two aspects of the argument I presented there: the state space is finite, and the starting variational problem is $\sup_{\mu\in\mathcal M}\left[h(\mu)-\int U \ d\mu \right]$.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Leandro

commented Jul 11, 2010 by Leandro (155 points) [ no revision ]


I sketched this here: http://blog.eqnets.com/2009/09/09/the-fundamental-law-of-statistical-physics/

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Steve Huntsman

answered Jul 11, 2010 by Steve Huntsman (105 points) [ no revision ]

Thanks, Steve. I don't think I have a clear understanding of why energy is only defined up to an additive constant. Do you know anywhere this issue is clarified?

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Qiaochu Yuan

commented Jul 11, 2010 by Qiaochu Yuan (385 points) [ no revision ]

The equations of motion are always invariant under the transformation $U \mapsto U + \text{const}$ of any potential. This is a fancy way of talking about the work-energy theorem.
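A small numerical illustration of this invariance, as a sketch only: the harmonic potential `U`, the shift of 10, and the sample points are hypothetical choices. Since only the derivative of the potential enters the equations of motion, the force computed from `U` and from `U + const` should agree.

```python
def force(potential, x, h=1e-5):
    """Central-difference estimate of the force F = -dU/dx at position x."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)

def U(x):
    # Hypothetical harmonic potential, chosen only for illustration.
    return 0.5 * x * x

def U_shifted(x):
    # The same potential shifted by an arbitrary constant.
    return U(x) + 10.0

points = [-1.0, 0.3, 2.0]
differences = [abs(force(U, x) - force(U_shifted, x)) for x in points]
print(max(differences))
```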

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Steve Huntsman

commented Jul 11, 2010 by Steve Huntsman (105 points) [ no revision ]

BTW, I always thought it was funny that Feynman (not to mention anyone else) never did this, especially given his observation about this invariance in his statistical physics lectures. See the footnote on page 3: books.google.com/…

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Steve Huntsman

commented Jul 11, 2010 by Steve Huntsman (105 points) [ no revision ]

I wondered the same thing when I read that footnote, actually. (It's mildly annoying that Feynman didn't state a continuity hypothesis - I guess he didn't know about pathological solutions to the Cauchy functional equation.)

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Qiaochu Yuan

commented Jul 11, 2010 by Qiaochu Yuan (385 points) [ no revision ]

Even if Feynman had known about them, he probably wouldn't have mentioned them—not really his style (even compared to other physicists) to let mathematical pathologies derail physical reasoning.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Steve Huntsman

commented Jul 12, 2010 by Steve Huntsman (105 points) [ no revision ]


For me the clearest derivation of the Boltzmann distribution is by maximizing the entropy $-\sum n_i \ln(n_i)$ under the constraint of constant total energy $\sum n_i E_i = \text{const.}$ and constant total particle number $\sum n_i = \text{const.}$ The Lagrange multiplier for the first constraint gives $\beta$. You can immediately see that a shift of the energies does not change the distribution.
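To make the last remark concrete, here is a rough sketch, assuming the maximizer has the form $p_i \propto e^{-\beta E_i}$ with $\beta$ fixed by the mean-energy constraint. The energy levels, the target mean energy, the shift, and the bisection bounds are hypothetical. It solves for $\beta$ by bisection, then repeats the calculation with every energy (and hence the mean) shifted by the same constant.

```python
import math

def boltzmann(energies, beta):
    """Probabilities proportional to exp(-beta * E_i)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    return sum(p * e for p, e in zip(boltzmann(energies, beta), energies))

def solve_beta(energies, target, lo=-50.0, hi=50.0, steps=200):
    """Bisection on beta; the mean energy is a decreasing function of beta."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target:
            lo = mid  # mean too high: beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 3.0]   # hypothetical energy levels
target = 1.0                 # hypothetical mean energy
shift = 5.0                  # constant added to every energy

beta_original = solve_beta(energies, target)
beta_shifted = solve_beta([e + shift for e in energies], target + shift)

p_original = boltzmann(energies, beta_original)
p_shifted = boltzmann([e + shift for e in energies], beta_shifted)
max_gap = max(abs(p - q) for p, q in zip(p_original, p_shifted))
print(beta_original, beta_shifted, max_gap)
```

Both the multiplier and the resulting distribution should come out the same before and after the shift, up to rounding.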

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Andreas Rüdinger

answered Jul 11, 2010 by Andreas Rüdinger (90 points) [ no revision ]

That shifts the source of my confusion to what the rationale behind the definition of entropy is!

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Qiaochu Yuan

commented Jul 11, 2010 by Qiaochu Yuan (385 points) [ no revision ]

Qiaochu, see Theorem 5.1 of the link I put in my answer for a justification of the formula for entropy (on finite sample spaces). Section 6 may also be interesting to you in terms of the relation between maximum entropy and invariance.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user KConrad

commented Jul 11, 2010 by KConrad (60 points) [ no revision ]


This answer is just an expanded version of KConrad's answer. I am posting it here because the argument supports the variational method for a finite state space and also addresses KConrad's observation about a technical issue with boundary values in the variational approach.

Proposition: Let $\Omega$ be a non-empty finite set and let $\mathcal M$ denote the set of probability measures on $\Omega$. Then

$\displaystyle \sup_{\mu\in\mathcal M} \left[ h(\mu)-\int_{\Omega} U \, d\mu \right]=\log Z.$

Moreover, the supremum is attained for the measure $\mu$ given by

$\displaystyle \mu(\{\omega\})=\frac{1}{Z}e^{-U(\omega)}.$

Here $h(\mu)=-\sum_{\omega\in\Omega}\mu(\{\omega\})\log\mu(\{\omega\})$ is the entropy and $Z=\sum_{\omega\in\Omega}e^{-U(\omega)}$, as the proof below makes explicit.

Proof: Let $n$ be the cardinality of $\Omega$. Define the function $f:\mathbb R_+^n\to\mathbb R$ by

$\displaystyle f(x_1,\ldots,x_n)=-\sum_{i=1}^n \Big[x_i\log x_i +K_ix_i\Big],$

where $K_i\in\mathbb R$ for all $i\in\{1,\ldots,n\}$ (with the convention $0\log 0=0$). Consider the function $g:\mathbb R_+^n\to\mathbb R$ given by

$\displaystyle g(x_1,\ldots,x_n)=\sum_{i=1}^n x_i.$

We fix an enumeration $\omega_1,\ldots,\omega_n$ of $\Omega$ and let $K_i=U(\omega_i)$. Then the optimization problem

$\displaystyle \sup_{\mu\in\mathcal M} \left[ h(\mu)-\int_{\Omega} U \, d\mu \right]$

can be solved by finding a maximum of $f$ restricted to $g^{-1}(1)$.

For any critical point $(x_1,\ldots,x_n)$ of $f$ in $(0,\infty)^n\cap g^{-1}(1)$, it follows from the Lagrange multiplier theorem that $\nabla f(x_1,\ldots,x_n)=\lambda \nabla g(x_1,\ldots,x_n)$ for some $\lambda\in\mathbb R$, i.e.,

$\displaystyle -(\log x_i +1+K_i)=\lambda \ \ \ \text{for all}\ i=1,\ldots,n.$

So for any pair of indices $i,j\in\{1,\ldots,n\}$ we have $\log x_i +K_i=\log x_j+K_j$. Taking exponentials, it follows that $x_ie^{K_i}=x_je^{K_j}$, that is, $x_je^{-K_i}=x_ie^{-K_j}$. Using $\sum_{i=1}^nx_i=1$ and the above identities, we have

$\displaystyle x_ie^{-K_i}=\left[1-\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}x_j\right]e^{-K_i}=e^{-K_i}-\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}x_je^{-K_i},$

so

$\displaystyle x_ie^{-K_i}=e^{-K_i}-x_i\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}e^{-K_j}.$

Solving for $x_i$, we find that the critical points of $f$ in $(0,\infty)^n\cap g^{-1}(1)$ (here there is just one) are given by

$\displaystyle x_i=\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}.$

The value of $f$ at this point is

$\displaystyle -\sum_{i=1}^n \left[\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)\log \left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right) +K_i\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)\right] = \log\left(\sum_{j=1}^ne^{-K_j}\right).$

To see that $(x_1,\ldots,x_n)$ is a local maximum, we can compute the Hessian and check that it is negative definite at this point.

To show that this point is a global maximum, we compare the value of $f$ at this point with the value of $f$ at any point of the set $\partial (0,\infty)^n\cap g^{-1}(1)$. The restriction of $f$ to this set is given by

$\displaystyle f(x_1,\ldots,x_n)=-\sum_{i\in\{1,\ldots,n\}\backslash I}\Big[x_i\log x_i +K_ix_i\Big],$

where $I\subset \{1,\ldots,n\}$ is an index set such that $|I|\geq 1$ and $x_i=0$ for all $i\in I$. This defines a function $f_I$ of $n-|I|$ variables. Its maximum point can be determined in the same way, and therefore the maximum of $f_I$ is

$\displaystyle \log\left(\sum_{j\in\{1,\ldots,n\}\backslash I}e^{-K_j}\right),$

which is less than $\log\left(\sum_{j=1}^ne^{-K_j}\right)$. Repeating this argument at most $n$ times, we conclude that the maximum of $f$ restricted to $g^{-1}(1)\cap \mathbb R^n_+$ is not attained on the boundary.
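The proposition can be sanity-checked numerically on a small state space. The sketch below is not a proof: the values in `U`, the random seed, and the number of samples are hypothetical, and `random_measure` is a helper introduced here only for illustration. It evaluates $h(\mu)-\int U\,d\mu$ at the Gibbs measure, at randomly drawn probability vectors, and at the vertices of the simplex (the most extreme boundary points).

```python
import math
import random

def entropy(mu):
    """Shannon entropy with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in mu if p > 0.0)

def objective(mu, U):
    """h(mu) minus the integral of U with respect to mu."""
    return entropy(mu) - sum(p * u for p, u in zip(mu, U))

def gibbs(U):
    """The measure mu({omega}) = exp(-U(omega)) / Z, together with Z."""
    weights = [math.exp(-u) for u in U]
    Z = sum(weights)
    return [w / Z for w in weights], Z

def random_measure(n, rng):
    """A random probability vector (normalized exponential draws)."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
U = [0.2, -1.0, 1.5, 0.7]  # hypothetical values of U on a 4-point space
mu_star, Z = gibbs(U)

value_at_gibbs = objective(mu_star, U)
best_random = max(objective(random_measure(len(U), rng), U) for _ in range(2000))
vertex_values = [
    objective([1.0 if i == k else 0.0 for i in range(len(U))], U)
    for k in range(len(U))
]
print(value_at_gibbs, math.log(Z), best_random, max(vertex_values))
```

The Gibbs measure should reproduce $\log Z$ up to rounding, and no sampled or vertex distribution should exceed it.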

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Leandro

answered Jul 11, 2010 by Leandro (155 points) [ no revision ]

Thanks for posting this with the discussion of the boundary case. If the set $\Omega$ is countably infinite, is this method still complete? I don't know about justifications of Lagrange multipliers in that situation.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user KConrad

commented Jul 11, 2010 by KConrad (60 points) [ no revision ]

@KConrad, you are welcome. About your question, unfortunately I do not know the answer.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Leandro

commented Jul 12, 2010 by Leandro (155 points) [ no revision ]


I'd like to chime in here, as someone with a physics background.

I absolutely love the derivation given by Landau in volume 5 on statistical physics, chapter 1. The basic idea is that since the log of the probability distribution function (i.e. the entropy) is an additive constant of the motion, it can be expressed as a linear combination of the 7 fundamental additive constants of the motion, namely the three components of momentum, the three components of angular momentum, and the energy. But since the momentum/angular momentum components can be reduced to zero with an appropriate frame of reference, the log of the distribution function depends only on some multiple of the energy, which turns out to be 1/T. We obtain the partition function naturally by normalizing the probability distribution.
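The role of additivity can be illustrated for two non-interacting subsystems. This is only a sketch of one direction of the argument (a log-probability linear in the energy yields statistically independent subsystems), and the value of `beta` and the energy levels are hypothetical. It checks that the Boltzmann distribution of the combined system, whose energies add, equals the product of the subsystem distributions.

```python
import math
from itertools import product

def boltzmann(energies, beta):
    """Probabilities proportional to exp(-beta * E)."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

beta = 1.3                 # hypothetical inverse temperature
E_a = [0.0, 0.5, 2.0]      # hypothetical energy levels of subsystem a
E_b = [0.1, 1.1]           # hypothetical energy levels of subsystem b

p_a = boltzmann(E_a, beta)
p_b = boltzmann(E_b, beta)

# Joint states are pairs (a, b); their energies add for non-interacting subsystems.
joint_energies = [ea + eb for ea, eb in product(E_a, E_b)]
p_joint = boltzmann(joint_energies, beta)
p_product = [pa * pb for pa, pb in product(p_a, p_b)]

max_gap = max(abs(x - y) for x, y in zip(p_joint, p_product))
print(max_gap)
```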

I think this answers your question 1 from a physics point of view.

EDIT:

In view of the comments below, I should point out that the probability distribution I am referring to gives the probability of finding a system of $N$ particles obeying the laws of classical mechanics in the state in which the $n$-th particle is at position $r_n$ and moving with velocity $v_n$.

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Matt

answered Jul 11, 2010 by Matt (120 points) [ no revision ]

I think this argument uses too many properties specific to systems of particles. Keith Conrad's answer shows that the underlying principles here are information-theoretic in nature and don't depend on the specific details of the physical system.

This post imported from StackExchange MathOverflow at 2015-08-19 09:29 (UTC), posted by SE-user Qiaochu Yuan

commented Jul 12, 2010 by Qiaochu Yuan (385 points) [ no revision ]

To add to Qiaochu's comment, the physicist Edwin Jaynes (sorry, I don't know how well-known he is in physics, so maybe this looks as dumb as speaking of "the mathematician Frobenius"?) promoted the information-theoretic approach to explaining the Boltzmann distribution. See bayes.wustl.edu/etj/articles/theory.1.pdf.

This post imported from StackExchange MathOverflow at 2015-08-19 09:29 (UTC), posted by SE-user KConrad

commented Jul 12, 2010 by KConrad (60 points) [ no revision ]

Re: Jaynes: blog.eqnets.com/2009/09/21/jaynes-and-the-gibbs-paradox

This post imported from StackExchange MathOverflow at 2015-08-19 09:29 (UTC), posted by SE-user Steve Huntsman

commented Jul 12, 2010 by Steve Huntsman (105 points) [ no revision ]

If you will forgive me for protesting your comment, Qiaochu, I would say that I do not know how to state a "physical principle" showing "that the Boltzmann distribution only depends on the relative energies of the states" without reference to the energy of particles. The information-theoretic approach is useful for quantum mechanics, but, in my opinion, if we want a clear picture in our head of why the Boltzmann distribution is related to the relative energy of states, we must resort to an analogy with the classical mechanics of systems of extremely large numbers of particles.

This post imported from StackExchange MathOverflow at 2015-08-19 09:29 (UTC), posted by SE-user Matt

commented Jul 12, 2010 by Matt (120 points) [ no revision ]

@Matt—see my answer for a derivation that does not rely on any of that stuff.

This post imported from StackExchange MathOverflow at 2015-08-19 09:29 (UTC), posted by SE-user Steve Huntsman

commented Jul 12, 2010 by Steve Huntsman (105 points) [ no revision ]



This is an old question, but I would like to contribute a small note about energies being defined up to a constant, from a different point of view.

Force is usually seen as a vector, but you can see it as a 1-form, which integrates along a curve to produce the work done. A conservative force is one for which work is independent of path, i.e. it integrates to zero along any closed curve. By Stokes' theorem, its exterior derivative vanishes, $dF=0$, i.e. it is a closed form. By the Poincaré lemma, it is (at least locally) exact, $F=dU$, where $U$ is the energy (up to a sign convention). Since $d(U+c)=dU$ for any constant $c$, shift invariance, a kind of gauge invariance, is now obvious.

This post imported from StackExchange MathOverflow at 2015-08-19 09:29 (UTC), posted by SE-user Marcel

answered Aug 18, 2015 by Marcel (300 points) [ no revision ]
