
  Unusual generalization of the law of large numbers

+ 6 like - 0 dislike
2543 views

I have seen in the physics literature an example of the application of a very unusual form of the law of large numbers. I would like to understand how legitimate its use is and whether there are mathematically rigorous results in this direction; at the least, some clarification would be helpful. What is atypical in the example is that the "probability measure" is not positive, but rather complex-valued (though still normalized to one).

The example is taken from the book "Gauge fields and strings", $\S$ 9.1, by A. Polyakov. The argument is part of the computation of a path integral.

Let us fix $T>0$. Divide the segment $[0,T]$ into $T/\varepsilon$ parts of equal length $\varepsilon$. For small $c>0$ consider the integral $$\int_{\mathbb{R}^{T/\varepsilon}}\left(\prod_{t=1}^{T/\varepsilon}d\gamma_t(\gamma_t-ic)^{-2}e^{i\varepsilon (\gamma_t-ic)}\right)\Phi(R;-i\varepsilon\sum_t(\gamma_t-ic)^{-1}),$$ where $\Phi(R;x)=x^{-2}\exp(-R^2/x)$. (Here $R$ is a real number; my notation differs slightly from the book's.)

The measure is not normalized, but one can divide by the total measure. Clearly the $(\gamma_t-ic)^{-1}$ are i.i.d. The above integral depends only on their sum $\sum_{t=1}^{T/\varepsilon}(\gamma_t-ic)^{-1}$. Thus, formally, it looks like one is in a position to apply some form of the LLN as $\varepsilon\to 0$ and replace this sum inside $\Phi$ by the expectation of $(\gamma_t-ic)^{-1}$ times $T/\varepsilon$. (In fact Polyakov gives a few more estimates of the variance to justify this. It would be standard if the measure were positive, but otherwise it looks mysterious to me.)
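As a sanity check on what this complex "expectation" even means, the normalized mean of $(\gamma-ic)^{-1}$ under the weight above can be computed by quadrature; closing the contour in the upper half-plane suggests a total measure of $-2\pi\varepsilon$ and a normalized mean of $i\varepsilon/2$ (with my normalization, which may differ from the book's). A minimal numerical sketch in Python, with arbitrary illustrative values of $c$ and $\varepsilon$:

```python
import numpy as np
from scipy.integrate import quad

def cquad(f, a, b):
    """Quadrature for a complex-valued integrand, split into real and imaginary parts."""
    re, _ = quad(lambda x: f(x).real, a, b, limit=500)
    im, _ = quad(lambda x: f(x).imag, a, b, limit=500)
    return re + 1j * im

c, eps = 0.5, 0.2   # arbitrary illustrative values, not taken from the book
w = lambda g: np.exp(1j * eps * (g - 1j * c)) / (g - 1j * c) ** 2  # complex "weight"

L = 2e3  # cutoff; the weight decays like 1/gamma^2, so the truncated tail is small
Z = cquad(w, -L, L)                                      # total (complex) measure
mean = cquad(lambda g: w(g) / (g - 1j * c), -L, L) / Z   # normalized "mean"

print(Z, -2 * np.pi * eps)     # quadrature vs. the contour-integration value
print(mean, 1j * eps / 2)      # quadrature vs. the contour-integration value
```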

This post imported from StackExchange MathOverflow at 2015-04-28 10:48 (UTC), posted by SE-user MKO
asked Apr 26, 2015 in Theoretical Physics by MKO (130 points) [ no revision ]
retagged Apr 28, 2015
I am not sure I understand your notation. Anyway, I suggest that you apply the LLN to the total variation of your measure (which is a positive measure). For a complex-valued measure, $\frac{1}{n}\sum X_i$ converges to $\frac{1}{|\mu|(X)}\int X \, d|\mu|$, not to $\int X \, d\mu$.

This post imported from StackExchange MathOverflow at 2015-04-28 10:48 (UTC), posted by SE-user coudy
Technically one can do that. But I think this is not what is done in the book mentioned above.

This post imported from StackExchange MathOverflow at 2015-04-28 10:48 (UTC), posted by SE-user MKO

2 Answers

+ 6 like - 0 dislike

A non-positive measure allows application of the law of large numbers when a sketch of the standard proof of convergence can be carried out with the particular measure and variables in question. There are heuristics for spotting when this works, which are pretty intuitive; the basic one is to look at the absolute value of the measure to see how broad the distribution is and how slow the falloff is, and this tells you how fast the convergence will be. When it looks like it works, Polyakov won't bother justifying it with careful estimates, as this is tedious. But it can be justified if you need to do it.

It is true for any measure that two "independent random variables" (in scare quotes because the measure $\mu$ is no longer positive) $x$ and $y$ have additive "means" (when defined):

$$ \langle x + y \rangle = \langle x \rangle + \langle y \rangle $$

and further, if you subtract out the means so that the variables have zero mean, the variance is additive (when defined):

$$ \langle (x+y)^2 \rangle = \langle x^2 \rangle + \langle y^2 \rangle $$

The question is whether the "probability density function" of the sum of many values of some function of these variables converges to a delta function; this is the law of large numbers.

The convergence here is in the sense of distributions: the statement is that the "probability density function" $\mu_S$ of the sum $\sum_i F(x_i)$ of these i.i.d. variables (with a positive/negative distribution) converges to a delta function around the central value, i.e., its integral against any smooth test function converges to the value of the test function at $N \langle F\rangle$ as the number of variables becomes large.

The "distribution" $\mu_S$ is still the convolution of the "distribution" of $\mu_F$:

$$ \mu_S = \mu_F * \mu_F * \cdots * \mu_F, $$

where the number of convolution factors is $N$, in your case $T/\epsilon$. The main step of the proof of the standard central limit theorem consists of taking the Fourier transform of both sides of the equation and noting that

$$ \mu_S(k) = \mu_F(k)^N $$

So, as long as the Fourier transform obeys the rules:

normalization/zero-center: $\mu_F(0) = 1$

shrinky-ness: $|\mu_F(k)| < |\mu_F(0)| $

then, for large N, outside of a small region near $k=0$, the Fourier transform of the distribution of the sum will vanish in a way controlled by the cusp behavior at $k=0$. The shrinking condition needs to be stated more precisely: you want the Fourier transform not to approach one away from zero, but in the case of interest this is obvious, because you have something like a Gaussian or exponentially decaying Fourier transform. I said "zero center" instead of "zero mean" because the mean might diverge. You can still define a center by translating the measure until the Fourier transform is real near $k=0$ (and then you can normalize by rescaling so that its value there is 1).

When $\mu_F(k) = 1 - Ak^2$ near $k=0$ (minus the second derivative of the Fourier transform at zero is the variance of $F$, so $A$ is necessarily half the variance of $F$), you get the standard Gaussian when you raise to a large power, simply from the law:

$$ (1 - Ak^2)^N \approx e^{-ANk^2} $$

which is one of the definitions of the exponential, $\lim_{N\rightarrow\infty} (1+A/N)^N = e^A$; Polyakov would have just replaced $(1-Ak^2) \approx e^{-Ak^2}$ to show this, although it is strictly only true at large $N$.

When $\mu_F(k) = 1 - A |k|^\alpha$ with $0<\alpha<2$, you get Levy behavior; in this case $\alpha=1$, and you get the Fourier transform of the spreading Cauchy distribution:

$$ (1 - A|k|)^N \approx e^{-AN|k|}. $$

From this, you can see that the sum of many Cauchy-distributed variables spreads out linearly in the number of terms. To get rid of the $N$ dependence, you need to absorb $N$ into $k$ by rescaling linearly (the linear shrinking of the $k$ distribution turns into linear spreading of the $x$ distribution). In the Gaussian case, to get rid of the $N$ dependence of $Nk^2$, you rescale $k$ to absorb $\sqrt{N}$, so that the usual Gaussian variables have the normal $\sqrt{N}$ spreading. This is the standard argument for the central limit theorem / Levy's theorem. It doesn't assume that the measure is positive, only that the Fourier transform is biggest at the origin and has a definite differentiable or cusp behavior there. (A quick numerical check of these two spreading rates is sketched below.)
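Here is the quick Monte Carlo check promised above, using ordinary positive distributions just to exhibit the two spreading exponents (a minimal sketch; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
iqr = lambda s: np.subtract(*np.percentile(s, [75, 25]))  # interquartile range as a robust width

# Width of the sum of N i.i.d. variables: Cauchy spreads like N, Gaussian like sqrt(N).
for N in (10, 100, 1000):
    cauchy_sums = rng.standard_cauchy((5000, N)).sum(axis=1)
    gauss_sums = rng.standard_normal((5000, N)).sum(axis=1)
    print(N, iqr(cauchy_sums) / N, iqr(gauss_sums) / np.sqrt(N))
    # both ratios come out roughly N-independent
```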

The upshot is that the central limit convergence can be established for Gaussians with complex means and complex variances too, since the Fourier transform is still Gaussian. It also works with Cauchy-distributed variables, or other Levy distributions, as long as the Fourier transform of the pseudo-probability function is still well behaved.

But you weren't quite asking about the central limit theorem; you were asking about the law of large numbers. In this case, you add $N$ times the center value to the convolved distribution to find the new center value of the sum, and rescale the Fourier transform appropriately for the average value (or whatever you are computing). The convergence of this to a delta function is guaranteed when the rescaled Fourier transform becomes constant over a wider and wider range (in $k$) as $N$ becomes large.
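To see this mechanism concretely for a non-positive measure, here is a small sketch with a made-up complex weight $\rho(x) = (1 + i\cos x)\,e^{-x^2/2}$ (my choice, nothing from the book), whose center is $0$ by symmetry. The "characteristic function" of the average of $N$ copies is $\mu_F(k/N)^N$, and it flattens toward $1$ over a wider range of $k$ as $N$ grows:

```python
import numpy as np
from scipy.integrate import quad

def cquad(f):
    re, _ = quad(lambda x: f(x).real, -np.inf, np.inf)
    im, _ = quad(lambda x: f(x).imag, -np.inf, np.inf)
    return re + 1j * im

# Toy complex "probability" weight with zero "mean" by symmetry (an arbitrary choice).
rho = lambda x: (1 + 1j * np.cos(x)) * np.exp(-x ** 2 / 2)
Z = cquad(rho)

def phi(k):
    """Normalized "characteristic function": phi(0) = 1 and |phi(k)| < 1 away from 0."""
    return cquad(lambda x: np.exp(1j * k * x) * rho(x)) / Z

# Fourier transform of the distribution of the average of N copies: phi(k/N)^N.
# The law of large numbers here = this tends to 1 (a delta function at the center).
for N in (10, 100, 1000):
    print(N, [np.round(phi(k / N) ** N, 4) for k in (1.0, 5.0)])
```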

The special case of the Cauchy distribution is right on the boundary where the law of large numbers stops working, because here the width parameter of the distribution scales linearly in $N$. I bring it up because, if you look at the absolute value of the $\gamma$ distribution, it is proportional to

$${1\over \gamma^2 + c^2}$$

This is the Cauchy case. If you were adding together the $\gamma$s themselves, they wouldn't obey the law of large numbers, as even the positive Cauchy distribution is on the border of obeying it.

But in your case, you are adding together the quantities ${1\over \gamma - ic}$, which transforms the Cauchy-like variable $\gamma$ into a variable with finite mean and variance. This means the result will be a standard Gaussian central-limit-type thing, and you can just find the mean and variance.
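This can be checked directly by quadrature: with the weight from the question (and the same arbitrary toy values of $c$ and $\varepsilon$ as in the sketch there), the normalized mean and variance of $u = (\gamma - ic)^{-1}$ come out finite; contour integration suggests $\langle u \rangle = i\varepsilon/2$ and $\langle u^2 \rangle - \langle u \rangle^2 = \varepsilon^2/12$, so the variance of the sum of $T/\varepsilon$ terms is of order $T\varepsilon$ and the fluctuations die as $\varepsilon \to 0$. A sketch:

```python
import numpy as np
from scipy.integrate import quad

def cquad(f, a, b):
    re, _ = quad(lambda x: f(x).real, a, b, limit=500)
    im, _ = quad(lambda x: f(x).imag, a, b, limit=500)
    return re + 1j * im

c, eps = 0.5, 0.2   # same arbitrary illustrative values as in the question's sketch
w = lambda g: np.exp(1j * eps * (g - 1j * c)) / (g - 1j * c) ** 2
u = lambda g: 1.0 / (g - 1j * c)   # the variable actually being summed

L = 2e3
Z = cquad(w, -L, L)
m1 = cquad(lambda g: u(g) * w(g), -L, L) / Z       # "mean" of u
m2 = cquad(lambda g: u(g) ** 2 * w(g), -L, L) / Z  # second "moment" of u

print(m1, 1j * eps / 2)               # vs. the contour-integration value
print(m2 - m1 ** 2, eps ** 2 / 12)    # finite "variance", so Gaussian CLT behavior
```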

These types of limit arguments are common in Polyakov, and I go through each one, justifying it quickly and mentally like this. The main way this appears in the physics is when you substitute the classical value for a variable with no derivative terms in a path integral. Each position value is fluctuating, but the integral over a region, or the average value of the variable, is given by the minimum-action location. The justification for this is through the rigamarole above, but you can't always go through all the steps, so you use your gut.

Justifying this stuff is much easier than justifying path integrals.

answered Apr 28, 2015 by Ron Maimon (7,730 points) [ revision history ]
edited Apr 29, 2015 by Ron Maimon

That's a great answer. However, I think there are a few additions/modifications that could make it easier to follow. First, $A$ is half the variance of $F$, I think. Second, the variance of $S$, as you wrote, scales additively, so the variance of $S$ is $N$ times $A$ (what you wrote isn't wrong, but it's weird that the exponential depends explicitly on $N$ when you are talking about the limit of $N$ going to infinity).

@drake: I'll fix the screwups; there are a few others too. To get rid of the $N$ dependence in the exponential, you rescale $k$. When the Fourier transform is $e^{-Nk^2}$, you rescale $k$ to absorb $\sqrt{N}$, which is the same as rescaling $x$ to absorb ${1\over \sqrt N}$; that is, $x$ spreads out in the usual way as $\sqrt{N}$. When you have a Cauchy tail, so that it is $e^{-N|k|}$, you scale $k$ linearly, so $x$ spreads linearly. For each Levy exponent $0<\alpha<2$, you have $e^{-Nk^\alpha}$, and you get a spreading behavior with a different exponent, but it's very easy to see the scaling law in the Fourier transform cusp behavior. In the nonpositive case, you can also have non-probabilistic non-Levy exponents, e.g. a Fourier transform going like $e^{-Nk^4}$, and the sum of "independent random variables" whose distribution has a Fourier transform of this form spreads out in $x$-space even slower, as $N^{1/4}$. This type of thing doesn't show up in physics; quantum systems tend to have the same Levy exponents as probability systems.

I suppose it is $(1 - A|k|)^N = e^{-AN|k|}$. Great answer ... I'm craving the same for the path integrals :)

+ 4 like - 0 dislike

One can apply the law of large numbers when $c$ and $\epsilon$ are purely imaginary and the singularities are somehow regularized. Physicists then allow themselves the freedom to assume that the results remain valid after analytic continuation and undoing the regularization.

This is the same kind of ''formal'' argument that is routinely used in quantum field theory to make sense of the Feynman path integral, which is a formal analytic continuation of the integral with respect to the Wiener measure. Of course, from a rigorous point of view this has only heuristic value, and one must be careful in using it; but used with care it often (not always) gives correct ''results'' (i.e., conjectures, from a rigorous point of view).
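A toy version of this regularize-and-continue step (my example, vastly simpler than a path integral): the Fresnel integral $\int e^{iax^2}\,dx$ can be computed by inserting a damping factor $e^{-\delta x^2}$ and letting $\delta \to 0$, and the result matches the analytic continuation $\sqrt{\pi/(\delta - ia)}$ of the Gaussian formula. A minimal sketch:

```python
import numpy as np
from scipy.integrate import quad

a = 1.0
continued = lambda delta: np.sqrt(np.pi / (delta - 1j * a))  # continuation of sqrt(pi/b)

def damped_fresnel(delta):
    """Integrate exp((-delta + i*a) x^2) directly; the damping makes it absolutely convergent."""
    f = lambda x: np.exp((-delta + 1j * a) * x ** 2)
    L = np.sqrt(60.0 / delta)  # beyond this the damping has killed the integrand
    re, _ = quad(lambda x: f(x).real, -L, L, limit=2000)
    im, _ = quad(lambda x: f(x).imag, -L, L, limit=2000)
    return re + 1j * im

for delta in (0.5, 0.1, 0.02):
    print(delta, damped_fresnel(delta), continued(delta))
# both columns approach sqrt(pi/(-i*a)) = sqrt(pi/2) * (1 + 1j) for a = 1 as delta -> 0
```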

Making these kinds of arguments mathematically rigorous is often highly nontrivial work. 

answered Apr 28, 2015 by Arnold Neumaier (15,787 points) [ revision history ]
edited Apr 29, 2015 by Arnold Neumaier

I think I can feel from your answer the general style of the argument. Nevertheless, in this specific situation, if $c$ is purely imaginary then the singularity in the integral is non-integrable.

This is why I said that one regularizes the singularities somehow before integrating.

Physicists have been accustomed (and have felt entitled) to do these things since 1948, when treating quantum electrodynamics (QED) this way produced good agreement with experiments and, a few years later, earned the formal manipulators (Tomonaga, Feynman, Schwinger) a Nobel prize.

For QED, these formal manipulations work extraordinarily well (agreement with experiment to up to 12 significant digits). So far no one has been able to make these predictions on a mathematically rigorous basis. Thus the nonrigorous ''formal'' arguments are (at least currently) necessary in physics to make progress.

@ArnoldNeumaier: This is not the correct justification: $c$ must not be purely imaginary, since in that case the integrand has a double pole on the real axis, and the integral is infinite at this location. The proper domain is real $c$, where the integrand is completely regular.

@RonMaimon: But for real $c$ the measure is not positive, and one has no justification at all.

On the other hand, one can regularize any double pole of a function $f(x)$ at $x_0$ by multiplying it by $\frac{(x-x_0)^2}{(x-x_0)^2+\delta^2}$. Then the integral is well-defined in a compact region; one can integrate, analytically continue, move the boundary to infinity, and take $\delta$ to zero. At least in a formal way, as physicists often do.

@ArnoldNeumaier: The justification is that the law of large numbers doesn't require positivity in the proof, the conditions are milder, namely that the Fourier transform tricks work out. You can prove that complex mean/variance Gaussians still have a law-of-large-numbers and central limit theorem easily, because the Fourier transforms are still Gaussian.

Your method doesn't work because the regulated measure, as a probability measure, doesn't have any spread; it just becomes a delta function on $\gamma$ when all is said and done, and then the law of large numbers is empty, because it is also the law of the number 1: all the distributions on $\gamma$ are delta functions, because $\gamma$ is just determined. I gave an answer.

Ok, your answer is better.
