# Sampling typical clusters between distant points in subcritical percolation


I have on several occasions wondered how one might proceed in order to sample large subcritical Bernoulli bond-percolation clusters, say on the square lattice.

More precisely, let's consider the lattice $\mathbb{Z}^2$, and open each edge independently with probability $p<p_c=0.5$. I am interested in the event that the vertices $(0,0)$ and $(N,0)$ are connected by a path of open edges, where $N$ is some large number ("large" meaning here "many times the correlation length").

It is well known (rigorously) that this probability decays exponentially fast in $N$, and one has a lot of information about the geometry of the corresponding cluster (e.g., under diffusive scaling the cluster converges to a Brownian bridge as $N\to\infty$, it has a maximal "width" of order $\log N$, etc.).

Question: Are there (not too inefficient) algorithms that would sample typical clusters contributing to this event?

Edit: googling a bit, I stumbled upon this paper. I have yet to read it (and to decide whether it is relevant to my question). What the author seems to be doing is growing his objects in a biased way, which helps create the very unlikely configurations he is after, but would by itself result in the wrong final distribution; so he simultaneously corrects the bias (in a way I haven't yet understood from just browsing the paper). As an example, he samples 18 disjoint crossing clusters for critical percolation on $\mathbb{Z}^3$ in a box of size $128\times128\times2000$, an event of probability of order $10^{-300}$. So:

Alternative question: does anybody know how this works? (I can also simply read the paper, but it might be useful for me and others if a nice self-contained description was given here by someone knowledgeable).

Apparently, looking more thoroughly, there is a lot of material that could be useful. It seems that relevant keywords here are: rare events simulation, importance sampling, splitting, etc.
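To illustrate the importance-sampling idea (bias the sampling toward the rare event, then correct with likelihood-ratio weights) in the simplest possible toy setting: the sketch below is purely illustrative, not the method of the paper. It estimates the probability $p^n$ that $n$ independent edges, each open with probability $p$, are all open, by opening edges with a tilted probability $q>p$ and reweighting each sample by its likelihood ratio. All function names and parameter choices here are assumptions made for the example.

```python
import random

def importance_estimate(p, q, n, trials, seed=0):
    """Estimate P(all n independent p-open edges are open) = p**n by
    sampling each edge open with the tilted probability q and weighting
    each sample by its likelihood ratio: p/q per open edge,
    (1-p)/(1-q) per closed edge.  With q close to 1 the target event
    occurs often, while the weights keep the estimator unbiased."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        weight = 1.0
        all_open = True
        for _ in range(n):
            if rng.random() < q:
                weight *= p / q
            else:
                weight *= (1 - p) / (1 - q)
                all_open = False
        if all_open:
            total += weight   # indicator of the event times its weight
    return total / trials
```

For $p=0.4$, $q=0.9$, $n=20$ this recovers $p^{20}\approx 1.1\times10^{-8}$ from a few thousand samples, whereas naive sampling would essentially never see the event.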

This post has been migrated from (A51.SE)
asked Sep 30, 2011
Yes, there is. I programmed one about ten years ago but forgot the name. Will look up references and post an answer.

"one has a lot of information about the geometry of the corresponding cluster (e.g., the corresponding cluster converges to a Brownian bridge as N→∞ under diffusive scaling, it has a maximal "width" of order logN, etc.)." Perhaps the best bet is to understand where this information comes from and whether the proofs suggest what a random cluster of the kind you want looks like. Also, you can use the known information to check suggestions for such samplings, like the one in the edited part of the question.

"As an example, he's sampling 18 disjoint crossing clusters for critical percolation on $Z^3$ in a box of size 128×128×2000, an event of probability of order $10^{−300}$". Yvan, I see very little hope that this sampling is related to the distribution claimed to be sampled. (This reminds me of the following joke: a man says he wants to sell his dog. "For how much?" his friends ask. "For $10^{300}$ dollars," he says. The next day they ask him if he was successful. "Yes! I sold it for 10 kittens worth $10^{299}$ dollars each!")

@Gil Kalai: "Perhaps the best bet is to understand where this information comes from and do the proofs suggest how a random cluster of the kind you want looks like". Well, this I know very well, as I was one of the authors of [this study](http://arxiv.org/abs/math/0610100). The proof does tell you that a typical such cluster is a concatenation of "irreducible pieces". The problem is that I have no idea how to sample the latter ;) .

@Gil Kalai: "I see very little hope that this sampling is related to the distribution claimed to be sampled". Well, it does look quite serious, and the guy was able to recover numerically with very high precision all kinds of predictions from SLE (in the 2d case) for very rare events. I still have to really see how it works, however...

Dear Yvan, indeed, maybe I was overly skeptical. Suppose you want to sample such a random cluster assuming there is a path from (0,0) to (N,0) with y-coordinate in $[-M,M]$. Then you can condition on having left-to-right crossings of the $T \times 2M$ rectangles of the form $[rT,(r+1)T] \times [-M,M]$. These events are independent, so you can sample conditioned on all of them being satisfied. Having a path from (0,0) to (N,0) will still be rare conditioned on the existence of these crossings, but not as rare. Thanks for the reference to your work.

Actually, maybe we can use the above idea iteratively. We can sample fairly efficiently conditioned on there being a left-to-right crossing of each $T \times 2M$ rectangle. Given this, we can try to be more ambitious and sample from the results the cases where there is a crossing of the $2T \times 2M$ rectangles, and then (say) of the $4T \times M$ rectangles. This may lead to a not-too-inefficient method. (We can tune it further by letting $M$ depend on the $x$-coordinate, and perhaps also by splitting the rectangles horizontally as well.)
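The conditioning step proposed in this comment can be carried out by plain rejection sampling, since a single crossing of a rectangle whose size is comparable to the correlation length is not too rare. A hedged illustrative sketch (the rectangle dimensions, the value of $p$, and the function names are all assumptions made for the example, not part of the discussion above):

```python
import random
from collections import deque

P = 0.45     # subcritical edge probability (illustrative choice)
T, M = 6, 4  # rectangle [0,T] x [-M,M] (illustrative sizes)

def sample_rectangle(rng):
    """Open every edge of the rectangle [0,T] x [-M,M] independently
    with probability P; return the set of open edges."""
    opens = set()
    for x in range(T + 1):
        for y in range(-M, M + 1):
            for w in ((x + 1, y), (x, y + 1)):   # right and up neighbours
                if w[0] <= T and w[1] <= M:
                    if rng.random() < P:
                        opens.add(((x, y), w))
    return opens

def has_lr_crossing(opens):
    """BFS over open edges from the left side x=0 to the right side x=T."""
    adj = {}
    for u, v in opens:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen = {(0, y) for y in range(-M, M + 1)}
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v[0] == T:
            return True
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

def sample_conditioned_on_crossing(rng):
    """Rejection sampling: redraw until the rectangle has a crossing.
    This gives exactly the conditional law; it is feasible only because
    a single T x 2M crossing is not too rare at this scale."""
    while True:
        opens = sample_rectangle(rng)
        if has_lr_crossing(opens):
            return opens
```

Since the rectangles are disjoint, each one can be sampled independently this way, giving the configuration conditioned on all crossings at once.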

@Gil Kalai: Yes, such an idea might work. In the literature on simulation of rare events, they do propose something similar (with a twist), namely to decompose the very rare event you want into the end result of a series of successive non-rare events (non-rare conditionally on the previous ones having occurred). In our case, say, reaching some hyperplane at distance M (comparable with the correlation length) on the right; then reaching one at 2M, and so on... [to be continued]

[continued] The twist is that you run many independent copies of your process, some of which "die" before reaching the next level; those which do reach a new level are cloned and then evolve independently, so that you always have a large number of simultaneous trials. There is apparently a way to do this without introducing bias in the resulting final distribution, but I haven't read these papers in enough detail to be fully convinced ;) . In any case, it is an interesting problem, and apparently very important in practice (they mention insurance, portfolio management, etc.).
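The cloning scheme described in these two comments (often called multilevel splitting) can be sketched as follows. Everything here (the level spacing, the fixed population size, the resampling rule) is an illustrative assumption, not the algorithm of the paper: each particle is a partially grown cluster, level $k$ is the event "the cluster reaches $x \ge m_k$", survivors are cloned back up to a fixed population, and the product of the per-level survival fractions estimates the probability of the final rare event.

```python
import copy
import random
from collections import deque

P = 0.45            # subcritical edge probability (p_c = 1/2 for Z^2 bonds)
LEVELS = [2, 4, 6]  # successive x-coordinates the cluster must reach
N_COPIES = 200      # particle population kept alive at every level

def neighbors(v):
    x, y = v
    return ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))

def edge(u, v):
    return (u, v) if u <= v else (v, u)

class ClusterState:
    """A partially grown open cluster of the origin; edge states are
    decided lazily, so growth can be paused and resumed at each level."""
    def __init__(self):
        self.vertices = {(0, 0)}
        self.frontier = deque([(0, 0)])
        self.known = {}       # edge -> True (open) / False (closed)
        self.max_x = 0

    def grow_until(self, target_x, rng):
        """Resume growth; True if the cluster reaches x >= target_x,
        False if it dies out first."""
        while self.frontier:
            if self.max_x >= target_x:
                return True
            v = self.frontier.popleft()
            for w in neighbors(v):
                e = edge(v, w)
                if e not in self.known:
                    self.known[e] = (rng.random() < P)
                if self.known[e] and w not in self.vertices:
                    self.vertices.add(w)
                    self.frontier.append(w)
                    self.max_x = max(self.max_x, w[0])
        return self.max_x >= target_x

def splitting_estimate(seed=0):
    """Fixed-population splitting: survivors of each level are cloned
    back up to N_COPIES; the product of the survival fractions estimates
    the probability of reaching the last level."""
    rng = random.Random(seed)
    particles = [ClusterState() for _ in range(N_COPIES)]
    prob = 1.0
    for level in LEVELS:
        survivors = [s for s in particles if s.grow_until(level, rng)]
        if not survivors:
            return 0.0, []
        prob *= len(survivors) / len(particles)
        particles = [copy.deepcopy(rng.choice(survivors))
                     for _ in range(N_COPIES)]
    return prob, particles
```

The clones share their past randomness but diverge through fresh draws, which is exactly the "evolve independently after cloning" mechanism described above; more refined variants reweight the clones to remove the residual bias of the resampling step.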



Here is the algorithm as I remember it. There is a name and a reference attached to it; I will update my answer when I find them.

The algorithm is not limited by $N$ (it does not require storing the grid). Each edge of the infinite lattice is either open, closed, or unknown. Initially all edges are unknown. The idea is to grow a connected cluster iteratively, trying (generating the random number for) all yet-unknown edges in the cluster's neighborhood.

You will need two data structures: a list CLUSTER for the open edges that form the connected cluster, and a list KNOWN for all the edges that are either open or closed, i.e., no longer unknown.

1. Start with both CLUSTER and KNOWN populated by one open edge.
2. Construct the list NEIGHBOURS consisting of all the edges connected to the edges in CLUSTER.
3. If there are no unknown edges in NEIGHBOURS then stop.
4. For each unknown edge in NEIGHBOURS: a) throw a random number to determine whether it is open or closed; b) add the edge to KNOWN; c) if the edge is open, add it to CLUSTER.
5. Go back to step 2.

The algorithm terminates with probability one if you are below the percolation threshold. By repeating it a sufficient number of times, you get an unbiased sample of connected clusters. Depending on what you are particularly interested in, you may discard all small clusters, or build in a break once your cluster's linear size exceeds $N$.
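The steps above can be sketched in a few lines of Python (an illustrative sketch, not a reference implementation; the liberties taken are seeding the growth at the origin vertex rather than at a single open edge, storing reached vertices rather than open edges, and deciding each edge lazily the first time it is probed):

```python
import random
from collections import deque

def grow_cluster(p, rng=None, max_vertices=10**5):
    """Grow the open cluster of the origin on the Z^2 bond lattice,
    deciding each edge (open with probability p) only when first probed,
    so no grid needs to be stored.  Stops when the frontier is exhausted
    or the cluster reaches max_vertices (a safety cap)."""
    rng = rng or random.Random()
    cluster = {(0, 0)}           # vertices reached so far
    frontier = deque([(0, 0)])   # vertices whose edges are not yet probed
    known = {}                   # edge -> True (open) / False (closed)
    while frontier and len(cluster) < max_vertices:
        x, y = frontier.popleft()
        for w in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            e = ((x, y), w) if (x, y) <= w else (w, (x, y))
            if e not in known:
                known[e] = (rng.random() < p)
            if known[e] and w not in cluster:
                cluster.add(w)
                frontier.append(w)
    return cluster
```

Storing edge states in a dictionary (rather than probing each edge only once) guarantees that the two endpoints of an edge always see a consistent open/closed decision.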

answered Oct 2, 2011 by (610 points)
Yes, this is the natural way to grow a typical subcritical cluster. But what I'm after is an extremely atypical such cluster, one which connects two points at a distance much farther than the correlation length. With such an algorithm, I'd need to repeat the growth an exponential number of times in $N$ in order for the event to occur...

Indeed, a local algorithm like this is not efficient for your purposes. It is hard to imagine how global (target-size) information can be incorporated efficiently, but I hope you'll succeed. Sorry this one didn't help.


The naive algorithm (sampling conditioned on the rare event in question, namely that the vertices (0,0) and (N,0) are connected) is rather inefficient. The only hope I see for an efficient algorithm comes from duality results connecting subcritical and supercritical percolation. There are various such results (which apply in greater generality: to nonplanar percolation, to random graphs, etc.). I don't know if there are results of this kind for this precise question, namely whether the law of the cluster containing (0,0) and (N,0) in subcritical percolation with parameter p, conditioned on them being connected, is the same (or approximately the same) as for supercritical percolation (say with parameter 1-p).

answered Oct 2, 2011 by (475 points)
I'm not sure I understand your last sentence, since the probability of connecting the two points in the supercritical case is of course of order $1$ (bounded below by the probability that both belong to the infinite component). On the other hand, I agree that if the two points are not connected in a subcritical configuration, then there must be a dual circuit separating one from the other in the corresponding supercritical configuration. But I don't see how the latter event would be easier to simulate (it's really just a reformulation of the former). Am I mistaken?

Dear Yvan, I DON'T refer here to planar duality. In fact, duality between the behavior of events in the subcritical and supercritical regimes exists for various models, e.g., the Erdős–Rényi random graph model. For example, we may wonder whether the distribution of clusters containing (0,0) and (N,0) in subcritical percolation with edge probability p is related to the distribution of the cluster containing (0,0) and (N,0) in supercritical percolation, conditioned on the existence of such a cluster and on it not being infinite. (Such a relation is a bit of wild speculation, but I see no other hope for an efficient algorithm.)

OK, I'll have to think more about what you're suggesting ;) . On the other hand, I don't think that there is no hope for an alternative efficient algorithm, see the edits in my question.

