Every now and again I hear something about Tsallis entropy,
$$
S_q(\{p_i\}) = \frac{1}{q-1}\left( 1- \sum_i p_i^q \right), \tag{1}
$$
and I decided to finally get around to investigating it. I haven't got very deep into the literature (I've just lightly skimmed Wikipedia and a few introductory texts), but I'm completely confused about the motivation for its use in statistical physics.
As an entropy-like measure applied to probability distributions, the Tsallis entropy has the property that, for two independent random variables $A$ and $B$,
$$
S_q(A, B) = S_q(A) + S_q(B) + (1-q)S_q(A)S_q(B).\tag{2}
$$
In the limit as $q$ tends to $1$ the Tsallis entropy becomes the usual Gibbs-Shannon entropy $H$, and we recover the relation
$$H(A,B) = H(A) + H(B)\tag{3}$$
for independent $A$ and $B$.
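
As a quick sanity check (a minimal numerical sketch in Python; the two distributions are arbitrary examples, not tied to any physical system), Equation $2$ and the $q \to 1$ recovery of the Gibbs-Shannon entropy are easy to verify:

```python
import numpy as np

def tsallis(p, q):
    """Tsallis entropy S_q(p), Equation 1."""
    p = np.asarray(p, dtype=float)
    return (1.0 - np.sum(p**q)) / (q - 1.0)

def shannon(p):
    """Gibbs-Shannon entropy in nats."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

# two arbitrary independent distributions (example values only)
pA = np.array([0.5, 0.3, 0.2])
pB = np.array([0.6, 0.4])
pAB = np.outer(pA, pB)          # joint distribution of independent A and B

q = 1.7
lhs = tsallis(pAB, q)
rhs = tsallis(pA, q) + tsallis(pB, q) + (1 - q) * tsallis(pA, q) * tsallis(pB, q)
print(lhs, rhs)                              # agree: pseudo-additivity, Equation 2

print(tsallis(pA, 1.0 + 1e-6), shannon(pA))  # q -> 1 recovers the Shannon entropy
```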
As a mathematical property this is perfectly fine, but the motivation for its use in physics seems completely weird, unless I've fundamentally misunderstood it. From what I've read, the argument seems to be that for strongly interacting systems such as gravitationally-bound ones, we can no longer assume the entropy is extensive (fair enough so far), and therefore we need an entropy measure that behaves non-extensively for independent sub-systems, as in Equation $2$ above, for an appropriate value of $q$.
The reason this seems weird is the assumption of independence of the two sub-systems. Surely the very reason we can't assume the entropy is extensive is that the sub-systems are strongly coupled, and therefore not independent.
The usual Boltzmann-Gibbs statistical mechanics seems well equipped to deal with such a situation. Consider a system composed of two sub-systems, $A$ and $B$. If sub-system $A$ is in state $i$ and $B$ is in state $j$, let the energy of the system be given by $E_{ij} = E^{(A)}_i + E^{(B)}_j + E^{(\text{interaction})}_{ij}$. For a canonical ensemble we then have
$$
p_{ij} = \frac{1}{Z} e^{-\beta E_{ij}} = \frac{1}{Z} e^{-\beta (E^{(A)}_i + E^{(B)}_j + E^{(\text{interaction})}_{ij})}.
$$
If the values of $E^{(\text{interaction})}_{ij}$ are small compared to those of $E^{(A)}_i$ and $E^{(B)}_j$ then this approximately factorises into $p_{ij} = p_ip_j$, with $p_i$ and $p_j$ also being given by Boltzmann distributions, calculated for $A$ and $B$ independently. However, if $E^{(\text{interaction})}_{ij}$ is large then we can't factorise $p_{ij}$ in this way and we can no longer consider the joint distribution to be the product of two independent distributions.
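
To make the factorisation point explicit, here is a toy example (just a sketch with made-up energy levels and an arbitrary random coupling, not any particular physical system): with a weak interaction term the joint canonical distribution is close to the product of its marginals, and with a strong one it isn't.

```python
import numpy as np

def joint_boltzmann(EA, EB, Eint, beta=1.0):
    """Joint canonical distribution p_ij for E_ij = E^A_i + E^B_j + E^int_ij."""
    E = EA[:, None] + EB[None, :] + Eint
    w = np.exp(-beta * E)
    return w / w.sum()

rng = np.random.default_rng(0)
EA = np.array([0.0, 1.0, 2.0])       # example energy levels for sub-system A
EB = np.array([0.0, 0.5])            # example energy levels for sub-system B
Eint = rng.normal(size=(3, 2))       # arbitrary example interaction energies

for scale in (0.01, 5.0):            # weak vs strong coupling
    p = joint_boltzmann(EA, EB, scale * Eint)
    pA = p.sum(axis=1)               # marginal distribution of A
    pB = p.sum(axis=0)               # marginal distribution of B
    err = np.abs(p - np.outer(pA, pB)).max()
    print(f"coupling scale {scale}: max |p_ij - p_i p_j| = {err:.2g}")
```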
Anyone familiar with information theory will know that Equation $3$ does not hold for non-independent random variables. The more general relation is
$$
H(A,B) = H(A) + H(B) - I(A;B),
$$
where $I(A;B)$ is the mutual information, a symmetric measure of the correlation between two variables, which is always non-negative and becomes zero only when $A$ and $B$ are independent. The thermodynamic entropy of a physical system is just the Gibbs-Shannon entropy of a Gibbs ensemble, so if $A$ and $B$ are interpreted as strongly interacting sub-systems then the usual Boltzmann-Gibbs statistical mechanics already tells us that the entropy is not extensive, and the mutual information gets a physical interpretation as the degree of non-extensivity of the thermodynamic entropy.
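
Again as a sketch with arbitrary example energies, one can build a strongly coupled joint Gibbs distribution, compute $H(A,B)$, $H(A)$, $H(B)$ and $I(A;B)$ directly, and see that the entropy deficit $H(A)+H(B)-H(A,B)$ is exactly the mutual information:

```python
import numpy as np

def shannon(p):
    """Gibbs-Shannon entropy in nats (zero-probability entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# a strongly coupled toy joint Gibbs distribution (arbitrary example energies)
rng = np.random.default_rng(1)
E = 3.0 * rng.normal(size=(4, 4))    # E_ij with a large interaction part
p = np.exp(-E)                       # beta = 1
p /= p.sum()

pA, pB = p.sum(axis=1), p.sum(axis=0)           # marginal distributions
I = np.sum(p * np.log(p / np.outer(pA, pB)))    # mutual information I(A;B)

print(shannon(pA) + shannon(pB) - shannon(p), I)   # H(A)+H(B)-H(A,B) = I(A;B)
```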
This seems to leave no room for special "non-extensive" modifications to the entropy formula such as Equation $1$. The Tsallis entropy is non-extensive for independent sub-systems, but it seems the cases where we need a non-extensive entropy are exactly the cases where the sub-systems are not independent, and therefore the Gibbs-Shannon entropy is already non-extensive.
After that long explanation, my questions are: (i) Is the above characterisation of the motivation for Tsallis entropy correct, or are there cases where the parts of a system can be statistically independent and yet we still need a non-extensive entropy? (ii) What is the current consensus on the validity of Tsallis entropy-based approaches to statistical mechanics? I know that it's been the subject of debate in the past, but Wikipedia seems to imply that this debate is now settled and that the idea is widely accepted; I'd like to know how true this is. Finally, (iii) can the argument I sketched above be found in the literature? I had a quick look at some dissenting opinions about Tsallis entropy, but surprisingly I didn't immediately see the point about mutual information and the non-extensivity of Gibbs-Shannon entropy.
(I'm aware that there's also a more pragmatic justification for using the Tsallis entropy, which is that maximising it tends to lead to "long-tailed" power-law type distributions. I'm less interested in that justification for the sake of this question. Also, I'm aware there are some similar questions on the site already [1,2], but these don't cover the non-extensivity argument I'm concerned with here; the answers only deal with the Rényi entropy.)