
What is the interpretation of individual contributions to the Shannon entropy?

+ 5 like - 0 dislike
4003 views

If the outcomes $X=\{ x_1,x_2,\dots,x_n\}$ are assigned probabilities $p(x_i)$, then the entropy is defined as

$$\sum_{i=1}^n p(x_i)\cdot\left(-\log p(x_i)\right).$$

One may call $I(x_i)=-\log p(x_i)$ the information associated with $x_i$ and consider the above an expectation value. In some systems it makes sense to view $p(x_i)$ as the rate of occurrence of $x_i$; the lower $p(x_i)$ is, the larger $I(x_i)$ becomes, matching the "value of your surprise" whenever $x_i$ happens. It's also worth noting that if $p$ is a constant function, we get a Boltzmann-like situation.
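For concreteness, here is a minimal numerical sketch (in Python, with a made-up four-outcome distribution) of the surprisal $I(x_i)$, the individual contributions $p(x_i)\,I(x_i)$, and their sum, the entropy:

```python
import math

# Hypothetical probabilities for four outcomes x_1, ..., x_4 (they sum to 1).
p = [0.5, 0.25, 0.125, 0.125]

# Surprisal I(x_i) = -log2 p(x_i), measured in bits.
surprisal = [-math.log2(pi) for pi in p]

# Individual contributions p(x_i) * I(x_i) to the entropy.
contributions = [pi * Ii for pi, Ii in zip(p, surprisal)]

# The Shannon entropy is the expectation value of the surprisal.
H = sum(contributions)

print(surprisal)      # [1.0, 2.0, 3.0, 3.0]
print(contributions)  # [0.5, 0.5, 0.375, 0.375]
print(H)              # 1.75 (bits)
```

Note that the rarest outcomes carry the largest surprisal but not the largest contribution: the contribution weights the surprisal by how often the outcome actually occurs.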

Question: Now I wonder, given $\left|X\right|>1$, how I can interpret, for a fixed index $j$, the single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$. What does this "$x_j^\text{th}$ contribution to the entropy" or "price" represent? What is $p\cdot\left(-\log p\right)$ when there are also other probabilities?


Thoughts: The term is zero if $p$ is one or zero. In the first case there is no surprise about something that will occur with certainty, and in the second case the event never occurs and hence costs nothing. Now

$$\left(-p\cdot\log(p)\right)'=\log\left(\frac{1}{p}\right)-1.$$

As a function of $p$, the term has a maximum which, oddly, is at the same time a fixed point, namely $\dfrac{1}{e}=0.368\dots$. That is to say, the maximal contribution of a single term $p(x_j)\,\cdot\left(-\log p(x_j)\right)$ to the entropy will arise if, for some $x_j$, you have $p(x_j)\approx 37\%$.
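Explicitly, setting the derivative to zero gives both the location and the value of this maximum:

$$\log\frac{1}{p}-1=0 \;\Longrightarrow\; p=\frac{1}{e}, \qquad -\frac{1}{e}\,\log\frac{1}{e}=\frac{1}{e}\approx 0.368,$$

which is why the maximizer is at the same time a fixed point of $p\mapsto -p\log p$ (here $\log$ denotes the natural logarithm).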

My question arose when someone asked me what the meaning of $x^x$ having a minimum at $x_0=\dfrac{1}{e}$ is. This is naturally $e^{x\log(x)}$, and I gave an example about signal transfer. The extremum is the individual contribution with maximal entropy, and I wanted to argue that, after optimizing the encoding/minimizing the entropy, events that happen with a probability $p(x_j)\approx 37\%$ will in total be the "most boring for you to send": they occur relatively often, and yet their optimal encoding length might not be too short. But I lack an interpretation of the individual entropy contribution to see if this idea makes sense, or what a better reading of it is.

It also relates to the information units, e.g. the nat. The extremum sits at one over $e$ whether you work in base $e$ (with the natural log) or with $\log_2$, and $-\log_2\left(\dfrac{1}{e}\right)=\log_2 e=\dfrac{1}{\ln 2}$.
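A short check, using $\log_b p=\ln p/\ln b$, that the location of this extremum is indeed independent of the base $b$:

$$\frac{d}{dp}\bigl(-p\log_b p\bigr)=-\log_b p-\frac{1}{\ln b}=0 \;\Longrightarrow\; \ln p=-1 \;\Longrightarrow\; p=\frac{1}{e},$$

while the value of the maximal contribution does depend on the unit: it is $\frac{1}{e}$ nats, or $\frac{1}{e}\log_2 e=\frac{1}{e\ln 2}\approx 0.531$ bits.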


edit: Related: I just stumbled upon $\frac{1}{e}$ as a probability: the 37% stopping rule.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
asked Feb 4, 2015 in Theoretical Physics by NikolajK (200 points) [ no revision ]
retagged Feb 15, 2015
Are you unhappy with the idea that the Shannon entropy is the "average information"? That is, the expectation value of the random variable $I(X)$. In this case $-p_i\log p_i$ is just the weighted contribution of the event $x_i$ to this average.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Mark Mitchison
@MarkMitchison: To answer your question: No, I'm not unhappy with that interpretation for the whole sum (I've pointed out that it takes the form of an expectation value).

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK

1 Answer

+ 2 like - 0 dislike

This is a bit of a negative answer, but consider instead an expectation of some other quantity, such as the energy: $$ \langle E \rangle = \sum_i p_i E_i. $$ Now, it's obvious what $E_i$ means - it's the energy of state $i$ - but what does $p_iE_i$ mean? The answer is not very much really - it's the contribution of state $i$ to the expected energy, but it's very rarely if ever useful to consider this except in the context of summing up all the states' contributions.

In the context of information theory, $-p_i\log p_i$ is the same. $-\log p_i$ is the meaningful thing - it's the "surprisal", or information gained upon learning that state $i$ is in fact the true state. $-p_i\log p_i$ is the contribution of state $i$ to the Shannon entropy, but it isn't really meaningful except in the context of summing up all the contributions from all the states.

In particular, as far as I've ever been able to see, the value that maximises it, $1/e$, isn't a particularly special probability in information theory terms. The reason is that you always have to add up the contributions from the other states as well, and this changes the maximum.

In particular, for a two-state system, there is another state whose probability has to be $1-p$. Consequently, its Shannon entropy is given by $H_2 = -p\log p - (1-p)\log (1-p)$, and this function has its maximum not at $1/e$ but at $1/2$.
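For a quick check of that maximum, set the derivative to zero (again with the natural log):

$$H_2'(p)=-\log p-1+\log(1-p)+1=\log\frac{1-p}{p}=0 \;\Longrightarrow\; p=\frac{1}{2}.$$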

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
answered Feb 5, 2015 by Nathaniel (495 points) [ no revision ]
Most voted comments
What I meant is that the links point at the same page. You mean the Rényi entropy is less arbitrary because of the interpretations pointed out in the paper?

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
Oh sorry. The first was just a Wikipedia link. Yes, the point is that paper makes the Rényi entropy feel like a physically and informationally meaningful thing, whereas to me the Tsallis seems like a mysterious equation that comes from nowhere. (But if you know an interpretation I'd like to hear it.)

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Just stumbled upon $\frac{1}{e}$ as probability: 37% stopping rule.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
Interesting - thanks. I've thought of another case where it comes up as well, which I've been meaning to post as an answer. Let me see if I have time to do that now.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Aah, I'm not sure if it works. My idea was to set up a situation where you can send either signal $A$ or $B$, but where there's a cost to send one of the signals but not the other. Then by trying to maximise (total information transmitted)/(expected cost) you might end up maximising $-p(A)\log p(A)$ to get $p(A)=1/e$ as the optimum. But the exact thing I thought of doesn't work, so I need to think more about it.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Most recent comments
@NikolajK your other thing seems related to the Rényi entropy. I do know a nice interpretation of the Rényi entropy, though its exact relation to your formula would need some thought.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user Nathaniel
Yeah, I was thinking about that argument for a second. I guess "I don't know anyone who knows an interpretation for this related quantity either" is indeed more information. Did you intend to post two different links? There are indeed several q-analogs, e.g. the Tsallis entropy.

This post imported from StackExchange Physics at 2015-02-15 11:55 (UTC), posted by SE-user NikolajK
