This answer is just an expanded version of Kconrad's answer. I am posting it here because this argument supports the variational method for finite state spaces, and it also touches on the observation made by Kconrad about a technical issue concerning boundary values in the variational approach.

**Proposition:** Suppose $\Omega$ is a non-empty finite set and let $\mathcal M$ denote the set of probability measures on $\Omega$. Then
$$
\sup_{\mu\in\mathcal M} \left[ h(\mu)-\int_{\Omega} U d\mu \right]=\log Z
$$
Moreover, the supremum is attained by the measure $\mu$ given by
$$
\mu(\{\omega\})=\frac{1}{Z}e^{-U(\omega)}.
$$
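Before the proof, the identity is easy to sanity-check numerically. The sketch below (plain Python; the energies in `U` are arbitrary placeholder values) verifies that the Gibbs measure attains $\log Z$ and that random probability vectors never exceed it:

```python
import math
import random

def free_energy(mu, U):
    """h(mu) - integral of U d(mu), where h(mu) = -sum_i mu_i log mu_i
    (with the convention 0 log 0 = 0)."""
    h = -sum(p * math.log(p) for p in mu if p > 0)
    return h - sum(p * u for p, u in zip(mu, U))

# Arbitrary energies on a 4-point space (any real values work).
U = [0.3, -1.2, 2.0, 0.5]
Z = sum(math.exp(-u) for u in U)
gibbs = [math.exp(-u) / Z for u in U]

# The Gibbs measure attains log Z ...
assert abs(free_energy(gibbs, U) - math.log(Z)) < 1e-12

# ... and no randomly sampled probability vector exceeds log Z.
rng = random.Random(0)
for _ in range(10_000):
    w = [rng.random() for _ in U]
    s = sum(w)
    mu = [x / s for x in w]
    assert free_energy(mu, U) <= math.log(Z) + 1e-12

print("variational identity verified")
```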
**Proof:**
Let $n$ be the cardinality of $\Omega$. Define the function $f:\mathbb R_+^n\to\mathbb R$ by
$$
f(x_1,\ldots,x_n)=-\sum_{i=1}^n \Big[x_i\log x_i +K_ix_i\Big],
$$
where $K_i\in\mathbb R$ for all $i\in\{1,\ldots,n\}$. Consider the function $g:\mathbb R_+^n\to\mathbb R$ given by
$$
g(x_1,\ldots,x_n)=\sum_{i=1}^n x_i.
$$
We fix an enumeration $\omega_1,\ldots,\omega_n$ of $\Omega$ and set $K_i=U(\omega_i)$. Then the optimization problem
$$
\sup_{\mu\in\mathcal M} \left[ h(\mu)-\int_{\Omega} U d\mu \right]
$$
can be solved by finding a maximum of $f$ restricted to $g^{-1}(1)$. Note that for any critical point $(x_1,\ldots,x_n)$ of $f$ in $(0,\infty)^n\cap g^{-1}(1)$, the Lagrange multiplier theorem gives
$$
\nabla f(x_1,\ldots,x_n)=\lambda \nabla g(x_1,\ldots,x_n)
$$
for some $\lambda\in\mathbb R$, i.e.,
$$
-(\log x_i +1+K_i)=\lambda, \ \ \ \text{for all}\ i=1,\ldots,n.
$$
So for any pair of indices $i,j\in\{1,\ldots,n\}$, we have
$$
\log x_i +K_i=\log x_j+K_j
$$
Taking exponentials, it follows that
$$
x_ie^{K_i}=x_je^{K_j}.
$$
Using that $\sum_{i=1}^nx_i=1$ and the above identities, we have
$$x_ie^{-K_i}=\left[1-\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}x_j\right]e^{-K_i}$$
$$=e^{-K_i}-\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}x_je^{-K_i}$$
Since $x_ie^{K_i}=x_je^{K_j}$ implies $x_je^{-K_i}=x_ie^{-K_j}$, this gives
$$
x_ie^{-K_i}=e^{-K_i}-x_i\sum_{j\in \{1,\ldots,n\}\backslash\{i\}}e^{-K_j}.
$$
Solving for $x_i$, we see that $f$ has exactly one critical point in $(0,\infty)^n\cap g^{-1}(1)$, given by
$$
x_i=\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}.
$$
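This critical point can be checked numerically: the sketch below (the values of `K` are arbitrary placeholders) confirms that it lies on the constraint surface $g^{-1}(1)$ and that $\log x_i + K_i$ is constant in $i$ (equal to $-\log Z$), so the Lagrange condition $-(\log x_i + 1 + K_i)=\lambda$ holds with $\lambda=\log Z - 1$:

```python
import math

# Arbitrary placeholder values K_i = U(omega_i).
K = [1.0, -0.5, 0.25, 3.0]
Z = sum(math.exp(-k) for k in K)
x = [math.exp(-k) / Z for k in K]

# The candidate critical point satisfies the constraint g(x) = 1 ...
assert abs(sum(x) - 1.0) < 1e-12

# ... and log x_i + K_i is the same for every i, namely -log Z.
vals = [math.log(xi) + k for xi, k in zip(x, K)]
assert max(vals) - min(vals) < 1e-12
assert abs(vals[0] + math.log(Z)) < 1e-12

print("Lagrange condition verified")
```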
The value of $f$ at this point is
$$
-\sum_{i=1}^n \left[\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)\log
\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)
+K_i\left(\frac{e^{-K_i}}{\sum_{j=1}^ne^{-K_j}}\right)\right]
=
\log\left(\sum_{j=1}^ne^{-K_j}\right)
$$
To see that $(x_1,\ldots,x_n)$ is a local maximum, we can compute the Hessian and check that it is negative definite at this point.

To show that this point is a global maximum, we compare the value of $f$ there with the value of $f$ at any point of the set
$\partial (0,\infty)^n\cap g^{-1}(1)$.
The restriction of $f$ to this set is given by
$$
f_I(x_1,\ldots,x_n)=-\sum_{i\in\{1,\ldots,n\}\backslash I}\Big[x_i\log x_i +K_ix_i\Big],
$$
where $I\subset \{1,\ldots,n\}$ is an index set with $|I|\geq 1$ and $x_i=0$ for all $i\in I$.
This defines a function $f_I$ of $n-|I|$ variables. Its interior maximum can be determined in the same way as before, and therefore the maximum of $f_I$ is
$$
\log\left(\sum_{j\in\{1,\ldots,n\}\backslash I}e^{-K_j}\right)
$$
which is strictly less than
$$
\log\left(\sum_{j=1}^ne^{-K_j}\right).
$$
Repeating this argument at most $n$ times, we conclude that the maximum of $f$ restricted to $g^{-1}(1)\cap \mathbb R^n_+$ is not attained on the boundary; hence it is attained at the interior critical point above, which proves the proposition.
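The boundary comparison can also be checked numerically: for every proper non-empty index set $I$, the restricted maximum $\log\big(\sum_{j\notin I}e^{-K_j}\big)$ is strictly below $\log Z$, since every term $e^{-K_j}$ is positive. A sketch (the values of `K` are arbitrary placeholders):

```python
import math
from itertools import combinations

# Arbitrary placeholder values K_i = U(omega_i).
K = [1.0, -0.5, 0.25, 3.0]
n = len(K)
full = math.log(sum(math.exp(-k) for k in K))  # log Z over all of Omega

# For every proper non-empty I, dropping the coordinates in I
# strictly lowers the attainable maximum.
for r in range(1, n):
    for I in combinations(range(n), r):
        restricted = math.log(sum(math.exp(-K[j])
                                  for j in range(n) if j not in I))
        assert restricted < full

print("boundary values are strictly smaller")
```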

This post imported from StackExchange MathOverflow at 2015-08-19 09:28 (UTC), posted by SE-user Leandro