1) As far as I know, the first convincing non perturbative argument for the exact finiteness of $D=4$ $N=4$ SYM was given by Seiberg in
http://www.sciencedirect.com/science/article/pii/0370269388912658
Let me give a summary of the argument. First one remarks that a renormalizable action for a $D=4$ $N=4$ SYM of given gauge group $G$ is completely fixed by symmetries up to the choice of a (complexified) coupling constant $g$. The question is to find whereas this coupling constant flows under the renormalisation group flow. The main trick is that the theory has 6 real scalars and so a non-trivial moduli space $M$ of vacua obtained by giving expectation values to the scalars. The fact that $M$ exists i.e. that it is possible to give expectation values to scalars, i.e. that there is no extra potential lifting these classical vacua, is a consequence of supersymmetry (in fact $N=2$ is enough for this step). At a generic point of $M$, corresponding to expectation values of order $v$ (an energy: a scalar field has the dimension of an energy in 4 spacetime dimensions), the gauge group $G$ is Higgsed to its maximal torus $U(1)^r$ and the low energy effective theory is an abelian $N=4$ gauge theory, i.e. something that we understand (this is the main point of the trick: if we stay at $v=0$, we have a priori no idea of what happens at low energy because we don't understand well the dynamics of non-abelian gauge theories). In particular, the form of this effective action is completely fixed by $N=4$ supersymmetry up to some coupling constants. By $N=4$ supersymmetry, these low energy coupling constants are independent on the fluctuations of the scalar fields around the fixed expectation value $v$ (i.e. the metric on the moduli space of vacua is flat). By general properties of the Higgs mechanism, these low energy coupling constants are determined by the value of the UV coupling constant $g$ at the energy scale $v$. As the answer does not depend on the fluctuations of the scalar fields around $v$, it means that the $UV$ coupling constant $g$ is constant for energies slightly fluctuating around $v$. As the energy scale $v$ can be choosen arbitrarily in the preceding reasoning, it implies that the $UV$ coupling constant $g$ is independant of the energy scale, i.e. the beta function is zero. Here is another argument. It does not use the trick to give expectation values to the 6 scalar fields, one remains at the expected conformal point v=0, but the trick is to use the action of the R-symmetry on the 6 scalars to have some control on a priori very complicated non-abelian gauge theory, so in some sense both arguments are quite similar. We consider the action of the conformal algebra on the local operators of the theory (beware: here, we are not assuming that the theory is conformal: the conformal algebra acts because it acts on the Minkowski space and the theory is defined on the Minkowski space, it is a purely cinematic fact). To any local field, it is possible to associate a dimension by looking at the action of the dilatation operator on it. To show that the beta function is zero is equivalent to the fact that the dimension of the coupling constant is zero. But the coupling constant appears in front of the cinetic term $Tr(F^2)$ of the gauge field in the Lagrangian density. As the Lagrangian density has to be of dimension 4, the problem is equivalent to show that the operator $Tr(F^2)$ is of dimension 4, i.e. has no anomalous dimension. As the theory is $N=4$ supersymmetric, we have in fact an action of the $N=4$ superconformal algebra on the local operators. The operators in a given superconformal representation differs one from the other by application of generators of the suoerconformal algebra. In particular, their dimensions are related in a precise way and so if an operator has no anomalous dimension then the same is true for every operator in the same superconformal representation. So it is enough to find an operator with no anomalous dimension in the same superconformal representation than $Tr(F^2)$. The theory has 6 real scalars. Let us group them as three complex scalars $Z_1$, $Z_2$, $Z_3$. By $N=4$ supersymmetry, $Tr(F^2)$ is in the same superconformal representation that $Tr(Z_1^2)$ so it is enough to show that $Tr(Z_1^2)$ has no anomalous dimension. In fact, $Tr(Z_1^2)$ is of smallest possible dimension for an operator in its superconformal representation, i.e. it is a primary operator. The superconformal algebra contains a subalgebra $so(6)$ called R-symmetry, which acts by rotation on the 6 scalars. Let $J_1$, $J_2$, $J_3$ be the three commuting generators of R-symmetry rotating respectively $Z_1$, $Z_2$, $Z_3$. Now the key point is that the operator $Tr(Z_1^2)$ is preserved by half of the supersymmetries: it is a chiral primary operator and this implies via the superconformal algebra a BPS like relation : dimension of $Tr(Z_1^2)$ = R-charge $J_1$ of $Tr(Z_1^2)$. But the R-charge is an integer so remains constant under continous deformations of the theory so the same is true of the dimension of $Tr(Z_1^2)$, i.e. $Tr(Z_1^2)$ has no anomalous dimension. 2) It is not known if $D=4$, $N=4$ SYM is exactly solvable. I would tend to say that it is not. What is likely and (almost?) shown by the work of many people in the last 15 years is the integrability of the large color limit of the theory ('t Hooft limit: gauge coupling going to zero, number of colors going to infinity, product of the gauge coupling by the number of color constant) in the planar approximation, which via AdS/CFT should be the same as type IIB strings at tree level on AdS. In a very different direction, there are many recent works on the properties of scattering properties of $D=4$, $N=4$, SYM (even at tree level, the scattering amplitudes of a great number of particles are very non-trivial). 3) It is not true that all the anomalous dimensions are zero. I hope it is more or less clear from the details of the second argument I have given in 1) for the vanishing of the beta function. The key point of this proof is that the anomalous dimension vanishes for very special classes of operators, the primary chiral ones, such as $Tr(Z_1^2)$, which preserve enough supersymmetry. It is not true for more general operators, for example for an operator as simple as $Tr(Z_1 \bar{Z}_1)$. When I wrote in 2) about integrability of the large color limit, it is mainly a work about computation of anomalous dimensions, i.e. eigenvalues of the dilatation operator. This is somehow similar to the problem of computing the eigenvalues of an Hamiltonian in a quantum mechanical problem and the notion of integrability is similar: there exists in an infinite family of commuting operators containing the dilatation operator, i.e. there is a big number of symmetries and so a hope for an exact determination of the spectrum.