The argument from the 1975 Polyakov paper seems to hinge on the following:
These instantons are locally metastable, meaning a local perturbation won't necessarily destroy them. That's what the $\delta H = 0$ condition is ensuring. One can also show (through some work I won't go through) his statements that:
- The radius of these instanton configurations remains roughly constant as the temperature or scale of the system is varied; this means they stay a finite size
- Their average distance between instantons $r_{av} \sim a e^{E/T}$ (a being the lattice spacing) might increase with lowering temperature, but is always finite for a finite temperature. This can be thought of as an entropy effect; so long as you're at a finite temperature, these metastable configurations will exist.
The argument is thus: any long-wavelength correlation function of spins loses its correlations between the spin at x=0 and the spin roughly at $x \sim r_{av}$ since, well, the ferromagnetic correlations at short-distance scales and moderate-distance scales are there, but on average you'll get the correlations to disappear because you'll reach these other configurations. Therefore, your correlation function will look like
$\langle S_{x=0} S_{x=r}\rangle \sim \begin{cases} G_{ferro}(r) & r \ll r_{av} \\ e^{-mr} & r \gg r_{av} \end{cases}$
where $m$ is just some constant. It's likely there are corrections to this form. The exponential decay is still nonzero essentially because there's a small probability you haven't 'encountered' these metastable configurations in the equilibrium system yet even at these large distances. One can in fact interpolate between these two points by introducing a length scale factor that suppresses the exponential form at small $r$: $\langle S_{x=0} S_{x=r}\rangle \sim G_{ferro}(r) e^{-\frac{mr}{r_{av}}}$. The latter factor is basically 1 for $mr \ll r_{av}$, so it reproduces the above form.
The point (1) above, that these instantons remain a fixed radius, means that if you were to do an RG analysis, once you get to the point where you block-average over length scales of $r_{av}$, you will encounter new terms corresponding to these instantons, which will also lead you to this conclusion of a gapped state. I haven't done the analysis, so I can't comment on the exact form this would take. (I suspect these instantons will come out as looking like delta-functions as T goes to 0, since $R_{instanton}/r_{av}$ goes to 0 exponentially)
Something to keep in mind are that there can also be domain wall defects (line defects). I believe this is discussed in Chaikin & Lubensky's condensed matter physics book. The argument is also one of entropy; as the system gets larger and larger, the likelihood that there exists a domain wall somewhere in the system increases to 1 for finite temperatures. Other sorts of defects in general can occur as well.
I can't comment on the O(4) model case, sorry.
EDIT 4/6/2022: I can now comment on the O(4) model case, at least classically, and hazard my guess for how it affects the problem in QM/QFT. As discussed in Chaikin & Lubensky, Principles of Condensed Matter Physics Ch.9, if you have an O(N) spin model (order parameter is therefore degenerate on a sphere $S_{N-1}$ on a d-dimensional system, then for topological defects to be (meta)stable to infinitesimal deformations, you have restrictions on the dimensional character of the defect.
Define $d' = d - d_s$, where $d$ is the dimension of the space (= 2 here) and $d_2$ is the dimensionality of the 'core' of the defect. For 2d vortices, the core is a point (0d); for vortices in a 3D system, you may have the core be a line ($d_s = 1$). A defect must have $n <= d'$, where $n$ is N above, aka the symmetry of the order parameter characterizing the state. If $N>d'$, then one can use the extra dimensionality of the order parameter to continuously deform the spins such that you can get back to the uniform/regular configuration.
QM fluctuations add another dimension, so I think the 2-dimensional space of the XY model can be extended to be considered 3-dimensional for the purposes on this context. Therefore the vortices are topologically stable for the O(3) model, whereas in the O(4) model they wouldn't be. In the context of this question/answer, this means that configurations with topological defects will not show up as local minima of the action if you have an O(4) model on a 2-dimensional lattice. However, that does not mean there is or isn't a phase transition. It just means the same argument precluding the O(3) model having a phase transition does not apply here, but does not guarantee one way or the other that the O(4) model has a transition.