This question is about the renormalization procedure applied to classical electrodynamics. In classical electrodynamics, renormalization is, perhaps surprisingly, more difficult and less consistent than in quantum electrodynamics.
The main difficulty is that the classical field contribution to the mass of an electron, cut off at radius R, goes as $e^2/R$, which is linearly divergent. The classical field-mass becomes equal to the electron's mass when R equals the classical electron radius $e^2/m_e$. Making the classical electron smaller than this leads to a negative classical bare mass, and the unphysical limit of a bare pointlike electron produces negative-mass inconsistencies as a by-product.
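For concreteness, in Gaussian units with $c=1$, the field energy outside a cutoff radius $R$, and the radius at which it equals the electron's rest mass, are

$$E_{\text{self}}(R) \sim \frac{e^2}{R}, \qquad \frac{e^2}{r_e} = m_e \;\Longrightarrow\; r_e = \frac{e^2}{m_e} \approx 2.8\times 10^{-13}\ \text{cm},$$

up to an $O(1)$ coefficient that depends on how the charge is smeared over the cutoff radius.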
The basic inconsistency is that a negative-mass bare classical electron can accelerate to very high velocities, acquiring a larger and larger negative kinetic energy while radiating positive energy into the electromagnetic field, keeping the total energy fixed. These are the self-accelerating, exponentially blowing-up solutions that come from naively integrating the third-derivative equation of motion with no special constraints on the motion.
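For concreteness, the third-derivative equation in question is the Abraham-Lorentz equation (nonrelativistic form, Gaussian units),

$$m\dot{\vec v} = \vec F_{\text{ext}} + m\tau\, \ddot{\vec v}, \qquad \tau = \frac{2}{3}\frac{e^2}{m c^3},$$

and with $\vec F_{\text{ext}}=0$ it admits, besides constant-velocity motion, runaway solutions with $\dot{\vec v} \propto e^{t/\tau}$, blowing up on the time scale $\tau$, which is of order the classical electron radius divided by $c$.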
Dirac's attempted solution was to reject the self-accelerating solutions by a teleological constraint: one demands that the solution of the third-order equation be well behaved asymptotically in the far future. This more or less produces physical behavior at ordinary time and distance scales.
As you noticed in the body of your question, this is also automatically what happens when you treat the third-derivative radiation-reaction term perturbatively, because the perturbation series starts from solutions of the second-order Newtonian equations, and it can be arranged order by order to avoid the exponentially blowing-up solutions, as sketched below. This is why the perturbative description hides the fundamental inconsistency of the classical theory.
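A minimal sketch of how the perturbative (reduction-of-order) treatment avoids the runaways: to first order in $\tau$ one substitutes the zeroth-order equation $m\ddot{\vec v} \approx \dot{\vec F}_{\text{ext}}$ back into the radiation-reaction term,

$$m\dot{\vec v} = \vec F_{\text{ext}} + m\tau\,\ddot{\vec v} \;\longrightarrow\; m\dot{\vec v} \approx \vec F_{\text{ext}} + \tau\,\dot{\vec F}_{\text{ext}} + O(\tau^2),$$

which is (the nonrelativistic version of) the Landau-Lifshitz form; the equation stays second order at every order in $\tau$, so the $e^{t/\tau}$ runaways never appear.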
The Dirac approach, rejecting the self-accelerating solutions, gives physical motion, more or less, but it produces non-causal behavior: there is pre-acceleration of the electron in the Dirac theory. This means that if a classical electromagnetic step-function wave is about to hit an electron, the electron responds a little bit before the wave arrives; the acausal response decays exponentially in time, with a time scale set by the classical electron radius (divided by c). This is why the classical renormalization program ultimately fails at the classical electron radius: you simply need structure there. The electron is just being artificially called a point in the Dirac approach; the pre-acceleration reveals that it is really extended, and the scale for new structure demanded by classical physics is, as always, the classical electron radius.
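Explicitly, in the nonrelativistic limit the non-runaway solution of the Abraham-Lorentz equation can be written as

$$\dot{\vec v}(t) = \frac{1}{m}\int_0^\infty e^{-s}\,\vec F_{\text{ext}}(t+\tau s)\,ds,$$

so the acceleration at time $t$ samples the force at the later times $t+\tau s$, weighted by $e^{-s}$: this is the pre-acceleration, exponentially suppressed beyond the time scale $\tau \sim r_e/c$.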
But for quantum electrodynamics, the classical analogy is misleading, at least for small values of the fine-structure constant. The first miracle of quantum electrodynamics is that the self-field of the electron, despite the classical intuition that it should diverge linearly, is only logarithmically divergent. This was first demonstrated in the old perturbation-theory days by Weisskopf, but it is obvious today in relativistic perturbation theory: the contribution of the self-field of the electron comes from a diagram in which an electron line emits a photon that is reabsorbed on the same line, and the short-distance divergence comes from the region where the proper travel-time of the electron is short. This diagram, when regulated, diverges only as the log of the cutoff, the same as every other divergent one-loop diagram in QED. The modern covariant methods make the result too easy; they end up hiding an important physical difference between quantum and classical electromagnetism.
What is physically softening the classical linear divergence? The main reason, explained by Weisskopf, and made completely obvious in the formalism of Stueckelberg, Feynman, and Schwinger, is that the intermediate states between photon emission and absorption at short times involve a sum over both electron and positron states together, because for short propagation times you shouldn't separate relativistic electron states from positron states. The contribution calculated artificially using only electron intermediate states between emission and absorption is linearly divergent, just as in the classical theory, and of course the same holds for the positron self-mass when truncating to only positron intermediate states. But in the true relativistic field-theory calculation you have to add both contributions up, and the cutoff has to respect relativistic invariance, as in Pauli-Villars regularization, so it must be the same for both positrons and electrons. The contributions of the positrons are opposite in sign to the contributions of the electrons, because the positron field is of opposite sign and cancels the electron field. When the distances become short, the main classical linear divergence is cancelled away, leaving only the relativistically invariant log divergence, which is the short-distance mass correction considered in renormalizing quantum electrodynamics. Notice that this completely ignores the classical issue of the response of the self-field, which is important for distances larger than the Compton wavelength. At those large distances, positron contributions can be naturally, nonrelativistically, separated from electron contributions and contribute negligibly to the field, so the field turns into the classical point field of the electron.
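Schematically, the cancellation changes the degree of divergence of the self-mass: the electron-only (classical-like) contribution grows linearly with the cutoff, while the full relativistic one-loop result, with electron and positron intermediate states summed and a relativistically invariant (e.g. Pauli-Villars) cutoff $\Lambda \sim 1/R$, is only logarithmic,

$$\delta m_{\text{classical}} \sim \frac{e^2}{R} \sim e^2\Lambda, \qquad \delta m_{\text{QED}} = \frac{3\alpha}{4\pi}\, m \ln\frac{\Lambda^2}{m^2} + \text{finite},$$

where the coefficient of the log is the standard one-loop result and the finite part depends on the regularization scheme.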
One physical interpretation is that the electron's charge is not concentrated at a point on any time-slice in quantum electrodynamics: the back-and-forth path in time of the electron means that the charge on any one time slice has both electron and positron intersections, and the charge ends up fractally smeared out over a region comparable to the Compton wavelength, partially smoothing the charge and mollifying the divergence.
The ratio of the classical electron radius, where the classical renormalization program breaks down, to the Compton wavelength $1/m_e$, where relativistic quantum renormalization kicks in, is the fine-structure constant, essentially by definition. The fine-structure constant is small, which means that the linearly divergent corrections to the self-mass are replaced by the log-divergent corrections before they get a chance to make a significant contribution to the self-mass of the electron, or of any other pointlike charged particle (small, but not completely negligible: for pions, because of their Goldstone nature and small mass, the electromagnetic field accounts for the charged-neutral mass splitting, as was understood in the 1960s). So the classical inconsistency is side-stepped for a while, just because the divergence is softened.
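In natural Gaussian units ($\hbar=c=1$), this ratio is just

$$\frac{r_e}{\lambda_C} = \frac{e^2/m_e}{1/m_e} = e^2 = \alpha \approx \frac{1}{137},$$

so the would-be linear growth of the self-energy is cut off by quantum effects a factor of $\sim 137$ in distance before it would become comparable to $m_e$.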
But the problems obviously can't fully go away, because if you were to twiddle the parameters and make the fine-structure constant large, you would make the classical electron radius larger than the Compton wavelength, at which point the classical self-field surrounding an electron, even in the nonrelativistic limit, would carry more energy than the electron itself, and the renormalization procedure would break down, requiring unphysical properties at an ordinary, accessible scale.
But in the case of our universe, with its small fine-structure constant, the log running means that the problems don't kick in until an absurdly high energy scale. Still, they do kick in, at the enormously large Landau-pole energy, where the electron's coupling runs large enough that the fine-structure constant is no longer small. This energy is far beyond the Planck energy, so at that point we expect the field-theory description to have broken down anyway, to be replaced by a proper fundamental gravitational theory, string theory or some variant.
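A rough estimate of where this happens, keeping only the electron in the one-loop running (a sketch; the numbers shift when more charged species are included, but the conclusion doesn't):

$$\alpha(\mu) = \frac{\alpha}{1 - \frac{2\alpha}{3\pi}\ln(\mu/m_e)}, \qquad \Lambda_{\text{Landau}} \sim m_e\, e^{3\pi/2\alpha} \sim 10^{280}\, m_e,$$

hundreds of orders of magnitude beyond the Planck energy, which is only $\sim 10^{22}\, m_e$.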
This part is old and well known, and it explains the disconnect between the modern quantum renormalization program and the old classical renormalization program. The quantum renormalization program works over an enormous range, but because of the technical differences in the divergences, it simply doesn't answer the classical renormalization questions.
But one can go further today and consider different cases, where one can gain some insight into the renormalization program. There is a case where you do have a large value of the fine-structure constant: magnetic monopoles. The magnetic charge is inverse to the electric charge. The magnetic monopoles of the 't Hooft-Polyakov type, in, say, an SU(2) gauge theory Higgsed to U(1) at long distances, can't be inconsistent, since that theory is asymptotically free at high energies. But in this case the monopole is not pointlike; it is an extended field configuration. The soliton size is determined by the details of the Higgs field, and when the Higgs field is physical, there is no renormalization paradox for monopoles either: monopoles in consistent gauge field theories are simply solitons which are bigger than the classical monopole radius. They do have a third-derivative radiation-reaction term in the equation of motion at long scales, but it isn't a paradox, because the pre-acceleration can be understood as the response of parts of the extended monopole to the incoming field, and it isn't acausal. Runaway solutions can't occur, because there is an energy bound for a moving solution: there is never any negative energy density anywhere, unlike in the classically renormalized point-particle theory, so you can't balance a negative mass moving faster and faster (and gaining negative energy) against a field gaining positive energy.
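To quantify "inverse": the Dirac quantization condition, in natural Gaussian units, is $e\,g = n/2$, so the minimal magnetic charge has

$$g = \frac{1}{2e}, \qquad \alpha_{\text{mag}} = g^2 = \frac{1}{4\alpha} \approx 34,$$

i.e. the magnetic analog of the fine-structure constant is large precisely when the electric one is small. For the 't Hooft-Polyakov soliton, up to $O(1)$ factors, the core size is the inverse vector-boson mass $1/m_W$ and the monopole mass is $\sim m_W/\alpha$.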
It is interesting to consider the classical issues in the gravitational context, now that we understand not only the quantum renormalization program but also quantum gravity in the form of string theory, which solves the quantum renormalization problem for good. The classical long-distance version of string theory is general relativity, and in this case the classical charged point-particle analog is an electro-gravitational solution: a black hole of mass m with charge e. In order for the black hole to be sensible, $m \ge e$ (in natural units), and in this case the classical solution is well behaved, but it is not pointlike: it has a radius of order m. The pointlike limit requires taking m to zero and therefore e to zero, and the electromagnetic self-energy $e^2/m$ is not only less than m, it becomes a smaller and smaller fraction of the mass as e gets smaller.
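For reference, the relevant classical solution is the Reissner-Nordström black hole; with $G=c=1$ its horizons sit at

$$r_\pm = m \pm \sqrt{m^2 - e^2},$$

so a black hole without a naked singularity needs $m \ge e$, the horizon radius is of order m, and the extremal case $m=e$ has $r_+ = r_- = m$.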
If instead you choose e large, it seems that the extremal black hole would need a negative "bare mass". But the behavior of a larger-than-extremal black hole is always perfectly causal. The apparent negative mass required here is nonsense; you are just ignoring the negative gravitational field energy. The extremal black hole can be interpreted as the case where the total field energy, electromagnetic plus gravitational, is equal to the total mass.
In string theory, the quantum analogs of the extremal black hole solutions are the fundamental objects: the strings and branes. Here there is an interesting reversal: the classical restriction for black holes, $m\ge e$, is reversed for the special case of the lightest charged particle. The requirement that a black hole be able to decay completely strongly suggests that the lightest charged quantum must obey the opposite inequality, $m\le e$.