Here is a more direct way of seeing it. Instead of working with the Heaviside function, consider the function
$$\frac{1}{2}\,\left(1+\tanh{\frac{x}{\epsilon}}\right)~.$$
Now, this is a nice, smooth function for any finite $\epsilon > 0$. But if you start to make $\epsilon$ smaller (try plotting it for values like $\epsilon=1$ and $\epsilon=0.1$) you'll see that it looks more and more like the Heaviside step function. Taking the derivative with respect to $x$ gives
$$\frac{1}{2\epsilon}\,\text{sech}^{2}\frac{x}{\epsilon}~,$$
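which, as expected, looks more and more like the delta function for smaller and smaller values of $\epsilon$: the peak at $x=0$ has height $1/(2\epsilon)$ and a width of order $\epsilon$, while the area under the curve is exactly 1 for every $\epsilon > 0$. In fact, these functions give nice representations of the step and delta functions, respectively, in the $\epsilon \to 0$ limit.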
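If you would rather check this numerically than plot it, here is a minimal Python sketch using only the standard library. The names `smooth_step`, `smooth_delta`, and `integrate`, the integration window $[-20, 20]$, and the sample values of $\epsilon$ are my own choices for illustration, not anything canonical. It uses $\text{sech}^{2} u = 1 - \tanh^{2} u$ so that large arguments do not overflow.

```python
import math

def smooth_step(x, eps):
    """Smoothed step function: (1 + tanh(x/eps)) / 2."""
    return 0.5 * (1.0 + math.tanh(x / eps))

def smooth_delta(x, eps):
    """Derivative of smooth_step in x: sech^2(x/eps) / (2*eps).

    Written as 1 - tanh^2 to avoid overflow in cosh for large |x/eps|.
    """
    t = math.tanh(x / eps)
    return (1.0 - t * t) / (2.0 * eps)

def integrate(f, a, b, n=40_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Illustrative values of eps; any small positive numbers would do.
for eps in (1.0, 0.1, 0.01):
    area = integrate(lambda x: smooth_delta(x, eps), -20.0, 20.0)
    print(f"eps={eps}: area={area:.6f}, peak={smooth_delta(0.0, eps):.2f}, "
          f"step(-0.5)={smooth_step(-0.5, eps):.6f}, "
          f"step(+0.5)={smooth_step(0.5, eps):.6f}")
```

The area should stay at 1 while the peak grows like $1/(2\epsilon)$, and the values of the smoothed step at $x=\pm 0.5$ should approach 0 and 1.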
This representation of the step function has a "jump" of 1 at $x=0$, so to get a jump of $\alpha$ you could start with
$$\frac{\alpha}{2}\,\left(1+\tanh{\frac{x}{\epsilon}}\right)~.$$
Taking the derivative with respect to $x$ explains why you get $\alpha\,\delta(x)$.
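Spelled out, the derivative is just $\alpha$ times the earlier result,
$$\frac{d}{dx}\left[\frac{\alpha}{2}\,\left(1+\tanh{\frac{x}{\epsilon}}\right)\right] = \frac{\alpha}{2\epsilon}\,\text{sech}^{2}\frac{x}{\epsilon}~,$$
and the area under it equals the size of the jump for every $\epsilon > 0$:
$$\int_{-\infty}^{\infty}\frac{\alpha}{2\epsilon}\,\text{sech}^{2}\frac{x}{\epsilon}\,dx = \frac{\alpha}{2}\,\Big[\tanh{\frac{x}{\epsilon}}\Big]_{-\infty}^{\infty} = \alpha~.$$
So as $\epsilon \to 0$ the derivative becomes a spike at $x=0$ carrying total weight $\alpha$, which is what $\alpha\,\delta(x)$ means.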
This post imported from StackExchange Mathematics at 2014-06-09 19:12 (UCT), posted by SE-user Robert McNees