First you should confront the question: why should I think of the $\delta$-function as a function at all? If you are trying to imagine it as a real-valued function of real inputs, which just happens to be $0$ just about everywhere, then you are off to a bad (but very common) start. You can define $\delta$ as a symbol with certain properties relating to combining it with an actual function and some other symbols (e.g. $\int$), and this really suffices for most purposes, so why insist on trying to cram such an interesting object into a limited definition of "function"?
So instead, let's take a different approach. Let $f : \mathbb{R} \to \mathbb{C}$ be a generic function from the reals to the complexes. Consider the set of all$^1$ such functions, and call it $L$ for lack of a better letter. $L$ is a set just like $\mathbb{R}$, and so we can define maps (read: functions) from it to $\mathbb{C}$ as well. The $\delta$-function is one such beast, defined by
\begin{align}
\delta : L & \to \mathbb{C} \\
f & \mapsto f(0).
\end{align}
Thus it is a function, but not of real numbers. It is a function of functions of reals, which is sometimes called a functional.
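To make the "function of functions" idea concrete, here is a minimal Python sketch in which $\delta$ is an ordinary higher-order function. The names `ComplexFunction`, `plane_wave`, and `shifted_parabola` are illustrative choices of mine, not part of the definition above.

```python
import cmath
from typing import Callable

# An element of L: any function from the reals to the complexes.
ComplexFunction = Callable[[float], complex]


def delta(f: ComplexFunction) -> complex:
    """The delta functional: take a function f and return f(0)."""
    return f(0.0)


# Two illustrative members of L (hypothetical examples).
def plane_wave(x: float) -> complex:
    return cmath.exp(1j * x)


def shifted_parabola(x: float) -> complex:
    return complex((x - 3.0) ** 2)


print(delta(plane_wave))        # the value of e^{ix} at x = 0
print(delta(shifted_parabola))  # the value of (x - 3)^2 at x = 0
```

Note that `delta` never receives a real number as its argument. Its input is an entire function, which is exactly the sense in which it is a functional.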
So what about the integrals? Well, you can also approach this in a limiting fashion. One way is to note that
$$ \lim_{\sigma\to0} \int\limits_\mathbb{R} f(x) \frac{1}{\sqrt{2\pi\sigma^2}} \mathrm{e}^{-x^2/2\sigma^2} \mathrm{d}x = f(0). $$
Exchange the limit and the integral$^2$, and you see that there is a "function" - or rather a limit of a sequence of functions from $L$ that is itself not a member of $L$ - whose values seem to be given by
$$ \delta(x) = \lim_{\sigma\to0} \frac{1}{\sqrt{2\pi\sigma^2}} \mathrm{e}^{-x^2/2\sigma^2}. $$
This is what a distribution is, with terminology suggestive of the probability distributions one so often integrates against (though I could be mistaken on the etymology). Note though that we really weren't allowed to switch that limit and integral while we still called that Gaussian-looking thing a member of $L$. After all, taking the pointwise limit first produces something that vanishes everywhere but a point, and such an object will cause the Lebesgue integral we were using to vanish as well.
In any event, the integral was there from the very beginning. You can think of this as overbearing notation for what we really wanted to say: "Give the value that results when $\delta$ acts on $f$." The integral notation has another advantage, though, and that is in connection with inner product spaces. Secretly, we constructed $L$ to be a vector space over $\mathbb{C}$. Then the set of linear maps from $L$ to $\mathbb{C}$ forms its dual space $L^*$. For every $g \in L$ there is a corresponding $g^* \in L^*$, which can conveniently be represented in this integral notation as the complex conjugate of $g$.$^3$ The inner product of $f$ and $g$ is
$$ \langle f | \underbrace{g}_{g\in L} \rangle = \int\limits_\mathbb{R} f(x) \underbrace{g^*}_{g,g^*\in L}(x) \mathrm{d}x, $$
and so you can identify
\begin{align}
\underbrace{g^*}_{g^*\in L^*} : L & \to \mathbb{C} \\
f & \mapsto \int\limits_\mathbb{R} f\underbrace{g^*}_{g,g^*\in L}.
\end{align}
Now for every $g \in L$ there is a corresponding dual member that you can write as the complex conjugate of $g$ for the purposes of such integration, but the converse is not true.$^4$ $\delta$ is an example of a member of $L^*$ that has no actual function in $L$ we can complex conjugate and integrate against to replicate its behavior.
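The map $g \mapsto g^*$ can be sketched numerically as a function that takes $g$ and returns a functional. This is a rough illustration only: `integrate` and `dual_of` are hypothetical helpers of mine, the integral over $\mathbb{R}$ is truncated to $[-10, 10]$, and the sample $f$ and $g$ are Gaussian-damped so that the truncation is harmless.

```python
import cmath
import math


def integrate(h, a: float = -10.0, b: float = 10.0, n: int = 20000) -> complex:
    """Midpoint-rule estimate of the integral of h over [a, b]."""
    step = (b - a) / n
    total = 0j
    for k in range(n):
        total += h(a + (k + 0.5) * step)
    return total * step


def dual_of(g):
    """Return the functional g* in L* induced by g in L: f -> integral of f * conj(g)."""
    def g_star(f) -> complex:
        return integrate(lambda x: f(x) * complex(g(x)).conjugate())
    return g_star


# Illustrative square-integrable functions (assumptions, not from the text).
def f(x: float) -> complex:
    return complex(math.exp(-x * x / 2.0))


def g(x: float) -> complex:
    return math.exp(-x * x / 2.0) * cmath.exp(1j * x)


g_star = dual_of(g)
print(g_star(f))                               # numerical value of <f|g>
print(math.sqrt(math.pi) * math.exp(-0.25))    # closed form for this f and g
```

Every functional built by `dual_of` comes from some honest function `g`. The point of the paragraph above is that `lambda f: f(0)` is a perfectly good linear functional that no choice of `g` passed to `dual_of` will reproduce.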
$^1$ In practice this is often too much. It's better to restrict attention to, e.g., all square-integrable functions from $\mathbb{R}$ to $\mathbb{C}$.
$^2$ Beware! A very dangerous thing to do!
$^3$ Yes, we are about to thoroughly abuse the two meanings of $*$ - be on the lookout.
$^4$ It won't be in general unless $L$ is finite-dimensional, but in that case you have Kronecker deltas and finite sums rather than Dirac deltas and integrals.
This post imported from StackExchange Mathematics at 2014-06-16 11:24 (UCT), posted by SE-user Chris White