Sources for the path integral
You can read any standard source, so long as you supplement it with the text below. Here are a few which are good:
- Feynman and Hibbs
- Kleinert (although this is a bit long-winded)
- An appendix to Polchinski's String Theory, Vol. I
- Mandelstam and Yourgrau
Other presentations have major flaws; these are the good ones. I explain the major omission below.
Completing standard presentations
In order for the discussion of the path integral to be complete, one must explain how non-commutativity arises. This is not trivial, because the integration variables in the path integral for bosonic fields or particle paths are ordinary real valued variables, and these quantities cannot be non-commutative themselves.
Non-commutative quantities
The resolution of this non-paradox is that the integrand of the path integral is built from matrix elements of operators, and the integral itself reproduces the matrix multiplication. So it is only when you integrate over all values at intermediate times that you get a noncommutative, order-dependent answer. Importantly, when noncommuting operators appear in the action or in insertions, their order depends on exactly how you discretize them: whether you write the derivative parts as forward differences, backward differences, or centered differences. These ambiguities are extremely important, and they are discussed in only a handful of places (Negele/Orland, Yourgrau/Mandelstam, Feynman/Hibbs, Polchinski, Wikipedia) and hardly anywhere else.
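A minimal numerical sketch of the first point, assuming a one-dimensional free particle with unit mass on a finite position grid (the grid range, spacing, and $\epsilon$ are illustrative choices, and `heat_kernel` is a hypothetical helper). The Euclidean short-time kernel, with the grid spacing folded in as the integration weight, becomes a matrix. Summing over the intermediate position is then a matrix product that reproduces the kernel for twice the time, and the diagonal matrix of positions at one time slice does not commute with it, even though every integration variable is an ordinary number.

```python
import numpy as np

def heat_kernel(x, eps):
    """Free Euclidean kernel <x|exp(-eps p^2/2)|y> = exp(-(x-y)^2/(2 eps)) / sqrt(2 pi eps)."""
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * eps)) / np.sqrt(2 * np.pi * eps)

x = np.linspace(-5.0, 5.0, 501)      # position grid (illustrative range and spacing)
h = x[1] - x[0]                      # integration weight for each intermediate point
eps = 0.1
K = heat_kernel(x, eps) * h          # one time slice as a matrix

# Integrating over the intermediate position is the matrix product K @ K,
# which should match the kernel for time 2*eps away from the grid edges.
inner = np.abs(x) < 2.0
err = np.max(np.abs(K @ K - heat_kernel(x, 2 * eps) * h)[np.ix_(inner, inner)])

# Inserting x at one slice is a diagonal matrix; it does not commute with K,
# so where the insertion sits in time changes the answer.
X = np.diag(x)
commutator = X @ K - K @ X
print(err, np.max(np.abs(commutator)))   # tiny, then clearly nonzero
```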
I will give the classic example of this, which is enough to resolve the general case, assuming you are familiar with simple path integrals like the free particle. Consider the free particle Euclidean action
$$ S= -\int {1\over 2} \dot{x}^2 $$
and consider the evaluation of the noncommuting product $x\dot{x}$. This can be discretized as
$$ x(t) {x(t+\epsilon) - x(t)\over \epsilon} $$
or as
$$ x(t+\epsilon) {x(t+\epsilon) - x(t)\over \epsilon}$$
The first represents $p(t)x(t)$ in this operator order, and the second represents $x(t)p(t)$ in the other operator order, because the operator order is the time order (later times stand to the left). The difference of the second minus the first is
$$ {(x(t+\epsilon) - x(t))^2\over \epsilon} $$
For the fluctuating random-walk paths of the path integral, this quantity has a fluctuating limit as $\epsilon$ goes to zero, which averages to 1 over any finite time interval, since each step $x(t+\epsilon) - x(t)$ of the free random walk has mean square $\epsilon$. This is the Euclidean canonical commutation relation: the difference between the two operator orders is 1.
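A minimal Monte Carlo sketch of this claim, assuming unit mass and Gaussian steps of variance $\epsilon$ (which is what the discretized free action gives); the function name `ordering_difference` and the parameter values are illustrative. It generates one path and averages the two discretized products over the interval $[0, T]$.

```python
import numpy as np

def ordering_difference(T=1.0, eps=1e-5, seed=0):
    """Time average over [0, T] of x(t+eps)*v(t) - x(t)*v(t) along one path.

    v(t) = (x(t+eps) - x(t)) / eps is the forward difference. Steps are
    Gaussian with variance eps, as for the discretized free action (m = 1).
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / eps))
    dx = rng.normal(0.0, np.sqrt(eps), size=n)   # x(t+eps) - x(t) for each slice
    x = np.concatenate(([0.0], np.cumsum(dx)))    # path values x_0, ..., x_n
    v = dx / eps                                   # forward-difference velocity
    px = x[:-1] * v    # x(t) v: the p x order
    xp = x[1:] * v     # x(t+eps) v: the x p order
    return np.mean(xp - px)                        # = sum(dx**2) / T

print(ordering_difference())   # close to 1
```

The time average of each product on its own is a random, path-dependent number; only their difference is pinned to 1.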
For Brownian motion, this relation is the basis of "Ito's lemma": it is not $dX$ but the square of $dX$ that is proportional to $dt$. While $dX$ fluctuates over positive and negative values with no correlation, with a magnitude at any time of approximately $\sqrt{dt}$, $dX^2$ fluctuates over positive values only, with an average size of $dt$ and no correlations. This means that the typical Brownian path is continuous but not differentiable. (Proving continuity requires knowing that large $dX$ fluctuations are exponentially suppressed; continuity fails for Lévy flights, even though the typical $dX$ does scale to 0 with $dt$.)
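A sketch comparing the quadratic variation $\sum dX^2$ with the total variation $\sum |dX|$ of a simulated Brownian path on $[0,T]$, assuming Gaussian increments of variance $dt$; the helper names are hypothetical. The first tends to $T$, the second grows like $1/\sqrt{dt}$ (no finite velocity), and the largest single step shrinks, consistent with continuity. For contrast, increments drawn from a Cauchy distribution with scale $dt$ (a Lévy flight with stability index 1, chosen here only as an illustration) have typical size $dt$, yet the largest jump on $[0,T]$ stays of order one.

```python
import numpy as np

def variations(T=1.0, dt=1e-4, seed=1):
    """Quadratic variation, total variation and largest step of a Brownian path."""
    rng = np.random.default_rng(seed)
    dX = rng.normal(0.0, np.sqrt(dt), size=int(round(T / dt)))
    quadratic = np.sum(dX**2)       # tends to T as dt -> 0
    total = np.sum(np.abs(dX))      # about sqrt(2/pi) * T / sqrt(dt): diverges
    largest = np.max(np.abs(dX))    # shrinks roughly like sqrt(dt * log(1/dt))
    return quadratic, total, largest

def max_levy_jump(T=1.0, dt=1e-4, seed=1):
    """Largest and median step of a Cauchy (Levy, index 1) walk with scale dt."""
    rng = np.random.default_rng(seed)
    dX = dt * rng.standard_cauchy(size=int(round(T / dt)))
    return np.max(np.abs(dX)), np.median(np.abs(dX))   # O(1) jump, median ~ dt

for dt in (1e-2, 1e-4):
    print(dt, variations(dt=dt), max_levy_jump(dt=dt))
```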
Although the discretization defines the order, not all of its properties matter, only which way the time derivative points. You can understand the dependence intuitively as follows: the future position of a random walk is (ever so slightly) correlated with the current (infinite) instantaneous velocity, because if the instantaneous velocity is up, the future value is going to be bigger, and if down, smaller. Because the velocity is infinite, however, this teensy correlation between the future value and the current velocity gives a finite correlator, which turns out to be constant in the continuum limit. Unlike the future value, the past value is completely uncorrelated with the current (forward) velocity, if you generate the random walk in the natural way, going forward in time step by step as a Markov chain.
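An ensemble version of the same picture, assuming paths start at $x=0$ and are built forward as a Markov chain with Gaussian steps of variance $\epsilon$; `velocity_correlators` and its defaults are illustrative. It estimates $\langle x(s)\,v(t)\rangle$, with $v$ the forward difference at $t$, for $s$ one step in the past, at $t$, and one step in the future. Only the future value correlates with $v$, and that correlator equals $\langle (x(t+\epsilon)-x(t))^2\rangle/\epsilon = 1$ for any $\epsilon$.

```python
import numpy as np

def velocity_correlators(t_steps=10, eps=1e-3, n_paths=100_000, seed=2):
    """Ensemble averages <x(s) v(t)> for s = t - eps, t, t + eps.

    Paths start at 0 and grow forward with Gaussian steps of variance eps;
    v(t) is the forward difference (x(t+eps) - x(t)) / eps.
    """
    rng = np.random.default_rng(seed)
    dx = rng.normal(0.0, np.sqrt(eps), size=(n_paths, t_steps + 1))
    x = np.concatenate((np.zeros((n_paths, 1)), np.cumsum(dx, axis=1)), axis=1)
    k = t_steps                          # column index of time t
    v = (x[:, k + 1] - x[:, k]) / eps    # forward velocity at t
    past = np.mean(x[:, k - 1] * v)      # independent of v: about 0
    present = np.mean(x[:, k] * v)       # independent of v: about 0
    future = np.mean(x[:, k + 1] * v)    # picks up <dx^2>/eps = 1
    return past, present, future

print(velocity_correlators())
```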
The operator order in the path integral is the time order, because of the way you slice time to build the path integral. Forward differences are derivatives displaced infinitesimally toward the future; backward differences are displaced slightly toward the past. This is important in the Lagrangian when it involves non-commuting quantities. For example, consider a particle in a magnetic field (in the correct Euclidean continuation):
$$ S = - \int {1\over 2} \dot{x}^2 + i e A(x) \cdot \dot{x} $$
The vector potential is a function of x, and it does not commute with the velocity $\dot{x}$. For this reason, Feynman and Hibbs and Negele and Orland carefully discretize this,
$$ S = - \int {1\over 2} \dot{x}^2 + i e A(x) \cdot \dot{x}_c $$
Where the subscript $c$ indicates an infinitesimal centered difference (the average of the forward and backward differences). In this case, the two orders differ by the commutator $[A,p]$, which is $\nabla\cdot A$, so the order matters except in gauges with $\nabla\cdot A = 0$, such as Coulomb gauge. The correct order is fixed by requiring gauge invariance, so that adding a gradient $\nabla \alpha$ to $A$ does nothing but a local phase rotation by $\alpha(x)$:
$$ ie \int \nabla\alpha \cdot \dot{x}_c \, dt = ie \int {d\over dt} \alpha(x(t)) \, dt $$
Where the centered difference is singled out because only the centered difference obeys the ordinary chain rule. This is familiar from the Heisenberg equation of motion for $H = p^2/2$:
$$ {d\over dt} F(x) = i[H,F] = {i\over 2} [p^2,F] = {i\over 2}\left(p[p,F] + [p,F]p\right) = {1\over 2}\dot{x} F'(x) + {1\over 2} F'(x) \dot{x} $$
Where the derivative is the symmetric average of both orders. This holds for Hamiltonians quadratic in $p$, the ones for which the path integral is most straightforward, and the centered difference is exactly this average of both orders.
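A numerical check of both statements in a one-dimensional toy version, assuming $A = d\alpha/dx$ with the illustrative choice $\alpha = \cosh$, the factor $ie$ dropped, and free random-walk paths with step variance $\epsilon$ (`magnetic_term_orders` is a hypothetical helper). Evaluating $A$ at the start of each step (the $pA$ order) or at its end (the $Ap$ order) gives results differing by $\int \nabla\cdot A\,dt$, here $\int \cosh x\,dt$; only the centered average reproduces $\alpha(x(T)) - \alpha(x(0))$, as the chain rule requires.

```python
import numpy as np

def magnetic_term_orders(alpha, d_alpha, dd_alpha, T=1.0, eps=1e-5, seed=3):
    """Discretizations of the integral of A(x) dx along one path, with A = d_alpha.

    One-dimensional toy version: the factor i e is dropped and div A is dA/dx.
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / eps))
    dx = rng.normal(0.0, np.sqrt(eps), size=n)   # steps of the free random walk
    x = np.concatenate(([0.0], np.cumsum(dx)))
    early = np.sum(d_alpha(x[:-1]) * dx)         # A at the start of each step: p A order
    late = np.sum(d_alpha(x[1:]) * dx)           # A at the end of each step: A p order
    centered = 0.5 * (early + late)              # average of the two orders
    div_a = np.sum(dd_alpha(x[:-1])) * eps       # time integral of dA/dx along the path
    exact = alpha(x[-1]) - alpha(x[0])           # what the chain rule demands
    return early, late, centered, div_a, exact

early, late, centered, div_a, exact = magnetic_term_orders(np.cosh, np.sinh, np.cosh)
print(late - early - div_a, centered - exact, exact - early)
```

In stochastic-calculus language, the start-of-step sum is the Itô integral and the centered one is the Stratonovich integral; the Itô sum misses the exact answer by about $\frac{1}{2}\int \alpha''(x)\,dt$.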
The fact that the chain rule only works for the centered difference means that people who do not understand the ordering ambiguities 100% (almost everybody) have a center fetishism, which leads them to use centered differences all the time.
The centered difference is not appropriate for certain things, like the discretization of the Dirac equation, where it leads to "Fermion doubling". "Wilson Fermions" are a modification of the discretized Dirac action which basically amounts to saying "Don't use centered derivatives, dummy!"
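A small numerical illustration, assuming a free massless fermion on a periodic one-dimensional lattice of 16 sites (the size, spacing $a$, and Wilson parameter $r$ are illustrative, and the helpers are hypothetical). The centered difference has eigenvalue $i\sin(ka)/a$, which vanishes at $k=0$ and again at $k=\pi/a$, so it describes two species; the forward difference vanishes only at $k=0$. The Wilson term $-(ra/2)\nabla^2$ adds $(r/a)(1-\cos ka)$, which is zero at $k=0$ and gives the doubler a mass $2r/a$. Counting zero eigenvalues shows the doubling and its removal.

```python
import numpy as np

def shift(n):
    """Periodic lattice shift: (S f)_j = f_{j+1}, indices mod n."""
    return np.roll(np.eye(n), 1, axis=1)

def zero_modes(matrix, tol=1e-9):
    """Number of (numerically) zero eigenvalues."""
    return int(np.sum(np.abs(np.linalg.eigvals(matrix)) < tol))

def dirac_operators(n=16, a=1.0, r=1.0):
    """Naive (centered) and Wilson 1D lattice Dirac operators, periodic, massless."""
    S, I = shift(n), np.eye(n)
    centered = (S - S.T) / (2 * a)              # (f_{j+1} - f_{j-1}) / (2a)
    wilson = (r / (2 * a)) * (2 * I - S - S.T)  # -(r a / 2) times the lattice Laplacian
    gamma = np.array([[0.0, 1.0], [1.0, 0.0]])  # a 1D gamma matrix (Pauli sigma_x)
    naive = np.kron(gamma, centered)
    return naive, naive + np.kron(np.eye(2), wilson)

naive, wilson_op = dirac_operators()
S = shift(16)
print(zero_modes(S - S.T), zero_modes(S - np.eye(16)))   # centered vs forward: 2 1
print(zero_modes(naive), zero_modes(wilson_op))          # doubled vs single: 4 2
```

With $r=1$ the Wilson operator can be rewritten with projectors $P_\pm = (1\pm\gamma)/2$ as $D_W = P_+\nabla_b - P_-\nabla_f$, where $\nabla_f f_j = (f_{j+1}-f_j)/a$ and $\nabla_b f_j = (f_j - f_{j-1})/a$: each spinor component gets a one-sided difference, and no centered one appears.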
Anyway, the order is important. Any presentation of the path integral which gives the Lagrangian for a particle in a magnetic field without specifying whether the time derivative is a forward, backward, or centered difference is no good. That's most discussions.
A good formalism for path integrals thinks of things on a fine lattice and takes the limit of small lattice spacing at the end. Feynman always secretly thought this way (and often not at all secretly, as in the case above of a particle in a magnetic field), as does everyone else who works with this stuff comfortably. Mathematicians don't like to think this way, because they don't like the idea that the continuum can still hold new surprises in the limit.
The other thing that is hardly ever explained properly (except by Negele/Orland, David John Candlin's original 1956 Nuovo Cimento article, and Berezin) is the Fermionic field path integral. This is a separate discussion; the main point here is to understand sums over Fermionic coherent states.