Measurement appears as a process in classical statistical mechanics. In this formalism, the basic object is not the phase space location, but the probability distribution $\rho$ on the state space. The fundamental equation of motion (the Liouville equation, which follows from Hamilton's equations) states that
$\partial_t \rho + H_p \partial_x \rho - H_x \partial_p \rho = 0 $
This is schematic: when there is more than one degree of freedom, you need to interpret the second and third terms as summed over the degrees of freedom, but this is obvious--- it is just saying that the probability is an abstract fluid, and the fluid of probability on phase space flows along the paths of the Hamiltonian trajectories, without any diffusion.
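Written out for a system with $n$ degrees of freedom, the summed form of the same equation is (the index $i$ and the count $n$ are my labels, not notation used elsewhere in this answer):
$\partial_t \rho + \sum_{i=1}^{n} \left( H_{p_i} \partial_{x_i} \rho - H_{x_i} \partial_{p_i} \rho \right) = 0$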
The Liouville theorem tells you the flow is incompressible, so you conclude that the volume of phase space is preserved, or restated in probability language, that the entropy of any probability distribution is constant in any Hamiltonian system over time.
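A rough numerical check of the incompressibility can be sketched as follows. Everything specific here is my assumption for illustration: the pendulum Hamiltonian $H = p^2/2 - \cos x$, the leapfrog integrator standing in for the exact flow, the starting point, and the step sizes. The code estimates the Jacobian determinant of the time-evolution map by finite differences; incompressible flow means this determinant is 1.

```python
import math

def leapfrog(x, p, dt, steps):
    """Evolve (x, p) under H = p^2/2 - cos(x) with kick-drift-kick leapfrog."""
    for _ in range(steps):
        p -= 0.5 * dt * math.sin(x)   # half kick: dp/dt = -H_x = -sin(x)
        x += dt * p                   # full drift: dx/dt = H_p = p
        p -= 0.5 * dt * math.sin(x)   # second half kick
    return x, p

def jacobian_det(x, p, dt=0.01, steps=500, eps=1e-6):
    """Central-difference estimate of det d(x_t, p_t)/d(x_0, p_0)."""
    xa, pa = leapfrog(x + eps, p, dt, steps)
    xb, pb = leapfrog(x - eps, p, dt, steps)
    xc, pc = leapfrog(x, p + eps, dt, steps)
    xd, pd = leapfrog(x, p - eps, dt, steps)
    dx_dx, dp_dx = (xa - xb) / (2 * eps), (pa - pb) / (2 * eps)
    dx_dp, dp_dp = (xc - xd) / (2 * eps), (pc - pd) / (2 * eps)
    return dx_dx * dp_dp - dx_dp * dp_dx

# Illustrative starting point (an arbitrary choice inside the librating region).
det = jacobian_det(1.0, 0.5)
print(f"Jacobian determinant after t = 5: {det:.8f}")
```

The determinant stays at 1 up to finite-difference error, so a small blob of phase space keeps its area even while it gets sheared into a long thin shape.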
The probability distribution in classical mechanics is the closest analog to the state vector, and its time evolution equation is analogous to the Schrödinger equation. It is clear that it must be exactly linear, because of the hidden-information property of probability. I will state this as the coin principle.
Coin principle: If you have 50% probability of one probability distribution, and 50% probability of another probability distribution, i.e. if $\rho = \rho_1/2 +\rho_2/2$, then you can imagine flipping a coin at the beginning of time, and choosing $\rho_1$ or $\rho_2$ depending on the outcome, and the resulting distribution, if you don't know the outcome of the coin flip, is the same as evolving $\rho$.
The coin principle explains what linearity means--- it means that you can imagine learning a bit of information at the beginning of time, and then time-evolving, or learning the same bit at the end. The process of gaining/losing information is completely independent of the dynamical laws. This is what linearity means: the probability is encoding hidden information, and the principles of probability formalize the idea that probability is describing knowledge of hidden information.
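The coin principle can be checked on a toy system. This is only a sketch under my own assumptions: a hypothetical 3-state system with an arbitrary transition matrix `T` standing in for the dynamical law, and two made-up distributions `rho1` and `rho2`. Exact fractions are used so that the comparison is an exact equality rather than a floating-point approximation.

```python
from fractions import Fraction as F

# Hypothetical linear evolution law: T[i][j] = P(next state i | current state j).
# Each column sums to 1, so total probability is conserved.
T = [
    [F(1, 2), F(0),    F(1, 3)],
    [F(1, 2), F(1, 4), F(1, 3)],
    [F(0),    F(3, 4), F(1, 3)],
]

def evolve(rho, steps=1):
    """Apply the linear evolution law `steps` times."""
    for _ in range(steps):
        rho = [sum(T[i][j] * rho[j] for j in range(3)) for i in range(3)]
    return rho

def mix(a, b):
    """50/50 mixture of two distributions (a fair coin flip whose outcome is unknown)."""
    return [(x + y) / 2 for x, y in zip(a, b)]

rho1 = [F(1), F(0), F(0)]
rho2 = [F(0), F(1, 2), F(1, 2)]

# Evolve the mixture directly.
evolve_mixture = evolve(mix(rho1, rho2), steps=5)
# Flip the coin at the beginning, evolve each branch, then forget the outcome.
mix_evolved = mix(evolve(rho1, steps=5), evolve(rho2, steps=5))

print(evolve_mixture == mix_evolved)
```

The two orders agree exactly for any linear law, which is all the coin principle asks for.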
There are two entropy non-conserving unphysical processes which can happen to the abstract probability distribution: coarse graining and measurement. Both of these lie outside the dynamical laws of motion in the naive formulation.
Coarse graining happens when the initial distribution gets spaghettified into densely covering a broader volume with lower density, and at some point, it is just sensible to switch over to describing the system with a higher entropy probability distribution, since the only loss of information is in practically uncomputable fine-details of the positions and momenta. In typical systems described by real numbers, if there is exponential separation of trajectories along some directions (and necessarily exponential contraction along other directions), then the coarse graining is natural when you place some practical cut-off for the precision of your positions and momenta.
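Here is a toy sketch of this spaghettification, under assumptions of my own choosing: the Arnold cat map on a discrete $64 \times 64$ torus plays the role of the dynamics (it is invertible, stretching one direction and contracting the other), and the practical precision cut-off is modelled by lumping the fine cells into $8 \times 8$ blocks. The names `N`, `BLOCK` and `cat_map` are just labels for this example.

```python
import math
from collections import Counter

N = 64      # fine grid is N x N cells (assumed resolution)
BLOCK = 8   # a coarse cell is BLOCK x BLOCK fine cells (assumed precision cut-off)

def cat_map(x, y):
    """Arnold cat map on the discrete torus; the matrix [[2, 1], [1, 1]] has determinant 1."""
    return (2 * x + y) % N, (x + y) % N

def entropy(counts, total):
    """Shannon entropy (in nats) of the empirical distribution given by `counts`."""
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Start with uniform probability on one coarse cell: a compact blob of 64 fine cells.
points = [(x, y) for x in range(BLOCK) for y in range(BLOCK)]

fine, coarse = [], []
for step in range(6):
    fine.append(entropy(Counter(points), len(points)))
    coarse_cells = Counter((x // BLOCK, y // BLOCK) for x, y in points)
    coarse.append(entropy(coarse_cells, len(points)))
    points = [cat_map(x, y) for x, y in points]

for step, (f, c) in enumerate(zip(fine, coarse)):
    print(f"step {step}: fine entropy {f:.4f}, coarse-grained entropy {c:.4f}")
```

The fine-grained entropy never changes, because the map only permutes the cells, while the coarse-grained entropy starts at zero and grows as the blob gets stretched across many blocks. One caveat of the toy: a map on a finite grid is periodic, so the growth only holds for the early steps shown here, not forever.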
The coarse-graining process is identified as the source of entropy increase in statistical mechanics. The increase is an experimentally observed effect, but it is superficially contradicted by the conservation of entropy under Hamiltonian evolution. In the 19th century, this was the source of philosophical and interpretational squabbles about statistical mechanics and the law of entropy increase.
The statistical measurement process is that you, as an observer, can learn something about the phase space location of the system. In this case, you use your knowledge to reduce the probability distribution to a smaller volume, reducing the entropy of the system. This second process, associated with observation, also has no direct analog in the equations of motion; it is instead included in the abstract considerations of the meaning of probability.
If you model the observer as an external system doing computation according to physical law, with certain internal variables which encode the information of the observer, and also interacting with the given system in a Hamiltonian way, the overall conservation of entropy means that in order to gain a bit of information about the system, the observer needs to produce at least log 2 of entropy (in units of k) somewhere else, usually by dumping kT log 2 of heat into a thermal environment at temperature T. Then the collapse of the observed system is a two-step process: first you classically entangle (i.e. correlate) the bit-value part of the state with the observer, and then the observer stores this bit value, reducing the entropy of the system (by learning about it) and increasing the entropy of the environment commensurately.
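To get a feeling for the size of kT log 2, here is the arithmetic as a short sketch. The temperature of 300 K is my assumption for "room temperature"; the Boltzmann constant is the exact SI value.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the SI since 2019)

def heat_per_bit(temperature_kelvin, bits=1.0):
    """Minimum heat k*T*log(2) dumped into a bath at temperature T per bit learned."""
    return K_B * temperature_kelvin * math.log(2) * bits

# Assumed room temperature of 300 K.
heat = heat_per_bit(300.0)
print(f"kT log 2 at 300 K: {heat:.3e} J")
```

This comes out a little under $3 \times 10^{-21}$ joules per bit, which is why the entropy cost of observation is completely invisible in everyday measurements.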
The ability to reinterpret the "collapse" of the probability distribution of the system as this kind of physical process of correlation is widely agreed upon to remove the philosophical problems in the interpretation of statistical mechanics, if there ever was such a problem. The process of an observer learning about the probability distribution of the world can be modelled in some Bayesian probabilistic fashion, and in this way, you get a sensible interpretation of classical statistical mechanics.
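A minimal sketch of such a Bayesian reduction, with everything in it invented for illustration: phase space is chopped into four hypothetical cells `A` to `D`, the prior is uniform, and the observer has a perfect detector that answers whether the system is in the left half (cells `A` and `B`).

```python
import math

def entropy_bits(dist):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def bayes_update(prior, likelihood):
    """Posterior proportional to prior times likelihood of the observed reading."""
    unnormalized = {cell: prior[cell] * likelihood[cell] for cell in prior}
    total = sum(unnormalized.values())
    return {cell: weight / total for cell, weight in unnormalized.items()}

# Uniform prior over four hypothetical phase-space cells.
prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
# P(detector reads "left" | cell), for an assumed perfect left/right detector.
likelihood_left = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0}

posterior = bayes_update(prior, likelihood_left)

print("entropy before:", entropy_bits(prior), "bits")
print("entropy after: ", entropy_bits(posterior), "bits")
```

Learning one bit shrinks the distribution to half the volume and lowers its entropy by one bit, with no change to the dynamics at all.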
If you try to do the same thing with quantum mechanics, you run into the issue that the reduction of information is only philosophically trouble-free when it obeys the coin principle, when it is probability. The quantum amplitudes are formally different from probability: they can be positive or negative (in general, complex), so that they are not naively interpretable as ignorance of hidden variables. Amplitudes don't obey the coin-flip principle--- the mixed state produced by a coin flip is different from the coherent superposition state in quantum mechanics. This means that you can't interpret a small quantum system as a probability evolution over the same state space; it's just something different from that.
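The difference between a coin-flip mixture and a coherent superposition can be made concrete with a single spin. This is a sketch using $2 \times 2$ density matrices written as plain lists; the helper names `outer`, `mix` and `prob` are mine.

```python
import math

def outer(psi):
    """Density matrix |psi><psi| of a pure state."""
    return [[a * b.conjugate() for b in psi] for a in psi]

def mix(rho_a, rho_b):
    """50/50 classical mixture of two density matrices (a coin flip)."""
    return [[(x + y) / 2 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(rho_a, rho_b)]

def prob(rho, psi):
    """Probability <psi|rho|psi> of finding the state psi."""
    return sum(psi[i].conjugate() * rho[i][j] * psi[j]
               for i in range(2) for j in range(2)).real

s = 1 / math.sqrt(2)
up, down = [1 + 0j, 0j], [0j, 1 + 0j]
plus_x = [s + 0j, s + 0j]            # equal superposition of up and down

rho_superposition = outer(plus_x)            # coherent superposition
rho_coin = mix(outer(up), outer(down))       # coin flip between up and down

# Measuring along z cannot tell the two apart...
print(prob(rho_superposition, up), prob(rho_coin, up))
# ...but measuring along x can.
print(prob(rho_superposition, plus_x), prob(rho_coin, plus_x))
```

Along z both states give 50/50, but along x the superposition gives spin-up with certainty while the coin-flip mixture still gives 50/50, so the superposition is not ignorance about which of the two z states you have.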
But when you look at a large quantum system, like a person, you notice that the superpositions turn into standard probabilities, meaning that once you make a measurement, and learn something, the result does obey the coin-flip principle. That is, the result of measuring the spin of an electron in an equal superposition of spin-states is, after measurement, exactly indistinguishable from flipping a coin and considering an electron in a definite spin-state aligned with the device measurement axis.
So somehow quantum mechanics turns into classical probability for large systems. This is the measurement problem in this formulation--- how does a quantum formalism include a classical probabilistic one?
There is a minor physics problem here, which is to make sure that the result is consistent, that you won't observe interference effects for macroscopic objects. This is decoherence, and everyone agrees that macroscopic systems don't have measurable coherence effects in any practical way which can be used to refute the statistical interpretation.
A possible resolution to this is to consider that the statistical aspects simply come from the infinite system limit, meaning that the reduction is asymptotic. When systems are very large, and the decoherence is more and more perfect, you approach in a limiting way classical probability, so you can do probabilistic reductions, and you are doing them on the asymptotic limit of infinite size and infinite time. Because the interference effects are hard to observe even with moderate size, you can apply classical probability in even the tiniest classical realm, and do reductions with squares of wavefunction values replaced by probabilities without any philosophical problem.
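The asymptotic character of the reduction shows up already in a crude toy model, which I am assuming purely for illustration: a spin in an equal superposition gets recorded by $N$ environment spins, each of which ends up in one of two states with overlap $\cos\theta$ depending on the system spin. The reduced density matrix of the system then has off-diagonal entries $\frac{1}{2}\cos^N\theta$. The angle 0.3 is arbitrary.

```python
import math

def reduced_density_matrix(n_env, theta):
    """System state after n_env environment spins each record it with overlap cos(theta)."""
    overlap = math.cos(theta) ** n_env   # overlap of the two total environment states
    return [[0.5, 0.5 * overlap],
            [0.5 * overlap, 0.5]]

def prob_plus_x(rho):
    """Probability of spin-up along x; equals 1/2 for a pure coin-flip mixture."""
    return 0.5 * (rho[0][0] + rho[0][1] + rho[1][0] + rho[1][1])

THETA = 0.3  # assumed (arbitrary) record-keeping angle per environment spin

for n_env in (0, 1, 10, 100, 1000):
    rho = reduced_density_matrix(n_env, THETA)
    print(f"N = {n_env:4d}: coherence {rho[0][1]:.3e}, P(+x) = {prob_plus_x(rho):.6f}")
```

The coherence falls off exponentially in $N$, so the x-measurement statistics become indistinguishable from a coin flip very quickly, but the off-diagonal term is never exactly zero at any finite $N$.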
A slogan for this is to say "our knowledge is defined asymptotically", meaning that the information about our world is inherently an asymptotic object defined in the limit of infinite size quantum systems.
This, in my opinion, is a fully consistent philosophical point of view, and it is either Copenhagen, or Many-Worlds, or Shut Up and Calculate, depending on the philosophical words you associate to it. The only issue for me personally is that it involves an infinite size limit, to get rid of the coherence effects. It is annoying to think that all our information about the world is somehow asymptotic, that it doesn't make sense at any finite size. If we learn some information, like say "The exchange rate for Euros to Dollars is 1:1.1", this information reduction is not consistent, because it doesn't obey the coin-flip principle exactly, meaning that the wavefunction is not interpretable as ignorance of hidden variables. So learning about the exchange rate is not reducing wavefunction information, it is only reducing a classical probability.
What this means, to see how unphysical it is, is that you could (in principle) later have a quantum macroscopic interference effect, where another world with a different exchange rate takes the place of this world, interfering away the current world with an equal negative amplitude contribution. Of course, after this happens, the whole previous world is not there, so we would never know, so it is not clear that it is even meaningful to speak about these enormous decoherence events. This type of nonsense can't happen in classical probability: there is no interfering away of possibilities with a positive probability.
This is the philosophical puzzle in quantum mechanics--- you are treating a quantity which is not a probability, the wavefunction, as if it were encoding a probability, and the result is only consistent in the strict infinite system limit. It is disconcerting, and it suggests that there is a real philosophical consistency issue with quantum mechanics.
An alternate resolution to this can be that the reduction is not at the infinite system limit, at least not at strict infinite size. One can insist that the real laws of nature obey the coin-flip principle, that they should be considered as revealing values of hidden variables. If one takes this position, and asserts that the probability description is correct, and not asymptotically emergent, then one is led to consider that quantum mechanics is just a strange and convoluted way of describing a weird kind of probability distribution when the system is large.
I don't know if this works, but I don't think it's implausible, because you can see many situations where a probabilistic evolution looks approximately reversible, and in this case, it is conceivable that the evolution can be a rough poor-man's quantum mechanics. If the number of hidden variables is not obscenely large, such a thing is a real new theory, because it necessarily must fail to reproduce quantum mechanics exactly, because it can't reproduce the effective exponentially large search in quantum factoring. If the number of hidden variables is infinite, it's philosophy again, because Bohm's theory, with its absurdly enormous size, reproduces quantum mechanics exactly, although the method it uses is kind of ridiculous on physical grounds.