Non-commutativity of operators ensures that, in general, we can't construct a joint probability distribution over the observables that we model using those operators. For some states and some choices of non-commuting operators we can: for example, the vacuum state and the coherent states of a quantized simple harmonic oscillator generate a Wigner function for position and momentum that is positive everywhere, which can be taken to be a probability distribution. Of course, that possibility falls apart when one considers almost any superposition of coherent states, say. The Wigner function takes negative values in the general case, which makes interpreting it as a probability distribution in the special cases quite tendentious.
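As a rough numerical illustration of the Wigner function point, here is a minimal sketch. It assumes units with hbar = m = omega = 1 and the convention W(x, p) = (1/pi) ∫ conj(psi(x+y)) psi(x-y) exp(2ipy) dy; the helper names `wigner_at`, `coherent` and `cat` are mine, chosen only for this example. It evaluates the Wigner function by brute-force integration for the vacuum and for two superpositions of coherent states centred at x = ±2.

```python
import numpy as np

def wigner_at(psi, x, p, half_width=12.0, n=4001):
    """Approximate W(x, p) = (1/pi) * int conj(psi(x+y)) psi(x-y) exp(2ipy) dy.

    Assumes hbar = 1 and a wavefunction that is negligible beyond half_width.
    """
    y = np.linspace(-half_width, half_width, n)
    dy = y[1] - y[0]
    integrand = np.conj(psi(x + y)) * psi(x - y) * np.exp(2j * p * y)
    return float(np.real(np.sum(integrand)) * dy / np.pi)

def coherent(x0):
    """Position wavefunction of a coherent state centred at x0, zero mean momentum."""
    return lambda x: np.pi ** -0.25 * np.exp(-0.5 * (x - x0) ** 2)

def cat(x0, sign):
    """Normalized superposition coherent(x0) + sign * coherent(-x0), sign = +1 or -1."""
    norm = np.sqrt(2.0 + 2.0 * sign * np.exp(-x0 ** 2))
    plus, minus = coherent(x0), coherent(-x0)
    return lambda x: (plus(x) + sign * minus(x)) / norm

x0 = 2.0  # illustrative separation, not taken from any reference
vacuum = coherent(0.0)

# Smallest value of the vacuum Wigner function on a coarse phase-space grid.
grid = np.linspace(-3.0, 3.0, 13)
vacuum_min = min(wigner_at(vacuum, x, p) for x in grid for p in grid)

# Two superpositions of coherent states, sampled where negativity is expected.
odd_cat_origin = wigner_at(cat(x0, -1), 0.0, 0.0)
even_cat_fringe = wigner_at(cat(x0, +1), 0.0, np.pi / (2 * x0))

print("vacuum minimum on grid:", vacuum_min)
print("odd superposition at the origin:", odd_cat_origin)
print("even superposition at (0, pi/(2*x0)):", even_cat_fringe)
```

With these conventions the vacuum value stays positive across the grid, while both superpositions come out negative at the sampled points (the odd one reaches -1/pi at the origin, because its wavefunction is antisymmetric). A grid check is of course only suggestive of positivity, not a proof of it.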
Conversely, if we have a commutative algebra of operators, we can construct a joint probability distribution over any subset of the observables in any state over the algebra. One could take this property as a somewhat plausible definition of classicality.
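A small finite-dimensional sketch of the commuting case, under assumptions of my own: the observables are Hermitian matrices, and the candidate joint distribution is the naive one, q(a, b) = Tr(rho P_a Q_b), where P_a and Q_b are the spectral projectors. The function names `spectral_projectors` and `candidate_joint` are hypothetical. For a commuting pair this candidate is a genuine probability distribution in whatever state one picks; for the Pauli X and Z matrices it can go negative. This only illustrates the contrast; it is not a substitute for the general statements in the literature.

```python
import numpy as np

def spectral_projectors(op, decimals=9):
    """Map each distinct eigenvalue of a Hermitian matrix to its spectral projector."""
    vals, vecs = np.linalg.eigh(op)
    projs = {}
    for val, vec in zip(np.round(vals, decimals), vecs.T):
        key = float(val)
        projs[key] = projs.get(key, 0) + np.outer(vec, vec.conj())
    return projs

def candidate_joint(rho, a_op, b_op):
    """Naive candidate q(a, b) = Tr(rho P_a Q_b); complex-valued in general."""
    pa, qb = spectral_projectors(a_op), spectral_projectors(b_op)
    return {(a, b): complex(np.trace(rho @ P @ Q))
            for a, P in pa.items() for b, Q in qb.items()}

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Commuting pair on two qubits, with an arbitrary (random) density matrix.
rng = np.random.default_rng(0)
m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho2 = m @ m.conj().T
rho2 /= np.trace(rho2).real
commuting = candidate_joint(rho2, np.kron(Z, I2), np.kron(I2, Z))

# Non-commuting pair on one qubit, with a pure state chosen by hand.
psi = np.array([np.cos(np.pi / 8), -np.sin(np.pi / 8)])
rho1 = np.outer(psi, psi)
noncommuting = candidate_joint(rho1, X, Z)

for label, table in (("commuting", commuting), ("non-commuting", noncommuting)):
    print(label)
    for outcome, value in sorted(table.items()):
        print("  ", outcome, value)
```

In the non-commuting table the entries still sum to one and still reproduce the correct marginals, but the entry for X = +1, Z = -1 is negative for this state, so the table cannot be read as a probability distribution.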
For the technical basis of this, I like best two short papers: John Baez, Letters in Mathematical Physics 13 (1987) 135-136, and Lawrence J. Landau, Physics Letters A, Volume 120, number 2 (1987). Both put remarkably little interpretation in the way of the mathematics, but there is a substantial literature that has tried to get at this relationship in some sort of clear way.
The positive-operator valued measure (POVM) approach is a literature that gives an alternative way into the relationship between non-commutativity and measurement, and that focuses on the relationship between quantum theory and classical probability theory in a way that I find helpful, albeit not conclusive. It is well represented by the book by Paul Busch, Marian Grabowski, and Pekka J. Lahti, Operational Quantum Physics, Springer, 1995. Searching the literature or the arXiv for anything more recent by any of these three authors will give you something enlightening to read. To my taste, Paul Busch is always worth reading.
As far as physicality is concerned, classical physics models measurements as not affecting other measurements, so that joint probability distributions over multiple measurements are possible. In the presence of any finite level of noise (and there always is noise, everywhere: only the thermal component goes away when one is close to absolute zero, while the Lorentz-invariant quantum component is not controllable), the uncontrolled nature of the noise has to be accommodated by our models of our measurements. Quantum theory accommodates the non-trivial effects of joint measurements on each other by introducing non-commutativity of the operators that are used to model the measurements, whereas classical physics accommodates those effects by modeling the measurement apparatus. Contextual models are precisely models that include the measurement apparatus, or the complete experimental apparatus, or in the extreme case the whole universe, not just a putative measured system.
That's somewhat bashed out. Hope someone finds it congenial.
This post has been migrated from (A51.SE)