ZapperZ pointed out this paper [I'll refer to this as SA] (which also led me to this paper and to this earlier paper). All of these are open access. The notation and language that have become commonplace in quantum information/computation have come to obscure the issues involved, IMO, so I wanted to run what these papers do, reformulated in terms of weak von Neumann measurements, by PhysicsOverflow.
A weak von Neumann measurement associated with a self-adjoint operator $\hat A$ that has spectral decomposition $\hat A=\sum_i a_i\hat P_{\!A_i}$, where $\sum_i\hat P_{\!A_i}=1$, transforms a state $\rho:\hat O\mapsto \mathsf{Tr}[\hat\rho\hat O]$ to the state $$\rho_A:\hat O\mapsto \sum_i\mathsf{Tr}[\hat P_{\!A_i}\hat\rho\hat P_{\!A_i}\hat O]=\sum_i\mathsf{Tr}[\hat\rho\hat P_{\!A_i}\hat O\hat P_{\!A_i}],$$ which discards all terms $\mathsf{Tr}[\hat\rho\hat P_{\!A_i}\hat O\hat P_{\!A_j}]$ for which $i\not=j$. SA refers to such operations as "measure-and-reprepare operations" [SA, last sentence in the first complete paragraph on page 4].
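As a concrete check of this definition, here is a minimal Python sketch for a single qubit. It assumes $\hat A=\sigma_z$ and an arbitrarily chosen (hypothetical) state $\hat\rho$ and observable $\hat O$. It compares the Schrödinger form $\sum_i\hat P_{\!A_i}\hat\rho\hat P_{\!A_i}$ with the Heisenberg form $\sum_i\mathsf{Tr}[\hat\rho\hat P_{\!A_i}\hat O\hat P_{\!A_i}]$, and it computes the discarded $i\not=j$ terms. The helper names are illustrative, and plain nested lists keep it to the standard library.

```python
# Minimal sketch (illustrative, not from SA): a weak von Neumann
# (measure-and-reprepare) measurement of A = sigma_z on one qubit.
# rho and O below are arbitrary hypothetical choices.

def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def measure_and_reprepare(rho, projectors):
    """Schroedinger picture: rho -> sum_i P_i rho P_i."""
    terms = [matmul(matmul(P, rho), P) for P in projectors]
    n = len(rho)
    return [[sum(T[r][c] for T in terms) for c in range(n)] for r in range(n)]

# Spectral projectors of A = sigma_z; they sum to the identity.
P_A = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

rho = [[0.7, 0.3 - 0.2j], [0.3 + 0.2j, 0.3]]  # hypothetical state
O = [[0.5, 1.0], [1.0, -0.5]]                 # hypothetical observable

rho_A = measure_and_reprepare(rho, P_A)

# Tr[rho_A O] (Schroedinger) versus sum_i Tr[rho P_i O P_i] (Heisenberg).
schroedinger = trace(matmul(rho_A, O))
heisenberg = sum(trace(matmul(rho, matmul(matmul(P, O), P))) for P in P_A)

# The i != j terms Tr[rho P_i O P_j] that the operation discards.
discarded = sum(trace(matmul(rho, matmul(matmul(P_A[i], O), P_A[j])))
                for i in range(2) for j in range(2) if i != j)
full = trace(matmul(rho, O))

print(rho_A)
print(schroedinger, heisenberg, discarded, full)
```

For these numbers $\mathsf{Tr}[\hat\rho\hat O]=0.8$. Of that, $0.6$ comes from the discarded cross terms, which leaves $0.2$ after the measure-and-reprepare step.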
SA takes Alice to be able to carry out a weak von Neumann operation while Bob can only carry out a unitary operation, $\rho_U:\hat O\mapsto \mathsf{Tr}[\hat\rho\hat U^\dagger\hat O\hat U]$ [SA, second complete paragraph on page 4], but they would prefer both to be able to carry out a weak von Neumann operation, so I'll work with that (just remove the sum over $j$ and replace $\hat P_{\!B_j}...\hat P_{\!B_j}$ by $\hat U_{\!B}^\dagger...\hat U_{\!B}$ everywhere, if preferred).
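For reference, this is what that substitution gives for the two sequential orderings, with Alice's $\hat A$ first or Bob's $\hat U_{\!B}$ first. The labels $\rho_{AU}$ and $\rho_{UA}$ are introduced here only for illustration:
$$\begin{array}{r c l}
\rho_{AU}:\hat O&\mapsto&\sum_i\mathsf{Tr}[\hat\rho\hat P_{\!A_i}\hat U_{\!B}^\dagger\hat O\hat U_{\!B}\hat P_{\!A_i}],\cr
\rho_{UA}:\hat O&\mapsto&\sum_i\mathsf{Tr}[\hat\rho\hat U_{\!B}^\dagger\hat P_{\!A_i}\hat O\hat P_{\!A_i}\hat U_{\!B}].
\end{array}$$
Each is still a state, since each is the composition of a measure-and-reprepare operation with a unitary one.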
If we perform two weak von Neumann measurements in sequence, we obtain $$\begin{array}{r c l}
\rho_{AB}:\hat O&\mapsto&\sum_{i,j}\mathsf{Tr}[\hat\rho\hat P_{\!A_i}\hat P_{\!B_j}\hat O\hat P_{\!B_j}\hat P_{\!A_i}]\mathrm{\ or}\cr
\rho_{BA}:\hat O&\mapsto&\sum_{i,j}\mathsf{Tr}[\hat\rho\hat P_{\!B_j}\hat P_{\!A_i}\hat O\hat P_{\!A_i}\hat P_{\!B_j}],
\end{array}$$ which are in general different states when $[\hat A,\hat B]\not=0$. Any convex combination $\lambda\rho_{AB}+(1-\lambda)\rho_{BA}$, $0\le\lambda\le1$, is then also a state [cf. SA, Eq. (6)]. The papers above then introduce a third measurement $\hat C$, which is the one actually recorded. Introducing its spectral decomposition $\hat C=\sum_k c_k\hat P_{\!C_k}$, we have the probabilities of the outcomes $c_k$, i.e. the expectation values $$\sum_{i,j}\mathsf{Tr}\!\left[\hat\rho\!\left(\lambda\hat P_{\!A_i}\hat P_{\!B_j}\hat P_{\!C_k}\hat P_{\!B_j}\hat P_{\!A_i}+(1-\lambda)\hat P_{\!B_j}\hat P_{\!A_i}\hat P_{\!C_k}\hat P_{\!A_i}\hat P_{\!B_j}\right)\right]$$ to work with. The task at this point is to construct what they call a "causal witness". This essentially means constructing a set of density matrices $\hat\rho_i$ that verify $0<\lambda<1$, given that $\hat A$ and $\hat B$ are applied as weak von Neumann measurements before the $\hat C$ measurement.
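The mixture formula can be checked numerically on one qubit. The following minimal Python sketch is not SA's causal witness. It assumes, purely for illustration, $\hat A=\sigma_z$, $\hat B=(\sigma_x+\sigma_z)/\sqrt2$, $\hat C=\sigma_x$ and the input state $|0\rangle\langle0|$. It builds $\rho_{AB}$ and $\rho_{BA}$ in the Schrödinger picture and evaluates the $\hat C$ outcome probabilities as a function of $\lambda$. It then shows that, for this one fixed input, a single outcome frequency determines $\lambda$. The helper names (including `estimate_lambda`) are hypothetical.

```python
import math

# Minimal sketch (illustrative assumptions, not SA's construction):
# one qubit, A = sigma_z, B = (sigma_x + sigma_z)/sqrt(2), C = sigma_x,
# input state |0><0|. Matrices are nested lists of complex numbers.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def bloch_projector(n):
    """Projector (1 + n.sigma)/2 for a unit Bloch vector n."""
    nx, ny, nz = n
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

def spectral_projectors(n):
    """Projectors onto the +1 and -1 eigenspaces of n.sigma."""
    return [bloch_projector(n), bloch_projector(tuple(-x for x in n))]

def dephase(rho, projectors):
    """Weak von Neumann (measure-and-reprepare): rho -> sum_i P_i rho P_i."""
    terms = [matmul(matmul(P, rho), P) for P in projectors]
    return [[sum(T[r][c] for T in terms) for c in range(2)] for r in range(2)]

s = 1 / math.sqrt(2)
P_A = spectral_projectors((0.0, 0.0, 1.0))  # A = sigma_z
# Assumption: B at 45 degrees to A. With B = sigma_x (complementary to A),
# both orders would give the maximally mixed state and lambda would be
# invisible in the C statistics.
P_B = spectral_projectors((s, 0.0, s))
P_C = spectral_projectors((1.0, 0.0, 0.0))  # C = sigma_x

rho = [[1, 0], [0, 0]]  # |0><0|

rho_AB = dephase(dephase(rho, P_A), P_B)  # A first, then B
rho_BA = dephase(dephase(rho, P_B), P_A)  # B first, then A

def outcome_probs(state):
    """Tr[state P_Ck] for each outcome k of C."""
    return [trace(matmul(state, P)).real for P in P_C]

p_AB = outcome_probs(rho_AB)
p_BA = outcome_probs(rho_BA)

def mixture_probs(lam):
    """Outcome probabilities for lam*rho_AB + (1 - lam)*rho_BA."""
    return [lam * a + (1 - lam) * b for a, b in zip(p_AB, p_BA)]

def estimate_lambda(p_plus):
    """Hypothetical helper: invert the linear dependence on lambda
    (only meaningful when p_AB[0] != p_BA[0])."""
    return (p_plus - p_BA[0]) / (p_AB[0] - p_BA[0])

print(p_AB, p_BA)
print(mixture_probs(0.3), estimate_lambda(mixture_probs(0.3)[0]))
```

For these choices the probability of the $+1$ outcome of $\hat C$ is $\tfrac12+\tfrac{\lambda}{4}$, interpolating between $p^{BA}_+=\tfrac12$ and $p^{AB}_+=\tfrac34$. With $\hat B=\sigma_x$ instead, both orders fully dephase the qubit, so that $\rho_{AB}=\rho_{BA}=\tfrac12\hat 1$. The choice of $\hat B$ therefore matters, even though $[\hat A,\hat B]\not=0$ in both cases.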
It seems noteworthy that this construction does not need the apparatus of qubits; if we do use qubits, one qubit is enough. I'm not touching on the experimental implementation at all here, but am I missing something in the theoretical construction? I take the measure-and-reprepare operation to be understandable as a logical matrix operation on the density matrix, one that is essentially atemporal and ordered only by position along a beam-line. If this approach fails, is there another approach that does not make such a big deal of causal/temporal order (my ulterior motive; cf. also ZapperZ)?
Update: Quantum optics uses a "beam-line formalism". That is, the order of operations is determined by position along each beam-line, with each beam-line ordered outward from the source of its beam. Quantum optics removes time from the formalism by taking beam-line ordering as a proxy for temporal ordering. If we are to talk about causality, however minimally, it seems that time should be added back into the formalism. If measurements are made in different orders, we might, for example, write $\lambda\rho_{A_1B_2}+(1-\lambda)\rho_{B_1A_2}$ for the compound measurement. This acknowledges that a more complete description of the experiment would treat the two orderings as involving different operators. Such a description goes beyond a small-dimensional beam-line Hilbert space formalism in which $\hat A_1=\hat A_2$ and $\hat B_1=\hat B_2$. In it, a measurement $\hat A_1$ at a time $t_1$, before a measurement $\hat B_2$ at time $t_2$, would be different from a measurement $\hat A_2$ at time $t_2$, after a measurement $\hat B_1$ at time $t_1$. That is, in a more temporally complete description the statistical mixture would be of different pairs of operations rather than of different causal orderings of a single pair. In any case, I see no reason here to doubt microcausality.
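Written out in the same form as $\rho_{AB}$ and $\rho_{BA}$, this mixture makes explicit that each branch involves its own pair of operators. Here $\hat P_{\!A_{1,i}}$, $\hat P_{\!B_{2,j}}$, etc. are notation introduced only for this illustration, denoting the spectral projectors of $\hat A_1$, $\hat B_2$, etc.:
$$\begin{array}{r c l}
\lambda\rho_{A_1B_2}+(1-\lambda)\rho_{B_1A_2}:\hat O&\mapsto&\lambda\sum_{i,j}\mathsf{Tr}[\hat\rho\hat P_{\!A_{1,i}}\hat P_{\!B_{2,j}}\hat O\hat P_{\!B_{2,j}}\hat P_{\!A_{1,i}}]\cr
&&{}+(1-\lambda)\sum_{i,j}\mathsf{Tr}[\hat\rho\hat P_{\!B_{1,j}}\hat P_{\!A_{2,i}}\hat O\hat P_{\!A_{2,i}}\hat P_{\!B_{1,j}}].
\end{array}$$
This reduces to the beam-line expression $\lambda\rho_{AB}+(1-\lambda)\rho_{BA}$ only when $\hat A_1=\hat A_2$ and $\hat B_1=\hat B_2$.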