Actually, the $\partial_\mu$ here is a "total derivative"; otherwise it would not be a total divergence, and we would not be able to discard it as a boundary term during the derivation of the Lagrange equations. Explicitly, the convention is
$$\partial_\mu \mathcal{L} \equiv \frac{\partial \mathcal{L}}{\partial \phi}\, \partial_\mu \phi + \frac{\partial \mathcal{L}}{\partial (\partial_\nu \phi)}\, \partial_\mu \partial_\nu \phi$$
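For concreteness, here is how the chain rule above plays out for a standard example of my own choosing, the free scalar field $\mathcal{L} = \frac{1}{2}\, \partial_\nu \phi \, \partial^\nu \phi - \frac{1}{2}\, m^2 \phi^2$. The two "inner" derivatives are $\frac{\partial \mathcal{L}}{\partial \phi} = -m^2 \phi$ and $\frac{\partial \mathcal{L}}{\partial (\partial_\nu \phi)} = \partial^\nu \phi$, so the convention gives

$$\partial_\mu \mathcal{L} = -m^2 \phi \, \partial_\mu \phi + \partial^\nu \phi \, \partial_\mu \partial_\nu \phi,$$

which is exactly what you get by first substituting $\phi = \phi(x^\mu)$ into $\mathcal{L}$ and then differentiating with respect to $x^\mu$ in the ordinary sense.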
This is a slightly confusing but, in the end, rather practical convention in classical field theory. For instance, the very same derivative appears in the Lagrange equations
$$\partial_\mu \left( \frac{\partial \mathcal{L}}{\partial (\partial_\mu \phi)} \right) - \frac{\partial \mathcal{L}}{\partial \phi} = 0$$
etc. This notation works only because the "real" partial derivative of $\mathcal{L}$, i.e. its explicit dependence on $x^\mu$, is postulated to be zero, so there can be no ambiguity in what is meant by $\partial_\mu \mathcal{L}$.
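As a sanity check that the convention is consistent (this consequence is standard, though not part of the question): substituting the Lagrange equations into the chain rule above gives, on shell,

$$\partial_\mu \mathcal{L} = \partial_\nu \left( \frac{\partial \mathcal{L}}{\partial (\partial_\nu \phi)} \right) \partial_\mu \phi + \frac{\partial \mathcal{L}}{\partial (\partial_\nu \phi)}\, \partial_\mu \partial_\nu \phi = \partial_\nu \left( \frac{\partial \mathcal{L}}{\partial (\partial_\nu \phi)}\, \partial_\mu \phi \right),$$

and since $\partial_\mu \mathcal{L} = \delta^\nu_\mu \, \partial_\nu \mathcal{L}$, this rearranges to

$$\partial_\nu \left( \frac{\partial \mathcal{L}}{\partial (\partial_\nu \phi)}\, \partial_\mu \phi - \delta^\nu_\mu \mathcal{L} \right) = 0,$$

which is conservation of the canonical stress-energy tensor, a statement that only makes sense if $\partial_\mu \mathcal{L}$ is read as the total derivative.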
Some authors prefer notation such as $d/dx^\mu$ or $D/dx^\mu$ to underline the "totalness", but this always feels like notation abuse to me. My opinion is that if anything should change, it is the discussion of the "omitted pullback" $\mathcal{L}(\phi,...) \to \mathcal{L}(\phi(x^\mu),...)$: what $\partial/\partial (\partial_\mu \phi)$ really means in terms of the pullback, and so on.