I find it helpful to think of QFT as a signal analysis formalism: we have to think about how to model both creating and measuring physical states, which are physically realized as signals. If you think about CERN, or indeed about any experiment, everything comes down to signals that are stored and subsequently analyzed, perhaps only a few or perhaps many millions of them. Much of the signal analysis is ad hoc, as when an electronic circuit decides when an event happens and records the time it happened instead of recording the signal in detail attosecond by attosecond, but the QFT part of the signal analysis is much more idealized and systematic. For an elementary starting point, think about the vacuum state, which maps an operator $\hat A$ to a complex number, which we can write as $\omega(\hat A)=\langle 0|\hat A|0\rangle$ (other states are possible as starting points, particularly thermal states, but not in this answer). All the operators $\hat A$ are generated by quantum field operators $\hat\phi_f$, by multiplying and adding operators together.
The $f$'s are called "test functions", but in signal analysis they would be called "window functions", which is useful insofar as they describe what "window" we're looking through when we record measurement results; we'll see below that we can also usefully call them "modulation functions", because they also describe modulations of the vacuum state. The "usual" quantum fields, which are indeed operator-valued distributions, can be obtained by taking a test function that we can loosely think of as zero except at a point $y$, say, where it's infinite, which we write as the Dirac delta function (in four dimensions), $\delta_y(x)=\delta^4(x-y)$, so we can write $\hat\phi(y)=\hat\phi_{\delta_y}$ (and we can do something similar for the Fourier transform). This object has to be used with considerable care, however, because products such as $\hat\phi(y)\hat\phi(y)$ are not well-defined. There are various strategies to accommodate this: we can take the quantum field to be a generalized function of some other kind instead of a distribution, or, if we do flagrantly use such products, we have to adjust the results after the event by subtracting infinite numbers as necessary.
The vacuum state provides a probability distribution for any self-adjoint operator $\hat A$ (that is, any operator for which $\hat A^\dagger=\hat A$, which I'll assume you know of from QM). For free bosonic quantum fields, for which we have a good mathematical definition (the other elementary example being free fermion quantum fields), the characteristic function for $\hat\phi_f$,
$$\langle 0|\mathrm{e}^{\mathrm{i}\lambda\hat\phi_f}|0\rangle=\mathrm{e}^{-\frac{1}{2}\lambda^2(f^*,f)},$$
allows us to work out the probability distribution for $\hat\phi_f$ by inverse Fourier transform, provided $\hat\phi_f^\dagger=\hat\phi_{f^*}=\hat\phi_f$,
$$\langle 0|\delta(\hat\phi_f-v)|0\rangle=\frac{1}{\sqrt{2\pi(f,f)}}\mathrm{e}^{-\frac{v^2}{2(f,f)}},$$
indeed with the commutator $[\hat\phi_f,\hat\phi_g]=\hat\phi_f\hat\phi_g-\hat\phi_g\hat\phi_f=(f^*,g)-(g^*,f)$ it allows us to work out a probability distribution for any other self-adjoint operator. The object $(f,f)$ is the crucial geometric object for a free QFT; for the Klein-Gordon field it is
$$(f,g)=\hbar\int\tilde f^*(k)\tilde g(k)2\pi\delta(k{\cdot}k-m^2)\theta(k_0)\frac{\mathrm{d}^4k}{(2\pi)^4},$$
while for the EM field it is
$$(f,g)=-\hbar\int k^\mu\tilde f_{\mu\alpha}^*(k)k^\nu\tilde g_\nu^{\ \alpha}(k)2\pi\delta(k{\cdot}k)\theta(k_0)\frac{\mathrm{d}^4k}{(2\pi)^4},$$
both of these being positive semi-definite sesquilinear forms, which in math we would also call pre-inner products, both also being translation invariant and manifestly Lorentz invariant, and both also having support only where the wave-number $k^\mu$ is in the forward light-cone. A final property is that the commutator $[\hat\phi_f,\hat\phi_g]=(f^*,g)-(g^*,f)$ is zero whenever the supports of $f$ and $g$ are space-like separated, where the support of $f$ is the region of space-time where $f(x)\neq 0$; this is somewhat non-obvious, but it has to be so to ensure the kind of causality required in QFT. Notice that $(\delta_x,\delta_x)$ is undefined (loosely, we can say it's infinite, so that the variance of the probability distribution for $\hat\phi(x)$ is infinite), which is the most graphic reason for thinking we can't measure $\hat\phi(x)$.
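To make the last point concrete, here is a minimal numerical sketch of how $(f,f)$ for the Klein-Gordon form grows as a window function narrows towards a delta function. Everything specific in it is my assumption for illustration, not part of the formalism: a Gaussian window with $\tilde f(k)=\mathrm{e}^{-s^2(k_0^2+|\mathbf{k}|^2)/2}$ of width $s$, natural units with $\hbar=1$, and the helper names `trapezoid` and `kg_norm_gaussian`. Doing the $k_0$ integral against the delta function leaves the measure $\mathrm{d}^3k/\bigl((2\pi)^3\,2k_0\bigr)$ on the mass shell.

```python
import numpy as np

HBAR = 1.0  # natural units, an assumption of this sketch


def trapezoid(y, x):
    """Plain trapezoidal rule on a 1-D grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def kg_norm_gaussian(s, m, n=20_001):
    """Klein-Gordon (f, f) for a Gaussian window of width s (hypothetical helper).

    Assumes f~(k) = exp(-s^2 (k0^2 + |k|^2) / 2). The delta function sets
    k0 = sqrt(|k|^2 + m^2) and leaves the measure d^3k / ((2 pi)^3 2 k0).
    """
    # Drop k = 0, where the integrand vanishes, to avoid 0/0 when m = 0.
    k = np.linspace(0.0, 12.0 / s, n)[1:]
    k0 = np.sqrt(k**2 + m**2)
    f_tilde_sq = np.exp(-s**2 * (k0**2 + k**2))
    # Radial integral: 4 pi k^2 dk from the angular integration.
    integrand = 4.0 * np.pi * k**2 * f_tilde_sq / ((2.0 * np.pi) ** 3 * 2.0 * k0)
    return HBAR * trapezoid(integrand, k)


widths = [1.0, 0.3, 0.1, 0.03]
norms = [kg_norm_gaussian(s, m=1.0) for s in widths]
for s, value in zip(widths, norms):
    print(f"s = {s:5.2f}   (f, f) = {value:.4e}")
```

Under these assumptions the massless case can be done by hand, giving $(f,f)=\hbar/(16\pi^2 s^2)$, so the variance of $\hat\phi_f$ in the vacuum state blows up like $1/s^2$ as the window shrinks to a point.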
The vacuum state allows us to construct a Hilbert space of what I find it helpful to call "modulated states", with the simplest example being
$$\omega_g(\hat A)=\frac{\langle 0|\hat\phi_g^\dagger\hat A\hat\phi_g|0\rangle}{\langle 0|\hat\phi_g^\dagger\hat\phi_g|0\rangle}=\frac{\langle 0|\hat\phi_g^\dagger\hat A\hat\phi_g|0\rangle}{(g,g)},$$
for which we obtain a different probability distribution for $\hat\phi_f$,
$$\begin{aligned}
\omega_g(\delta(\hat\phi_f-v))&=\frac{\langle 0|\hat\phi_g^\dagger\delta(\hat\phi_f-v)\hat\phi_g|0\rangle}{(g,g)}\\
&=\frac{1}{\sqrt{2\pi(f,f)}}\left[1-\frac{|(f,g)|^2}{(f,f)(g,g)}\left(1-\frac{v^2}{(f,f)}\right)\right]\mathrm{e}^{-\frac{v^2}{2(f,f)}}.
\end{aligned}$$
Notice the very significant fact that what we have modulated is not the field, but the probability distributions. QFT is an essentially stochastic theory. Notice the $v^2$ term in the above expression (not in the exponential, in the part that changed): if we work out what the probability distribution is for more elaborately modulated states, we'll find $v^4$ terms, $v^3$ terms, any power you like. In the way I've presented QFT (not the only way it can be done, of course), the highest power is one measure of how excited the state is. Ways of talking about QFT are important, IMO; I think it's good to talk about modulating a "state" to obtain a different "state" (not about modulating "the field"), which determines how the expected measurement results (for many different $\hat A$'s) will be modulated relative to the original state.
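The probability distribution for $\hat\phi_f$ in the state $\omega_g$ can be sanity-checked numerically. The sketch below treats $(f,f)$, $(g,g)$ and $|(f,g)|^2$ as three plain numbers (the values are hypothetical, chosen only to respect the Cauchy-Schwarz inequality) and takes $f$ to be real. It checks that the density integrates to 1, that its variance is $(f,f)+2|(f,g)|^2/(g,g)$, and that its Fourier transform is $\bigl(1-\lambda^2|(f,g)|^2/(g,g)\bigr)\mathrm{e}^{-\frac{1}{2}\lambda^2(f,f)}$, which is the characteristic function I get for this state by the same route as for the vacuum. The function names are mine.

```python
import numpy as np


def trapezoid(y, x):
    """Plain trapezoidal rule on a 1-D grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def modulated_density(v, ff, gg, fg_abs2):
    """Density of phi_f in the state omega_g (hypothetical helper).

    ff = (f, f), gg = (g, g), fg_abs2 = |(f, g)|^2, with f real.
    """
    gauss = np.exp(-v**2 / (2.0 * ff)) / np.sqrt(2.0 * np.pi * ff)
    ratio = fg_abs2 / (ff * gg)
    return gauss * (1.0 - ratio * (1.0 - v**2 / ff))


# Hypothetical values; Cauchy-Schwarz requires fg_abs2 <= ff * gg.
ff, gg, fg_abs2 = 1.5, 2.0, 1.2

v = np.linspace(-12.0 * np.sqrt(ff), 12.0 * np.sqrt(ff), 40_001)
p = modulated_density(v, ff, gg, fg_abs2)

total = trapezoid(p, v)                      # should be close to 1
variance = trapezoid(v**2 * p, v)            # second moment (the mean is zero)
expected_variance = ff + 2.0 * fg_abs2 / gg

# The density is even in v, so its Fourier transform is a cosine transform.
lam = 0.7
chi_numeric = trapezoid(np.cos(lam * v) * p, v)
chi_formula = (1.0 - lam**2 * fg_abs2 / gg) * np.exp(-0.5 * lam**2 * ff)

print(total, variance, expected_variance, chi_numeric, chi_formula)
```

Setting `fg_abs2 = 0.0` recovers the vacuum Gaussian, so the same check covers the unmodulated case.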
QFT proper takes the test function space to be infinite-dimensional, most often what's called the Schwartz space of functions that are smooth both in real space and in wave-number space, but all the above construction works quite well if we take the test function space to contain just a few test functions, say $\{f_1, f_2, f_3\}$ (making sure that $(f_1,f_1)$, $(f_2,f_2)$, and $(f_3,f_3)$ are all finite), because we have preserved manifest Lorentz covariance.
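As a sketch of what a small test function space looks like in practice, the code below builds the $3\times 3$ matrix of values $(f_i,f_j)$ for three window functions. The specifics are assumptions for illustration only: the massless Klein-Gordon form with $\hbar=1$, Gaussian windows $\tilde f(k)=\mathrm{e}^{-s^2(k_0^2+|\mathbf{k}|^2)/2}$ of equal width $s$, centred at three hypothetical points on the $x$-axis at the same time, and the function name `gram_matrix`. For purely spatial separations $d$ the angular integral gives $\sin(kd)/(kd)$, which leaves a one-dimensional radial integral.

```python
import numpy as np

HBAR = 1.0  # natural units, an assumption of this sketch


def gram_matrix(positions, s, n=20_001):
    """Matrix of (f_i, f_j) for Gaussian windows centred at `positions`.

    Hypothetical helper: massless Klein-Gordon form, equal widths s,
    centres on the x-axis at equal times.
    """
    k = np.linspace(0.0, 12.0 / s, n)
    w = np.full(n, k[1] - k[0])              # trapezoid weights
    w[0] *= 0.5
    w[-1] *= 0.5
    # 4 pi k^2 / ((2 pi)^3 2 k) = k / (4 pi^2) on the massless shell.
    radial = HBAR * k * np.exp(-2.0 * s**2 * k**2) / (4.0 * np.pi**2)
    d = np.abs(positions[:, None] - positions[None, :])
    # np.sinc(x) = sin(pi x) / (pi x), so this is sin(k d) / (k d).
    angular = np.sinc(k[None, None, :] * d[:, :, None] / np.pi)
    return np.sum(w * radial * angular, axis=-1)


positions = np.array([0.0, 1.0, 3.0])
s = 0.5
G = gram_matrix(positions, s)
eigenvalues = np.linalg.eigvalsh(G)

# Vacuum variance of phi_f for the real combination f = sum_i c_i f_i.
c = np.array([1.0, -2.0, 0.5])
variance = float(c @ G @ c)

print(G)
print(eigenvalues, variance)
```

The eigenvalues should be non-negative, reflecting the positive semi-definiteness of the form, and the vacuum variance of $\hat\phi_f$ for any real combination $f=\sum_i c_i f_i$ is then just $\sum_{i,j}c_i c_j(f_i,f_j)$, which is finite as long as the diagonal entries are.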
There are lots of details that I've not covered, but for me this is the most helpful way I know to understand QFT (and it is also helpful when thinking about alternative ways to introduce interactions, on which I've not touched at all, although interacting theories are still about measuring and modulating states). Good luck, but, finally, a disclaimer: the above math can all be checked, it's OK, but the formalism and language I've used are very different from what you will see and hear if you watch videos of lectures on QFT from the Perimeter Institute, say (and many others are available on YouTube), so use the above with discretion.