The problem is that the observed "discreteness" is not due to a highly localized wave packet. In a double-slit experiment the wave is wide, yet the observed signals are discrete: they appear at definite positions whose uncertainty is much smaller than the width of the wave. So the "wave picture" is made up of (drawn by) many discrete points.
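To illustrate how a wide pattern can be "drawn" by discrete points, here is a minimal sketch, not a model of any real apparatus. It assumes the textbook far-field double-slit intensity (a single-slit envelope times two-slit fringes) in arbitrary units; the function names `intensity`, `sample_hits` and `histogram`, and the parameters `slit_sep`, `slit_width` and `x_max`, are made up for this illustration. Each registered "hit" is a single sharp position drawn at random with probability proportional to the intensity.

```python
import math
import random


def intensity(x, slit_sep=5.0, slit_width=1.0):
    """Relative far-field double-slit intensity at screen coordinate x.

    Arbitrary units, normalized so that intensity(0) == 1.
    slit_sep and slit_width are illustrative assumptions.
    """
    beta = math.pi * slit_width * x
    # Single-slit diffraction envelope (sinc squared).
    envelope = 1.0 if beta == 0 else (math.sin(beta) / beta) ** 2
    # Two-slit interference fringes.
    fringes = math.cos(math.pi * slit_sep * x) ** 2
    return envelope * fringes


def sample_hits(n, x_max=1.0, seed=0):
    """Draw n discrete hit positions by rejection sampling from intensity."""
    rng = random.Random(seed)
    hits = []
    while len(hits) < n:
        x = rng.uniform(-x_max, x_max)
        # Accept the candidate with probability equal to the intensity (<= 1).
        if rng.random() < intensity(x):
            hits.append(x)
    return hits


def histogram(hits, bins=40, x_max=1.0):
    """Count hits per bin across [-x_max, x_max]."""
    counts = [0] * bins
    for x in hits:
        i = min(int((x + x_max) / (2 * x_max) * bins), bins - 1)
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Few hits look random; many hits build up the fringes.
    for n in (20, 2000):
        counts = histogram(sample_hits(n))
        scale = max(counts) or 1
        print(f"{n} hits")
        for c in counts:
            print("#" * round(30 * c / scale))
```

With a handful of hits the positions look scattered at random; only after many hits do the fringes appear, although every individual hit is a single sharp position.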
It looks as if, before reaching the screen, particles propagate as a wave and "know" about rather distant obstacles, while upon registration they show discreteness, as if they were "point-like" entities. Thus we interpret the wave as a probability amplitude rather than as a concentrated wave packet of the particle itself.
If a particle is indeed point-like or small, how does it know about distant obstacles? Note that to determine the wave propagation one must take into account boundary conditions that are "distant" but still shape the waveform. Hence people speak of this duality.
@Dilaton: As I have no right to reply to your comment or to comment on the post, I am writing my disagreement in the answer body. A detector can reveal the effect of the extendedness of a photon. For example, if a photon has a very narrow width in energy, i.e., if it is nearly monochromatic, then in order to register it "completely" by absorption, the detector should not change its velocity (or state) during detection. The detection period should be rather long for such a photon (a long wave-train of many wavelengths). Otherwise the photon may escape detection (I mean a resonance or a threshold method).
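As a rough order-of-magnitude sketch of the "long wave-train" point: the symbols below are introduced only for illustration, with $\Delta\omega$ the spectral width, $\tau$ the duration of the wave-train, $L$ its length and $N$ the number of wavelengths it contains. It assumes only the usual Fourier bandwidth relation, up to factors of order one.

$$
\Delta\omega\,\tau \gtrsim 1
\quad\Rightarrow\quad
\tau \gtrsim \frac{1}{\Delta\omega},
\qquad
L = c\,\tau,
\qquad
N = \frac{L}{\lambda} \gtrsim \frac{c}{\lambda\,\Delta\omega} = \frac{\omega}{2\pi\,\Delta\omega}.
$$

So the narrower the line (small $\Delta\omega/\omega$), the more wavelengths the train contains and the longer the detector must stay undisturbed to absorb it "completely".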