Yes, one never knows what the data distribution is, and one does not need to. Thicker tails are not always the case, however. Take time-based distributions (time to answer the phone in a call center, for example): one tail is fat and the other is truncated. Shewhart charts still work well, nonetheless.
Wheeler's "Normality and the Process Behaviour Chart" is a great read. See p88 and his comments on Chebychev vs actual distributions.
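To make the Chebychev-versus-actual comparison concrete, here is a rough sketch (mine, not taken from Wheeler's text) that sets the Chebychev lower bound for three standard deviations against the exact coverage of three textbook distributions. The assumptions: the true mean and standard deviation are used rather than chart-estimated limits, and the unit-rate exponential merely stands in for a one-sided, time-like distribution such as the call center case.

```python
import math

def chebyshev_lower_bound(k):
    """Minimum fraction within k standard deviations of the mean (any finite-variance distribution)."""
    return 1.0 - 1.0 / k ** 2

def normal_coverage(k):
    """Exact fraction of a normal distribution within mean +/- k sd."""
    return math.erf(k / math.sqrt(2.0))

def exponential_coverage(k):
    """Exact fraction of a unit-rate exponential (mean 1, sd 1) within mean +/- k sd."""
    lo = max(0.0, 1.0 - k)   # the lower tail is truncated at zero
    hi = 1.0 + k
    return math.exp(-lo) - math.exp(-hi)

def uniform_coverage(k):
    """Exact fraction of a Uniform(0, 1) (mean 0.5, sd 1/sqrt(12)) within mean +/- k sd."""
    sd = 1.0 / math.sqrt(12.0)
    lo = max(0.0, 0.5 - k * sd)
    hi = min(1.0, 0.5 + k * sd)
    return hi - lo

k = 3
print(f"Chebychev bound : {chebyshev_lower_bound(k):.4f}")
print(f"Normal          : {normal_coverage(k):.4f}")
print(f"Exponential     : {exponential_coverage(k):.4f}")
print(f"Uniform         : {uniform_coverage(k):.4f}")
```

Chebychev guarantees only about 88.9% within three sigma, while the normal covers about 99.7% and even the heavily skewed exponential covers about 98.2%. The bound is a worst case; the distributions one actually meets tend to sit well above it.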
Actually, this is a sad urban legend: you really do need to know the distribution. Yes, Shewhart's charts apply to many distributions (see Shewhart's and Wheeler's evaluations of an array of distributions), but not all. Chebychev's theorem has limitations, if one were to look at its full definition. The fact is that these charts, and Chebychev's theorem, only apply to distributions that arise from processes (not sampling) that generate random, independent variation, by definition. If that does not apply, as in tool wear, then the basic assumptions fail and you need to move on to other approaches. So, yes, you need to know the distribution.
To claim you will never know the distribution is only partially true. You can perform some curve fitting to get into the ballpark and generate a statistically supported model. Yes, according to the total variance equation, the distributions will always be multimodal, but one's goal is to control many of the variations to a statistically insignificant level. But in the example of tool wear, once you reduce the various causes of variance to an insignificant level (as in precision machining), you know exactly what the distribution is.
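As a rough illustration of the tool wear point, the sketch below simulates a dimension that drifts steadily from part to part, puts it on an individuals chart, then fits a trend line and charts the residuals instead. Everything here is assumed for the sake of the example: the wear rate (`TRUE_SLOPE`), the noise level, the linear shape of the wear curve, and the helper names. The limits use the usual individuals-chart rule of the mean plus or minus 2.66 times the average moving range.

```python
import random

def xmr_limits(values):
    """Individuals-chart limits: mean +/- 2.66 * average moving range."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * avg_mr, center + 2.66 * avg_mr

def count_outside(values):
    """Number of points falling outside the individuals-chart limits."""
    lo, hi = xmr_limits(values)
    return sum(1 for v in values if v < lo or v > hi)

def fit_line(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

random.seed(0)
TRUE_SLOPE = 0.5  # hypothetical wear per part, in arbitrary units
parts = [TRUE_SLOPE * i + random.gauss(0.0, 1.0) for i in range(50)]

# Model the wear as a straight line (an assumption), then chart what is left over.
slope, intercept = fit_line(parts)
residuals = [v - (intercept + slope * i) for i, v in enumerate(parts)]

raw_out = count_outside(parts)
resid_out = count_outside(residuals)
print(f"estimated wear per part: {slope:.3f}")
print(f"points outside limits, raw data:  {raw_out}")
print(f"points outside limits, residuals: {resid_out}")
```

The raw series floods the chart with signals because the drift is neither random nor independent. Once the wear is modelled and removed, the leftover variation is the kind the chart was built for. A straight line is only one possible model; real wear curves may call for something else.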