Sampling Size and Sampling Frequency Continued

Bev D

Heretical Statistician
Leader
Super Moderator
<big sigh>
The process does not have to be normal. This is a myth. Shewhart - who developed ‘SPC’ - was adamant that there is NO DISTRIBUTIONAL REQUIREMENT. I have used SPC successfully for 40 years and never even checked if the processes were Normal.
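A quick simulation illustrates why (a sketch only; the exponential distribution here is an arbitrary stand-in for a skewed, decidedly non-Normal process):

```python
import numpy as np

rng = np.random.default_rng(1)
# Strongly skewed stand-in for real process data -- nothing Normal about it.
x = rng.exponential(scale=1.0, size=100_000)

# Plain 3-sigma limits around the mean.
ucl = x.mean() + 3 * x.std()
lcl = x.mean() - 3 * x.std()
false_alarms = np.mean((x > ucl) | (x < lcl))
print(f"fraction beyond the 3-sigma limits: {false_alarms:.4f}")  # ~0.018
```

Even for data this skewed, fewer than 2% of points fall beyond the 3-sigma limits, which is rare enough for the chart to do its job.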

Please read Wheeler’s articles regarding this.
 

Semoi

Involved In Discussions
Why did I expect that response ;)
No, I do not claim that the data must follow a normal distribution for SPC to work, but it was Shewhart who empirically selected 3 sigma for the control limits. He argued that this level is a good balance between false alarms and resolution, i.e. he used economic arguments to justify the value 3. Accordingly, using a transformation to come closer to normality provides a good starting point. Just imagine what you would write if somebody dared to use a 2.5-sigma limit.

To emphasise how a transformation helps, here is a dataset which clearly needs to be transformed.
[Attached screenshots: Bildschirmfoto 2023-10-22 um 16.54.56.png, Bildschirmfoto 2023-10-22 um 16.55.07.png]

These are micro-roughness measurements in nm, filtered for a specific spatial frequency interval. After the transformation we find that the process is in control, and that it is much better than I ever dreamed of. So yes, data transformation helps.
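To illustrate the mechanics (a minimal sketch with synthetic lognormal data standing in for the roughness measurements and a log transform assumed; individuals-chart limits are estimated from the average moving range):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for skewed micro-roughness data (nm).
x = rng.lognormal(mean=1.0, sigma=0.5, size=200)

def i_chart_limits(data):
    """Shewhart individuals-chart limits: mean +/- 3 * MR-bar / d2, d2 = 1.128 for n = 2."""
    sigma_hat = np.abs(np.diff(data)).mean() / 1.128
    return data.mean() - 3 * sigma_hat, data.mean() + 3 * sigma_hat

lcl, ucl = i_chart_limits(x)
print("raw-scale signals:", np.sum((x < lcl) | (x > ucl)))

logx = np.log(x)
lcl_t, ucl_t = i_chart_limits(logx)
print("log-scale signals:", np.sum((logx < lcl_t) | (logx > ucl_t)))
```

On the raw scale the skewed tail tends to trigger signals that disappear after the transform.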
 

optomist1

A Sea of Statistics
Super Moderator
Echoing Bev's post above: yep, "transforming" or "force-fitting" data to "normalize" it is a risky proposition, one that many times leads to bad decisions and steps
 

Bev D

Heretical Statistician
Leader
Super Moderator
Sorry Semoi, you are plain wrong. I don't see that the distribution you show needs to be transformed at all. That is a natural distribution for surface finish and many other processes. Deal with reality. You only have enumerative statistical tools in your toolbox, so you only use them. Try some analytic tools…

You are coming at this from a pure mathematical statistics standpoint and not the analytic/empirical standpoint that Shewhart used. You are a statistical mathematician and cannot see anything except through that lens.
Your barb regarding the choice of 2.5 sigma is a false argument and makes no sense in the context of your argument. Shewhart did not use the Normal distribution; he used Chebyshev's theorem. Read some of Wheeler's work. Read the original Shewhart and Deming. I have.
Transformation hides variation and is a simplistic trick to make the real world mold itself to theoretical math. In theory, reality matches theory; in reality, theory doesn't resemble reality at all.
 

Semoi

Involved In Discussions
Bev, I honestly have trouble following you. I'm here to learn, but I feel that often I just read statements instead of arguments.

Try some analytic tools
What do you consider an analytic tool? Why is statistics not considered an analytic tool?

Your barb regarding the choice of 2.5 sigma is a false argument and makes no sense in the context of your argument.
My point is simple: I know that you will argue that the 3-sigma level does not define a false-alarm rate, but that instead it is an empirical rule. As far as I remember, it was initially argued to be an economic balance point. Now, we can take two perspectives:
1. The statement is true and the value 3 is close to the optimal economic balance point. As the optimal balance point is defined by weighting the cost of false alarms against the cost of bad-quality products, the initial statement implies that the false-alarm rate is (close to) optimal. If we accept this to be true, and if we assume that the transformation brings us closer to normality, the dataset should be transformed to reduce the cost of investigating "common causes".
2. The 3-sigma level is an arbitrary selection. If this were the case, we could easily replace it by 2.5 or 3.5, and Chebyshev's inequality tells us that this will work equally well (see the comparison below). In fact, one could easily argue that the selected factor should reflect the criticality of the characteristic we are trying to control, weighted by the quality target of the organisation.
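For reference, the numbers behind that comparison (Chebyshev's distribution-free bound versus the exact Normal tail probability; scipy is assumed to be available):

```python
from scipy.stats import norm

for k in (2.5, 3.0, 3.5):
    chebyshev = 1 / k**2       # upper bound on P(|X - mu| >= k*sigma), any distribution
    gaussian = 2 * norm.sf(k)  # exact two-sided tail probability if X is Normal
    print(f"k = {k}: Chebyshev bound <= {chebyshev:.3f}, Normal tail = {gaussian:.5f}")
```

At k = 3 the distribution-free guarantee is only "at most 11.1% outside the limits", while the Normal assumption gives 0.27%, which is exactly why the distribution matters for the false-alarm rate.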

I didn't want to write all that, as I believed you were already aware of the argument. I don't see why you say it makes no sense in context.

I don’t see that the distribution you show needs to be transformed at all.
I reckon you will agree that any data analysis should be performed with the goal in mind that we take action -- e.g. to bring the process back into control. We have optimised the process for years now, and we have huge datasets showing that we should not expect a different distribution from the displayed process. Thus, the outliers indicated by the SPC chart are (with high probability) due to common causes, not special causes. Therefore, we should not investigate their root causes: it's a waste of money and working hours. An SPC chart of the raw data does not reflect this insight and would generate unnecessary actions.
 

Bev D

Heretical Statistician
Leader
Super Moderator
My statements come from many sources. Instead of writing them out, I refer people to the posted articles. SPC is quite involved and short posts here cannot do it justice. If you are here to learn, then read the articles and then post questions.
 