Data normality versus capability

Dan Watson

Involved In Discussions
I feel rather stupid for asking this question, but it is a reality check for me. I work with a QA Manager who has stated that the normality of the data does not matter when calculating capability indices for different product parameters. His numbers are pretty much always above 1.33. I thought, as I have been taught since 1987, that if the data are non-normally distributed, the descriptive statistics for normally distributed data cannot be used. The root cause of why the data are non-normally distributed needs to be investigated, i.e., what are the causes of the special variation influencing the data. Am I wrong? Would process capability indices be an alternative measure until the data distribution is more normal? This individual will rework the PPAP numbers that the technicians give to him so that the numbers "look appropriate." Sometimes the old saying is true: "torture the numbers enough and they'll confess to anything."
 

John C. Abnet

Teacher, sensei, kennari
Leader
Super Moderator
i.e., what are the causes of the special variation influencing the data. Am I wrong?

You are indeed not wrong. Until special causes of variation are removed, the process is unstable, process improvements cannot properly be applied, and the resulting capability numbers, therefore, are of no value to you.

Hope this helps.
Be well.
 

Miner

Forum Moderator
Leader
Admin
I feel rather stupid for asking this question, but it is a reality check for me. I work with a QA Manager who has stated that the normality of the data does not matter when calculating capability indices for different product parameters. His numbers are pretty much always above 1.33. I thought, as I have been taught since 1987, that if the data are non-normally distributed, the descriptive statistics for normally distributed data cannot be used. The root cause of why the data are non-normally distributed needs to be investigated, i.e., what are the causes of the special variation influencing the data. Am I wrong? Would process capability indices be an alternative measure until the data distribution is more normal? This individual will rework the PPAP numbers that the technicians give to him so that the numbers "look appropriate." Sometimes the old saying is true: "torture the numbers enough and they'll confess to anything."

It depends on why the data are not normal. There are a number of reasons, and each requires a specific response:
  • Not normal due to special causes. The process is not stable and therefore is not predictable. Capability indices only hold true for that specific set of data and will not apply to future data sets. The special causes must be identified and eliminated to stabilize the process and make the indices meaningful.
  • Not normal due to mixtures. The process may be stable, but it is a mixture of different process streams that result in a multi-modal or uniform distribution. The process streams must be separated and the capability of each determined.
  • Not normal due to tool wear. The process is stable but trends due to tool wear, resulting in a quasi-uniform distribution. Use alternate methods for non-normal capability such as CNp/CNpk or PPM.
  • Not normal due to the process itself. Certain processes are inherently non-normal and will always be so in the absence of special causes. Use alternate methods for non-normal capability such as CNp/CNpk or PPM.

Process capability/performance index formulas are for normal data. If your data are not normal, the data need to be transformed, and if the transformed data are normal, they can be used for the process capability/performance calculation.

Transforming data into normality is ONE way of dealing with the fourth scenario above (never for the first three scenarios), but it is not the only approach. It is often overused in inappropriate situations (the first three scenarios) by people who do not understand what they are doing.

I personally do not like to transform data, because you lose a lot of the information in the data. You also have to transform the specifications, and it becomes very difficult to explain to non-statisticians. I prefer to perform a non-normal capability analysis: it is relatively easy for others to interpret and retains all of the information in the data.
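To make that concrete, here is a minimal sketch of a percentile-based non-normal capability analysis, one common approach; the data, the lognormal model, and the specification limits are all invented for illustration:

```python
# Sketch: percentile-based non-normal capability analysis.
# The data, the lognormal model, and the spec limits below are
# invented assumptions, not anyone's real process.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.5, sigma=0.25, size=200)  # inherently non-normal
LSL, USL = 0.8, 3.5                                   # assumed spec limits

# Fit a candidate distribution instead of transforming the data.
shape, loc, scale = stats.lognorm.fit(data, floc=0)
dist = stats.lognorm(shape, loc=loc, scale=scale)

# These percentiles play the role of mu - 3*sigma, mu, mu + 3*sigma.
p_lo, p_med, p_hi = dist.ppf([0.00135, 0.5, 0.99865])

ppu = (USL - p_med) / (p_hi - p_med)  # upper-side index
ppl = (p_med - LSL) / (p_med - p_lo)  # lower-side index
print(f"Non-normal Ppk = {min(ppu, ppl):.2f}")
```

Because the 0.135% and 99.865% percentiles of the fitted distribution stand in for the usual ±3-sigma span, the result reads on the same familiar scale (1.33, 1.67, etc.), with no transformed specifications to explain.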
 

Matt Savage

Trusted Information Resource
... His numbers are pretty much always above 1.33 ...
Does this mean his Ppk number is above 1.33?
Ignoring the question about normal or not, transform the data or not, the 'customer' is looking for consistency (being in statistical control) and centering on the desired target.
The control charts show that all three batches are not stable, with Batch #2 being the least predictable. (See attached.) If the processes are not in statistical control, by what means is acceptability determined?
 

Attachments

  • Batch Data.png

Dan Watson

Involved In Discussions
Does this mean his Ppk number is above 1.33?
Ignoring the question about normal or not, transform the data or not, the 'customer' is looking for consistency (being in statistical control) and centering on the desired target.
The control charts show that all three batches are not stable, with Batch #2 being the least predictable. (See attached.) If the processes are not in statistical control, by what means is acceptability determined?

Hi Matt. His "1.33" is the Cpk index. He does not control chart, but uses a histogram and formats it so that it will appear normal. The "acceptability" is that the data are within the specification limits, not that the process is in statistical control or that sources of special variation have been reviewed. It is purely a numbers game for the customer. And, yes, we have had complaints about consistency from our large customers.
 

Matt Savage

Trusted Information Resource
Hi Matt. His "1.33" is the Cpk index. He does not control chart, but uses a histogram and formats it so that it will appear normal. The "acceptability" is that the data are within the specification limits, not that the process is in statistical control or that sources of special variation have been reviewed. It is purely a numbers game for the customer. And, yes, we have had complaints about consistency from our large customers.
Hi Dan, if a control chart is not used, or is used but shows out-of-control conditions, I would not put much emphasis on Cpk. (One of the prerequisites for Cpk to be valid is that the data be in statistical control.) I prefer to evaluate Ppk instead of Cpk in situations like this, since it uses the overall variation, which captures both the within-subgroup and the between-subgroup variation. Some will argue that any statistic is bogus when the chart shows out-of-control conditions.
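To illustrate the distinction, here is a sketch with invented data, invented specification limits, and a deliberate drift in the subgroup means; the drift inflates the overall sigma that Ppk uses but not the within-subgroup sigma that Cpk uses:

```python
# Sketch: Cpk (within-subgroup sigma, estimated as Rbar/d2) versus
# Ppk (overall sigma). Subgroup size is 5, so d2 = 2.326.
# Data, spec limits, and the drift are invented for illustration.
import numpy as np

rng = np.random.default_rng(7)
# 25 subgroups of 5; a drifting mean adds between-subgroup variation.
subgroups = rng.normal(10.0, 0.10, size=(25, 5)) + np.linspace(0.0, 0.4, 25)[:, None]
LSL, USL = 9.4, 10.9  # assumed specification limits

xbar = subgroups.mean()
sigma_within = np.ptp(subgroups, axis=1).mean() / 2.326  # Rbar / d2
sigma_overall = subgroups.std(ddof=1)                    # all data pooled

cpk = min(USL - xbar, xbar - LSL) / (3 * sigma_within)
ppk = min(USL - xbar, xbar - LSL) / (3 * sigma_overall)
print(f"Cpk = {cpk:.2f}  Ppk = {ppk:.2f}")  # Ppk < Cpk when the mean drifts
```

A large gap between the two numbers is itself a warning sign that the process is shifting between subgroups, which is exactly what a control chart would show directly.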
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
It is important to note that there are some processes / data sources that are NOT normal, and there is nothing wrong with that. If I am measuring the strength of a steel beam, it can't have a strength below zero, so it is NOT normal. Now, sometimes the normal assumption is good enough. But the non-normality may come into play if zero is within a few standard deviations of the sample average.

As a reminder, Statistical Process Control does not require normality. So I may have a non-normal situation (lognormal, as with the beam strength above, or Poisson counting events), but one that is stable and "in control".

However, we tend to over-rely on Ppk and Cpk, in my opinion. I'll agree with Matt Savage above: understand what is happening on a control chart.
 

Semoi

Involved In Discussions
Reading Miner's answer above, I hope it is obvious that the
QA Manager who has stated that the normality of the data does not matter when calculating capability indices
is wrong. However, giving people the credit they deserve, you might have misunderstood his/her argument and thus misquoted it here, so you should probably talk to the QA manager and clarify.
Although Miner's argument should be pretty self-explanatory, you could also look into ISO 22514-4:2016. In section 4.4.3 it contains a method for non-normal data: the key idea is to use quantiles instead of the standard deviation. In statistics we call this a non-parametric method. It is commonly accepted that non-parametric methods are more robust (against outliers and other assumption violations) than parametric methods, but that they have a slightly lower (Pitman) efficiency. Without going into statistical details, I believe it is obvious that normality matters if an ISO standard contains a section on calculating capability indices for non-normal data.
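Here is a minimal sketch of that quantile idea, with invented data and specification limits; note that estimating the 0.135% and 99.865% tail quantiles directly from the data, with no fitted distribution, takes a large sample:

```python
# Sketch: quantile-based (non-parametric) capability in the spirit of
# the ISO 22514-4 method mentioned above: empirical 0.135% / 50% /
# 99.865% quantiles replace mu - 3*sigma, mu, mu + 3*sigma.
# Data and spec limits are invented for illustration.
import numpy as np

rng = np.random.default_rng(3)
data = rng.gamma(shape=4.0, scale=0.5, size=5000)  # skewed, non-normal
LSL, USL = 0.3, 6.0                                # assumed spec limits

p_lo, p_med, p_hi = np.quantile(data, [0.00135, 0.5, 0.99865])
ppk = min((USL - p_med) / (p_hi - p_med),
          (p_med - LSL) / (p_med - p_lo))
print(f"Quantile-based Ppk = {ppk:.2f}")
```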
 

Matt Savage

Trusted Information Resource
In today's Quality Digest, Wheeler's article "Analyzing Observational Data" may shed some additional light on this topic.

In the article, Wheeler states: "... You do not need to fit a probability model to your data. Neither should you place your data on a normal probability plot. And you certainly do not need to transform your data to make them “more normal.” All of these prequalification activities assume the process is already being operated predictably. Assignable causes completely undermine this assumption, making these activities nonsense.

So regardless of what your histogram may look like, put your data on a suitable process behavior chart and characterize your process behavior as predictable or unpredictable. Use the data to make predictions for your predictable processes, and use the chart to look for the assignable causes that are taking your unpredictable processes on walkabout."
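For anyone who wants to try this, here is a minimal sketch of the limits calculation for an XmR-style process behavior chart; the data are invented, and only the limits are computed, not the plot:

```python
# Sketch: natural process limits for an individuals (XmR) chart,
# computed from the average moving range (2.66 = 3 / d2, with d2 = 1.128).
# No normality assumption is needed; the data here are invented.
import numpy as np

rng = np.random.default_rng(11)
x = rng.lognormal(mean=1.0, sigma=0.2, size=50)  # non-normal but stable

center = x.mean()
mr_bar = np.abs(np.diff(x)).mean()   # average moving range
unpl = center + 2.66 * mr_bar        # upper natural process limit
lnpl = center - 2.66 * mr_bar        # lower natural process limit

outside = (x > unpl) | (x < lnpl)
print(f"limits [{lnpl:.2f}, {unpl:.2f}]; points outside: {outside.sum()}")
```

Points outside the limits (or obvious runs) flag assignable causes; if none appear, the process is behaving predictably and the data can be used for prediction, which is Wheeler's point above.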
 