Data trends

qcman

Registered Visitor
I've seen people say a trend can be produced with 2 data points and others say 6-7 are needed. What are the group's thoughts?
 

Miner

Forum Moderator
Leader
Admin
It is based on probability. With two data points you have a 50/50 chance that the second point is greater (or lesser) than the first, which is insufficient to declare a trend. With 6 data points there are 5 successive moves, and the probability that all of them are increasing (or all decreasing) is 0.5^5, or about 3.1%, which is sufficient for most purposes. If you are willing to accept a higher risk of a type I error, you could reduce the number of points (e.g., 0.5^4 = 6.3%, 0.5^3 = 12.5%), but the fewer consecutive points, the greater the risk of being wrong.
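The arithmetic in this argument can be checked directly. A minimal sketch in Python (the function name is my own; this treats each successive up/down move as an independent 50/50 coin flip, which is the heuristic used in the post — for i.i.d. noise the successive differences are not truly independent, so treat the numbers as approximations):

```python
def run_probability(k: int) -> float:
    """Probability that k independent 50/50 moves all go one
    chosen direction (e.g., k consecutive increases)."""
    return 0.5 ** k

# Tabulate the probability of k moves in a row, all one direction.
for k in range(1, 7):
    print(f"{k} consecutive moves in one direction: p = {run_probability(k)}")
```

Note that n data points give only n - 1 moves, so "6 points all increasing" corresponds to k = 5, i.e., about 3.1% under this coin-flip model.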
 

Bev D

Heretical Statistician
Leader
Super Moderator
I have heard and used “Two data points make a trend” as a (very sarcastic) response to someone who can’t interpret data… Actually, what I tend to say is “two data points make a straight line and 3 data points make a plane.”
 

Steve Prevette

Deming Disciple
Leader
Super Moderator
Some of the answer to the original question depends upon your definition of "trend". I take the definition to be: a pattern of data that represents a changing condition. Thus, a stable set of data from a stable process has "no trends". The original question might also consider a stable pattern of data to be a trend in itself. The important focus for management should be: can I predict future performance?

In the context of Dr. Wheeler's book, and Statistical Process Control, I came to discover in a 25-year career of making hundreds to thousands of control charts of metrics a month (spanning many facilities) that:

1. If you are asking whether I can detect "a trend" on one or two data points, the answer is yes, IF you have sufficient historical data: a single point outside of the three standard deviation control limits is a trend. I figure, though, that is not the context of the original question; if I only have two data points to judge a trend from, then the reasonable answer is "no".

2. Dr. Shewhart (who developed Statistical Process Control and performed many simulations with random numbers, normal and non-normal, to develop it) said do not declare a process to be stable (or free of trends) unless you have at least 25 data points.

3. I did find that when I told people on the job that they needed 25 data points, I got the response: how can I possibly wait that long? BUT in many cases I found these same people were sitting on years of data that they had never analyzed or, in many cases, even attempted to retrieve. So I retrieved the 25 data points (minimum) and set up the new charts.

4. If I was faced with a new data stream (or a new set of results following a significant change to an existing process), I found that if I had a pattern of three changes of direction (like the letter "M" or "W"), I could get a set of average and control limits that would predict the future 90% of the time. The other 10% of the time, a "trend" signal appeared before getting a total of 25 new points, which indicated that the original baseline average and control limits from the M or W pattern were based on too small a set of data. To form an M or W pattern, you need at least 5 points to get the three changes of direction. If one were so inclined, it would be easy to validate or refute this claim with simulations. All I will say is the M/W pattern worked for my work and gave a reasonable "rule of thumb" for trending.
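The M/W rule of thumb and the limit calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the poster's actual procedure: the function names are my own, and the 2.66 multiplier is the standard individuals-chart (XmR) constant from the SPC literature, not something stated in the post.

```python
def direction_changes(xs):
    """Count how many times the successive differences change sign
    (flat moves are ignored)."""
    signs = [1 if b > a else -1 for a, b in zip(xs, xs[1:]) if b != a]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def has_mw_pattern(xs):
    """Three changes of direction (an 'M' or 'W' shape) require
    at least 5 points, per the rule of thumb above."""
    return len(xs) >= 5 and direction_changes(xs) >= 3

def xmr_limits(xs):
    """Individuals-chart (XmR) control limits:
    mean +/- 2.66 * average moving range."""
    mean = sum(xs) / len(xs)
    moving_ranges = [abs(b - a) for a, b in zip(xs, xs[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar
```

For example, the 5-point series [3, 5, 2, 6, 1] goes up, down, up, down: three changes of direction, so `has_mw_pattern` returns True, and `xmr_limits` on such a baseline gives provisional limits to judge later points against.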
 

Bev D

Heretical Statistician
Leader
Super Moderator
One issue that I routinely encountered was that if a single data point was outside the control limits (even far outside the limits) the managerial response was often “let’s wait and see if it comes back down next week - or next month” (ignore it and it will go away because it’s just a fluke). But if there was a very slight ‘increase or decrease’ of the most recent data point in relationship to the immediately preceding data point the managerial response was to claim that the process had ticked up - or down. They would claim victory, or they would use it as evidence that some other group had failed. This would happen even if the current data point was clearly within the overall variation of the process.

I once had an influential person tell me that another person was an expert at interpreting data (he couldn’t spell data let alone statistics). What he could do was see the bunny rabbit or duck in any data cloud he looked at and he could bully the room into believing his interpretation because that was what they wanted to believe…

While these might appear to be diametrically opposed misinterpretations, I came to realize that they were really a monolithic response to the ‘data’: anything that they could bend to support their agenda or belief system is what they would grab onto. I was often vilified for pointing out the reality of the situation… This delusionary response is very common in the public sphere as well; whether it’s business or weather or the economy or whatever, the general understanding of data is woefully inadequate, divisive, and at times dangerous. This is why I believe that we must bring this to our high schools so that more people can see when others are bending data to con the public. We need a data-literate public.
 

Tidge

Trusted Information Resource
I've seen people say a trend can be produced with 2 data points and others say 6-7 are needed. What are the group's thoughts?

From my (Bayesian-influenced) perspective: the first data point is necessary to show that a hypothesis is realistic, and the second data point simply provides a means of estimating the scale of the hypothesis. Think of it this way: the first observation proves black swans exist; the second observation helps set the scale.

If these data points exist relative to some already-studied hypothesis (e.g. these are data points relative to a validated manufacturing process) see the post by @Steve Prevette above.
 