Comparing Two Data Sets


It is often desirable to compare two sets of reliability or life data in order to determine which has the more favorable life distribution. The units from which the data are obtained could come from two alternate designs, manufacturers, lots or assembly lines. Many methods are available in the statistical literature for doing this when the units come from a complete sample, i.e. a sample with no censoring. The comparison becomes more difficult when the data sets contain censored observations or when the two data sets follow different distributions. In general, the problem reduces to determining whether there is a statistically significant difference between two samples of potentially censored data from two possibly different populations. This article discusses some of the methods that are applicable to censored data using Weibull++.

Simple Plotting
One popular graphical method for making this determination is to plot the data with confidence bounds and see whether the bounds overlap or separate at the point of interest. This can be done easily using the overlay plot feature in Weibull++. The approach can be effective for comparisons at a given point in time or a given reliability level, but it makes it difficult to assess the overall behavior of the two distributions, because the confidence bounds may overlap at some points and be far apart at others.

Using Contour Plots
The contour plots in Weibull++ allow you to determine whether two data sets are significantly different at a specific confidence level. Overlay the contour plots from the two data sets (analyzed using the same distribution) at the same confidence level: if the contours do not overlap, the data sets are significantly different at that confidence level. The disadvantage of this method is that the same distribution must be fitted to both data sets.

Example
The following data represent the times-to-failure for a product. Certain modifications were performed to this product in order to improve its reliability. Reliability engineers are trying to determine whether the improvements were significant in improving the reliability.

Figure 1: Times-to-failure data

At what confidence level can the engineers claim that the two designs are different?

The two-parameter Weibull distribution best fits both data sets. The contour plots were generated and compared using an overlay plot.

Contour plot

From this plot it can be seen that there is an overlap at the 95% confidence level and that there is no overlap at the 90% confidence level. It can then be concluded that the new design is better at the 90% confidence level.

Estimating P[t2 ≥ t1] Using the Life Comparison Tool
Another methodology, suggested by Gerald G. Brown and Herbert C. Rutemiller, is to estimate the probability that the time-to-failure of a unit selected from one population is better or worse than the time-to-failure of a unit selected from a second population. The equation used to estimate this probability is given by:
$$P[t_2 \ge t_1] = \int_0^{\infty} f_1(t)\, R_2(t)\, dt$$

where f1(t) is the pdf of the first distribution and R2(t) is the reliability function of the second distribution. The superior data set is identified by whether this probability is smaller or greater than 0.5: a value above 0.5 favors the second population and a value below 0.5 favors the first. A probability equal to 0.5 means that neither population is favored, which is the value obtained when the two distributions are identical.
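As a rough illustration of this calculation outside Weibull++, the probability can be approximated by numerically integrating f1(t) multiplied by R2(t) from zero to infinity. The sketch below assumes both populations follow a two-parameter Weibull distribution; the function names, the midpoint integration rule and the truncation of the upper limit are choices made for this example and are not taken from the Life Comparison tool.

```python
import math


def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull pdf with shape beta and scale eta."""
    if t <= 0.0:
        return 0.0
    z = t / eta
    return (beta / eta) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability function R(t) = exp(-(t/eta)^beta)."""
    if t <= 0.0:
        return 1.0
    return math.exp(-((t / eta) ** beta))


def prob_t2_ge_t1(beta1, eta1, beta2, eta2, steps=20000):
    """Approximate P[t2 >= t1] as the integral of f1(t) * R2(t) over t >= 0.

    The integral is truncated where the reliability of population 1 has
    dropped to 1e-12, so the neglected tail is at most 1e-12. The midpoint
    rule is used so that the pdf is never evaluated exactly at t = 0.
    """
    upper = eta1 * (-math.log(1e-12)) ** (1.0 / beta1)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += weibull_pdf(t, beta1, eta1) * weibull_reliability(t, beta2, eta2)
    return total * h


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the function.
    print(prob_t2_ge_t1(beta1=1.5, eta1=1000.0, beta2=1.5, eta2=1500.0))
```

A convenient sanity check is the exponential case (beta = 1 for both populations), where the integral has the closed form eta2 / (eta1 + eta2); two identical distributions should give 0.5.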

Suppose you are given two alternate designs with life test data, where x and y represent the times-to-failure from the two populations. One option would be to simply select the design with the higher reliability at a given time t. However, if you wanted to design a product to be as long-lived as possible, you would instead calculate the probability that a unit of one design outlasts a unit of the other over the entire distribution, and choose x or y when this probability is above or below 0.50, respectively.
The statement that x is greater than or equal to y with probability P can be interpreted as follows:

If P = 0.50, then the statement is equivalent to saying that x and y are equally good.
If P < 0.50, for example P = 0.10, then the statement is equivalent to saying that y is better than x with probability 1 - 0.10 = 0.90, i.e. a 90% probability.
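The interpretation above can also be checked by simulation. The following is a minimal sketch, assuming x and y both follow two-parameter Weibull distributions with known parameters; it estimates the probability that x is greater than or equal to y by random sampling and then words the result as in the list above. The function names and parameter values are hypothetical, and this is not how Weibull++ performs the calculation.

```python
import random


def estimate_prob_x_ge_y(beta_x, eta_x, beta_y, eta_y, n=200_000, seed=0):
    """Monte Carlo estimate of P[x >= y] for two Weibull populations."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        # random.weibullvariate takes (scale, shape), i.e. (eta, beta).
        x = rng.weibullvariate(eta_x, beta_x)
        y = rng.weibullvariate(eta_y, beta_y)
        if x >= y:
            wins += 1
    return wins / n


def interpret(p):
    """Word the probability P[x >= y] in the terms used in the text."""
    if p > 0.5:
        return f"x is better than y with probability {p:.2f}"
    if p < 0.5:
        return f"y is better than x with probability {1.0 - p:.2f}"
    return "x and y are equally good"


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    p = estimate_prob_x_ge_y(beta_x=1.0, eta_x=2000.0, beta_y=1.0, eta_y=1000.0)
    print(interpret(p))
```

Because the estimate comes from a finite number of random draws, it carries sampling error of its own, which shrinks as n grows.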

Weibull++'s Life Comparison tool allows you to perform such calculations. Ideally the same sample size should be used for both products under comparison. If the sample sizes are different, you can use the confidence bounds of the comparison result to consider the effect of the sample size.
Life Comparison tool
