There are many uses for this software in the world of data interpretation. If you are a manufacturer and need to test the quality of your product, the tests may destroy the product, making it impossible to test 100% of the products. If you are a dairy, the laboratory tests of the milk for microbes and fat content make the tested milk undrinkable, so if you tested all of the milk, there would be none left to sell to the public. In this case, you must sample a percentage of the milk and interpret the test data in a way that ensures all of the milk bottled at the time meets the regulator's and your own quality standards. Even if the tests for quality are nondestructive, testing 100% may be too expensive and time consuming, so you still need statistical analysis of data sampled from a small portion of the lot.
ProbPlot 3.0 (formerly Cumulative Probability Plot) is a useful tool for these purposes. Not only does it do all of the statistical mathematics for you, it also presents the data in a visual format that can be easily interpreted by people with a limited knowledge of statistics. The graph is also useful for quickly demonstrating to regulators or other interested parties that your product or process is within acceptable tolerance or meets regulatory limits.
If you are releasing a nuclear site under MARSSIM, you will still find the ProbPlot graphs a useful addition to your final status survey report to visually demonstrate that your site is clean. This is important for interested parties who do not understand the complexities of MARSSIM and are not statistics experts, such as members of the public who are your neighbors. Many regulators also appreciate how the graphs reinforce your MARSSIM conclusions about meeting the DQO.
One look at a ProbPlot graph can tell the same story that all of the other report sections and paragraphs tell in words (one picture is worth a thousand words, or every picture tells a story). For nuclear final status surveys, it can also help you detect a faulty survey instrument. For instance, if an instrument has a cable that is going bad, it may malfunction only occasionally when moved around during the survey, yet behave normally when sitting still during background and source checks, which may be performed three times a day. One look at the graph would tell you something is wrong with the data. Likewise, if your cleanup effort was not thorough, the shape of the graph will tell you that at a glance.
If most of the survey or sample points are below the release limit and only a few places are above it, you will get the expected straight line, but the outlier points will stand out on the graph. If a few points are far enough out to affect the shape of the expected straight line, the software allows you to eliminate those few data points and see how the graph looks without them. This can show you how much cleaning up just those few small hot spots will increase the probability that your site is below the release limits.
The software is also useful for looking at very large datasets. A manufacturer may test their product and review the data on a weekly or monthly basis, but it may also be useful to look at the data from several years or even several decades. Raw statistical output from 10,000 or more data points can be difficult to interpret. ProbPlot allows you to plot over 65,000 data points on one graph, which can tell you a story about your data instantly, at a glance.
You can easily get the data into the software by copying it from a spreadsheet and pasting it into ProbPlot, or by importing a comma-separated value (.CSV) file. After selecting options for the graph, you may copy the graph to the clipboard as a picture and paste it into a word processor such as Word or presentation software such as PowerPoint.
The software was first developed to demonstrate to regulators and the public that nuclear facilities were clean, below regulatory limits, after decommissioning (now called remediation). Today, a US guidance document called MARSSIM (Multi-Agency Radiation Survey and Site Investigation Manual) is the accepted standard for the statistical tests used to demonstrate that the Data Quality Objective (DQO) has been met for a nuclear facility remediation effort. Back in the late 1970s and early 1980s, MARSSIM did not exist, nor was there any written regulatory guidance for the release of former nuclear facilities. Each decommissioning project had to come up with its own way of statistically proving the facility was clean and have the plan approved by the overseeing agency.
Robert J. Tuttle was the manager of Radiation and Nuclear Safety (also known as Health Physics or Radiation Protection) and a statistics expert at Atomics International (AI) during that period. He developed ProbPlot as a tool to help demonstrate that decommissioned AI research nuclear reactors and nuclear fuel fabrication facilities were releasable for future non-nuclear use. When the Windows operating system was released in the early 1990s, Brian Oliver, Ph.D., took the Tuttle FORTRAN code for MS-DOS and rewrote it with a graphical user interface for MS Windows. That version was 16-bit, and when Windows 2000 was released shortly after the new millennium, the software did not work on all Windows 2000 machines. We took the 16-bit version and rewrote it for 32-bit machines, adding enhancements such as the ability to resize and rotate the graph before copying, pasting, or printing; the ability to change the graph fonts; expanded data input of over 65,000 points; and a help system.
Version 3.0 of ProbPlot was recently tested on 64-bit versions of Windows Vista and Windows 7, running in 32-bit (WOW64) mode, in conjunction with Office 2007. Everything works just as it does on older operating systems, with the exception of the Help system. The first time you open the Help file in Vista or Windows 7, a window pops up with a link to Microsoft to download and install a help reader that does not ship with those operating systems. Once you install this file, the help system works as expected.
Remember, ProbPlot is not just for nuclear site release. It is also useful for looking at any similar data, such as business metrics, the brightness of stars in faraway galaxies, or tolerance measurements of complex parts coming off an assembly line.
This section is an excerpt from the Help file included with the software.

This program is designed to interpret the results of a sampling inspection, for the purpose of judging compliance with chosen limits. It may also be used to identify outlying values or departure from the assumed (Gaussian or Student's-t) distribution. Uncertainties for the individual values may be entered, and a mean and standard deviation for the set are calculated. The statistical test is based on a selection of the "Consumer's Risk" (CR) and the "Lot Tolerance Percent Defective" (LTPD), so that the stringency of the test may be adjusted. Typical values are CR = 0.1 and LTPD = 10%. More stringent tests may be made by choosing smaller values of the CR and LTPD. Confidence limits for the fitted distribution may be plotted for identification of outlying points, values that do not fit the distribution.
Statistical analysis is used to convert a large amount of data into a manageable amount of understandable information. This process can involve a variety of techniques, the simplest being to determine the average (or mean) value for a given set of data. This simple determination is improved upon by also calculating the standard deviation of the data about the mean, which gives an estimate of the variability of the data. In many cases, this variability represents variations both in the characteristics or values being measured and in measurement technique fluctuations.
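For instance, here is a minimal Python sketch of these two calculations (the data values are hypothetical, and this is an illustration rather than the program's own code):

```python
import numpy as np

# Hypothetical measurement values from a sampled lot.
data = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.6, 3.7, 4.3])

mean = data.mean()
# Sample standard deviation (ddof=1): the usual estimate of the
# variability of the data about the mean.
std = data.std(ddof=1)

print(f"mean = {mean:.2f}, standard deviation = {std:.2f}")
```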
The significance of these quantities (mean and standard deviation) depends upon the distribution assumed for the data. Sometimes there is a theoretically known distribution for a particular measurement process, such as the binomial or Poisson distribution for counting radioactivity. These distributions are relatively well approximated by the Gaussian, or normal, distribution. In fact, the Gaussian distribution approximates the distribution of many different kinds of measurements and, for simplicity, is generally assumed to be the proper distribution. The Gaussian distribution is generally seen in the form of a bell-shaped curve, with most values occurring near the mean value and fewer and fewer values existing at increasing distances from the mean, both greater and less than the mean.
However, it is difficult to derive the bell-shaped curve from experimental data unless the data are specifically selected to demonstrate the curve, and deviations from the distribution are difficult to see. A better presentation is the so-called "cumulative probability function" utilized in this program, which forms an S-shaped curve when plotted in the usual manner. This can be further improved by adjusting the abscissa (the "X" values in an X-Y graph) so that the "S" curve becomes a straight line. This is a standard statistical technique and is the basis for the special graph paper used for probability analysis of data.
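To illustrate the straightening technique, here is a minimal Python sketch of the standard construction (not the program's own code): the sorted data are plotted against the normal quantiles of their cumulative plotting positions, so Gaussian data fall along a straight line.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # example measurements

x = np.sort(data)
n = len(x)
# Cumulative plotting positions (Hazen positions; other conventions exist).
p = (np.arange(1, n + 1) - 0.5) / n
# Transform the probability axis to normal quantiles, the same trick used
# by probability graph paper: the S-shaped curve becomes a straight line.
z = norm.ppf(p)

plt.plot(z, x, "o", label="data")
# A Gaussian sample should follow the line: mean + z * standard deviation.
plt.plot(z, x.mean() + z * x.std(ddof=1), "-", label="fitted Gaussian")
plt.xlabel("normal quantile")
plt.ylabel("measured value")
plt.legend()
plt.show()
```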
The parameters of the Gaussian distribution (the mean and the standard deviation) are determined by the usual calculational methods. Where the data are not well represented by a Gaussian distribution (and this is true in most cases), the departure is readily apparent; the data points do not lie along the straight line representing the Gaussian distribution.
In most cases, this departure takes a single typical form. Much of the data lies along the theoretical straight line, with a few points at either extreme lying somewhat above it. This form can usually be interpreted as showing a large number of uncontaminated measurements, where the variability is due to random fluctuations in the measurements themselves, with the balance being locations that harbor more or less residual contamination. If the contaminated area is large, there will be many points departing from the curve; in these cases, the points will not fit the theoretical straight line. If most of the region in question is contaminated, the distribution will be dominated by the contaminated data points, in a line of points generally sloping from the lower left to the upper right and fitting a theoretical straight line more or less closely.
This program is used to provide a sampling inspection test. It uses a standard quality control technique called inspection by variables, in which the distribution of the measured values is used to predict the probability that other, unmeasured values would exceed a specified limit. The standard test method requires calculating the mean and the standard deviation (s). Then, depending on the values chosen for certain parameters that reflect the performance of the test in accepting bad lots or rejecting good lots, the necessary number of samples is determined and a multiplier, k, is computed so that the inequality mean + ks < U, where U is the acceptance limit, indicates an acceptable lot. The parameters used in the program to calculate the multiplier are the CR and LTPD. This value of mean + ks, which is compared to the limit, is called the Test Statistic, or Ts. The value of Ts is a point near the upper end of the observed data distribution. If this value is less than the acceptance limit U, the lot has passed the Sampling Inspection by Variables Test, according to the criteria chosen for CR and LTPD.

The usual manner of applying this inspection method is to use tables giving the value of the sample size (N) and multiplier (k) for the selected values of CR and LTPD. The program instead uses the number of measured values (N) in the lot to compute k, and this value is used to calculate mean + ks.
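The help text does not spell out how k is computed from N, CR, and LTPD. As an illustration only, the classical construction for a one-sided variables plan with unknown standard deviation derives k from the noncentral t distribution; the sketch below uses that construction with hypothetical data, and the program's exact method may differ.

```python
import numpy as np
from scipy.stats import nct, norm

def k_multiplier(n: int, cr: float = 0.10, ltpd: float = 0.10) -> float:
    """Multiplier k for a one-sided variables acceptance plan (unknown sigma).

    k is chosen so that a lot whose true fraction defective equals LTPD
    is accepted with probability CR (the consumer's risk). This is the
    classical noncentral-t construction, assumed here for illustration.
    """
    delta = norm.ppf(1.0 - ltpd) * np.sqrt(n)          # noncentrality parameter
    return nct.ppf(1.0 - cr, df=n - 1, nc=delta) / np.sqrt(n)

# Hypothetical measurements and acceptance limit U.
data = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.6, 3.7, 4.3])
U = 6.0

n = len(data)
k = k_multiplier(n, cr=0.10, ltpd=0.10)
ts = data.mean() + k * data.std(ddof=1)                # Test Statistic Ts = mean + ks

print(f"N = {n}, k = {k:.3f}, Ts = {ts:.3f}")
print("Lot PASSES" if ts < U else "Lot FAILS")
```

As the sketch shows, tightening the test (smaller CR or LTPD) raises k, which pushes Ts higher and makes the lot harder to accept.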