There are many uses for this software in the world of data interpretation. If you are a manufacturer and need to test the quality of your product, the tests may destroy the product, making it impossible to test 100% of the products. If you are a dairy, the laboratory tests of the milk for microbes and fat content make the tested milk undrinkable, so if you tested all of the milk, there would be none left to sell to the public. In this case, you must sample a percentage of the milk and interpret the test data in a way that ensures all of the milk bottled at the time meets the regulator's and your own quality standards. Even if the tests for quality are nondestructive, testing 100% may be too expensive and time consuming, so you still need statistical analysis of data sampled from a small portion of the lot.
ProbPlot 3.0 (formerly Cumulative Probability Plot) is a useful tool for these purposes. Not only does it do all of the statistical mathematics for you, it also presents the data in a visual format that can be easily interpreted by people with a limited knowledge of statistics. The graph is also useful for quickly demonstrating to regulators or other interested parties that your product or process is within acceptable tolerance or meets regulatory limits.
If you are releasing a nuclear site under MARSSIM, you will still find the ProbPlot graphs a useful addition to your final status survey report to visually demonstrate that your site is clean. This is important for interested parties who do not understand the complexities of MARSSIM and are not statistics experts, such as members of the public who are your neighbors. Many regulators also appreciate how the graphs reinforce your MARSSIM conclusions about meeting the DQO.
One look at a ProbPlot graph can tell the same story that all of the other report sections and paragraphs tell in words (one picture is worth a thousand words, or every picture tells a story). For nuclear final status surveys, it can also help you detect a faulty survey instrument. For instance, if an instrument has a cable that is going bad, it may malfunction only occasionally when moved around during the survey, yet behave normally when sitting still during background and source checks, which may be performed three times a day. One look at the graph would tell you something is wrong with the data. Likewise, if your cleanup effort was not thorough, the shape of the graph will tell you that at a glance.
If most of the survey or sample points are below the release limit and only a few places are above it, you will get the expected straight line, but the outlier points will stand out on the graph. If a few points are far enough out to affect the shape of the expected straight line, the software allows you to eliminate those few data points and see how the graph looks without them. This can show you how much cleaning up just those few small hot spots will increase the probability that your site is below the release limits.
The software is also useful for looking at very large datasets. A manufacturer may test their product and review the data on a weekly or monthly basis, but it may also be useful to look at the data from several years or even several decades. Raw statistical output from 10,000 or more data points can be difficult to interpret. ProbPlot allows you to plot over 65,000 data points on one graph, which can tell you a story about your data instantly, at a glance.
You can easily get the data into the software by copying it from a spreadsheet and pasting it into ProbPlot, or by importing a comma-separated value (.CSV) file. After selecting options for the graph, you may copy the graph to the clipboard as a picture and paste it into a word processor such as Word or presentation software such as PowerPoint.
The software was first developed to demonstrate to regulators and the public that nuclear facilities were clean, below regulatory limits, after decommissioning (now called remediation). Today, a US guidance document called MARSSIM (Multi-Agency Radiation Survey and Site Investigation Manual) is the accepted standard for the statistical tests used to demonstrate that the Data Quality Objective (DQO) has been met for a nuclear facility remediation effort. Back in the late 1970s and early 1980s, MARSSIM did not exist, nor was there any written regulatory guidance for the release of former nuclear facilities. Each decommissioning project had to come up with its own way of statistically proving the facility was clean and have the plan approved by the overseeing agency.
Robert J. Tuttle was the manager of Radiation and Nuclear Safety (also known as Health Physics or Radiation Protection) and a statistics expert at Atomics International (AI) during that period. He developed ProbPlot as a tool to help demonstrate that decommissioned AI research nuclear reactors and nuclear fuel fabrication facilities were releasable for future non-nuclear use. When the Windows operating system was released in the early 1990s, Brian Oliver, Ph.D., took the Tuttle FORTRAN code for MS-DOS and rewrote it with a graphical user interface for MS Windows. That version was 16-bit, and when Windows 2000 was released shortly after the new millennium, the software did not work on all Windows 2000 machines. We took the 16-bit version and rewrote it for 32-bit machines, adding enhancements such as the ability to resize and rotate the graph before copying, pasting, or printing; the ability to change the graph fonts; expanded data input of over 65,000 points; and a help system.
Version 3.0 of ProbPlot was recently tested on 64-bit versions of Windows Vista and Windows 7, running in 32-bit (WOW64) mode, in conjunction with Office 2007. Everything works just as it does on older operating systems, with the exception of the Help system. The first time you open the Help file in Vista or Windows 7, a window pops up with a link to Microsoft to download and install a help reader that does not ship with those operating systems. Once you install this file, the help system works as expected.
Remember, ProbPlot is not just for nuclear site release. It is also useful for looking at any similar data, such as business metrics, the brightness of stars in faraway galaxies, or tolerance measurements of complex parts coming off an assembly line.
This section is an excerpt from the Help file included with the software.

This program is designed to interpret the results of a sampling inspection, for the purpose of judging compliance with chosen limits. It may also be used to identify outlying values or departure from the assumed (Gaussian or Student's-t) distribution. Uncertainties for the individual values may be entered, and a mean and standard deviation for the set are calculated. The statistical test is based on a selection of the "Consumer's Risk" (CR) and the "Lot Tolerance Percent Defective" (LTPD), so that the stringency of the test may be adjusted. Typical values are CR = 0.1 and LTPD = 10%. More stringent tests may be made by choosing smaller values of the CR and LTPD. Confidence limits for the fitted distribution may be plotted for identification of outlying points, values that do not fit the distribution.
Statistical analysis is used to convert a large amount of data into a manageable amount of understandable information. This process can involve a variety of techniques, the simplest being to determine the average (or mean) value for a given set of data. This simple determination is improved upon by also calculating the standard deviation of the data about the mean, which gives an estimate of the variability of the data. In many cases, this variability represents variations both in the characteristics or values being measured and in measurement technique fluctuations.
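For instance, here is a minimal Python sketch of these two calculations (the data values are hypothetical, and this is an illustration rather than the program's own code):

```python
import numpy as np

# Hypothetical measurement values from a sampled lot.
data = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.6, 3.7, 4.3])

mean = data.mean()
# Sample standard deviation (ddof=1): the usual estimate of the
# variability of the data about the mean.
std = data.std(ddof=1)

print(f"mean = {mean:.2f}, standard deviation = {std:.2f}")
```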
The significance of these quantities (mean and standard deviation) depends upon the distribution assumed for the data. Sometimes there is a theoretically known distribution for a particular measurement process, such as the binomial or Poisson distribution for counting radioactivity. These distributions are relatively well approximated by the Gaussian, or normal, distribution. In fact, the Gaussian distribution approximates the distribution of many different kinds of measurements and, for simplicity, is generally assumed to be the proper distribution. The Gaussian distribution is generally seen in the form of a bell-shaped curve, with most values occurring near the mean value and fewer and fewer values existing at increasing distances from the mean, both greater and less than the mean.
However, it is difficult to derive the bell-shaped curve from experimental data unless the data are specifically selected to demonstrate the curve, and deviations from the distribution are difficult to see. A better presentation is the so-called "cumulative probability function" utilized in this program, which forms an S-shaped curve when plotted in the usual manner. This can be further improved by adjusting the abscissa (the "X" values in an X-Y graph) so that the "S" curve becomes a straight line. This is a standard statistical technique and is the basis for the special graph paper used for probability analysis of data.
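To illustrate the straightening technique, here is a minimal Python sketch of the standard construction (not the program's own code): the sorted data are plotted against the normal quantiles of their cumulative plotting positions, so Gaussian data fall along a straight line.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=200)  # example measurements

x = np.sort(data)
n = len(x)
# Cumulative plotting positions (Hazen positions; other conventions exist).
p = (np.arange(1, n + 1) - 0.5) / n
# Transform the probability axis to normal quantiles, the same trick used
# by probability graph paper: the S-shaped curve becomes a straight line.
z = norm.ppf(p)

plt.plot(z, x, "o", label="data")
# A Gaussian sample should follow the line: mean + z * standard deviation.
plt.plot(z, x.mean() + z * x.std(ddof=1), "-", label="fitted Gaussian")
plt.xlabel("normal quantile")
plt.ylabel("measured value")
plt.legend()
plt.show()
```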
The parameters of the Gaussian distribution (the mean and the standard deviation) are determined by the usual calculational methods. Where the data are not well represented by a Gaussian distribution (and this is true in most cases), the departure is readily apparent; the data points do not lie along the straight line representing the Gaussian distribution.
In most cases, this departure takes a single typical form. Much of the data lies along the theoretical straight line, with a few points at either extreme lying somewhat above it. This form can usually be interpreted as showing a large number of uncontaminated measurements, where the variability is due to random fluctuations in the measurements themselves, with the balance being locations that harbor more or less residual contamination. If the contaminated area is large, there will be many points departing from the curve; in these cases, the points will not fit the theoretical straight line. If most of the region in question is contaminated, the distribution will be dominated by the contaminated data points, in a line of points generally sloping from the lower left to the upper right and fitting a theoretical straight line more or less closely.
This program is used to provide a sampling inspection test. It uses a standard quality control technique called inspection by variables, in which the distribution of the measured values is used to predict the probability that other, unmeasured values would exceed a specified limit. The standard test method requires calculating the mean and the standard deviation (s). Then, depending on the values chosen for certain parameters that reflect the performance of the test in accepting bad lots or rejecting good lots, the necessary number of samples is determined and a multiplier, k, is computed so that the inequality mean + ks < U, where U is the acceptance limit, indicates an acceptable lot. The parameters used in the program to calculate the multiplier are the CR and LTPD. This value of mean + ks, which is compared to the limit, is called the Test Statistic, or Ts. The value of Ts is a point near the upper end of the observed data distribution. If this value is less than the acceptance limit U, the lot has passed the Sampling Inspection by Variables Test, according to the criteria chosen for CR and LTPD.

The usual manner of applying this inspection method is to use tables giving the value of the sample size (N) and multiplier (k) for the selected values of CR and LTPD. The program instead uses the number of measured values (N) in the lot to compute k, and this value is used to calculate mean + ks.
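The help text does not spell out how k is computed from N, CR, and LTPD. As an illustration only, the classical construction for a one-sided variables plan with unknown standard deviation derives k from the noncentral t distribution; the sketch below uses that construction with hypothetical data, and the program's exact method may differ.

```python
import numpy as np
from scipy.stats import nct, norm

def k_multiplier(n: int, cr: float = 0.10, ltpd: float = 0.10) -> float:
    """Multiplier k for a one-sided variables acceptance plan (unknown sigma).

    k is chosen so that a lot whose true fraction defective equals LTPD
    is accepted with probability CR (the consumer's risk). This is the
    classical noncentral-t construction, assumed here for illustration.
    """
    delta = norm.ppf(1.0 - ltpd) * np.sqrt(n)          # noncentrality parameter
    return nct.ppf(1.0 - cr, df=n - 1, nc=delta) / np.sqrt(n)

# Hypothetical measurements and acceptance limit U.
data = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.6, 3.7, 4.3])
U = 6.0

n = len(data)
k = k_multiplier(n, cr=0.10, ltpd=0.10)
ts = data.mean() + k * data.std(ddof=1)                # Test Statistic Ts = mean + ks

print(f"N = {n}, k = {k:.3f}, Ts = {ts:.3f}")
print("Lot PASSES" if ts < U else "Lot FAILS")
```

As the sketch shows, tightening the test (smaller CR or LTPD) raises k, which pushes Ts higher and makes the lot harder to accept.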