Statistical Treatment Of Data
Two important, though often neglected, parts of an analysis are error analysis and the correct reporting of results. Results should always be reported along with some estimate of the errors involved. The best way to do this is to report the most likely value along with a confidence interval. The confidence interval gives the range of values thought to contain the "true" value. The statistical treatment of data bases the error estimation on firm theoretical principles. This laboratory exercise on treatment of data should help you understand and apply these principles.
Classification of errors
Two basic classes of error occur in measurements: systematic and random. Systematic errors have a determinate origin. Determinate means that there is a cause for the error, and that the error itself can be determined by performing an auxiliary measurement. Though possible, it is more often the case that determinate errors are not recognized until it is too late to make the auxiliary measurement. Systematic errors produce measurements that are either consistently high or consistently low, relative to the "true" value. These errors are always in the same direction and are often of the same magnitude. An example of this is a contaminant on a balance pan that always causes measured weights to be too high. Random errors are indeterminate in origin and cause a measured quantity to fluctuate around a central value. Indeterminate means that one is not able to determine the error by an auxiliary measurement, just as one cannot determine the outcome of flipping an (unbiased) coin prior to the actual event. Random errors vary in direction and magnitude from measurement to measurement.
The size of the error is often independent of the magnitude of the measurement. Errors of this type are called constant errors. With constant errors there is no relationship, or correlation, between the magnitude of the measurement and the magnitude of the error. A constant error becomes less significant as the magnitude of the measurement increases. On the other hand, the error may increase with the magnitude of the measurement. Errors of this type are called proportional errors. With proportional errors, increasing the sample size does not diminish the significance of the error.
Working with systematic errors
In general, a result based on the addition (or subtraction) of a number of values will have a systematic error that is the sum (or difference) of all the systematic errors of the individual measurements. For example, if a value Z is to be determined from the sum X + Y, and if X and Y have errors E_{X} and E_{Y}, respectively, then the measured Z is

Z + E_{Z} = (X + E_{X}) + (Y + E_{Y})

The error in Z, E_{Z}, found by subtracting Z = X + Y from the above, is

E_{Z} = E_{X} + E_{Y}
Note that in subtraction, where Z = X − Y, the error is E_{Z} = E_{X} − E_{Y}. If the errors in X and Y are constant, E_{X} = E_{Y}, and the error in Z is zero. This is the reason why many values are determined using the measurement-by-difference technique. For example, the volume delivered by a buret is determined by the difference in readings before and after fluid delivery, and the mass of a sample placed into a beaker is determined as the difference in beaker weight measurements before and after the sample has been added. The error-canceling effect of the measurement-by-difference technique strictly applies to constant determinate errors. It can also apply when proportional errors occur and the magnitudes of all measurements are about the same. Another technique, used to reduce the significance of constant errors, is to use sample sizes that are large compared to the error. In short, the key to the reduction of determinate errors is good laboratory technique.
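As an illustration, here is a minimal sketch (with hypothetical buret readings and a hypothetical constant offset) of how a constant determinate error cancels when a value is obtained by difference.

```python
# Minimal sketch: a constant systematic error cancels in measurement by difference.
# The readings and the 0.03 mL offset are hypothetical.

true_initial = 0.50      # mL, "true" buret reading before delivery
true_final = 24.50       # mL, "true" buret reading after delivery
constant_error = 0.03    # mL, same offset on every reading (e.g., a miscalibrated zero)

measured_initial = true_initial + constant_error
measured_final = true_final + constant_error

# The delivered volume is the difference of two readings, so the constant
# offset subtracts out.
delivered = measured_final - measured_initial
print(f"delivered = {delivered:.2f} mL")   # 24.00 mL, same as the error-free difference
```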
Data reduction with random errors
Although good laboratory technique may reduce systematic error, random errors cannot be
reduced by operator technique. However, the significance of random error is reduced with
repeated measurements and data reduction. Although measurement precision does not change,
the confidence we have in reporting the "true" value is enhanced with each
replicate measurement. In other words, one has more confidence in the reported value after
repeating the measurement several times. If one could repeat a measurement an infinite
number of times, one could, in theory, report the "true" value of the
measurement. This is because measurement errors tend to cancel due to their random nature.
In general, the significance of random error decreases in rough proportion to the inverse
square root of the number of measurements.
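The following short simulation (with an assumed "true" value and an assumed single-measurement error) illustrates this inverse-square-root behavior; the numbers and the helper function name are illustrative only.

```python
# Simulation sketch: the scatter of the mean of N replicates shrinks roughly as 1/sqrt(N).
import random
import statistics

random.seed(1)
true_value = 25.000   # hypothetical "true" weight, g
sigma = 0.10          # hypothetical random error of a single measurement, g

def scatter_of_mean(n, trials=2000):
    """Standard deviation of the mean of n replicate measurements, estimated by simulation."""
    means = [statistics.fmean(random.gauss(true_value, sigma) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)

for n in (1, 4, 16, 64):
    # Expect roughly sigma / sqrt(n): about 0.10, 0.05, 0.025, 0.0125
    print(n, round(scatter_of_mean(n), 4))
```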
Central tendency
When reporting central tendency for a series of measurements, we often use the mean
or average value. The symbol for the mean is the variable with a line over the top.
The formula for the mean of, for example, x is

x̄ = (1/N) Σ_{i=1}^{N} x_{i}

where the x_{i} are the individual measurements and N is the total number of replicate measurements of a value. The capital sigma indicates that we sum all of the x values; the range of values to be summed is from i = 1 to N. If, for example, we measured the weight of a sample 5 times, and the individual weight values were 25.234, 25.132, 24.976, 25.030, and 24.983 g, then the average would be

x̄ = (25.234 + 25.132 + 24.976 + 25.030 + 24.983) g / 5 = 25.071 g
The mean of replicate measurements is a good indication of the central tendency of the measurement value.
Reporting errors
It is also important to indicate a measure of the errors associated with a particular set of measurements. For example, when presented with a choice, measurements with higher error may be discounted in favor of more precise measurements. Unless the errors are reported, there is no basis for this qualitative decision making. Ways to report the errors associated with a set of measurements are illustrated below.
Range
The range of a data set is the absolute maximum difference observed in the data. It is calculated as the difference between the maximum, x_{max}, and minimum, x_{min}, values

R = |x_{max} − x_{min}|

The vertical bars indicate absolute value, i.e., positive values. For the weight measurement data set, the maximum and minimum values are x_{max} = 25.234 g and x_{min} = 24.983 g. The range is thus R = 25.234 − 24.983 = 0.251 g. The range is useful for qualitative evaluation of errors.
Maximum uncertainty
A maximum uncertainty may be used to report "worst-case" errors. The maximum uncertainty is the difference between a value calculated using the measurement data and that calculated using the measured data with their associated maximum errors. Maximum errors are often estimated from the quoted instrument precision; e.g., pipets, burets, volumetric flasks, scales, etc., often print an indication of error on the device. The maximum relative uncertainty is the maximum uncertainty divided by the calculated value. For example, E_{Z}/Z is the maximum relative uncertainty if E_{Z} = E_{X} + E_{Y}, where E_{X} and E_{Y} are the maximum uncertainties in the X and Y measurements used to calculate either Z = X − Y or Z = X + Y.
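A minimal sketch of such a worst-case estimate, using hypothetical buret readings and hypothetical quoted tolerances:

```python
# Maximum (worst-case) uncertainty for a value obtained by difference, Z = X - Y.
# Readings and tolerances are hypothetical.

x, e_x = 24.50, 0.02   # final buret reading (mL) and its quoted maximum error
y, e_y = 0.50, 0.02    # initial buret reading (mL) and its quoted maximum error

z = x - y              # delivered volume, 24.00 mL
e_z = e_x + e_y        # worst case: maximum errors add, 0.04 mL
relative = e_z / z     # maximum relative uncertainty, about 0.0017

print(f"Z = {z:.2f} mL, maximum uncertainty = {e_z:.2f} mL, relative = {relative:.4f}")
```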
Measurement variance
The formula for the measurement variance, s^{2}, is

s^{2} = Σ_{i=1}^{N} (x_{i} − x̄)^{2} / (N − 1)

The symbols are the same as those used in the mean calculation formula. The units of variance are the squared measurement units; if the measurements are in grams (g), the variance units are grams-squared (g^{2}). The variance of the data used to calculate the mean value is

s^{2} = [(25.234 − 25.071)^{2} + (25.132 − 25.071)^{2} + (24.976 − 25.071)^{2} + (25.030 − 25.071)^{2} + (24.983 − 25.071)^{2}] g^{2} / (5 − 1)

which when evaluated yields s^{2} = 0.012185 g^{2}.
Measurement standard deviation
The measurement standard deviation is more often used to indicate precision or probable error. The greater the standard deviation, the less precise the data. The measurement standard deviation is simply related to the measurement variance through

s = √(s^{2})

Units of the measurement standard deviation are the same as those of the measurements it is based on. For the weight measurements used above, the standard deviation is s = √(0.012185 g^{2}) = 0.11038 g.
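A short check of this arithmetic, using Python's standard statistics module (whose variance and stdev functions divide by N − 1, the correct degrees-of-freedom):

```python
# Verify the worked example for the five replicate weighings.
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]   # g

mean = statistics.fmean(weights)         # 25.071 g
variance = statistics.variance(weights)  # 0.012185 g^2 (sample variance, divides by N - 1)
stdev = statistics.stdev(weights)        # about 0.1104 g

print(f"mean = {mean:.3f} g, s^2 = {variance:.6f} g^2, s = {stdev:.5f} g")
```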
A word of caution: if you use predefined functions to perform statistical data analysis on a calculator, be sure that the variance formula uses the correct degrees-of-freedom. Some calculators divide by N and others divide by N − 1 when calculating the variance. "Scientific calculators" often calculate both. In this case, the s is the measurement standard deviation (N − 1 degrees-of-freedom) and the σ is the standard deviation calculated with N degrees-of-freedom. You should know what your calculator does. If you are not sure how your calculator performs the calculations, confirm that you get the results found in this narrative using the example data.
Relative standard deviation
Reporting a measurement standard deviation alone does not indicate the significance of the
probable errors. A more informative way to indicate probable error is to report relative
standard deviation, RSD for short. The RSD is the ratio of the measurement standard deviation to the mean of the quantity being measured

RSD = s / x̄

Notice that for constant random errors, the RSD decreases as the measurement size increases. For example, say that the standard deviation for measuring a delivered volume of either 10 or 40 mL from the same 50 mL buret is 0.10 mL. The relative standard deviation is 0.10 mL/10 mL = 0.010 and 0.10 mL/40 mL = 0.0025, respectively.

RSDs are unitless quantities. They are often reported as %, parts-per-thousand (ppth), or parts-per-million (ppm) errors. Multiply the RSD by 100 to get the % error; by 10^{3} to get the error in parts-per-thousand (ppth); by 10^{6} to get the error in parts-per-million (ppm).

For the weight example with a mean of 25.071 g and a standard deviation of 0.11038 g, the RSD is 0.11038 g / 25.071 g = 0.0044026. The % error is 0.44026% and the error in parts-per-thousand is 4.4026 ppth.
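A brief sketch of the same RSD calculation and its common scalings:

```python
# Relative standard deviation for the five weighings, as a fraction, %, and ppth.
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]   # g
rsd = statistics.stdev(weights) / statistics.fmean(weights)   # about 0.0044

print(f"RSD = {rsd:.5f}  ({100 * rsd:.2f} %, {1000 * rsd:.2f} ppth)")
```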
Degrees-of-freedom
The sum over the squared differences in the variance is divided by N − 1. The factor "N − 1" is called the degrees-of-freedom of the calculation. In this case, the degrees-of-freedom is one less than the total number of measurements because the variance calculation is based on all of the x_{i} replicate measurements and the mean. Since the mean is also based on all the replicate measurements, the summation in the measurement variance formula is effectively "double counting" one of the values. In fact, all N replicate measurements could be determined with knowledge of the mean and only N − 1 of the x values. Dividing by the N − 1 degrees-of-freedom in the variance formula is a way to account for this double counting.

Keep in mind that the degrees-of-freedom is one less than the number of measurements when doing these calculations. You will need this information to get the correct Student's-t value for calculating confidence intervals, as discussed below. The degrees-of-freedom are not always one less than the number of measurements; they depend on how many parameters are calculated from the data. The more parameters calculated, the more degrees of freedom are used up. In the case above, only one parameter (the mean) was calculated from the data.
Probability distributions
One can do a better job of reporting errors by making some assumptions regarding the
distribution of possible errors. The error probability density function is a
theoretical formula used to calculate the probability that a particular measurement will
be obtained. The most common probability density function to assume for random errors is the Gaussian distribution

P(x) = [1/(σ√(2π))] exp[−(x − µ)^{2}/(2σ^{2})]

The Gaussian distribution is expressed as a probability, P, which is a function of the measurement x, the "true" mean, µ, and the "true" standard deviation, σ ("true" values are written using Greek letters). The formula gives the probability that an individual measurement will differ from the "true" mean by a given amount as a result of random error. The random errors are characterized by the parameter σ.
The Gaussian distribution curve gives the relative probability of obtaining a particular measurement of x. The unitless x-axis is plotted relative to the "true" mean, in units of the "true" standard deviation. The curve shows that it is more likely to get a small error (small x − µ) than a large one (large x − µ). The most likely value is the mean itself (x = µ).

The Gaussian distribution has several interesting properties. First, the total area under the curve is equal to one. Distribution functions that possess this property are called normalized. This property allows one to determine the probability (out of one) of obtaining a measurement within a certain range. For example, the area between 0 and 1 standard deviation is 0.3413. Thus the probability (or chance) of obtaining a measurement between x = µ and x = µ + σ is 34.13%. Second, the curve is symmetrical about the mean: the probability for a negative offset is equal to that for a positive offset. This property is sometimes referred to as that of an even function. The probability of obtaining a measurement between −1 and +1 standard deviations, i.e., between x = µ − σ and x = µ + σ, is twice that of obtaining one between 0 and 1. Since the latter is 34.13%, the former is 68.26%.
Third, the distribution may only be perfectly known from measurement data in the limit of an infinite number of measurements, or by inference from well-understood experiments (like flipping coins, drawing cards, or counting molecules). Since this is rarely the case, the Gaussian distribution is most often used as a model of the ideal measurement situation. Because one cannot make an infinite number of measurements, the error formulas used are approximations to the "true" error distribution. The measurement average or mean, x̄, is an estimate of the "true" mean, µ, and the measurement standard deviation, s, is an estimate of the "true" standard deviation, σ. Much of statistics is concerned with how good these approximations are. One thing is certain: the measurement mean and standard deviation are equal to the "true" values only in the limit as N approaches infinity.
Estimation with Gaussian distribution
With knowledge of the "true" mean and standard deviation of a set of
measurements with random errors, the range over which a certain fraction of the
measurements occur can be specified. For instance, the range of x values over which
95% of the measurements occur can be found from the area under the Gaussian distribution.
The area under the Gaussian distribution curve is found by integration. For the 95% range,

∫_{µ − 1.96σ}^{µ + 1.96σ} P(x) dx = 0.95

Note that the integration limits are µ ± 1.96σ. Since the area for this integration is 0.95, and the total area under the curve is 1, there is a 95% probability that the actual measurement value will differ from the "true" mean by no more than ±1.96σ due to random error only.

Clearly, the greater the "true" standard deviation, the greater the range over which 95% of the measurements will fall. For example, if the "true" mean is 25.000 g and the "true" standard deviation is 0.5102 g, then 95% of the measurements should be between 25 ± 1 g. Put another way, one is 95% confident that a single measurement will be within the 25 ± 1 g range. For this case, ±1 g is the confidence interval, or range, at a 95% confidence level. If, on the other hand, the "true" standard deviation were 0.05102 g, then one would state that one is 95% confident that a single measurement would be in the range 25.0 ± 0.1 g. The smaller standard deviation results in a smaller confidence interval.
Probable measurement values are conveniently indicated by

x = µ ± 1.96σ (95% confidence level)

It is interesting to write this result in a different fashion:

µ = x ± 1.96σ (95% confidence level)

This formula indicates that one is 95% confident that the "true" mean is within ±1.96σ of any given measurement. There is also a 5% chance that the measurement will be outside the 95% confidence range. Put a different way, if the measured x value differs from the "true" mean by more than this, then there is a 95% chance that something other than random error has corrupted the measurement. It would then be time to consider sources of systematic error.
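The areas quoted above can be checked numerically with the error function; a minimal sketch:

```python
# Fraction of Gaussian-distributed measurements falling within z standard
# deviations of the "true" mean: erf(z / sqrt(2)).
import math

def gaussian_coverage(z):
    """Probability that |x - mu| <= z * sigma for Gaussian random errors."""
    return math.erf(z / math.sqrt(2))

print(gaussian_coverage(1.0))    # about 0.6827, i.e. 68.26% within +/- 1 sigma
print(gaussian_coverage(1.96))   # about 0.9500, i.e. 95% within +/- 1.96 sigma
```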
Confidence intervals
In the real world, where N is less than infinity, errors associated with estimates
of the "true" mean and standard deviation result in more uncertainty in the
confidence interval than is indicated by the Gaussian distribution. In general, the fewer
the measurements, the less the confidence level that can be assigned to a particular
interval. Similarly, fewer measurements also means larger confidence intervals for a given
confidence level.
These concepts are quantitatively expressed in the "Student's-t" statistic. The Student's-t number is an integral over a distribution function similar to the Gaussian. It indicates the probability that the mean found from a finite number of measurements will differ from the "true" mean by a given amount. A useful form of the Student's-t formula is

µ = x̄ ± t s / √N

where t is the Student's-t number. One looks up t in a table for a given confidence level and number of degrees-of-freedom.

Ideally, one would really like to report the "true" mean. But, due to random errors, it is not possible to specify the "true" mean as a single number. Instead, one uses the Student's-t formula in the form given above to specify the "true" mean. The "true" mean is reported as the measurement mean, or average, together with the confidence interval for that reported value, at a particular confidence level. For example,

µ = x̄ ± t s / √N (95% confidence level)

is the way one reports the "true" mean at the 95% confidence level.
Let's use the 5 weight measurements given above as an example. The measurement mean and standard deviation were 25.071 g and 0.11038 g, respectively. We will report the "true" mean at a 95% confidence level. The Student's-t value for a 95% confidence level and the appropriate degrees-of-freedom is obtained from the table below. In this case, there are 4 degrees-of-freedom (one is taken up in the calculation of the mean). The appropriate table value is t = 2.776. Substituting these values into the Student's-t formula, the "true" mean estimate is

µ = 25.071 ± (2.776)(0.11038)/√5 g

which when numerically evaluated yields

µ = 25.071 ± 0.137 g (95% confidence level)

The numbers may be rounded off to indicate precision. Since the confidence interval of the estimated mean indicates uncertainty in the first digit to the right of the decimal, i.e., a confidence interval of ±0.1…, there is no need to express the result to any greater precision than implied by the most significant digit of the interval. Thus

µ = 25.1 ± 0.1 g (95% confidence level)

gives an adequate indication of the "true" mean.
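A minimal sketch of the same confidence-interval calculation; the t value (2.776 for 4 degrees-of-freedom at 95% confidence) is taken from the table below.

```python
# Confidence interval for the "true" mean from the five replicate weighings.
import math
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]   # g
t_95_4dof = 2.776                                     # from the Student's-t table below

mean = statistics.fmean(weights)
s = statistics.stdev(weights)
half_width = t_95_4dof * s / math.sqrt(len(weights))  # about 0.137 g

print(f"true mean = {mean:.3f} ± {half_width:.3f} g (95% confidence)")
```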
Student's-t Values (rows: degrees-of-freedom; columns: confidence level)

Degrees    50%      90%      95%      99%
1          1.000    6.314    12.71    63.66
2          0.816    2.920    4.303    9.925
3          0.765    2.353    3.182    5.841
4          0.741    2.132    2.776    4.604
5          0.727    2.015    2.571    4.032
6          0.718    1.943    2.447    3.707
7          0.711    1.895    2.365    3.500
8          0.706    1.860    2.306    3.355
9          0.703    1.833    2.262    3.250
10         0.700    1.812    2.228    3.169
20         0.687    1.725    2.086    2.845
infinite   0.674    1.645    1.960    2.576
Significance testing
A number of statistical tests are available to check for significant differences between
measurement values. Two common tests used in measurement science are the "Q-test", for rejecting suspect data points, and the "Student's-t test", for determining differences between means.
Q-Test
Very often, when examining the results of a set of measurements, one finds that there is one datum that appears to be different from the others. The question is: should this datum be rejected? If a reason for rejection cannot be found after critical examination of the evidence (hopefully as recorded in a laboratory notebook), then one must resort to statistical tests. The Q-test is a statistical test used to determine whether or not a suspect datum can be rejected from a data set when the total number of measurements is less than 10.
The Q-test is based on the ratio of the interval between the suspect datum and the datum closest to it in value, to the range of the data set. The range is the difference between the minimum and maximum data points. The ratio of these differences is the Q statistic.
In performing the test, one "formulates a null hypothesis" and then checks to see if the hypothesis is invalid. (One cannot prove that it is true; it can only be shown to be false.) In this case, the null hypothesis is that the Q value calculated from a data set including the suspect datum is not statistically different from an extreme Q value from a normally behaved data set. If the calculated Q value is less than that of the regular data, then the null hypothesis stands; the datum cannot be rejected based on statistical evidence. If, on the other hand, the calculated Q is greater than that for normal data, then the null hypothesis is false. This means that the suspect datum is not from a normal data set and may be rejected. Q values for normal data are tabulated according to confidence level and number of measurements. The Q-test is outlined below.
Step 1: Calculate a Q value using

Q_{calc} = |x_{suspect} − x_{nearest}| / (x_{max} − x_{min})
The suspect datum will be one of the terms in the range calculation since it is suspect because of its extreme value.
Step 2: Look up the value of Q in the table corresponding to the number of measurements at a given confidence level. This value, Q_{table}, is the extreme Q value expected from data containing only random errors.
Step 3: If Q_{calc} > Q_{table}, then the suspect value can be rejected. All other statistical quantities, such as the mean and the standard deviation, are then calculated from the remaining values. If the opposite is true (Q_{calc} < Q_{table}), then the suspect datum remains in the data set.
Keep in mind that the rejected datum may be valid (anything is possible in statistics). But, although valid, including this datum would unduly influence calculations of mean and standard deviation. There is thus good reason to apply the Q-test when a datum is suspected of being different. Also, use this test only once. There are better tests for rejecting more than one datum. Multiple application of the Q-test may lead to serious errors.
Q (rejection quotient) 90% confidence 

N  Q 
3  0.94 
4  0.76 
5  0.64 
6  0.56 
7  0.51 
8  0.47 
9  0.44 
10  0.41 
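A minimal sketch of the Q-test at 90% confidence, using the table values above; the data set, the suspect point, and the helper function are hypothetical.

```python
# Q-test sketch: test the most extreme point in a small data set.
Q_TABLE_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(data, confidence_table=Q_TABLE_90):
    """Return (q_calc, q_table, reject) for the most extreme point in data."""
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]
    # The suspect datum is whichever end point is farther from its nearest neighbor.
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    q_calc = max(gap_low, gap_high) / data_range
    q_table = confidence_table[len(data)]
    return q_calc, q_table, q_calc > q_table

# Example: five replicate volumes (mL) with one suspiciously low value.
volumes = [24.95, 25.02, 25.04, 25.06, 24.10]
print(q_test(volumes))   # q_calc ≈ 0.885 > 0.64, so the 24.10 mL point may be rejected
```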
Student's-t test
The Student's-t statistic is useful for comparing data sets of finite number that have random errors characterized by a Gaussian distribution. It is the correct metric for comparing most real data and can be used for a variety of tests. For example, it may be used to compare measured means obtained in different experiments, or to determine to what confidence level two estimated "true" means are the same. It may also be used to test whether or not a suspect point may be rejected from a data set by using subsets. The use of Student's-t to test for differences between measured and "true" means will be illustrated here.
As with the Q-test, Student's-t tests require formulation of a null hypothesis. Under the null hypothesis, all data are the same, so the data can be manipulated as a combined set under this assumption. A value for Student's-t is calculated from the combined data and compared to table values, which are based on normal, Gaussian-distributed data. If the calculated Student's-t is statistically different from the table value, then the null hypothesis is false at a particular confidence level. A false null hypothesis indicates that the two sets of data are different.
A different form of the Student's-t formula is needed for the test. Rearranging the formula for reporting the "true" mean with its confidence interval gives

t_{calc} = |µ − x̄| √N / s
This formula is used to calculate a value for t based on a "true" or comparison mean, and the measurement mean, standard deviation, and number of data. The actual "test" is performed using the following steps.
Step 1: Determine t_{calc} from the "true" (or comparison) mean and the measurement mean, standard deviation, and number of data using the formula given above.

Step 2: Compare the calculated t (t_{calc}) to one from the table of Student's-t values (t_{table}) for a particular confidence level and the degrees-of-freedom of the measurement.

Step 3: Test the null hypothesis by comparing the two t values. If the calculated value is greater than the table value (t_{calc} > t_{table}), then the null hypothesis is false to within the confidence level of the table value. In this case, the means are different. That is, the variation from the reported value is greater than you would expect from random error alone, and something is likely wrong with your experiment. Otherwise (t_{calc} < t_{table}), the null hypothesis is not shown to be false, and the two means are not different at the chosen confidence level.
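A minimal sketch of this comparison for the five weighings; the accepted ("true") value used here is hypothetical.

```python
# Student's-t comparison of a measured mean with an accepted value.
import math
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]   # g, the five weighings
accepted = 25.000                                     # g, hypothetical "true" mean
t_table = 2.776                                       # 95% confidence, 4 degrees-of-freedom

mean = statistics.fmean(weights)
s = statistics.stdev(weights)
t_calc = abs(accepted - mean) * math.sqrt(len(weights)) / s   # about 1.44

if t_calc > t_table:
    print(f"t_calc = {t_calc:.2f} > {t_table}: the means differ at 95% confidence")
else:
    print(f"t_calc = {t_calc:.2f} <= {t_table}: no significant difference at 95% confidence")
```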
Summary
The proper treatment of error in data is critical in any experimental science. The mere
reporting of numbers without any indication of how reliable the numbers are is useless.
The application of any of the statistical procedures described in this laboratory exercise
depends, first of all, on acquiring multiple data for any process or phenomenon measured.
Without at least three trials for each result, error analysis is reduced to a tedious and
often phony recitation of possible error sources that is boring to write and even more
boring to read. In order to perform a scientifically valid error analysis, you must make
your measurements at least three times.