Statistical Treatment Of Data

Two important, though often neglected, parts of an analysis are error analysis and the correct reporting of results. Results should always be reported along with some estimate of the errors involved. The best way to do this is to report the most likely value along with a confidence interval. The confidence interval gives the range of values thought to contain the "true" value. The statistical treatment of data bases the error estimation on firm theoretical principles. This laboratory exercise on treatment of data should help you understand and apply these principles.

Classification of errors
Two basic classes of error occur in measurements: systematic and random. Systematic errors have a determinate origin. Determinate means that there is a cause for the error, and that the error itself can be determined by performing an auxiliary measurement. Though such determination is possible, it is more often the case that determinate errors are not recognized until it is too late to make the auxiliary measurement. Systematic errors produce measurements that are either consistently high or consistently low relative to the "true" value. These errors are always in the same direction and are often of the same magnitude. An example of this is a contaminant on a balance pan that always causes measured weights to be too high. Random errors are indeterminate in origin and cause a measured quantity to fluctuate around a central value. Indeterminate means that one is not able to determine the error by an auxiliary measurement, just as one cannot determine the outcome of flipping an unbiased coin prior to the actual event. Random errors vary in direction and magnitude from measurement to measurement.

The size of a systematic error is often independent of the magnitude of the measurement. Errors of this type are called constant errors: there is no relationship, or correlation, between the magnitude of the measurement and the magnitude of the error. A constant error becomes less significant as the magnitude of the measurement increases. On the other hand, the error may increase with the magnitude of the measurement. Errors of this type are called proportional errors, and in this case increasing the sample size does not diminish the significance of the error.

Working with systematic errors
In general, a result based on the addition (or subtraction) of a number of values will have a systematic error that is the sum (or difference) of all the systematic errors of the individual measurements. For example, if a value Z is to be determined from the sum X+Y, and if X and Y have errors EX and EY, respectively, then the measured Z is

    Z + EZ = (X + EX) + (Y + EY)

The error in Z, EZ, found by subtracting Z = X+Y from the above, is

    EZ = EX + EY

Note that in subtraction, where

    Z = X - Y

the error is EZ = EX - EY. If the errors in X and Y are constant, EX = EY, and the error in Z is zero. This is the reason why many values are determined using the measurement-by-difference technique. For example, the volume delivered by a buret is determined from the difference in readings before and after fluid delivery, and the mass of a sample placed into a beaker is determined as the difference in beaker weight measurements before and after the sample has been added. The error-canceling effect of the measurement-by-difference technique strictly applies to constant determinate errors. It can also apply when proportional errors occur and the magnitudes of all measurements are about the same. Another technique, used to reduce the significance of constant errors, is to use sample sizes that are large compared to the error. In short, the key to reduction of determinate errors is good laboratory technique.
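The cancellation can be sketched in a few lines of Python; the 0.05 g bias and the beaker masses are hypothetical values, not part of the exercise:

```python
# Hypothetical constant systematic error: a contaminated balance pan
# biases every reading high by the same 0.05 g.
bias = 0.05  # g

true_beaker = 30.000  # true beaker mass, g (hypothetical)
true_total = 55.000   # true beaker + sample mass, g (hypothetical)

reading_beaker = true_beaker + bias  # both readings carry the same bias
reading_total = true_total + bias

# Mass by difference: the constant bias cancels, recovering the true 25.000 g.
sample_mass = reading_total - reading_beaker
print(round(sample_mass, 3))
```

The same cancellation would not occur for a proportional error, since the two readings would then carry biases of different size.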

Data reduction with random errors
Although good laboratory technique may reduce systematic error, random errors cannot be reduced by operator technique. However, the significance of random error is reduced with repeated measurements and data reduction. Although measurement precision does not change, the confidence we have in reporting the "true" value is enhanced with each replicate measurement. In other words, one has more confidence in the reported value after repeating the measurement several times. If one could repeat a measurement an infinite number of times, one could, in theory, report the "true" value of the measurement. This is because measurement errors tend to cancel due to their random nature. In general, the significance of random error decreases in rough proportion to the inverse square-root of the number of measurements.
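This 1/sqrt(N) behavior can be illustrated with a small simulation (the "true" value, noise level, seed, and trial count are arbitrary choices for the sketch):

```python
import math
import random

random.seed(0)
true_value, sigma = 25.0, 0.10  # arbitrary "true" value and noise level

def spread_of_mean(n, trials=2000):
    """Empirical standard deviation of the mean of n noisy replicates."""
    means = [sum(random.gauss(true_value, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    mu = sum(means) / trials
    return math.sqrt(sum((m - mu) ** 2 for m in means) / (trials - 1))

# Quadrupling the number of measurements roughly halves the spread of the mean.
for n in (1, 4, 16):
    print(n, round(spread_of_mean(n), 4))
```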

Central tendency
When reporting the central tendency of a series of measurements, we often use the mean, or average, value. The symbol for the mean is the variable with a bar over the top. The formula for the mean of x, for example, is

    x̄ = (1/N) Σ(i=1 to N) xi

where the xi are the individual measurements and N is the total number of replicate measurements of a value. The capital sigma indicates that we sum all of the x values; the range of values to be summed is from 1 to N. If, for example, we measured the weight of a sample 5 times, and the individual weight values were 25.234, 25.132, 24.976, 25.030, and 24.983 g, then the average would be

    x̄ = (25.234 + 25.132 + 24.976 + 25.030 + 24.983)/5 = 25.071 g

The mean of replicate measurements is a good indication of the central tendency of the measurement value.
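A quick check of this average in Python:

```python
# The five replicate weights from the text, in grams.
weights = [25.234, 25.132, 24.976, 25.030, 24.983]

mean = sum(weights) / len(weights)
print(round(mean, 3))  # 25.071
```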

Reporting errors
It is also important to indicate a measure of the errors associated with a particular set of measurements. For example, when presented with a choice, measurements with higher error may be discounted in favor of more precise measurements. Unless the errors are reported, there is no basis for this qualitative decision making. Ways to report the errors associated with a set of measurements are illustrated below.

The range of a data set is the absolute maximum difference observed in the data. It is calculated as the difference between the maximum, xmax, and minimum, xmin, values

    R = |xmax - xmin|

The vertical bars indicate absolute value, i.e., a positive value. For the weight measurement data set, the maximum and minimum values are xmax = 25.234 g and xmin = 24.983 g. The range is thus R = 25.234 g - 24.983 g = 0.251 g. The range is useful for qualitative evaluation of errors.

Maximum uncertainty
A maximum uncertainty may be used to report "worst-case" errors. The maximum uncertainty is the difference between a value calculated using the measurement data and that calculated using the measured data with their associated maximum errors. Maximum errors are often estimated from the quoted instrument precision; e.g., pipets, burets, volumetric flasks, balances, etc., often have an indication of error printed on the device. The maximum relative uncertainty is the maximum uncertainty divided by the calculated value. For example, EZ/Z is the maximum relative uncertainty if EZ = EX + EY and EX and EY are the maximum uncertainties in the X and Y measurements used to calculate either Z = X - Y or Z = X + Y.

Measurement variance
The formula for the measurement variance, s², is

    s² = Σ(i=1 to N) (xi - x̄)²/(N - 1)

The symbols are the same as those used in the mean calculation formula. The units of the variance are the squared measurement units: if the measurements are in grams (g), the variance units are grams-squared (g²). The variance of the data used to calculate the mean value is

    s² = [(25.234 - 25.071)² + (25.132 - 25.071)² + (24.976 - 25.071)² + (25.030 - 25.071)² + (24.983 - 25.071)²]/(5 - 1)

which when evaluated yields s² = 0.012185 g².
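The same calculation sketched in Python (note the N - 1 in the denominator):

```python
weights = [25.234, 25.132, 24.976, 25.030, 24.983]  # g

mean = sum(weights) / len(weights)
variance = sum((x - mean) ** 2 for x in weights) / (len(weights) - 1)
print(round(variance, 6))  # 0.012185
```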

Measurement standard deviation
The measurement standard deviation is more often used to indicate precision or probable error. The greater the standard deviation, the less precise the data. The measurement standard deviation is simply related to the measurement variance through

    s = √(s²)

Units of the measurement standard deviation are the same as those of the measurements that it is based on. For the weight measurements used above, the standard deviation is s=0.11038 g.

A word of caution: if you use predefined functions on a calculator to perform statistical data analysis, be sure that the variance formula uses the correct degrees-of-freedom. Some calculators divide by N and others by N-1 when calculating the variance. "Scientific calculators" often calculate both; in that case, s is the measurement standard deviation and σ is the standard deviation calculated with N degrees-of-freedom. You should know what your calculator does. If you are not sure how your calculator performs the calculations, confirm that you can reproduce the results found in this narrative using the example data.
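Python's standard library makes the same N versus N-1 distinction the calculator note warns about: `statistics.stdev` divides by N-1 (as in this text), while `statistics.pstdev` divides by N.

```python
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]  # g

s = statistics.stdev(weights)    # N - 1 in the denominator (used in this text)
sn = statistics.pstdev(weights)  # N in the denominator

print(round(s, 5))   # ~0.11039 (the text's 0.11038 is truncated, not rounded)
print(round(sn, 5))  # ~0.09873
```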

Relative standard deviation
Reporting a measurement standard deviation alone does not indicate the significance of the probable errors. A more informative way to indicate probable error is to report the relative standard deviation, RSD for short. The RSD is the ratio of the measurement standard deviation to the mean of the quantity being measured

    RSD = s/x̄

Notice that for constant random errors, the RSD decreases with measurement size. For example, say that the standard deviation for delivering either 10 mL or 40 mL from the same 50 mL buret is 0.10 mL. The relative standard deviations are 0.10 mL/10 mL = 0.010 and 0.10 mL/40 mL = 0.0025, respectively.

The RSD is a unitless quantity. RSDs are often reported as %, parts-per-thousand (ppth), parts-per-million (ppm), etc., errors. Multiply the RSD by 100 to get the % error; by 10³ to get the error in parts-per-thousand (ppth); by 10⁶ to get the error in parts-per-million (ppm).

For the weight example, with a mean of 25.071 g and a standard deviation of 0.11038 g, the RSD is 0.11038 g/25.071 g = 0.0044026. The % error is 0.44026% and the error in parts-per-thousand is 4.4026 ppth.
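The RSD and percent error for the example data:

```python
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]  # g

rsd = statistics.stdev(weights) / statistics.mean(weights)
print(round(100 * rsd, 2))   # percent error, ~0.44 %
print(round(1000 * rsd, 1))  # parts-per-thousand, ~4.4 ppth
```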

The sum over the squared differences in the variance is divided by N-1. The factor N-1 is called the degrees-of-freedom of the calculation. In this case, the degrees-of-freedom is one less than the total number of measurements because the variance calculation is based on all of the xi replicate measurements and the mean. Since the mean is itself based on all the replicate measurements, the summation in the measurement variance formula is effectively "double counting" one of the values. In fact, all N replicate measurements could be recovered with knowledge of the mean and only N-1 of the x values. Dividing by the N-1 degrees-of-freedom in the variance formula is a way to account for this double counting.

Keep in mind that the degrees-of-freedom here is one less than the number of measurements. You will need this information to get the correct Student's-t value for calculating confidence intervals, as discussed below. Degrees-of-freedom are not always one less than the number of measurements; they depend on how many parameters are calculated from the data. The more parameters calculated, the more degrees of freedom are used up. In the case above, only one parameter (the mean) was calculated from the data.

Probability distributions
One can do a better job of reporting errors by making some assumptions regarding the distribution of possible errors. The error probability density function is a theoretical formula used to calculate the probability that a particular measurement will be obtained. The most common probability density function to assume for random errors is the Gaussian distribution

    P(x) = [1/(σ√(2π))] exp[-(x - μ)²/(2σ²)]

The Gaussian distribution is expressed as a probability, P, which is a function of the measurement x, the "true" mean, μ, and the "true" standard deviation, σ ("true" values are given using Greek letters). The formula gives the probability that an individual measurement will differ from the "true" mean by a given amount as a result of random error. The random errors are characterized by the parameter σ.

The Gaussian distribution curve gives the relative probability of obtaining a particular measurement x. The unitless x-axis is plotted relative to the "true" mean, in units of the standard deviation. The curve shows that it is more likely to get a small error (small x - μ) than a large one (large x - μ). The most likely value is the mean itself (x = μ).

The Gaussian distribution has several interesting properties. First, the total area under the curve is equal to one. Distribution functions that possess this property are called normalized. This property allows one to determine the probability (out of one) of obtaining a measurement within a certain range. For example, the area between 0 and 1 standard deviation is 0.3413. Thus the probability (or chance) of obtaining a measurement between x = μ and x = μ + σ is 34.13%. Second, the curve is symmetric about the mean: the probability for a negative offset is equal to that for a positive offset. This property is sometimes referred to as that of an even function. The probability of obtaining a measurement between -1 and +1 standard deviation, i.e., between x = μ - σ and x = μ + σ, is twice that of obtaining one between 0 and 1. Since the latter is 34.13%, the former is 68.26%.

Third, the distribution may only be perfectly known from measurement data in the limit of an infinite number of measurements, or from inference from well-understood experiments (like flipping coins, drawing cards, or counting molecules). Since this is rarely the case, the Gaussian distribution is most often used as a model of the ideal measurement situation. Because one cannot make an infinite number of measurements, the error formulas used are approximations to the "true" error distribution. The measurement average, or mean, is an estimate of the "true" mean

    x̄ ≈ μ

The measurement standard deviation is an estimate of the "true" standard deviation

    s ≈ σ

Much of statistics is concerned with how good these approximations are. One thing is certain, the measurement mean and standard deviation are equal to the "true" values only in the limit as N approaches infinity.

Estimation with Gaussian distribution
With knowledge of the "true" mean and standard deviation of a set of measurements with random errors, the range over which a certain fraction of the measurements occurs can be specified. For instance, the range of x values over which 95% of the measurements occur can be found from the area under the Gaussian distribution, which is found by integration

    ∫ P(x) dx = 0.95,  integrated from μ - 1.96σ to μ + 1.96σ

Note that the integration limits are μ ± 1.96σ. Since the area for this integration is 0.95, and the total area under the curve is 1, there is a 95% probability that an actual measurement value will differ from the "true" mean by no more than ±1.96σ due to random error only.
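These areas can be verified with Python's `math.erf`, since the area under a Gaussian between μ - zσ and μ + zσ equals erf(z/√2):

```python
import math

def central_area(z):
    """Area under the Gaussian between mu - z*sigma and mu + z*sigma."""
    return math.erf(z / math.sqrt(2))

print(round(central_area(1.00), 4))  # 0.6827 -- the ~68.26% quoted above
print(round(central_area(1.96), 4))  # 0.95
```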

Clearly, the greater the "true" standard deviation, the greater the range over which 95% of the measurements will fall. For example, if the "true" mean is 25.000 g and the "true" standard deviation is 0.5102 g, then 95% of the measurements should be between 25 ± 1 g. Put another way, one is 95% confident that a single measurement will be within the 25 ± 1 g range. For this case, ±1 g is the confidence interval, or range, at a 95% confidence level. If, on the other hand, the "true" standard deviation were 0.05102 g, then one would be 95% confident that a single measurement would fall in the range 25.0 ± 0.1 g. The smaller standard deviation results in a smaller confidence interval.

Probable measurement values are conveniently indicated by

    x = μ ± 1.96σ   (95% confidence level)

It is interesting to write this result in a different fashion

    μ = x ± 1.96σ   (95% confidence level)

This formula indicates that one is 95% confident that the "true" mean is within ±1.96σ of any given measurement. There is also a 5% chance that a measurement will fall outside the 95% confidence range. Put a different way, if a measured x value differs from the "true" mean by more than this, then there is a 95% chance that something other than random error has corrupted the measurement. It would then be time to consider sources of systematic error.

Confidence intervals
In the real world, where N is less than infinity, errors associated with the estimates of the "true" mean and standard deviation result in more uncertainty in the confidence interval than is indicated by the Gaussian distribution. In general, the fewer the measurements, the lower the confidence level that can be assigned to a particular interval. Similarly, fewer measurements also mean larger confidence intervals for a given confidence level.

These concepts are quantitatively expressed in the "Student's-t" statistic. The Student's-t number is an integral over a distribution function similar to the Gaussian. It indicates the probability that the mean found from a finite number of measurements will differ from the "true" mean by a given amount. A useful form of the Student's-t formula is

    μ = x̄ ± t·s/√N

where t is the Student's-t number. One looks up t in a table for a given confidence level and number of degrees-of-freedom.

Ideally, one would really like to report the "true" mean. But, due to random errors, it is not possible to specify the "true" mean as a single number. Instead, one uses the Student's-t formula in the form given above to specify the "true" mean. The "true" mean is reported as the measurement mean, or average, together with the confidence interval for that reported value at a particular confidence level. For example

    μ = x̄ ± t95·s/√N

is the way one reports the "true" mean at the 95% confidence level.

Let's use the 5 weight measurements given above as an example. The measurement mean and standard deviation were 25.071 g and 0.11038 g, respectively. We will report the "true" mean at a 95% confidence level. The Student's-t value for a 95% confidence level and the appropriate degrees-of-freedom is obtained from the Table below. In this case, there are 4 degrees-of-freedom (one is taken up in the calculation of the mean). The appropriate table value is t = 2.776. Substituting these values into the Student's-t formula, the "true" mean estimate is

    μ = 25.071 ± (2.776 × 0.11038)/√5 g

which when numerically evaluated yields

    μ = 25.071 ± 0.137 g

The numbers may be rounded off to indicate precision. Since the confidence interval of the estimated mean indicates uncertainty in the first digit to the right of the decimal point, i.e., a confidence interval of ±0.1 g, there is no need to express the result to any greater precision than implied by the most significant digit of the interval. Thus

    μ = 25.1 ± 0.1 g

gives an adequate indication of the "true" mean.
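The whole confidence-interval calculation for the example weights, sketched in Python (t = 2.776 is the table value for 95% confidence and 4 degrees-of-freedom):

```python
import math
import statistics

weights = [25.234, 25.132, 24.976, 25.030, 24.983]  # g
t95 = 2.776  # Student's-t, 95% confidence, N - 1 = 4 degrees-of-freedom

xbar = statistics.mean(weights)
s = statistics.stdev(weights)
interval = t95 * s / math.sqrt(len(weights))

print(f"{xbar:.3f} +/- {interval:.3f} g")  # 25.071 +/- 0.137 g
```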

Student's-t Values
at Common Confidence Levels

d.o.f.   90%      95%      99%
1        6.314    12.706   63.657
2        2.920    4.303    9.925
3        2.353    3.182    5.841
4        2.132    2.776    4.604
5        2.015    2.571    4.032
6        1.943    2.447    3.707
7        1.895    2.365    3.499
8        1.860    2.306    3.355
9        1.833    2.262    3.250
10       1.812    2.228    3.169

Significance testing
A number of statistical tests are available to check for significant differences between measurement values. Two common tests used in measurement science are the "Q-test", for rejecting suspect data points, and the "Student's-t test", for determining differences between means.

Very often, when examining the results of a set of measurements, one finds that there is one datum that appears to be different from the others. The question is: should this datum be rejected? If a reason for rejection cannot be found after critical examination of the evidence (hopefully as recorded in a laboratory notebook), then one must resort to statistical tests. The Q-test is a statistical test used to determine whether or not a suspect datum can be rejected from a data set when the total number of measurements is less than 10.

The Q-test is based on the ratio of the interval between the suspect datum and the datum closest to it in value, to the range of the data set. The range is the difference between the minimum and maximum data points. The ratio of these differences is the Q statistic

    Q = |xsuspect - xnearest| / |xmax - xmin|

In performing the test, one "formulates a null hypothesis" and then checks to see if the hypothesis is invalid. (One cannot prove that it is true; it can only be shown to be false.) In this case, the null hypothesis is that the Q value calculated from a data set including the suspect datum is not statistically different from an extreme Q value from a normally behaved data set. If the calculated Q value is less than that of the normal data, then the null hypothesis stands and the datum cannot be rejected based on statistical evidence. If, on the other hand, the calculated Q is greater than that for normal data, then the null hypothesis is false. This means that the suspect datum is not from a normal data set and may be rejected. Q values for normal data are tabulated according to confidence level and number of measurements. The Q-test is outlined below.

Step 1: Calculate a Q value using

    Qcalc = |xsuspect - xnearest| / |xmax - xmin|

The suspect datum will be one of the terms in the range calculation since it is suspect because of its extreme value.

Step 2: Look up the value of Q in the table corresponding to the number of measurements at the chosen confidence level. This value, Qtable, is the most extreme Q value expected from data with only random errors.

Step 3: If Qcalc > Qtable, then the suspect value can be rejected. All other statistical quantities such as the mean and the standard deviation are then calculated from the remaining values. If the opposite is true, (Qcalc < Qtable) then the suspect datum remains in the data set.

Keep in mind that the rejected datum may be valid (anything is possible in statistics). But, although valid, including this datum would unduly influence calculations of mean and standard deviation. There is thus good reason to apply the Q-test when a datum is suspected of being different. Also, use this test only once. There are better tests for rejecting more than one datum. Multiple application of the Q-test may lead to serious errors.

Q (rejection quotient)
90% confidence

N     Q
3     0.94
4     0.76
5     0.64
6     0.56
7     0.51
8     0.47
9     0.44
10    0.41
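The Q-test steps above can be sketched as a small Python function; the function name and the trial data set (four of the weights from the text plus one suspiciously low hypothetical point) are illustrative choices:

```python
def q_test(data, suspect, q_table):
    """Return True if the suspect datum may be rejected (Qcalc > Qtable)."""
    ordered = sorted(data)
    gap = min(abs(suspect - x) for x in ordered if x != suspect)
    q_calc = gap / (ordered[-1] - ordered[0])
    return q_calc > q_table

# Hypothetical set of five weights with one suspiciously low value;
# from the table, Qtable = 0.64 for N = 5 at 90% confidence.
values = [25.234, 25.132, 24.976, 25.030, 24.200]
print(q_test(values, 24.200, 0.64))  # True -- the 24.200 g point may be rejected
```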

Student's-t test
The Student's-t statistic is useful for comparing data sets of finite number that have random errors characterized by a Gaussian distribution. It is the correct metric for comparing most real data and can be used for a variety of tests. For example, it may be used to compare measured means obtained in different experiments, or to determine to what confidence level two estimated "true" means are the same. It may also be used to test whether or not a suspect point may be rejected from a data set by using sub-sets. The use of Student's-t to test for differences between measured and "true" means will be illustrated here.

As with the Q-test, Student's-t tests require formulation of a null hypothesis. Under the null hypothesis, all data are the same, and the data can be manipulated as a combined set under this assumption. A value for Student's-t is calculated from the combined data and compared to table values, which are based on normal, Gaussian-distributed data. If the calculated Student's-t is statistically different from the table value, then the null hypothesis is false at that particular confidence level. A false null hypothesis indicates that the two sets of data are different.

A different form of the Student's-t formula is needed for the test. Rearranging the formula for reporting the "true" mean with a confidence interval gives

    tcalc = |x̄ - μ|·√N/s

This formula is used to calculate a value for t based on a "true" or comparison mean, and the measurement mean, standard deviation, and number of data. The actual "test" is performed using the following steps.

Step 1: Calculate tcalc using the formula above with the "true" (or comparison) mean and the measurement statistics.

Step 2: Compare the calculated t (tcalc) to one from the table of Student's-t values (ttable) for a particular confidence level, and the degrees-of-freedom of the measurement.

Step 3: Test the null hypothesis by comparing the two t values. If the calculated value is greater than the table value (tcalc> ttable), then the null hypothesis is false to within the confidence level of the table value. In this case, the means are different. That is, the variation from the reported value is greater than you would expect from random error alone, and something is likely wrong with your experiment. Else, (tcalc< ttable), the null hypothesis is not shown to be false, and the two means are not different at the chosen confidence level.
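The steps above can be sketched in Python for the example weights; the certified comparison value of 25.000 g is hypothetical:

```python
import math
import statistics

def t_calc(data, true_mean):
    """Student's-t comparing the measurement mean to a known "true" mean."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)
    return abs(xbar - true_mean) * math.sqrt(len(data)) / s

weights = [25.234, 25.132, 24.976, 25.030, 24.983]  # g
t = t_calc(weights, 25.000)  # 25.000 g is a hypothetical comparison value

# tcalc ~ 1.44 < ttable = 2.776 (95%, 4 d.o.f.): the null hypothesis stands,
# and the measured mean is not significantly different from 25.000 g.
print(round(t, 2))  # 1.44
```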

The proper treatment of error in data is critical in any experimental science. The mere reporting of numbers without any indication of how reliable the numbers are is useless. The application of any of the statistical procedures described in this laboratory exercise depends, first of all, on acquiring multiple data for any process or phenomenon measured. Without at least three trials for each result, error analysis is reduced to a tedious and often phony recitation of possible error sources that is boring to write and even more boring to read. In order to perform a scientifically valid error analysis, you must make your measurements at least three times.

Tuesday, August 03, 2004