Effects of Determinate Errors

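Constant error example: A scale is off by +5 mg. What are the errors produced with 15 mg and 1.5 g weighings?

Because the error is a constant 5 mg regardless of the mass weighed, its relative size depends on the sample mass:

$$ \frac{5\ \text{mg}}{15\ \text{mg}} \times 100 = 33\%\ \text{(w/w)} $$

$$ \frac{5\ \text{mg}}{1500\ \text{mg}} \times 100 = 0.33\%\ \text{(w/w)} $$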

The error could have been reported as parts-per-thousand (×10³, ppth), parts-per-million (×10⁶, ppm), parts-per-billion (×10⁹, ppb), or parts-per-trillion (×10¹², ppt), in addition to parts-per-hundred (×10², %). Since the number is a weight ratio, the convention (w/w) should be appended to indicate this. This avoids confusing the result with a ratio of volumes, (v/v), or of moles, (mol/mol).

For example, the error in the 1.5 g object measurement, in ppth, is

$$ \frac{5\ \text{mg}}{1500\ \text{mg}} \times 10^{3} = 3.3\ \text{ppth (w/w)} $$
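A short Python sketch of the unit conversions described here: the constant 5 mg error is divided by each mass and scaled to the chosen ratio unit. The function name `relative_error` and the `SCALES` table are illustrative assumptions, not part of the original notes; all masses are taken in mg.

```python
# Sketch: express a constant (determinate) weighing error relative to the
# mass weighed, in several ratio units. Masses are in mg, so the ratio is (w/w).

SCALES = {
    "%": 1e2,      # parts per hundred
    "ppth": 1e3,   # parts per thousand
    "ppm": 1e6,    # parts per million
    "ppb": 1e9,    # parts per billion
    "ppt": 1e12,   # parts per trillion
}


def relative_error(error_mg: float, mass_mg: float, unit: str = "%") -> float:
    """Return error/mass scaled to the requested ratio unit (w/w)."""
    return error_mg / mass_mg * SCALES[unit]


if __name__ == "__main__":
    for mass in (15.0, 1500.0):  # the 15 mg and 1.5 g weighings
        print(f"{mass:g} mg: {relative_error(5.0, mass):.2f} % (w/w), "
              f"{relative_error(5.0, mass, 'ppth'):.2f} ppth (w/w)")
```

The same 5 mg error is 100 times larger, relative to the sample, for the 15 mg weighing than for the 1.5 g weighing.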


Reporting Uncertainty

The uncertainty is typically reported with the value using the ± symbol. For example, if a 5 mg uncertainty is known to be associated with a weight measurement of 1.505 g, we report

weight = 1.505 ± 0.005 g


Propagation of Error

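The uncertainty in a quantity calculated from other measured values is found using the propagation-of-error formulas. For addition and subtraction (y = x_1 + x_2 - x_3 ...):

$$ e_y = \sqrt{e_1^2 + e_2^2 + e_3^2 + \cdots} $$

where e is the error (absolute uncertainty) in each term.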

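For multiplication and division (y = x_1 x_2 / x_3 ...):

$$ \%e_y = \sqrt{(\%e_1)^2 + (\%e_2)^2 + (\%e_3)^2 + \cdots} $$

where %e indicates the relative error expressed as a percentage: %e_x = 100 × e_x/x.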
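The two propagation rules can be sketched in Python as follows. Absolute errors combine for sums and differences; percent relative errors combine for products and quotients. The function names and the example values (10.0 ± 0.3 and 5.0 ± 0.2, or ± 0.4 for the sum) are hypothetical, chosen only to illustrate the formulas.

```python
import math


def propagate_sum(errors):
    """Absolute error of a sum or difference: sqrt(e1^2 + e2^2 + ...)."""
    return math.sqrt(sum(e ** 2 for e in errors))


def propagate_product(values, errors):
    """Percent relative error of a product or quotient.

    Each term contributes its own percent relative error, 100 * e_x / x.
    """
    return math.sqrt(sum((100.0 * e / x) ** 2 for x, e in zip(values, errors)))


if __name__ == "__main__":
    # (10.0 +/- 0.3) + (5.0 +/- 0.4): absolute errors add in quadrature.
    print(f"sum = 15.0 +/- {propagate_sum([0.3, 0.4]):.1f}")
    # (10.0 +/- 0.3) * (5.0 +/- 0.2): 3% and 4% relative errors give 5%.
    pct = propagate_product([10.0, 5.0], [0.3, 0.2])
    print(f"product = 50.0 +/- {50.0 * pct / 100:.1f} ({pct:.1f}%)")
```

Converting the percent error back to an absolute one (%e_y × y / 100) gives the ± value to report.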

Other useful formulas are:


Terms Used In Statistical Analysis

Number of Data:

N

Data Set:

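x_1, x_2, x_3, ..., x_N

Mean:

$$ \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i $$

Variance:

$$ s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 $$

Standard Deviation:

$$ s = \sqrt{s^2} $$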
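A minimal Python sketch of these three definitions, cross-checked against the standard-library `statistics` module (whose `variance` and `stdev` also use N - 1). The function name `summary` and the data set are hypothetical.

```python
import math
import statistics


def summary(data):
    """Return (N, mean, variance, standard deviation), using the N - 1 form."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return n, mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    data = [10.1, 10.3, 10.2, 10.0, 10.4]  # hypothetical measurements
    n, mean, var, s = summary(data)
    print(n, round(mean, 3), round(var, 4), round(s, 4))
    # The library functions should agree with the hand-written formulas.
    print(statistics.mean(data), statistics.variance(data), statistics.stdev(data))
```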


Normal Distribution

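The most general normal distribution is

$$ y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] $$

where μ is the mean of the parent population and σ is its standard deviation.

The normal distribution is often given in terms of the generalized parameter, z:

$$ y = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right] $$

where:

$$ z = \frac{x-\mu}{\sigma} $$

Combined, these two formulas are equivalent to the general form, apart from the factor 1/σ that converts a probability per unit z into a probability per unit x (dz = dx/σ).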
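A small Python check of the equivalence just described: the general form evaluated at x equals the z form evaluated at z = (x - μ)/σ, divided by σ. The function names and the numbers used are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """General normal distribution with parent mean mu and std. dev. sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def standard_normal_pdf(z):
    """Normal distribution written in terms of z = (x - mu) / sigma."""
    return math.exp(-(z ** 2) / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    x, mu, sigma = 12.0, 10.0, 2.0  # hypothetical values, so z = 1
    z = (x - mu) / sigma
    print(normal_pdf(x, mu, sigma), standard_normal_pdf(z) / sigma)
```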


Estimates Using the Normal Distribution

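When we calculate the average and standard deviation from our data set, we are estimating the parameters of the parent population. In this case

$$ \bar{x} \approx \mu, \qquad s \approx \sigma $$

and the estimates approach the true parameters as N becomes large.

Areas under the normal distribution give the probability that a measurement falls within a given range. Areas are tabulated as a function of z, as in Table 4-1 of your text. The area is that of the integral

$$ \text{Area} = \int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{z^2}{2}\right] dz $$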
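Instead of reading a table, the area can be computed with the error function, since the integral of the standard normal curve from 0 to z equals erf(z/√2)/2. This Python sketch uses the standard-library `math.erf`; the function name `normal_area` is hypothetical, and the layout of the textbook table (for example, whether it lists areas from 0 to z) may differ from this general interval form.

```python
import math


def normal_area(z1, z2):
    """Area under the standard normal curve between z1 and z2."""
    return 0.5 * (math.erf(z2 / math.sqrt(2)) - math.erf(z1 / math.sqrt(2)))


if __name__ == "__main__":
    # Probability that a measurement lies within 1 and 2 sigma of the mean.
    print(f"{normal_area(-1, 1):.4f}")
    print(f"{normal_area(-2, 2):.4f}")
```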


Student's t

Student's t is given by

$$ t = \frac{(\bar{x} - \mu)\sqrt{N}}{s} $$

Student's t is a parameter that describes the probable error in a finite series of measurements that sample a normally distributed parent population. It is the appropriate parameter to use when the standard deviation must also be estimated from the same series of measurements.

The first use of Student's t is for reporting values with confidence intervals. In this case, we use the form

$$ \mu = \bar{x} \pm \frac{t\,s}{\sqrt{N}} $$

A value for t is found in a table for a particular confidence level (the percent probability that the interval contains μ) and N - 1 degrees of freedom. The values are reported, for example, like
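A Python sketch of the confidence-interval form μ = x̄ ± ts/√N. To stay self-contained it hard-codes the standard two-tailed 95% Student's t values for 1 to 10 degrees of freedom; check them against the table in your text and extend the table for larger N. The function name and the data are hypothetical.

```python
import math
import statistics

# Two-tailed Student's t at 95% confidence, keyed by degrees of freedom (N - 1).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval(data, t_table=T95):
    """Return (mean, half-width) so that mu = mean +/- t*s/sqrt(N)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # N - 1 form
    return mean, t_table[n - 1] * s / math.sqrt(n)


if __name__ == "__main__":
    data = [10.1, 10.3, 10.2, 10.0, 10.4]  # hypothetical measurements
    mean, half = confidence_interval(data)
    print(f"mu = {mean:.2f} +/- {half:.2f} (95% confidence, N = {len(data)})")
```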


Student's t Test

The second use for Student's t is to test for significance. In this case we have two or more data sets that we wish to compare. We want to know whether or not they came from the same parent population (same object, etc.). To do this we

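1) Formulate the null hypothesis, i.e., the two results come from sampling the same parent population.

2) If this is true, then the two results have the same μ and σ.

3) Using two Student's t formulas and equating the μ, we calculate t_calc:

$$ t_{calc} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}} \sqrt{\frac{N_1 N_2}{N_1 + N_2}} $$

where

$$ s_{pooled} = \sqrt{\frac{s_1^2 (N_1 - 1) + s_2^2 (N_2 - 1)}{N_1 + N_2 - 2}} $$

4) We then compare t_calc to the table value for N_1 + N_2 - 2 degrees of freedom.

5) If t_calc < t_table, the null hypothesis cannot be rejected: the difference is not significant at that confidence level. If t_calc > t_table, the null hypothesis is rejected, and the two results are significantly different.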

This test may also be used to reject data from a data set.
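The five steps of the t-test can be sketched in Python with the pooled standard deviation. As in the confidence-interval case, the 95% t values are hard-coded standard table entries (an assumption to keep the block self-contained), and the function name and both data sets are hypothetical.

```python
import math
import statistics

# Two-tailed Student's t at 95% confidence, keyed by degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_test(set1, set2, t_table=T95):
    """Return (t_calc, t_table) for N1 + N2 - 2 degrees of freedom."""
    n1, n2 = len(set1), len(set2)
    s1, s2 = statistics.stdev(set1), statistics.stdev(set2)
    s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / (n1 + n2 - 2))
    diff = abs(statistics.mean(set1) - statistics.mean(set2))
    t_calc = diff / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t_calc, t_table[n1 + n2 - 2]


if __name__ == "__main__":
    a = [10.0, 10.2, 10.1]  # hypothetical results, method A
    b = [10.4, 10.6, 10.5]  # hypothetical results, method B
    t_calc, t_tab = t_test(a, b)
    verdict = "different" if t_calc > t_tab else "not significantly different"
    print(f"t_calc = {t_calc:.2f}, t_table = {t_tab}: {verdict}")
```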


Q-Test Datum Rejection

We may be in a situation where one datum apparently has a strong influence on the mean of a data set.

Consider the data set:

The 69.2 datum appears to be out of class, i.e., it may come from a different population.

Notice how this number affects the calculated means:

Since the mean obtained using 69.2 is very different from the majority of the data, we suspect it is out of class.

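The Q-test is often used to test this hypothesis. As with Student's t-test, we calculate a Q value under the null hypothesis (i.e., that all the data come from the same population) and then compare it to a table value using the logical scheme

Q_calc < Q_table: the null hypothesis stands; keep the datum.
Q_calc > Q_table: the null hypothesis is wrong.

In the latter event, we should reject the datum.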

To perform the test, we calculate the Q value using:

$$ Q_{calc} = \frac{\text{GAP}}{\text{RANGE}} $$

where the GAP is the difference between the suspect datum and its nearest neighbor and the RANGE is the difference between the maximum and minimum data in the set.

Rearranging the data set in order from the minimum to the maximum:

The RANGE and GAP are:

and the calculated Q value is:

We look up the table of Q values (e.g., Table 4-6 in Harris) for the number of data. In this case, N = 6 and Q_table = 0.56. Since

$$ Q_{calc} > Q_{table} $$

the null hypothesis is wrong, and we should reject the datum 69.2 from the set.
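A Python sketch of the Q calculation: sort the data, take the GAP between the suspect datum and its nearest neighbor, and divide by the RANGE. The function name and the data set are hypothetical (the original data set is not reproduced here); the comparison value 0.56 is the N = 6 table value quoted above.

```python
def q_statistic(data, suspect):
    """Q = GAP / RANGE for a suspect datum at either end of the sorted data."""
    ordered = sorted(data)
    if suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]   # suspect is the maximum
    elif suspect == ordered[0]:
        gap = ordered[1] - ordered[0]     # suspect is the minimum
    else:
        raise ValueError("the suspect datum must be the minimum or maximum")
    return gap / (ordered[-1] - ordered[0])


if __name__ == "__main__":
    data = [10.1, 10.3, 10.2, 10.0, 10.2, 11.5]  # hypothetical, N = 6
    q = q_statistic(data, 11.5)
    q_table = 0.56  # table value for N = 6, from the text
    print(f"Q_calc = {q:.2f}: {'reject' if q > q_table else 'keep'} 11.5")
```

Only the minimum or maximum can be tested this way, because GAP is defined relative to the nearest neighbor at an end of the ordered set.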


Linear Regression

When a set of standards is used to obtain a working curve, the data can be used to fit a straight line that lets us determine the analyte value in the unknown (sample) measurement.

The equation for the line is

$$ y = m x + b $$

and the data, in the form of pairs (x_i, y_i), come from a parent population distributed as

$$ P(y_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(y_i - \mu x_i - \beta)^2}{2\sigma^2}\right] $$

This assumes 1) that all the error is in the y parameter, and 2) that the errors, i.e., σ, are the same for each measurement. Notice that μ is the parent population parameter equivalent to m, and β is that of b.

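Defining

$$ D = N\sum x_i^2 - \left(\sum x_i\right)^2 $$

the regression results for the slope and intercept are

$$ m = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{D}, \qquad b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{D} $$

Defining the standard deviation of the y values about the line,

$$ s_y = \sqrt{\frac{\sum (y_i - m x_i - b)^2}{N - 2}} $$

the parameter uncertainties are

$$ s_m = s_y\sqrt{\frac{N}{D}}, \qquad s_b = s_y\sqrt{\frac{\sum x_i^2}{D}} $$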
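A Python sketch that evaluates the regression formulas term by term with plain sums, under the same assumptions (all error in y, equal σ for every point). Plain loops are used instead of a numerical library so the correspondence with D, m, b, s_y, s_m, and s_b stays visible. The function name and the calibration data are hypothetical.

```python
import math


def linear_regression(x, y):
    """Least-squares slope, intercept, their std. devs., and s_y."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx ** 2                      # D
    m = (n * sxy - sx * sy) / d                # slope
    b = (sxx * sy - sx * sxy) / d              # intercept
    resid = sum((yi - m * xi - b) ** 2 for xi, yi in zip(x, y))
    s_y = math.sqrt(resid / (n - 2))           # scatter of y about the line
    s_m = s_y * math.sqrt(n / d)
    s_b = s_y * math.sqrt(sxx / d)
    return m, b, s_m, s_b, s_y


if __name__ == "__main__":
    x = [1, 2, 3, 4]   # hypothetical standard concentrations
    y = [2, 4, 5, 8]   # hypothetical instrument responses
    m, b, s_m, s_b, s_y = linear_regression(x, y)
    print(f"m = {m:.3f} +/- {s_m:.3f}, b = {b:.3f} +/- {s_b:.3f}")
```

For an unknown with measured response y, the analyte value then follows from x = (y - b)/m.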




This page was created by Professor Stephen Bialkowski, Utah State University.

Friday, October 03, 2003