Sampling Errors Associated With Statistics Produced From The ASCO/CCLO: Link File

Introduction

Any statistics produced from the sample ASCO/CCLO: Link file will be subject to sampling error. This means, for example, that any Australian totals produced from the file will be unlikely to correspond exactly to the published census figures for the same characteristics.

What is sampling error?

Since only a sample of persons is included on the file, estimates derived from the file may differ from figures which would have been obtained if all persons had been included. One measure of the likely difference is given by the 'standard error' which indicates the extent to which an estimate might have varied by chance because only a sample of persons was included.

The particular sample selected was only one of a large number of possible samples of the same size. Each possible sample would yield different estimates. The standard error measures the variation of all the possible sample estimates around the figures which would have been obtained if all persons had been included.

Given an estimate and the standard error on that estimate, there are about two chances in three that the sample estimate will differ by less than one standard error from the figure that would have been obtained if all persons had been included, and about nineteen chances in twenty that the difference will be less than two standard errors.

Another measure of the sampling variability is the relative standard error, which is obtained by expressing the standard error as a percentage of the estimate to which it refers, that is:

Relative Standard Error = (Standard Error / Estimate) X 100

The following example illustrates the use of the concepts of standard error and relative standard error:

EXAMPLE. If an estimate of 6,000 has a relative standard error of 10 per cent, then the standard error of that estimate is 10 per cent of 6,000 or 600. Thus there are two chances in three that the figure that would have been obtained if all persons had been included on the sample file is in the range 6,000 ± (1 x 600) or 5,400 to 6,600, and nineteen chances in twenty that this figure is between 6,000 ± (2 x 600) or 4,800 to 7,200.
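
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function name `error_ranges` is an assumption made here and is not part of the file documentation.

```python
def error_ranges(estimate, rse_percent):
    """Return the standard error and the one- and two-standard-error ranges."""
    # Standard error = estimate x RSE / 100 (the RSE definition rearranged).
    se = estimate * rse_percent / 100.0
    two_in_three = (estimate - se, estimate + se)                # about 2 chances in 3
    nineteen_in_twenty = (estimate - 2 * se, estimate + 2 * se)  # about 19 chances in 20
    return se, two_in_three, nineteen_in_twenty


se, narrow, wide = error_ranges(6000, 10)
print(se, narrow, wide)
```

For an estimate of 6,000 with a relative standard error of 10 per cent, this gives a standard error of 600 and the ranges 5,400 to 6,600 and 4,800 to 7,200 quoted above.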

Small estimates, which are based on very few persons, may have relative standard errors greater than 50 per cent. In this case the range for the 'all persons' figure is no longer as given above, because twice the standard error would exceed the estimate itself and the lower limit of the range would be negative; rather there are about nineteen chances in twenty that the 'all persons' figure is less than the sample estimate plus twice the standard error. Estimates with such high relative standard errors are very unreliable and therefore should be used with considerable caution.

Presentation of sampling errors

In order to assist the user of the sample file in evaluating the reliability of estimates produced from the file, a number of tables which relate the relative standard error of an estimate to the size of the estimate have been produced. These are given at the end of this section.

As can be seen from the tables, the larger an estimate, the greater its reliability, and thus the smaller the relative standard error. The tables are not intended to give a precise measure of the error for a particular estimate, but provide an indication of the likely magnitude of the relative standard error for estimates of any particular size. A complete description of the methods to be used to obtain the relative standard error for any estimate is given below.

Sampling errors on estimates of number of persons

Types of estimates

Sampling errors depend on the type of estimates concerned. Because two sampling systems were used, one in NSW and one in all other states, there are three types of estimates to consider:

(a) Estimates for NSW;
(b) Estimates for any state or group of states excluding NSW; and
(c) Estimates for Australia or any group of states including NSW.

The relative standard errors for the three types are given in Table 1.

An example

Consider an estimate of the number of persons in NSW in a particular ASCO/CCLO major group combination. The relative standard error can be derived from the NSW column of Table 1. If the estimated number of persons with the particular combination is 4,000 then, reading from the 4,000 row of this column, the relative standard error is approximately 10 per cent. The standard error on the estimate is 4,000 x 10 / 100 = 400. Therefore, there are nineteen chances in twenty that the number of persons in NSW with the particular combination is in the range 4,000 ± (2 x 400) or 3,200 to 4,800.

Formulae for relative standard errors

The approximate relative standard errors (RSE) presented in Table 1 can be obtained from the following formulae.

NSW estimates

RSE = 10^(2.870 - 0.517 x log10(estimate))

Estimates for other states

RSE = 10^(2.732 - 0.537 x log10(estimate))

Estimates for Australia and groups of states including NSW

RSE = 10^(2.683 - 0.493 x log10(estimate))

These formulae differ from one based on simple random sampling because of the effects of clustering and systematic sampling.
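
As a rough sketch, the three formulae can be evaluated in Python by reading each one as a power of ten, that is, RSE = 10 raised to (a - b x log10(estimate)). That reading reproduces the values in Table 1. The dictionary keys and the function name `approx_rse` are labels assumed here for illustration.

```python
import math

# Coefficients (a, b) in RSE = 10 ** (a - b * log10(estimate)), taken from the text.
# The keys "nsw", "other" and "australia" are illustrative labels only.
COEFFICIENTS = {
    "nsw": (2.870, 0.517),
    "other": (2.732, 0.537),
    "australia": (2.683, 0.493),
}


def approx_rse(estimate, area):
    """Approximate relative standard error, in per cent, for an estimate of persons."""
    a, b = COEFFICIENTS[area]
    return 10 ** (a - b * math.log10(estimate))


for area in COEFFICIENTS:
    print(area, round(approx_rse(4000, area), 1))
```

For an estimate of 4,000 the results are close to the Table 1 entries of 10, 6.3 and 8.1 per cent for NSW, other states and Australia respectively.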

Estimates with large sampling errors

Estimates of less than 200 persons in NSW, of less than 80 persons for other states or of less than 150 persons for all Australia or groups of states including NSW will be subject to a sampling error so large that the estimates will be of limited use in most situations.
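
A user tabulating the file might screen out such estimates automatically. The following is a minimal sketch under that assumption; the names `MINIMUM_USEFUL_ESTIMATE` and `has_large_sampling_error` are hypothetical.

```python
# Minimum estimate sizes quoted in the text, by type of estimate.
# The keys are illustrative labels only.
MINIMUM_USEFUL_ESTIMATE = {"nsw": 200, "other": 80, "australia": 150}


def has_large_sampling_error(estimate, area):
    """True if the estimate is below the size quoted in the text for its area."""
    return estimate < MINIMUM_USEFUL_ESTIMATE[area]


print(has_large_sampling_error(120, "nsw"))    # below the NSW figure of 200
print(has_large_sampling_error(120, "other"))  # above the other-states figure of 80
```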

Sampling errors on estimates of proportions and percentages

Proportions and percentages formed from the ratio of two sample estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. The formula for the relative standard error of a proportion is given below. It assumes that the design effects on the numerator and denominator are the same.

Relative standard error (x/y) = SQRT([Relative standard error (x)]^2 - [Relative standard error (y)]^2)

The relative standard error on a percentage is the same as for the corresponding proportion. Thus the relative standard error on an estimate of 58 per cent is the same as that on the proportion 0.58.

EXAMPLE. Consider an estimate of the proportion of females in NSW in a particular ASCO/CCLO cross-classification cell. If the estimated number of females in the cell is 15,000 and the total number of persons in the cell is estimated to be 25,000, then the estimated proportion is 15,000 / 25,000 = 0.60. The relative standard errors for both the numerator and denominator are derived from the NSW column of Table 1. Interpolating from this table, the relative standard error of the numerator (i.e. the estimated number of females who are in the ASCO/CCLO cross-classification cell) is approximately 5.4 per cent, and the relative standard error of the denominator (i.e. the estimated number of persons in the cell) is approximately 4.0 per cent. The relative standard error of the estimate of the proportion is therefore:

SQRT(5.4^2 - 4.0^2) = 3.6%

The standard error on the proportion is 0.60 x 3.6 / 100 = 0.02. Therefore, there are nineteen chances in twenty that the proportion of females for the particular ASCO/CCLO cross-classification cell is in the range 0.60 ± (2 x 0.02) or 0.56 to 0.64.
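
The same calculation can be sketched in Python. The function name `rse_of_proportion` is an assumption for illustration, and the inputs of 5.4 and 4.0 per cent are the values read from Table 1 in the example above.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator):
    """RSE of x/y as SQRT(RSE(x)^2 - RSE(y)^2), both in per cent."""
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)


proportion = 15000 / 25000                  # estimated proportion of females
rse = rse_of_proportion(5.4, 4.0)           # per cent
se = proportion * rse / 100.0               # standard error on the proportion
low, high = proportion - 2 * se, proportion + 2 * se

print(round(rse, 1), round(low, 2), round(high, 2))
```

Because the sketch carries unrounded intermediate values, its results can differ in the last digit from figures worked by hand with rounding at each step.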

As can be seen from the above formula, the relative standard error of a proportion or percentage will always be less than the relative standard error of the numerator. However, whenever a proportion or percentage is small (i.e. the denominator is considerably greater than the numerator), it will be reasonable to approximate the relative standard error of the proportion or percentage by the relative standard error of the numerator.

Relative standard errors for estimates of proportions or percentages may also be determined from Table 2 which sets out relative standard errors for selected percentages or proportions.

In the above example, if the total number of persons in the particular ASCO/CCLO cell were estimated as 50,000 and 60 per cent were estimated to be female, then the relative standard error of the percentage could be read directly from Table 2A as 2.3 per cent.

Sampling errors on estimates of differences

The relative standard error on the difference between two estimates of numbers or between two estimates of proportions (or percentages) can also be derived from the tables of relative standard errors. For the difference between two estimates produced from the sample file the standard error of the difference may be approximated by the following formula:

Standard error (x - y) = SQRT([Standard error (x)]^2 + [Standard error (y)]^2)

This approximation will be exact for differences between estimates of the same characteristic in two different states, or for differences between other separate and uncorrelated characteristics. If, however, there is positive correlation between the characteristics (e.g. number of plumbers compared with the number of tradespersons), the above approximation will overestimate the true standard error. If there is a negative correlation between the characteristics (e.g. percentage of persons in a particular occupation who are male compared with the percentage who are female), it will underestimate the true standard error.

EXAMPLE. If the estimates produced from the file of the number of persons who are classified as para-professionals in ASCO (major group 3) are 139,000 and 39,000 in NSW and SA respectively, and the corresponding numbers of persons who are also classified to CCLO group 1 (professional, technical and related occupations) are 75,000 and 23,000, then the percentage of persons in ASCO major group 3 who are also in CCLO major group 1 is:

(75,000/139,000) x 100 = 54 per cent in NSW and

(23,000/39,000) x 100 = 59 per cent in SA.

The difference between these estimated percentages is, therefore, 5 per cent. The calculation of the standard error of this difference requires the standard error of each of the percentages to be calculated. The relative standard errors on each of the estimates of number (139,000, 39,000, 75,000 and 23,000) can be derived from the New South Wales and Other States columns of Table 1. Using the formula given previously for the relative standard error of a percentage, the relative standard errors on the estimated percentages are:

SQRT(2.3^2 - 1.7^2) = 1.5 per cent for NSW; and

SQRT(2.5^2 - 1.8^2) = 1.7 per cent for SA.

The standard errors on each of the percentages are therefore:

54 x 1.5 / 100 = 0.8 per cent; and

59 x 1.7 / 100 = 1.0 per cent.

Finally, using the formula for the standard error on a difference, the standard error on the difference of 5 per cent is:

SQRT(0.8^2 + 1.0^2) = 1.3 per cent.

Therefore, there are nineteen chances in twenty that the difference between the percentage of persons in ASCO major group 3 who are in CCLO major group 1 in NSW and SA is within the range 5.0 ± (2 x 1.3) or 2.4 per cent to 7.6 per cent.
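
The steps of this example can be sketched in Python as follows. The function names are assumptions for illustration, and the relative standard errors passed in (2.3, 1.7, 2.5 and 1.8 per cent) are the values read from Table 1 in the example.

```python
import math


def se_of_percentage(percentage, rse_numerator, rse_denominator):
    """Standard error of a percentage, using the RSE formula for a ratio."""
    rse = math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)
    return percentage * rse / 100.0


def se_of_difference(se_x, se_y):
    """Approximate standard error of the difference between two estimates."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


pct_nsw = 75000 / 139000 * 100              # about 54 per cent
pct_sa = 23000 / 39000 * 100                # about 59 per cent
se_nsw = se_of_percentage(pct_nsw, 2.3, 1.7)
se_sa = se_of_percentage(pct_sa, 2.5, 1.8)

difference = pct_sa - pct_nsw
se_diff = se_of_difference(se_nsw, se_sa)
print(round(difference, 1), round(se_diff, 1))
print(round(difference - 2 * se_diff, 1), round(difference + 2 * se_diff, 1))
```

The sketch does not round intermediate values, so the limits it prints may differ slightly from the hand-worked range of 2.4 to 7.6 per cent.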

TABLE 1. RELATIVE STANDARD ERROR FOR NEW SOUTH WALES, OTHER STATES AND AUSTRALIA

(per cent)

Value of estimate   New South Wales   Other States   Australia
80                  -                 51             -
100                 -                 46             -
150                 56                37             41
200                 48                31             35
300                 39                25             29
400                 34                22             25
500                 30                19             22
600                 27                17             21
700                 25                16             19
800                 23                15             18
900                 22                14             17
1,000               21                13             16
2,000               15                9.1            11
3,000               12                7.3            9.3
4,000               10                6.3            8.1
5,000               9.1               5.6            7.2
6,000               8.2               5.0            6.6
7,000               7.6               4.6            6.1
8,000               7.1               4.3            5.7
9,000               6.7               4.1            5.4
10,000              6.3               3.8            5.1
20,000              4.4               2.6            3.6
30,000              3.6               2.1            3.0
40,000              3.1               1.8            2.6
50,000              2.8               1.6            2.3
60,000              2.5               1.5            2.1
70,000              2.3               1.4            2.0
80,000              2.2               1.3            1.8
90,000              2.0               1.2            1.7
100,000             1.9               1.1            1.6
200,000             1.4               0.8            1.2
300,000             1.1               0.6            1.0
400,000             0.9               0.5            0.8
500,000             0.8               0.5            0.8
600,000             0.8               0.4            0.7
700,000             0.7               0.4            0.6
800,000             0.7               0.4            0.6
900,000             0.6               0.3            0.6
1,000,000           0.6               0.3            0.5
2,000,000           0.4               0.2            0.4
6,000,000           -                 -              0.2

TABLE 2. RELATIVE STANDARD ERRORS ON PERCENTAGES OR PROPORTIONS

TABLE 2A: WITHIN NEW SOUTH WALES

Value of        Percentage
denominator     15      20      30      45      60      75
5,000           22      19      14      10      7.6     5.3
10,000          16      13      10      7.2     5.3     3.7
30,000          8.9     7.4     5.6     4.1     3.0     2.1
50,000          6.8     5.7     4.3     3.1     2.3     1.6
75,000          5.5     4.6     3.5     2.5     1.9     1.3
100,000         4.8     4.0     3.0     2.2     1.6     1.1

TABLE 2B: WITHIN STATES EXCLUDING NEW SOUTH WALES

Value of        Percentage
denominator     15      20      30      45      60      75
5,000           14      12      9.1     6.5     4.8     3.3
10,000          9.9     8.3     6.2     4.5     3.3     2.3
30,000          5.5     4.6     3.5     2.5     1.8     1.3
50,000          4.2     3.5     2.6     1.9     1.4     0.97
75,000          3.4     2.8     2.1     1.5     1.1     0.78
100,000         2.9     2.4     1.8     1.3     0.95    0.67

TABLE 2C: ACROSS AUSTRALIA

Value of        Percentage
denominator     15      20      30      45      60      75
5,000           17      14      11      7.9     5.9     4.1
10,000          12      10      7.8     5.6     4.2     2.9
30,000          7.0     5.9     4.5     3.3     2.4     1.7
50,000          5.4     4.6     3.5     2.5     1.9     1.3
75,000          4.5     3.8     2.9     2.1     1.5     1.1
100,000         3.9     3.3     2.5     1.8     1.3     0.95

 
