Sampling Errors Associated With Statistics Produced From The ASCO/CCLO: Link File
Introduction
Any statistics produced from the sample ASCO/CCLO: Link file will be subject to sampling error. This means, for example, that any Australian totals produced from the file will be unlikely to correspond exactly to the published census figures for the same characteristics.
What is sampling error?
Since only a sample of persons is included on the file, estimates derived from the file may differ from figures which would have been obtained if all persons had been included. One measure of the likely difference is given by the 'standard error' which indicates the extent to which an estimate might have varied by chance because only a sample of persons was included.
The particular sample selected was only one of a large number of possible samples of the same size. Each possible sample would yield different estimates. The standard error measures the variation of all the possible sample estimates around the figures which would have been obtained if all persons had been included.
Given an estimate and the standard error on that estimate, there are about two chances in three that the sample estimate will differ by less than one standard error from the figure that would have been obtained if all persons had been included, and about nineteen chances in twenty that the difference will be less than two standard errors.
Another measure of the sampling variability is the relative standard error, which is obtained by expressing the standard error as a percentage of the estimate to which it refers, that is:
Relative Standard Error = (Standard Error / Estimate) X 100
The following example illustrates the use of the concepts of standard error and relative standard error:
EXAMPLE. If an estimate of 6,000 has a relative standard error of 10 per cent, then the standard error of that estimate is 10 per cent of 6,000 or 600. Thus there are two chances in three that the figure that would have been obtained if all persons had been included on the sample file is in the range 6,000 ± (1 x 600) or 5,400 to 6,600, and nineteen chances in twenty that this figure is between 6,000 ± (2 x 600) or 4,800 to 7,200.
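The arithmetic in this example can be sketched in a few lines of Python. The function names `standard_error` and `chance_range` are illustrative only and do not come from the file documentation; the sketch simply applies the relative standard error definition and the one and two standard error ranges described above.

```python
def standard_error(estimate, rse_percent):
    """Convert a relative standard error (per cent) into a standard error."""
    return estimate * rse_percent / 100.0


def chance_range(estimate, rse_percent, multiples=2):
    """Range of estimate plus or minus `multiples` standard errors.

    multiples=1 corresponds to about two chances in three,
    multiples=2 to about nineteen chances in twenty.
    """
    se = standard_error(estimate, rse_percent)
    return (estimate - multiples * se, estimate + multiples * se)


se = standard_error(6000, 10)
two_in_three = chance_range(6000, 10, multiples=1)
nineteen_in_twenty = chance_range(6000, 10, multiples=2)

print(se)                  # 600.0
print(two_in_three)        # (5400.0, 6600.0)
print(nineteen_in_twenty)  # (4800.0, 7200.0)
```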
Small estimates, which are based on very few persons, may have relative standard errors greater than 50 per cent. In this case the range for the 'all persons' figure is no longer as given above; rather there are about nineteen chances in twenty that the 'all persons' figure is less than the sample estimate plus twice the standard error. Estimates with such high relative standard errors are very unreliable and therefore should be used with considerable caution.
Presentation of sampling errors
In order to assist the user of the sample file in evaluating the reliability of estimates produced from the file, a number of tables which relate the relative standard error of an estimate to the size of the estimate have been produced. These are given at the end of this section.
As can be seen from the tables, the larger an estimate, the greater its reliability, and thus the smaller the relative standard error. The tables are not intended to give a precise measure of the error for a particular estimate, but provide an indication of the likely magnitude of the relative standard error for estimates of any particular size. A complete description of the methods to be used to obtain the relative standard error for any estimate is given below.
Sampling errors on estimates of number of persons
Types of estimates
Sampling errors depend on the type of estimates concerned. Because two sampling systems were used, one in NSW and one in all other states, there are three types of estimates to consider:
(a) Estimates for NSW;
(b) Estimates for any state or group of states excluding NSW; and
(c) Estimates for Australia or any group of states including NSW.
The relative standard errors for the three types are given in Table 1.
An example
Consider an estimate of the number of persons in NSW in a particular ASCO/CCLO major group combination. The relative standard error can be derived from the NSW column of Table 1. If the estimated number of persons with the particular combination is 4,000 then, reading from the 4,000 line of that column, the relative standard error is approximately 10 per cent. The standard error on the estimate is 4,000 x 10 / 100 = 400. Therefore, there are nineteen chances in twenty that the number of persons in NSW with the particular combination is in the range 4,000 ± (2 x 400) or 3,200 to 4,800.
Formulae for relative standard errors
The approximate relative standard errors (RSE) presented in Table 1 can be obtained from the following formulae. These formulae differ from one based on simple random sampling because of the effects of clustering and systematic sampling.
NSW estimates
$$\mathrm{RSE} = 10^{\,2.870 - 0.517\,\log_{10}(\mathrm{estimate})}$$
Estimates for other states
$$\mathrm{RSE} = 10^{\,2.732 - 0.537\,\log_{10}(\mathrm{estimate})}$$
Estimates for Australia and groups of states including NSW
$$\mathrm{RSE} = 10^{\,2.683 - 0.493\,\log_{10}(\mathrm{estimate})}$$
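As a rough check, the three formulae can be evaluated in Python. This sketch assumes the bracketed expression is an exponent of 10, a reading that reproduces the Table 1 entries for an estimate of 4,000 (about 10, 6.3 and 8.1 per cent). The dictionary keys and the function name `approx_rse` are illustrative and are not part of the file documentation.

```python
import math

# Coefficients (a, b) quoted in the text, read here as
# RSE = 10 ** (a - b * log10(estimate)). Keys are illustrative labels.
RSE_COEFFICIENTS = {
    "nsw": (2.870, 0.517),
    "other": (2.732, 0.537),
    "australia": (2.683, 0.493),
}


def approx_rse(estimate, area):
    """Approximate relative standard error (per cent) for an estimate of persons."""
    a, b = RSE_COEFFICIENTS[area]
    return 10 ** (a - b * math.log10(estimate))


for area in RSE_COEFFICIENTS:
    # Compare with the 4,000 line of Table 1.
    print(area, round(approx_rse(4000, area), 1))
```

Because the relative standard error falls as the estimate grows, the same function can also be used to see where an estimate becomes too small to be useful in a given area.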
Estimates with large sampling errors
Estimates of less than 200 persons in NSW, of less than 80 persons for other states or of less than 150 persons for all Australia or groups of states including NSW will be subject to a sampling error so large that the estimates will be of limited use in most situations.
Sampling errors on estimates of proportions and percentages
Proportions and percentages formed from the ratio of two sample estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. The formula for the relative standard error of a proportion is given below. It assumes that the design effects on the numerator and denominator are the same.
$$\mathrm{RSE}(x/y) = \sqrt{[\mathrm{RSE}(x)]^2 - [\mathrm{RSE}(y)]^2}$$
The relative standard error on a percentage is the same as for the corresponding proportion. Thus the relative standard error on an estimate of 58 per cent is the same as that on the proportion 0.58.
EXAMPLE. Consider an estimate of the proportion of females in NSW in a particular ASCO/CCLO cross-classification cell. If the estimated number of females in the cell is 15,000 and the total number of persons in the cell is estimated to be 25,000, then the estimated proportion is 15,000 / 25,000 = 0.60. The relative standard errors for both the numerator and denominator are derived from the NSW column of Table 1. Interpolating from this table the relative standard error of the numerator (i.e. the estimated number of females who are in the ASCO/CCLO cross-classification cell) is approximately 5.4 per cent, and the relative standard error of the denominator (i.e. the estimated number of persons in the cell) is approximately 4.0 per cent. The relative standard error of the estimate of the proportion is therefore:
$$\sqrt{5.4^2 - 4.0^2} = 3.6\%$$
The standard error on the proportion is 0.6 x 3.6 / 100 = 0.02. Therefore, there are nineteen chances in twenty that the proportion of females for the particular ASCO/CCLO cross-classification cell is in the range of 0.60 ± (2 x 0.02) or 0.56 to 0.64.
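The steps of this example can be sketched in Python as follows. The function name `rse_of_proportion` is illustrative, and the relative standard errors of 5.4 and 4.0 per cent are the interpolated values quoted in the example rather than values computed by the code. Intermediate results are not rounded, so the end points agree with the text only after rounding to two decimal places.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator):
    """RSE (per cent) of a proportion x/y, where x is a subset of y.

    Assumes, as the text does, that the design effects on the
    numerator and denominator are the same.
    """
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)


females, persons = 15000, 25000
proportion = females / persons            # 0.6

rse = rse_of_proportion(5.4, 4.0)         # about 3.6 per cent
se = proportion * rse / 100               # about 0.02

# Nineteen chances in twenty: plus or minus two standard errors.
low, high = proportion - 2 * se, proportion + 2 * se
print(round(rse, 1), round(low, 2), round(high, 2))
```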
As can be seen from the above formula, the relative standard error of a proportion or percentage will always be less than the relative standard error of the numerator. However, whenever a proportion or percentage is small (i.e. the denominator is considerably greater than the numerator), it will be reasonable to approximate the relative standard error of the proportion or percentage by the relative standard error of the numerator.
Relative standard errors for estimates of proportions or percentages may also be determined from Table 2 which sets out relative standard errors for selected percentages or proportions.
In the above example, if the total number of persons in the particular ASCO/CCLO cell were estimated as 50,000 and 60 per cent were estimated to be female, then the relative standard error of the percentage could be read directly from Table 2A as 2.3 per cent.
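The tabulated figure appears to be consistent with combining the NSW formula for estimates of persons with the formula for the relative standard error of a proportion. The sketch below rests on two assumptions: that the NSW formula is read as 10 raised to the bracketed expression, and that Table 2A refers to NSW, as the example implies. The function names are illustrative only.

```python
import math

# NSW coefficients quoted in the text, read as RSE = 10 ** (a - b * log10(estimate)).
NSW_A, NSW_B = 2.870, 0.517


def nsw_rse(estimate):
    """Approximate RSE (per cent) of an NSW estimate of number of persons."""
    return 10 ** (NSW_A - NSW_B * math.log10(estimate))


def nsw_rse_of_percentage(denominator, percentage):
    """Approximate RSE (per cent) of a percentage with the given denominator."""
    numerator = denominator * percentage / 100
    return math.sqrt(nsw_rse(numerator) ** 2 - nsw_rse(denominator) ** 2)


value = nsw_rse_of_percentage(50000, 60)
print(round(value, 1))  # about 2.3, in line with the figure quoted from Table 2A
```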
Sampling errors on estimates of differences
The standard error on the difference between two estimates of numbers or between two estimates of proportions (or percentages) can also be derived from the tables of relative standard errors. For the difference between two estimates produced from the sample file the standard error of the difference may be approximated by the following formula:
$$\mathrm{SE}(x - y) = \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}$$
This approximation will be exact for differences between estimates of the same characteristic in two different states, or for differences between other separate and uncorrelated characteristics. If, however, there is positive correlation between the characteristics (e.g. number of plumbers compared with the number of tradespersons), the above approximation will overestimate the true standard error. If there is a negative correlation between the characteristics (e.g. percentage of persons in a particular occupation who are male compared with the percentage who are female), it will underestimate the true standard error.
EXAMPLE. If the estimates produced from the file of the number of persons who are classified as para-professionals in ASCO (major group 3) are 139,000 and 39,000 in NSW and SA respectively, and the corresponding numbers of persons who are also classified to CCLO group 1 (professional, technical and related occupations) are 75,000 and 23,000, then the percentage of persons in ASCO major group 3 who are also in CCLO major group 1 is:
(75,000 / 139,000) x 100 = 54 per cent in NSW; and
(23,000 / 39,000) x 100 = 59 per cent in SA.
The difference between these estimated percentages is, therefore, 5 per cent. The calculation of the standard error of this difference requires the standard error of each of the percentages to be calculated. The relative standard errors on each of the estimates of number (139,000, 39,000, 75,000 and 23,000) can be derived from the NSW and Other columns of Table 1. Using the formula given previously for the relative standard error of a percentage, the relative standard errors on the estimated percentages are:
$\sqrt{2.3^2 - 1.7^2} = 1.5$ per cent for NSW; and
$\sqrt{2.5^2 - 1.8^2} = 1.7$ per cent for SA.
The standard errors on each of the percentages are therefore:
54 x 1.5 / 100 = 0.8 per cent for NSW; and
59 x 1.7 / 100 = 1.0 per cent for SA.
Finally, using the formula for the standard error on a difference, the standard error on the difference of 5 per cent is:
$\sqrt{0.8^2 + 1.0^2} = 1.3$ per cent.
Therefore, there are nineteen chances in twenty that the difference between the percentage of persons in ASCO major group 3 who are in CCLO major group 1 in NSW and SA is within the range 5.0 ± (2 x 1.3) or 2.4 per cent to 7.6 per cent.
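The whole calculation can be followed step by step in the Python sketch below. The helper names are illustrative, and the relative standard errors (2.3, 1.7, 2.5 and 1.8 per cent) are the values used in the worked calculation, not values computed by the code. The sketch does not round intermediate results, so its end points differ slightly from the rounded figures in the text.

```python
import math


def rse_of_percentage(rse_numerator, rse_denominator):
    """RSE (per cent) of a percentage formed from numerator / denominator."""
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)


def se_of_difference(se_a, se_b):
    """Approximate standard error of a difference between uncorrelated estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Estimated percentages (SA total of 39,000, as used in the worked percentages).
pct_nsw = 75000 / 139000 * 100     # about 54 per cent
pct_sa = 23000 / 39000 * 100       # about 59 per cent

# RSEs of numerator and denominator as quoted in the example.
rse_nsw = rse_of_percentage(2.3, 1.7)   # about 1.5 per cent
rse_sa = rse_of_percentage(2.5, 1.8)    # about 1.7 per cent

# Standard errors of the two percentages, in percentage points.
se_nsw = pct_nsw * rse_nsw / 100        # about 0.8
se_sa = pct_sa * rse_sa / 100           # about 1.0

difference = pct_sa - pct_nsw           # about 5
se_diff = se_of_difference(se_nsw, se_sa)   # about 1.3

low, high = difference - 2 * se_diff, difference + 2 * se_diff
print(round(difference, 1), round(se_diff, 1), round(low, 1), round(high, 1))
```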
TABLE 1. RELATIVE STANDARD ERRORS ON ESTIMATES OF NUMBER OF PERSONS: STATES AND AUSTRALIA
(per cent)
| Value of estimate | New South Wales | Other States | Australia |
| 80 | | 51 | |
| 100 | | 46 | |
| 150 | 56 | 37 | 41 |
| 200 | 48 | 31 | 35 |
| 300 | 39 | 25 | 29 |
| 400 | 34 | 22 | 25 |
| 500 | 30 | 19 | 22 |
| 600 | 27 | 17 | 21 |
| 700 | 25 | 16 | 19 |
| 800 | 23 | 15 | 18 |
| 900 | 22 | 14 | 17 |
| 1,000 | 21 | 13 | 16 |
| 2,000 | 15 | 9.1 | 11 |
| 3,000 | 12 | 7.3 | 9.3 |
| 4,000 | 10 | 6.3 | 8.1 |
| 5,000 | 9.1 | 5.6 | 7.2 |
| 6,000 | 8.2 | 5.0 | 6.6 |
| 7,000 | 7.6 | 4.6 | 6.1 |
| 8,000 | 7.1 | 4.3 | 5.7 |
| 9,000 | 6.7 | 4.1 | 5.4 |
| 10,000 | 6.3 | 3.8 | 5.1 |
| 20,000 | 4.4 | 2.6 | 3.6 |
| 30,000 | 3.6 | 2.1 | 3.0 |
| 40,000 | 3.1 | 1.8 | 2.6 |
| 50,000 | 2.8 | 1.6 | 2.3 |
| 60,000 | 2.5 | 1.5 | 2.1 |
| 70,000 | 2.3 | 1.4 | 2.0 |
| 80,000 | 2.2 | 1.3 | 1.8 |
| 90,000 | 2.0 | 1.2 | 1.7 |
| 100,000 | 1.9 | 1.1 | 1.6 |
| 200,000 | 1.4 | 0.8 | 1.2 |
| 300,000 | 1.1 | 0.6 | 1.0 |
| 400,000 | 0.9 | 0.5 | 0.8 |
| 500,000 | 0.8 | 0.5 | 0.8 |
| 600,000 | 0.8 | 0.4 | 0.7 |
| 700,000 | 0.7 | 0.4 | 0.6 |
| 800,000 | 0.7 | 0.4 | 0.6 |
| 900,000 | 0.6 | 0.3 | 0.6 |
| 1,000,000 | 0.6 | 0.3 | 0.5 |
| 2,000,000 | 0.4 | 0.2 | 0.4 |
| 6,000,000 | | | 0.2 |
TABLE 2. RELATIVE STANDARD ERRORS ON ESTIMATES OF PROPORTIONS
(per cent)
Table 2A. New South Wales
| Value of denominator | 15% | 20% | 30% | 45% | 60% | 75% |
| 5,000 | 22 | 19 | 14 | 10 | 7.6 | 5.3 |
| 10,000 | 16 | 13 | 10 | 7.2 | 5.3 | 3.7 |
| 30,000 | 8.9 | 7.4 | 5.6 | 4.1 | 3.0 | 2.1 |
| 50,000 | 6.8 | 5.7 | 4.3 | 3.1 | 2.3 | 1.6 |
| 75,000 | 5.5 | 4.6 | 3.5 | 2.5 | 1.9 | 1.3 |
| 100,000 | 4.8 | 4.0 | 3.0 | 2.2 | 1.6 | 1.1 |
Table 2B. Other States
| Value of denominator | 15% | 20% | 30% | 45% | 60% | 75% |
| 5,000 | 14 | 12 | 9.1 | 6.5 | 4.8 | 3.3 |
| 10,000 | 9.9 | 8.3 | 6.2 | 4.5 | 3.3 | 2.3 |
| 30,000 | 5.5 | 4.6 | 3.5 | 2.5 | 1.8 | 1.3 |
| 50,000 | 4.2 | 3.5 | 2.6 | 1.9 | 1.4 | 0.97 |
| 75,000 | 3.4 | 2.8 | 2.1 | 1.5 | 1.1 | 0.78 |
| 100,000 | 2.9 | 2.4 | 1.8 | 1.3 | 0.95 | 0.67 |
Table 2C. Australia
| Value of denominator | 15% | 20% | 30% | 45% | 60% | 75% |
| 5,000 | 17 | 14 | 11 | 7.9 | 5.9 | 4.1 |
| 10,000 | 12 | 10 | 7.8 | 5.6 | 4.2 | 2.9 |
| 30,000 | 7.0 | 5.9 | 4.5 | 3.3 | 2.4 | 1.7 |
| 50,000 | 5.4 | 4.6 | 3.5 | 2.5 | 1.9 | 1.3 |
| 75,000 | 4.5 | 3.8 | 2.9 | 2.1 | 1.5 | 1.1 |
| 100,000 | 3.9 | 3.3 | 2.5 | 1.8 | 1.3 | 0.95 |