SAMPLING ERRORS ASSOCIATED WITH STATISTICS PRODUCED FROM THE SAMPLE FILES

1.0 Introduction

Any statistics produced from the sample files will be subject to sampling error. This means, for example, that any Australian totals produced from the sample files will be unlikely to correspond exactly to the published census figures for the same characteristics.

In the explanation and tables of standard errors given here it is assumed that all figures produced by counting the number of records on the sample file are multiplied by 100 to give an estimate of the total number of persons or households in the population with the same characteristics. These figures multiplied by 100 are referred to as 'estimates'. If the 0.1% sub-samples are used to produce estimates, figures should be multiplied by 1000 to obtain estimates.
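The weighting rule amounts to one multiplication per count. A minimal Python sketch follows. The function name `estimate_from_count` is hypothetical, and it assumes the number of matching records has already been counted from the file.

```python
def estimate_from_count(record_count: int, sub_sample: bool = False) -> int:
    """Convert a count of sample-file records into a population estimate.

    Each record on the full 1% file represents 100 persons or households;
    each record on a 0.1% sub-sample represents 1000.
    """
    weight = 1000 if sub_sample else 100
    return record_count * weight


print(estimate_from_count(60))                  # 6000
print(estimate_from_count(6, sub_sample=True))  # 6000
```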

2.0 What is sampling error?

Since only a sample of persons is included on the sample files, estimates derived from the files may differ from figures which would have been obtained if all persons had been included. One measure of the likely difference is given by the 'standard error' which indicates the extent to which an estimate might have varied by chance because only a sample of persons was included.

Each particular one per cent sample selected was only one of a large number of possible one per cent samples. Each possible sample would yield different estimates. The standard error measures the variation of all the possible one per cent sample estimates around the figures which would have been obtained if all persons had been included.

Given an estimate and the standard error on that estimate, there are about two chances in three that the sample estimate will differ by less than one standard error from the figure that would have been obtained if all persons had been included, and about nineteen chances in twenty that the difference will be less than two standard errors.

Another measure of the sampling variability is the relative standard error, which is obtained by expressing the standard error as a percentage of the estimate to which it refers, that is:

Relative Standard Error = (Standard Error / Estimate) x 100

The following example illustrates the uses of the concepts of standard error and relative standard error:

EXAMPLE. If an estimate of 6,000 has a relative standard error of 10 per cent, then the standard error of that estimate is 10 per cent of 6,000, or 600. Thus there are two chances in three that the figure that would have been obtained if all persons had been included on the sample file lies in the range 6,000 ± (1 x 600), or 5,400 to 6,600, and nineteen chances in twenty that it lies in the range 6,000 ± (2 x 600), or 4,800 to 7,200.
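The example's arithmetic can be written as three small functions. This is an illustrative Python sketch with hypothetical function names; the multiplier `k` is 1 for the "two chances in three" range and 2 for the "nineteen chances in twenty" range.

```python
def relative_standard_error(estimate: float, standard_error: float) -> float:
    """Relative standard error (%) = (standard error / estimate) x 100."""
    return standard_error * 100 / estimate


def standard_error_from_rse(estimate: float, rse_percent: float) -> float:
    """Invert the definition: standard error = estimate x RSE / 100."""
    return estimate * rse_percent / 100


def error_range(estimate: float, standard_error: float, k: int) -> tuple[float, float]:
    """Range of +/- k standard errors around the estimate.

    k=1: about two chances in three; k=2: about nineteen chances in twenty.
    """
    return (estimate - k * standard_error, estimate + k * standard_error)


se = standard_error_from_rse(6000, 10)
print(se)                                 # 600.0
print(error_range(6000, se, 1))           # (5400.0, 6600.0)
print(error_range(6000, se, 2))           # (4800.0, 7200.0)
print(relative_standard_error(6000, se))  # 10.0
```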

3.0 Estimates with large sampling errors

Generally, estimates of less than 400 persons or 500 households will be subject to sampling errors so large that the estimates will be of limited usefulness in most situations. In addition, for an estimate of persons derived from the Households Sample File involving any of the characteristics:

    year of arrival;
    period of residence;
    birthplace (if overseas);
    country of citizenship (if overseas);
    religion; or
    Aboriginal/Torres Strait Islander;

an estimate of below 1,000 persons will be of limited usefulness.
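These guideline thresholds can be expressed as a simple check. The sketch below is hypothetical (the function name, the `unit` strings and the topic labels are assumptions chosen for illustration) and encodes only the rules stated in this section.

```python
# Topics listed in section 3.0 for which person estimates from the
# Households Sample File should be at least 1,000.
CLUSTERED_TOPICS = {
    "year of arrival",
    "period of residence",
    "birthplace (if overseas)",
    "country of citizenship (if overseas)",
    "religion",
    "Aboriginal/Torres Strait Islander",
}


def of_limited_usefulness(estimate: float, unit: str,
                          households_file: bool = False,
                          topics: tuple[str, ...] = ()) -> bool:
    """Flag an estimate that falls below the guideline thresholds in section 3.0."""
    if unit == "households":
        return estimate < 500
    if unit == "persons":
        if households_file and any(t in CLUSTERED_TOPICS for t in topics):
            return estimate < 1000
        return estimate < 400
    raise ValueError("unit must be 'persons' or 'households'")


print(of_limited_usefulness(450, "persons"))     # False
print(of_limited_usefulness(450, "households"))  # True
print(of_limited_usefulness(800, "persons", households_file=True, topics=("religion",)))  # True
```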

4.0 Presentation of sampling errors

In order to assist the user of the census sample files in evaluating the reliability of estimates produced from the files, a number of tables which relate the relative standard error of an estimate to the size of the estimate have been produced. These are given at the end of this document. The estimates of standard error are based on those produced from the sample processing of the 1976 Census and may be subject to revision in the future after detailed analysis of the sample files has been undertaken. However, they should provide a very good approximation to the standard errors of estimates produced from the sample files.

As can be seen from the tables, the larger an estimate, the greater its reliability, and thus the smaller the relative standard error. The tables are not intended to give a precise measure of the error for a particular estimate, but provide an indication of the likely magnitude of the relative standard error for estimates of any particular size.

All sampling errors presented here refer to estimates produced from the full 1% sample file. If one of the 0.1% sub-samples is used to produce estimates, these sampling errors should be multiplied by approximately 3.2 (the square root of 10), since a sub-sample contains one-tenth as many records.
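A short sketch of the sub-sample adjustment, assuming the relative standard error has already been read from the tables; the function name is hypothetical. It uses the exact square root of 10 rather than the rounded 3.2.

```python
import math


def subsample_rse(full_file_rse: float) -> float:
    """Scale a relative standard error read from the tables (which assume the
    full 1% file) to a 0.1% sub-sample.

    A sub-sample holds one-tenth as many records, so the sampling error of an
    estimate of the same size grows by sqrt(10), about 3.16.
    """
    return full_file_rse * math.sqrt(10)


print(round(subsample_rse(7.8), 1))  # 24.7 (B-Line, estimate of 10,000)
```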

5.0 Sampling errors on estimates of number of households, dwellings and persons

5.1 Types of estimates

Sampling errors depend on the type of estimate concerned. It is possible to produce three types of estimates from these files:

(a) Household, dwelling or family estimates. The relative standard errors are given by the H-Line row in Table 1. These estimates are relevant only to the Households Sample File.

(b) Person estimates from the Households Sample File. The relative standard errors are given by the A-Line and B-Line rows of Table 1.

(c) Person estimates from the Persons Sample File. The relative standard errors are given by the B-Line row of Table 1.

Both the A-Line and the B-Line are needed because relative standard errors differ between person estimates from the Households Sample File. Some characteristics, e.g. religion and birthplace, tend to be similar for persons in the same household but to differ between persons in different households (i.e. they are 'clustered'). In the Households Sample File, the sampling scheme included ALL persons in selected households rather than selecting every hundredth person in the population. Hence, for characteristics which are clustered by household, there is a greater chance that persons with those characteristics were either undersampled or oversampled. Estimates of numbers of persons classified by such characteristics will therefore have somewhat higher relative standard errors.

'Clustered' topics where the A-Line should be used include year of arrival, period of residence, birthplace (if overseas), country of citizenship (if overseas), religion, Aboriginal/Torres Strait Islander or any dwelling characteristics.

As the Persons Sample File is selected by taking every one hundredth person, rather than all persons within every one hundredth household (which is the method used for the Households Sample File), there is no clustering of characteristics within households. Thus the B-Line can be used for all estimates from the Persons Sample File.
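The choice of row in Table 1 described in this section can be summarised as a small decision function. This is a hypothetical sketch: the function name and the string labels for files and estimate types are illustrative assumptions, and whether a topic is clustered must still be decided from the list above.

```python
def table1_line(sample_file: str, estimate_of: str, clustered: bool = False) -> str:
    """Return which row of Table 1 ('A', 'B' or 'H') applies, per section 5.1.

    sample_file: 'households' or 'persons'
    estimate_of: 'households', 'dwellings', 'families' or 'persons'
    clustered:   True if a person estimate involves a clustered topic
                 (e.g. religion, birthplace, or any dwelling characteristic)
    """
    if estimate_of in ("households", "dwellings", "families"):
        if sample_file != "households":
            raise ValueError("these estimates need the Households Sample File")
        return "H"
    if estimate_of == "persons":
        if sample_file == "persons":
            return "B"  # no clustering within households on the Persons Sample File
        if sample_file == "households":
            return "A" if clustered else "B"
    raise ValueError("unrecognised sample file or estimate type")


print(table1_line("households", "persons", clustered=True))  # A
print(table1_line("persons", "persons"))                     # B
```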

5.2 Use of Table 1

Consider an estimate of the number of female university graduates with some particular characteristics. The relative standard error can be derived from the B-Line of Table 1. If the estimated number of female university graduates with the particular characteristics is 10,000, then, reading from this line, the relative standard error is approximately 8 per cent. The standard error on the estimate is 10,000 x 8/100 = 800. Therefore, there are nineteen chances in twenty that the number of female university graduates with the particular characteristics is in the range 10,000 ± (2 x 800), or 8,400 to 11,600.
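The text reads relative standard errors off Table 1 by eye. The sketch below approximates that reading by interpolating linearly between tabulated points on logarithmic scales for both the estimate and the relative standard error; this interpolation scheme is an assumption, not a method given in this document, and the function name `table1_rse` is hypothetical. It reproduces, to within rounding, the values quoted in the later worked examples (about 14 and 11 per cent on the A-Line for 15,000 and 25,000, and about 3, 3.3, 6 and 3.7 per cent on the H-Line for 88,000, 73,000, 23,000 and 59,000).

```python
import math

# Table 1: relative standard errors (%) at the tabulated estimate sizes.
ESTIMATES = [400, 500, 700, 1_000, 2_000, 5_000, 8_000, 10_000, 30_000, 50_000, 100_000]
TABLE_1 = {
    "A": [72, 65, 56, 47, 35, 23, 18, 17, 10, 8.0, 5.8],
    "B": [41, 36, 31, 25, 18, 11, 8.7, 7.8, 4.4, 3.4, 2.4],
    "H": [47, 42, 35, 30, 21, 13, 10, 9.1, 5.2, 4.0, 2.8],
}


def table1_rse(estimate: float, line: str) -> float:
    """Approximate the RSE (%) for an estimate by interpolating Table 1.

    Interpolates linearly in log(estimate) against log(RSE), an assumed
    stand-in for reading the value off a graph of the line.
    """
    rses = TABLE_1[line]
    if not ESTIMATES[0] <= estimate <= ESTIMATES[-1]:
        raise ValueError("estimate is outside the range of Table 1")
    for i in range(len(ESTIMATES) - 1):
        x0, x1 = ESTIMATES[i], ESTIMATES[i + 1]
        if estimate <= x1:
            t = math.log(estimate / x0) / math.log(x1 / x0)
            return math.exp(math.log(rses[i]) + t * math.log(rses[i + 1] / rses[i]))
    raise AssertionError("unreachable")


# Worked example from section 5.2 (B-Line, estimate of 10,000)
rse = round(table1_rse(10_000, "B"))  # 7.8, rounded to 8 as in the text
se = 10_000 * rse / 100               # 800.0
print(rse, se, (10_000 - 2 * se, 10_000 + 2 * se))  # 8 800.0 (8400.0, 11600.0)
```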

6.0 Sampling errors on estimates of proportions and percentages

Proportions and percentages formed from the ratio of two census sample estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. The formula for the relative standard error of a proportion is given below:

Relative standard error (x/y)

= SQRT{[Relative standard error (x)]² - [Relative standard error (y)]²}

where x is the numerator and y is the denominator.

The relative standard error on a percentage is the same as for the corresponding proportion. Thus the relative standard error on an estimate of 58 per cent is the same as that on the proportion 0.58.

EXAMPLE. Consider an estimate from the Households Sample File of the labour force participation rate for persons born overseas for a particular cross-classification cell. If the estimated number of persons born overseas who are in the labour force is 15,000 and the total number of persons born overseas is estimated to be 25,000, then the estimated proportion is 15,000/25,000 = 0.60. The relative standard errors for both the numerator and the denominator can be read from the A-Line of Table 1, by graphing the line or interpolating between the tabulated values. Reading from this line, the relative standard error of the numerator (i.e. the estimated number of persons born overseas who are in the labour force) is approximately 14 per cent, and the relative standard error of the denominator (i.e. the estimated number of persons born overseas) is approximately 11 per cent. The relative standard error of the estimate of the proportion is therefore

SQRT {14x14 - 11x11} = SQRT {75} = 8.7%

The standard error on the proportion is 0.60 x 8.7/100 = 0.05. Therefore, there are nineteen chances in twenty that the labour force participation rate for persons born overseas in the cross-classification cell is in the range 0.60 ± (2 x 0.05), or 0.50 to 0.70.
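The proportion calculation can be sketched as follows (the function name is hypothetical). The check for a negative value under the square root is an added safeguard: the formula presumes the numerator has the larger relative standard error, which is normally the case because the numerator is the smaller estimate.

```python
import math


def rse_of_proportion(rse_numerator: float, rse_denominator: float) -> float:
    """RSE (%) of x/y = sqrt(RSE(x)^2 - RSE(y)^2), per section 6.0."""
    diff = rse_numerator ** 2 - rse_denominator ** 2
    if diff < 0:
        raise ValueError("numerator RSE is expected to be at least the denominator RSE")
    return math.sqrt(diff)


# Worked example: labour force participation of persons born overseas
p = 15_000 / 25_000                # 0.6
rse_p = rse_of_proportion(14, 11)  # sqrt(75), about 8.66
se_p = p * rse_p / 100             # about 0.052
print(round(rse_p, 1), round(se_p, 2))                 # 8.7 0.05
print(round(p - 2 * se_p, 2), round(p + 2 * se_p, 2))  # 0.5 0.7
```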

As can be seen from the above formula, the relative standard error of a proportion or percentage will always be less than the relative standard error of the numerator. However, whenever a proportion or percentage is small (i.e. the denominator is considerably greater than the numerator), it will be reasonable to approximate the relative standard error of the proportion or percentage by the relative standard error of the numerator.

Relative standard errors for estimates of proportions or percentages may also be determined from Table 2 which sets out relative standard errors for selected percentages or proportions.

7.0 Sampling errors on estimates of differences

The standard error of the difference between two estimates of numbers, or between two estimates of proportions (or percentages), can also be derived from the tables of relative standard errors.

For the difference between two estimates produced from the census sample files the standard error of the difference may be approximated by the following formula:

Standard error (z-y) = SQRT{[Standard error (z)]² + [Standard error (y)]²}

This approximation will be exact for differences between estimates of the same characteristic in two different States, for estimates from the Persons Sample File, or for differences between other separate and uncorrelated characteristics. If, however, there is positive correlation between the characteristics (e.g. number of lawyers compared with the number of persons with law degrees), the above approximation will overestimate the true standard error. If there is a negative correlation between the characteristics (e.g. percentage of persons who travel to work by train compared with the percentage who travel by car), it will underestimate the true standard error.
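A sketch of the difference formula, with one optional extension that is not part of this document: a `correlation` parameter representing the sampling correlation between the two estimates, taken from the standard identity Var(z - y) = Var(z) + Var(y) - 2 Cov(z, y). Setting it to zero gives the approximation above; positive values give a smaller standard error and negative values a larger one, which is why ignoring the correlation overestimates or underestimates the true standard error as described. The function name is hypothetical.

```python
import math


def se_of_difference(se_z: float, se_y: float, correlation: float = 0.0) -> float:
    """Standard error of (z - y).

    With correlation=0 this is the approximation given in section 7.0.
    A nonzero correlation adds the covariance term of the general identity
    Var(z - y) = Var(z) + Var(y) - 2 Cov(z, y).
    """
    return math.sqrt(se_z ** 2 + se_y ** 2 - 2 * correlation * se_z * se_y)


print(round(se_of_difference(1.4, 1.4), 2))        # 1.98
print(round(se_of_difference(1.4, 1.4, 0.5), 2))   # 1.4 (positively correlated)
print(round(se_of_difference(1.4, 1.4, -0.5), 2))  # 2.42 (negatively correlated)
```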

EXAMPLE. Suppose the estimates produced from the Households Sample File of the number of occupied dwellings with 5 bedrooms are 88,000 for 'Other urban' and 73,000 for 'Rural', and the corresponding numbers of occupied 5 bedroom dwellings with outer walls of brick are 23,000 and 59,000. Then the percentage of occupied 5 bedroom dwellings with brick walls is (23,000/88,000) x 100 = 26% in 'Other urban' areas and (59,000/73,000) x 100 = 81% in 'Rural' areas. The difference between these estimated percentages is therefore 55 percentage points.

Calculating the standard error of this difference requires the standard error of each of the percentages. The relative standard errors on each of the estimates of numbers (88,000, 73,000, 23,000 and 59,000) can be read from the H-Line of Table 1, by graphing the line or interpolating between the tabulated values (3%, 3.3%, 6% and 3.7% respectively). Using the formula previously given for the relative standard error of a percentage, the relative standard errors on the estimated percentages are:

SQRT {6x6 - 3x3}= 5.2% and SQRT{3.7x3.7 - 3.3x3.3}= 1.7%

The standard errors on each of the percentages are then

26 x 5.2/100 = 1.4% and 81 x 1.7/100 = 1.4%.

Finally, the standard error on the difference is

SQRT{1.4 x 1.4 + 1.4 x 1.4} = 2.0%.

Therefore, there are nineteen chances in twenty that the difference between the percentages of occupied 5 bedroom dwellings with brick walls in the two areas lies in the range 55 ± (2 x 2.0), or 51 to 59 percentage points.
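The whole calculation can be chained together as below (hypothetical names; the H-Line relative standard errors are the values quoted in the example rather than recomputed). Carried at full precision, the standard error of the difference comes out at about 1.9 rather than 2.0, because the text rounds the percentages and their standard errors at each step; the resulting range still rounds to 51 to 59.

```python
import math


def rse_of_percentage(rse_numerator: float, rse_denominator: float) -> float:
    """RSE (%) of a percentage, using the section 6.0 formula."""
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)


# H-Line relative standard errors (%) quoted in the example
RSE = {88_000: 3.0, 73_000: 3.3, 23_000: 6.0, 59_000: 3.7}

pct_other_urban = 23_000 / 88_000 * 100  # about 26.1
pct_rural = 59_000 / 73_000 * 100        # about 80.8

se_other_urban = pct_other_urban * rse_of_percentage(RSE[23_000], RSE[88_000]) / 100
se_rural = pct_rural * rse_of_percentage(RSE[59_000], RSE[73_000]) / 100

difference = pct_rural - pct_other_urban  # about 54.7 percentage points
se_difference = math.sqrt(se_other_urban ** 2 + se_rural ** 2)
low, high = difference - 2 * se_difference, difference + 2 * se_difference
print(round(se_difference, 2), round(low), round(high))  # 1.92 51 59
```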

TABLE 1. RELATIVE STANDARD ERRORS OF HOUSEHOLD, DWELLING AND PERSON ESTIMATES

Relative standard error (per cent)

Estimate    A-Line    B-Line    H-Line
400         72        41        47
500         65        36        42
700         56        31        35
1,000       47        25        30
2,000       35        18        21
5,000       23        11        13
8,000       18        8.7       10
10,000      17        7.8       9.1
30,000      10        4.4       5.2
50,000      8.0       3.4       4.0
100,000     5.8       2.4       2.8

TABLE 2. RELATIVE STANDARD ERRORS ON PERCENTAGES OR PROPORTIONS

TABLE 2A. CLUSTERED PERSON TOPICS (A LINE)


Relative standard error (per cent)

Denominator   Percentage
              15     20     30     45     60     75
5,000         49     42     32     24     18     13
10,000        36     30     24     17     13     9.1
30,000        22     18     14     11     7.7    5.5
50,000        17     15     11     8.2    6.1    4.4
75,000        14     12     9.4    6.9    5.1    3.6
100,000       13     11     8.2    6.0    4.5    3.2

TABLE 2B. UNCLUSTERED PERSON TOPICS (B LINE)


Relative standard error (per cent)

Denominator   Percentage
              15     20     30     45     60     75
5,000         27     23     17     13     9.2    6.5
10,000        19     16     12     8.8    6.5    4.6
30,000        11     9.1    6.9    5.0    3.7    2.6
50,000        8.3    7.0    5.3    3.8    2.8    2.0
75,000        6.8    5.7    4.3    3.1    2.3    1.6
100,000       5.8    4.9    3.7    2.7    2.0    1.4

TABLE 2C. NUMERATOR - CLUSTERED PERSON TOPICS (A LINE)
DENOMINATOR - UNCLUSTERED PERSON TOPICS (B LINE)


Relative standard error (per cent)

Denominator   Percentage
              15     20     30     45     60     75
5,000         53     46     38     31     27     24
10,000        39     34     28     23     20     17
30,000        24     21     17     14     12     11
50,000        19     16     13     11     9.5    8.4
75,000        16     14     11     9.1    7.9    7.0
100,000       14     12     9.8    8.0    6.9    -

TABLE 2D. HOUSEHOLD AND DWELLING TOPICS (H LINE)


Relative standard error (per cent)

Denominator   Percentage
              15     20     30     45     60     75
5,000         32     27     20     15     11     7.6
10,000        22     19     14     10     7.6    5.3
30,000        13     11     8.1    5.9    4.3    3.1
50,000        9.8    8.2    6.2    4.5    3.3    2.4
75,000        8.0    6.7    5.1    3.7    2.7    1.9
100,000       6.9    5.8    4.4    3.2    2.3    1.7


General Enquiries: assda@anu.edu.au
Web Enquiries: webmaster@assda.anu.edu.au

Last Updated: 16/11/05