Sampling errors associated with census estimates
1.0 Introduction
This section discusses the sampling errors associated with the data from the main processing phase of the 1976 Census. Data from the preliminary processing phase are not subject to sampling error because all schedules were included. The final data from the main processing phase, by contrast, are based on all census schedules from non-private dwellings, all schedules from the Northern Territory and a 50% sample of private dwellings in the other States and the A.C.T. No estimate for the Northern Territory, from either the preliminary or the main processing phase, is subject to sampling error, since all schedules from the Northern Territory were processed. Counts of the total number of males, total number of females and total number of persons for a CD or LGA from the main processing phase were constrained to agree with those from the preliminary processing phase. Therefore, these totals are not subject to sampling error either.
2.0 What is sampling error
Since only a 50% sample of private dwelling schedules was processed, the estimates derived from this sample are likely to differ from the figures that would have been obtained if all schedules had been included. These differences are called sampling errors. The sampling error associated with any estimate can itself be estimated from the sample results, and one such measure is the standard error. The particular 50% sample selected was one of a large number of possible 50% samples. Each possible sample would have yielded different estimates, and the standard error measures the variation of all the possible 50% sample estimates around the figures which would have been obtained if all schedules had been processed.
Given an estimate and the standard error on that estimate, there are about two chances in three that the sample estimate will differ by less than one standard error from the figure that would have been obtained if all schedules had been processed, and about nineteen chances in twenty that the difference will be less than two standard errors.
Another measure of the sampling error is the relative standard error which is obtained by expressing the standard error as a percentage of the estimate:
Relative Standard Error = ( Standard Error / Estimate ) x 100
Both standard error and relative standard error are used in the following discussion of the reliability of the estimates. An example of their application is as follows:
Example:
If an estimate of 70 has a relative standard error of 10% then the standard
error of that estimate is 10% of 70 or 7. Thus there are 2 chances in 3 that
the figure that would have been obtained if all schedules had been processed
will lie in the range 63 to 77 and about 19 chances in 20 that this figure is
between 56 and 84.
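The arithmetic in this example can be sketched in a few lines of Python. The function names below are illustrative only and do not come from the census documentation; the sketch simply converts a relative standard error to a standard error and builds the one and two standard error ranges described above.

```python
def standard_error(estimate, rse_percent):
    """Convert a relative standard error (in %) to a standard error."""
    return estimate * rse_percent / 100.0


def likely_range(estimate, rse_percent, n_se=2):
    """Range of estimate plus or minus n_se standard errors.

    n_se=1 corresponds to about two chances in three,
    n_se=2 to about nineteen chances in twenty.
    """
    se = standard_error(estimate, rse_percent)
    return estimate - n_se * se, estimate + n_se * se


# The example above: an estimate of 70 with a relative standard error of 10%.
print(standard_error(70, 10))
print(likely_range(70, 10, n_se=1))
print(likely_range(70, 10, n_se=2))
```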
3.0 Presentation of sampling errors
It would have been impracticable to publish standard errors of all census estimates because difficulties in presentation would have been encountered with the large number of estimates. In addition, computer production of all standard errors would have been costly.
Consequently, tables which relate the relative standard error of an estimate to the size of the estimate are given at the end of this document. As can be seen from the tables, the larger an estimate, the greater its reliability and thus the smaller the relative standard error. The tables are not intended to give a precise measure of the error for a particular estimate, but provide an indication of the likely magnitude of the relative standard error for estimates of any particular size.
4.0 How to determine the sampling error on an estimate
There is no sampling error on an estimate if:
(a) the estimate is total males, total females or total persons in a CD, LGA or aggregations of these areas; or
(b) the estimate refers to the Northern Territory.
The relative standard error or standard error for any other estimate may be found by reference to the tables given at the end of this document. A complete description of the methods to be used to obtain the relative standard error for any estimate is given in the following sections.
5.0 Sampling errors on dwelling and person estimates
Sampling errors depend on the type of estimate concerned.
(1) For dwelling estimates, the relative standard errors are given by LINE D in TABLE 1.
(2) Sampling errors of person estimates depend on the particular topic of interest. Two groups of topics have been identified:
Use LINE A in TABLE 1 if your estimate involves any of the following topics: Year of arrival in Australia; Birthplace (if overseas); Country of Citizenship (if overseas); Religion; Languages regularly used; Racial Origin; Period of Residence.
Use LINE B in TABLE 1 for all other topics related to persons.
This difference between the relative standard errors for different person estimates arises because some characteristics (for example, religion and racial origin) are generally similar for persons in the same dwelling but differ between dwellings. That is, these characteristics are clustered by dwelling. The sampling scheme included ALL persons in selected dwellings rather than every second person in each dwelling; hence, for characteristics which are clustered by dwelling, there is a greater chance that persons with a given characteristic would have been either undersampled or oversampled. Thus estimates of the number of persons classified by characteristics which are clustered by dwelling will have somewhat higher relative standard errors.
If an estimate is known to include a large number of persons from non-private dwellings, where all schedules were processed (for example, an estimate of males ten to fifteen years of age in a CD with a large boarding school for boys), then the relative standard error as read from the table will overestimate the true relative standard error.
Example:
Consider an estimate of the number of female university graduates in an LGA.
The relative standard error will be derived from LINE B of TABLE 1. If the
number of female university graduates in the LGA is 50 then reading from this
line, the relative standard error is approximately 12%. The standard error on
the estimate is 50 X 12/100 = 6. Therefore, there are nineteen chances in
twenty that the number of female university graduates in the LGA is in the
range 38 to 62.
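Reading TABLE 1 for an estimate that falls between two tabulated sizes requires some form of interpolation. The source does not state which method was used, so the sketch below assumes straight-line interpolation on a log-log scale; this assumption happens to reproduce the approximate values quoted in the worked examples (for instance about 6.0% for 250 dwellings and 3.7% for 650 dwellings on LINE D). The function name `table_1_rse` is hypothetical.

```python
import bisect
import math

# Estimate sizes and relative standard errors (%) copied from TABLE 1.
ESTIMATES = [2, 5, 10, 15, 20, 30, 40, 50, 75, 100, 500, 1000]
TABLE_1 = {
    "A": [80, 53, 38, 32, 28, 23, 20, 18, 15, 13, 6.4, 4.7],
    "B": [62, 39, 27, 22, 19, 15, 13, 12, 9.6, 8.3, 3.6, 2.5],
    "D": [70, 44, 31, 25, 22, 18, 15, 14, 11, 9.6, 4.2, 3],
}


def table_1_rse(line, estimate):
    """Approximate relative standard error (%) for one line of TABLE 1.

    Assumption (not from the source): values between tabulated sizes are
    interpolated linearly in log(estimate) versus log(RSE).
    """
    rses = TABLE_1[line]
    if not ESTIMATES[0] <= estimate <= ESTIMATES[-1]:
        raise ValueError("estimate outside the tabulated range 2 to 1000")
    if estimate in ESTIMATES:
        return float(rses[ESTIMATES.index(estimate)])
    hi = bisect.bisect_left(ESTIMATES, estimate)  # first size above estimate
    lo = hi - 1
    t = (math.log(estimate) - math.log(ESTIMATES[lo])) / (
        math.log(ESTIMATES[hi]) - math.log(ESTIMATES[lo])
    )
    return math.exp((1 - t) * math.log(rses[lo]) + t * math.log(rses[hi]))


# The example above: 50 female university graduates, LINE B.
rse = table_1_rse("B", 50)
se = 50 * rse / 100
print(rse, se, 50 - 2 * se, 50 + 2 * se)
```

The tables are described as indicative only, so any interpolated value should be treated as approximate rather than exact.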
6.0 Sampling errors on estimates of proportions and percentages
Proportions and percentages formed from the ratio of two Census estimates are also subject to sampling errors and the size of the error depends on the accuracy of both the numerator and the denominator. The formula for the relative standard error of a proportion is given below.
Relative Standard Error (x/y) = SQRT{[Relative Standard Error (x)]^2 - [Relative Standard Error (y)]^2}
Example:
Consider an estimate of the labour force participation rate for persons born
overseas for a particular LGA. If the number of persons born overseas who are
in the labour force is 100 and the total number of persons born overseas is 160
then the estimated proportion is 100/160 = 0.63. The relative standard errors
for both the numerator and denominator will be derived from TABLE 1 LINE A.
Reading from this table, the relative standard error of the numerator (ie the
number of persons born overseas who are in the labour force) is approximately
13% and the relative standard error of the denominator (ie the number of
persons born overseas) is approximately 11%. The relative standard error
of the estimate of the proportion is therefore
SQRT{13 X 13 - 11 X 11} = SQRT[48] = 6.9%
As can be seen from the above formula the relative standard error of a proportion will always be less than the relative standard error of the numerator. However, whenever a proportion is small (ie the denominator is considerably greater than the numerator) it will be reasonable to approximate the relative standard error of the proportion by the relative standard error of the numerator.
For proportions or percentages where the denominator is the total number of males, females or persons in a CD or group of CDs, the relative standard error of the denominator is zero because these totals were derived from the preliminary processing phase. In these cases, the relative standard error of the proportion or percentage is given simply by the relative standard error of the numerator.
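As a minimal sketch, the formula of this section can be written as a small Python function. The name `rse_of_proportion` is illustrative and not part of the source. Passing a denominator relative standard error of zero covers the case where the denominator is a total taken from the preliminary processing phase.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator=0.0):
    """Relative standard error (%) of the proportion x/y.

    Implements SQRT{RSE(x)^2 - RSE(y)^2} as given in this section.
    """
    diff = rse_numerator ** 2 - rse_denominator ** 2
    if diff < 0:
        # The formula is only meaningful when the numerator is the less
        # reliable of the two estimates.
        raise ValueError("numerator RSE must not be below denominator RSE")
    return math.sqrt(diff)


# Labour force example: numerator RSE 13%, denominator RSE 11%.
print(round(rse_of_proportion(13, 11), 1))

# Denominator without sampling error: the result equals the numerator RSE.
print(round(rse_of_proportion(8.1), 1))
```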
Example:
Consider an estimate of the percentage of persons born overseas for a
particular CD. If the number of persons born overseas in the CD is 300 and the
total number of persons in the CD is 1000, then the estimated percentage is
(300/1000) X 100 = 30%. The relative standard error on the denominator is zero
since estimates of total persons in a CD are not subject to sampling error. The
relative standard error on the numerator can be obtained from interpolating
TABLE 1 LINE A. This table gives the relative standard error on the numerator
as approximately 8.1%. Therefore, the relative standard error on the percentage
is also 8.1% and hence the standard error on the estimate of percentage is 8.1
X 30/100 = 2.4 percentage points. Therefore, there are nineteen chances in
twenty that the percentage of persons born overseas in the CD will lie within
the range of approximately 25% to 35%.
Relative standard errors for estimates of proportions or percentages may also be determined from TABLE 2 which sets out relative standard errors for selected percentages or proportions.
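The source does not say how TABLE 2 was compiled. As an observation only, several Table 2A entries agree, to within rounding, with the result of applying the proportion formula above to LINE A values from TABLE 1. The sketch below checks a few entries for which both numerator and denominator sizes are tabulated; the dictionary and function names are hypothetical.

```python
import math

# Relative standard errors (%) copied from TABLE 1, LINE A.
LINE_A = {15: 32, 20: 28, 30: 23, 50: 18, 100: 13}

# Published Table 2A entries, keyed by (denominator, percentage).
TABLE_2A = {(50, 30): 26, (100, 15): 29, (100, 20): 25, (100, 30): 19}


def derived_rse(denominator, percentage):
    """RSE (%) of a percentage, derived from LINE A via the proportion formula."""
    numerator = denominator * percentage // 100
    return math.sqrt(LINE_A[numerator] ** 2 - LINE_A[denominator] ** 2)


for (den, pct), published in TABLE_2A.items():
    print(den, pct, published, round(derived_rse(den, pct), 1))
```

Other entries may differ slightly more, since the TABLE 1 inputs are themselves rounded.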
7.0 Sampling errors on estimates of differences
The relative standard error on differences between two estimates of numbers or between two estimates of proportions (or percentages) can also be derived from the tables of relative standard errors.
For differences between the 1976 Census and the 1971 Census the standard error of the difference will be identical to the standard error of the 1976 estimate alone, since 1971 estimates are not subject to sampling error.
Example:
If estimates for the 1971 and 1976 Censuses are 500 dwellings and 800 dwellings
respectively then the difference is estimated as 300 dwellings. The 1971
estimate has no sampling error whilst the 1976 estimate has a
relative standard error (as read from TABLE 1 LINE D) of approximately 3% and
hence a standard error of approximately 3% of 800 or 24. The standard error of
the difference is therefore 24 and there are nineteen chances in twenty that,
if all schedules from the 1976 Census had been processed, the observed
difference would have been within the range 252 to 348.
For differences between two 1976 Census estimates the standard error of the differences may be approximated by the following formula.
Standard Error (x-y) = SQRT{[Standard Error (x)]^2 + [Standard Error (y)]^2}
Example:
If the estimates for two LGAs of the total number of occupied dwellings are
1000 and 800 and the number of occupied dwellings with outer walls of brick are
250 and 650 respectively, then the percentage of occupied dwellings with brick
walls in each of these LGAs is (250/1000) X 100 = 25% and (650/800) X 100 =
81.2% respectively. The difference between these estimated percentages is
therefore 56.2 percentage points. The calculation of the standard error of this
difference requires the standard error of each of the percentages to be
calculated. The relative standard errors on each of the estimates of numbers
(1000, 800, 250 and 650) can be derived from TABLE 1 LINE D and are
approximately 3.0%, 3.3%, 6.0% and 3.7% respectively. Using the formula given
in the previous section, the relative standard errors on each of the
percentages are:
SQRT{(6.0)^2 - (3.0)^2} = 5.2% and SQRT{(3.7)^2 - (3.3)^2} = 1.7%
The standard errors on the percentages, in percentage points, are therefore:
5.2 X 25/100 = 1.3 and 1.7 X 81.2/100 = 1.4
The standard error of the difference between the two percentages is then:
SQRT{(1.3)^2 + (1.4)^2} = 1.9 percentage points
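The chain of calculations in this example can be sketched in Python as follows. The function names are illustrative and not from the source. The relative standard errors are the approximate TABLE 1 LINE D readings quoted in the example, and intermediate values are left unrounded, so they differ slightly from the rounded figures in the text while leading to the same final result.

```python
import math


def rse_of_proportion(rse_numerator, rse_denominator=0.0):
    """RSE (%) of a proportion: SQRT{RSE(x)^2 - RSE(y)^2}."""
    return math.sqrt(rse_numerator ** 2 - rse_denominator ** 2)


def se_of_difference(se_x, se_y=0.0):
    """Standard error of x - y: SQRT{SE(x)^2 + SE(y)^2}.

    se_y=0 covers a comparison with a figure that has no sampling error,
    such as a 1971 Census figure.
    """
    return math.sqrt(se_x ** 2 + se_y ** 2)


# Brick-walled dwellings in two LGAs.
pct_1 = 250 / 1000 * 100  # 25%
pct_2 = 650 / 800 * 100   # about 81.2%

rse_pct_1 = rse_of_proportion(6.0, 3.0)  # numerator 250, denominator 1000
rse_pct_2 = rse_of_proportion(3.7, 3.3)  # numerator 650, denominator 800

# Standard errors of the two percentages, in percentage points.
se_pct_1 = rse_pct_1 * pct_1 / 100
se_pct_2 = rse_pct_2 * pct_2 / 100

se_diff = se_of_difference(se_pct_1, se_pct_2)
print(round(pct_2 - pct_1, 1), round(se_diff, 1))

# 1971 versus 1976 dwellings: only the 1976 estimate has a standard error.
print(se_of_difference(24.0))
```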
TABLE 1: Relative standard error (%) by size of estimate
Estimate |  2 |  5 | 10 | 15 | 20 | 30 | 40 | 50 |  75 | 100 | 500 | 1000
A-LINE   | 80 | 53 | 38 | 32 | 28 | 23 | 20 | 18 |  15 |  13 | 6.4 |  4.7
B-LINE   | 62 | 39 | 27 | 22 | 19 | 15 | 13 | 12 | 9.6 | 8.3 | 3.6 |  2.5
D-LINE   | 70 | 44 | 31 | 25 | 22 | 18 | 15 | 14 |  11 | 9.6 | 4.2 |    3
Table 2A: Clustered Person Topics (A-LINE)
Denominator | Percentage
            |  15 |  20 |  30 |  45 |  60 |  75
         50 |  40 |  34 |  26 |  19 |  14 |  10
        100 |  29 |  25 |  19 |  14 |  10 | 7.3
        200 |  21 |  18 |  14 |  10 | 7.5 | 5.3
        500 |  14 |  12 | 9.1 | 6.7 | 5.0 | 3.5
        750 |  11 | 9.8 | 7.6 | 5.5 | 4.1 | 2.9
       1000 |  10 | 8.6 | 6.6 | 4.9 | 3.6 | 2.6
Denominator | Percentage
            |  15 |  20 |  30 |  45 |  60 |  75
         50 |  29 |  24 |  19 |  13 | 9.9 | 7.0
        100 |  20 |  17 |  13 | 9.4 | 6.9 | 4.9
        200 |  14 |  12 | 9.1 | 6.6 | 4.8 | 3.4
        500 | 8.9 | 7.5 | 5.7 | 4.1 | 3.0 | 2.1
        750 | 7.3 | 6.1 | 4.6 | 3.3 | 2.5 | 1.7
       1000 | 6.3 | 5.2 | 4.0 | 2.9 | 2.1 | 1.5
Numerator - Clustered Person Topics (A-LINE)
Denominator - Unclustered Person Topics (B-LINE)
Denominator | Percentage
            |  15 |  20 |  30 |  45 |  60 |  75
         50 |  42 |  36 |  29 |  24 |  20 |  17
        100 |  31 |  27 |  22 |  17 |  15 |  13
        200 |  23 |  19 |  16 |  13 |  11 | 9.5
        500 |  15 |  13 |  11 | 8.5 | 7.3 | 6.4
        750 |  12 |  11 | 8.8 | 7.1 | 6.1 | 5.3
       1000 |  11 | 9.5 | 7.7 | 6.3 | 5.3 | 4.7
Denominator | Percentage
            |  15 |  20 |  30 |  45 |  60 |  75
         50 |  33 |  28 |  21 |  15 |  11 | 7.9
        100 |  23 |  19 |  15 |  11 | 7.9 | 5.6
        200 |  16 |  14 |  10 | 7.5 | 5.5 | 3.9
        500 |  10 | 8.6 | 6.5 | 4.7 | 3.5 | 2.5
        750 | 8.3 | 7.0 | 5.3 | 3.8 | 2.8 | 2.0
       1000 | 7.2 | 6.0 | 4.6 | 3.3 | 2.4 | 1.7