Census of Population and Housing 1986: Socio-Economic Status Indicator File

What is the C86 SES Indicator File?

The Socio-economic Indicator Files produced from the 1986 Census were constructed for the Commonwealth Schools Commission as a measure of disadvantage in education. They are part of a series of such files constructed from the 1971, 1976 and 1981 Censuses.

These SES files provide a measure of disadvantage presented at the Statistical Local Area and Collection District level. For each SLA and CD, there is an index variable accompanied by variables ranking the CD or SLA within Australia and within State. Other variables include population of the CD or SLA, and additional geographical variables. The following document, which discusses the methodology and interpretation of the index, is drawn from "Construction of an Indicator of Socio-economic Status" (Information Paper No. 15, Statistical Methods Section, Australian Bureau of Statistics, July 1986).

Construction of an Indicator of Socio-economic status

Background to the development of an indicator

In 1972, the newly elected Whitlam government appointed an interim committee to the Commonwealth Schools Commission, which was headed by Peter Karmel, and whose brief it was to report on all schools in Australia - on their financial needs, on priorities within those needs, and on the measures required to meet them. The report of that committee (the so-called "Karmel Report") included recommendations for the funding of a "Disadvantaged Schools Programme" - which was to be a programme aimed at positive discrimination in favour of schools in areas of concentrated need.

The Disadvantaged Schools Programme (DSP) is concerned with areas of concentrated disadvantage. The basic premise from which it operates is that,

"More equal outcomes from schooling require unequal treatment of children".

The social and economic, home and neighbourhood environment of students is seen to be a major determinant of their capacity to benefit from education. Accordingly, the programme attempts to direct additional funds to socioeconomically deprived areas.

This allocation of resources on the basis of need (as opposed to, for example, a simple per capita allocation), is representative of a general trend towards needs-based resource allocation. Thus the type of analysis used for the DSP which aims to identify geographical areas of need has broader applications than the specific one of the DSP.

The Schools Commission required a measure of socio-economic status of schools to facilitate funding and enrolment allocations (ie dollars and places in the programme) - both between States and between the government and the nongovernment systems within each State. It was this initial division of resources which prompted the development of an indicator of socio-economic status.

Choice of data source and the unit of analysis

Given that socio-economic influences are the key focus of the programme, the identification of disadvantaged schools would ideally be achieved by assessing schools in terms of the socio-economic status (SES) of the neighbourhoods from which schools draw their students. However, data relating to the SES of school catchment areas is not immediately available. Whilst it could be obtained from schools, such a process is particularly susceptible to "cheating", as it soon becomes clear to a school just what sort of questionnaire responses earn it a "disadvantaged" designation. There are also a number of aspects of SES which cannot be readily and reliably assessed by school-based surveys, due to their sensitive nature (eg income levels, single-parent family status).

An alternative to school-based data collection is the use of census data which allows for objective comparisons between disparate geographical locations. The chief disadvantages of census data are those of the nature of the smallest geographical unit for census data, and lack of timeliness. These are discussed below.

The finest level at which Census data is disseminated is the Census Collection District ("CD"), which corresponds to the workload of one census collector. In urban areas it typically comprises 200-300 dwellings, while in rural areas it usually contains fewer. There are about 27,000 CD's throughout Australia. The problem with CD's for the purpose of determining the SES of neighbourhoods for the DSP is that they bear little relationship to school catchment areas.

The disadvantage of census data concerning timeliness is that censuses are only conducted at five-yearly intervals, and they take a considerable time to process. Consequently information which is very out-of-date (by, for instance, seven years or more) may be used for determining SES and hence funding levels.

On balance, however, these disadvantages are outweighed by the objectivity, comparability and reliability offered by census data.

Indicator construction

In order to allocate resources between school systems, a unitary measure was required which discriminated between neighbourhoods on the basis of the SES of their residents. The census provides a fairly broad cross-section of data pertaining to SES. A multi-variate technique was required which would identify the common dimension of SES underlying suitably selected data items. Principal component analysis was selected as a method which makes no distributional assumptions, and which would provide a dimension common to all of the input variables. Principal component analysis essentially re-specifies a data set in terms of independent underlying dimensions or 'components'. The components identified are linear combinations of the original variables. Each component is, in turn, the best linear summary of the variance left in the data, after accounting for the previous components. Thus the first principal component maximises the variance explained in a common dimension and therefore discriminates optimally between areas.

Principal component analysis was applied to a selection of variables, all of which were considered to be related to SES, and which could be obtained at the CD level. Variables pertaining to family income, educational attainment, unemployment, occupation, marital status, household crowding, residential mobility, Aboriginality and migrant status were included in the analysis. The specific variables chosen were selected according to the following criteria:

  • they were specifically suggested by the initial guidelines for the programme;
  • they were socio-economic variables which had been shown by educational research to be correlated with educational attainment;

  • they displayed face validity;

and, for revisions of the SES indicator,

  • they were included in, or were well correlated with, an earlier indicator. (This criterion assumes the previous indicator to have had construct validity.)

All the variables used in the analysis were expressed as ratios or percentages of the relevant sub-population (eg persons aged 15 years or more, males in the labour force, etc). The variables were standardised to have zero mean and unit variance prior to analysis, to give equal prominence to all variables in indicator construction.

So far, for each revision of the indicator, the variable loadings found for the first component, and the pattern of correlations between the variables and the first component, have indicated that the first principal component was identifying a clear socio-economic dimension. The indicator score for a CD is given by the linear combination of standardised variables, each weighted by the coefficient specified by the first principal component. Negative coefficients correspond to disadvantaging effects, so a low score on the indicator suggests that the CD is socioeconomically disadvantaged. The signs of the weights for variables have been consistent with subjective expectations as were the relative magnitudes of coefficients. The first principal component derived from the analysis has been adopted by the Schools Commission as an indicator of socioeconomic status.
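The construction described above (standardise the ratio variables, take the first principal component, and orient it so that low scores indicate disadvantage) can be sketched as follows. This is a minimal illustration using NumPy on synthetic data, not the ABS implementation; the function name `first_principal_component`, the `positive_column` argument and the demonstration data are all assumptions made for the example.

```python
import numpy as np

def first_principal_component(ratios, positive_column=0):
    """Return (weights, raw_scores, variance_share) for the first component.

    ratios: 2-D array, one row per CD and one column per ratio variable.
    positive_column: a variable assumed to indicate advantage; the sign of
    a principal component is arbitrary, so it is fixed using this column.
    """
    x = np.asarray(ratios, dtype=float)
    # Standardise each variable to zero mean and unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    # Correlation matrix of the standardised variables.
    corr = (z.T @ z) / z.shape[0]
    # eigh returns eigenvalues in ascending order, so the last is the largest.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    weights = eigenvectors[:, -1]
    if weights[positive_column] < 0:
        weights = -weights
    raw_scores = z @ weights
    variance_share = eigenvalues[-1] / eigenvalues.sum()
    return weights, raw_scores, variance_share

# Synthetic data: two variables rise with a latent "status" factor, two fall.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
noise = rng.normal(scale=0.5, size=(500, 4))
demo = np.column_stack([latent, latent, -latent, -latent]) + noise
weights, scores, share = first_principal_component(demo)
```

In this sketch the variables that fall with the latent factor receive negative weights, mirroring the statement that negative coefficients correspond to disadvantaging effects.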

Indicator scores are calculated for all CD's throughout Australia. Scores are standardised to have a mean of 100 and a standard deviation of 10. These values were chosen for ease of interpretation - to yield scores which were all positive, and which spanned a fairly large range. All scores fall between 0 and 150.
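The rescaling step can be illustrated with a short sketch. It assumes an unweighted mean and population standard deviation taken across CDs, which the text does not specify, and it leaves the target mean and standard deviation as parameters; the function name `rescale` and the sample values are hypothetical.

```python
from statistics import fmean, pstdev

def rescale(raw_scores, target_mean, target_sd):
    """Linearly rescale scores to a chosen mean and standard deviation.

    Assumption: every CD counts equally (no population weighting).
    """
    mean = fmean(raw_scores)
    sd = pstdev(raw_scores)
    return [target_mean + target_sd * (s - mean) / sd for s in raw_scores]

raw = [-2.0, -0.5, 0.0, 0.5, 2.0]  # made-up raw component scores
scaled = rescale(raw, target_mean=100, target_sd=10)
```

Because the transformation is linear with a positive slope, the ordering of CDs is unchanged by the rescaling.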

Socio-economic status indicators have been produced for the Schools Commission in the above way using 1986 Census data, and in a similar way using 1971, 1976 and 1981 Census data, except that scores were standardised to have a mean of 50 and a standard deviation of 8. Details of the first two indices and how they were used by the Schools Commission for the allocation of funds are to be found in Linacre et al. While the original aim of the indicator was to construct a measure likely to be highly correlated with educational disadvantage, the indicator may be useful in assessing many other aspects of social and economic need. However, it should be recognised that only those aspects of social and economic status covered by the variables included in analysis can be reflected in the indicator. Any application of the indicator should ensure that all aspects of socioeconomic status of importance to the application have been represented.

The 1986 Indicator of Socio-economic Status

The 1986 indicator of socio-economic status was derived by applying principal component analysis to a total of forty four ratio variables. The first principal component was again determined to be an indicator of socio-economic status, and accounted for 17.8% of the total variance.

Indicator coefficients (the weights assigned to each of the variables to form the index) are listed in Table 1, together with the correlations between variables and the indicator. (Indicator coefficients have been standardised so that the resulting scores exhibit a standard deviation of 8.) CD scores are found by multiplying variables by their respective coefficients, summing these, and adding 50. High values of the indicator correspond to high socio-economic status, and low values to low status.
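As a rough illustration of the scoring rule, the sketch below applies a handful of the Table 1 coefficients to standardised variable values and adds 50. It is deliberately partial: a real score uses every variable in Table 1, and the dictionary keys, the function name `cd_score` and the example CD are hypothetical.

```python
# A few coefficients copied from Table 1, keyed by hypothetical short names.
COEFFICIENTS = {
    "family_income_under_15k": -1.052,
    "family_income_over_40k": 1.103,
    "degree_or_higher": 0.941,
    "no_qualifications": -1.106,
    "unemployed": -0.895,
}

def cd_score(standardised, coefficients=COEFFICIENTS, base=50.0):
    """Sum coefficient * standardised value over the variables, then add base."""
    return base + sum(coefficients[name] * standardised[name] for name in coefficients)

# A CD that sits exactly at the mean on every variable scores 50.
average_cd = {name: 0.0 for name in COEFFICIENTS}

# A made-up CD with more low incomes, fewer qualifications and more unemployment.
disadvantaged_cd = {
    "family_income_under_15k": 1.0,
    "family_income_over_40k": -1.0,
    "degree_or_higher": -0.5,
    "no_qualifications": 1.0,
    "unemployed": 1.0,
}
```

With only these five variables the made-up disadvantaged CD scores below 50, in line with low values corresponding to low status.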

Variables which weighted positively include:

  • the higher qualification variables;
  • the professional, administrative and clerical occupation variables;
  • the high income variable;
  • the better living standards variables such as house size and the provision of bedrooms.

Variables which weighted negatively include:

  • poor education variables;
  • the incidence of cultural difference variables, for example, single parent families, divorced persons;
  • the low income and low status occupation variables;
  • the generally poorer living standards variables, such as no motor vehicles and high unemployment.

CDs with high indicator scores have been identified as those areas which are conducive to educational advantage.

It should be recognised that, as a result of the manner in which the indicator was constructed, the indicator is not an "interval measure", but an "ordinal measure". That is, the order relationships between scores are meaningful (a lower indicator score indicates greater disadvantage, and vice versa), but other arithmetic relationships between score values are not meaningful. A given score difference at one point of the scale is not equivalent to the same score difference at another point: for example, the disadvantage differential between 40.0 and 45.0 cannot be equated with that between 45.0 and 50.0. Similarly, a score of 30.0 does not indicate twice the degree of disadvantage which corresponds to a score of 40.0 (relative to the Australian mean score of 50.0). A given decrease (increase) in the indicator score cannot be translated into a number which measures the increase (decrease) in disadvantage.

Table 1: 1986 Indicator of Socio-economic Status: Indicator Coefficients and Correlations of Variables with the Indicator (Indicator Coefficients are applied to standardised variables)

								Correlation
								of variable
								with the	Indicator
Variable							indicator	coefficient

1 INCOME

(1) % family income < $15,000					-0.705		-1.052
(2) % family income > $40,000					0.739		1.103

2 EDUCATION

(3) % degree or higher						0.630		0.941
(4) % trade or other qualifications				0.468		0.698
(5) % no qualifications						-0.741		-1.106
(6) % who left school < 15 years old
	or did not attend school				-0.669		-0.998
(7) % who never attended school					-0.376		-0.562
(8) % who are students						0.473		0.706

3 OCCUPATION

(9) % males in professional occupations				0.661		0.987
(10) % females in professional occupations			0.485		0.724
(11) % males in administrative occupations			0.009		0.014
(12) % females in administrative occupations			-0.180		-0.269
(13) % males in clerical occupations				0.292		0.437
(14) % females in clerical occupations				0.385		0.575
(15) % male sales workers					0.349		0.522
(16) % female sales workers					0.042		0.063
(17) % labourers						-0.706		-1.054
(18) % male tradesmen						-0.090		-0.135
(19) % female tradesmen						-0.157		-0.235

4 WEALTH

(20) % households owning dwelling				0.037		0.055
(21) % households purchasing dwelling				0.512		0.765
(22) % dwellings with no motor vehicles				-0.405		-0.605
(23) % dwellings with < 1 bedroom				-0.209		-0.312
(24) % dwellings with > 4 bedrooms				0.310		0.463
(25) average number of bedrooms per person in CD		0.277		0.413

5 POWER/PRESTIGE

(26) % families with head and dependents only		 	-0.259		-0.386
(27) % persons separated or divorced 				-0.248		-0.370
(28) % households with > 7 occupants 				-0.386		-0.576
(29) % households with > 2 families			 	-0.324		-0.484
(30) % unemployed						-0.600		-0.895
(31) % Aboriginal or Torres Strait Islander			-0.425		-0.634
(32) % recent migrants from non-English speaking countries	-0.070		-0.104
(33) % persons born in Southern Europe				-0.074		-0.111
(34) % persons born in Vietnam					-0.239		-0.357
(35) % migrants lacking fluency in English			-0.165		-0.247
(36) % households renting dwelling				-0.399		-0.596
(37) % households renting dwelling - housing authority		-0.369		-0.551
(38) % improvised/mobile homes or dwellings
	attached to non-dwellings				-0.302		-0.451
(39) % persons with same residence in 1985 as in 1986		0.104		0.155
(40) % persons with same residence in 1980 as in 1986		0.023		0.034

Interpretation of the Indicator - comments and caveats

The indicator construction process appears to be quite robust: as the input variables are largely highly intercorrelated, the inclusion or exclusion of a few variables appears to affect the indicator only slightly (as evidenced by the resulting ranking of CDs).

The variables measure characteristics of collections of persons and dwellings within CDs. The degree of heterogeneity within a CD therefore influences the indicator score of that CD, so that the most homogeneously disadvantaged and advantaged CDs tend towards the extreme indicator scores.

There are a number of features of the census data used to construct the indicator which render the indicator less than ideal; users should be fully aware of these:

(i) The variables included in analysis are limited to those for which data is collected by the census. Ideally, an indicator of socio-economic status should include all measures of relevance to SES. However, the census does not obtain any information relating, for example, to inherited wealth and property values. The indicator cannot therefore purport to reflect these facets of SES.

(ii) The limitations of data-availability at the CD level place further constraints on the potential variables which can be included in analysis: the data disseminated at CD level is less detailed and includes fewer cross-classifications than that available for larger geographical areas such as States.

(iii) Missing data is a further impediment to indicator construction. Non-response to individual census items is overall quite low (well below 10% across Australia for most variables), but it varies between CDs, and it is likely that item non-response rates correlate directly with low SES. Non-response for each variable has been dealt with by 'redefining' the population associated with the variable to include only those persons who answered the relevant questions. This approach implicitly assumes that non-respondents resemble respondents with respect to the characteristics measured by the ratio variables. If indeed the incidence of non-response is greater for persons of low SES, heterogeneous CDs would exhibit inappropriately high indicator scores.

(iv) All variables pertaining to families and dwellings (c.f. persons) derive data from "private" dwellings only. Persons in non-private dwellings (eg caravans, motels, boarding houses, hospitals, refuges) are therefore "under-represented".

(v) The census does not aggregate data to CD's according to where people live, but according to where they are located at the time of the census. Clearly, the indicator requires usual residence data. While this affected all States, Queensland and the Northern Territory appear to have been particularly adversely affected; census night fell during a school holiday period for Queensland in 1981.
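The treatment of non-response described in item (iii) amounts to removing non-respondents from the denominator of each ratio variable. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def ratio_excluding_non_response(with_characteristic, total_persons, not_stated):
    """Percentage of respondents with a characteristic.

    The relevant population is 'redefined' to those who answered the
    question, so persons who did not state an answer are dropped from the
    denominator. Returns None when nobody responded.
    """
    respondents = total_persons - not_stated
    if respondents <= 0:
        return None
    return 100.0 * with_characteristic / respondents

# Made-up CD: 400 persons in the labour force, 40 did not state their
# employment status, and 60 reported being unemployed.
adjusted = ratio_excluding_non_response(60, 400, 40)  # 60 of 360 respondents
unadjusted = 100.0 * 60 / 400                         # 60 of all 400 persons
```

The adjusted percentage equals the true rate only if non-respondents resemble respondents, which is the assumption questioned in item (iii).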

Aggregate indicator scores

CD indicator scores have been aggregated to the larger geographical units of Statistical Local Areas (SLA's) and Postcodes. The aggregate score was computed as a weighted average of the scores of constituent CD's, where the weight used was the CD population count.

Aggregate scores should be used with caution, however, as larger geographical areas tend to encompass greater heterogeneity with respect to the socio-economic characteristics of residents than do their constituent CD's. To assist with the interpretation of aggregate scores, some additional information can be provided at the aggregate level, namely:

(i) the number of constituent CD's ('NO-CDS')

(ii) the standard deviation of CD scores within the aggregate unit ('STD-DEV')

(iii) the highest and lowest indicator scores for a CD within the aggregate unit ('MAX-CD' and 'MIN-CD' respectively)

(iv) the total number of persons in the aggregate unit ('SLA-POPN').
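The aggregate score and the accompanying summary items can be sketched for a single aggregate unit as below. The class and function names are hypothetical, the CD values are made up, and the use of an unweighted population standard deviation for 'STD-DEV' is an assumption, since the text does not say how it was computed.

```python
from dataclasses import dataclass
from statistics import pstdev

@dataclass
class AggregateSummary:
    index: float    # population-weighted average of CD scores
    no_cds: int     # 'NO-CDS'
    std_dev: float  # 'STD-DEV' (assumed unweighted, population form)
    min_cd: float   # 'MIN-CD'
    max_cd: float   # 'MAX-CD'
    popn: int       # 'SLA-POPN'

def summarise(cds):
    """cds: list of (indicator score, population) pairs for one aggregate unit."""
    scores = [score for score, _ in cds]
    popn = sum(count for _, count in cds)
    index = sum(score * count for score, count in cds) / popn
    return AggregateSummary(index, len(cds), pstdev(scores), min(scores), max(scores), popn)

# Three made-up CDs forming one aggregate unit.
summary = summarise([(40.0, 500), (50.0, 300), (60.0, 200)])
```

Here the weighted score (47.0) sits below the simple average of the three CD scores because the lowest-scoring CD has the largest population, and the spread between 'MIN-CD' and 'MAX-CD' shows how heterogeneous the unit is.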

Some postcodes legitimately cross State boundaries. (Small, typically rural, border areas may be serviced by post offices which are nearby but interstate.) As users of the indicator at the postcode level are expected to wish to observe State boundaries, postcode indicator scores have been computed at the State by postcode level. That is, postcodes which cross State boundaries appear as separate entries, corresponding to the division between States. Only three States and five postcodes are affected; these are detailed in Table 2.

Table 2: Postcodes which cross State boundaries: population by constituent State

Postcode		    Population
		ACT		NSW		Vic
2540		787		32,549
2600		7,750		423
2620		312		21,092
3644				556		5,717
3691				155		5,837

Scores for alternative aggregations of CD's (ie aggregations other than SLA's and postcodes) can be computed as follows:

Let
X_{ij} = indicator score for the ith CD of the jth region
X_j = indicator score for the jth region
N_{ij} = population of the ith CD in the jth region
N_j = population of the jth region

Then the indicator score for the jth region is

$$X_j = \frac{\sum_{i \in \text{region } j} N_{ij} X_{ij}}{N_j}$$
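The population-weighted average above can be applied to any grouping of CDs. The sketch below groups by an arbitrary key, using State by postcode as the example region; the record keys (`state`, `post`, `index`, `popn`), the function name and the three records are hypothetical and do not come from the data files.

```python
from collections import defaultdict

def aggregate_scores(cd_records, region_of):
    """Population-weighted average CD score for each region.

    cd_records: iterable of dicts with 'index' (CD score) and 'popn' (CD population).
    region_of: function mapping a record to its region key.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(int)
    for record in cd_records:
        region = region_of(record)
        weighted_sum[region] += record["popn"] * record["index"]
        population[region] += record["popn"]
    # Regions with no population have no defined score and are left out.
    return {
        region: weighted_sum[region] / population[region]
        for region in weighted_sum
        if population[region] > 0
    }

# Made-up CDs in one postcode that straddles two States (codes 1 and 8).
records = [
    {"state": 1, "post": "2620", "index": 48.0, "popn": 400},
    {"state": 1, "post": "2620", "index": 54.0, "popn": 200},
    {"state": 8, "post": "2620", "index": 60.0, "popn": 300},
]
by_state_postcode = aggregate_scores(records, lambda r: (r["state"], r["post"]))
```

Keying on the pair of State and postcode keeps the two parts of a cross-border postcode as separate entries.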

References

Linacre, S., Karmel, R., McBurney, P., McEwin, M. "Derivation of a Socio-Economic Index from the 1976 Population Census and Use of this Index as a Measure of Disadvantage" In-house ABS Report, Statistical Methods Section, July 1980.

"The 1981 Indicator of Socio-Economic Status" Internal file note, file 83/1726.

Ross, K. N., "The Development of the 1984 'Indicator of Disadvantage' and its Application to Resource Allocation Decisions for the Disadvantaged Schools Program in Australia." School of Education, Deakin University, July 1984.

Dimensions of the data set

The 1986 SES Indicator data set consists of one file at the Statistical Local Area level for the whole of Australia, and a separate file at the Collection District level for each State and Territory. There are 1332 logical records in the SLA file, one for each Statistical Local Area. For the CD level files, the number of records is as follows:

ACT                  395
NSW                 9975
Western Australia   2717
Victoria            7259
Tas                  954
South Australia     2633
NT                   283
Queensland          5285

CODEBOOK

Record Structure

Each logical record in the SES files is either a Statistical Local Area or a Collection District. There are separate record structures for the SLA and CD files as follows:

Record structure - SLA level file
Variable Name    Variable Description
STATE            State
SLA-CODE         Code of Statistical Local Area
SLA-POPN         Population of SLA
INDEX            Index
NO-CDS           Number of Collection Districts
MIN-CD           Minimum CD Index
MAX-CD           Maximum CD Index
STD-DEV          Standard Deviation of CD Indices
A-RANK           Rank within Australia
S-RANK           Rank within State

Record structure - CD level file
Variable Name    Variable Description
STATE            State
CD-CODE          Collection District Code
CD-POPN          Population of CD
INDEX            Index
POST             Postcode
SLA-CODE         Code of Statistical Local Area
LOCALITY         Code of Locality
A-RANK           Rank within Australia
S-RANK           Rank within State
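The relationship between INDEX and the two rank variables can be illustrated with a small sketch. The codebook does not state the rank direction or how ties are handled, so the choices here (rank 1 for the lowest index, tied scores sharing the lowest rank) are assumptions, as are the function names, the lowercase record keys and the sample records.

```python
def rank_ascending(scores):
    """Rank 1 goes to the lowest score; tied scores share the lowest rank."""
    first_position = {}
    for position, score in enumerate(sorted(scores), start=1):
        first_position.setdefault(score, position)
    return [first_position[score] for score in scores]

def add_ranks(records):
    """Return copies of the records with 'a_rank' and 's_rank' added."""
    a_ranks = rank_ascending([r["index"] for r in records])
    # Collect the positions of the records belonging to each State.
    positions_by_state = {}
    for i, record in enumerate(records):
        positions_by_state.setdefault(record["state"], []).append(i)
    s_ranks = [0] * len(records)
    for positions in positions_by_state.values():
        ranks = rank_ascending([records[i]["index"] for i in positions])
        for i, rank in zip(positions, ranks):
            s_ranks[i] = rank
    return [dict(r, a_rank=a, s_rank=s) for r, a, s in zip(records, a_ranks, s_ranks)]

# Four made-up areas in two States.
ranked = add_ranks([
    {"state": 1, "index": 45.0},
    {"state": 2, "index": 52.0},
    {"state": 1, "index": 60.0},
    {"state": 2, "index": 38.0},
])
```

Ranks use only the ordering of scores, which suits an indicator that is meant to be read as an ordinal measure.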

Data Code Groups

STATE: The State codes are as follows:

  1. New South Wales
  2. Victoria
  3. Queensland
  4. South Australia
  5. Western Australia
  6. Tasmania
  7. Northern Territory
  8. Australian Capital Territory

SLA-CODE: Statistical local areas (SLAs) are for the most part local government area (i.e. legal LGA) based. In special cases, where a legal LGA is much larger and more populous than the general run of legal LGAs (as is the City of Brisbane), or where there are no legal local government authorities (as in the Australian Capital Territory), the administrative areas have been subdivided to form areas roughly equivalent in extent and population. SLAs cover, in aggregate, the whole of Australia without gaps or overlaps. The 1981 Census equivalent was the Census LGA.

They are identified by unique four digit numeric codes within a State/ Territory which have the following features:

(a) within each State/ Territory SLA codes are in the range 0001-9990, excluding those ending with 99. Codes ending with 99 and those within the range 9991-9999 have been reserved for special purposes;

(b) the arrangement of SLA codes within each State/Territory is in ascending numerical order for alphabetically listed SLAs. Gaps have been provided between the codes to provide space for future expansion or change;

(c) the fourth, i.e. last, digit of the SLA code is used as an indicator for the following characteristics:

0 indicates that the SLA equates with a legal LGA;
1-8 indicates that the SLA is a part legal LGA and gives each part a number; and
9 indicates that the SLA represents either an unincorporated area, an off-shore/migratory category or an undefined category.

Since SLA codes are unique only within a State or Territory, both the State/Territory code and the SLA code are needed for unique Australia-wide identification.
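The coding rules above can be expressed as a small helper. This is a sketch only: the function name, the returned dictionary and in particular the concatenated `national_id` are hypothetical conventions chosen for the example, not an ABS identifier format.

```python
def describe_sla_code(state, sla_code):
    """Interpret a four digit SLA code using rules (a) and (c) above."""
    number = int(sla_code)
    if not 1 <= number <= 9999:
        raise ValueError("SLA code must be between 0001 and 9999")
    code = f"{number:04d}"
    # Codes ending in 99, and codes 9991-9999, are reserved for special purposes.
    reserved = code.endswith("99") or number >= 9991
    last_digit = int(code[3])
    if last_digit == 0:
        kind = "equates with a legal LGA"
    elif last_digit <= 8:
        kind = f"part {last_digit} of a legal LGA"
    else:
        kind = "unincorporated, off-shore/migratory or undefined"
    return {
        # State code plus SLA code gives an Australia-wide key (format assumed).
        "national_id": f"{state}{code}",
        "reserved": reserved,
        "kind": kind,
    }

# Made-up codes, used only to exercise each branch.
whole_lga = describe_sla_code(1, "0150")
part_lga = describe_sla_code(3, 1203)
special = describe_sla_code(2, "2999")
```

The same four digit SLA code can occur in more than one State, which is why the State code is carried into the combined key.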

A complete list of codes is contained in the ABS publication Australian Standard Geographical Classification Geographic Code List (Edition 2) (2188.0).

CD-CODE: The CD is the smallest geographical area used in the collection and dissemination of census data. Each CD is represented by a two digit numeric code allocated sequentially, from 01, within each census subdivision.

POST: Each CD has been allocated the appropriate four digit Australia Post postcode. CDs split by postcode boundaries have been allocated to the postcode area in which the majority of dwellings were counted.

For rural CDs which cover more than one postcode area, the postcode allocated is from that postcode area which is assessed by inspection of census field maps and aerial photographs to have the largest population.

All migratory CDs have been allocated postcode 0000.

A complete list of codes is contained in the ABS publication Australian Standard Geographical Classification Geographic Code List (Edition 2) (2188.0).

LOCALITY: Urban centres/(rural) localities consist of a single whole CD, or adjoining whole CDs, with urban characteristics. In the case of (rural) localities they represent population clusters of between 200 and 999 people; urban centres are population clusters of 1,000 or more people (including some known holiday resorts of smaller size). Because of their nature they can straddle SLA, legal LGA, statistical subdivision and other ASGC geographic area boundaries.

They are represented by a five digit numeric code which uniquely identifies each urban centre/(rural) locality.

A complete list of codes is contained in the ABS publication Australian Standard Geographical Classification Geographic Code List (Edition 2) (2188.0).

 
