This chapter is devoted to outlining a number of aspects relating to data used to assess the research question of why young Māori women continue to have lower employment rates than young Pākehā women. It is broken into three main sections. The first provides a formal definition of the research subjects that are under examination, as well as the process used to divide young women into ethnic groups. The second section outlines the dataset used for the analysis and the advantages of using that particular dataset. The final, and most substantial section, establishes the factors and variables used to explain ethnic differences in employment rates. A significant proportion of the variables used in the forthcoming analysis have been included on the basis of results provided in previous female labour supply research summarised in the previous chapter.
Research Subjects
The term 'young women' has been used frequently prior to this point, but a formal definition is now required. In this thesis it refers to women aged between 15 and 24 (inclusive) at the time the data was collected.
As the crux of the forthcoming analysis relates to ethnic differences in the employment rates of young women, a number of points relating to the ethnicity of subjects need clarifying. The definition of 'ethnicity' used in this paper follows the self-identification approach applied by Statistics New Zealand, namely “the ethnic group or groups that people identify with or feel they belong to. Ethnicity is a measure of cultural affiliation, as opposed to race, ancestry, nationality or citizenship. Ethnicity is self-perceived and people can belong to more than one ethnic group” (www.stats.govt.nz).12
The latter point regarding multiple ethnicities plays an instrumental role in how young women are allocated into ethnic groups. While individuals can belong to more than one ethnicity, ethnicity must be measured in a way that allows comparative analysis. For consistency, I use the same prioritisation process that has previously been employed by government agencies and other New Zealand researchers.13
12 Further information can be found at,
The Venn diagram in Figure 4.1 can assist in understanding this three-stage categorisation process. Firstly, a young woman is classified as Māori if she states she is of Māori ethnicity, whether alone or in combination with any other ethnicity, including New Zealand European/Pākehā. As a result, of the 18.4% of young women categorised as Māori, roughly 51% have also identified themselves as belonging to at least one other ethnic group.
Figure 4.1: Categorisation of Ethnicity for Young Women
Source: New Zealand Census of Population and Dwellings 2001
On the other hand, a young woman is classified as New Zealand European/Pākehā only if she reports it as her sole ethnicity.14 This requirement means that none of the 59.9% of young women categorised as being of New Zealand European/Pākehā ethnicity have identified themselves as belonging to another ethnic group. All individuals who report New Zealand European/Pākehā ethnicity in combination with any other ethnicity except Māori are excluded from the New Zealand European/Pākehā group, and are instead classified as belonging to other ethnic groups. In the chapters to follow, comparisons are made between young women categorised as Māori and young women categorised as New Zealand European/Pākehā, with other ethnic groups being excluded from the analysis.
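The three-stage prioritisation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the ethnicity labels and the handling of an empty response are my own assumptions, not the coding used in the census unit records.

```python
def classify_ethnicity(reported):
    """Assign one analysis group from a collection of self-reported ethnicities.

    Labels such as "Māori" and "NZ European" are hypothetical placeholders
    for whatever codes the unit record data actually uses.
    """
    groups = set(reported)
    # Stage 1: any report of Māori ethnicity takes priority,
    # regardless of what else is reported.
    if "Māori" in groups:
        return "Māori"
    # Stage 2: Pākehā only when NZ European is the sole ethnicity reported.
    if groups == {"NZ European"}:
        return "Pākehā"
    # Stage 3: everything else (including, by assumption, an empty response)
    # falls into the residual group that is excluded from the analysis.
    return "Other"


print(classify_ethnicity({"Māori", "NZ European"}))   # Māori
print(classify_ethnicity({"NZ European"}))            # Pākehā
print(classify_ethnicity({"NZ European", "Samoan"}))  # Other
```

Because the Māori check comes first, the order of the tests matters: a woman reporting both Māori and New Zealand European ethnicity never reaches the second stage.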
[Figure 4.1 labels: NZ European (59.9%); All Other Ethnicities (21.7%); Māori (18.4%)]
13 Further discussion relating to the ethnic prioritisation method can be found in Allan (2001).
14 The term 'New Zealand European' generally refers to New Zealand residents of European ancestry. From here on I use the term Pākehā to refer to New Zealand Europeans in an attempt to avoid confusion with European immigrants.
As roughly 18% of young women self-identify as Māori, it is imperative that a suitable data source is chosen to adequately represent this ethnic minority. The following section discusses in further detail the source of information chosen for this research topic.
Data Source
Data is drawn from unit records of the 2001 New Zealand Census of Population and Dwellings, undertaken on Tuesday the 6th of March. Although the Census of Population and Dwellings (hereafter known as the census) is quinquennial, unit records for the 2006 count were not available at the time of analysis.
The census, covering all individuals living in private and public dwellings, provides a range of information on individuals and the households in which they live.15 While the decision not to incorporate qualitative methods into this study may result in the loss of certain detailed information, which is usually gathered through a small focused sample, the trade-off is worthwhile for two reasons. Firstly, the use of a readily available source encourages replication and testing by other researchers, an essential feature of scientific enquiry. Secondly, the resource and time constraints faced by this research make quantitative research an attractive option. As the data has already been collected and prepared, and access to responses is relatively inexpensive, the use of the census is extremely cost-effective.
As the census surveys the entire population this provides an extremely large number of observations to work with. Even after removing individuals who were either living in a non-private dwelling or who were not at home on census night, the dataset still contains 178,776 young Māori and Pākehā women aged between 15 and 24 who were living in their usual residence on census night. Of these, 42,057 or 24% self-define as Māori.
Not only does the census provide the largest coverage of sub-populations, but the richness of unit record data also allows for more in-depth analysis through the creation and modification of derived variables. For the current research topic, records for all usual members of each household containing a young woman were used to create a comprehensive array of variables, which represent the composition and structure of households in considerably more detail than has often been accomplished in past female labour supply studies. The result is a more comprehensive and holistic dataset, and each variable is described in detail in the next section.
Access to unit record data used in this study was granted and provided by Statistics New Zealand through their on-site data lab facility, with the cost-recovery fee being covered by Motu Economic and Public Policy Research. The secured environment is designed to give effect to the confidentiality provisions of the Statistics Act 1975. All results using census data have been subject to base three random rounding in accordance with Statistics New Zealand's release policy for census data.
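For readers unfamiliar with base three random rounding, the following Python sketch shows one way such a rule can work. The text above only names the rule; the specific probabilities (two-thirds to the nearest multiple of three, one-third to the other adjacent multiple) are an assumption based on how the scheme is commonly described, and the function name is hypothetical.

```python
import random


def random_round_base3(count, rng=random):
    """Randomly round a non-negative count to an adjacent multiple of three.

    Assumed rule (not confirmed by the source text): counts that are already
    multiples of three are left unchanged; any other count moves to the
    nearer multiple with probability 2/3 and to the farther adjacent
    multiple with probability 1/3.
    """
    remainder = count % 3
    if remainder == 0:
        return count
    lower = count - remainder
    upper = lower + 3
    # remainder 1 -> lower is nearer; remainder 2 -> upper is nearer
    nearest, other = (lower, upper) if remainder == 1 else (upper, lower)
    return nearest if rng.random() < 2 / 3 else other


rng = random.Random(2001)          # seeded only to make the demo repeatable
print(random_round_base3(10, rng))  # either 9 or 12
print(random_round_base3(42057))    # already a multiple of three
```

Under this rule a published count never differs from the true count by more than two. Both counts quoted earlier in this chapter, 178,776 and 42,057, are multiples of three, as would be expected of rounded output.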
Dataset Variables
Based on the combined individual and household records, Table A.1 in Appendix 4 summarises the variables used as arguments. The last two columns show either the distribution within each categorical variable for Pākehā and Māori women, or the mean and standard deviation for the continuous variables. These variables are broken down into a number of groups: the dependent variables, age, individual characteristics, household composition, socioeconomic status, education status, unpaid work activities and geography.
Dependent Variables
The dependent variable in the binomial Model (I) outlined in the next chapter, empstat, defines whether the woman is employed or not. It takes a value of one if she works for one or more hours per week for pay, and takes a value of zero if she is either unemployed or out of the labour force. When individuals are combined together the overall employment rate is defined as the proportion of young women who are employed. That is,
\[
\text{Employment Rate} = \frac{E}{E + U + N}, \tag{4.1}
\]
where E is the number of young women employed, U is the number of young women unemployed, and N is the number of young women not in the labour force.
Other studies have typically used labour force participation as the indicator of labour force engagement, whereby,
\[
\text{Labour Force Participation Rate} = \frac{E + U}{E + U + N}. \tag{4.2}
\]
However, I argue that due to deficiencies in the census relating to the classification of unemployed individuals, employment status is more appropriate to use. For more discussion surrounding these deficiencies see Appendix 5.
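The difference between the two measures can be made concrete with a short Python sketch. The function names and the counts below are purely illustrative and are not taken from the census data.

```python
def employment_rate(e, u, n):
    """Equation (4.1): employed as a share of all young women."""
    return e / (e + u + n)


def participation_rate(e, u, n):
    """Equation (4.2): employed plus unemployed as a share of all young women."""
    return (e + u) / (e + u + n)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
E, U, N = 600, 100, 300
print(employment_rate(E, U, N))     # 0.6
print(participation_rate(E, U, N))  # 0.7
```

The two rates share a denominator, so they differ only through U. Any misclassification between the unemployed and those not in the labour force therefore changes the participation rate but leaves the employment rate untouched, which is the property that motivates the choice of employment status here.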
In Models (II) and (III), as outlined in the following chapter, the dependent variable is modified slightly to better represent the pattern shown in Figure 2.8, in which young women also have the additional option of studying. Essentially, young women allocate themselves into one of four options which combine employment and education. The multinomial dependent variable, actstat, therefore equals one if the young woman is employed but does not study, two if she is employed and studying, three if she is studying but not employed, and four if she is neither employed nor studying.16
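The four-way coding of actstat can be written as a small Python function. The function name and the two boolean inputs are assumptions made for illustration; the census variables from which employment and study status are derived are not described in this section.

```python
def actstat(employed, studying):
    """Return the activity status code (1 to 4) described in the text.

    `employed` and `studying` are hypothetical boolean inputs.
    """
    if employed and not studying:
        return 1  # employed, not studying
    if employed and studying:
        return 2  # employed and studying
    if studying:
        return 3  # studying, not employed
    return 4      # neither employed nor studying


for emp in (True, False):
    for stu in (False, True):
        print(emp, stu, actstat(emp, stu))
```

Note that the binary variable empstat can be recovered from actstat: a woman is employed exactly when actstat is one or two.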
Precisely how the dependent variables are utilised in the analysis will become clear in the following chapter. In the meantime, the variables used as arguments for understanding the difference in employment rates of young Pākehā and Māori women are summarised and discussed in the following section.
The sections to follow summarise the variables used as arguments in explaining the variation in employment rates between young Pākehā and Māori women. Following the description of each variable, comment is made regarding the expected direction of the correlation with employment status. For clarity, each of the independent variables is allocated into one of eight categories: ethnicity, age, individual characteristics, household composition, socioeconomic status, education status, unpaid work activities and geography.
16 The education participation rates calculated from responses to the census understate the true level of education involvement by young women. This understatement will also affect the category distribution within the activity status variable (actstat). That is, some individuals classified as being employed may actually be studying and employed, while some inactive individuals may actually be studying only. See Appendix 6 for further information. Stillman (2006) provides an overview of the methodological differences in calculating inactivity rates across a variety of data sources. He concludes that the Household Economic Survey (HES) measures inactivity rates over time more accurately than any other dataset. Unfortunately, the limited sample size, and restrictions in household responsibilities and
Ethnicity
As the centrepiece of this thesis is the examination of ethnic differences in employment rates, the variable maori is created to indicate whether a young woman is of Māori ethnicity. It takes a value of one if a young woman is Māori and zero if she is not. As my dataset contains only young Māori and Pākehā women, as mentioned previously, a value of zero refers to young women who are classified as Pākehā.
Age
The first group of independent (control) variables used as arguments for the difference in employment rates of young Pākehā and Māori women are age controls. As prior labour supply research has concluded that age is fundamentally related to employment, a series of age variables is included here.
Age is broken down into ten dummy variables (age15, age16, age17, etc.), one for each year of age between 15 and 24. With age15 used as the base category, the expectation is that the rate of employment will increase at each year of age above 15. Not only does a higher age reflect a general movement away from compulsory education into other options such as employment, but a higher age also reflects greater market wages for two distinct reasons.
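As an illustration of this coding, the Python sketch below builds the age dummies for a single woman and omits the base category. The function name is hypothetical; only the variable names age15 to age24 come from the text.

```python
AGES = range(15, 25)  # 15 to 24 inclusive


def age_dummies(age, base=15):
    """Return the age dummy variables for one woman, omitting the base year.

    With age15 as the base, a 15-year-old has all nine remaining dummies
    equal to zero, so each coefficient is read relative to age 15.
    """
    if age not in AGES:
        raise ValueError("age must be between 15 and 24 inclusive")
    return {f"age{a}": int(a == age) for a in AGES if a != base}


print(age_dummies(17))
print(sum(age_dummies(15).values()))  # 0: the base category
```

Omitting one category is what allows the remaining nine dummies to be estimated alongside a constant term; including all ten would make them perfectly collinear with it.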
Firstly, one major component of an individual's market wage is a result of experience. Experience boosts productivity and therefore individuals with greater experience can demand higher wages. Since experience is unavailable in the given dataset, age is instead used as a proxy for experience, and hence we would expect older individuals to have a higher market wage.
Secondly, young women in the 15-24 age bracket face different minimum wage regulations depending on their age. The day before census night, new minimum wage rates came into effect, raising the adult and youth minimum wages to $7.70 and $5.40 per hour respectively. Furthermore, the age at which individuals qualified for the adult minimum wage was lowered from 20 to 18 years. Again we would expect young women above the adult minimum wage threshold to have higher market wages than young women below the threshold. Overall the employment rate is expected to rise as young women get older, ceteris paribus.
Individual Characteristics
Previous labour supply studies have typically found a strong relationship between education and employment when an individual's highest qualification is used to represent their ability in the labour market. However, a problem arises from the inclusion of a similar variable in this research topic due to the age bracket in question. Many younger women may not have completed all of their studies when the census was conducted, and so highest qualification would not adequately summarise their ability. For example, if highest qualification were used as the predictor variable, then an educationally able 16-year-old, who would only have had time to complete year 11, would be treated as equal in ability to a 23-year-old who dropped out of school at 16 with only a year 11 qualification.
To counter this issue, school_edu is instead used to represent whether a young woman has at least a year 11 qualification, regardless of age. This allows every young woman a chance to complete the first stage of formal qualifications and hence provides a better indication of her ability. As with experience, greater education sends a signal to employers about an individual's ability and productivity. Given this, people holding year 11 qualifications would expect to receive a higher market wage than those who have not completed year 11, to compensate for higher productivity. Higher wages for more educated women will push the market wage above the reservation wage for some women, and consequently increase the number of women in employment and the overall employment rate.
Two variables attempt to identify whether young women have an attachment to Māori culture. The dummy variable iwi indicates whether young women identified as belonging to any of New Zealand's iwi, while mancestry denotes whether young women identified as being descended from a Māori. Recalling that the definition used to identify Māori is based on self-perceived cultural affiliation, individuals who stated their iwi, or who have Māori ancestry, may be more likely to be involved in cultural activities, events, and ceremonies. Furthermore, if this involvement is regular and time consuming, then the prospects of young women finding suitable employment that fits around these commitments might be diminished.
The final individual variable included as an argument, immigrant, indicates whether a young woman was born in New Zealand or was born overseas and immigrated. Immigrants who experience cultural or language difficulties may find their competitiveness in the New Zealand labour market diminished. Language difficulties in particular are likely to reduce an individual's market wage, as their productivity is lower than that of an equally skilled New Zealand born worker. It is predicted that, all other things equal, immigrants are less likely than New Zealand born individuals to work.
Some of the variables described above (age, school_edu, and immigrant) are expected to affect the probability of employment through their influence on the expected wage and employability. However, an individual's labour supply may also be affected by the composition and structure of the household in which she lives.
Household Composition
The use of unit record data from the census has allowed a comprehensive array of variables to be created to describe the composition of a young woman's household. These represent the composition and structure of households in considerably more detail than has usually been the case in past female labour supply studies. Previous researchers have identified the individual or her family nucleus as the influential entity in determining labour supply. This thesis hypothesises, however, that because many households contain multiple family nuclei that may be related,17 it is the household composition and the characteristics of the entire household that matter, not just the young woman's family nucleus.
One of the primary features making the household unit relevant is the number of children present. As we have seen in the previous chapter, the presence of children is frequently associated with a reduced likelihood of being engaged in the labour force. In this thesis children are therefore expected to be negatively correlated with employment, for two main reasons. Firstly, it may reflect the desire of young women to care for, and spend time with, their own (and other people's) children. That is, the opportunity cost of forgoing this time is high, as young women place great value on childcare responsibilities. Secondly, if a young mother were to enter paid employment she would likely need to find suitable childcare. Without access to free childcare, wage compensation for working mothers needs to be high enough to cover the cost of childcare. In many cases, the monetary benefit from working may not adequately cover this cost, therefore reducing the likelihood of young women seeking employment.
Unit record data allows certain household attributes, including children, to be broken down into a set of dummy and continuous variables. While many prior studies have simply included a continuous variable representing the number of children that a woman has, this assumes (with little substance) that the decline in the probability of employment is linear in relation to the number of children. This study recognises the non-linear effect of children by including a dummy variable for presence of at least one child, and a continuous variable for number of children beyond one. The breakdown into two separate variables relaxes the above assumption by presuming that the drop in the probability of employment is greatest for the first child, and that each subsequent child after the first has a smaller but linear effect on the probability of employment.
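The split into a dummy and a continuous variable can be illustrated with a short Python sketch. The names any_child and extra_children, and the coefficients used in the example, are hypothetical and serve only to show the shape of the specification.

```python
def child_variables(num_children):
    """Split a child count into a presence dummy and a count beyond the first."""
    if num_children < 0:
        raise ValueError("num_children must be non-negative")
    any_child = int(num_children >= 1)          # 1 if at least one child
    extra_children = max(num_children - 1, 0)   # children after the first
    return any_child, extra_children


def child_effect(num_children, b_first, b_extra):
    """Contribution of children to a linear index under this specification.

    b_first and b_extra are illustrative coefficients, not estimates.
    """
    any_child, extra_children = child_variables(num_children)
    return b_first * any_child + b_extra * extra_children


# Illustrative coefficients: a large drop for the first child,
# then a smaller, constant drop for each further child.
for n in range(4):
    print(n, child_variables(n), child_effect(n, b_first=-1.0, b_extra=-0.25))
```

With a single continuous count, the step from zero to one child would be forced to equal every later step. Here the first child carries its own coefficient, while each additional child adds the same smaller amount.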
Figure 4.2 below shows the employment rate of young women living in households containing members with specific characteristics, and provides some empirical evidence for the pattern described above. It illustrates that the employment rate for young women who have one child aged under five is 32 percentage points lower than that of young women who do not have any young children. However, the drop in the employment rate with each successive child after the first is practically linear, providing support for the use of both dummy and continuous variables.
Although a perfect linear trend from the first household member onwards is not present in the other characteristics, for most of them there still seems to be a general oscillation around a trend line drawn from the first household member to the last. One exception is the dramatic drop in the employment rate of young women who have four