Longitudinal studies allow for the investigation of how individuals change over time. From a statistical perspective, there are two types of questions that form the core of every study of change: (1) How does the outcome change over time? and (2) Can the differences in these changes be predicted? (Singer and Willett, 2003).
These questions are important to my study of the relationship between alcohol and violence.
Fitting a multilevel model of change requires four main characteristics:
1. There must be longitudinal data with three or more waves of data;
2. There must be an outcome whose values change systematically over time;
3. There must be an appropriate metric for time;
4. The value of the outcome on any occasion must be equatable over time; that is, the same measure should be used at each time point.
With regard to the appropriate metric of time: in cohort studies in which data collection has been carried out at equally spaced intervals, an appropriate metric may simply be the consecutive numbering of data collection occasions (time-point 1, time-point 2, etc.). In studies where the period between data collection waves is unequal, this numbering is unsuitable, and a preferred metric may be the amount of time elapsed since the beginning of the study, such as the number of days or weeks since the first episode of data collection.
Collection of the Add Health data (which I have used in my study) was unequally spaced over time (see also below), with the second wave occurring around 1 year after the first, followed by Wave III and Wave IV at an average of 7 and 13 years later. In addition, the cohort was designed to include individuals aged between 12 and 18 at inception of the study. Furthermore, follow-up schedules varied between individuals, in some cases by up to a year. The appropriate metric of time was therefore the individual’s age at each data collection point, rather than the dates of, or intervals between, data collection.
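As a minimal sketch of using age as the metric of time, the snippet below converts interview dates into age in decimal years. The birth date, interview dates and the helper name `age_in_years` are hypothetical illustrations, not Add Health records or variables, and the 365.25-day year is an approximation.

```python
from datetime import date


def age_in_years(birth: date, interview: date) -> float:
    """Age at interview in decimal years (365.25-day year approximation)."""
    return (interview - birth).days / 365.25


# Hypothetical respondent with four unequally spaced interviews.
birth = date(1980, 6, 15)
interviews = [
    date(1995, 6, 15),  # first wave
    date(1996, 5, 1),   # roughly one year later
    date(2001, 9, 1),   # third wave
    date(2008, 3, 1),   # fourth wave
]

# One age value per wave; these replace wave number as the time variable.
ages = [round(age_in_years(birth, d), 2) for d in interviews]
print(ages)
```

Because each respondent has his or her own ages, two people interviewed in the same wave can sit at quite different points on the time axis, which is the reason for preferring age over wave number here.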
When running a random-effects model for longitudinal data in which the data is clustered at the level of individuals, and there are multiple measures of a variable of interest at different time-points, the process is broken down into two stages.
Rather than simply performing a regression of the variable of interest by age over the entire sample, the first stage is to fit individual linear regression models for every individual in the sample. This produces a regression line for every individual, each with its individual intercept and slope. It is the individual intercepts and particularly their slopes which are of most interest, and which can be used as the object of further analysis. The purpose of this stage is to describe within-individual change over time. The equation for this stage is known as the “level-1” submodel. In the case of the analysis of a measure of violence, the violence score can be thought of as being plotted on the y-axis, and the age of the individual when the measure was observed on the x-axis. A linear regression line is then fitted through these data points, and this is the person’s individual growth trajectory.
The level-1 submodel is described as follows:
$$Y_{ij} = [\pi_{0i} + \pi_{1i}TIME_{ij}] + [\varepsilon_{ij}]$$

The model assumes a linear relationship that describes each person’s true change over time, with any deviation of the observed data from this line attributed to measurement error or other unobserved factors ($\varepsilon_{ij}$). $\pi_{0i}$ and $\pi_{1i}$ are known as the individual growth parameters, and characterise the hypothesized true trajectory for the ith subject. They are analogous to the population intercept and slope in linear regression, but relate to the individual. The first individual growth parameter, $\pi_{0i}$, is the intercept: the true value of Y when time=0. The second individual growth parameter is $\pi_{1i}$, which is the slope of the individual’s growth trajectory; it represents the rate at which the given individual changes over time with respect to the variable of interest. The error term, $\varepsilon_{ij}$, represents the vertical distance between the observed data and the fitted regression line. The level-1 residual variance, $\sigma^2_{\varepsilon}$, is the net vertical scatter of the observed data around the individual’s linear trajectory.
Fitting these level-1 models on every individual allows for every person to have his or her own trajectory, and hence their own individual growth parameters that describe them (intercepts and slopes). These growth parameters then become the object of analysis in the second stage, the “level-2 submodels”.
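The first stage can be sketched as follows: an ordinary least squares line is fitted separately to each person's observations, giving one intercept and one slope per person. The data, the person labels and the helper name `fit_line` are hypothetical and serve only to illustrate the idea; this is not the estimation routine used in the study.

```python
def fit_line(times, values):
    """Ordinary least squares fit of values on times; returns (intercept, slope)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept, slope


# Hypothetical long-format data: person id -> list of (age, violence score).
data = {
    "A": [(13, 2.1), (14, 2.4), (20, 5.6), (26, 8.4)],
    "B": [(16, 8.1), (17, 7.6), (23, 6.3), (29, 4.7)],
}

growth = {}     # person id -> (intercept, slope): the individual growth parameters
residuals = {}  # person id -> vertical distances from the fitted line
for person, rows in data.items():
    ages = [age for age, _ in rows]
    scores = [score for _, score in rows]
    intercept, slope = fit_line(ages, scores)
    growth[person] = (intercept, slope)
    residuals[person] = [y - (intercept + slope * t) for t, y in zip(ages, scores)]

print(growth)
```

With age in raw years the intercept is the fitted value at age 0, which lies far outside the observed ages; centring age on a meaningful value would make the intercept easier to interpret.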
The second stage involves fitting the level-2 submodel. Whereas the level-1 submodel is concerned with analysing change within individuals, the purpose of the level-2 submodel is to analyse differences in change between individuals, by analysing collectively the individual growth parameters obtained from the level-1 submodel. It is of particular utility in investigating the relationship between predictors and these growth parameters (the intercepts and slopes from the level-1 model), for example to test the hypothesis that the baseline and rate of change of violence are greater in those who drink alcohol than in those who do not (Singer and Willett pg 8). Statistical modelling of both these levels is known as the “multilevel model of change”.
The level-2 submodels are in two parts and analyse the individual growth parameters ($\pi_{0i}$ and $\pi_{1i}$ from the level-1 submodel). They take the form of standard regression equations, but they treat the level-1 growth parameters as outcomes that may be associated with a predictor (such as level of alcohol consumption).

$$\pi_{0i} = \gamma_{00} + \gamma_{01}PREDICTOR_i + \zeta_{0i}$$

$$\pi_{1i} = \gamma_{10} + \gamma_{11}PREDICTOR_i + \zeta_{1i}$$

In these models $\gamma_{00}$ represents the population average of the level-1 intercepts for those with a level-2 predictor of 0; it is the average value of the outcome at baseline (time=0) in that group. $\gamma_{01}$ is the population average change in the level-1 intercepts when there is a 1-unit change in the level-2 predictor. $\gamma_{10}$ is the population average of the level-1 slopes, for those with a level-2 predictor of 0. $\gamma_{11}$ is the population average change in level-1 slope when there is a 1-unit change in the level-2 predictor. $\zeta_{0i}$ and $\zeta_{1i}$ are the error terms and represent deviation between individual growth parameters and their respective population averages.

The first of the level-2 equations is concerned with modelling the individual’s intercept. It states that the true baseline (intercept) of the outcome for person i is equal to the population average baseline (intercept), plus the product of the value of the predictor and the difference in baseline for a 1-unit increase in the predictor, plus the part of the intercept that is not explained by the predictor ($\zeta_{0i}$). The second of the level-2 equations is concerned with modelling the individual’s slope. It states that the true rate of change (slope) of the outcome for person i is equal to the population average rate of change, plus the product of the value of the predictor and the difference in rate of change per unit increase in the predictor, plus the part of the slope that is not explained by the predictor ($\zeta_{1i}$).
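The logic of the level-2 submodels can be sketched by regressing a set of fitted intercepts and slopes on a predictor. The values below are hypothetical, the predictor is assumed to be a 0/1 indicator of alcohol use, and `fit_line` is an illustrative helper. The sketch shows the two-stage reasoning only; mixed-model software usually estimates both levels simultaneously rather than in two separate regressions.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical level-1 results: (alcohol indicator, intercept pi_0i, slope pi_1i).
people = [
    (0, 1.0, 0.10),
    (0, 1.4, 0.20),
    (1, 2.0, 0.30),
    (1, 2.4, 0.40),
]
predictor = [p[0] for p in people]
intercepts = [p[1] for p in people]
slopes = [p[2] for p in people]

# First level-2 equation: intercepts regressed on the predictor.
g00, g01 = fit_line(predictor, intercepts)
# Second level-2 equation: slopes regressed on the predictor.
g10, g11 = fit_line(predictor, slopes)

# Level-2 residuals: each person's deviation from the predicted growth parameter.
zeta0 = [pi0 - (g00 + g01 * x) for x, pi0 in zip(predictor, intercepts)]
zeta1 = [pi1 - (g10 + g11 * x) for x, pi1 in zip(predictor, slopes)]

print(g00, g01, g10, g11)
```

With a 0/1 predictor, `g00` and `g10` are the average intercept and slope in the group coded 0, while `g01` and `g11` are the differences between the two groups.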
The level-1 and level-2 models can be represented equivalently as a composite model by substituting the level-2 submodels, $\pi_{0i} = \gamma_{00} + \gamma_{01}PREDICTOR_i + \zeta_{0i}$ and $\pi_{1i} = \gamma_{10} + \gamma_{11}PREDICTOR_i + \zeta_{1i}$, into the level-1 submodel and rearranging:

$$Y_{ij} = [\gamma_{00} + \gamma_{10}TIME_{ij} + \gamma_{01}PREDICTOR_i + \gamma_{11}(PREDICTOR_i \times TIME_{ij})] + [\zeta_{0i} + \zeta_{1i}TIME_{ij} + \varepsilon_{ij}]$$

In the case of modelling a continuous measure of violence over time, in which alcohol consumption is the predictor and the subject’s age is used as the measure of time, this model therefore states the following: the amount of violence for person i at occasion j is equal to the population average intercept (the average level of violence at age zero when alcohol consumption is zero), plus the population average slope for those with zero alcohol consumption multiplied by the individual’s age, plus the individual’s level of alcohol consumption multiplied by the population average difference in the intercept per unit of alcohol consumption, plus the population average difference in slope per unit of alcohol consumption multiplied by the product of the individual’s age and level of alcohol consumption, plus individual i’s deviation in intercept from the population average, plus individual i’s deviation in slope from the population average multiplied by his or her age, plus the residual amount of violence at occasion j that is not predicted by the individual’s trajectory.
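The equivalence between the composite model and the two submodels can be checked numerically. The sketch below evaluates the outcome both ways for one hypothetical person; all parameter values and function names are illustrative assumptions, not estimates from the study.

```python
def composite_outcome(g00, g10, g01, g11, predictor, time, z0=0.0, z1=0.0, e=0.0):
    """Outcome from the composite model: fixed part plus random part."""
    fixed = g00 + g10 * time + g01 * predictor + g11 * predictor * time
    random_part = z0 + z1 * time + e
    return fixed + random_part


def two_level_outcome(g00, g10, g01, g11, predictor, time, z0=0.0, z1=0.0, e=0.0):
    """Same outcome built from the level-2 then level-1 submodels."""
    pi0 = g00 + g01 * predictor + z0   # person-specific intercept
    pi1 = g10 + g11 * predictor + z1   # person-specific slope
    return pi0 + pi1 * time + e


# Hypothetical population parameters and one hypothetical person.
params = dict(g00=1.2, g10=0.15, g01=1.0, g11=0.2)
person = dict(predictor=1, time=10, z0=0.2, z1=-0.05, e=0.1)

y_composite = composite_outcome(**params, **person)
y_two_level = two_level_outcome(**params, **person)
print(y_composite, y_two_level)
```

Both routes give the same value because the composite model is only an algebraic rearrangement of the two submodels.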
In addition, the random-effects model allows one, for a given outcome, to investigate the amount of variation within individuals versus that which is between individuals. This is measured using the intraclass correlation coefficient (ICC), which is the ratio of the between-individual variance to the total variance. Thus, if all of the variation were between individuals, the ICC would be equal to one; if all of the variation were within individuals, i.e. there is no evidence of clustering, then the ICC would be equal to zero.
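The ratio of between-individual variance to total variance can be sketched as a small function. The variance components passed in are hypothetical numbers; in practice they would come from a fitted random-effects model, and the function name `icc` is an illustrative choice.

```python
def icc(between_var: float, within_var: float) -> float:
    """Intraclass correlation: between-individual variance over total variance."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total


# Limiting cases and one intermediate, hypothetical example.
all_between = icc(between_var=1.0, within_var=0.0)  # all variation between individuals
all_within = icc(between_var=0.0, within_var=1.0)   # no clustering within individuals
mixed = icc(between_var=2.0, within_var=6.0)        # a quarter of variance is between

print(all_between, all_within, mixed)
```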
In contrast to fixed-effects models (see below), random-effects models, sometimes referred to as mixed models or multilevel models, allow changes both within and between individuals to be analysed.