For decades, researchers have worked to answer a question obviously important to policymakers: What factors are associated with teacher turnover? More recently, Donaldson and colleagues examined factors associated with the turnover of Teach For America teachers in particular. Unsurprisingly, they found that corps members placed in more difficult teaching positions—teaching multiple grades in elementary school, multiple subjects in secondary school, or a subject out of their field of expertise—were more likely to leave the profession (Donaldson & Johnson, 2010). Teachers of special education, however, were less likely to leave. Donaldson and colleagues also identified several demographic characteristics related to corps member turnover. Female corps members remained teachers longer than male corps members, black and Hispanic corps members remained longer than white corps members, and corps members older than 25 remained longer than younger corps members (Donaldson, 2012; Irizarry & Donaldson, 2012). Finally, retention was associated with “initial intentions” (2011). Corps members who deferred graduate school before starting TFA left teaching sooner than average, whereas corps members with a minor, major, or coursework in education remained teachers longer than average.
In this chapter I extend the work of Donaldson and colleagues. I examine factors that predict corps members’ exit from the teaching profession. First, I describe my analytical strategy in detail. Second, I use the Kaplan-Meier method to examine differences across a single factor (Kaplan & Meier, 1958). For example, how does attrition differ among district and charter school teachers? Third, I use logistic regression to simultaneously examine multiple predictors of attrition.
The data for this analysis are described in detail in chapter 3. Data are organized in the long form, with a single observation per person-year. Initially, each observation has an ordinal year variable coded “1” for first year, “2” for second year, and so on. Each observation includes an event variable that is coded “0” by default but “1” if a person is no longer a teacher at the start of the following school year. In accordance with the norms of survival analysis, censored observations are removed from the data because no additional information is provided when event is unknown. Censored observations include all observations from the 2013-2014 school year and all observations when it is ambiguous whether a corps member continues to teach the following year or works in another school-based role. Because the analysis focuses on first exit from teaching, observations are dropped after a person’s first exit or first instance of censoring. Finally, each observation includes multiple predictor variables, some that are time-varying and some that are time-invariant.
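The person-year layout described above can be sketched in Python. This is only an illustration of the data rules, not the author's actual data preparation: the function name `build_person_years` and the status labels "teaching", "left", and "unknown" are hypothetical.

```python
def build_person_years(next_statuses):
    """Turn one person's history into person-year records.

    next_statuses[k] is what is known about the person at the start of
    the school year that follows teaching year k + 1. The labels
    "teaching", "left", and "unknown" are illustrative assumptions.
    """
    records = []
    for year, status in enumerate(next_statuses, start=1):
        if status == "unknown":
            # Censored: the event is unknown, so this observation and
            # every later one are dropped.
            break
        event = 1 if status == "left" else 0
        records.append({"year": year, "event": event})
        if event == 1:
            # Only the first exit from teaching is analyzed.
            break
    return records
```

For example, a person who teaches for three years and then leaves contributes three records, with event equal to 1 only on the third.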
For technical and substantive reasons, two slight changes were made to the analytical population. The five corps members who chose not to report a gender were dropped from the analysis because it was unclear whether to treat them as transgender or gender unknown. Also, observations were deleted whenever a teacher started the year at a private school. These teachers account for less than 1% of all person-years and exactly 0% of year one and year two person-years. Treating these teachers as censored is necessary because the data include no information on school-level covariates for private schools.
Methods
The survival functions presented in this chapter are estimated using a method devised by Kaplan & Meier (1958) and discussed at length in the prior chapter. Absent censoring, Kaplan-Meier survival estimates are identical to descriptive survival statistics, telling what portion of people “survive” to a certain point in time. In the presence of censoring, the Kaplan-Meier method estimates survival through a series of joint probabilities. Of the people known to start the first time period, what percentage are known to survive it? Likewise, what percentage of people who start the second time period survive to its end? The joint probability of these two events, found by multiplying together each event’s discrete probability, represents cumulative survival to the end of time period two. Cumulative survival to all future time points is calculated in the same way, by finding the product of survival probabilities across all prior time periods.
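The product of period-specific survival probabilities can be written as a short Python sketch. The function name `kaplan_meier` and its two input lists are assumptions made for illustration; the counts in the usage note are invented, not taken from the data.

```python
def kaplan_meier(at_risk, exits):
    """Cumulative survival at the end of each time period.

    at_risk[t]: number of people known to start period t
                (censored cases are already excluded).
    exits[t]:   of those people, the number known to exit in period t.
    """
    survival = []
    cumulative = 1.0
    for n, d in zip(at_risk, exits):
        # Conditional probability of surviving this period,
        # multiplied into the running product.
        cumulative *= (n - d) / n
        survival.append(cumulative)
    return survival
```

With invented counts `kaplan_meier([100, 80, 50], [10, 20, 10])`, the period-specific survival probabilities are 0.90, 0.75, and 0.80, so cumulative survival is about 0.90, 0.675, and 0.54. Only 80 people start period two even though 90 survived period one; the other 10 are censored and simply leave the risk set.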
In this chapter, Kaplan-Meier estimates are used to describe survival patterns for a given factor, like race, without controlling for any other factors. This serves two purposes. First, descriptions of this sort are valuable for their own sake. It is informative to know whether male or female corps members remain in teaching longer, whether retention is longer among corps members from more selective colleges or less selective colleges, etc. Second, this type of univariate analysis lays the groundwork for a multivariate approach. If males and females remain teachers at different rates, then gender should likely be included in a multivariate model that adjusts or controls for various other predictors of retention.
The multivariate models presented in this chapter are estimated through discrete-time survival analysis conducted in a logistic regression framework. Time is “discrete” rather than continuous because it is measured in years rather than in a finer unit like days. Substantively, this means that models predict attrition at the start of the next school year (or before), but do not distinguish between attrition that occurs over the summer and attrition that occurs during the school year.
As explained in chapter 3, the data are organized in the long form with one observation per person-year. Each observation includes the dependent variable event, which is coded “0” if a corps member remains a teacher at the start of the following school year or a “1” if a corps member exits teaching. Because the event is binary rather than continuous, logistic regression is used rather than traditional OLS. Each observation also includes multiple predictor variables, which are described in Table 12 and Table 13. Predictor variables can vary with time, and time itself is also included as a predictor. Other functions of time, like time-squared, can also be included if the effect of time is non-linear in the logit.
Allison (2010, pp. 650–651) shows how to use Mplus to conduct logistic regression with full-information maximum likelihood for missing data. Full-information maximum likelihood estimation, or “FIML,” is useful because most observations in the dataset are missing values on at least one predictor variable. Employing listwise deletion would both bias estimates and shrink the sample from 9,994 observations to 4,707. In contrast, FIML provides a better strategy since it uses all observations and produces the most likely estimates given observed values. Using FIML, however, does require assuming that data are missing at random, or “MAR” (Rubin, 1976). Under the MAR assumption, missingness on a given variable may be correlated with observed values of other variables included in the model, but not with the variable that has missing data. Thus, the MAR assumption is untestable but plausible for these data. (Missingness is chiefly a function of whether corps members turn up in a given administrative dataset, and whether corps members turn up is highly related to Year and Charter, which are both included as predictors.) To implement logistic regression with FIML for missing data, the basic code in Mplus Version 7 is as follows:
VARIABLE:
    NAMES ARE event predictor_vars ;
    MISSING ARE ALL (-9999) ;
    CATEGORICAL = event ;
ANALYSIS:
    ESTIMATOR = ml ;
    INTEGRATION = montecarlo ;
MODEL:
    event ON predictor_vars ;
    predictor_vars ;
The code tells Mplus that missing values are all coded as “-9999,” that event is a binary “CATEGORICAL” variable, and that for the “ANALYSIS”, parameters should be estimated using full-information maximum likelihood. The “MODEL” statement regresses event on a list of predictor variables under the assumption that the predictor variables fit a multivariate normal distribution. Although some predictor variables are binary, the given specification is robust to that particular violation and is common in the literature (Allison, 2010, pp. 651–652). Following recommendations by Hosmer, Lemeshow and Sturdivant (2013, pp. 90–93,107–124), models are refined using “purposeful selection.”
The final model is further validated through comparison to a fixed-effects logit for panel data, also known as a “conditional logit.” To complete this analysis, I drop school-level variables and nest person-years within schools. I then center continuous predictor variables, recoding them as deviations from the school-level (group) mean. Finally, I drop schools from the analysis if they lack variation on the dependent variable, either because all observations experience an event or none do. In this way, I avoid omitted variable bias at the school-level by controlling for all school characteristics, both observed and unobserved. Corps members are compared to other corps members at the same school (and themselves at the same school during other years). This is accomplished in Mplus through the following code:
VARIABLE:
    NAMES ARE event predictor_vars ;
    MISSING ARE ALL (-9999) ;
    CATEGORICAL = event ;
    CLUSTER = schoolid ;
    WITHIN = predictor_vars ;
DEFINE:
    CENTER predictor_vars (GROUPMEAN) ;
ANALYSIS:
    ESTIMATOR = ml ;
    INTEGRATION = montecarlo ;
    TYPE = twolevel ;
MODEL:
    %WITHIN%
    event ON predictor_vars ;
    predictor_vars ;
    %BETWEEN%
The “CLUSTER” command identifies the school-level groups and the “WITHIN” command identifies variables measured at the individual level and modeled only on the within level. The “TYPE” command specifies a two-level model. Finally, the “%WITHIN%” and “%BETWEEN%” commands specify a regression of event on individual-level covariates but only a random intercept at the school level.
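The two data-preparation steps behind the two-level model, group-mean centering and dropping schools with no variation in event, can be sketched in Python. This is a rough illustration rather than the Mplus procedure itself: the function name `prepare_within_school` is hypothetical, only one predictor is handled, and missing values are assumed absent.

```python
from collections import defaultdict


def prepare_within_school(rows, predictor):
    """Center one predictor on its school mean and drop uninformative schools.

    rows: list of dicts with keys "schoolid", "event", and `predictor`.
    Assumes no missing values (an illustrative simplification).
    """
    by_school = defaultdict(list)
    for row in rows:
        by_school[row["schoolid"]].append(row)

    prepared = []
    for group in by_school.values():
        # A school whose observations are all 0 or all 1 on event
        # (including a school with a single observation) is dropped.
        if len({r["event"] for r in group}) < 2:
            continue
        mean = sum(r[predictor] for r in group) / len(group)
        for r in group:
            centered = dict(r)
            centered[predictor] = r[predictor] - mean  # deviation from school mean
            prepared.append(centered)
    return prepared
```

After this step, each retained predictor value expresses how a person-year differs from the average person-year at the same school, which is what lets the comparison run within schools only.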
As an equation, the one-level, unconditional model can be written as

\[ \Pr(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-(\beta x_i)}} \]

where $i$ represents observations $1, \ldots, n$; $y_i$ is either 0 or 1; $\beta$ represents a vector of coefficients; and $x_i$ represents a vector of predictor variables. Similarly, the two-level, conditional model can be written as

\[ \Pr(y_{ij} = 1 \mid x_{ij}) = \frac{1}{1 + e^{-(\alpha_i + \beta x_{ij})}} \]

where $i$ now represents a school; $j$ represents individual observations within a school; $\alpha_i$ represents a school effect; $\beta$ continues to represent a vector of coefficients; and $x_{ij}$ represents a vector of predictor variables measured as individual deviations from the school mean (with school-level variables dropped from the model). The two-level model includes slightly fewer observations, since observations are dropped if (a) there is only one observation for a particular school or (b) all observations for a school are either $y_{ij} = 0$ or $y_{ij} = 1$.
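Both models share the same logistic form and differ only in the school effect. A minimal Python sketch makes that explicit; the function name `exit_probability` and any coefficient values passed to it are hypothetical and are not estimates from this analysis.

```python
import math


def exit_probability(beta, x, alpha=0.0):
    """Pr(y = 1 | x) = 1 / (1 + exp(-(alpha + beta . x))).

    beta:  list of coefficients.
    x:     list of predictor values, in the same order as beta.
    alpha: school effect; leave at 0.0 for the one-level model.
    """
    linear = alpha + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-linear))
```

When the linear predictor is zero the predicted probability of exit is exactly 0.5, and a positive school effect raises the predicted probability for every observation in that school.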
Variables and Values
Predictor variables are listed and described in detail in Table 12. Some predictors are binary or categorical and are represented by dummies, while other predictors are continuous. Some predictors, like the race and gender of a particular corps member, are time-invariant; they do not change across a corps member’s observations. Other predictors, like whether a corps member works at a charter school, are time-varying; a person can work at a district school one year and at a charter school the next.
Without exception, the predictor variables in Table 12 (or similar measures of the same latent constructs) have been theorized and empirically shown to relate to teacher attrition broadly or corps member attrition in particular. Conceptually, they might be grouped into six categories: (1) time, (2) school type, (3) school demographics, (4) teaching assignment, (5) teacher demographics, and (6) credentials. For this analysis, time is measured in years but is coded somewhat eccentrically to give direct estimates of possible first and second year effects, effects that are obviously important given the elevated attrition of first-year teachers in general and the explicit two-year commitment made by Teach For America corps members in particular. School type can be charter or district, and can be elementary, middle, or high. School demographic variables include the school’s total enrollment, the portion of students who qualify for free or reduced-price lunch, and the portion who pass annual state assessments. Teaching assignment variables include whether a corps member teaches multiple subjects and whether a corps member teaches special education. Teacher demographics include gender, race, age, and a corps member’s economic background as measured by whether or not she received Federal Pell Grants in college. Finally, a corps member’s ability and opportunity costs are measured by the selectivity of her undergraduate university, her undergraduate GPA, the potential lifetime earnings suggested by her undergraduate major, and whether or not she majored in education.
Table 12. Description of Variables Used in Survival Analysis

Variable      Description

Dependent variable
  Event       employment at start of following school year: (1) left teaching; (0) continued teaching

Time-varying predictors
  Year1       indicator: (1) year 1; (0) not
  Year2       indicator: (1) year 2; (0) not
  Time        time variable: (0) years 1 through 3; (1) year 4; (2) year 5; (3) year 6; (4) year 7; (5) year 8; (6) year 9
  Selem       indicator of whether elementary school: (1) elementary school; (0) not [a]
  Smid        indicator of whether middle school: (1) middle school; (0) not [a]
  Shigh       indicator of whether high school: (1) high school; (0) not [a]
  Scharter    indicator of whether charter school: (1) charter; (0) not
  Senroll     number of students enrolled in school
  Sfarl       percentage of school's students who receive free or reduced-price lunch
  Stest       average school achievement percentile on the NY state assessment
  Tsubmult    indicator of whether teacher of multiple subjects: (1) yes; (0) no
  Tsubsped    indicator of whether teacher of special education: (1) yes; (0) no

Time-invariant predictors
  Pmale       gender: (1) male; (0) female
  Pageo25     indicator of whether 26 or older on first day of first school year: (1) yes; (0) no
  Ppell       indicator of whether received a federal Pell Grant: (1) yes; (0) no
  Pracew      indicator of whether white/Caucasian: (1) yes; (0) no
  Praceb      indicator of whether black/African American: (1) yes; (0) no
  Praceh      indicator of whether Hispanic: (1) yes; (0) no
  Pracea      indicator of whether Asian: (1) yes; (0) no
  Praceoth    indicator of whether other person of color: (1) yes; (0) no
  Pbarrons    six-level measure of undergraduate university’s admissions selectivity, reverse-coded from Barron’s (2009): (6) most competitive; (5) highly competitive; (4) very competitive; (3) competitive; (2) less competitive; (1) non-competitive [b]
  PGPA        undergraduate GPA
  Pearnpot    lifetime earning potential by undergraduate major, measured in $ million [c]
  Pmajeduc    indicator of whether education major: (1) yes; (0) no

[a] Usually, the three measures of school level (Selem, Smid, & Shigh) refer to school levels included in the NYSED or USDOE data, but when school level is unknown, teachers are assigned a “school level” based on grades they teach.
[b] Percentile generated by first determining the percentage of a school’s students who scored advanced or proficient on the state exam in math or ELA, second determining a school’s percentile rank in that grade-subject across all schools in NY State, and third, averaging together the school’s percentile rankings across all grade-subject pairs.
[c] Lifetime earning potential estimates taken from Hershbein and Kearney (2014). Double-majors assigned earning potential of higher-earning major.
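The time coding in Table 12 can be expressed as a small Python function. The function name `code_time` is hypothetical; the mapping itself follows the Year1, Year2, and Time definitions in the table.

```python
def code_time(year):
    """Map an ordinal teaching year (1-9) to (Year1, Year2, Time)."""
    if not 1 <= year <= 9:
        raise ValueError("Table 12 defines codes for years 1 through 9 only")
    year1 = 1 if year == 1 else 0
    year2 = 1 if year == 2 else 0
    # Time stays at 0 through year 3, then counts up by one per year.
    time = max(year - 3, 0)
    return year1, year2, time
```

Under this coding, year 3 is the only year with all three variables equal to zero, so it acts as the reference year: the Year1 and Year2 coefficients are contrasts against year 3, and Time captures a trend from year 4 onward.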
Table 13. Summary Statistics of Key Variables

Variable      Mean       S.D.       % Obs. non-missing

Time-varying predictors (N=9,994 person-years)
  Year1       0.366      -          100.0%
  Year2       0.261      -          100.0%
  Time        0.481      1.057      100.0%
  Selem       0.579      -          94.3%
  Smid        0.294      -          94.3%
  Shigh       0.127      -          94.3%
  Scharter    0.281      -          100.0%
  Senroll     578.355    401.985    93.5%
  Sfarl       84.015     12.724     91.6%
  Stest       0.338      0.244      79.0%
  Tsubmult    0.136      -          83.5%
  Tsubsped    0.192      -          83.5%

Time-invariant predictors (N=3,656 people)
  Pmale       0.275      -          100.0%
  Pageo25     0.052      -          95.2%
  Ppell       0.262      -          74.1%
  Pracew      0.662      -          98.8%
  Praceb      0.084      -          98.8%
  Praceh      0.074      -          98.8%
  Pracea      0.084      -          98.8%
  Praceoth    0.095      -          98.8%
  Pbarrons    5.178      1.021      99.4%
  PGPA        3.571      0.258      99.8%
  Pearnpot    1.135      0.203      95.3%
  Pmajeduc    0.060      -          100.0%

Kaplan-Meier Estimates
Using the Kaplan-Meier method, I investigate the relationship between corps member retention and individual predictor variables. For each variable, I display a graph that isolates a single variable and shows whether retention varies across distinct values of that variable. By definition, retention is 100% at the start of year 1, then decreases at the end of each interval if attrition is observed. Traditionally, Kaplan-Meier curves are graphed as step-functions to emphasize their discrete treatment of time. Each “curve” is graphed to look like a staircase descending to the right. Each step of the staircase represents one time interval. The “rise” (which, because the staircase descends, is really a “fall”) represents attrition. The “run” represents time. To improve the readability of graphs, I ignore the stepwise convention, instead using a single downward-sloping line to connect cumulative retention at one time point to cumulative retention at the next time point. The line segments between time points should not be interpreted, because attrition is measured only at the start of each year, not in between.
In addition, it bears emphasizing that interpretation is straightforward when variables are time-invariant but slightly unintuitive when variables are time-varying. For a time-invariant variable like gender, the Kaplan-Meier estimator tells the percentage of female corps members who remain teachers at any given time period. However, when a variable is time-varying, a single teacher can switch categories from year to year. She can work in a school with top quartile test scores one year and a school with second quartile test scores the next. This can happen because she switches schools or because test scores fall at the school where she teaches. Despite this technical point, the thrust of Kaplan-Meier estimates for time-varying and time-invariant variables is the same. Both show how retention varies based on group membership.
Cumulative retention in teaching by school type
We might expect corps member retention to vary according to school type. Past research has found higher attrition among middle school teachers (Marinell & Coca, 2013, p. 14). However, corps members working at different school levels appear to leave the profession at similar rates. This is shown in Figure 18 and Table 14. The logrank test of equal curves is marginally significant (p=0.98), but substantively retention varies only slightly. At the start of year 6, 37.3% of primary school corps members remain teachers, 35.0% of middle school teachers remain teachers, and 34.5% of high school teachers remain teachers.
Figure 18. Kaplan-Meier Estimate of Cumulative Retention in the Teaching Profession at the Start of Year X by School Level
Table 14. Kaplan-Meier Estimate of Cumulative Retention in the Teaching Profession at the Start of Year X by School Level

School level    Year 1    Year 2    Year 3    Year 4    Year 5    Year 6
Primary         100.0%    91.1%     65.4%     52.3%     43.6%     37.3%
Middle          100.0%    87.3%     60.2%     47.7%     40.9%     35.0%
High            100.0%    92.1%     62.2%     50.2%     39.6%     34.5%
Note: Full output for this table, including standard errors and beginning population totals, can be found in appendix B.
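Because each cumulative estimate is a product of interval-specific retention rates, those rates can be recovered by dividing adjacent columns. The Python sketch below does this for the cumulative values reported by school level; the function name `interval_retention` is hypothetical, and results are only as precise as the rounded table entries.

```python
# Cumulative retention (%) at the start of years 1-6, by school level.
CUMULATIVE_BY_LEVEL = {
    "Primary": [100.0, 91.1, 65.4, 52.3, 43.6, 37.3],
    "Middle": [100.0, 87.3, 60.2, 47.7, 40.9, 35.0],
    "High": [100.0, 92.1, 62.2, 50.2, 39.6, 34.5],
}


def interval_retention(cumulative):
    """Percent of those starting year t who also start year t + 1."""
    return [
        round(100.0 * later / earlier, 1)
        for earlier, later in zip(cumulative, cumulative[1:])
    ]


rates_by_level = {
    level: interval_retention(values)
    for level, values in CUMULATIVE_BY_LEVEL.items()
}
```

For primary school teachers, for instance, roughly 72% of those who start year 2 go on to start year 3. At every school level the lowest interval rate falls between years 2 and 3, which lines up with the end of the two-year commitment.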
In the past decade, increasing numbers of high-poverty students have attended charter schools, and increasing numbers of Teach For America corps members have taught in charter schools. Figure 19 shows retention for teachers working in district schools versus charter schools. To the eye, charter school retention seems higher than district retention, but the logrank test of difference is not statistically significant (p=.297). The difference seems greatest in year 3, when retention at charter schools appears to be nearly 10 percentage points higher. By year 6, however, the lines have mostly converged. Retention at charter schools is only 2.7 percentage points higher.
Figure 19. Kaplan-Meier Estimate of Cumulative Retention in the Teaching Profession at the Start of Year X by School Type
Table 15. Kaplan-Meier Estimate of Cumulative Retention in the Teaching Profession at the Start of Year X by School Type

School type     Year 1    Year 2    Year 3    Year 4    Year 5    Year 6
District        100.0%    89.9%     61.5%     49.1%     42.1%     36.9%
Charter         100.0%    90.3%     71.4%     57.6%     47.5%     39.6%
Note: Full output for this table, including standard errors and beginning population totals, can be found in appendix B.
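The charter-district gaps discussed above can be checked directly from the tabulated values. The Python sketch below computes the gap in percentage points for each year and, for contrast, the same gap relative to the district rate; the variable and function names are hypothetical.

```python
# Cumulative retention (%) at the start of years 1-6, by school type.
DISTRICT = [100.0, 89.9, 61.5, 49.1, 42.1, 36.9]
CHARTER = [100.0, 90.3, 71.4, 57.6, 47.5, 39.6]


def point_gaps(first, second):
    """Difference in percentage points, year by year."""
    return [round(a - b, 1) for a, b in zip(first, second)]


def relative_gaps(first, second):
    """The same difference as a percent of the second group's rate."""
    return [round(100.0 * (a - b) / b, 1) for a, b in zip(first, second)]


gaps = point_gaps(CHARTER, DISTRICT)
relative = relative_gaps(CHARTER, DISTRICT)
```

The gap peaks at 9.9 percentage points at the start of year 3 and narrows to 2.7 percentage points by year 6. Relative to the district rate, the year 6 gap is about 7%, which is why the unit (percentage points versus percent) matters when reporting it.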
The more notable difference between district and charter school teachers is that district teachers often migrate to charter schools, but charter school teachers seldom migrate to districts. Figure 20 and Table 16 show annual switch rates. Approximately 0.5% of teachers