We train four types of classifiers to generate the callers’ abandonment and redialing probabilities. These include k-nearest neighbor voting, decision trees, logistic regression, and neural networks. We provide a brief description of each classifier below and refer the reader to Michie et al. (1994) for further details:
• K-Nearest Neighbor Voting: To predict the class of a focal observation (the observation whose class we are interested in predicting), the k-nearest neighbor voting method begins by using the k-nearest neighbor algorithm to identify the k observations whose features are most similar to those of the focal observation.5 Then each of the k neighbors casts a vote for the class of the focal observation, where each neighbor’s vote is given by its own observed class. Finally, the probability of the focal observation belonging to each class is equal to the percentage of the votes cast for that class. For example, suppose that we wanted to generate a caller’s probability of redialing based on the call features. Then we would first use the k-nearest neighbor algorithm to identify the k calls in the training data that are most similar (with respect to their features) to the focal call. Suppose that we picked the five most-similar calls (k=5) to vote on the class of the focal call, and that the caller redialed in one of these calls and did not redial in the other four. Then the neighbor that redialed would vote “redial” for the class of the focal call and the other four neighbors would vote “not redial.” Hence, the predicted probability of the caller redialing after the focal call would be 20% (1 redial vote out of the 5 votes cast). We trained the k-nearest neighbor voting classifier with k equal to 8, 16, 32, and 64 neighbors and found that its predictions were most accurate with 64 neighbors.
• Decision Tree: Under decision trees the observations are segmented by applying a series of decision rules, where each rule assigns an observation to a segment based on the value of one of the features. Decision rules are successively applied to each segment, which creates a hierarchy (or tree) of segments within segments. Each segment in the tree is referred to as a node and the final nodes are referred to as leaves.
To generate the probability of an observation belonging to a given class, one first follows the decision rules based on the observation’s features to determine the observation’s corresponding leaf. Then the probability of the observation belonging to each class is given by the percentage of training observations in the leaf that belong to that class. To illustrate how to use the decision rules to generate predicted probabilities, in Figure 4.7 we provide an example of a simple decision tree trained using caller redialing decisions over 10,000 abandoned calls. To be clear, the example decision tree below is not based on our actual data; it is used only for illustration.
Figure 4.7: Example of Decision Tree for Caller Redialing Decision
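Root node (Redial: 50.00%, Sample: 10,000), split on Wait
    Wait < 60 seconds (Redial: 40.00%, Sample: 6,000), split on lag1_Intercontact_Days
        lag1_Intercontact_Days >= 1 day: leaf (Redial: 30.00%, Sample: 4,000)
        lag1_Intercontact_Days < 1 day: leaf (Redial: 60.00%, Sample: 2,000)
    Wait >= 60 seconds (Redial: 65.00%, Sample: 4,000), split on lag1_Intercontact_Days
        lag1_Intercontact_Days >= 2 days: leaf (Redial: 60.00%, Sample: 3,000)
        lag1_Intercontact_Days < 2 days: leaf (Redial: 80.00%, Sample: 1,000)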
Our example decision tree contains 3 decision rules based on two features, resulting in 4 leaves. The features are how long (in seconds) the caller waited before abandoning (lag1_Wait) and the intercontact time (in days) between the most recent call and the current call (lag1_Intercontact_Days). In each node we provide the number of calls in the training data that satisfy the decision rules corresponding to the node and the percentage of calls in the node that are classified as redials. Suppose we wanted to generate the probability of redialing for a caller who waited 120 seconds before abandoning (lag1_Wait=120) and whose most recent call was 1 day ago (lag1_Intercontact_Days=1). Then this call would belong to the far-right leaf, since its value of lag1_Wait is greater than or equal to 60 seconds and its value of lag1_Intercontact_Days is less than 2 days. Given that 80.00% of the calls from the training set that belong to this leaf resulted in redials, we would predict that the probability of the focal caller redialing is 80.00%.
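The two nonparametric classifiers above can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used in the study: the function names, the toy feature values, and the use of Euclidean distance as the similarity measure are assumptions (the text does not name the distance metric). The tree function hard-codes the decision rules of the illustrative tree in Figure 4.7.

```python
import math
from collections import Counter


def knn_vote_probabilities(train_features, train_classes, focal, k):
    """Return {class: share of votes} among the k nearest training observations."""
    # Assumption: similarity is measured by Euclidean distance.
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], focal),
    )
    votes = Counter(train_classes[i] for i in order[:k])
    return {cls: count / k for cls, count in votes.items()}


def tree_redial_probability(lag1_wait, lag1_intercontact_days):
    """Follow the decision rules of the illustrative tree to a leaf."""
    if lag1_wait < 60:
        return 0.60 if lag1_intercontact_days < 1 else 0.30
    return 0.80 if lag1_intercontact_days < 2 else 0.60


# Hypothetical training calls: the five closest to the focal call contain
# one redial and four non-redials; the last two calls are far away.
features = [(10, 0), (11, 0), (12, 1), (13, 1), (14, 0), (500, 9), (600, 9)]
classes = ["redial", "not redial", "not redial", "not redial", "not redial",
           "redial", "redial"]

knn_probs = knn_vote_probabilities(features, classes, focal=(12, 0), k=5)
tree_prob = tree_redial_probability(lag1_wait=120, lag1_intercontact_days=1)
print(knn_probs["redial"], tree_prob)
```

With these toy inputs the vote share reproduces the 1-out-of-5 example, and the tree lookup lands in the far-right leaf of the worked example.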
• Logistic Regression: While the k-nearest neighbor voting and decision tree classifiers are nonparametric, the logistic regression classifier generates the probability of an observation belonging to each class based on a set of parameters that are estimated using the features and classes of the observations in the training data. Specifically, let the training observations be indexed by $i \in \{1, \cdots, N\}$, let the possible classes be indexed by $c \in \{1, \cdots, C\}$, and denote by $\Pr_{ic}$ the probability that observation $i$ belongs to class $c$. Then under the assumptions of the logit model,
$$\Pr_{ic} = \frac{\exp(\alpha_c + \beta_c X_i)}{1 + \sum_{c'=2}^{C} \exp(\alpha_{c'} + \beta_{c'} X_i)},$$
where $\alpha_c$ is an intercept term for class $c$, $X_i$ is a vector of features for observation $i$, and $\beta_c$ is a vector of parameters that captures the effects of the features on the probability of belonging to class $c$. To identify the parameter values for each class, it is necessary to normalize the parameter values (including the intercept) of one of the classes to zero. Here the normalized class is $c = 1$, so its term in the denominator above is 1, since $\exp(0) = 1$.
To train the logistic regression classifier for our abandonment classification problem, we index callers in the training data by $i \in \{1, \cdots, N\}$, calls initiated by caller $i$ by $j \in \{1, \cdots, n_i\}$, where $n_i$ is the number of calls initiated by caller $i$ in the training data, and periods by $t \in \{1, \cdots, \tau_{ij}\}$, where $\tau_{ij}$ is the final decision period of caller $i$ during call $j$. Also, we let $Y_{ijt}$ be the abandonment decision of caller $i$ in period $t$ of call $j$, where
$$Y_{ijt} = \begin{cases} 0, & \text{if the caller waited,} \\ 1, & \text{if the caller abandoned.} \end{cases}$$
Then denoting by $P(Y_{ijt} = 1)$ the probability of caller $i$ abandoning, this probability is given by
$$P(Y_{ijt} = 1) = \frac{\exp(\alpha_a + \beta_a H_{ijt})}{1 + \exp(\alpha_a + \beta_a H_{ijt})},$$
where $\alpha_a$ is an intercept term, $H_{ijt}$ is a vector of history features and other features, and $\beta_a$ captures the effects of the features on the probability of abandoning. Finally, to train the logistic regression classifier for our redial classification problem, we use the same approach, except that the training data only includes abandoned calls and the classifier generates the callers’ probability of redialing following abandonment.
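As a rough sketch of how the logit probabilities above are evaluated once the parameters are known, the following Python code computes the class probabilities with class 1 normalized to zero and then treats the abandonment probability as the two-class special case. The function names and all parameter values are hypothetical and chosen only for illustration; estimating the parameters from training data is not shown.

```python
import math


def multinomial_logit_probabilities(x, alphas, betas):
    """Class probabilities with class 1 as the normalized (all-zero) class.

    alphas and betas hold the intercepts and coefficient vectors for
    classes 2..C; class 1 is implicit and contributes exp(0) = 1.
    """
    scores = [
        a + sum(b_k * x_k for b_k, x_k in zip(b, x))
        for a, b in zip(alphas, betas)
    ]
    exps = [1.0] + [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def abandonment_probability(h, alpha_a, beta_a):
    """Two-class case: 'waited' is the normalized class, 'abandoned' the other."""
    return multinomial_logit_probabilities(h, [alpha_a], [beta_a])[1]


# Hypothetical parameters for a three-class problem with two features.
probs = multinomial_logit_probabilities(
    [1.0, 2.0], alphas=[0.0, 0.5], betas=[[0.0, 0.0], [0.1, -0.2]]
)

# Hypothetical history features and parameters for the abandonment problem.
p_abandon = abandonment_probability([120.0, 1.0], alpha_a=-2.0, beta_a=[0.01, 0.5])
print(probs, p_abandon)
```

Writing the two-class case through the general function makes the normalization explicit: the 1 in the denominator of the abandonment formula is the contribution of the "waited" class.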
• Neural Network: A neural network contains layers of nodes where the connections between the nodes are used to generate the probability of an observation belonging to a given class. There are three types of nodes:
– Input Nodes: These contain the values of the observation’s features, which are used by the neural network to generate the probabilities of belonging to each class.
– Hidden Nodes: These contain new values that are calculated using the values from the input nodes and/or other hidden nodes. These new values are calculated using a two-step process. First, the hidden node calculates a linear combination of the values from the input nodes and/or other hidden nodes, where the weights are determined by the algorithm. This linear combination may also include an intercept value, which is determined by the algorithm and is referred to as a bias. Second, the value obtained from the first step is transformed using an activation function supplied to the algorithm prior to training. Note that there can be multiple layers of hidden nodes in the neural network, where each successive layer is formed as a linear combination of the values of the hidden nodes in the previous layer along with the possible biases.
– Output Nodes: These contain the probability of the observation belonging to each class, which is calculated using a two-step process. First, the output node calculates a linear combination of the values from the hidden nodes (and a possible bias), where the weights (and bias) are determined by the algorithm. The values from the input nodes may also be included in this linear combination. Second, the value from the first step is transformed into a predicted probability using a probability function supplied to the algorithm prior to training.
The task of the neural network algorithm is to choose the weights and biases so as to minimize an error function supplied prior to training, where the error function is some measure of the algorithm’s prediction error. To demonstrate how the neural network calculates the probability of belonging to a given class, we provide an example of a neural network trained using the callers’ abandonment decisions in Figure 4.8. Again, to be clear, the example below is not based on actual data, but is for illustration.
Figure 4.8: Example of Neural Network for Caller Abandonment Decision
[Figure: Input Node 1 and Input Node 2 feed Hidden Node 1 and Hidden Node 2. Each hidden node applies Step 1 (weights and bias) and Step 2 (activation function). The hidden nodes feed the Output Node, which applies Step 1 (weights and bias) and Step 2 (probability function).]
The neural network has two input nodes that contain the caller’s waiting time in the most recent call (lag1_Wait) and an indicator of whether the caller abandoned the most recent call (lag1_Ab). It also contains two hidden nodes and the output node that is used to calculate the caller’s probability of abandoning. Suppose that we wanted to generate the probability of a caller abandoning in a given period based on the caller’s values of lag1_Wait and lag1_Ab. Then beginning in hidden node 1 we would perform step 1 by calculating a linear combination of the caller’s input node values using their corresponding weights ($h_{1,1}$ and $h_{1,2}$), and adding the bias of hidden node 1 ($h_{1,0}$) to obtain $x_1$. In step 2 we would transform $x_1$ into $y_1$ using the activation function that we selected upon training the algorithm. We would then repeat steps 1 and 2 in hidden node 2 to obtain $y_2$. Next we would perform step 1 in the output node by calculating a linear combination of the values from the hidden nodes ($y_1$ and $y_2$) using their corresponding weights ($o_1$ and $o_2$), and adding the bias of the output node ($o_0$) to obtain $z$. Finally, we would perform step 2 of the output node by calculating the caller’s probability of abandoning using $z$ as an input to the probability function that we selected upon training the algorithm.
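The calculation just described can be sketched as a short forward pass in Python. The weights and biases below are hypothetical values chosen only so that the arithmetic is easy to check, and the function name is an assumption. The defaults use a hyperbolic tangent activation and a standard logistic probability function, which the text reports as the most accurate choices; any other pair could be passed in instead.

```python
import math


def logistic(v):
    """Standard logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def abandonment_probability(lag1_wait, lag1_ab, hidden, output,
                            activation=math.tanh, probability=logistic):
    """Forward pass for two inputs, one hidden layer, and one output node.

    hidden: one (bias, weight_on_lag1_wait, weight_on_lag1_ab) tuple per
            hidden node, e.g. (h10, h11, h12) for hidden node 1.
    output: (o0, o1, o2, ...) with the bias first, then one weight per
            hidden node.
    """
    ys = []
    for bias, w_wait, w_ab in hidden:
        x = bias + w_wait * lag1_wait + w_ab * lag1_ab  # step 1
        ys.append(activation(x))                        # step 2
    o0, *o_weights = output
    z = o0 + sum(o * y for o, y in zip(o_weights, ys))  # output step 1
    return probability(z)                               # output step 2


# Hypothetical weights: hidden node 1 passes lag1_Ab through the activation,
# hidden node 2 is switched off, and the output node uses only y1.
hidden_nodes = [(0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
output_node = (0.0, 1.0, 0.0)

p = abandonment_probability(lag1_wait=120.0, lag1_ab=1.0,
                            hidden=hidden_nodes, output_node=None or output_node) \
    if False else abandonment_probability(120.0, 1.0, hidden_nodes, output_node)
print(p)
```

With these weights the result is the logistic function applied to tanh(1), which makes the two-step structure of each node easy to verify by hand.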
We train 6 types of neural networks for each classification problem (abandonment and redialing), including neural networks with 1, 3, and 5 hidden nodes in a single hidden layer, and neural networks with 1, 3, and 5 hidden nodes each in two hidden layers. We train the neural networks under several activation functions and probability functions and find that prediction accuracy is highest under a hyperbolic tangent activation function and a standard logistic probability function.6 We also include biases in the hidden nodes and output node as these improve the prediction accuracy. Finally, we allow for direct connections from the input nodes to the output node as this also improves prediction accuracy.
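To make the six architectures concrete, the sketch below enumerates them and runs a forward pass through each one, with biases in every node, a hyperbolic tangent activation, a standard logistic probability function, and direct connections from the input nodes to the output node. The helper names, the two input values, and the randomly drawn weights are all hypothetical; the code illustrates only the structure of the networks, not how they were trained.

```python
import math
import random


def forward(inputs, layers, output, direct):
    """Forward pass with tanh hidden nodes and a logistic output node.

    layers: list of hidden layers, each a list of (bias, weights) nodes.
    output: (bias, weights over the last hidden layer).
    direct: weights on the direct input-to-output connections.
    """
    values = inputs
    for layer in layers:
        values = [
            math.tanh(b + sum(w * v for w, v in zip(ws, values)))
            for b, ws in layer
        ]
    bias, ws = output
    z = bias + sum(w * v for w, v in zip(ws, values))
    z += sum(d * x for d, x in zip(direct, inputs))  # direct connections
    return 1.0 / (1.0 + math.exp(-z))


def random_network(n_inputs, hidden_sizes, rng):
    """Draw hypothetical weights in [-1, 1] for the given hidden layer sizes."""
    layers, fan_in = [], n_inputs
    for size in hidden_sizes:
        layers.append([
            (rng.uniform(-1, 1), [rng.uniform(-1, 1) for _ in range(fan_in)])
            for _ in range(size)
        ])
        fan_in = size
    output = (rng.uniform(-1, 1), [rng.uniform(-1, 1) for _ in range(fan_in)])
    direct = [rng.uniform(-1, 1) for _ in range(n_inputs)]
    return layers, output, direct


# 1, 3, and 5 hidden nodes in one hidden layer, then in each of two layers.
architectures = [[n] * depth for depth in (1, 2) for n in (1, 3, 5)]

rng = random.Random(0)
probabilities = [
    forward([0.5, 1.0], *random_network(2, sizes, rng))
    for sizes in architectures
]
print(architectures)
```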