

< 40     1.09  1.06  1.04  1.01  1.14  1.10  1.07
40–49    1.01  1.02  0.96  0.95  1.04  1.07  1.01
50–59    0.89  0.83  0.81  0.78  0.97  0.99  0.95

Marketing Expense

The marketing expense for this product is $0.78 per piece. This combines the cost of the mail piece ($0.45), postage ($0.23), and processing ($0.10).
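As a quick check of the arithmetic, the per-piece expense can be summed in code. This is a minimal Python sketch rather than the SAS used in the rest of the chapter; the function names and the mailing size of 10,000 pieces are hypothetical and chosen only for illustration.

```python
# Per-piece cost components in cents (values from the text).
# Integer cents avoid floating-point rounding drift when summing money.
COST_CENTS = {"mail_piece": 45, "postage": 23, "processing": 10}


def cost_per_piece_cents(components=COST_CENTS):
    """Total marketing expense for one mailed piece, in cents."""
    return sum(components.values())


def campaign_cost_dollars(pieces, components=COST_CENTS):
    """Total marketing expense in dollars for a mailing of `pieces` pieces."""
    return pieces * cost_per_piece_cents(components) / 100


if __name__ == "__main__":
    pieces = 10_000  # hypothetical mailing size, not from the case study
    print(cost_per_piece_cents())         # 78
    print(campaign_cost_dollars(pieces))  # 7800.0
```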

Deriving Variables

Once the data is deemed correct and missing values have been handled, the next step is to look for opportunities to derive new variables. This is a situation where knowledge of the data and the customer is critical. Combining variables through summarization or division can improve predictive power. Additional analysis of dates and the use of "date math" can assist in discovering new predictive variables.

Summarization

Summarization is an approach used to combine variables. This is done in certain cases where huge amounts of data are generated. Some common methods include addition, subtraction, and averaging.

Consider the amount of data in an active credit card transaction file. Daily processing includes purchases, returns, fees, and interchange income. Interchange income is the revenue that credit card banks collect from retailers for processing payments through their system. To make use of this information, it is typically aggregated to daily, weekly, monthly, or yearly totals and averages. For example, let's say you want to know the total monthly purchases for a group of customers. And you want to know if that total is changing from month to month. First, you summarize the daily purchases to get a monthly total. Then you subtract the months to get the difference.

The following code is not part of the case study, but it does represent an example of how to calculate the monthly totals, average daily purchases, and amount of monthly change:

data ccbank.dailyact;
  set ccbank.dailyact;
  janpurch = sum(of pur0101-pur0131);   /* Summarize daily purchases */
  febpurch = sum(of pur0201-pur0228);
  janavep = janpurch/31;                /* Average daily purchases */
  febavep = febpurch/28;
  change1 = febpurch - janpurch;        /* Change from January to February */
run;
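The same summarization can be sketched outside SAS. The following Python example is an illustration only, not part of the case study: it assumes the daily purchases are held in a dictionary keyed by calendar date, and the names `sample_daily`, `monthly_totals`, and `summarize`, along with the purchase amounts, are hypothetical.

```python
import calendar
from datetime import date, timedelta


def sample_daily():
    """Hypothetical data: $10 per day in January 1999, $12 per day in February 1999."""
    start = date(1999, 1, 1)
    daily = {}
    for offset in range(59):  # 31 days of January + 28 days of February
        day = start + timedelta(days=offset)
        daily[day] = 10.0 if day.month == 1 else 12.0
    return daily


def monthly_totals(daily):
    """Sum daily purchase amounts into {(year, month): total}."""
    totals = {}
    for day, amount in daily.items():
        key = (day.year, day.month)
        totals[key] = totals.get(key, 0.0) + amount
    return totals


def summarize(daily):
    """Return monthly totals, average daily purchases, and month-over-month change."""
    totals = monthly_totals(daily)
    # Divide each total by the number of calendar days in that month.
    averages = {k: v / calendar.monthrange(*k)[1] for k, v in totals.items()}
    months = sorted(totals)
    # Change is the later month minus the earlier month.
    changes = {cur: totals[cur] - totals[prev] for prev, cur in zip(months, months[1:])}
    return totals, averages, changes


if __name__ == "__main__":
    totals, averages, changes = summarize(sample_daily())
    print(totals[(1999, 1)], totals[(1999, 2)])      # 310.0 336.0
    print(averages[(1999, 1)], averages[(1999, 2)])  # 10.0 12.0
    print(changes[(1999, 2)])                        # 26.0
```

Keying the totals by (year, month) rather than by month alone keeps the sketch correct if the file spans more than one year.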

Ratios

Ratios are another variable form that is very useful for certain types of prediction. Many values have additional meaning when compared to some other factor. In this example, I have the variable credit line (credlin2). (Some variables now have a "2" on the end after having missing values replaced.) It represents the total credit line for all credit accounts. A total credit line is something people tend to build over time. To capture the value, I create a variable equal to the ratio of credit line to age of file (age_fil2). The following code creates the variable crl_rat.

data acqmod.model2;
  set acqmod.model2;
  if age_fil2 > 0 then crl_rat = credlin2/age_fil2;
  else crl_rat = 0;
run;

Dates

Dates are found in almost every data set. They can be very predictive as time measures or used in combination with other dates or different types of variables. In order to use them it is necessary to get them into a format that supports "date math." Date math is the ability to perform mathematical functions on dates. This includes addition, subtraction, multiplication, and division. SAS has numerous formats to capture date values. Once you put a date into a SAS format, it is stored as a whole number that represents the number of days since January 1, 1960. If you have two dates, you can compare them easily by subtracting one stored value from the other.
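To make the "days since January 1, 1960" representation concrete, here is a small Python sketch (not SAS) that reproduces the same day count with the standard `datetime` module. The helper names `to_sas_date` and `from_sas_date` are hypothetical.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # day 0 in the SAS date scale


def to_sas_date(d):
    """Number of days from January 1, 1960 to d (negative for earlier dates)."""
    return (d - SAS_EPOCH).days


def from_sas_date(n):
    """Calendar date that lies n days after January 1, 1960."""
    return SAS_EPOCH + timedelta(days=n)


if __name__ == "__main__":
    end_of_1999 = to_sas_date(date(1999, 12, 31))
    print(end_of_1999)  # 14609
    # Comparing two dates is plain subtraction of their day counts.
    print(end_of_1999 - to_sas_date(date(1999, 1, 1)))  # 364
```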

In our case study, I have a date variable called bankcard open date (opd_bcd). It contains six characters. The first four characters represent the year; the last two characters represent the month. The first step is to get the date into a SAS date value. The mdy function takes the values inside the parentheses and assigns them to month, day, and year, in that order. For the month, I use the substr function to pick the values: substr(opd_bcd,5,2) begins in the fifth position and takes two characters. The year is pulled the same way, beginning in the first position and taking four characters. Because opd_bcd carries no day, the day is set to the first of the month.

Next, I create the variable fix_dat. This represents December 31, 1999. In the calculation, I use (fix_dat - opd_bcd2)/30 to represent the approximate number of months from the first bankcard open date to the end of 1999. Using date math, I create a variable that represents the ratio of the current balance to the age in months of the oldest bankcard. I call this variable bal_rat.

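data acqmod.model2;
  set acqmod.model2;
  /* Convert the YYYYMM characters to numbers and build a SAS date on the first of the month */
  opd_bcd2 = mdy(input(substr(opd_bcd,5,2),2.), 1, input(substr(opd_bcd,1,4),4.));
  fix_dat = mdy(12,31,1999);   /* Reference date: December 31, 1999 */
  if opd_bcd ^= '000000' then
    bal_rat = tot_bal2/((fix_dat - opd_bcd2)/30);   /* Balance per month of bankcard age */
  else bal_rat = 0;
run;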
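For readers who want to check the date math by hand, the same calculation can be sketched in Python (not SAS). This is an illustration under stated assumptions: the function names are hypothetical, the balance and open date in the example are made up, and the guard against a non-positive age is an addition that does not appear in the SAS step.

```python
from datetime import date

FIX_DATE = date(1999, 12, 31)  # reference date used in the text


def parse_open_date(opd_bcd):
    """Turn a 'YYYYMM' string into the first day of that month; None for '000000'."""
    if opd_bcd == "000000":
        return None
    return date(int(opd_bcd[:4]), int(opd_bcd[4:6]), 1)


def balance_ratio(tot_bal, opd_bcd, fix_date=FIX_DATE):
    """Ratio of balance to approximate bankcard age in months (days / 30)."""
    opened = parse_open_date(opd_bcd)
    if opened is None:
        return 0.0
    months = (fix_date - opened).days / 30
    if months <= 0:
        return 0.0  # assumption: treat cards opened on or after the reference date as 0
    return tot_bal / months


if __name__ == "__main__":
    # Hypothetical customer: $3,600 balance, oldest bankcard opened January 1997.
    print(round(balance_ratio(3600.0, "199701"), 2))
    # Missing open date coded as '000000' falls back to 0.
    print(balance_ratio(500.0, "000000"))
```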

Variable Reduction

There are many opportunities to create variables through combinations and permutations of existing variables. This is one reason why familiarity with the data and the industry is so valuable. Once you've extracted, formatted, and created all eligible variables, it's time to narrow the field to a few strong contenders.

Continuous Variables

If you have fewer than 50 variables to start, you may not need to reduce the number of variables for final eligibility in the model. As the amount of data being collected continues to grow, the need for variable reduction increases. Some analysts, especially those using credit and transaction-level data, may be starting with 3,000+ eligible variables. Performing an in-depth analysis on each variable is not an efficient use of time. Many of the variables are correlated with each other. If you eliminate some that might have predictive power, usually some other ones will step in to do the job.

In the classic text Applied Logistic Regression [Hosmer and Lemeshow, Wiley 1990], the authors recommend performing a univariate logistic regression on each variable. But with a large number of variables, this can be very time-consuming. It is simpler to use a procedure in SAS called PROC LOGISTIC. This is the same procedure that is used to build the final model, but it also works well as a variable reduction tool.

As an option in the model processing, choose selection=stepwise maxstep=1 and details. (In chapter 5, I will cover the selection options in more detail.) This will run very quickly because you are running only one step in the modeling process.

Method 1: One Model

title1 "XYZ Insurance - Data Reduction";
proc logistic data=acqmod.model2 descending;
  weight smp_wgt;
  model active = inc_est3 inc_miss infd_ag2 hom_equ2 tot_acc2 actopl62
                 tot_bal2 inql6m2 age_fil2 totopac2 credlin2 crl_rat bal_rat
                 no30day2 nobkrpt amtpdue no90eve
        / selection=stepwise maxstep=1 details;
run;

Part of the output will contain the table shown in Figure 4.1. From this table we can see the univariate predictive power of each continuous variable.

To select the final variables, look in the last column. A good rule of thumb is to keep all variables with a chi-square probability of less than 0.5000. In our set of variables, all but NO30DAY and CRL_RAT remain in the candidate variable set, so I will eliminate those two variables from consideration.
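The screening rule can be sketched in Python (not SAS) to show the idea behind it. This is an illustration, not the procedure's implementation. It assumes the statistic of interest is the one-degree-of-freedom score chi-square for adding a single variable to an intercept-only logistic model, and it ignores the sampling weights (smp_wgt) used in the case study. The variable names and data below are hypothetical.

```python
import math


def score_chi_square(x, y):
    """Score chi-square (1 df) for adding x to an intercept-only logistic model of binary y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    variance_y = mean_y * (1 - mean_y)
    if s_xx == 0 or variance_y == 0:
        return 0.0  # a constant predictor or constant target carries no information
    return s_xy ** 2 / (variance_y * s_xx)


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


def screen(candidates, y, threshold=0.5):
    """Keep the names of candidate variables whose p-value is below the threshold."""
    kept = []
    for name, values in candidates.items():
        if p_value_1df(score_chi_square(values, y)) < threshold:
            kept.append(name)
    return kept


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0, 0, 1, 0]                 # hypothetical target
    candidates = {
        "strong": [9, 8, 7, 2, 3, 1, 8, 2],      # tracks the target closely
        "noise": [1, 2, 1, 2, 1, 2, 2, 1],       # unrelated to the target
    }
    print(screen(candidates, y))  # ['strong']
```

A threshold as loose as 0.5 is deliberate at this stage: the goal is only to drop variables that show essentially no univariate relationship, leaving finer selection to the model-building step.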

Let's repeat this exercise for the models in Method 2. We want to keep all variables that are eligible for any of the models.

Figure 4.1

Chi-Square Statistic

In simple terms, the chi-square statistic measures the difference between what you expect to happen and what actually happens. In its standard form, with $O_i$ the observed count and $E_i$ the expected count in category $i$, the formula reads:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

If the chi-square value is large, then the p-value associated with the chi-square is small. The p-value represents the probability of seeing a difference at least that large by chance alone. The chi-square statistic is the underlying test for many modeling procedures including logistic regression and certain classification trees.
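A short worked example may help. The Python sketch below (not SAS) computes the statistic in its standard form, the sum over categories of (observed - expected) squared divided by expected, and converts it to a p-value for the one-degree-of-freedom case. The counts are hypothetical, and the function names are not from the text.

```python
import math


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_1df(stat):
    """Upper-tail probability for a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical example: 100 customers, 50 responders expected, 60 observed.
    observed = [60, 40]
    expected = [50, 50]
    stat = chi_square(observed, expected)
    print(stat)                          # 4.0
    print(round(p_value_1df(stat), 4))   # 0.0455
```

With two categories there is one degree of freedom, which is why the one-degree-of-freedom tail probability applies here. Tables with more categories need the chi-square distribution for the matching degrees of freedom.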

Method 2:
