The following log file illustrates how data may be transformed to obtain data that are appropriate for linear regression.
. * 2.18.Funding.log
. *
. * Explore the relationship between NIH research funds and
. * disability-adjusted life-years lost due to the 29 diseases
. * discussed by Gross et al. (1999). Look for transformed values
. * of these variables that are linearly related.
. * Perform a linear regression on these transformed variables.
. * Replot this regression line as a function of the untransformed
. * variables.
. use C:\WDDtext\2.18.Funding.dta, clear 1
. scatter dollars disabil, ylabel(0(.2)1.4) xlabel(0(1)9) 2
> xsize(2.7) ysize(1.964) scale(1.5) 3
. scatter dollars disabil if dollars < 1 4
> , ylabel(0(.1).4) ymtick(.05(.1).35) xlabel(0(1)9) 5
> xsize(2.7) ysize(1.964) scale(1.5)
. scatter dollars disabil if dollars < 1 6
> , ylabel(0(.1).4) ymtick(.05(.1).35)
> xscale(log) xlabel(0.01 0.1 1 10) 7
> xmtick(.02 (.01) .09 .2 (.1) .9 2 (1) 9) 8
> xsize(2.7) ysize(1.964) scale(1.5)
. generate logdis = log(disabil)
. generate logdol = log(dollars)
. regress logdol logdis 9
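      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  1,    27) =   18.97
       Model |  14.8027627     1  14.8027627           Prob > F      =  0.0002
    Residual |  21.0671978    27  .780266584           R-squared     =  0.4127
-------------+------------------------------           Adj R-squared =  0.3909
       Total |  35.8699605    28  1.28107002           Root MSE      =  .88333

------------------------------------------------------------------------------
      logdol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      logdis |   .4767575    .109458     4.36   0.000     .2521682    .7013468   10
       _cons |  -2.352205   .1640383   -14.34   0.000    -2.688784   -2.015626
------------------------------------------------------------------------------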
. predict yhat,xb
. predict stdp, stdp 11
. generate ci_u = yhat + invttail(_N-2,.025)*stdp 12
. generate ci_l = yhat - invttail(_N-2,.025)*stdp
. sort logdis 13
. twoway rarea ci_u ci_l logdis, color(gs14) 14
> || line yhat logdis
> || scatter logdol logdis
> , ylabel(-4.61 "0.01" -2.3 "0.1" 0 "1") 15
> ymtick(-4.61 -3.91 -3.51 -3.22 -3.00 -2.81 -2.66 -2.53 16
> -2.41 -2.3 -1.61 -1.2 -.92 -.69 -.51 -.36 -.22 -.11 0)
> xtitle(Disability-Adjusted Life-Years Lost ($ millions)) 17
> xlabel(-4.61 "0.01" -2.3 "0.1" 0 2.3 "10")
> xmtick(-2.3 -1.61 -1.2 -.92 -.69 -.51 -.36 -.22 -.11
> 0 .69 1.1 1.39 1.61 1.79 1.95 2.08 2.2 2.3) legend(off)
> xsize(2.7) ysize(1.964) scale(1.5)
. generate yhat2 = exp(yhat) 18
. generate ci_u2 = exp(ci_u)
. generate ci_l2 = exp(ci_l)
. twoway rarea ci_u2 ci_l2 disabil, color(gs14) 19
> || line yhat2 disabil
> || scatter dollars disabil
> , ytitle(NIH Research Funds ($ Billions))
> ylabel(0(.2)1.4) ymtick(.1(.2)1.3) xlabel(0(1)9)
> xsize(2.7) ysize(1.964) scale(1.5) legend(off)
. twoway rarea ci_l2 ci_u2 disabil, color(gs14)
> || line yhat2 disabil, sort 20
> || scatter dollars disabil if dollars < 1
> , ytitle(NIH Research Funds ($ Billions)) 21
> ylabel(0 (.1) .5) xlabel(0 (1) 9) legend(off)
> xsize(2.7) ysize(1.964) scale(1.5)
Comments
1 This data set is from Table 1 of Gross et al. (1999). It contains the annual allocated NIH research funds in 1996 and disability-adjusted life-years lost for 29 diseases. These two variables are denoted dollars and disabil in this data set, respectively.
2 This command produces a scatter plot that is similar to panel A of Figure 2.14. In this figure the annotation of individual diseases was added with a graphics editor.
3 We want this to be one of the five panels in Figure 2.14. This requires that the size of the panel be smaller and the relative size of text be larger than Stata’s default values. The xsize and ysize options specify the size of the width and height of the graph in inches, respectively; scale(1.5) specifies that the size of all text, marker-symbols and line widths be increased by 50% over their default values.
4 AIDS was the only disease receiving more than one billion dollars. The if dollars < 1 qualifier restricts this graph to diseases with less than one billion dollars in funding and produces a graph similar to panel B of Figure 2.14.
5 The ymtick option draws tick marks on the y-axis at the indicated values. These tick marks are shorter than the ticks used for the axis labels. The ytick option works the same way as the ymtick option except that the tick lengths are the same as those of the ylabel option.
6 This command produces a graph similar to panel C of Figure 2.14. The equivalent point-and-click command is Graphics → Twoway graph (scatter plot, line etc) → Plots → Create → Plot → Plot type: (scatter plot), Y variable: dollars, X variable: disabil → if/in → Restrict observations, If: (expression) dollars < 1 → Accept → Y axis → Major tick/label properties → Rule → Axis rule: Range/Delta, 0 Minimum value, .4 Maximum value, .1 Delta → Accept → Minor tick/label properties → Rule → Axis rule: Range/Delta, .05 Minimum value, .35 Maximum value, .05 Delta → Accept → X axis → Major tick/label properties → Rule → Axis rule: Custom, Custom rule: 0.01 0.1 1 10 → Accept → Minor tick/label properties → Rule → Axis rule: Custom, Custom rule: .02(.01).09 .2(.1).9 2(1)9 → Accept → Axis scale properties → ✓ Use logarithmic scale → Height: (inches) 1.964 → ✓ Scale text, markers, and lines, Scale multiplier: 1.5 → Submit.
7 We plot disabil on a logarithmic scale using the xscale(log) option.
8 The minor tick marks of this xmtick option are specified to be from 0.02 to 0.09 in units of 0.01, from 0.2 to 0.9 in units of 0.1 and from 2 to 9 in integer units. The xmtick and xtick options have the same effects on the x-axis that the ymtick and ytick options have on the y-axis.
9 This command fits the regression model of Equation (2.37).
10 The slope of the regression of log funding against log life-years lost is 0.4768. The P-value associated with the null hypothesis that β = 0 is < 0.0005.
11 This predict command defines stdp to be the standard error of the expected value of logdol from the preceding linear regression (i.e. stdp is the standard error of yhat). The option stdp specifies that this standard error is to be calculated. We have also used stdp as the name of this variable.
12 The variables ci_u and ci_l give the 95% confidence interval for yhat using Equation (2.20). In Section 2.12 we used the lfit and lfitci commands to derive and plot this interval for us. Although we could have done this here as well, we will need the variables ci_u and ci_l to derive the confidence band of Panel E in Figure 2.14.
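As a quick check on Comment 12, the multiplier returned by invttail(_N-2,.025) can be inspected directly. The sketch below assumes the 29 observations reported in the regression output, so that _N-2 = 27. It then applies the same multiplier to the slope estimate and its standard error, which should agree with the 95% interval printed for logdis in the regression table. It is an illustration only and is not part of the original log file.

```stata
* Critical value of the t distribution with 27 degrees of freedom
* (27 = 29 observations - 2 estimated parameters; assumed from the output above)
display invttail(27, .025)

* Same multiplier applied to the slope and its standard error,
* using the values copied from the regression table
display .4767575 - invttail(27, .025)*.109458
display .4767575 + invttail(27, .025)*.109458
```

The variables ci_u and ci_l are built in the same way, with yhat in place of the slope and stdp in place of its standard error.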
13 This command sorts the data set in ascending order by logdis. The equivalent point-and-click command is Data → Sort → Ascending sort → Variables: logdis → Submit. Stata needs the data set to be sorted by the x-variable whenever a non-linear line is plotted.
14 This plot is similar to Panel D of Figure 2.14. The rarea command shades the region between ci_u and ci_l over the range of values defined by logdis. The option color(gs14) shades this region gray. Stata provides 15 different shades of gray named gs1, gs2, . . . , gs15. The higher the number the lighter the shade of gray; gs0 is black and gs16 is white. This option may also be used to color plot-symbols and lines. The point-and-click version of this command is similar to that given above. The rarea plot is specified on the Plots tab of the twoway - Twoway graphs dialogue box as Create → Plot → Choose a plot category and type: Range plots → Range plots: (select type) Range area → Plot type: (range plot with area shading), Y1 variable: ci_u2, X variable: disabil, Y2 variable: ci_l2 → Accept.
15 In Panel D of Figure 2.14 the y-axis is labeled in billions of dollars. In this graph command, the y-variable is logdol, which is the logarithm of dollars. This ylabel option places the labels 0.01, 0.1 and 1 at the values of logdol of −4.61, −2.3 and 0, respectively. (Note that log[0.01] = −4.61, log[0.1] = −2.3 and log[1] = 0.) This syntax may also be used to label the x-axis with the xlabel option.
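The label positions quoted in Comment 15 can be confirmed with a few display commands. This is a minimal sketch that is not part of the original log file; it relies only on the fact that log() in Stata is the natural logarithm, and the %5.2f format is chosen here simply to round to two decimal places.

```stata
* Natural logarithms of the axis label values, rounded to two decimals
display %5.2f log(0.01)
display %5.2f log(0.1)
display %5.2f log(1)
display %5.2f log(10)
```

The last value is the position used for the label "10" in the xlabel option of the same graph command.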
16 The tick marks specified by this ymtick option are placed on the y-axis, whose variable is logdol. Their values are the logarithms of 0.01, 0.02, . . . , 0.09, 0.1, 0.2, . . . , 0.9 and 1. Compare this option with that described in Comment 8, where the xscale(log) option let us give the tick values on the untransformed scale. Tick marks may be specified as a list of values, as is illustrated here, for the ymtick, ytick, ylabel, xmtick, xtick, and xlabel options.
17 In the previous graphs in this example the title of the x-axis was taken from the variable label of the x-variable in the database. Here, the x-variable is logdis and we have not given it a label. For this reason we title the x-axis explicitly with an xtitle option.
18 The variable yhat2 equals the left-hand side of Equation (2.38); ci_u2 and ci_l2 give the upper and lower bounds of the 95% confidence interval for yhat2. In Stata and this text, exp[x] denotes e raised to the power x, where e is the base of the natural logarithm.
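Exponentiating the fitted log-scale model shows why the back-transformed regression line is a curve on the original scale. The symbols \hat{a} and \hat{b} are shorthand introduced here for the _cons and logdis coefficients in the regression output. Equation (2.38) is not reproduced in this section, so the lines below are a sketch consistent with the log file rather than a restatement of that equation.

```latex
\begin{align*}
\texttt{yhat} &= \hat{a} + \hat{b}\,\log(\texttt{disabil}),
  \qquad \hat{a} = -2.352205,\quad \hat{b} = 0.4767575, \\
\texttt{yhat2} = \exp(\texttt{yhat}) &= e^{\hat{a}}\,\texttt{disabil}^{\hat{b}}
  \approx 0.095 \times \texttt{disabil}^{0.477}.
\end{align*}
```

Because exp is an increasing function, applying it to ci_l and ci_u preserves their order, which is why ci_l2 and ci_u2 remain the lower and upper bounds for yhat2.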
19 This graph plots funding against life-years lost. The regression curve and 95% confidence intervals are shown.
20 This line command plots yhat2 against disabil. The sort option sorts the data by the x-variable prior to drawing the graph. In this example, this option is not needed since we have already sorted the data by logdis, which places the observations in ascending order of disabil as well. If the data had not been sorted in this way, and we had not used the sort option, then straight lines would have been drawn between consecutive observations as listed in the data set. This often produces a garbled graph that looks nothing like the intended line plot.
21 Deleting AIDS (diseases with dollars ≥ 1) permits using a narrower range for the y-axis. The resulting graph is similar to panel E of Figure 2.14. In panel E, however, the y-axis is expressed in millions rather than billions and a graphics editor has been used to break the y-axis and add the data for AIDS.