• No se han encontrado resultados

2.6 Macroentorno

2.6.6 Variable Ambiental

BINOMIAL AND NORMAL DISTRIBUTIONS

Although the probability distribution of a random variable can be quite general, subject to some simple requirements such as the set of all probabilities summing to 1, in prac- tice, there are a few distributions that are particularly useful. In this section, we consider two such distributions (binomial and normal) in detail.

Consider an experiment and let A be an event of interest. Define a random variable X such that X = 1 if A occurs and X = 0 otherwise. Then, P X

(

= =1

)

P A

( )

, which we can denote by π, where 0 < π < 1. Note that we used this approach in Section 3.6 when analyzing Pujols’s batting in 2011.

Let X X1,  2…, Xn be independent random variables, each with the distribution of

X. Then, X X1,  2…, Xn is a sequence of ones and zeros. Let S   = X1+ X2+ + Xn. Note that S is simply the number of times that A occurs in the n experiments. It follows that S is a random variable, and its distribution can be determined using the information provided. S is said to have a binomial distribution with parameters n, π.

For instance, suppose n = 2. To find the probability that S = 2, we need to find all combinations of X1 and X2 that yield a sum of 2 and add their probabilities. Because there is only one way to have S = 2 (both X1 and X2 must be 1), this case is particularly simple:

(

=

)

=

(

= =

)

P S   2  P X1  1, X2  1 . Because X X1,  2 are independent:

= = = = = = π

Finding the probability that S = 1 is a little more complicated because there are two ways to have S   1:  = X1= 1,  X2=0  or X1= 0,  X2=1 so that

= = = = + = = = = = + = = = π − π (    1)    (  1,   0)    (  0,   1)   (  1) (  0)    (  0)  (  1). 2 (1 ). 1 2 1 2 1 2 1 2 P S P X X P X X P X P X P X P X

Because we must have P S

(

=0

)

+P S

(

= +1

)

P S

(

=2

)

=1, P S

(

=0 must be

)

(1   ) .− π 2 For general n, the same basic argument works, but the details rapidly become com- plicated. However, virtually all spreadsheet programs, statistical packages, and many calculators can compute binomial probabilities for given values of n and π; see Section 3.17 for information on how to compute these probabilities in Excel.

The important assumptions of the binomial distribution are that the results of the individual experiments are independent and that the probability that X = 1, i.e. that the event of interest occurs, is the same in each experiment.

The situations in which the binomial distribution applies are quite simple—we identify an event of interest and simply count how often that event occurs. However, it is because of that simplicity that the binomial distribution is so useful. Even when the experiment itself is complicated, we are often interested in relatively simple features of the results. For example, if our experiment is an NFL season, the detailed results of that experiment would fill this book. However, suppose we are interested in whether or not the team with the leading rusher during the regular season wins the Super Bowl. Then the number of seasons in the past 20 years in which the team with the leading rusher wins the Super Bowl can be modeled as a binomial random variable.

The only quantities governing the binomial distribution are n, the number of exper- iments and π, the probability of the event of interest occurring in a given experiment. Therefore, all the properties of a binomial random variable S are functions of n, π. For instance the mean of S is n π. For example, if we observe 100 experiments and in each one the probability of A is 0.25, we expect 25 occurrences of A.

The standard deviation of S is nπ − π(1 ) . The form of this expression may seem a little strange but, after a little reflection, it should make some sense. Standard deviation is a measure of variation. Suppose π is very close to 0. Then A almost never occurs. Therefore, S is almost always 0; that is, there is very little variation in S. The same argu- ment applies if π is very close to 0, except that A almost always occurs and S is almost always n. That is, when π is close to either 0 or 1, then the standard deviation should be small. We expect a lot of variation whenever π = 1/2 because A and “not A” are equally likely. The function nπ − π(1 ) has all these properties: It is 0 when either π = 0 or π = 1, and it takes its maximum value when π = ½.

The second important distribution that we will consider is the normal distribu- tion. Unlike the binomial distribution, the normal distribution is a continuous distri- bution, and if a random variable X has a normal distribution, X can take any value between −∞ and ∞, although extreme values are unlikely. We briefly discussed the normal distribution in Chapter 2, where it was presented as the “ideal” shape of a histogram.

The normal distribution is governed by two parameters, traditionally denoted by µ and σ. Here, µ represents the mean of the distribution of X, and σ represents the stan- dard deviation; because standard deviations are always positive, σ > 0. The shape of the distribution is given by the well-known bell-shaped curve, which takes its maximum value at µ; σ governs how spread out the curve is. Figure 3.3 shows several normal distributions, corresponding to different values of µ and σ. These plots illustrate some important properties of the normal distribution. For instance, the distribution is sym- metric about its peak, which occurs at the mean of the distribution. When the value of µ changes, the effect on the distribution is a shift; other aspects of the distribution, such as its “bell shape,” do not change. When the value of σ changes, the effect is essentially to change the scale on the x-axis.

Although it is easy to describe the shape of a normal distribution, it is a little more difficult to determine probabilities associated with a normal distribution. Let X denote a random variable with a normal distribution with mean µ and standard deviation σ. Since X is a continuous random variable, we cannot give a table listing the possible values of X together with their probabilities. Instead, we consider the probability that X falls into certain ranges of values.

To describe such probabilities, it is useful to relate X to a standard normal distribu- tion. A standard normal distribution is the one in which µ = 0 and σ = 1. Let X denote a random variable with mean µ and standard deviation σ. We can convert X to a random variable with a standard normal distribution by computing

= −

σ Z (     )X µ  .

Then, Z has a standard normal distribution. The standard normal distribution is a con- venient reference distribution that can be used to understand variation in many different contexts; several examples of this are presented in further chapters of this book.

Let Z have a standard normal distribution and consider P(−a < Z < a) as a function of a. This probability tells us how close Z tends to be to 0, the mean of its distribution. Table 3.4 gives values of this probability for several values of a.

Note that the value of P(−a < Z < a) rapidly approaches 1 as a increases. This is an important property of the normal distribution—with high probability, normal random variables tend to be close to their mean value.

Now, consider a normal random variable X that has mean µ and standard deviation σ. To find P(−b < X < b) for some value b, we convert this probability into a probability concerning a standard normal random variable Z. Because (X − µ)/σ has a standard normal distribution,

P(−b < X < b) = P(−(b − µ)/σ < (X − µ)/σ < (b − µ)/σ) = P(−((b − µ))/σ < Z < ((b − µ))/σ);

P(−b < X < b) can now be found from Table 3.4 (or a similar table) by using a = (b − µ)/σ, which is called the Z-score of b. The Z-score of b is the number of standard deviations it is above or below the mean of the distribution. The fact that P(−1 < Z < 1) = 0.683 can be interpreted as saying that the probability that a normal random variable falls within one

mean = 0, sd = 1 –6 –5 –4 –3 –2 –1 0 1 2 3 4 5 6 x –6 –5 –4 –3 –2 –1 0 1 2 3 4 5 6 x mean = 0, sd = 2 –6 –5 –4 –3 –2 –1 0 1 2 3 4 5 6 x mean = 2, sd = 1

standard deviation of its mean is 0.683. Therefore, in the statement “with high probabil- ity, normal random variables tend to be close to their mean value,” close is interpreted as being relative to the standard deviation of the random variable.

3.14 USING Z-SCORES TO COMPARE TOP