
LABES Práctica Final JuanJRios 2010


Academic year: 2018


1. Motivation: Graphs in general give an overall view that makes the subject much easier to understand and lets numerical data be observed and grasped quickly. This type of graph, the box-and-whisker plot, is a relatively recent addition to the 4º ESO syllabus. In fact, this is the first school year in which I will be teaching it.

2. Objectives: To link the concepts of median, quartiles, minimum value and maximum value, which the students handle individually but not as a whole.

3. Timing: The planned duration is 1 hour.

4. Teaching resources: PC or digital whiteboard with Internet access.

5. Methodology: Theoretical explanation on the board. Viewing of two short, practical videos in English, with two very well explained, very didactic examples that can be recreated on the board once the students have watched them, to check that they have taken them in.

6. Theoretical/practical content (in Spanish):

Point 0: Historical note (5 minutes)

John Wilder Tukey (born 16 June 1915, died 26 July 2000) was a statistician born in New Bedford, Massachusetts. Tukey earned a Bachelor of Arts in 1936 and a Master of Science in 1937, both in chemistry, at Brown University, before moving to Princeton University, where he received a doctorate in mathematics. During the Second World War, Tukey worked at the Fire Control Research Office and collaborated with Samuel Wilks and William Cochran. After the war he returned to Princeton, dividing his time between the university and AT&T Bell Laboratories.

He introduced the box plot in his 1977 book, Exploratory Data Analysis.

He retired in 1985 and died in New Brunswick, New Jersey, in 2000.

Point 1: Box-and-whisker plots (10 minutes)

This type of graph, introduced by John Tukey in 1977, is very useful because it summarizes the information in a frequency distribution using 5 statistical measures: the minimum value, the first quartile, the median, the third quartile and the maximum value.


The box-and-whisker plot consists of a rectangle (the BOX) whose longer sides show the INTERQUARTILE RANGE (IQR). This "box" is divided by a vertical segment that marks where the median lies and how it relates to the first and third quartiles.

The rectangle is drawn to scale on a segment whose ends are the minimum and maximum values of the distribution. The segments that extend from the box in opposite directions, to its left and right, are called WHISKERS.

The WHISKERS have a maximum length, so that outliers lying apart from the main body of the data are marked individually. Unlike other ways of presenting data, box plots show the outliers of the variable. We call outliers those values that lie so far from the main body of the data that they may well reflect extraneous causes, such as a measurement or recording error. Removing them is not justified, since the purpose of the box plot is to give us a better understanding of how the data are distributed.

Point 2: TUKEY's criterion for setting the ends of the WHISKERS (10 min)

Tukey introduced a criterion for setting the ends of the whiskers. It uses 4 fences, two inner and two outer:

Lower inner fence = first quartile – 1.5 × IQR

Upper inner fence = third quartile + 1.5 × IQR

Lower outer fence = first quartile – 3 × IQR

Upper outer fence = third quartile + 3 × IQR

Recall that the IQR (interquartile range) is the difference between the third quartile and the first.

Considering only the values of the variable that lie between the two inner fences, the smallest and the largest of those values are the ends of the whiskers.
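The criterion above can be sketched in a few lines of Python. This is only an illustration: the function names `tukey_fences` and `whisker_ends` are hypothetical, the quartiles are passed in rather than computed, and a value lying exactly on an inner fence is assumed to count as inside it. The sample values are the seven-value data set that appears in the English notes of Point 7.

```python
def tukey_fences(q1, q3):
    """Return Tukey's four fences for the given first and third quartiles."""
    iqr = q3 - q1  # interquartile range
    return {
        "lower_outer": q1 - 3 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3 * iqr,
    }


def whisker_ends(values, q1, q3):
    """Smallest and largest values lying between the two inner fences."""
    fences = tukey_fences(q1, q3)
    inside = [
        v for v in values
        if fences["lower_inner"] <= v <= fences["upper_inner"]  # assumption: fence itself is "inside"
    ]
    return min(inside), max(inside)


# Data set and quartiles taken from the last exercise in the English notes.
data = [21, 23, 24, 25, 29, 33, 49]
fences = tukey_fences(23, 33)
ends = whisker_ends(data, 23, 33)
print(fences)
print(ends)
```

Here 49 lies beyond the upper inner fence, so the upper whisker stops at 33 rather than at the maximum.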


Point 3: INFORMATION ABOUT SYMMETRY (5 min)

This type of graph also tells us about the symmetry or asymmetry of the distribution. The following criteria are used:

if the median is at or near the center of the box, this suggests that the data are symmetric;

if the median is considerably closer to the first quartile, the data are positively skewed;

if it is closer to the third quartile, the data are negatively skewed.

The relative length of the whiskers can likewise be used as an indication of skewness.
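As a rough sketch, the three criteria can be written as a small Python function. The name `skew_hint` and the `tolerance` parameter are hypothetical: the text does not say how close to the center counts as "near", so the cut-off of 10% of the box length is purely an assumption for illustration.

```python
def skew_hint(q1, median, q3, tolerance=0.1):
    """Describe the skew suggested by the median's position inside the box.

    `tolerance` is an assumed cut-off, expressed as a fraction of the box
    length, below which the median is treated as being "near the center".
    """
    box = q3 - q1
    if box <= 0:
        raise ValueError("Q3 must be greater than Q1")
    # Positive when the median is nearer Q3, negative when it is nearer Q1.
    offset = ((median - q1) - (q3 - median)) / box
    if abs(offset) <= tolerance:
        return "roughly symmetric"
    return "negatively skewed" if offset > 0 else "positively skewed"


print(skew_hint(0, 5, 10))        # median in the center of the box
print(skew_hint(23, 25, 33))      # median nearer the first quartile
print(skew_hint(5.6, 7.5, 8.8))   # median nearer the third quartile
```

This is a hint only; as the text notes, the whisker lengths should be looked at as well.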

Point 4: WATCHING THE VIDEOS (in English) (10 minutes)

http://www.youtube.com/watch?v=Fhk5lDGpivo

(length: approx. 2 min)

http://www.youtube.com/watch?v=GMb6HaLXmjY

(length: approx. 6 min)

Point 5: BASIC QUESTIONS (5 minutes)

Once the plot has been drawn, what kind of questions should we ask to understand it better?

Some possibilities:

What percentage of the data is represented by the box?

What percentage does each whisker represent?

Can one whisker be longer than the other? What does that mean?

Is the median always at the center of the box?

Point 6: PRACTICAL EXERCISE (15 minutes)

Hildebrand (1997) proposes the following problem, which shows how the inner and outer fences work:

-24.6, -2.6, 2.4, 2.7, 3.8, 5.6, 5.9, 6.7, 7.0, 7.2, 7.5, 8.0, 8.2, 8.5, 8.6, 8.8, 9.0, 9.2, 9.7, 10.0, 20.5

Draw a box plot for these data, marking the outliers.

Solution

From the data we obtain:

Median: 7.5

First quartile (Q1): 5.6

Third quartile (Q3): 8.8

IQR: 3.2

The fences are:

Lower outer fence = Q1 – 3.0 × IQR = 5.6 – 3.0 × 3.2 = -4.0

Upper outer fence = Q3 + 3.0 × IQR = 8.8 + 3.0 × 3.2 = 18.4

Lower inner fence = Q1 – 1.5 × IQR = 5.6 – 1.5 × 3.2 = 0.8

Upper inner fence = Q3 + 1.5 × IQR = 8.8 + 1.5 × 3.2 = 13.6

The fence test identifies two serious outliers, -24.6 and 20.5, which lie beyond the outer fences, and one possible outlier, -2.6, which lies between the lower outer and lower inner fences. (A plot of the data indicates that the serious outliers are obviously extreme values and that the doubtful value is possibly excluded as well.)
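The fence test for this exercise can be checked with a short Python sketch. It takes the quartiles exactly as stated in the solution (Q1 = 5.6, Q3 = 8.8) instead of recomputing them, because quartile conventions differ between textbooks. The function name `fence_test` is hypothetical, and a value lying exactly on a fence is assumed to belong to the milder category.

```python
def fence_test(values, q1, q3):
    """Split values into serious outliers (beyond the outer fences) and
    possible outliers (between an inner and an outer fence)."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    serious = [v for v in values if v < outer_low or v > outer_high]
    possible = [
        v for v in values
        if outer_low <= v < inner_low or inner_high < v <= outer_high
    ]
    return serious, possible


data = [-24.6, -2.6, 2.4, 2.7, 3.8, 5.6, 5.9, 6.7, 7.0, 7.2, 7.5,
        8.0, 8.2, 8.5, 8.6, 8.8, 9.0, 9.2, 9.7, 10.0, 20.5]

# Quartiles as given in the solution, not recomputed here.
serious, possible = fence_test(data, q1=5.6, q3=8.8)
print("serious outliers:", serious)
print("possible outliers:", possible)
```

None of the values in this data set sits on a fence, so ordinary floating-point arithmetic is adequate for the comparison.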


Point 7: NOTES IN ENGLISH

I have selected two web pages that seemed the most suitable to me: the first for its brevity, and the second because it is more complete and includes exercises.

1. Box and Whisker Diagrams

(http://www.mathsrevision.net/alevel/pages.php?page=50)

Given some data, we can draw a box and whisker diagram (or box plot) to show the spread of the data. The diagram shows the quartiles of the data, using these as an indication of the spread.

The diagram is made up of a "box", which lies between the upper and lower quartiles. The median can also be indicated by dividing the box into two. The "whiskers" are straight lines extending from the ends of the box to the maximum and minimum values.

Outliers


Skewness

If the whisker to the right of the box is longer than the one to the left, there are more extreme values towards the positive end and so the distribution is positively skewed.

Similarly, if the whisker to the left is longer, the distribution is negatively skewed.

2. Box-and-Whisker Plots: Quartiles, Boxes, and Whiskers

(http://www.purplemath.com/modules/boxwhisk.htm)

Sections: Quartiles, boxes, and whiskers, Five-number summary, Interquartile ranges and outliers

Statistics assumes that your data points (the numbers in your list) are clustered around some central value. The "box" in the box-and-whisker plot contains, and thereby highlights, the middle half of these data points.

To create a box-and-whisker plot, you start by ordering your data (putting the values in numerical order), if they aren't ordered already. Then you find the median of your data. The median divides the data into two halves. To divide the data into quarters, you then find the medians of these two halves. Note: If you have an even number of values, so the first median was the average of the two middle values, then you include the middle values in your sub-median computations. If you have an odd number of values, so the first median was an actual data point, then you do not include that value in your sub-median computations. That is, to find the sub-medians, you're only looking at the values that haven't yet been used.

You have three points: the first middle point (the median), and the middle points of the two halves (what I call the "sub-medians"). These three points divide the entire data set into quarters, called "quartiles". The top point of each quartile has a name, being a "Q" followed by the number of the quarter. So the top point of the first quarter of the data points is "Q1", and so forth. Note that Q1 is also the middle number for the first half of the list, Q2 is also the middle number for the whole list, Q3 is the middle number for the second half of the list, and Q4 is the largest value in the list.

Once you have these three points, Q1, Q2, and Q3, you have all you need in order to draw a simple box-and-whisker plot. Here's an example of how it works.

Draw a box-and-whisker plot for the following data set:

4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3, 4.8, 4.4, 4.2, 4.5, 4.4

Ordered from smallest to largest, the values are:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

The first number I need is the median of the entire set. Since there are seventeen values in this list, I need the ninth value:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

The median is Q2 = 4.4.

The next two numbers I need are the medians of the two halves. Since I used the "4.4" in the middle of the list, I can't re-use it, so my two remaining data sets are:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4 and 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

The first half has eight values, so the median is the average of the middle two:

Q1 = (4.3 + 4.3)/2 = 4.3

The median of the second half is:

Q3 = (4.7 + 4.8)/2 = 4.75

Copyright © Elizabeth Stapel 1999-2009 All Rights Reserved
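The median-of-halves procedure used in this example can be sketched in Python as follows. The helper names `median_of_sorted` and `quartiles` are hypothetical, and the sketch follows the convention described in these notes: with an odd number of values the overall median is left out of both halves.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in numerical order."""
    n = len(ordered)
    mid = n // 2
    if n % 2:                      # odd count: the middle value itself
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the middle two


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the overall median when forming the upper half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


data = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4,
        4.3, 4.8, 4.4, 4.2, 4.5, 4.4]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```

Other textbooks and software use different quartile conventions, so results from this sketch may not match every source.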

Since my list values have one decimal place and range from 3.9 to 5.1, I won't use a scale of, say, zero to ten, marked off by ones. Instead, I'll draw a number line from 3.5 to 5.5, and mark off by tenths.

Now I'll mark off the minimum and maximum values, and Q1, Q2, and Q3:

The "box" part of the plot goes from Q1 to Q3:

By the way, box-and-whisker plots don't have to be drawn horizontally as I did above; they can be vertical, too.

More terminology: The top end of your box may also be called the "upper hinge"; the lower end may also be called the "lower hinge". The lower hinge is also called "the 25th percentile"; the median is "the 50th percentile"; the upper hinge is "the 75th percentile". This means that 25%, 50% and 75% of the data, respectively, is at or below that point. The distance between the hinges may be referred to as the "H-spread" or, as you will see on the following page, the "Interquartile Range", abbreviated "IQR". ("Hinge" actually has a different technical definition, but the term is sometimes used informally.)

Also, some books and software will include the overall median (Q2) when computing Q1 and Q3 for data sets with an odd number of elements. The Texas Instruments calculators do not include Q2 in this case, so you may encounter a book answer that doesn't match the calculator answer. And different software packages use all different sorts of formulas. Be careful to use the formula from your book when doing your homework!
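To see how much the choice of formula can matter, here is a small Python sketch using the standard library's `statistics.quantiles`, which offers two interpolation conventions, `"exclusive"` and `"inclusive"`. These are Python's own conventions and are not claimed to match any particular textbook or calculator; the data set is the seventeen-value example worked through earlier in these notes.

```python
import statistics

data = [4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4,
        4.3, 4.8, 4.4, 4.2, 4.5, 4.4]

# Each call returns the three cut points [Q1, Q2, Q3] for n=4.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

For this data set the two conventions agree on Q1 and Q2 but give different values for Q3 (about 4.75 versus about 4.7), which is exactly the kind of mismatch the paragraph above warns about.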

Additionally, the box-and-whisker plot may include a cross or an "X" marking the mean value of the data, in addition to the line inside the box that marks the median. The difference between the "X" and the median line can then be used as a measure of "skew".

Please don't ask me to explain "skew".

Draw the box-and-whisker plot for the following data set:

77, 79, 80, 86, 87, 87, 94, 99

My first step is to find the median. Since there are eight data points, the median will be the average of the two middle values: (86 + 87) ÷ 2 = 86.5 = Q2

This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since the halves of the data set each contain an even number of values, the sub-medians will be the average of the middle two values.

Q1 = (79 + 80) ÷ 2 = 79.5

Q3 = (87 + 94) ÷ 2 = 90.5

The minimum value is 77 and the maximum value is 99, so I have:

min: 77, Q1: 79.5, Q2: 86.5, Q3: 90.5, max: 99

As you can see, you only need the five values listed above (min, Q1, Q2, Q3, and max) in order to draw your box-and-whisker plot. This set of five values has been given the name "the five-number summary".

Give the five-number summary of the following data set:

79, 53, 82, 91, 87, 98, 80, 93

The five-number summary consists of the numbers I need for the box-and-whisker plot: the minimum value, Q1 (the bottom of the box), Q2 (the median of the set), Q3 (the top of the box), and the maximum value (which is also Q4). So I need to order the set, find the median and the sub-medians, and then list the required values in order.

ordering the list: 53, 79, 80, 82, 87, 91, 93, 98, so the minimum is 53 and the maximum is 98

finding the median: (82 + 87) ÷ 2 = 84.5 = Q2

lower half of the list: 53, 79, 80, 82, so Q1 = (79 + 80) ÷ 2 = 79.5

upper half of the list: 87, 91, 93, 98, so Q3 = (91 + 93) ÷ 2 = 92

five-number summary: 53, 79.5, 84.5, 92, 98
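The same steps can be sketched in Python. The function name `five_number_summary` is hypothetical, and the quartiles follow the median-of-halves convention used in these notes (the overall median is left out of the halves when the count is odd); `statistics.median` is the standard-library median.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # With an odd count, the overall median belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


summary = five_number_summary([79, 53, 82, 91, 87, 98, 80, 93])
print(summary)
```

For the exercise above this reproduces the summary 53, 79.5, 84.5, 92, 98.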

Part of the point of a box-and-whisker plot is to show how spread out your values are. But what if one or another of your values is way out of line? For this, we need to consider "outliers"....

The "interquartile range", abbreviated "IQR", is just the width of the box in the box-and-whisker plot. That is, IQR = Q3 – Q1. The IQR can be used as a measure of how spread-out the values are. Statistics assumes that your values are clustered around some central value. The IQR tells how spread out the "middle" values are; it can also be used to tell when some of the other values are "too far" from the central value. These "too far away" points are called "outliers", because they "lie outside" the range in which we expect them.

The IQR is the length of the box in your box-and-whisker plot. An outlier is any value that lies more than one and a half times the length of the box from either end of the box. That is, if a data point is below Q1 – 1.5×IQR or above Q3 + 1.5×IQR, it is viewed as being too far from the central values to be reasonable. Maybe you bumped the weigh-scale when you were making that one measurement, or maybe your lab partner is an idiot and you should never have let him touch any of the equipment. Who knows? But whatever their cause, the outliers are those points that don't seem to "fit".

(Why one and a half times the width of the box? Why does that particular value demarcate the difference between "acceptable" and "unacceptable" values? Because, when John Tukey was inventing the box-and-whisker plot in 1977 to display these values, he picked 1.5×IQR as the demarcation line for outliers. This has worked well, so we've continued using that value ever since.)

Find the outliers, if any, for the following data set:

10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7, 14.7, 14.7, 14.9, 15.1, 15.9, 16.4

To find out if there are any outliers, I first have to find the IQR. There are fifteen data points, so the median will be at position (15 + 1) ÷ 2 = 8. Then Q2 = 14.6. There are seven data points on either side of the median, so Q1 is the fourth value in the list and Q3 is the twelfth: Q1 = 14.4 and Q3 = 14.9. Then IQR = 14.9 – 14.4 = 0.5.

Outliers will be any points below Q1 – 1.5×IQR = 14.4 – 0.75 = 13.65 or above Q3 + 1.5×IQR = 14.9 + 0.75 = 15.65.

Then the outliers are at 10.2, 15.9, and 16.4.
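The 1.5×IQR test in this example can be sketched in Python. The function name `find_outliers` is hypothetical, the quartiles use the median-of-halves convention from these notes, and the comparisons are strict, so a value exactly on a fence would not be reported.

```python
import statistics


def find_outliers(values):
    """Return the values lying more than 1.5 x IQR outside the box."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    # With an odd count, the overall median is excluded from the upper half.
    q3 = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in ordered if v < low_fence or v > high_fence]


data = [10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7,
        14.7, 14.7, 14.9, 15.1, 15.9, 16.4]
outliers = find_outliers(data)
print(outliers)
```

None of these values lies on a fence, so floating-point rounding does not affect the result here.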

The values for Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable" values from the outlier values. Outliers lie outside the fences.

If your assignment is having you consider outliers and "extreme values", then the values for Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "inner" fences and the values for Q1 – 3×IQR and Q3 + 3×IQR are the "outer" fences. The outliers (marked with asterisks or open dots) are between the inner and outer fences, and the extreme values (marked with whichever symbol you didn't use for the outliers) are outside the outer fences.

By the way, your book may refer to the value of "1.5×IQR" as being a "step". Then the outliers will be the numbers that are between one and two steps from the hinges, and extreme values will be the numbers that are more than two steps from the hinges.

Looking again at the previous example, the outer fences would be at 14.4 – 3×0.5 = 12.9 and 14.9 + 3×0.5 = 16.4. Since 16.4 is right on the upper outer fence, this would be considered to be only an outlier, not an extreme value. But 10.2 is fully below the lower outer fence, so 10.2 would be an extreme value.
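Because 16.4 sits exactly on a fence, a check of this example is safer with exact decimal arithmetic than with binary floating point. The sketch below uses Python's `decimal.Decimal`; the function name `label_values` is hypothetical, the quartiles are taken as given in the example (Q1 = 14.4, Q3 = 14.9), and values are passed as strings so that they are read exactly.

```python
from decimal import Decimal


def label_values(values, q1, q3):
    """Label each flagged value as an 'outlier' or an 'extreme' value."""
    q1, q3 = Decimal(q1), Decimal(q3)
    iqr = q3 - q1
    step = Decimal("1.5") * iqr
    inner_low, inner_high = q1 - step, q3 + step
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    labels = {}
    for text in values:
        v = Decimal(text)
        if v < outer_low or v > outer_high:      # strictly beyond an outer fence
            labels[text] = "extreme"
        elif v < inner_low or v > inner_high:    # beyond an inner fence only
            labels[text] = "outlier"
    return labels


data = ["10.2", "14.1", "14.4", "14.4", "14.4", "14.5", "14.5", "14.6",
        "14.7", "14.7", "14.7", "14.9", "15.1", "15.9", "16.4"]
labels = label_values(data, q1="14.4", q3="14.9")
print(labels)
```

With strict comparisons, a value lying exactly on the outer fence is labeled an outlier rather than an extreme value, matching the reasoning in the paragraph above.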

Your graphing calculator may or may not indicate whether a box-and-whisker plot includes outliers. For instance, the above problem includes the points 10.2, 15.9, and 16.4 as outliers. One setting on my graphing calculator gives the simple box-and-whisker plot which uses only the five-number summary, so the furthest outliers are shown as being the endpoints of the whiskers:


A different calculator setting gives the box-and-whisker plot with the outliers specially marked (in this case, with a simulation of an open dot), and the whiskers going only as far as the highest and lowest values that aren't outliers:

Note that my calculator makes no distinction between outliers and extreme values.

If you're using your graphing calculator to help with these plots, make sure you know which setting you're supposed to be using and what the results mean, or the calculator may give you a perfectly correct but "wrong" answer.

Find the outliers and extreme values, if any, for the following data set, and draw the box-and-whisker plot. Mark any outliers with an asterisk and any extreme values with an open dot.

21, 23, 24, 25, 29, 33, 49

To find the outliers and extreme values, I first have to find the IQR. Since there are seven values in the list, the median is the fourth value, so Q2 = 25. The first half of the list is 21, 23, 24, so Q1 = 23; the second half is 29, 33, 49, so Q3 = 33. Then IQR = 33 – 23 = 10.

The outliers will be any values below 23 – 1.5×10 = 23 – 15 = 8 or above 33 + 1.5×10 = 33 + 15 = 48. The extreme values will be those below 23 – 3×10 = 23 – 30 = –7 or above 33 + 3×10 = 33 + 30 = 63.

So I have an outlier at 49 but no extreme values. I won't have a top whisker because Q3 is also the highest non-outlier, and my plot looks like this:
