Annexes
Annex 6: Database
Using the five summary statistics (minimum, maximum, median, and first and third quartiles), a box-plot can be constructed as follows:
1. Draw a rectangular box (horizontally or vertically) with the first and third quartiles as the endpoints. Thus the width of the box is given by the IQR which is the difference between the third and first quartiles.
2. Locate the median inside the box and identify it with a line segment.
3. Compute 1.5 IQR and use this value to locate the markers that identify outliers. The lower marker is Q1 – 1.5 IQR, while the upper marker is Q3 + 1.5 IQR. Values outside these markers are said to be outliers and could be represented by a solid circle.
4. When the minimum and maximum are not outliers, one whisker of the box-plot is a line segment joining the side of the box representing Q1 to the minimum, while the other whisker joins the side representing Q3 to the maximum. When there are outliers, the affected whisker is instead a line segment from the side of the box only up to its corresponding marker.
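The four steps above can be sketched in Python as follows. This is only a minimal illustration: the function name `box_plot_summary` and the sample heights are hypothetical, and the quartiles use the "inclusive" rule of the standard `statistics.quantiles` function, which is an assumption because the text does not fix a quartile rule. The whisker stops at the marker when an outlier is present, as described in step 4; many software packages instead end the whisker at the most extreme observation that lies inside the markers.

```python
import statistics


def box_plot_summary(values):
    """Return the quantities needed to draw a box-plot (steps 1 to 4)."""
    data = sorted(values)

    # Steps 1 and 2: the box runs from Q1 to Q3, with the median inside it.
    # Assumption: "inclusive" quartile interpolation (not fixed by the text).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Step 3: markers at 1.5 IQR beyond the quartiles; values outside them
    # are treated as outliers.
    low_marker = q1 - 1.5 * iqr
    high_marker = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_marker or v > high_marker]

    # Step 4: a whisker reaches the minimum (or maximum) unless that value
    # is an outlier, in which case it stops at the marker.
    whisker_low = max(data[0], low_marker)
    whisker_high = min(data[-1], high_marker)

    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "low_marker": low_marker, "high_marker": high_marker,
        "whisker_low": whisker_low, "whisker_high": whisker_high,
        "outliers": outliers,
    }


# Hypothetical heights in centimeters, for illustration only.
heights_cm = [150, 152, 155, 157, 158, 160, 161, 163, 165, 185]
summary = box_plot_summary(heights_cm)
for name, value in summary.items():
    print(name, value)
```

With these made-up heights, the value 185 lies above the upper marker, so it would be drawn as a solid circle and the upper whisker would end at the marker rather than at the maximum.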
Also inform the students that a box-plot is also called a box-and-whiskers plot and that it can easily be generated using statistical software. Data distributions can easily be compared visually using this kind of plot. Likewise, in technical papers or reports, a box-plot is an accepted graphical presentation of a data distribution.
To complete the activity for this lesson, ask each group to construct box-plots of the male and female data distributions of their assigned variable. They could further improve their textual presentation by interpreting the resulting box-plots of their data sets.
Using the sample class data, the following figures provide the box-plots for the variables heights, weights and BMI by sex of the student. The said figures confirm what was stated in the textual presentation.
Figure 9.1 Box-plots of the variable heights of the 30 students by sex.
We could also note that in Figure 9.1, the distribution of heights for the girls has a larger range because of an outlier, represented by a solid circle on the plot.
The distribution of the girls' heights has a smaller median than the male distribution.
Figure 9.2 Box-plots of the variable weights of the 30 students by sex.
For the variable weights, females have a lower median weight than males, as well as less variability. The middle 50% of the female weight distribution is also observed to be contained within the range of the male weight data.
Figure 9.3 Box-plots of the variable BMI of the 30 students by sex.
As for the variable BMI, females have a lower median BMI and lower variability compared to those of males. There is at least one extremely obese female and one severely underweight male.
With the computed descriptive statistics and corresponding box-plot(s), the analysis or textual presentation could be further improved by describing data not only in terms of the measures but also in terms of the interpretation of box plots. Furthermore, these measures allow us to answer the guide questions provided at the start of the class.
KEY POINTS
• Descriptive measures are important statistics required in simple data analysis.
• Groups of data could be compared in terms of their descriptive measures.
• A box-plot is a way to compare data distributions visually.
ASSESSMENT
Note: Answers are provided inside the parentheses and italicized.
In a university, the grading scale used for a subject is as follows: 1.0; 1.25; 1.5; 1.75; 2.0; 2.25; 2.5; 2.75; 3.0; 4.0; and 5.0. Grades from 1.0 to 3.0 are passing grades, with 1.0 as the highest possible grade. The grade of 5.0 is failing, while 4.0 is a conditional grade. At the end of the semester, the general weighted average (GWA) of each student is computed, and students with high GWAs are usually recognized.
Below is a table showing the GWA and sex of thirty students who are to be recognized in a program for having high GWAs.
Name GWA Sex
Use the approaches below to compare the academic performance of male and female students in the previous term.
1. Compute, by sex, the descriptive measures, which include measures of location such as the minimum, maximum, mean, median, and first and third quartiles, and measures of dispersion such as the range, interquartile range (IQR) and standard deviation.
Descriptive Measure Computed Value
2. Using the computed descriptive statistics, compare the two distributions in terms of their measures of location and measures of dispersion. On the average, which group of students performed better academically in the previous term? Which group varies more?
(On the average, the numerical GWA of female students is 1.51 while male students have an average GWA of 1.46. Because a numerically lower GWA is a better grade, this implies that male students in this group perform better academically than the female students. The computed medians also differ numerically, but they lead to the same observation that males perform better than females. However, the variability of the observations for the male students is higher than that of the female students. Hence, we say that the GWAs of male students vary more than those of the female students.)
3. Sort the data within each group then determine what proportion in each group is within one standard deviation of that group's mean. Are the proportions similar?
(Sorted Data of Male Students:
1.06 1.33 1.34 1.36 1.38 1.42 1.42 1.45 1.52 1.56 1.58 1.6 1.63 1.75
x̄ ± σ = 1.46 ± 0.17, which gives the interval (1.29, 1.63). Note that 12 out of 14 observations, or 86%, are within the interval, that is, within one standard deviation of the mean.
Sorted data for the female students:
1.22 1.24 1.27 1.43 1.45 1.49 1.49 1.54 1.56 1.56 1.58 1.59 1.59 1.64 1.7 1.78
x̄ ± σ = 1.51 ± 0.16, which gives the interval (1.35, 1.67). Note that 11 out of 16 observations, or 69%, are within the interval, that is, within one standard deviation of the mean.
The proportions of observations that are within one standard deviation of the mean are not the same for the two groups. The proportion for the male group is larger than that of the female group. This supports the earlier observation that the GWAs of the male students are more varied compared to those of the female students.)
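The counting in item 3 can be checked with a short Python sketch such as the one below. It is only an illustration under stated assumptions: the function name `share_within_one_sd` is hypothetical, the standard deviation is taken to be the sample standard deviation, and the mean and standard deviation are rounded to two decimals before the interval is formed, as in the answer above. The GWAs are stored in hundredths (1.46 becomes 146) so that the boundary comparisons are exact. Note that the male GWA of 1.63 sits exactly on the rounded upper bound, so the male count is sensitive to how the bounds are rounded.

```python
import statistics

# GWAs in hundredths, copied from the sorted lists in the answer to item 3.
male = [106, 133, 134, 136, 138, 142, 142, 145, 152, 156, 158, 160, 163, 175]
female = [122, 124, 127, 143, 145, 149, 149, 154,
          156, 156, 158, 159, 159, 164, 170, 178]


def share_within_one_sd(values):
    """Return (low, high, count inside, n) for mean +/- one standard deviation.

    Assumptions: sample standard deviation, with the mean and the standard
    deviation rounded to whole hundredths before the bounds are computed.
    """
    mean = round(statistics.mean(values))
    sd = round(statistics.stdev(values))
    low, high = mean - sd, mean + sd
    inside = sum(low <= v <= high for v in values)
    return low, high, inside, len(values)


for label, values in (("male", male), ("female", female)):
    low, high, inside, n = share_within_one_sd(values)
    print(f"{label}: interval ({low / 100}, {high / 100}), "
          f"{inside} of {n} inside = {inside / n:.0%}")
```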
4. Construct box-plots of the GWAs for the males and females. Compare the two data distributions of GWAs.
(Visually, the two distributions of GWAs are different. The GWAs of the female students are less dispersed compared to those of the male students. Numerically, the median GWA of male students is lower than that of the female students. Hence, male students of this group perform better academically than their female counterparts. But the numerical values of the GWAs of the female students are close to each other.)
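The descriptive measures asked for in item 1, which underlie the comparisons in items 2 and 4, can be reproduced with a sketch like the following. The function name `describe` is hypothetical, the quartiles use the "inclusive" rule of `statistics.quantiles` (an assumption, since quartile rules differ slightly between textbooks and software), and the standard deviation is the sample standard deviation. The GWAs are copied from the sorted lists in the answer to item 3.

```python
import statistics

# GWAs copied from the sorted lists in the answer to item 3.
male = [1.06, 1.33, 1.34, 1.36, 1.38, 1.42, 1.42,
        1.45, 1.52, 1.56, 1.58, 1.6, 1.63, 1.75]
female = [1.22, 1.24, 1.27, 1.43, 1.45, 1.49, 1.49, 1.54,
          1.56, 1.56, 1.58, 1.59, 1.59, 1.64, 1.7, 1.78]


def describe(values):
    """Return the measures of location and dispersion listed in item 1."""
    # Assumption: "inclusive" quartile interpolation.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": statistics.stdev(values),  # sample standard deviation
    }


m, f = describe(male), describe(female)
for name in m:
    print(f"{name:>6}  male {m[name]:.3f}  female {f[name]:.3f}")
```

Under these assumptions the male group has the lower mean and median (better grades) and the larger range, IQR and standard deviation, which agrees with the answers given above.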