A special type of tabular and graphical presentation is the frequency distribution table (FDT) and its corresponding histogram. Specifically, these are used to depict the distribution of the data. Most of the time, these are used in technical reports. An FDT is a presentation containing non-overlapping categories or classes of a variable and the frequencies or counts of the observations falling into the categories or classes. There are two types of FDT according to the type of data being organized:
a qualitative FDT or a quantitative FDT. For a qualitative FDT, the non-overlapping categories of the variable are identified, and the frequencies, as well as the percentages of observations falling into the categories, are computed. A quantitative FDT, on the other hand, is of two types: ungrouped and grouped. An ungrouped FDT is constructed when there are only a few observations or when the data set contains only a few possible values. A grouped FDT is constructed when there is a large number of observations and the data set involves many possible values; the distinct values are then grouped into class intervals. The creation of columns for a grouped FDT follows a set of guidelines. One such procedure, taken from the Workbook in Statistics 1 (listed in the reference section at the end of this Teaching Guide), is described in the following steps.
Steps in the construction of a grouped FDT
1. Identify the largest data value or the maximum (MAX) and smallest data value or the minimum (MIN) from the data set and compute the range, R. The range is the difference between the largest and smallest value, i.e. R = MAX – MIN.
2. Determine the number of classes, k, using k = √N, where N is the total number of observations in the data set. Round off k to the nearest whole number. It should be noted that the computed k might not be equal to the actual number of classes constructed in an FDT.
3. Calculate the class size, c, using c = R/k. Round off c to the same precision as the raw data.
4. Construct the classes or the class intervals. A class interval is defined by a lower limit (LL) and an upper limit (UL). The LL of the lowest class is usually the MIN of the data set. The LLs of the succeeding classes are then obtained by adding c to the LL of the preceding class. The UL of the lowest class is obtained by subtracting one unit of measure (1/10^x, where x is the maximum number of decimal places observed in the raw data) from the LL of the next class. The ULs of the succeeding classes are then obtained by adding c to the UL of the preceding class. The lowest class should contain the MIN, while the highest class should contain the MAX.
5. Tally the data into the classes constructed in Step 4 to obtain the frequency of each class. Each observation must fall in one and only one class.
6. Add (if needed) the following distributional characteristics:
a. True Class Boundaries (TCB). The TCBs reflect the continuous property of continuous data. Each class has a lower TCB (LTCB) and an upper TCB (UTCB). These are obtained by taking the midpoints of the gaps between classes or by using the following formulas: LTCB = LL – 0.5(one unit of measure) and UTCB = UL + 0.5(one unit of measure).
b. Class Mark (CM). The CM is the midpoint of a class and is obtained by taking the average of the lower and upper TCB’s, i.e. CM = (LTCB + UTCB)/2.
c. Relative Frequency (RF). The RF refers to the frequency of the class as a fraction of the total frequency, i.e. RF = frequency/N. RF can be computed for both qualitative and quantitative data. RF can also be expressed in percent.
d. Cumulative Frequency (CF). The CF refers to the total number of observations greater than or equal to the LL of the class (>CF) or the total number of observations less than or equal to the UL of the class (<CF).
e. Relative Cumulative Frequency (RCF). RCF refers to the fraction of the total number of observations greater than or equal to the LL of the class (>RCF) or the fraction of the total number of observations less than or equal to the UL of the class (<RCF).
Both the <RCF and >RCF can also be expressed in percent.
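Steps 1 to 5 above can be sketched in Python as follows. This is only an illustrative sketch, not the Workbook's own procedure: the function name grouped_fdt, its decimals parameter (the x in 1/10^x), and the sample data are assumptions made for this example. Values are scaled to whole numbers of the unit of measure so that the class limits do not pick up floating-point error.

```python
import math

def grouped_fdt(data, decimals):
    """Return a list of (LL, UL, frequency) rows following Steps 1 to 5.

    decimals is the maximum number of decimal places in the raw data,
    so one unit of measure is 1/10**decimals.
    """
    unit = 10 ** decimals
    # Work in whole units of measure (integers) to avoid float drift.
    values = [round(v * unit) for v in data]
    lo, hi = min(values), max(values)

    r = hi - lo                          # Step 1: range R = MAX - MIN
    k = round(math.sqrt(len(values)))    # Step 2: k = sqrt(N), rounded
    # Step 3: class size c = R/k, rounded to the data's precision.
    # Note: Python's round() sends exact .5 ties to the even neighbor.
    c = max(1, round(r / k))

    rows = []
    ll = lo                              # Step 4: lowest LL is the MIN
    while ll <= hi:                      # keep adding classes until MAX is covered
        ul = ll + c - 1                  # UL = next LL minus one unit of measure
        freq = sum(ll <= v <= ul for v in values)   # Step 5: tally
        rows.append((ll / unit, ul / unit, freq))
        ll += c
    return rows

# Hypothetical whole-number data, used only to illustrate the steps.
sample = [12, 15, 21, 23, 27, 30, 34, 38, 45]
rows = grouped_fdt(sample, decimals=0)
for ll, ul, f in rows:
    print(ll, ul, f)
```

For this sample, k = 3 but four classes are needed to reach the MAX, which illustrates the remark in Step 2 that the computed k might differ from the actual number of classes.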
The histogram is a graphical presentation of the frequency distribution table in the form of a vertical bar graph. There are several forms of the histogram; the most common form has the frequency on its vertical axis and the true class boundaries on its horizontal axis.
As an example, the FDT and its corresponding histogram of the 2012 estimated poverty incidences of 144 municipalities and cities of Region VIII are shown below.
[Table and histogram of the 2012 estimated poverty incidences not reproduced.]
• Three methods of data presentation: textual, tabular and graphical
• Two or all three of the methods can be combined to fully describe the data at hand.
• The distribution of the data is presented using a frequency distribution table and a histogram.
ASSESSMENT
Note: This exercise and its corresponding possible answers were taken from the Workbook in Statistics 1 (listed in the reference section).
A. You are to describe the data on the following table. Perform what is being asked for in the questions found after the table.
Table 5.2 Characteristics of the 30 members of the Batong Malake Senior Citizens Association (BMSCA) who participated in their 2009 Lakbay-Aral.
1. Choose a QUANTITATIVE variable from the given data set. Construct a quantitative grouped FDT for this variable. Show preliminary computations (R, k, and c). Also, construct a histogram for the data. Use appropriate labels and titles for the table and graph. Describe the characteristics of the units in the data set using a brief narrative report. Refer to the FDT and histogram constructed.
R = ____________________
k = ____________________
c = ____________________
Table __________________________________________________________________
| LL | UL | Frequency (F) | RF (%) | <CF | >CF | <RCF (%) | >RCF (%) | CM | LTCB | UTCB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  |  |  |  |  |  |  |  |  |  |  |
Histogram:
Textual presentation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Which of the three methods of data presentation do you think is most appropriate to use for the variable chosen in Number 1? Justify your answer.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2. Choose a QUALITATIVE variable from Table 5.2. Construct an appropriate graph. Use labels and a title for the graph.
Give a brief report describing the variable:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Possible Answers:
1. For the quantitative variable gross monthly family income:
R = 73.1 – 10.1 = 63
k = √30 ≈ 5.477, rounded to 5
c = 63/5 = 12.6
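The preliminary computations can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen only for illustration, and the MAX, MIN and N values are the ones quoted in the answer above.

```python
import math

maximum, minimum, n = 73.1, 10.1, 30   # values quoted in the answer above

r = round(maximum - minimum, 1)   # range, kept to the data's one decimal place
k = round(math.sqrt(n))           # sqrt(30) is about 5.477, which rounds to 5
c = round(r / k, 1)               # class size, same precision as the raw data

print(r, k, c)
```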
Table 1. Distribution of the gross monthly family income (in thousand pesos) of the 30 Batong Malake Senior Citizens Association members who joined the Lakbay-Aral.
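| LL | UL | Frequency (F) | RF (%) | <CF | >CF | <RCF (%) | >RCF (%) | CM | LTCB | UTCB |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10.1 | 22.6 | 9 | 30.00 | 9 | 30 | 30.00 | 100.00 | 16.35 | 10.05 | 22.65 |
| 22.7 | 35.2 | 8 | 26.67 | 17 | 21 | 56.67 | 70.00 | 28.95 | 22.65 | 35.25 |
| 35.3 | 47.8 | 7 | 23.33 | 24 | 13 | 80.00 | 43.33 | 41.55 | 35.25 | 47.85 |
| 47.9 | 60.4 | 3 | 10.00 | 27 | 6 | 90.00 | 20.00 | 54.15 | 47.85 | 60.45 |
| 60.5 | 73.0 | 2 | 6.67 | 29 | 3 | 96.67 | 10.00 | 66.75 | 60.45 | 73.05 |
| 73.1 | 85.6 | 1 | 3.33 | 30 | 1 | 100.00 | 3.33 | 79.35 | 73.05 | 85.65 |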
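The derived columns of Table 1 (RF, CF, RCF, CM and TCB) follow from the class limits and frequencies alone. The sketch below recomputes them using the definitions in Step 6; the variable names and the dictionary layout are assumptions made for illustration, and the unit of measure is taken to be 0.1 because the incomes are reported to one decimal place.

```python
lower = [10.1, 22.7, 35.3, 47.9, 60.5, 73.1]   # LL of each class
upper = [22.6, 35.2, 47.8, 60.4, 73.0, 85.6]   # UL of each class
freq = [9, 8, 7, 3, 2, 1]                      # tallied frequencies
unit = 0.1                                     # one unit of measure
n = sum(freq)

rows = []
running = 0   # observations in all classes before the current one
for ll, ul, f in zip(lower, upper, freq):
    less_cf = running + f       # <CF: observations less than or equal to UL
    greater_cf = n - running    # >CF: observations greater than or equal to LL
    running = less_cf
    ltcb = round(ll - 0.5 * unit, 2)   # LTCB = LL - 0.5(one unit of measure)
    utcb = round(ul + 0.5 * unit, 2)   # UTCB = UL + 0.5(one unit of measure)
    rows.append({
        "RF%": round(100 * f / n, 2),
        "<CF": less_cf,
        ">CF": greater_cf,
        "<RCF%": round(100 * less_cf / n, 2),
        ">RCF%": round(100 * greater_cf / n, 2),
        "CM": round((ltcb + utcb) / 2, 2),   # CM = (LTCB + UTCB)/2
        "LTCB": ltcb,
        "UTCB": utcb,
    })

for row in rows:
    print(row)
```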
Histogram: [graph not reproduced; vertical axis: Frequency, horizontal axis: TCB]
Figure 1. Monthly gross family income (in thousand pesos) of the 30 BMSCA members.
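A rough text rendering of such a histogram can be produced from the true class boundaries and frequencies in Table 1. This is only a sketch: the bars run horizontally rather than vertically, and the layout is an assumption made for illustration, not the format of the original figure.

```python
boundaries = [10.05, 22.65, 35.25, 47.85, 60.45, 73.05, 85.65]   # TCBs
freq = [9, 8, 7, 3, 2, 1]                                        # class frequencies

lines = []
# Pair each boundary with the next one to get the (LTCB, UTCB) of each class.
for left, right, f in zip(boundaries, boundaries[1:], freq):
    lines.append(f"{left:5.2f} to {right:5.2f} | {'#' * f} {f}")

print("\n".join(lines))
```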
Textual presentation:
(Sample) The monthly gross family income of the 30 BMSCA members ranges from 10.1 to 73.1 thousand pesos. More than half of them have an income of at most 35,250 pesos. Only three of them, or 10%, have a monthly family income of at least 60,450 pesos.
Which of the three methods of data presentation do you think is most appropriate to use for the variable chosen in Number 1? Justify your answer.
(Sample)
Textual presentation: It is most appropriate to use a textual presentation since the highlights of the family income of the BMSCA members can be presented.
Tabular presentation: It is most appropriate to use a tabular presentation since a lot of the numerical information can be presented and trends in the monthly income of the members can be seen.
Graphical presentation: A graphical presentation is most appropriate so that trends in the monthly income of the BMSCA members are easily visible.
2. For the qualitative variable: gender
Figure 2. Distribution of the 30 BMSCA members by gender.
Brief Description: The majority of the 30 BMSCA members who joined the Lakbay-Aral are males. Only 43% are females.
For the qualitative variable: whether the member is receiving a monthly pension or not
Figure 2. Distribution of the 30 BMSCA members as to whether they are receiving a monthly pension or not.
Brief Description: More than half of the 30 BMSCA members receive a monthly pension. Forty percent are not receiving a monthly pension.
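The frequencies and percentages of a qualitative FDT can be tallied as in the sketch below. The raw responses are hypothetical: they are built only to be consistent with the 40% of 30 members (12 members) reported as not receiving a pension, and the labels "yes" and "no" are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical raw responses consistent with the description above:
# 40% of 30 members (12) receive no pension, so 18 do.
responses = ["yes"] * 18 + ["no"] * 12

counts = Counter(responses)   # frequency of each category
n = sum(counts.values())

# Category -> (frequency, percentage of all observations)
table = {cat: (f, round(100 * f / n, 2)) for cat, f in counts.items()}

for cat, (f, pct) in table.items():
    print(cat, f, pct)
```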