We use the following notation and terms in approximating the variance and the standard deviation from grouped data:
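The computational formulas for the population variance and the sample variance are as follows:
Population Variance: $\sigma^2 \approx \dfrac{N \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{N^2}$
Sample Variance: $s^2 \approx \dfrac{n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2}{n(n-1)}$
Here $f_i$ is the frequency and $x_i$ is the class mark of the $i$th class, and $N$ (or $n$) is the sum of the frequencies.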
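The computational formula can be sketched in Python as follows. This is a minimal sketch that assumes one common form of the formula, $s^2 \approx [n \sum f_i x_i^2 - (\sum f_i x_i)^2] / [n(n-1)]$, with $N^2$ in the denominator for the population version. The function names and the small frequency table are hypothetical and are not the Stat 101 data.

```python
from math import sqrt


def _grouped_sums(class_marks, frequencies):
    """Return (n, sum of f_i*x_i, sum of f_i*x_i^2) for a frequency table."""
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    sum_fx2 = sum(f * x * x for f, x in zip(frequencies, class_marks))
    return n, sum_fx, sum_fx2


def grouped_sample_variance(class_marks, frequencies):
    """Approximate the sample variance from grouped data."""
    n, sum_fx, sum_fx2 = _grouped_sums(class_marks, frequencies)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


def grouped_population_variance(class_marks, frequencies):
    """Approximate the population variance from grouped data."""
    n, sum_fx, sum_fx2 = _grouped_sums(class_marks, frequencies)
    return (n * sum_fx2 - sum_fx ** 2) / (n ** 2)


# Hypothetical frequency table (illustration only, not the data in the text).
class_marks = [55, 65, 75, 85, 95]   # x_i
frequencies = [2, 5, 8, 4, 1]        # f_i

sample_var = grouped_sample_variance(class_marks, frequencies)
sample_sd = sqrt(sample_var)
population_var = grouped_population_variance(class_marks, frequencies)

print(f"sample variance     = {sample_var:.4f}")
print(f"sample std. dev.    = {sample_sd:.4f}")
print(f"population variance = {population_var:.4f}")
```

The two helper columns in the code, `sum_fx` and `sum_fx2`, correspond to the column totals of $f_i x_i$ and $f_i x_i^2$ in a grouped-data worksheet.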
Example: Consider the frequency distribution of the final grades in Stat 101 of the 100 selected students. Compute the sample variance and the sample standard deviation.
Class Limits     Frequency   Class Mark
LCL     UCL      $f_i$       $x_i$        $f_i x_i$    $f_i x_i^2$
The z-score or the standard score measures how many standard deviations an observed value is above or below the mean:
Population z-score: $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ is the population mean and $\sigma$ is the population standard deviation
Sample z-score: $z = \dfrac{x - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation
The z-score, or standard score, gives the relative position of an observed value in the collection: its sign tells us whether the observed value is below or above the mean, and its magnitude tells us how far the observed value is from the mean in units of the standard deviation.
We can use the standard score to compare two or more observed values from different data sets. We can also use it to identify possible outliers in a dataset.
Example: The mean grade in Statistics 101 is 70% and the standard deviation is 10%, whereas in Math 17, the mean grade is 80% and the standard deviation is 20%. Mark got a grade of 75% in Stat 101 and a grade of 90% in Math 17. In which subject did Mark perform better if we consider the grades of the other students in the two subjects?
Solution:
Stat 101: $z = \dfrac{75 - 70}{10} = 0.5$
Math 17: $z = \dfrac{90 - 80}{20} = 0.5$
If we consider the grades of the other students in the two subjects, Mark’s score in Stat 101 is just as good as his score in Math 17. Based on the z-scores, Mark’s scores in both subjects are 0.5 standard deviations above their respective mean scores.
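The comparison of Mark's two grades can be checked with a short Python sketch. The function name `z_score` is a hypothetical helper; the means, standard deviations, and grades are the ones given in the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd


# Values from the example: grade, mean grade, standard deviation (all in %).
z_stat101 = z_score(75, 70, 10)
z_math17 = z_score(90, 80, 20)

print(f"Stat 101 z-score: {z_stat101}")
print(f"Math 17 z-score:  {z_math17}")
```

Although the raw grade in Math 17 is higher, both z-scores are equal, which is why the two performances are judged equally good relative to the other students.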
4.1.3 The Z-score
The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. The formula of the coefficient of variation (CV) is as follows:
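Population CV: $CV = \dfrac{\sigma}{\mu} \times 100\%$, where $\mu$ is the population mean and $\sigma$ is the population standard deviation
Sample CV: $CV = \dfrac{s}{\bar{x}} \times 100\%$, where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation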
The coefficient of variation expresses the standard deviation as a percentage of the mean. A large coefficient of variation indicates that the dataset is highly variable because its standard deviation is large relative to the size of the mean.
We do not use the coefficient of variation if the mean is less than or equal to zero. When the mean is zero, the coefficient of variation is undefined. When the mean is negative, the coefficient of variation is meaningless.
Example: Suppose we want to buy a stock and we can choose one of two stocks. The prices per share of stock 1 and stock 2 are 2100 PhP and 650 PhP, respectively. Let us say that over the past months, we compiled data on a sample of prices of stock 1 and stock 2 at the close of trading, and we have the following statistics:
                           Stock 1   Stock 2
Mean (PhP)                 2095      665
Standard Deviation (PhP)   450       80
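Solution: We compute the coefficient of variation of each stock to find out which stock has the more variable price.
Stock 1: $CV = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{450}{2095} \times 100\% \approx 21.48\%$
Stock 2: $CV = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{80}{665} \times 100\% \approx 12.03\%$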
From the calculation, stock 1 has a more variable price than stock 2. Thus, we will select stock 1 if we want to take the chance that its price will increase. We just have to remember that by choosing stock 1, we are also taking the risk that its price will decrease.
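The stock comparison can be reproduced with a short Python sketch. The function name `coefficient_of_variation` is a hypothetical helper; it refuses a mean that is zero or negative, following the rule stated earlier, and the inputs are the sample statistics from the table.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean <= 0:
        # The CV is undefined for a zero mean and meaningless for a negative mean.
        raise ValueError("the coefficient of variation requires a positive mean")
    return sd / mean * 100


# Sample statistics from the example (prices in PhP).
cv_stock1 = coefficient_of_variation(450, 2095)
cv_stock2 = coefficient_of_variation(80, 665)

print(f"CV of stock 1: {cv_stock1:.2f}%")
print(f"CV of stock 2: {cv_stock2:.2f}%")
print("More variable:", "stock 1" if cv_stock1 > cv_stock2 else "stock 2")
```

Because the CV is unit-free, the two stocks can be compared directly even though their price levels differ greatly.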
4.1.4 The Coefficient of Variation
If it is possible to divide the histogram at the center into two identical halves, wherein each half is a mirror image of the other, then the distribution is called a symmetric distribution. Otherwise, it is called a skewed distribution.
Relying solely on a measure of central tendency and a measure of dispersion in figuring out the behavior of a dataset may sometimes be misleading. It is possible for two datasets to have equal means and equal standard deviations, and yet the shapes of their histograms are extremely different.
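As a small illustration of this point, the Python sketch below uses two made-up datasets (hypothetical values chosen only for this demonstration) that share the same mean and the same sample standard deviation, although only one of them is symmetric. The helper `is_mirror_symmetric` is an assumed name; it simply checks whether the deviations from the mean on one side mirror those on the other side.

```python
from statistics import mean, stdev


def is_mirror_symmetric(data, tol=1e-9):
    """Check whether the deviations from the mean mirror each other."""
    center = mean(data)
    deviations = sorted(x - center for x in data)
    # Pair the smallest deviation with the largest, and so on;
    # in a symmetric dataset each pair sums to zero.
    return all(abs(low + high) <= tol
               for low, high in zip(deviations, reversed(deviations)))


# Hypothetical datasets constructed for illustration.
symmetric_data = [2, 4, 4, 4, 6]
skewed_data = [3, 3, 3, 5, 6]

for name, data in [("symmetric_data", symmetric_data),
                   ("skewed_data", skewed_data)]:
    print(f"{name}: mean = {mean(data)}, "
          f"sd = {stdev(data):.4f}, "
          f"mirror symmetric = {is_mirror_symmetric(data)}")
```

Both datasets have a mean of 4 and the same standard deviation, so the two summary measures alone cannot distinguish them; only the shape check does.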
The figure below shows various examples of symmetric and skewed distributions. Notice that there are two distinct types of skewness: either the observations are concentrated on the right side of the distribution, which tapers off to the left, or the other way around.
4.2 Measures of Skewness
4.2.1 Symmetry and Skewness
A distribution is said to be positively skewed or skewed to the right when the concentration of the values is at the left-end of the distribution and the upper tail of the distribution stretches out more than the lower tail.
A distribution is said to be negatively skewed or skewed to the left when the concentration of the values is at the right-end of the distribution and the lower tail of the distribution stretches out more than the upper tail.
Skewness presents a problem in the analysis of data because it can adversely affect the behavior of certain summary measures. In addition, certain procedures in statistics assume that the distribution is symmetric, and it would be inappropriate to use these procedures in the presence of severe skewness. Sometimes we need to perform special preliminary adjustments, such as transformations, before analyzing the data.
In general, we should check the data for skewness before the analysis to prevent errors in the succeeding analysis, because skewness may result in spurious conclusions.
Relationship of the Three Measures of Central Tendency and the Skewness of the Distribution
All the measures of skewness that will be discussed are interpreted in the same way; thus, we can always use the following interpretations for the computed measures:
Sk = 0 symmetric distribution
Sk > 0 positively skewed distribution
Sk < 0 negatively skewed distribution