for student 1


Academic year: 2020


(1)

Graphs and Numerical Summaries

1 Data Types

2 Distributions and Graphs

3 Measures of Center

4 Measures of Spread

(2)

Definitions:

Variable

Variable: any characteristic that takes different values for different individuals

Categorical (qualitative) variables place an individual into one of several groups

• Examples: gender, race

• Categorical variables can be string (alphanumeric) data or numeric variables that use numeric codes to represent categories (for example, 0 = Unmarried and 1 = Married).

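• There are two basic types of categorical data:

– Nominal. Categorical data where there is no inherent order to the categories. For example, a job category of sales is not higher or lower than a job category of marketing or research.

– Ordinal. Categorical data where the categories have a meaningful order but no measurable distance between them. For example, satisfaction ratings of low, medium, and high.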

(3)

Quantitative variables (also referred to as scale data) take on numerical values (mostly continuous).

• Examples: height, age, wages

• Quantitative data is measured on an interval or ratio scale, where the data values indicate both the order of values and the distance between values.

• For example, a salary of $72,195 is higher than a salary of $52,398, and the distance between the two values is $19,797.

(4)

Interval Scale:


56 65 70 65 55 60 66 70 75 56

60 70 61 67 61 71 67 62 71 66

68 72 57 68 72 69 57 71 69 75

72 62 67 73 58 63 66 73 63 65

58 73 74 76 74 80 81 60 74 58

76 82 77 83 77 80 91 78 94 72

79 64 57 79 55 87 64 88 78 62

classes Freq.

(5)
(6)

Definition: Distribution

A distribution describes what values a variable takes and how often it takes each of them.

(7)

Distribution

1- Frequency Table

• One column lists the classes that contain the data

(8)

Distribution

1- Frequency Table

Steps to form the frequency table for qualitative data (categories) or a small number of quantitative values:

- each value is itself a class

Steps to form the frequency table for a quantitative variable:

- when there are many quantitative values, each class is a range of data with a minimum and a maximum value

1. Determine the highest and the lowest values of the data
2. Calculate the range = (the highest value - the lowest value)
3. Determine the number of classes (Relative)
4. Determine the length of each class: $L = \dfrac{\text{Range}}{\#\ \text{classes}}$
5. Form the table

(9)

Example: Find the frequency table for the marks of the statistics class (8 classes)

56 65 70 65 55 60 66 70 75 56

60 70 61 67 61 71 67 62 71 66

68 72 57 68 72 69 57 71 69 75

72 62 67 73 58 63 66 73 63 65

58 73 74 76 74 80 81 60 74 58

76 82 77 83 77 80 91 78 94 72

79 64 57 79 55 87 64 88 78 62

Distribution : 1- Frequency Table

(10)

Example: Frequency table for a quantitative variable

Each class consists of two values; the distance between them is the class length (5). The first value is called the lower limit and the second the upper limit.

Each class begins where the previous class ends, so that every observation falls within exactly one class.

Example: for the first class,

lower limit = lowest mark = 55

upper limit = lower limit + class length = 55 + 5 = 60

So the first class is "from 55 to less than 60". Note that the mark 60 does not fall within the first class.

(11)

classes    frequency (f)    relative frequency
55 - 60    10               0.143*
60 - 65    12               0.171
65 - 70    13               0.186
70 - 75    16               0.229
75 - 80    10               0.143
80 - 85     5               0.071
85 - 90     2               0.029
90 - 95     2               0.029
Sum        70               1

* 10/70 ≈ 0.143
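The tallying steps for this example can be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_table` is hypothetical, the marks are taken to be whole numbers, and the class length is chosen as the integer part of range / number of classes plus one, so that the highest mark falls strictly inside the last class (here 39 / 8 gives a length of 5).

```python
import math

# The 70 marks of the statistics class from the example.
marks = [
    56, 65, 70, 65, 55, 60, 66, 70, 75, 56,
    60, 70, 61, 67, 61, 71, 67, 62, 71, 66,
    68, 72, 57, 68, 72, 69, 57, 71, 69, 75,
    72, 62, 67, 73, 58, 63, 66, 73, 63, 65,
    58, 73, 74, 76, 74, 80, 81, 60, 74, 58,
    76, 82, 77, 83, 77, 80, 91, 78, 94, 72,
    79, 64, 57, 79, 55, 87, 64, 88, 78, 62,
]


def frequency_table(data, n_classes):
    """Return (lower, upper, frequency, relative frequency) rows."""
    lowest, highest = min(data), max(data)
    value_range = highest - lowest
    # Assumption: integer data; this length guarantees that the highest
    # value lies strictly below the upper limit of the last class.
    length = math.floor(value_range / n_classes) + 1
    rows = []
    for k in range(n_classes):
        lower = lowest + k * length
        upper = lower + length
        # A value belongs to a class when lower <= value < upper.
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, freq, freq / len(data)))
    return rows


table = frequency_table(marks, 8)
for lower, upper, freq, rel in table:
    print(f"{lower} - {upper}  {freq:3d}  {rel:.3f}")
```

Because each class includes its lower limit and excludes its upper limit, every mark is counted exactly once and the frequencies add up to the number of marks.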

(12)

Distribution: 2- Graphs

Qualitative Data: we can graph the distribution using bar plots and pie charts

Quantitative Data: histograms, frequency polygons, and frequency curves

(13)

Barplots and Pie Charts (Qualitative Data)

Pie charts are generally not as useful as barplots:

• All categories must be included to make a pie chart

• It is harder to compare subsets of categories

(14)

Sales

classes    frequency
Car        12
Ship       10
Plane      15
Bus         5
Sum        42

[Figure: bar plot of the sales frequency for the four classes (car, ship, plane, bus); vertical axis from 0 to 16.]
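As a rough illustration of what the two graphs show, the sales table can be turned into a text bar plot and into the shares that a pie chart would display. The category names are English translations, and `text_barplot` is a hypothetical helper written for this sketch.

```python
# Sales counts from the table (category names translated to English).
sales = {"Car": 12, "Ship": 10, "Plane": 15, "Bus": 5}

total = sum(sales.values())


def text_barplot(counts):
    """Return one line per category: name, a bar of '#' marks, the count."""
    return [f"{name:<6}{'#' * count} {count}" for name, count in counts.items()]


for line in text_barplot(sales):
    print(line)

# Share of each category: the fraction of the pie that its slice covers.
shares = {name: count / total for name, count in sales.items()}
for name, share in shares.items():
    print(f"{name:<6}{share:.3f}")
```

The bars can be compared directly by length, while the shares only make sense when all four categories are present, which is the limitation of pie charts noted on the previous slide.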

(15)

Histograms (for the distribution of quantitative data)

Histograms emphasize the frequency of different values in the distribution

• X-axis: values are divided into intervals (classes) of equal length

• Y-axis: the height of each bar is the frequency of the values that fall in that class

(16)

Histogram (1)

A graphical representation of the simple frequency table for quantitative data.

It consists of adjacent bars.

Frequencies are on the vertical axis, while the values of the variable (class limits) are on the horizontal axis.

Each class is represented by a bar whose height is the class frequency and whose base width is the class length.

(17)

We have the following frequency distribution of the weights, in grams, of a sample of 100 chickens. Graph the histogram.

Classes    Frequency (f)
600-       10
620-       15
640-       20
660-       25
680-       20
700-720    10
Sum        100

(18)

Relative frequency table and its relative histogram

Find the relative histogram for the previous example

Classes    Frequency (f)
600-       10
620-       15
640-       20
660-       25
680-       20
700-720    10
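The relative frequencies needed for the relative histogram are each class frequency divided by the total. A minimal sketch, assuming each open-ended class such as "600-" ends where the next class begins:

```python
# Class limits and frequencies from the chicken-weight table.
# Assumption: "600-" means 600 up to (but not including) 620, and so on.
classes = [(600, 620), (620, 640), (640, 660), (660, 680), (680, 700), (700, 720)]
freqs = [10, 15, 20, 25, 20, 10]

n = sum(freqs)
relative = [f / n for f in freqs]

for (lower, upper), f, rel in zip(classes, freqs, relative):
    # In the relative histogram the bar over [lower, upper) has height rel.
    print(f"{lower}-{upper}  f={f:3d}  relative={rel:.2f}")
```

The relative histogram has the same shape as the ordinary histogram; only the scale of the vertical axis changes, and the bar heights add up to 1.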

(19)

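Frequency polygon (F-Polygon)

Also a graphical representation of the simple frequency table.

Frequencies are plotted on the vertical axis and class centers on the horizontal axis.

The points are joined by straight lines, and then the two ends of the polygon are connected to the horizontal axis.

The class center is computed as follows: center = (lower limit + upper limit) / 2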

(20)

Graph the F-Polygon

• Example: find the F-polygon for the previous example

classes    Frequency (f)    Center
600-       10               (600+620)/2 = 610
620-       15               (620+640)/2 = 630
640-       20               650
660-       25               670
680-       20               690
700-720    10               710
Sum        100
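The points of the frequency polygon can be listed with a short sketch. One assumption is made here that the slides do not spell out: the polygon is closed on the horizontal axis at the centers of the empty classes just before the first class and just after the last one.

```python
# Class limits and frequencies from the chicken-weight example.
classes = [(600, 620), (620, 640), (640, 660), (660, 680), (680, 700), (700, 720)]
freqs = [10, 15, 20, 25, 20, 10]

# Center of a class = (lower limit + upper limit) / 2.
centers = [(lower + upper) / 2 for lower, upper in classes]

# Assumption: close the polygon one class length beyond each end, at height 0.
length = classes[0][1] - classes[0][0]
points = (
    [(centers[0] - length, 0)]
    + list(zip(centers, freqs))
    + [(centers[-1] + length, 0)]
)

for x, y in points:
    print(f"({x:.0f}, {y})")
```

Joining these points in order with straight lines gives the polygon.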

(21)
(22)

F-Curve

• Similar to the F-polygon, but we replace the straight lines with a smooth curve

(23)

• The distribution of a variable can be described graphically and numerically in terms of:

Center: where are most of the values located?

Spread: how variable are the values?

Shape: is the distribution symmetric or skewed? Are there multiple peaks or just one?

(24)
(25)

3- Measures of Center

Simple examples:

• Numbers: 1, 2, 6, 2, 4, 2, 5

Mean = Median = Mode =

• Numbers: 5.8, 5.7, 5.9, 5.7, 5.5, 5.7, 5.7, 5.7, 5.6

Mean = Median = Mode =

• Throw out the number 5.5 and again find the mean, median, and mode
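These exercises can be checked with Python's standard `statistics` module. The sketch reads the fourth value of the second list as 5.7; the variable names are illustrative only.

```python
from statistics import mean, median, mode

first = [1, 2, 6, 2, 4, 2, 5]
second = [5.8, 5.7, 5.9, 5.7, 5.5, 5.7, 5.7, 5.7, 5.6]
# The second list with the value 5.5 thrown out.
trimmed = [x for x in second if x != 5.5]

for name, data in [("first", first), ("second", second), ("without 5.5", trimmed)]:
    print(f"{name}: mean={mean(data):.3f} median={median(data)} mode={mode(data)}")
```

Removing 5.5 moves the mean of the second list from 5.7 to 5.725, while the median and the mode both stay at 5.7, which illustrates that the mean is the measure most sensitive to a single extreme value.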

(26)

Example: Suppose we have 5 students. Four of them have the following marks: 1.8, 1.72, 1.5, and 1.8. The average of all five marks is 1.7. Compute the mark of the fifth student.
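One way to work this example, assuming that 1.7 is the mean of all five marks: the total of the five marks must be 5 × 1.7, so the missing mark is that total minus the four known marks.

```python
known_marks = [1.8, 1.72, 1.5, 1.8]
n_students = 5
overall_mean = 1.7  # assumed to be the mean of all five marks

# mean = total / n, so total = n * mean.
total = n_students * overall_mean
fifth_mark = total - sum(known_marks)

print(round(fifth_mark, 2))  # 8.5 - 6.82 = 1.68
```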

Measures of Center

(27)

The weighted mean

When there is more than one sample and each sample has its own mean, the weighted mean of these samples is:

$$\text{Mean}_w = \frac{n_1\,\text{mean}_1 + n_2\,\text{mean}_2 + \dots}{n_1 + n_2 + \dots} = \frac{\sum_i n_i\,\text{mean}_i}{\sum_i n_i}$$

Example:

Two groups of Statistics classes: the first group consists of 50 students and the mean of their grades is 15. The second group consists of 40 students and the mean is 10. Compute the weighted mean.

(28)

• Example: we have two samples with the following results; find the weighted mean.

• Example: A student has taken three courses with credits of 4, 3, and 5 hours. At the end of the semester, the final marks were 68, 72, and 81, respectively. Find the average mark for this semester.
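Both the two-group example and the course-credit example use the same computation: multiply each value by its weight, add the products, and divide by the sum of the weights. A minimal sketch, with `weighted_mean` as a hypothetical helper name:

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Two Statistics groups: means 15 and 10, group sizes 50 and 40.
group_mean = weighted_mean([15, 10], [50, 40])

# Three courses: marks 68, 72, 81 weighted by credits 4, 3, 5.
semester_mean = weighted_mean([68, 72, 81], [4, 3, 5])

print(round(group_mean, 2))     # 1150 / 90
print(round(semester_mean, 2))  # 893 / 12
```

In the first case the weights are the sample sizes; in the second they are the credit hours, so the 5-credit course pulls the average towards its mark of 81.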

The weighted mean

3- Measures of Center

(29)

Properties of the mean

1- If the sample consists of n values that are all equal to a, then

$$\text{mean} = \frac{a + a + \dots + a}{n} = \frac{na}{n} = a$$

2- The sum of the deviations of the values from their mean equals zero:

$$\sum_{i=1}^{n} (x_i - \text{mean}) = 0$$

3- If a constant value a is added to each value in the sample, then:

$$\text{mean}_{new} = \text{mean}_{old} + a$$

4- If each value in the sample is multiplied by a constant value a, then:

$$\text{mean}_{new} = a \cdot \text{mean}_{old}$$
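The four properties of the mean can be checked numerically. The sample `x` and the constant `a` below are hypothetical values chosen only for illustration.

```python
from statistics import mean

x = [4, 8, 15, 16, 23, 42]  # hypothetical sample, mean = 108 / 6 = 18
a = 3                       # hypothetical constant

old_mean = mean(x)

# 1- A sample whose values all equal a has mean a.
constant_mean = mean([a] * 5)

# 2- The deviations from the mean add up to zero.
deviation_sum = sum(xi - old_mean for xi in x)

# 3- Adding a to every value adds a to the mean.
shifted_mean = mean([xi + a for xi in x])

# 4- Multiplying every value by a multiplies the mean by a.
scaled_mean = mean([a * xi for xi in x])

print(constant_mean, deviation_sum, shifted_mean, scaled_mean)
```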

(30)

3- Measures of Center

(31)

Relations between the mean, mode and median:

1- $\text{mean} - \text{mode} \approx 3\,(\text{mean} - \text{median})$

2-

Median

a

a

x

Median

x

(32)

Relations between the mean, mode and median:

(33)
(34)

Variation (variance) = square of the standard deviation: $\text{variation} = S^2$

(35)

4- Measures of spread

(36)

Properties of the Standard Deviation

1- If the sample consists of n values that are all equal to a, then $S = 0$

2- If a constant value a is added to each value in the sample, the standard deviation does not change

3- If each value in the sample is multiplied by a constant value a, then:

$$S_{new} = |a| \cdot S_{old}$$
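These properties of the standard deviation can also be checked numerically. The sketch uses `statistics.stdev`, which divides by n − 1; the sample and the constant are hypothetical, and a negative constant is chosen on purpose to show that the standard deviation scales by the absolute value of the constant.

```python
import math
from statistics import stdev

x = [4, 8, 15, 16, 23, 42]  # hypothetical sample
a = -3                      # hypothetical constant, negative on purpose

old_s = stdev(x)

# 1- A sample whose values are all equal has no spread.
constant_s = stdev([7] * 5)

# 2- Adding a constant shifts every value but leaves the spread unchanged.
shifted_s = stdev([xi + a for xi in x])

# 3- Multiplying by a constant scales the spread by |a|.
scaled_s = stdev([a * xi for xi in x])

print(constant_s)
print(math.isclose(shifted_s, old_s))
print(math.isclose(scaled_s, abs(a) * old_s))
```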

(37)
(38)

4- Measures of variation

Coefficient of Variation

The coefficient of variation measures the degree of spread of data relative to the mean.

It is useful for comparing the degree of spread of two or more groups of data that have different units.

$$C.V = \frac{s}{\bar{X}} \times 100$$

Example: find the coefficient of variation of the following data 6 . 36 , 3 . 15   X s

Example: which are more homogeneous (have a smaller degree of spread), the weights or the lengths?

data    weights    lengths
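Since the data table for this example did not survive, the comparison can only be illustrated with made-up numbers. In the sketch below both lists are hypothetical, and `coefficient_of_variation` is an illustrative helper that divides the sample standard deviation (n − 1 divisor) by the mean and multiplies by 100.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Standard deviation as a percentage of the mean."""
    return stdev(data) / mean(data) * 100


# Hypothetical measurements, NOT the data from the slide.
weights_kg = [60, 65, 70, 75, 80]
lengths_cm = [160, 165, 170, 175, 180]

cv_weights = coefficient_of_variation(weights_kg)
cv_lengths = coefficient_of_variation(lengths_cm)

print(f"weights: {cv_weights:.2f}%")
print(f"lengths: {cv_lengths:.2f}%")
```

The two hypothetical lists have the same standard deviation, but the lengths have the larger mean, so their coefficient of variation is smaller and they count as the more homogeneous group. The units (kg, cm) cancel out, which is why the two groups can be compared at all.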

(39)

Example: A group of 20 workers has the following data about their salaries X and work hours Y:

1- Find the standard deviation of the number of work hours.
2- Find the sum of the salaries of all the workers.
3- Which has less spread, X or Y?

     1 1 2 2 1 1 n i i X x n S 2 1 1 2 2 1 1 X x n S n i i

    

   3982 , 184 16 , 2000 2 Y Y S X X

4- Measures of spread

Standard deviation

(40)

Example:

Two different samples have been taken from a specific population, and the results are shown in the table below.

First sample:   $n = 60$, $\sum_{i=1}^{60} X_i = 660$, $\sum_{i=1}^{60} X_i^2 = 7560$
Second sample:  $n = 30$, $\sum_{i=1}^{30} Y_i = 300$, $\sum_{i=1}^{30} Y_i^2 = 3200$

1- Which of these two samples is more homogeneous?

2- If we merge the two samples, find the standard deviation of the new sample.

4- Measures of spread
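A possible way to work the two-sample example, under two assumptions: the table is read as n = 60, ΣX = 660, ΣX² = 7560 for the first sample and n = 30, ΣY = 300, ΣY² = 3200 for the second, and the standard deviation uses the n − 1 divisor in its computational form S² = (Σx² − (Σx)²/n) / (n − 1). The helper names are hypothetical.

```python
import math


def sd_from_sums(n, total, total_sq):
    """Sample standard deviation from n, the sum of x and the sum of x squared."""
    return math.sqrt((total_sq - total ** 2 / n) / (n - 1))


def cv_from_sums(n, total, total_sq):
    """Coefficient of variation (in percent) from the same three sums."""
    return sd_from_sums(n, total, total_sq) / (total / n) * 100


first = (60, 660, 7560)    # n, sum of X, sum of X squared
second = (30, 300, 3200)   # n, sum of Y, sum of Y squared

# 1- The sample with the smaller coefficient of variation is more homogeneous.
cv_first = cv_from_sums(*first)
cv_second = cv_from_sums(*second)

# 2- Merging the samples adds the counts, the sums and the sums of squares.
merged = tuple(a + b for a, b in zip(first, second))
merged_sd = sd_from_sums(*merged)

print(f"CV first: {cv_first:.1f}%  CV second: {cv_second:.1f}%")
print(f"merged standard deviation: {merged_sd:.3f}")
```

Under this reading the first sample has the smaller coefficient of variation. The merged standard deviation is computed from the pooled sums rather than by averaging the two standard deviations, because the two samples have different means.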

(41)

Quartiles

4- Measures of spread

(42)

4- Measures of spread

(43)
(44)

Outliers

(45)
(46)

Almost all values are between 5 and 13

50% of values are between 7.5 and 10

Center (Median) is around 8.5

Couple of suspected outliers: 14 and 14.5

(47)

Histograms versus Boxplots

• Both graphs give a good idea of the spread

• Boxplots may be a little clearer in terms of the center and outliers in a distribution

[Figure: boxplot annotated with center, outliers, and spread of likely values]

(48)
(49)
(50)
(51)
(52)

Associations between Variables

Positively associated: increased values of one variable tend to occur with increased values of the other

Negatively associated: increased values of one variable tend to occur with decreased values of the other

Old Faithful: eruption duration is positively associated with the interval between eruptions

Remember that association is not proof of causation

(53)

Correlation

Correlation is a measure of the strength of the linear relationship between variables X and Y

Correlation has a range between -1 and 1

r = 1 means the relationship between X and Y is exactly positive linear

r = -1 means the relationship between X and Y is exactly negative linear

r = 0 means that there is no linear relationship between X and Y

(54)

Measure of Strength

(55)

Pearson Correlation

Correlation of two variables:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i-\bar{x})(y_i-\bar{y})}{s_x\,s_y}$$

We divide by the standard deviations of both X and Y, so correlation has no units

(56)

Find the Pearson correlation

X    4  5  3  4  4
Y    3  2  4  4  2

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i-\bar{x})(y_i-\bar{y})}{s_x\,s_y}$$
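The exercise can be checked with a direct translation of the Pearson formula into Python. The sketch assumes that the standard deviations use the n − 1 divisor, which is what `statistics.stdev` computes; `pearson_r` is an illustrative name.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation: sum of products of deviations over (n - 1) s_x s_y."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return products / ((n - 1) * s_x * s_y)


x = [4, 5, 3, 4, 4]
y = [3, 2, 4, 4, 2]

r = pearson_r(x, y)
print(round(r, 4))
```

Here the means are 4 and 3, the products of deviations add up to −2, and the standard deviations are √0.5 and 1, so r = −2 / (4 · √0.5 · 1) ≈ −0.707: a fairly strong negative linear association.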

(57)

X    4  5  3  4  2
Y    3  2  4  4  1

Table columns: $X$, $Y$, $X^2$, $Y^2$, $XY$, with the sums $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$, $\sum XY$
