Chapter I
OVERVIEW OF STATISTICS
Chapter Objective
Prepare students to obtain data and transform it into information, and to
describe, synthesize, analyze, and interpret that information using tables,
graphs, and summary statistics.
Explain the distinction between types of data.
1. Introduction to the course
This course prepares students to obtain data and transform it into information, and to describe, synthesize, analyze, and interpret that information using tables, graphs, and summary statistics. The course covers the fundamental tools and features of descriptive statistics and probability, and demonstrates real-world applications, particularly those related to the field of business. It covers the following topics: the language of statistics, which helps you know what a problem is asking for, what results are needed, and how to describe and evaluate the results in a statistically correct manner; data gathering, organization, and presentation; summarizing data using measures of central tendency, measures of variability, measures of position, and measures of distribution; and an introduction to probability and probability distributions, including discrete and continuous distributions of random variables.
Datasets are analyzed using Microsoft Excel and the Statistical Package for the Social Sciences (SPSS) version 25.
Descriptive statistics thus gives students methods to describe and evaluate results in a statistically correct manner, in the context of a Christian worldview.
Having completed the course, the student should be able to:
understand the differences among descriptive statistics, probability, and inferential statistics, and demonstrate real-world applications using examples from the field of business;
acquire an understanding of randomness and its influence on the decisions made every day, and use appropriate probability concepts with correct interpretations;
understand published graphical presentations of data and present statistical data to others in graphical form;
summarize and analyze statistical data and interpret the analysis; identify relationships between pairs of variables;
use the Statistical Package for the Social Sciences (SPSS) and Microsoft Excel.
Why do we study descriptive statistics?
The study of statistics will serve to enhance and further develop critical and analytic thinking skills. To do well in statistics one must develop and use formal logical thinking abilities that are both high level and creative.
Students and professional people may be called on to conduct research in their field, since statistical procedures are basic to research. To accomplish this, they must be able to design experiments; collect, organize, analyze, and summarize data; and possibly make reliable predictions or forecasts for future use. They must be able to communicate the results of the study in their own words.
Students and professional people can also use the knowledge gained from studying statistics to become better consumers and citizens. For example, we can make intelligent decisions about what products to purchase based on consumer studies, government spending based on utilization studies, and so on.
1.1 The Branches of Statistics
1.2 Definitions of terms
Data. Data can be defined as a collection of facts or information from which conclusions may be drawn (data are the raw material of statistics). Data can be collected from existing sources or through observation, surveys, or experiments.
Population. A population is a complete set of individuals, objects, or measurements having some common characteristic. The size of the population is denoted by "N". For example, if we want to determine the average I.Q. score of all university students in Rwanda, then the population is all students who are in Rwanda's universities in 2019.
Sample. It is not usually possible, or not practical, to examine every member of a population, so we use a sample, a smaller selection taken from that population, to estimate some value or characteristic of the whole population. Care must be taken when selecting the sample: it must be representative of the whole population under consideration, otherwise it tells us nothing relevant about that population.
Parameter. A parameter is a numerical measure that describes a variable (characteristic) of a population that you are interested in estimating or testing (such as a population mean or proportion). For example, the population mean is a parameter often used to indicate the average value of a quantity. Another example is the median net hourly wage of Rwanda's total population, reported as 450 Rwandan francs (RWF) at:
https://wageindicator.org/Wageindicatorfoundation/publications/2013/wages-in-rwanda
Statistics. Statistics is a field of study concerned with (1) the collection, organization, summarization, and analysis of data; and (2) the drawing of inferences about a group of data when only a part of the data is observed.
Variable. A variable is a characteristic, number, or quantity that can be measured or counted and that changes or varies over time and/or across the individuals or objects under consideration, e.g. gender, household income, business income and expenses, country of birth, capital expenditure, etc.
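The distinction between a population and a sample, and between a parameter and the corresponding number computed from a sample (a statistic), can be illustrated with a short Python sketch. The scores below are simulated purely for illustration; the population size of 5000, the sample size of 100, and the distribution used to generate the scores are assumptions, not real data about any university.

```python
import random
import statistics

# Hypothetical population: simulated I.Q.-like scores for N = 5000 students.
# The values are generated for illustration only; they are not real data.
rng = random.Random(2019)  # fixed seed so the run is repeatable
population = [rng.gauss(100, 15) for _ in range(5000)]

# Parameter: a numerical measure computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed from a sample of n = 100 members,
# drawn at random without replacement.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"N = {len(population)}, population mean (parameter) = {population_mean:.2f}")
print(f"n = {len(sample)}, sample mean (statistic) = {sample_mean:.2f}")
```

The two means will usually be close but not equal, which is why the sample mean is used only as an estimate of the population mean.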
Sources of data:
Each statistical analysis starts by identifying the source of the data that will serve as raw material for the research. Such data are generally available from one or more of the following sources:
Published sources. These are data available in print or in electronic form, including data found on internet websites. Primary data sources are those published by the individual or group that collected the data. Secondary data sources are those compiled from primary sources, e.g. National Bank of Rwanda: http://www.bnr.rw/index.php?id=213; National Institute of Statistics of Rwanda (NISR): http://www.statistics.gov.rw/
Routinely kept records. Most organizations keep records of the day-to-day transactions of their activities. Hospital medical records, for example, contain immense amounts of information on patients, while hospital accounting records contain a wealth of data on the facility's business activities.
Surveys. If the data needed to answer a question are not available from routinely kept records, the logical source may be a survey. For this we sometimes use a questionnaire or similar means to gather responses from a set of participants.
Experiments. Frequently the data needed to answer a question are available only as the result of an experiment. For example, a Human Resources (HR) department may wish to know which of several strategies best maximizes worker compliance. The department could conduct an experiment in which it first measures the strategies that workers are using, then gives the workers a program to help them make the most of these strategies, and finally measures the workers again at the end of the program to see whether it has been effective. The subsequent evaluation of the responses to the different strategies could allow HR to decide which strategy is the most effective.
1.3 Software Tools
There are many software tools for statistical analysis, such as SPSS, STATISTICA, and others. The collected data must be stored in tabular form (a "data matrix"). SPSS is a widely used, menu-driven software product for windowed environments, with user-friendly data editing, data representation, and interactive graphics support. SPSS requires little time for familiarization and lets the user perform statistical analyses easily, using a spreadsheet-like approach to working with the data. For flexibility, SPSS also provides a command language and macro construction facilities.
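To make the idea of a data matrix concrete, here is a minimal Python sketch in which each row is one case (for example, one respondent) and each column is one variable. The variable names and values are made up for illustration, and the helper function `column` is hypothetical; SPSS and Excel present the same layout as a spreadsheet.

```python
# A data matrix: each row is one case, each column is one variable.
# Variable names and values are invented for illustration only.
variables = ["id", "gender", "age", "household_income"]
data_matrix = [
    [1, "Female", 34, 250000],
    [2, "Male", 41, 180000],
    [3, "Female", 29, 320000],
]


def column(name):
    """Return all values of one variable (one column of the matrix)."""
    index = variables.index(name)
    return [row[index] for row in data_matrix]


n_cases = len(data_matrix)
n_variables = len(variables)
ages = column("age")

print(n_cases, "cases x", n_variables, "variables")
print("age column:", ages)
```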
SPSS
Begin by opening SPSS 25 for Windows.
1. Click on the IBM SPSS shortcut button on your desktop. OR
Data Entry. SPSS runs on Windows and Mac operating systems, but the focus of these notes is Windows.
Variable view: used for defining information about your variables. Click the Variable View tab at the bottom of the Data Editor window to display information about the variables in your data.
Data view: used for entering, editing and modifying data.
Menus and Toolbars
Various pull-down menus appear at the top of the Data Editor window. These pull-down menus are at the heart of using SPSS. The Data Editor Menu items (with some of the uses of the menu) are:
FILE: Standard options for opening, saving, printing and exiting.
EDIT: Used to copy and paste data values; used to find data in a file; insert variables and cases.
VIEW: Options for showing/hiding toolbars, displaying values or their labels in Data Editor.
DATA: Identify duplicate cases, merge files, split file, select cases, weight cases, etc.
TRANSFORM: Compute new variables, recode variables.
ANALYZE: This menu provides access to the statistical procedures for analyzing your data set. All the items on the Analyze menu have submenus.
DIRECT MARKETING: It allows you to perform advanced analysis of clients or contacts to improve your marketing campaigns and maximize the ROI of your marketing budget.
GRAPHS: Provides options to create high-quality plots and charts.
UTILITIES: Used to display information on individual variables, add comments to accompany the data file, and access other advanced features.
ADD-ONS: The SPSS extension packages are additional features (advanced statistical procedures) that you can add to SPSS.
WINDOW: Provides options for switching between data, syntax and navigator windows.
HELP: Contains the SPSS help system. For example, select Help|Case Studies for hands-on examples of how to create various types of statistical analyses and how to interpret the results.
Note. Dear student, for more information I recommend opening the file "Tutorial of SPSS Vs 23. Dr. Rosa Padilla", which is posted on the course website. You can also go to the AUCA library or search the internet to learn how to use this important statistical software.
1.4 Types of Data and Scales of Measurement
Variables can be classified in several ways. One method of classification refers to the type and amount of information contained in the data. Data are either categorical or numerical. Another method is to classify data by levels of measurement: nominal, ordinal, interval or ratio.
Classification of variables according to type of data
Categorical. This is generally non-numerical data which is placed into exclusive categories and then counted rather than measured; the values are described by words.
This type of variable can be broken down into two types: Nominal and Ordinal.
1. Nominal data is merely descriptive (e.g. religion, country name, sex). Any assigned numerical value is merely for convenience (e.g. religion: Catholic = 1, Adventist = 2, Other = 3).
2. Ordinal data has rank order, though intervals between data points cannot be considered equal (e.g. income (high/medium/low); severity (poor/average/high)).
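Because nominal codes are only labels, the sensible summaries are counts per category and the most frequent category, not an average of the codes. The Python sketch below uses the coding from the religion example above; the eight coded answers are hypothetical and serve only to illustrate the counting.

```python
from collections import Counter

# Numeric codes from the example above: labels only, with no order or magnitude.
RELIGION_LABELS = {1: "Catholic", 2: "Adventist", 3: "Other"}

# Hypothetical coded answers from eight respondents (not real data).
coded = [2, 1, 2, 3, 1, 2, 2, 1]

# Categorical data are counted rather than measured.
counts = Counter(RELIGION_LABELS[code] for code in coded)
most_common_label, most_common_count = counts.most_common(1)[0]

for label, count in counts.items():
    print(f"{label}: {count}")
print("most frequent category:", most_common_label)
```

Averaging the codes would give a number with no meaning, since swapping the codes assigned to two categories would change the result without changing the data.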
Numerical. Numerical or quantitative data arise from counting, measuring something, or some kind of mathematical operation.
This type of variable can be broken down into two types: Discrete and Continuous.
1. Discrete. Often such data are integers. E.g. number of takeoffs (departures) at Kigali International Airport; the number of people shopping in a supermarket.
Classification of variables according to measurement level
1. Nominal-level data is merely descriptive; the values are simple labels or names which cannot be ordered (e.g. religion, country name, sex). Any assigned numerical value is merely for convenience (e.g. religion: Catholic = 1, Adventist = 2, Other = 3).
2. Ordinal-level data has ranks in a meaningful order, though intervals between data points cannot be considered equal (e.g. household income (high/medium/low); severity (poor/average/high)).
3. Interval-level measurement not only assigns rank or order; its major strength is that it has equal units of measurement. However, interval scales do not possess a true zero (the zero is not natural). For example, temperature scales are interval data: 25°C is warmer than 20°C, and a 5°C difference has some physical meaning. Note that 0°C is arbitrary, so it does not make sense to say that 20°C is twice as hot as 10°C.
4. Ratio-level measurement is the strongest level of measurement: the measures are not only expressed in equal units, but a true zero also exists. The zero indicates the absence of the quality or attribute being assessed. Examples are measurements of household income, length, weight, etc. These permit statements about the comparative ratio of some quality or property among different individuals. For example, profit is a ratio variable (e.g. 4 million is twice 2 million).
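The difference between the interval and ratio levels can be checked with a little arithmetic. The Python sketch below reuses the temperature and profit figures from the text; the conversion to the Kelvin scale is an addition made here for illustration, chosen because Kelvin has a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15


# Interval scale: differences are meaningful, ratios are not.
difference = 25 - 20   # a 5-degree difference has physical meaning
naive_ratio = 20 / 10  # 2.0, but only because 0 degrees Celsius is arbitrary
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio scale: zero means "none", so ratios are meaningful.
profit_ratio = 4_000_000 / 2_000_000  # twice the profit

print(f"temperature difference: {difference} degrees")
print(f"ratio of Celsius readings: {naive_ratio:.2f}")
print(f"ratio on the Kelvin scale: {kelvin_ratio:.3f}")
print(f"profit ratio: {profit_ratio:.1f}")
```

The ratio of the two temperatures changes when the zero point changes, while the profit ratio stays at 2 in any currency unit; this is what the presence of a true zero provides.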
Likert scales
A Likert scale is a special case that is frequently used in survey research. You have undoubtedly seen such scales. Typically, a statement is made and the respondent is asked to indicate his or her agreement or disagreement on a five-point or seven-point scale using verbal anchors.
For example:
"High school students should be required to go to college to study a foreign language." (Check one)
Strongly Disagree
Somewhat Disagree
Neither Agree Nor Disagree
Somewhat Agree
Strongly Agree
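Responses on a scale like this are usually coded as numbers before analysis. The Python sketch below assumes the five verbal anchors of the example are coded 1 to 5 in rank order; the seven responses are hypothetical. Summarizing with frequencies and the median is one cautious choice that relies only on the order of the codes, not a rule stated in these notes.

```python
from collections import Counter
import statistics

# Verbal anchors of the five-point scale, coded 1 to 5 in rank order
# (the numeric coding is an assumption made for this illustration).
ANCHORS = [
    "Strongly Disagree",
    "Somewhat Disagree",
    "Neither Agree Nor Disagree",
    "Somewhat Agree",
    "Strongly Agree",
]
CODES = {label: code for code, label in enumerate(ANCHORS, start=1)}

# Hypothetical responses from seven respondents (not real survey data).
responses = [
    "Somewhat Agree",
    "Strongly Agree",
    "Neither Agree Nor Disagree",
    "Somewhat Agree",
    "Somewhat Disagree",
    "Strongly Agree",
    "Somewhat Agree",
]

codes = [CODES[response] for response in responses]
frequencies = Counter(responses)
median_code = statistics.median(codes)

for label in ANCHORS:
    print(f"{label}: {frequencies[label]}")
print("median code:", median_code)
```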
Types of variable by experimental design
2. Dependent variable. The observed variable that is expected to change as a result of changes in the independent variable in an experiment.
1.5 Methods of data collection
There are many methods used to collect or obtain data for statistical analysis. Four of the most popular methods are:
1. Census. A census is a survey of a whole population.
2. Sample survey. A sample survey is a study that obtains data from a subset of a population, in order to estimate population attributes.
Surveys may be administered in a variety of ways, e.g.
• Personal Interview,
• Telephone Interview, and
• Administered Questionnaire.
3. Experiment. An experiment is a controlled study of a group. Experiments are very common in the medical fields. The researcher controls how members are placed into study groups and which treatment each group receives. Bias can be a major issue with experiments.
4. Observational study. An observational study is similar to an experiment, except that the researcher does not use control groups or assign treatments.
Surveys
A questionnaire is a standardised set of questions administered to the respondents in a survey.
Respondents are required to interpret a pre-established set of questions and to supply the information these questions seek.
Key design principles:
Keep the questionnaire as short as possible.
Ask short, simple, and clearly worded questions.
Start with demographic questions to help respondents get started comfortably.
Pretest a questionnaire on a small number of people.
Think about the way you intend to use the collected data when preparing the questionnaire.
Formatting the answer
Survey items can take a variety of formats; the most common are:
1. Open-ended questions that call for numerical answers. Example:
Now, thinking about your physical health, which includes physical illness and injury: for how many days during the past 30 days was your physical health not good?
2. Closed questions with ordered response scales. Example:
Would you say that in general your health is:
(4) Very good (5) Excellent
Or closed questions with categorical response options. Example:
Are you:
(1) Married
(2) Divorced
(3) Widowed
(4) Separated
(5) Never married
(6) A member of an unmarried couple
With closed questions, include all reasonable possibilities as explicit response options. Compare:
Are you:
(1) Married
(2) Divorced
(3) Widowed
(4) Separated
(5) Never married
with:
Are you:
(1) Married
(2) Single
Make the question as specific as possible (about who it covers, what time period, which behaviours…). Compare:
Over the last month, that is ….. how often do you read a newspaper in a typical week?
with:
In a typical week, how often do you read a newspaper?
Use words that virtually all respondents will understand. Compare:
Have you ever had a heart attack?
with:
Have you ever had a myocardial infarction?
Clearly specify the attitude object of interest. Compare:
Do you agree or disagree with the following statement? Government is spending too little on education.
(1) Strongly Agree (2) Agree (3) Neither agree nor disagree (4) Disagree
with:
Do you think the Government is spending too little, about the right amount, or too much on education?
Example of a questionnaire
Adventist University of Central Africa QUESTIONNAIRE
(02/14/2019)
Dear Colleague:
This questionnaire is strictly confidential and will serve only as statistical information for use in the Statistics Training Session. The study is purely for academic purposes, and the information given will be treated with the utmost confidentiality. I therefore humbly request that you spare some time to answer the following questions.
Instructions: Please answer the following questions truthfully by ticking where appropriate and filling in the blanks.
I. GENERAL INFORMATION
Gender: (1) Female (2) Male
Age range (years): (1) Less than 30 (2) 30-40 (3) 41-50 (4) 51 and above
Marital status: (1) Single (2) Married (3) Divorced (4) Widowed (5) Other
Highest level of education: (1) Primary (2) Secondary (3) University (4) Other
How long have you been working in your Institution?
(1) Less than 1 year (2) 1 – 3 years (3) 4 – 5 years (4) More than 5 years
II. THE TIME OF YOUR LIFE
Please tick (√) the number that corresponds to your level of agreement. Choose one among these three statements:
1 = Never; 2 = Sometimes; 3 = Always
How do you manage time?
Items Never Sometimes Always
1. I prioritize the things that need to be done 1 2 3
2. I usually finish what I set out to do in any day 1 2 3
3. In the past I have always got tasks done on time 1 2 3
4. I feel I make the best use of my time 1 2 3
5. I can tackle difficult or unpleasant tasks without using delaying tactics 1 2 3
Assignment 1 – Descriptive Statistics
1. List three reasons to study Descriptive Statistics.
2. List three applications of Statistics in your field or specialty.
3. Match each of the following terms to its correct definition:
TERMS
( ) Parameter
( ) Statistical Inference
( h ) Census
( ) Statistics
( ) Population
( ) Descriptive Statistics
( ) Sample
( ) Statistic
DEFINITIONS
a. The complete collection of items under study
b. A number that describes a sample characteristic
c. Procedures for collecting, classifying, summarizing, and presenting data
d. A number that describes a population characteristic
e. The science of gathering and summarizing data and using results to make decisions
f. A subset of the population
g. The process of arriving at a conclusion about a population parameter on the basis of a sample statistic
h. A survey of all the elements in a population
4. Determine whether the following data are categorical (nominal or ordinal) or numerical (continuous or discrete).
( ) The number of people living in a household
( ) The branches of Statistics
( ) The average miles per gallon on all new Fords
( ) Customer Satisfaction
5. The portion of the population that is selected for analysis is called:
a. a sample b. a frame c. a parameter d. a statistic
6. The height of an individual is an example of a:
a. discrete variable b. continuous variable c. categorical variable d. constant
7. The brand of an automobile (Toyota, Kia, Nissan, BMW, and so on) is an example of a:
a. discrete variable b. continuous variable c. categorical variable d. constant
8. The number of credit cards in a person’s wallet is an example of a:
a. discrete variable b. continuous variable c. categorical variable d. constant
9. Statistical inference occurs when you:
a. compute descriptive statistics from a sample
b. take a complete census of a population
c. present a graph of data
d. take the result of a sample and reach a conclusion about a population
10. The human resources director of a large corporation wants to develop a dental benefits package and decides to select 100 employees from a list of all 5,000 workers in order to study their preferences for the various components of a potential package. All the workers in the corporation constitute the ___________
a. sample b. population c. statistic d. parameter
11. Those methods that involve collecting, presenting, and computing characteristics of a set of data in order to properly describe the various features of the data are called:
a. statistical inference b. the scientific method c. sampling d. descriptive statistics
a. The favorite flavor of ice cream of students at your local elementary school
b. The time it takes for a certain student to walk to your local elementary school
c. The distance between the home of a certain student and the local elementary school
d. The number of teachers employed at your local elementary school
Answer: True or False
13. The possible responses to the question, “How long have you been living at your current residence?” are values from a continuous variable.
True ( ) False ( )
14. The possible responses to the question, “How many times in the past three months have you visited a museum?” are values from a discrete variable.
True ( ) False ( )
Fill in the blank:
15. An insurance company evaluates many variables about a person before deciding on an appropriate rate for automobile insurance. The number of accidents a person has had in the past three years is an example of a ________________ variable.
16. An insurance company evaluates many variables about a person before deciding on an appropriate rate for automobile insurance. The distance a person drives in a day is an example of a _________ variable.
17. An insurance company evaluates many variables about a person before deciding on an appropriate rate for automobile insurance. A person’s marital status is an example of a ___________ variable.
18. A numerical measure that is computed from only a sample of the population is called a ____________
19. The portion of the population that is selected for analysis is called the __________
20. A college admission application includes many variables. The number of advanced placement courses the student has taken is an example of a ______________ variable.
21. A college admission application includes many variables. The gender of the student is an example of a _______ variable.