Chapter VIII. Forecasting and estimation from correlated data


...

Purpose of Chapter

Introduction

Forecasting rests on the assumption that the past helps to predict the future. The aim is to predict as accurately as possible, given all the information available, including historical data and knowledge of any future events that might affect the forecasts. When forecasting, think carefully about whether the past is strongly related to what you expect to see in the future…

Human beings make decisions every day that have a positive or negative impact on their future. Likewise, in the business world, managers make decisions that affect the future of their organizations, for better or worse. The fundamental purpose of forecasting is therefore to make good and reliable predictions about future events.

Forecasting can therefore be defined as the art and science of predicting future events. It may involve taking historical data and projecting it into the future with some sort of mathematical model. Sometimes it is subjective, or based on intuitive prediction of future events.

Why is forecasting important?

Demand for products and services is usually uncertain. Forecasting can be used for…

• Strategic planning (long-range planning)
• Finance and accounting (budgets and cost controls)
• Marketing (future sales, new products)
• Production and operations

What is the difference between estimation, extrapolation, prediction and forecasting?

Estimation is the calculated approximation of a result. This result might be a forecast, but not necessarily. For example, I can estimate that the number of cars on the Golden Gate Bridge at 5 PM yesterday was 900 by assuming the three lanes going toward Marin were at capacity, each car takes 30 feet of space, and the bridge is 9000 feet long (9000 / 30 x 3 = 900).
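The arithmetic behind this estimate can be written out as a small sketch. The figures are the ones given in the text; the function name `estimate_cars` is only illustrative.

```python
# Back-of-the-envelope estimate of cars on the bridge, using the figures from the text.
BRIDGE_LENGTH_FT = 9000   # length of the bridge
CAR_SPACE_FT = 30         # road space taken by one car
LANES_AT_CAPACITY = 3     # lanes going toward Marin, assumed full


def estimate_cars(bridge_length_ft, car_space_ft, lanes):
    """Cars per lane (length / space per car) times the number of full lanes."""
    cars_per_lane = bridge_length_ft / car_space_ft
    return cars_per_lane * lanes


cars = estimate_cars(BRIDGE_LENGTH_FT, CAR_SPACE_FT, LANES_AT_CAPACITY)
print(cars)  # 900.0
```

Changing any of the three assumptions changes the estimate, which is why an estimate should always be reported together with its assumptions.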

Extrapolation is estimating the value of a variable outside a known range of values by assuming that the estimated value follows some pattern from the known ones. The simplest and most popular form of extrapolation is estimating a linear trend based on the known data. Alternatives to linear extrapolation include polynomial and conical extrapolation. Like estimation, extrapolation can be used for forecasting but it isn't limited to forecasting.
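A minimal sketch of linear extrapolation follows. The five data points are hypothetical (they are not taken from this chapter), and the helper `fit_line` is an illustrative name for an ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical known range: periods 1 to 5.
periods = [1, 2, 3, 4, 5]
sales = [110.0, 120.0, 130.0, 140.0, 150.0]

slope, intercept = fit_line(periods, sales)

# Extrapolate to period 7, which lies outside the known range.
forecast_period_7 = intercept + slope * 7
print(slope, intercept, forecast_period_7)  # 10.0 100.0 170.0
```

The further the target period lies from the known range, the more the result depends on the assumption that the linear pattern continues.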

Prediction is simply saying something about the future. Predictions are usually focused on outcomes and not the pathway to those outcomes. For example, I could predict that by 2050 all vehicles will be powered with electric motors without explaining how we get from low adoption in 2015 to full adoption by 2050. As you can see from the previous example, predictions are not necessarily based on data.

Forecasting is the process of making a forecast or prediction. The terms forecast and prediction are often used interchangeably, but sometimes forecasts are distinguished from predictions in that forecasts often provide explanations of the pathways to an outcome. For example, an electric vehicle adoption forecast might include the pathway to full electric vehicle adoption following an S-shaped adoption pattern where few cars are electric before 2025, an inflection point occurs at 2030 with rapid adoption, and the majority of cars are electric after 2040.
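One common way to draw such an S-shaped pathway is a logistic curve. The sketch below is only an illustration: the inflection year 2030 comes from the example above, while the symmetric logistic form and the steepness value 0.4 are assumptions chosen for the sketch.

```python
import math


def adoption_share(year, midpoint=2030.0, steepness=0.4):
    """Logistic (S-shaped) share of electric vehicles, between 0 and 1.

    midpoint: year of the inflection point (2030 in the example).
    steepness: assumed growth rate; larger values give a sharper S.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (year - midpoint)))


for year in (2015, 2025, 2030, 2040, 2050):
    print(year, round(adoption_share(year), 3))
```

With these assumed parameters the share stays low before 2025, passes one half at the inflection point and is close to full adoption after 2040.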

Estimation, extrapolation, prediction, and forecasting are not mutually exclusive and collectively exhaustive categories. A forecast may combine several of them in order to produce plausible results. Forecasts and predictions can also occur without any kind of calculated estimation.

“FORECASTING is a quantitative or qualitative estimate of the factor or factors (variables) that make up a future event, based on current or past information.”

Why?

• The company operates in a highly uncertain context.
• Politics, technology and the environment affect business-relevant variables: production costs, inventory turnover.
• The company must make decisions about controllable factors while considering uncontrollable factors.

Objective

Reducing uncertainty about the future by anticipating the events whose probability of occurrence is relatively high compared to other possible events.

Controllable factors

Those for which the company decides the structure, level, policy and operating mode:
• Production levels
• Inventory levels
• Capacity

Uncontrollable factors

Those that the company cannot decide or modify, because they depend on factors outside the company:
• Product demand
• Competition
• Economy
• Consumer behavior

CLASSIFICATION OF FORECASTING TECHNIQUES

BY TYPE OF DATA:

Qualitative (subjective) techniques: they use qualitative information (the experience of experts).

Quantitative techniques: they are based on numerical data and on the mathematical and statistical tools used to produce the forecast.

QUALITATIVE TECHNIQUES

The same technique used by two different experts may produce different results.

• Market research
• Historical analogies
• Delphi method
• General consensus
• Cross-impact analysis
• Scenario analysis

MARKET RESEARCH

Information about actual market behavior is gathered through consumer surveys or from the experience of sellers, in order to draw conclusions about future behavior.

HISTORICAL ANALOGIES

It is based on a comparative analysis of cases similar to the one being studied. It tries to recognize patterns of similarity in order to draw conclusions and obtain a forecast: similar products, other product markets, etc.

DELPHI METHOD

* Experts answer a questionnaire.

* We obtain the mean and standard deviation of the answers to each question.

* Experts whose answers lie two or more standard deviations from the mean of a question are asked to justify them.

* This review is passed to all participants and the questionnaire is applied again.

* The process is repeated until a consensus is reached on the various questions and subgroups of opinion are identified.

* The information obtained is then used for decision making.
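The statistical part of one Delphi round (mean, standard deviation and the experts asked to justify their answers) can be sketched as follows. The expert labels and answers are hypothetical, and the function name `delphi_round` is illustrative; the two-deviation threshold is the one described in the steps above.

```python
from statistics import mean, stdev


def delphi_round(responses, threshold=2.0):
    """Summarize one question of a Delphi round.

    responses: dict mapping expert -> numeric answer.
    Returns the mean, the sample standard deviation and the experts whose
    answer lies `threshold` or more standard deviations from the mean.
    """
    values = list(responses.values())
    avg = mean(values)
    sd = stdev(values)
    to_justify = [
        expert
        for expert, value in responses.items()
        if sd > 0 and abs(value - avg) >= threshold * sd
    ]
    return avg, sd, to_justify


# Hypothetical answers of eight experts to one question (e.g. expected demand).
answers = {"A": 100, "B": 105, "C": 98, "D": 102, "E": 101, "F": 99, "G": 103, "H": 160}

avg, sd, to_justify = delphi_round(answers)
print(avg, round(sd, 2), to_justify)
```

In this made-up round only expert H would be asked for a justification before the questionnaire is applied again.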

GENERAL CONSENSUS

A group of experts meets and, starting from brainstorming, holds discussions to reach an agreement that reflects the view of the majority.

CROSS-IMPACT ANALYSIS

A matrix is built to study the effects of various factors on the likelihood of an event, and the impact that this event may have on another series of events.

* Determine the events included in the study.

* Estimate the initial probability of each event and the conditional probability of each pair of events.

* Select events at random and calculate the impact on the other events of the occurrence or non-occurrence of the chosen event.
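The steps above can be sketched as a small simulation. This is a deliberately simplified illustration, not a full cross-impact method: the two events and all probabilities are hypothetical, and the update rule (replace the probability of a pending event by its conditional probability when the chosen event occurs, and leave it unchanged otherwise) is an assumption made for the sketch.

```python
import random


def simulate_cross_impact(initial, conditional, runs=10000, seed=0):
    """Estimate final event probabilities by repeated random simulation.

    initial: dict event -> initial probability.
    conditional: dict (a, b) -> probability of b given that a occurred.
    """
    rng = random.Random(seed)
    counts = {event: 0 for event in initial}
    for _ in range(runs):
        probs = dict(initial)
        order = list(initial)
        rng.shuffle(order)  # events are selected in random order
        for position, event in enumerate(order):
            if rng.random() < probs[event]:  # the chosen event occurs
                counts[event] += 1
                # Update the events that have not been decided yet.
                for other in order[position + 1:]:
                    probs[other] = conditional.get((event, other), probs[other])
    return {event: counts[event] / runs for event in initial}


# Hypothetical events: A = "new competitor enters", B = "price war".
initial = {"A": 0.5, "B": 0.2}
conditional = {("A", "B"): 0.8, ("B", "A"): 0.5}

estimates = simulate_cross_impact(initial, conditional)
print(estimates)
```

In this example the estimated probability of B rises above its initial value because B becomes much more likely whenever A occurs first.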

SCENARIO ANALYSIS

Describes different possible future scenarios (most likely, likely, unlikely), considering influencing factors (changes in population, inflation, changes in demand), in order to recognize the long-term implications of possible changes.

QUANTITATIVE TECHNIQUES

INFORMATION: historical data on the variables involved are required.

ASSUMPTION: the historical pattern of the variables studied remains valid in the future.

EXTRAPOLATIVE: curve fitting and smoothing methods. The patterns observed in the past are projected into the future.

TIME SERIES ANALYSIS: decomposition methods and ARIMA (autoregressive integrated moving average).

CAUSAL MODELS: econometric models (regression).

STAGES OF FORECASTING

1. Define the purpose
2. Collect data: primary or secondary sources
3. Prepare data: sorting and classifying
4. Select the proper technique: qualitative or quantitative
5. Perform the forecast: estimating errors
6. Follow up: incorporating current information

SELECTING THE PROPER TECHNIQUE:

The best technique is one that:

• Facilitates decision making at the right time
• Is understood by the decision maker
• Passes a cost-benefit analysis
• Complies with system constraints: time available, data availability, computing
• Meets the criteria: accuracy, stability, objectivity

TYPES OF DATA

OBSERVED at a precise moment of time: a day, an hour, a week, etc. Example: a feature observed in a sample of

Objective: To extrapolate the sample characteristics to the entire population.

TIME SERIES: A time series is a set of numerical data obtained at regular intervals over time. The time unit can be an hour, day, month, quarter, year or any other period of interest; that is, a time series is a chronological sequence of observations of a variable at equal intervals of time.

Example: Quarterly sales over the last five years, unemployment in recent years, the price of a product over time, etc.

Objective: To analyze patterns of the past that can be extrapolated to the future.

What should we consider when looking at past demand data?

• TREND: Very long-term component representing the growth or decline of the data over an extended period. Forces that affect and explain the trend:

* Population growth
* Inflation
* Sales of a product in the growth stage of its life cycle.

• SEASON: Short-term component. Periodic fluctuations that repeat within periods no longer than a year, at about the same time and with almost the same intensity.

• RANDOM VARIATION: Very short-term component. Irregular or sporadic movements.

CASE STUDY

Since the movement of international visitors in Peru is increasing year by year, we would like to analyze this movement through forecasting analysis.

Objectives

• Describe the trend of the series of airport movement of passengers on international flights in the period January 2001 - October 2011

• Predict the airport movement of passengers on international flights from the series for the period January 2001 - October 2011

Note: Open the original data (International flies) that you will find on the website.

Steps in SPSS:

As a first step in SPSS, it is necessary to define the seasonal period: go to Data > Define Dates.

The new date variables then appear in the Data View.
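For readers working outside SPSS, the date variables for this monthly series can be sketched as follows. The series runs from January 2001 to October 2011, which gives 130 monthly observations. The column names `YEAR_`, `MONTH_` and `DATE_` imitate the names SPSS usually creates and should be treated as an assumption, as should the label format.

```python
MONTH_NAMES = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def define_dates(first_year, first_month, n_obs):
    """Build one row of year, month and date label per monthly observation."""
    rows = []
    year, month = first_year, first_month
    for _ in range(n_obs):
        rows.append({
            "YEAR_": year,
            "MONTH_": month,
            "DATE_": f"{MONTH_NAMES[month - 1]} {year}",
        })
        month += 1
        if month > 12:  # roll over to January of the next year
            month = 1
            year += 1
    return rows


dates = define_dates(2001, 1, 130)
print(dates[0]["DATE_"], dates[-1]["DATE_"])  # JAN 2001 OCT 2011
```

Observation 131, the first month after the end of the series, therefore corresponds to November 2011.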

Then go to the forecasting analysis:

1. Make a sequence chart:

Analyze > Forecasting > Sequence Charts (click)

Move (International_visitors) to Variables > move (Date. Format) to Time Axis Labels

Interpret the following output of the SPSS software.

SEQUENCE GRAPHIC

Figure 1. International passenger airport Movement 2001 – 2011

Interpretation:

The movement of passengers at the “Jorge Chavez” International Airport shows the following:

• A growing trend.

• The variance increases over time: there is no homogeneity of variance.

• The series features seasonality: seasonal variations are observed in the months of June, July and August, where the seasonal component takes its highest values (see Table 1: Seasonal Factors).

• Cyclicity: it is stationary. The pattern of seasonal variation remains constant year after year.

• The presence of the random component can be observed.

To find out in which months the seasonal component takes its highest values, we ask SPSS for a seasonal decomposition.

Analyze > Forecasting > Seasonal Decomposition > move (International_visitors) to Variable(s) > Model Type: click Additive

Table 1. Seasonal Factors

Series Name: International_visitors

Period        Seasonal Factor
1             -6503.18580
2             -7259.34414
3               -61.56914
4             -5674.40247
5             -4953.96914
6 (June)       4201.86049
7 (July)      18776.98086
8 (August)     7128.63086
9             -2092.26080
10             -341.88580
11            -2972.46080
12             -248.39414
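As a quick check on Table 1, the sketch below ranks the periods by seasonal factor and verifies that the additive factors cancel out over a full year. The values are copied from the table; the variable names are illustrative.

```python
# Additive seasonal factors from Table 1 (period 1 = January, ..., 12 = December).
seasonal_factors = {
    1: -6503.18580, 2: -7259.34414, 3: -61.56914, 4: -5674.40247,
    5: -4953.96914, 6: 4201.86049, 7: 18776.98086, 8: 7128.63086,
    9: -2092.26080, 10: -341.88580, 11: -2972.46080, 12: -248.39414,
}

# Periods sorted from the highest to the lowest factor.
ranked = sorted(seasonal_factors, key=seasonal_factors.get, reverse=True)
peak_periods = ranked[:3]

# In an additive model the factors of one full cycle add up to about zero.
total = sum(seasonal_factors.values())

print(peak_periods)       # [7, 8, 6]
print(abs(total) < 0.01)  # True
```

July, August and June are the only periods with positive factors, which matches the seasonal peak read from the sequence chart.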

Before making predictions, it is important to know the state of the variance of the series. The box plot diagram (Figure 2) shows the variability of the airport visitor movement, which displays a non-constant variance over time.

Steps in SPSS for the box plot diagram:

Graphs > Chart Builder > select Boxplot > drag Simple Boxplot > move the variable International_visitors to the “Y” axis and the variable Year to the “X” axis > OK

According to the figure above, the variance is not constant, because the boxes do not all have the same amplitude. Therefore, we examine the level of dispersion through the Levene test (power estimation) to find the most suitable type of transformation.

Stabilization of variance

Analyze > Descriptive Statistics > Explore > move the variable (International_visitors) to Dependent List and Year to Factor List > go to Plots > under Spread vs Level with Levene Test select Power estimation > Continue > OK

Go to the last graph shown in the output (Spread vs. Level Plot of International_visitors by Year).

In the graph above you can see the power estimation (Slope = .686, Power for transformation = .314), where the power is obtained as 1 - 0.686 = 0.314. When this result is close to zero, we have to transform the original variable with the natural log.
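The relation power = 1 - slope can be illustrated with a rough sketch. As far as I know, the SPSS plot regresses the log of the spread on the log of the level using the median and interquartile range of each year; since only yearly means and standard deviations are published in this chapter (Table 2), the sketch uses those as a proxy, so it is not expected to reproduce the slope of .686 exactly.

```python
import math

# Yearly (mean, standard deviation) of International_visitors, from Table 2.
yearly_stats = [
    (37900.75, 5720.90), (44024.92, 7207.26), (47833.33, 6229.32),
    (47241.33, 7313.90), (49349.00, 7054.64), (51697.92, 7536.53),
    (61772.00, 10153.49), (73480.08, 9763.46), (75656.42, 7988.44),
    (87795.08, 11843.21), (100298.64, 12077.83),
]


def spread_level_slope(stats):
    """Least-squares slope of ln(spread) on ln(level)."""
    xs = [math.log(level) for level, _ in stats]
    ys = [math.log(spread) for _, spread in stats]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


slope = spread_level_slope(yearly_stats)
power = 1 - slope  # estimated power for the transformation
print(round(slope, 3), round(power, 3))
```

A power near 1 suggests leaving the data untransformed, a power near 0.5 suggests a square root, and a power near 0 suggests the natural log.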

Steps in SPSS to transform the original variable to its natural log:

Transform > Compute Variable > in Target Variable write (Visitor_Ln) > in Function group select Arithmetic > in Functions and Special Variables select (Ln) > move the variable (International_visitors) into the expression > OK
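The same transformation can be sketched in a few lines. The three input values are yearly means taken from Table 2 and serve only as an illustration; in the case study the transformation is applied to every monthly observation.

```python
import math

# Illustrative values of the original variable (yearly means from Table 2).
visitors = [37900.75, 61772.00, 100298.64]

# Natural log transform, the counterpart of Ln(...) in Compute Variable.
visitor_ln = [math.log(value) for value in visitors]

# The exponential undoes the transform and returns to the original scale.
restored = [math.exp(value) for value in visitor_ln]

print([round(value, 4) for value in visitor_ln])
```

The back-transformation with the exponential is what is needed later to express a forecast made on the log scale as a number of passengers.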

The Data View then looks like the following figure:

SEQUENCE CHART WITH TRANSFORMED DATA

Applications of Natural log transform to stabilize the variance


DESCRIPTIVE STATISTICS

Follow the steps below, and at the end click OK:

Descriptive output

Table 2: Descriptive statistics of International_visitors by YEAR (not periodic)

Year    Mean         Standard Deviation
2001    37900.75     5720.90
2002    44024.92     7207.26
2003    47833.33     6229.32
2004    47241.33     7313.90
2005    49349.00     7054.64
2006    51697.92     7536.53
2007    61772.00     10153.49
2008    73480.08     9763.46
2009    75656.42     7988.44
2010    87795.08     11843.21
2011    100298.64    12077.83

We can see that the standard deviations of each year do not grow proportionally as the average value grows, which indicates that the components of this series can be combined additively, as we also saw in the box plot earlier.

Additive Model

Y_t = T_t + S_t + I_t

Original series = STC_1 + SAF_1 + ERR_1

Where:
T_t: Trend-cycle (STC_1)
S_t: Seasonal component (SAF_1)
I_t: Random part (ERR_1)

We can see that the airport movement has a pattern that repeats about every 12 months (around July is the peak month)


Weighted average: Endpoints weighted by 0.5
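With a 12-month period, this setting corresponds to a centered moving average over 13 consecutive observations in which the first and last are counted at half weight. The sketch below shows that idea and a basic additive decomposition built on it. It is a simplified illustration under stated assumptions: the synthetic series is made up, the seasonal factors are plain averages of the detrended values, and the exact SPSS procedure may differ in detail.

```python
def centered_moving_average(series, period=12):
    """Centered moving average for an even period, endpoints weighted by 0.5."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]  # period + 1 observations
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def additive_seasonal_factors(series, period=12):
    """Average detrended value per season, adjusted so the factors sum to zero."""
    trend = centered_moving_average(series, period)
    sums = [0.0] * period
    counts = [0] * period
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            sums[t % period] += value - level
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    adjustment = sum(raw) / period
    return [factor - adjustment for factor in raw]


# Synthetic example: linear trend plus a known seasonal pattern that sums to zero.
pattern = [-3, -1, 2, 4, 0, -2, 5, 1, -4, -2, 3, -3]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]

factors = additive_seasonal_factors(series)
print([round(f, 2) for f in factors])
```

On this synthetic series the recovered factors equal the known pattern, because the half-weighted endpoints make every month count exactly once in each window.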

When we run the additive model, the following message appears.

Results:

ERR_1: Errors (random part)
SAF_1: Seasonal factors
SAS_1: Seasonally adjusted series = Original series - SAF_1
STC_1: Trend-cycle

Created estimated trend of international passenger airport movement 2001 -2011


Estimation with seasonal decomposition method

To measure or eliminate the seasonal effect, we have to deseasonalize the Ln series (Visitor_Ln) of the international visitors.

Estimate for November 2011

We determine the seasonal index for November 2011, which is observation 131 of the series. Since the seasonal pattern has 12 months, we divide 131 by 12, which gives a quotient of 10 and a remainder of 11. The remainder locates the corresponding seasonal factor of the visitors series (SAF_2): period 11 (NOV) ≈ -0.04787.

Original series = STC_2 + SAF_2

STC_2 = the last value of the calculated trend, that is, month 130 (Oct 2011) = 11.54411

The decomposition model estimate for November:

Ln Y_131 = 11.54411 - 0.04787 = 11.49624

Y_131 = exp(11.54411 - 0.04787) = exp(11.49624)
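The calculation for November 2011 can be reproduced with a short sketch. The trend value and the seasonal factor are the ones reported above; the variable names are illustrative.

```python
import math

SEASON_LENGTH = 12

# November 2011 is observation 131 of the monthly series.
month_index = 131
cycles, season = divmod(month_index, SEASON_LENGTH)  # quotient 10, remainder 11

trend_ln = 11.54411     # last value of the trend-cycle (STC_2), log scale
seasonal_ln = -0.04787  # seasonal factor (SAF_2) for period 11 (NOV), log scale

# Additive model on the log scale, then back-transform with the exponential.
forecast_ln = trend_ln + seasonal_ln
forecast = math.exp(forecast_ln)

print(cycles, season)         # 10 11
print(round(forecast_ln, 5))  # 11.49624
print(round(forecast))        # roughly 98,300 passengers
```

Because the model was fitted to the log series, the exponential is required to express the estimate as a number of passengers.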