
This project is about finding the best forecasting model for six known stocks. The first part is to select the six stocks to build models on. The second is to build an index that can be used for predicting the chosen stocks. The final part is to write a five-page report that explains the process you followed. I will upload the class textbook and the assignment description so that it all makes more sense. You need to be an expert in forecasting concepts and in the use of Minitab software to solve this problem.

EIND 468 | 558 – Mini-Project #3 – Due Tuesday, 13 October

Use the same stock you used last week + 5 additional stocks of your choosing.
1. Get the daily close from 1 Jan – 30 Sep 2020
2. Build your best model for each stock using the tools we have covered in the course.
3. Use these six stocks to create an “index” of your choosing.
▪ Create daily closing data for your index using the stock closing data
4. Use your seven models to predict the closing price of each series for the first seven trading
days of October (do not bring in the October data).
5. Write a ~5-page report that outlines:
▪ The stocks you chose and how you combined them to create an index.
▪ The methods used to build your best models, highlighting any differences
between the approaches based on differences in your data.
▪ Outline each of your best models and how they were used to develop the seven-day
forecast.
▪ Outline the process you recommend be implemented using your model to make a
profit on trading these stocks going forward. Assume you have $100,000 to
invest in any mix of these stocks.
▪ Gather the actual closing information and show how your process would have
made or lost money if implemented in the first 7 trading days of October.
6. Submit a Word document of your report + your analysis file(s) to the assignment folder
by the due date.
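The index construction in step 3 and the accuracy check implied by steps 4–5 can be sketched in code. The course uses Minitab, so this Python fragment is only an illustration of the arithmetic: the prices are made up, and the equal-weighted index and the MAPE accuracy metric are my assumptions, not requirements of the assignment.

```python
# Sketch of steps 3-5: build an equal-weighted index from six stocks'
# daily closes, then score a forecast against actual closes.
# All numbers below are illustrative, not real market data.

closes = {  # hypothetical daily closing prices for six stocks
    "A": [10.0, 10.2, 10.1], "B": [20.0, 19.8, 20.4],
    "C": [30.0, 30.3, 30.6], "D": [40.0, 39.5, 40.2],
    "E": [50.0, 50.5, 51.0], "F": [60.0, 59.4, 60.6],
}

def index_close(day):
    """Equal-weighted index: the average of the six closing prices."""
    return sum(series[day] for series in closes.values()) / len(closes)

# Daily closing data for the index, built from the stock closing data.
index_series = [index_close(d) for d in range(3)]

def mape(actual, forecast):
    """Mean absolute percentage error, a common forecast-accuracy metric."""
    return 100 * sum(abs(a - f) / a
                     for a, f in zip(actual, forecast)) / len(actual)
```

Any other weighting (price-weighted, market-cap-weighted) would work the same way; only the averaging line changes.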
Wiley Series in Probability and Statistics
Introduction to
Time Series
Analysis and
Forecasting
Second Edition
Douglas C. Montgomery
Cheryl L. Jennings
Murat Kulahci
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice,
Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott,
Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane,
Jozef L. Teugels
A complete list of the titles in this series appears at the end of this volume.
DOUGLAS C. MONTGOMERY
Arizona State University
Tempe, Arizona, USA
CHERYL L. JENNINGS
Arizona State University
Tempe, Arizona, USA
MURAT KULAHCI
Technical University of Denmark
Lyngby, Denmark
and
Luleå University of Technology
Luleå, Sweden
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400,
fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission
should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street,
Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts
in preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a professional where appropriate. Neither the
publisher nor author shall be liable for any loss of profit or any other commercial damages, including
but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services, please contact
our Customer Care Department within the United States at (800) 762-2974, outside the United States
at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic formats. For more information about Wiley products, visit our web
site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data applied for.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
CONTENTS
PREFACE / xi

1 INTRODUCTION TO FORECASTING / 1
1.1 The Nature and Uses of Forecasts / 1
1.2 Some Examples of Time Series / 6
1.3 The Forecasting Process / 13
1.4 Data for Forecasting / 16
1.4.1 The Data Warehouse / 16
1.4.2 Data Cleaning / 18
1.4.3 Imputation / 18
1.5 Resources for Forecasting / 19
Exercises / 20
2 STATISTICS BACKGROUND FOR FORECASTING / 25
2.1 Introduction / 25
2.2 Graphical Displays / 26
2.2.1 Time Series Plots / 26
2.2.2 Plotting Smoothed Data / 30
2.3 Numerical Description of Time Series Data / 33
2.3.1 Stationary Time Series / 33
2.3.2 Autocovariance and Autocorrelation Functions / 36
2.3.3 The Variogram / 42
2.4 Use of Data Transformations and Adjustments / 46
2.4.1 Transformations / 46
2.4.2 Trend and Seasonal Adjustments / 48
2.5 General Approach to Time Series Modeling and
Forecasting / 61
2.6 Evaluating and Monitoring Forecasting Model
Performance / 64
2.6.1 Forecasting Model Evaluation / 64
2.6.2 Choosing Between Competing Models / 74
2.6.3 Monitoring a Forecasting Model / 77
2.7 R Commands for Chapter 2 / 84
Exercises / 96
3 REGRESSION ANALYSIS AND FORECASTING / 107
3.1 Introduction / 107
3.2 Least Squares Estimation in Linear Regression Models / 110
3.3 Statistical Inference in Linear Regression / 119
3.3.1 Test for Significance of Regression / 120
3.3.2 Tests on Individual Regression Coefficients and
Groups of Coefficients / 123
3.3.3 Confidence Intervals on Individual Regression
Coefficients / 130
3.3.4 Confidence Intervals on the Mean Response / 131
3.4 Prediction of New Observations / 134
3.5 Model Adequacy Checking / 136
3.5.1 Residual Plots / 136
3.5.2 Scaled Residuals and PRESS / 139
3.5.3 Measures of Leverage and Influence / 144
3.6 Variable Selection Methods in Regression / 146
3.7 Generalized and Weighted Least Squares / 152
3.7.1 Generalized Least Squares / 153
3.7.2 Weighted Least Squares / 156
3.7.3 Discounted Least Squares / 161
3.8 Regression Models for General Time Series Data / 177
3.8.1 Detecting Autocorrelation: The Durbin–Watson
Test / 178
3.8.2 Estimating the Parameters in Time Series
Regression Models / 184
3.9 Econometric Models / 205
3.10 R Commands for Chapter 3 / 209
Exercises / 219
4 EXPONENTIAL SMOOTHING METHODS / 233
4.1 Introduction / 233
4.2 First-Order Exponential Smoothing / 239
4.2.1 The Initial Value, ỹ0 / 241
4.2.2 The Value of λ / 241
4.3 Modeling Time Series Data / 245
4.4 Second-Order Exponential Smoothing / 247
4.5 Higher-Order Exponential Smoothing / 257
4.6 Forecasting / 259
4.6.1 Constant Process / 259
4.6.2 Linear Trend Process / 264
4.6.3 Estimation of σe² / 273
4.6.4 Adaptive Updating of the Discount Factor / 274
4.6.5 Model Assessment / 276
4.7 Exponential Smoothing for Seasonal Data / 277
4.7.1 Additive Seasonal Model / 277
4.7.2 Multiplicative Seasonal Model / 280
4.8 Exponential Smoothing of Biosurveillance Data / 286
4.9 Exponential Smoothers and ARIMA Models / 299
4.10 R Commands for Chapter 4 / 300
Exercises / 311
5 AUTOREGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA) MODELS / 327
5.1 Introduction / 327
5.2 Linear Models for Stationary Time Series / 328
5.2.1 Stationarity / 329
5.2.2 Stationary Time Series / 329
5.3 Finite Order Moving Average Processes / 333
5.3.1 The First-Order Moving Average Process,
MA(1) / 334
5.3.2 The Second-Order Moving Average Process,
MA(2) / 336
5.4 Finite Order Autoregressive Processes / 337
5.4.1 First-Order Autoregressive Process, AR(1) / 338
5.4.2 Second-Order Autoregressive Process, AR(2) / 341
5.4.3 General Autoregressive Process, AR(p) / 346
5.4.4 Partial Autocorrelation Function, PACF / 348
5.5 Mixed Autoregressive–Moving Average Processes / 354
5.5.1 Stationarity of ARMA(p, q) Process / 355
5.5.2 Invertibility of ARMA(p, q) Process / 355
5.5.3 ACF and PACF of ARMA(p, q) Process / 356
5.6 Nonstationary Processes / 363
5.6.1 Some Examples of ARIMA(p, d, q) Processes / 363
5.7 Time Series Model Building / 367
5.7.1 Model Identification / 367
5.7.2 Parameter Estimation / 368
5.7.3 Diagnostic Checking / 368
5.7.4 Examples of Building ARIMA Models / 369
5.8 Forecasting ARIMA Processes / 378
5.9 Seasonal Processes / 383
5.10 ARIMA Modeling of Biosurveillance Data / 393
5.12 R Commands for Chapter 5 / 401
Exercises / 412
6 TRANSFER FUNCTIONS AND INTERVENTION MODELS / 427
6.1 Introduction / 427
6.2 Transfer Function Models / 428
6.3 Transfer Function–Noise Models / 436
6.4 Cross-Correlation Function / 436
6.5 Model Specification / 438
6.6 Forecasting with Transfer Function–Noise Models / 456
6.7 Intervention Analysis / 462
6.8 R Commands for Chapter 6 / 473
Exercises / 486
7 SURVEY OF OTHER FORECASTING METHODS / 493
7.1 Multivariate Time Series Models and Forecasting / 493
7.1.1 Multivariate Stationary Process / 494
7.1.2 Vector ARIMA Models / 494
7.1.3 Vector AR (VAR) Models / 496
7.2 State Space Models / 502
7.3 ARCH and GARCH Models / 507
7.4 Direct Forecasting of Percentiles / 512
7.5 Combining Forecasts to Improve Prediction
Performance / 518
7.6 Aggregation and Disaggregation of Forecasts / 522
7.7 Neural Networks and Forecasting / 526
7.8 Spectral Analysis / 529
7.9 Bayesian Methods in Forecasting / 535
7.10 Some Comments on Practical Implementation and Use of
Statistical Forecasting Procedures / 542
7.11 R Commands for Chapter 7 / 545
Exercises / 550
APPENDIX A STATISTICAL TABLES / 561
APPENDIX B DATA SETS FOR EXERCISES / 581
APPENDIX C INTRODUCTION TO R / 627
BIBLIOGRAPHY / 631
INDEX / 639
PREFACE
Analyzing time-oriented data and forecasting future values of a time series
are among the most important problems that analysts face in many fields,
ranging from finance and economics to managing production operations,
to the analysis of political and social policy sessions, to investigating the
impact of humans and the policy decisions that they make on the environment. Consequently, there is a large group of people in a variety of fields,
including finance, economics, science, engineering, statistics, and public
policy who need to understand some basic concepts of time series analysis
and forecasting. Unfortunately, most basic statistics and operations management books give little if any attention to time-oriented data and little
guidance on forecasting. There are some very good high-level books on
time series analysis. These books are mostly written for technical specialists who are taking a doctoral-level course or doing research in the field.
They tend to be very theoretical and often focus on a few specific topics
or techniques. We have written this book to fill the gap between these two
extremes.
We have made a number of changes in this revision of the book. New
material has been added on data preparation for forecasting, including
dealing with outliers and missing values, use of the variogram and sections
on the spectrum, and an introduction to Bayesian methods in forecasting.
We have added many new exercises and examples, including new data sets
in Appendix B, and edited many sections of the text to improve the clarity
of the presentation.
Like the first edition, this book is intended for practitioners who make
real-world forecasts. We have attempted to keep the mathematical level
modest to encourage a variety of users for the book. Our focus is on short- to medium-term forecasting where statistical methods are useful. Since
many organizations can improve their effectiveness and business results
by making better short- to medium-term forecasts, this book should be
useful to a wide variety of professionals. The book can also be used as a
textbook for an applied forecasting and time series analysis course at the
advanced undergraduate or first-year graduate level. Students in such a course
could come from engineering, business, statistics, operations research,
mathematics, computer science, and any area of application where making
forecasts is important. Readers need a background in basic statistics (previous exposure to linear regression would be helpful, but not essential),
and some knowledge of matrix algebra, although matrices appear mostly
in the chapter on regression, and if one is interested mainly in the results,
the details involving matrix manipulation can be skipped. Integrals and
derivatives appear in a few places in the book, but no detailed working
knowledge of calculus is required.
Successful time series analysis and forecasting requires that the analyst interact with computer software. The techniques and algorithms are
just not suitable to manual calculations. We have chosen to demonstrate
the techniques presented using three packages: Minitab® , JMP® , and R,
and occasionally SAS® . We have selected these packages because they
are widely used in practice and because they have generally good capability for analyzing time series data and generating forecasts. Because R is
increasingly popular in statistics courses, we have included a section in each
chapter showing the R code necessary for working some of the examples in
the chapter. We have also added a brief appendix on the use of R. The basic
principles that underlie most of our presentation are not specific to any
particular software package. Readers can use any software that they like or
have available that has basic statistical forecasting capability. While the text
examples do utilize these particular software packages and illustrate some
of their features and capability, these features or similar ones are found
in many other software packages.
There are three basic approaches to generating forecasts: regressionbased methods, heuristic smoothing methods, and general time series
models. Because all three of these basic approaches are useful, we give
an introduction to all of them. Chapter 1 introduces the basic forecasting
problem, defines terminology, and illustrates many of the common features of time series data. Chapter 2 contains many of the basic statistical
tools used in analyzing time series data. Topics include plots, numerical
summaries of time series data including the autocovariance and autocorrelation functions, transformations, differencing, and decomposing a time
series into trend and seasonal components. We also introduce metrics for
evaluating forecast errors and methods for evaluating and tracking forecasting performance over time. Chapter 3 discusses regression analysis and its
use in forecasting. We discuss both cross-section and time series regression
data, least squares and maximum likelihood model fitting, model adequacy
checking, prediction intervals, and weighted and generalized least squares.
The first part of this chapter covers many of the topics typically seen in an
introductory treatment of regression, either in a stand-alone course or as
part of another applied statistics course. It should be a reasonable review
for many readers. Chapter 4 presents exponential smoothing techniques,
both for time series with polynomial components and for seasonal data.
We discuss and illustrate methods for selecting the smoothing constant(s),
forecasting, and constructing prediction intervals. The explicit time series
modeling approach to forecasting that we have chosen to emphasize is
the autoregressive integrated moving average (ARIMA) model approach.
Chapter 5 introduces ARIMA models and illustrates how to identify and
fit these models for both nonseasonal and seasonal time series. Forecasting and prediction interval construction are also discussed and illustrated.
Chapter 6 extends this discussion into transfer function models and intervention modeling and analysis. Chapter 7 surveys several other useful topics from time series analysis and forecasting, including multivariate time
series problems, ARCH and GARCH models, and combinations of forecasts. We also give some practical advice for using statistical approaches
to forecasting and provide some information about realistic expectations.
The last two chapters of the book are somewhat higher in level than the
first five.
Each chapter has a set of exercises. Some of these exercises involve
analyzing the data sets given in Appendix B. These data sets represent an
interesting cross section of real time series data, typical of those encountered in practical forecasting problems. Most of these data sets are used
in exercises in two or more chapters, an indication that there are usually
several approaches to analyzing, modeling, and forecasting a time series.
There are other good sources of data for practicing the techniques given in
this book. Some of the ones that we have found very interesting and useful
include the U.S. Department of Labor—Bureau of Labor Statistics (http://
www.bls.gov/data/home.htm), the U.S. Department of Agriculture—
National Agricultural Statistics Service, Quick Stats Agricultural Statistics
Data (http://www.nass.usda.gov/Data_and_Statistics/Quick_Stats/index.
asp), the U.S. Census Bureau (http://www.census.gov), and the U.S.
Department of the Treasury (http://www.treas.gov/offices/domestic-finance/debt-management/interest-rate/). The time series data library
created by Rob Hyndman at Monash University (http://www-personal.
buseco.monash.edu.au/∼hyndman/TSDL/index.htm) and the time series
data library at the Mathematics Department of the University of York
(http://www.york.ac.uk/depts/maths/data/ts/) also contain many excellent
data sets. Some of these sources provide links to other data. Data sets and
other materials related to this book can be found at
ftp://ftp.wiley.com/public/scitechmed/timeseries.
We would like to thank the many individuals who provided feedback
and suggestions for improvement to the first edition. We found these suggestions most helpful. We are indebted to Clifford Long who generously
provided the R codes he used with his students when he taught from the
book. We found his codes very helpful in putting the end-of-chapter R code
sections together. We also have placed a premium in the book on bridging
the gap between theory and practice. We have not emphasized proofs or
technical details and have tried to give intuitive explanations of the material whenever possible. The result is a book that can be used with a wide
variety of audiences, with different interests and technical backgrounds,
whose common interests are understanding how to analyze time-oriented
data and constructing good short-term statistically based forecasts.
We express our appreciation to the individuals and organizations who
have given their permission to use copyrighted material. These materials
are noted in the text. Portions of the output contained in this book are
printed with permission of Minitab Inc. All material remains the exclusive property and copyright of Minitab Inc. All rights reserved.
Douglas C. Montgomery
Cheryl L. Jennings
Murat Kulahci
CHAPTER 1
INTRODUCTION TO FORECASTING
It is difficult to make predictions, especially about the future.
NIELS BOHR, Danish physicist
1.1 THE NATURE AND USES OF FORECASTS
A forecast is a prediction of some future event or events. As suggested by
Niels Bohr, making good predictions is not always easy. Famously “bad”
forecasts include the following from the book Bad Predictions:
• “The population is constant in size and will remain so right up to the
end of mankind.” L’Encyclopedie, 1756.
• “1930 will be a splendid employment year.” U.S. Department of
Labor, New Year’s Forecast in 1929, just before the market crash on
October 29.
• “Computers are multiplying at a rapid rate. By the turn of the century
there will be 220,000 in the U.S.” Wall Street Journal, 1966.
Introduction to Time Series Analysis and Forecasting, Second Edition.
Douglas C. Montgomery, Cheryl L. Jennings and Murat Kulahci.
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc.
Forecasting is an important problem that spans many fields including
business and industry, government, economics, environmental sciences,
medicine, social science, politics, and finance. Forecasting problems are
often classified as short-term, medium-term, and long-term. Short-term
forecasting problems involve predicting events only a few time periods
(days, weeks, and months) into the future. Medium-term forecasts extend
from 1 to 2 years into the future, and long-term forecasting problems
can extend beyond that by many years. Short- and medium-term forecasts
are required for activities that range from operations management to budgeting and selecting new research and development projects. Long-term
forecasts impact issues such as strategic planning. Short- and medium-term
forecasting is typically based on identifying, modeling, and extrapolating
the patterns found in historical data. Because these historical data usually exhibit inertia and do not change dramatically very quickly, statistical
methods are very useful for short- and medium-term forecasting. This book
is about the use of these statistical methods.
Most forecasting problems involve the use of time series data. A time
series is a time-oriented or chronological sequence of observations on a
variable of interest. For example, Figure 1.1 shows the market yield on US
Treasury Securities at 10-year constant maturity from April 1953 through
December 2006 (data in Appendix B, Table B.1). This graph is called a time
[Figure: time series plot, y-axis Rate, % (2–16), x-axis Month (Apr-53 to Dec-06)]
FIGURE 1.1 Time series plot of the market yield on US Treasury Securities at
10-year constant maturity. Source: US Treasury.
series plot. The rate variable is collected at equally spaced time periods, as
is typical in most time series and forecasting applications. Many business
applications of forecasting utilize daily, weekly, monthly, quarterly, or
annual data, but any reporting interval may be used. Furthermore, the data
may be instantaneous, such as the viscosity of a chemical product at the
point in time where it is measured; it may be cumulative, such as the total
sales of a product during the month; or it may be a statistic that in some
way reflects the activity of the variable during the time period, such as the
daily closing price of a specific stock on the New York Stock Exchange.
The reason that forecasting is so important is that prediction of future
events is a critical input into many types of planning and decision-making
processes, with application to areas such as the following:
1. Operations Management. Business organizations routinely use forecasts of product sales or demand for services in order to schedule
production, control inventories, manage the supply chain, determine
staffing requirements, and plan capacity. Forecasts may also be used
to determine the mix of products or services to be offered and the
locations at which products are to be produced.
2. Marketing. Forecasting is important in many marketing decisions.
Forecasts of sales response to advertising expenditures, new promotions, or changes in pricing policies enable businesses to evaluate
their effectiveness, determine whether goals are being met, and make
adjustments.
3. Finance and Risk Management. Investors in financial assets are interested in forecasting the returns from their investments. These assets
include but are not limited to stocks, bonds, and commodities; other
investment decisions can be made relative to forecasts of interest
rates, options, and currency exchange rates. Financial risk management requires forecasts of the volatility of asset returns so that
the risks associated with investment portfolios can be evaluated and
insured, and so that financial derivatives can be properly priced.
4. Economics. Governments, financial institutions, and policy organizations require forecasts of major economic variables, such as gross
domestic product, population growth, unemployment, interest rates,
inflation, job growth, production, and consumption. These forecasts
are an integral part of the guidance behind monetary and fiscal policy, and budgeting plans and decisions made by governments. They
are also instrumental in the strategic planning decisions made by businesses and financial institutions.
5. Industrial Process Control. Forecasts of the future values of critical quality characteristics of a production process can help determine when important controllable variables in the process should be
changed, or if the process should be shut down and overhauled. Feedback and feedforward control schemes are widely used in monitoring
and adjustment of industrial processes, and predictions of the process
output are an integral part of these schemes.
6. Demography. Forecasts of population by country and regions are
made routinely, often stratified by variables such as gender, age,
and race. Demographers also forecast births, deaths, and migration
patterns of populations. Governments use these forecasts for planning
policy and social service actions, such as spending on health care,
retirement programs, and antipoverty programs. Many businesses
use forecasts of populations by age groups to make strategic plans
regarding developing new product lines or the types of services that
will be offered.
These are only a few of the many different situations where forecasts
are required in order to make good decisions. Despite the wide range of
problem situations that require forecasts, there are only two broad types of
forecasting techniques—qualitative methods and quantitative methods.
Qualitative forecasting techniques are often subjective in nature and
require judgment on the part of experts. Qualitative forecasts are often
used in situations where there is little or no historical data on which to base
the forecast. An example would be the introduction of a new product, for
which there is no relevant history. In this situation, the company might use
the expert opinion of sales and marketing personnel to subjectively estimate
product sales during the new product introduction phase of its life cycle.
Sometimes qualitative forecasting methods make use of marketing tests,
surveys of potential customers, and experience with the sales performance
of other products (both their own and those of competitors). However,
although some data analysis may be performed, the basis of the forecast is
subjective judgment.
Perhaps the most formal and widely known qualitative forecasting technique is the Delphi Method. This technique was developed by the RAND
Corporation (see Dalkey [1967]). It employs a panel of experts who are
assumed to be knowledgeable about the problem. The panel members are
physically separated to avoid their deliberations being impacted either by
social pressures or by a single dominant individual. Each panel member
responds to a questionnaire containing a series of questions and returns the
information to a coordinator. Following the first questionnaire, subsequent
questions are submitted to the panelists along with information about the
opinions of the panel as a group. This allows panelists to review their predictions relative to the opinions of the entire group. After several rounds,
it is hoped that the opinions of the panelists converge to a consensus,
although achieving a consensus is not required and justified differences of
opinion can be included in the outcome. Qualitative forecasting methods
are not emphasized in this book.
Quantitative forecasting techniques make formal use of historical data
and a forecasting model. The model formally summarizes patterns in the
data and expresses a statistical relationship between previous and current
values of the variable. Then the model is used to project the patterns in
the data into the future. In other words, the forecasting model is used to
extrapolate past and current behavior into the future. There are several
types of forecasting models in general use. The three most widely used
are regression models, smoothing models, and general time series models. Regression models make use of relationships between the variable of
interest and one or more related predictor variables. Sometimes regression
models are called causal forecasting models, because the predictor variables are assumed to describe the forces that cause or drive the observed
values of the variable of interest. An example would be using data on house
purchases as a predictor variable to forecast furniture sales. The method
of least squares is the formal basis of most regression models. Smoothing
models typically employ a simple function of previous observations to
provide a forecast of the variable of interest. These methods may have a
formal statistical basis, but they are often used and justified heuristically
on the basis that they are easy to use and produce satisfactory results. General time series models employ the statistical properties of the historical
data to specify a formal model and then estimate the unknown parameters
of this model (usually) by least squares. In subsequent chapters, we will
discuss all three types of quantitative forecasting models.
The form of the forecast can be important. We typically think of a forecast as a single number that represents our best estimate of the future value
of the variable of interest. Statisticians would call this a point estimate or
point forecast. Now these forecasts are almost always wrong; that is, we
experience forecast error. Consequently, it is usually a good practice to
accompany a forecast with an estimate of how large a forecast error might
be experienced. One way to do this is to provide a prediction interval (PI)
to accompany the point forecast. The PI is a range of values for the future
observation, and it is likely to prove far more useful in decision-making
than a single number. We will show how to obtain PIs for most of the
forecasting methods discussed in the book.
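As a toy illustration of a point forecast accompanied by a PI, the sketch below uses the sample mean of a series as the point forecast (sensible only for a constant process) and a normal-theory 95% interval built from the sample standard deviation. Both choices are simplifying assumptions for illustration, not the book's prescribed methods, and the data are made up.

```python
import statistics

def point_forecast_with_pi(y, z=1.96):
    """Point forecast = sample mean (appropriate for a constant process);
    approximate 95% prediction interval assuming roughly normal errors."""
    yhat = statistics.mean(y)
    s = statistics.stdev(y)      # sample std. dev. about the mean
    return yhat, (yhat - z * s, yhat + z * s)

y = [10400, 10420, 10380, 10410, 10390]   # illustrative weekly sales
yhat, (lo, hi) = point_forecast_with_pi(y)
```

The interval (lo, hi) conveys how large a forecast error might plausibly be, which the single number yhat cannot.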
Other important features of the forecasting problem are the forecast
horizon and the forecast interval. The forecast horizon is the number
of future periods for which forecasts must be produced. The horizon is
often dictated by the nature of the problem. For example, in production
planning, forecasts of product demand may be made on a monthly basis.
Because of the time required to change or modify a production schedule,
ensure that sufficient raw material and component parts are available from
the supply chain, and plan the delivery of completed goods to customers
or inventory facilities, it would be necessary to forecast up to 3 months
ahead. The forecast horizon is also often called the forecast lead time.
The forecast interval is the frequency with which new forecasts are prepared. For example, in production planning, we might forecast demand on
a monthly basis, for up to 3 months in the future (the lead time or horizon), and prepare a new forecast each month. Thus the forecast interval is
1 month, the same as the basic period of time for which each forecast is
made. If the forecast lead time is always the same length, say, T periods, and
the forecast is revised each time period, then we are employing a rolling or
moving horizon forecasting approach. This system updates or revises the
forecasts for T−1 of the periods in the horizon and computes a forecast for
the newest period T. This rolling horizon approach to forecasting is widely
used when the lead time is several periods long.
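The rolling-horizon scheme just described can be sketched as follows. The naive "repeat the last observation" forecaster is a placeholder assumption standing in for whatever model is actually fitted; the point of the sketch is the mechanics of revising forecasts each period as the origin advances.

```python
def rolling_horizon(history, T=3):
    """At each forecast origin t, forecast the next T periods using only
    the data available up to t. Each new period, the T-1 overlapping
    forecasts are revised and a forecast for the newest period is added.
    Placeholder forecaster: repeat the last observed value (naive)."""
    all_forecasts = []
    for t in range(1, len(history)):
        origin_data = history[:t]           # data available at origin t
        forecast = [origin_data[-1]] * T    # naive T-step-ahead forecast
        all_forecasts.append(forecast)      # one revised set per period
    return all_forecasts

fc = rolling_horizon([100, 102, 101, 103], T=3)
```

In practice the body of the loop would refit or update the chosen model rather than copy the last value.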
1.2 SOME EXAMPLES OF TIME SERIES
Time series plots can reveal patterns such as random, trends, level shifts,
periods or cycles, unusual observations, or a combination of patterns. Patterns commonly found in time series data are discussed next with examples
of situations that drive the patterns.
The sales of a mature pharmaceutical product may remain relatively
flat with unchanged marketing or manufacturing strategies.
Weekly sales of a generic pharmaceutical product shown in Figure 1.2
appear to be constant over time, at about 10,400 × 10³ units, in a random
sequence with no obvious patterns (data in Appendix B, Table B.2).
To assure conformance with customer requirements and product specifications, the production of chemicals is monitored by many characteristics.
These may be input variables such as temperature and flow rate, and output
properties such as viscosity and purity.
Due to the continuous nature of chemical manufacturing processes,
output properties often are positively autocorrelated; that is, a value
above the long-run average tends to be followed by other values above the
FIGURE 1.2 Pharmaceutical product sales.
average, while a value below the average tends to be followed by other
values below the average.
The viscosity readings plotted in Figure 1.3 exhibit autocorrelated
behavior, tending to a long-run average of about 85 centipoises (cP), but
with a structured, not completely random, appearance (data in Appendix B,
Table B.3). Some methods for describing and analyzing autocorrelated data
will be described in Chapter 2.
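As a preview of those methods, the lag-k sample autocorrelation can be computed as sketched below; the formula in the comment is the standard sample autocorrelation, and the short viscosity-like series is made up for illustration (it is not the Appendix B data):

```python
# Illustrative sketch: the lag-k sample autocorrelation
#   r_k = sum_{t=1}^{T-k} (y_t - ybar)(y_{t+k} - ybar) / sum_t (y_t - ybar)^2
# A positive r_1 means values above the mean tend to follow values above it.
def sample_autocorrelation(y, k):
    T = len(y)
    ybar = sum(y) / T
    num = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(T - k))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

viscosity_like = [84, 85, 86, 86, 85, 84, 83, 84, 85, 86]  # made-up smooth run
print(round(sample_autocorrelation(viscosity_like, 1), 3))  # 0.475, clearly positive
```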
FIGURE 1.3 Chemical process viscosity readings.
The USDA National Agricultural Statistics Service publishes agricultural statistics for many commodities, including the annual production of
dairy products such as butter, cheese, ice cream, milk, yogurt, and whey.
These statistics are used for market analysis and intelligence, economic
indicators, and identification of emerging issues.
Blue and gorgonzola cheese is one of 32 categories of cheese for which
data are published. The annual US production of blue and gorgonzola
cheeses (in 10³ lb) is shown in Figure 1.4 (data in Appendix B, Table
B.4). Production quadrupled from 1950 to 1997, and the linear trend has
a constant positive slope with random, year-to-year variation.
The US Census Bureau publishes historic statistics on manufacturers’
shipments, inventories, and orders. The statistics are based on North American Industry Classification System (NAICS) code and are utilized for purposes such as measuring productivity and analyzing relationships between
employment and manufacturing output.
The manufacture of beverage and tobacco products is reported as part of
the nondurable subsector. The plot of monthly beverage product shipments
(Figure 1.5) reveals an overall increasing trend, with a distinct cyclic
pattern that is repeated within each year. January shipments appear to be
the lowest, with highs in May and June (data in Appendix B, Table B.5).
FIGURE 1.4 The US annual production of blue and gorgonzola cheeses. Source:
USDA–NASS.
FIGURE 1.5 The US beverage manufacturer monthly product shipments, unadjusted. Source: US Census Bureau.
This monthly, or seasonal, variation may be attributable to some cause
such as the impact of weather on the demand for beverages. Techniques
for making seasonal adjustments to data in order to better understand
general trends will be discussed in Chapter 2.
To determine whether the Earth is warming or cooling, scientists look at
annual mean temperatures. At a single station, the warmest and the coolest
temperatures in a day are averaged. Averages are then calculated at stations
all over the Earth, over an entire year. The change in global annual mean
surface air temperature is calculated from a base established from 1951 to
1980, and the result is reported as an “anomaly.”
The plot of the annual mean anomaly in global surface air temperature
(Figure 1.6) shows an increasing trend since 1880; however, the slope, or
rate of change, varies with time periods (data in Appendix B, Table B.6).
While the slope in earlier time periods appears to be constant, slightly
increasing, or slightly decreasing, the slope from about 1975 to the present
appears much steeper than the rest of the plot.
FIGURE 1.6 Global mean surface air temperature annual anomaly. Source:
NASA-GISS.
Business data such as stock prices and interest rates often exhibit nonstationary behavior; that is, the time series has no natural mean. The daily
closing price adjusted for stock splits of Whole Foods Market (WFMI)
stock in 2001 (Figure 1.7) exhibits a combination of patterns for both
mean level and slope (data in Appendix B, Table B.7).
While the price is constant in some short time periods, there is no
consistent mean level over time. In other time periods, the price changes
at different rates, including occasional abrupt shifts in level. This is an
example of nonstationary behavior, which will be discussed in Chapter 2.
The Current Population Survey (CPS) or “household survey” prepared
by the US Department of Labor, Bureau of Labor Statistics, contains
national data on employment, unemployment, earnings, and other labor
market topics by demographic characteristics.
FIGURE 1.7 Whole Foods Market stock price, daily closing adjusted for splits.
FIGURE 1.8 Monthly unemployment rate—full-time labor force, unadjusted.
Source: US Department of Labor-BLS.
The data are used to report on the employment situation, for projections
with impact on hiring and training, and for a multitude of other business
planning activities. The data are unadjusted for the regular patterns that
occur each year.
The plot of monthly unadjusted unemployment rates (Figure 1.8)
exhibits a mixture of patterns, similar to Figure 1.5 (data in Appendix B,
Table B.8). There is a distinct cyclic pattern within a year; January, February, and March generally have the highest unemployment rates. The overall
level is also changing, from a gradual decrease, to a steep increase, followed by a gradual decrease. The use of seasonal adjustments as described
in Chapter 2 makes it easier to observe the nonseasonal movements in time
series data.
Solar activity has long been recognized as a significant source of noise
impacting consumer and military communications, including satellites, cell
phone towers, and electric power grids. The ability to accurately forecast
solar activity is critical to a variety of fields. The International Sunspot
Number R is the oldest solar activity index. The number incorporates both
the number of observed sunspots and the number of observed sunspot
groups. In Figure 1.9, the plot of annual sunspot numbers reveals cyclic
patterns of varying magnitudes (data in Appendix B, Table B.9).
Time series plots may also draw attention to the occurrence of atypical events.
FIGURE 1.9 The international sunspot number. Source: SIDC.
Weekly sales of a generic pharmaceutical product dropped due to limited
availability resulting from a fire at one of the four production facilities.
The 5-week reduction is apparent in the time series plot of weekly sales
shown in Figure 1.10.
FIGURE 1.10 Pharmaceutical product sales.
FIGURE 1.11 Chemical process viscosity readings, with sensor malfunction.
Another type of unusual event may be the failure of the data measurement
or collection system. After recording a vastly different viscosity reading at
time period 70 (Figure 1.11), the measurement system was checked with a
standard and determined to be out of calibration. The cause
was determined to be a malfunctioning sensor.
1.3 THE FORECASTING PROCESS
A process is a series of connected activities that transform one or more
inputs into one or more outputs. All work activities are performed in
processes, and forecasting is no exception. The activities in the forecasting
process are:
1. Problem definition
2. Data collection
3. Data analysis
4. Model selection and fitting
5. Model validation
6. Forecasting model deployment
7. Monitoring forecasting model performance
These activities are shown in Figure 1.12.
FIGURE 1.12 The forecasting process.
Problem definition involves developing understanding of how the forecast
will be used along with the expectations of the “customer” (the user of
the forecast). Questions that must be addressed during this phase include
the desired form of the forecast (e.g., are monthly forecasts required), the
forecast horizon or lead time, how often the forecasts need to be revised
(the forecast interval), and what level of forecast accuracy is required in
order to make good business decisions. This is also an opportunity to introduce the decision makers to the use of prediction intervals as a measure of
the risk associated with forecasts, if they are unfamiliar with this approach.
Often it is necessary to go deeply into many aspects of the business system
that requires the forecast to properly define the forecasting component of
the entire problem. For example, in designing a forecasting system for
inventory control, information may be required on issues such as product
shelf life or other aging considerations, the time required to manufacture
or otherwise obtain the products (production lead time), and the economic
consequences of having too many or too few units of product available
to meet customer demand. When multiple products are involved, the level
of aggregation of the forecast (e.g., do we forecast individual products or
families consisting of several similar products) can be an important consideration. Much of the ultimate success of the forecasting model in meeting
the customer expectations is determined in the problem definition phase.
Data collection consists of obtaining the relevant history for the variable(s) that are to be forecast, including historical information on potential
predictor variables.
The key here is “relevant”; often information collection and storage
methods and systems change over time and not all historical data are
useful for the current problem. Often it is necessary to deal with missing
values of some variables, potential outliers, or other data-related problems
that have occurred in the past. During this phase, it is also useful to begin
planning how the data collection and storage issues in the future will be
handled so that the reliability and integrity of the data will be preserved.
Data analysis is an important preliminary step to the selection of the
forecasting model to be used. Time series plots of the data should be constructed and visually inspected for recognizable patterns, such as trends
and seasonal or other cyclical components. A trend is evolutionary movement, either upward or downward, in the value of the variable. Trends may
be long-term or more dynamic and of relatively short duration. Seasonality
is the component of time series behavior that repeats on a regular basis,
such as each year. Sometimes we will smooth the data to make identification of the patterns more obvious (data smoothing will be discussed in
Chapter 2). Numerical summaries of the data, such as the sample mean,
standard deviation, percentiles, and autocorrelations, should also be computed and evaluated. Chapter 2 will provide the necessary background to
do this. If potential predictor variables are available, scatter plots of each
pair of variables should be examined. Unusual data points or potential
outliers should be identified and flagged for possible further study. The
purpose of this preliminary data analysis is to obtain some “feel” for the
data, and a sense of how strong the underlying patterns such as trend and
seasonality are. This information will usually suggest the initial types of
quantitative forecasting methods and models to explore.
Model selection and fitting consists of choosing one or more forecasting models and fitting the model to the data. By fitting, we mean estimating
the unknown model parameters, usually by the method of least squares. In
subsequent chapters, we will present several types of time series models
and discuss the procedures of model fitting. We will also discuss methods for evaluating the quality of the model fit, and determining if any
of the underlying assumptions have been violated. This will be useful in
discriminating between different candidate models.
Model validation consists of an evaluation of the forecasting model
to determine how it is likely to perform in the intended application. This
must go beyond just evaluating the “fit” of the model to the historical data
and must examine what magnitude of forecast errors will be experienced
when the model is used to forecast “fresh” or new data. The fitting errors
will always be smaller than the forecast errors, and this is an important
concept that we will emphasize in this book. A widely used method for
validating a forecasting model before it is turned over to the customer is
to employ some form of data splitting, where the data are divided into
two segments—a fitting segment and a forecasting segment. The model is
fit to only the fitting data segment, and then forecasts from that model are
simulated for the observations in the forecasting segment. This can provide
useful guidance on how the forecasting model will perform when exposed
to new data and can be a valuable approach for discriminating between
competing forecasting models.
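The data-splitting idea can be sketched as follows; the naive last-value forecaster here is an assumption standing in for whatever candidate model is being validated:

```python
# Illustrative sketch of data splitting for model validation: fit on the
# first segment, then simulate one-step-ahead forecasts over the holdout
# ("forecasting") segment and record the forecast errors. A naive
# last-value forecast stands in for a fitted model (an assumption).
def split_and_validate(y, n_fit):
    fit_seg, holdout = y[:n_fit], y[n_fit:]
    errors = []
    history = list(fit_seg)
    for actual in holdout:
        forecast = history[-1]       # naive one-step-ahead forecast
        errors.append(actual - forecast)
        history.append(actual)       # roll forward with the new observation
    return errors

errors = split_and_validate([10, 12, 11, 13, 14, 13], n_fit=4)
print(errors)  # [1, -1]
```

Comparing the spread of these holdout errors across candidate models gives a fairer basis for choice than comparing fitting errors alone.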
Forecasting model deployment involves getting the model and the
resulting forecasts in use by the customer. It is important to ensure that the
customer understands how to use the model and that generating timely forecasts from the model becomes as routine as possible. Model maintenance,
including making sure that data sources and other required information
will continue to be available to the customer is also an important issue that
impacts the timeliness and ultimate usefulness of forecasts.
Monitoring forecasting model performance should be an ongoing
activity after the model has been deployed to ensure that it is still performing satisfactorily. It is the nature of forecasting that conditions change
over time, and a model that performed well in the past may deteriorate
in performance. Usually performance deterioration will result in larger or
more systematic forecast errors. Therefore monitoring of forecast errors
is an essential part of good forecasting system design. Control charts
of forecast errors are a simple but effective way to routinely monitor the
performance of a forecasting model. We will illustrate approaches to monitoring forecast errors in subsequent chapters.
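A minimal sketch of such a control chart on forecast errors follows; the 3-standard-deviation limits and the example errors are illustrative assumptions:

```python
# Illustrative sketch: an individuals-style control chart on forecast errors.
# Limits are set at the baseline mean +/- 3 standard deviations; errors
# falling outside the limits signal possible model deterioration.
def out_of_control(errors, baseline_n):
    base = errors[:baseline_n]
    mean = sum(base) / len(base)
    sd = (sum((e - mean) ** 2 for e in base) / (len(base) - 1)) ** 0.5
    ucl, lcl = mean + 3 * sd, mean - 3 * sd
    return [i for i, e in enumerate(errors) if e > ucl or e < lcl]

errors = [0.5, -0.3, 0.1, -0.4, 0.2, -0.1, 0.3, -0.2, 4.0]  # last error: a shift
print(out_of_control(errors, baseline_n=8))  # [8]
```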
1.4 DATA FOR FORECASTING
1.4.1 The Data Warehouse
Developing time series models and using them for forecasting requires
data on the variables of interest to decision-makers. The data are the raw
materials for the modeling and forecasting process. The terms data and
information are often used interchangeably, but we prefer to use the term
data as that seems to reflect a more raw or original form, whereas we think
of information as something that is extracted or synthesized from data. The
output of a forecasting system could be thought of as information, and that
output uses data as an input.
In most modern organizations data regarding sales, transactions, company financial and business performance, supplier performance, and customer activity and relations are stored in a repository known as a data
warehouse. Sometimes this is a single data storage system; but as the
volume of data handled by modern organizations grows rapidly, the data
warehouse has become an integrated system comprised of components
that are physically and often geographically distributed, such as cloud data
storage. The data warehouse must be able to organize, manipulate, and
integrate data from multiple sources and different organizational information systems. The basic functionality required includes data extraction, transformation, and loading. Extraction involves obtaining
data from internal sources and from external sources such as third party
vendors or government entities and financial service organizations. Once
the data are extracted, the transformation stage involves applying rules to
prevent duplication of records and dealing with problems such as missing
information. Sometimes we refer to the transformation activities as data
cleaning. We will discuss some of the important data cleaning operations
subsequently. Finally, the data are loaded into the data warehouse where
they are available for modeling and analysis.
Data quality has several dimensions. Five important ones that have been
described in the literature are accuracy, timeliness, completeness, representativeness, and consistency. Accuracy is probably the oldest dimension
of data quality and refers to how closely the data conform to their “real”
values. Real values are established from alternative sources that can be used for verification purposes. For example, do sales records match payments to accounts
receivable records (although the financial records may occur in later time
periods because of payment terms and conditions, discounts, etc.)? Timeliness means that the data are as current as possible. Infrequent updating
of data can seriously impact developing a time series model that is going
to be used for relatively short-term forecasting. In many time series model
applications the time between the occurrence of the real-world event and
its entry into the data warehouse must be as short as possible to facilitate
model development and use. Completeness means that the data content is
complete, with no missing data and no outliers. As an example of representativeness, suppose that the end use of the time series model is to forecast
customer demand for a product or service, but the organization only records
booked orders and the date of fulfillment. This may not accurately reflect
demand, because the orders can be booked before the desired delivery
period and the date of fulfillment can take place in a different period than
the one required by the customer. Furthermore, orders that are lost because
of product unavailability or unsatisfactory delivery performance are not
recorded. In these situations demand can differ dramatically from sales.
Data cleaning methods can often be used to deal with some problems of
completeness. Consistency refers to how closely data records agree over
time in format, content, meaning, and structure. In many organizations
how data are collected and stored evolves over time; definitions change
and even the types of data that are collected change. For example, consider
monthly data. Some organizations define “months” that coincide with the
traditional calendar definition. But because months have different numbers
of days that can induce patterns in monthly data, some organizations prefer
to define a year as consisting of 13 “months” each consisting of 4 weeks.
It has been suggested that the output data that reside in the data warehouse are similar to the output of a manufacturing process, where the raw
data are the input. Just as in manufacturing and other service processes, the
data production process can benefit by the application of quality management and control tools. Jones-Farmer et al. (2014) describe how statistical
quality control methods, specifically control charts, can be used to enhance
data quality in the data production process.
1.4.2 Data Cleaning
Data cleaning is the process of examining data to detect potential errors,
missing data, outliers or unusual values, or other inconsistencies and then
correcting the errors or problems that are found. Sometimes errors are
the result of recording or transmission problems, and can be corrected by
working with the original data source to correct the problem. Effective data
cleaning can greatly improve the forecasting process.
Before data are used to develop a time series model, they should be subjected to several different kinds of checks, including but not necessarily
limited to the following:
1. Is there missing data?
2. Does the data fall within an expected range?
3. Are there potential outliers or other unusual values?
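The three checks above can be automated as sketched below; the 1.5 × IQR outlier rule and the example series are assumptions for illustration (the text leaves the outlier rule unspecified):

```python
# Illustrative sketch automating the three checks: missing values, expected
# range, and potential outliers. Outliers are flagged with a simple
# 1.5 * IQR fence computed from the in-range values (an assumed rule).
def check_series(y, lo, hi):
    issues = {"missing": [], "out_of_range": [], "outliers": []}
    in_range = [v for v in y if v is not None and lo <= v <= hi]
    s = sorted(in_range)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]   # rough sample quartiles
    iqr = q3 - q1
    for i, v in enumerate(y):
        if v is None:
            issues["missing"].append(i)
        elif not (lo <= v <= hi):
            issues["out_of_range"].append(i)
        elif v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr:
            issues["outliers"].append(i)
    return issues

data = [85, 84, None, 86, 98, 85, 87, 250]   # made-up viscosity-like readings
print(check_series(data, lo=0, hi=100))
# {'missing': [2], 'out_of_range': [7], 'outliers': [4]}
```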
These types of checks can be automated fairly easily. If this aspect
of data cleaning is automated, the rules employed should be periodically
evaluated to ensure that they are still appropriate and that changes in the
data have not made some of the procedures less effective. However, it
is also extremely useful to use graphical displays to assist in identifying
unusual data. Techniques such as time series plots, histograms, and scatter
diagrams are extremely useful. These and other graphical methods will be
described in Chapter 2.
1.4.3 Imputation
Data imputation is the process of correcting missing data or replacing outliers with an estimation process. Imputation replaces missing or erroneous
values with a “likely” value based on other available information. This
enables the analysis to proceed with statistical techniques designed for
complete data sets.
Mean value imputation consists of replacing a missing value with
the sample average calculated from the nonmissing observations. The big
advantage of this method is that it is easy, and if the data does not have any
specific trend or seasonal pattern, it leaves the sample mean of the complete
data set unchanged. However, one must be careful if there are trends or
seasonal patterns, because the sample mean of all of the data may not reflect
these patterns. A variation of this is stochastic mean value imputation, in
which a random variable is added to the mean value to capture some of the
noise or variability in the data. The random variable could be assumed to
follow a normal distribution with mean zero and standard deviation equal
to the standard deviation of the actual observed data. A variation of mean
value imputation is to use a subset of the available historical data that
reflects any trend or seasonal patterns in the data. For example, consider
the time series y1 , y2 , … , yT and suppose that one observation yj is missing.
We can impute the missing value as
ŷ_j = (1/2k) ( Σ_{t=j−k}^{j−1} y_t + Σ_{t=j+1}^{j+k} y_t ),
where k would be based on the seasonal variability in the data. It is usually
chosen as some multiple of the smallest seasonal cycle in the data. So, if
the data are monthly and exhibit a monthly cycle, k would be a multiple of
12. Regression imputation is a variation of mean value imputation where
the imputed value is computed from a model used to predict the missing
value. The prediction model does not have to be a linear regression model.
For example, it could be a time series model.
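The mean value and windowed schemes can be sketched as follows; the series and the choice k = 2 are illustrative assumptions (in practice k would reflect the seasonal cycle, e.g., a multiple of 12 for monthly data):

```python
# Illustrative sketch of two imputation schemes for a missing value y_j:
# plain mean value imputation, and the windowed version that averages the
# k observations on each side of position j.
def mean_impute(y, j):
    """Replace y_j with the mean of all other (nonmissing) observations."""
    present = [v for i, v in enumerate(y) if i != j]
    return sum(present) / len(present)

def windowed_impute(y, j, k):
    """Replace y_j with the average of the k values before and after it."""
    window = y[j - k:j] + y[j + 1:j + k + 1]
    return sum(window) / (2 * k)

y = [10, 12, 14, None, 18, 20, 22]
print(windowed_impute(y, j=3, k=2))  # (12 + 14 + 18 + 20) / 4 = 16.0
```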
Hot deck imputation is an old technique that is also known as the last
value carried forward method. The term “hot deck” comes from the use
of computer punch cards. The deck of cards was “hot” because it was
currently in use. Cold deck imputation uses information from a deck of
cards not currently in use. In hot deck imputation, the missing values are
imputed by using values from similar complete observations. If there are
several variables, sort the data by the variables that are most related to
the missing observation and then, starting at the top, replace the missing
values with the value of the immediately preceding variable. There are
many variants of this procedure.
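One common variant, last value carried forward on a time-ordered series, can be sketched as follows (the example values are made up for illustration):

```python
# Illustrative sketch of last-value-carried-forward (hot deck style)
# imputation on a time-ordered series: each missing value takes the most
# recent preceding observed value.
def last_value_carried_forward(y):
    filled, last = [], None
    for v in y:
        if v is None:
            v = last             # carry the previous observed value forward
        filled.append(v)
        last = v
    return filled

print(last_value_carried_forward([5, None, None, 7, None]))  # [5, 5, 5, 7, 7]
```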
1.5 RESOURCES FOR FORECASTING
There are a variety of good resources that can be helpful to technical
professionals involved in developing forecasting models and preparing
forecasts. There are three professional journals devoted to forecasting:
r Journal of Forecasting
r International Journal of Forecasting
r Journal of Business Forecasting Methods and Systems
These journals publish a mixture of new methodology, studies devoted
to the evaluation of current methods for forecasting, and case studies and
applications. In addition to these specialized forecasting journals, there are
several other mainstream statistics and operations research/management
science journals that publish papers on forecasting, including:
r Journal of Business and Economic Statistics
r Management Science
r Naval Research Logistics
r Operations Research
r International Journal of Production Research
r Journal of Applied Statistics
This is by no means a comprehensive list. Research on forecasting tends
to be published in a variety of outlets.
There are several books that are good complements to this one.
We recommend Box, Jenkins, and Reinsel (1994); Chatfield (1996);
Fuller (1995); Abraham and Ledolter (1983); Montgomery, Johnson, and
Gardiner (1990); Wei (2006); and Brockwell and Davis (1991, 2002). Some
of these books are more specialized than this one, in that they focus on
a specific type of forecasting model such as the autoregressive integrated
moving average [ARIMA] model, and some also require more background
in statistics and mathematics.
Many statistics software packages have very good capability for fitting
a variety of forecasting models. Minitab® Statistical Software, JMP®, the
Statistical Analysis System (SAS) and R are the packages that we utilize
and illustrate in this book. At the end of most chapters we provide R code
for working some of the examples in the chapter. Matlab and S-Plus are
also two packages that have excellent capability for solving forecasting
problems.
EXERCISES
1.1 Why is forecasting an essential part of the operation of any organization or business?
1.2 What is a time series? Explain the meaning of trend effects, seasonal
variations, and random error.
1.3 Explain the difference between a point forecast and an interval
forecast.
1.4 What do we mean by a causal forecasting technique?
1.5 Everyone makes forecasts in their daily lives. Identify and discuss
a situation where you employ forecasts.
a. What decisions are impacted by your forecasts?
b. How do you evaluate the quality of your forecasts?
c. What is the value to you of a good forecast?
d. What is the harm or penalty associated with a bad forecast?
1.6 What is meant by a rolling horizon forecast?
1.7 Explain the difference between forecast horizon and forecast interval.
1.8 Suppose that you are in charge of capacity planning for a large
electric utility. A major part of your job is ensuring that the utility has
sufficient generating capacity to meet current and future customer
needs. If you do not have enough capacity, you run the risks of
brownouts and service interruption. If you have too much capacity,
it may cost more to generate electricity.
a. What forecasts do you need to do your job effectively?
b. Are these short-range or long-range forecasts?
c. What data do you need to be able to generate these forecasts?
1.9 Your company designs and manufactures apparel for the North
American market. Clothing and apparel is a style good, with a
relatively limited life. Items not sold at the end of the season are
usually sold through off-season outlet and discount retailers. Items
not sold through discounting and off-season merchants are often
given to charity or sold abroad.
a. What forecasts do you need in this business to be successful?
b. Are these short-range or long-range forecasts?
c. What data do you need to be able to generate these forecasts?
d. What are the implications of forecast errors?
1.10 Suppose that you are in charge of production scheduling at a semiconductor manufacturing plant. The plant manufactures about 20
different types of devices, all on 8-inch silicon wafers. Demand for
these products varies randomly. When a lot or batch of wafers is
started into production, it can take from 4 to 6 weeks before the
batch is finished, depending on the type of product. The routing of
each batch of wafers through the production tools can be different
depending on the type of product.
a. What forecasts do you need in this business to be successful?
b. Are these short-range or long-range forecasts?
c. What data do you need to be able to generate these forecasts?
d. Discuss the impact that forecast errors can potentially have on
the efficiency with which your factory operates, including work-in-process inventory, meeting customer delivery schedules, and
the cycle time to manufacture product.
1.11 You are the administrator of a large metropolitan hospital that operates the only 24-hour emergency room in the area. You must schedule attending physicians, resident physicians, nurses, laboratory, and
support personnel to operate this facility effectively.
a. What measures of effectiveness do you think patients use to
evaluate the services that you provide?
b. How are forecasts useful to you in planning services that will
maximize these measures of effectiveness?
c. What planning horizon do you need to use? Does this lead to
short-range or long-range forecasts?
1.12 Consider an airline that operates a network of flights that serves 200
cities in the continental United States. What long-range forecasts do
the operators of the airline need to be successful? What forecasting
problems does this business face on a daily basis? What are the
consequences of forecast errors for the airline?
1.13 Discuss the potential difficulties of forecasting the daily closing
price of a specific stock on the New York Stock Exchange. Would
the problem be different (harder, easier) if you were asked to forecast
the closing price of a group of stocks, all in the same industry (say,
the pharmaceutical industry)?
1.14 Explain how large forecast errors can lead to high inventory levels
at a retailer; at a manufacturing plant.
1.15 Your company manufactures and distributes soft drink beverages,
sold in bottles and cans at retail outlets such as grocery stores,
restaurants and other eating/drinking establishments, and vending
machines in offices, schools, stores, and other outlets. Your product
line includes about 25 different products, and many of these are
produced in different package sizes.
a. What forecasts do you need in this business to be successful?
b. Is the demand for your product likely to be seasonal? Explain
why or why not?
c. Does the shelf life of your product impact the forecasting problem?
d. What data do you think that you would need to be able to produce
successful forecasts?
CHAPTER 2
STATISTICS BACKGROUND
FOR FORECASTING
The future ain’t what it used to be.
YOGI BERRA, New York Yankees catcher
2.1 INTRODUCTION
This chapter presents some basic statistical methods essential to modeling,
analyzing, and forecasting time series data. Both graphical displays and
numerical summaries of the properties of time series data are presented.
We also discuss the use of data transformations and adjustments in forecasting and some widely used methods for characterizing and monitoring
the performance of a forecasting model. Some aspects of how these performance measures can be used to select between competing forecasting
techniques are also presented.
Forecasts are based on data or observations on the variable of interest.
These data are usually in the form of a time series. Suppose that there are
T periods of data available, with period T being the most recent. We will
let the observation on this variable at time period t be denoted by yt , t = 1,
2, … , T. This variable can represent a cumulative quantity, such as the
Introduction to Time Series Analysis and Forecasting, Second Edition.
Douglas C. Montgomery, Cheryl L. Jennings and Murat Kulahci.
© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc.
total demand for a product during period t, or an instantaneous quantity,
such as the daily closing price of a specific stock on the New York Stock
Exchange.
Generally, we will need to distinguish between a forecast or predicted
value of yt that was made at some previous time period, say, t − τ, and a
fitted value of yt that has resulted from estimating the parameters in a time
series model to historical data. Note that τ is the forecast lead time. The
forecast made at time period t − τ is denoted by ŷt(t − τ). There is a lot of
interest in the lead-one forecast, which is the forecast of the observation
in period t, yt, made one period prior, ŷt(t − 1). We will denote the fitted
value of yt by ŷt.
We will also be interested in analyzing forecast errors. The forecast
error that results from a forecast of yt that was made at time period t − τ is

    et(τ) = yt − ŷt(t − τ).    (2.1)

For example, the lead-one forecast error is

    et(1) = yt − ŷt(t − 1).
The difference between the observation yt and the value obtained by fitting
a time series model to the data, or a fitted value ŷ t defined earlier, is called
a residual, and is denoted by
et = yt − ŷ t .
(2.2)
The reason for this careful distinction between forecast errors and residuals
is that models usually fit historical data better than they forecast. That is,
the residuals from a model-fitting process will almost always be smaller
than the forecast errors that are experienced when that model is used to
forecast future observations.
2.2 GRAPHICAL DISPLAYS
2.2.1 Time Series Plots
Developing a forecasting model should always begin with graphical display
and analysis of the available data. Many of the broad general features of a
time series can be seen visually. This is not to say that analytical tools are
not useful, because they are, but the human eye can be a very sophisticated
data analysis tool. To paraphrase the great New York Yankees catcher Yogi
Berra, “You can observe a lot just by watching.”
The basic graphical display for time series data is the time series plot,
illustrated in Chapter 1. This is just a graph of yt versus the time period,
t, for t = 1, 2, … , T. Features such as trend and seasonality are usually
easy to see from the time series plot. It is interesting to observe that some
of the classical tools of descriptive statistics, such as the histogram and
the stem-and-leaf display, are not particularly useful for time series data
because they do not take time order into account.
Example 2.1 Figures 2.1 and 2.2 show time series plots for viscosity
readings and beverage production shipments (originally shown in Figures 1.3 and 1.5, respectively). At the right-hand side of each time series
plot is a histogram of the data. Note that while the two time series display
very different characteristics, the histograms are remarkably similar. Essentially, the histogram summarizes the data across the time dimension, and
in so doing, the key time-dependent features of the data are lost. Stem-and-leaf
plots and boxplots would have the same issues, losing time-dependent
features.
FIGURE 2.1 Time series plot and histogram of chemical process viscosity
(viscosity, cP, versus time period, with a frequency histogram of the same data
at the right).
FIGURE 2.2 Time series plot and histogram of beverage production shipments
(beverage shipments, millions of dollars, Jan-1992 through Dec-2006, with a
frequency histogram at the right).
When there are two or more variables of interest, scatter plots can be
useful in displaying the relationship between the variables. For example,
Figure 2.3 is a scatter plot of the annual global mean surface air temperature
anomaly first shown in Figure 1.6 versus atmospheric CO2 concentrations.
The scatter plot clearly reveals a relationship between the two variables:
FIGURE 2.3 Scatter plot of temperature anomaly (global mean surface air
temperature anomaly, °C) versus combined atmospheric CO2 concentrations, ppmv.
Sources: NASA–GISS (anomaly), DOE–DIAC (CO2).
low concentrations of CO2 are usually accompanied by negative anomalies,
and higher concentrations of CO2 tend to be accompanied by positive
anomalies. Note that this does not imply that higher concentrations of
CO2 actually cause higher temperatures. The scatter plot cannot establish
a causal relationship between two variables (neither can naive statistical
modeling techniques, such as regression), but it is useful in displaying how
the variables have varied together in the historical data set.
There are many variations of the time series plot and other graphical
displays that can be constructed to show specific features of a time series.
For example, Figure 2.4 displays daily price information for Whole Foods
Market stock during the first quarter of 2001 (the trading days from January
2, 2001 through March 30, 2001). This chart, created in Excel® , shows the
opening, closing, highest, and lowest prices experienced within a trading
day for the first quarter. If the opening price was higher than the closing
price, the box is filled, whereas if the closing price was higher than the
opening price, the box is open. This type of plot is potentially more useful
than a time series plot of just the closing (or opening) prices, because it
shows the volatility of the stock within a trading day. The volatility of an
asset is often of interest to investors because it is a measure of the inherent
risk associated with the asset.
FIGURE 2.4 Open-high/close-low chart of Whole Foods Market stock price (price,
$/share, by trading day, 1/2/2001 through 3/30/2001). Source: finance.yahoo.com.
2.2.2 Plotting Smoothed Data
Sometimes it is useful to overlay a smoothed version of the original data
on the original time series plot to help reveal patterns in the original data.
There are several types of data smoothers that can be employed. One of the
simplest and most widely used is the ordinary or simple moving average.
A simple moving average of span N assigns weights 1/N to the most
recent N observations yT , yT−1 , … , yT−N+1 , and weight zero to all other
observations. If we let MT be the moving average, then the N-span moving
average at time period T is

    MT = (yT + yT−1 + ⋯ + yT−N+1)/N = (1/N) Σ_{t=T−N+1}^{T} yt    (2.3)
Clearly, as each new observation becomes available it is added into the sum
from which the moving average is computed and the oldest observation
is discarded. The moving average has less variability than the original
observations; in fact, if the variance of an individual observation yt is σ²,
then, assuming that the observations are uncorrelated, the variance of the
moving average is

    Var(MT) = Var((1/N) Σ_{t=T−N+1}^{T} yt) = (1/N²) Σ_{t=T−N+1}^{T} Var(yt) = σ²/N
Sometimes a “centered” version of the moving average is used, such as in

    Mt = (1/(2S + 1)) Σ_{i=−S}^{S} yt−i    (2.4)

where the span of the centered moving average is N = 2S + 1.
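As a quick illustration, Eqs. (2.3) and (2.4) can be coded directly. This is a sketch of ours, not from the book; the function names are hypothetical:

```python
def trailing_ma(y, N):
    """Simple N-span trailing moving average, Eq. (2.3):
    M_T = (y_T + ... + y_{T-N+1}) / N, defined once N points are available."""
    return [sum(y[t - N + 1:t + 1]) / N for t in range(N - 1, len(y))]

def centered_ma(y, S):
    """Centered moving average of span N = 2S + 1, Eq. (2.4)."""
    N = 2 * S + 1
    return [sum(y[t - S:t + S + 1]) / N for t in range(S, len(y) - S)]

y = [15, 18, 13, 12, 16, 14, 16, 17, 18]
print(trailing_ma(y, 3))   # first value is (15 + 18 + 13) / 3
print(centered_ma(y, 1))   # span-3 centered average of the same data
```

Note that the two functions compute the same window averages; they differ only in which time index each average is attached to.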
Example 2.2 Figure 2.5 plots the annual global mean surface air temperature anomaly data along with a five-period (a period is 1 year) moving
average of the same data. Note that the moving average exhibits less variability than found in the original series. It also makes some features of the
data easier to see; for example, it is now more obvious that the global air
temperature anomaly has been rising over the most recent decades.
Plots of moving averages are also used by analysts to evaluate stock
price trends; common MA periods are 5, 10, 20, 50, 100, and 200 days. A
time series plot of Whole Foods Market stock price with a 50-day moving
FIGURE 2.5 Time series plot of global mean surface air temperature anomaly
(average annual anomaly, °C, 1885–2004), with five-period moving average.
Source: NASA–GISS.

FIGURE 2.6 Time series plot of Whole Foods Market stock price (2-Jan-01 through
31-Dec-01), with 50-day moving average. Source: finance.yahoo.com.
average is shown in Figure 2.6. The moving average plot smoothes the
day-to-day noise and shows a generally increasing trend.
The simple moving average is a linear data smoother, or a linear
filter, because it replaces each observation yt with a linear combination of
the other data points that are near to it in time. The weights in the linear
combination are equal, so the linear combination here is an average. Of
course, unequal weights could be used. For example, the Hanning filter
is a weighted, centered moving average
MtH = 0.25yt+1 + 0.5yt + 0.25yt−1
Julius von Hann, a nineteenth century Austrian meteorologist, used this
filter to smooth weather data.
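The Hanning filter can be applied directly from its definition. The following is a small sketch of ours, not from the book:

```python
def hanning(y):
    """Hanning filter: weighted, centered moving average
    M_t^H = 0.25*y[t+1] + 0.5*y[t] + 0.25*y[t-1].
    The two end values are lost, as with any centered smoother."""
    return [0.25 * y[t + 1] + 0.5 * y[t] + 0.25 * y[t - 1]
            for t in range(1, len(y) - 1)]

print(hanning([15, 18, 13, 12, 16]))  # smooths the interior three points
```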
An obvious disadvantage of a linear filter such as a moving average
is that an unusual or erroneous data point or an outlier will dominate the
moving averages that contain that observation, contaminating the moving
averages for a length of time equal to the span of the filter. For example,
consider the sequence of observations
15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 25
which increases reasonably steadily from 15 to 25, except for the unusual
value 200. Any reasonable smoothed version of the data should also
increase steadily from 15 to 25 and not emphasize the value 200. Now
even if the value 200 is a legitimate observation, and not the result of a data
recording or reporting error (perhaps it should be 20!), it is so unusual that
it deserves special attention and should likely not be analyzed along with
the rest of the data.
Odd-span moving medians (also called running medians) are an alternative to moving averages that are effective data smoothers when the time
series may be contaminated with unusual values or outliers. The moving
median of span N is defined as

    mt[N] = med(yt−u, … , yt, … , yt+u),    (2.5)

where N = 2u + 1. The median is the middle observation in rank order
(or order of value). The moving median of span 3 is a very popular and
effective data smoother, where

    mt[3] = med(yt−1, yt, yt+1).

This smoother processes the data three values at a time, replacing each
observation by the median of itself and its two neighbors. If we apply this
smoother to the data above, we obtain

    ∗, 15, 13, 13, 14, 16, 16, 17, 17, 18, 18, 19, 19, 19, 21, 21, 24, ∗.
These smoothed data are a reasonable representation of the original data,
but they conveniently ignore the value 200. The end values are lost when
using the moving median, and they are represented here by “∗”.
In general, a moving median will pass monotone sequences of data
unchanged. It will follow a step function in the data, but it will eliminate
a spike or more persistent upset in the data that has duration of at most
u consecutive observations. Moving medians can be applied more than
once if desired to obtain an even smoother series of observations. For
example, applying the moving median of span 3 to the smoothed data above
results in
    ∗, ∗, 13, 13, 14, 16, 16, 17, 17, 18, 18, 19, 19, 19, 21, 21, ∗, ∗.
These data are now as smooth as they can get; that is, repeated application of
the moving median will not change the data, apart from the end values.
If there are a lot of observations, the information loss from the missing
end values is not serious. However, if it is necessary or desirable to keep
the lengths of the original and smoothed data sets the same, a simple way
to do this is to “copy on” or add back the end values from the original data.
This would result in the smoothed data:
    15, 18, 13, 13, 14, 16, 16, 17, 17, 18, 18, 19, 19, 19, 21, 21, 19, 25
There are also methods for smoothing the end values. Tukey (1979) is a
basic reference on this subject and contains many other clever and useful
techniques for data analysis.
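The span-3 moving median takes only a line of Python. The sketch below is ours, not the book's; it reproduces the first smoothing of the example data:

```python
import statistics

def moving_median3(y):
    """Span-3 moving median, Eq. (2.5) with u = 1; the two end values are lost."""
    return [statistics.median(y[t - 1:t + 2]) for t in range(1, len(y) - 1)]

data = [15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 25]
smoothed = moving_median3(data)
print(smoothed)  # the outlier 200 never appears in the smoothed series
```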
Example 2.3
The chemical process viscosity readings shown in
Figure 1.11 are an example of a time series that benefits from smoothing to evaluate patterns. The selection of a moving median over a moving
average, as shown in Figure 2.7, minimizes the impact of the invalid measurements, such as the one at time period 70.
2.3 NUMERICAL DESCRIPTION OF TIME SERIES DATA
2.3.1 Stationary Time Series
A very important type of time series is a stationary time series. A time
series is said to be strictly stationary if its properties are not affected
FIGURE 2.7 Viscosity readings (viscosity, cP, versus time period) with (a)
moving average and (b) moving median smoothers overlaid.
by a change in the time origin. That is, if the joint probability distribution of the observations yt , yt+1 , … , yt+n is exactly the same as the joint
probability distribution of the observations yt+k , yt+k+1 , … , yt+k+n then the
time series is strictly stationary. When n = 0 the stationarity assumption
means that the probability distribution of yt is the same for all time periods
FIGURE 2.8 Pharmaceutical product sales (units, in thousands, by week).
and can be written as f (y). The pharmaceutical product sales and chemical
viscosity readings time series data originally shown in Figures 1.2 and 1.3,
respectively, are examples of stationary time series. The time series plots
are repeated in Figures 2.8 and 2.9 for convenience. Note that both time
series seem to vary around a fixed level. Based on the earlier definition, this
is a characteristic of stationary time series. On the other hand, the Whole
FIGURE 2.9 Chemical process viscosity readings (viscosity, cP, versus time
period).
Foods Market stock price data in Figure 1.7 tends to wander around or drift,
with no obvious fixed level. This is behavior typical of a nonstationary
time series.
Stationarity implies a type of statistical equilibrium or stability in the
data. Consequently, the time series has a constant mean defined in the usual
way as

    μy = E(y) = ∫_{−∞}^{∞} y f(y) dy    (2.6)

and constant variance defined as

    σy² = Var(y) = ∫_{−∞}^{∞} (y − μy)² f(y) dy.    (2.7)
The sample mean and sample variance are used to estimate these parameters.
If the observations in the time series are y1, y2, … , yT, then the sample
mean is

    ȳ = μ̂y = (1/T) Σ_{t=1}^{T} yt    (2.8)

and the sample variance is

    s² = σ̂y² = (1/T) Σ_{t=1}^{T} (yt − ȳ)².    (2.9)
Note that the divisor in Eq. (2.9) is T rather than the more familiar T − 1.
This is the common convention in many time series applications, and
because T is usually not small, there will be little difference between using
T instead of T − 1.
2.3.2 Autocovariance and Autocorrelation Functions
If a time series is stationary this means that the joint probability distribution of any two observations, say, yt and yt+k , is the same for any two
time periods t and t + k that are separated by the same interval k. Useful
information about this joint distribution can be obtained by plotting a
scatter diagram of all of the data pairs yt, yt+k that are separated by the
same interval k. The interval k is called the lag.
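Forming the lag-k pairs that such a scatter diagram plots is straightforward; a minimal sketch of ours:

```python
def lag_pairs(y, k):
    """All data pairs (y_t, y_{t+k}) separated by lag k, for a scatter diagram."""
    return [(y[t], y[t + k]) for t in range(len(y) - k)]

print(lag_pairs([10, 12, 11, 13, 14], 1))  # adjacent pairs, lag k = 1
```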
FIGURE 2.10 Scatter diagram of pharmaceutical product sales at lag k = 1 (sales
in week t + 1 versus sales in week t).
Example 2.4 Figure 2.10 is a scatter diagram for the pharmaceutical
product sales for lag k = 1 and Figure 2.11 is a scatter diagram for the
chemical viscosity readings for lag k = 1. Both scatter diagrams were
constructed by plotting yt+1 versus yt . Figure 2.10 exhibits little structure;
the plotted pairs of adjacent observations yt , yt+1 seem to be uncorrelated.
That is, the value of y in the current period does not provide any useful
information about the value of y that will be observed in the next period.
A different story is revealed in Figure 2.11, where we observe that the
FIGURE 2.11 Scatter diagram of chemical viscosity readings at lag k = 1
(reading at time period t + 1 versus reading at time period t).
pairs of adjacent observations yt+1 , yt are positively correlated. That is, a
small value of y tends to be followed in the next time period by another
small value of y, and a large value of y tends to be followed immediately by
another large value of y. Note from inspection of Figures 2.10 and 2.11 that
the behavior inferred from inspection of the scatter diagrams is reflected
in the observed time series.
The covariance between yt and its value at another time period, say, yt+k
is called the autocovariance at lag k, defined by

    γk = Cov(yt, yt+k) = E[(yt − μ)(yt+k − μ)].    (2.10)

The collection of the values of γk, k = 0, 1, 2, … is called the autocovariance function. Note that the autocovariance at lag k = 0 is just the variance
of the time series; that is, γ0 = σy², which is constant for a stationary
time series. The autocorrelation coefficient at lag k for a stationary time
series is

    ρk = E[(yt − μ)(yt+k − μ)] / √(E[(yt − μ)²] E[(yt+k − μ)²]) = Cov(yt, yt+k)/Var(yt) = γk/γ0.    (2.11)
The collection of the values of ρk, k = 0, 1, 2, … is called the autocorrelation function (ACF). Note that by definition ρ0 = 1. Also, the ACF is independent of the scale of measurement of the time series, so it is a dimensionless quantity. Furthermore, ρk = ρ−k; that is, the ACF is symmetric around
zero, so it is only necessary to compute the positive (or negative) half.
If a time series has a finite mean and autocovariance function it is
said to be second-order stationary (or weakly stationary of order 2). If, in
addition, the joint probability distribution of the observations at all times is
multivariate normal, then that would be sufficient to result in a time series
that is strictly stationary.
It is necessary to estimate the autocovariance and ACFs from a time
series of finite length, say, y1 , y2 , … , yT . The usual estimate of the autocovariance function is
    ck = γ̂k = (1/T) Σ_{t=1}^{T−k} (yt − ȳ)(yt+k − ȳ),    k = 0, 1, 2, … , K    (2.12)

and the ACF is estimated by the sample autocorrelation function (or
sample ACF)

    rk = ρ̂k = ck/c0,    k = 0, 1, … , K    (2.13)
A good general rule of thumb is that at least 50 observations are required
to give a reliable estimate of the ACF, and the individual sample autocorrelations should be calculated up to lag K, where K is about T/4.
Often we will need to determine if the autocorrelation coefficient at a
particular lag is zero. This can be done by comparing the sample autocorrelation coefficient at lag k, rk , to its standard error. If we make the
assumption that the observations are uncorrelated, that is, ρk = 0 for all
k, then the variance of the sample autocorrelation coefficient is

    Var(rk) ≅ 1/T    (2.14)

and the standard error is

    se(rk) ≅ 1/√T.    (2.15)
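Equations (2.12)–(2.15) translate directly into code. The sketch below is ours (not Minitab output) and uses only the first eight readings from Table 2.1 to exercise the function:

```python
import math

def sample_acf(y, K):
    """Sample ACF r_k = c_k / c_0, Eqs. (2.12)-(2.13), for lags 0..K."""
    T = len(y)
    ybar = sum(y) / T
    def c(k):
        # sample autocovariance at lag k, with divisor T as in Eq. (2.12)
        return sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(T - k)) / T
    c0 = c(0)
    return [c(k) / c0 for k in range(K + 1)]

# First eight viscosity readings from Table 2.1 (a real ACF would use all 100).
y = [86.7418, 85.3195, 84.7355, 85.1113, 85.1487, 84.4775, 84.6827, 84.6757]
r = sample_acf(y, 2)
se = 1 / math.sqrt(len(y))   # Eq. (2.15): se(r_k) is roughly 1/sqrt(T)
print(r, se)
```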
Example 2.5 Consider the chemical process viscosity readings plotted
in Figure 2.9; the values are listed in Table 2.1.
The sample ACF at lag k = 1 is calculated as
    c0 = (1/100) Σ_{t=1}^{100} (yt − ȳ)(yt − ȳ)
       = (1/100)[(86.7418 − 84.9153)(86.7418 − 84.9153) + ⋯
                 + (85.0572 − 84.9153)(85.0572 − 84.9153)]
       = 280.9332/100 = 2.8093

    c1 = (1/100) Σ_{t=1}^{99} (yt − ȳ)(yt+1 − ȳ)
       = (1/100)[(86.7418 − 84.9153)(85.3195 − 84.9153) + ⋯
                 + (87.0048 − 84.9153)(85.0572 − 84.9153)]
       = 220.3137/100 = 2.2032

    r1 = c1/c0 = 2.2032/2.8093 = 0.7842
A plot and listing of the sample ACFs generated by Minitab for the first
25 lags are displayed in Figures 2.12 and 2.13, respectively.
TABLE 2.1 Chemical Process Viscosity Readings

Time             Time             Time             Time
Period Reading   Period Reading   Period Reading   Period Reading
  1   86.7418     26   87.2397     51   85.5722     76   84.7052
  2   85.3195     27   87.5219     52   83.7935     77   83.8168
  3   84.7355     28   86.4992     53   84.3706     78   82.4171
  4   85.1113     29   85.6050     54   83.3762     79   83.0420
  5   85.1487     30   86.8293     55   84.9975     80   83.6993
  6   84.4775     31   84.5004     56   84.3495     81   82.2033
  7   84.6827     32   84.1844     57   85.3395     82   82.1413
  8   84.6757     33   85.4563     58   86.0503     83   81.7961
  9   86.3169     34   86.1511     59   84.8839     84   82.3241
 10   88.0006     35   86.4142     60   85.4176     85   81.5316
 11   86.2597     36   86.0498     61   84.2309     86   81.7280
 12   85.8286     37   86.6642     62   83.5761     87   82.5375
 13   83.7500     38   84.7289     63   84.1343     88   82.3877
 14   84.4628     39   85.9523     64   82.6974     89   82.4159
 15   84.6476     40   86.8473     65   83.5454     90   82.2102
 16   84.5751     41   88.4250     66   86.4714     91   82.7673
 17   82.2473     42   89.6481     67   86.2143     92   83.1234
 18   83.3774     43   87.8566     68   87.0215     93   83.2203
 19   83.5385     44   88.4997     69   86.6504     94   84.4510
 20   85.1620     45   87.0622     70   85.7082     95   84.9145
 21   83.7881     46   85.1973     71   86.1504     96   85.7609
 22   84.0421     47   85.0767     72   85.8032     97   85.2302
 23   84.1023     48   84.4362     73   85.6197     98   86.7312
 24   84.8495     49   84.2112     74   84.2339     99   87.0048
 25   87.6416     50   85.9952     75   83.5737    100   85.0572
FIGURE 2.12 Sample autocorrelation function for chemical viscosity readings,
with 5% significance limits (autocorrelation, rk, versus lag k = 1, … , 25).
Lag      ACF         T       LBQ
  1    0.784221    7.84     63.36
  2    0.628050    4.21    104.42
  3    0.491587    2.83    129.83
  4    0.362880    1.94    143.82
  5    0.304554    1.57    153.78
  6    0.208979    1.05    158.52
  7    0.164320    0.82    161.48
  8    0.144789    0.72    163.80
  9    0.103625    0.51    165.01
 10    0.066559    0.33    165.51
 11    0.003949    0.02    165.51
 12   −0.077226   −0.38    166.20
 13   −0.051953   −0.25    166.52
 14    0.020525    0.10    166.57
 15    0.072784    0.36    167.21
 16    0.070753    0.35    167.81
 17    0.001334    0.01    167.81
 18   −0.057435   −0.28    168.22
 19   −0.123122   −0.60    170.13
 20   −0.180546   −0.88    174.29
 21   −0.162466   −0.78    177.70
 22   −0.145979   −0.70    180.48
 23   −0.087420   −0.42    181.50
 24   −0.011579   −0.06    181.51
 25    0.063170    0.30    182.06

FIGURE 2.13 Listing of sample autocorrelation functions for first 25 lags of
chemical viscosity readings, Minitab session window output (the definitions of T
and LBQ will be given later).
Note the rate of decrease or decay in ACF values in Figure 2.12 from 0.78
to 0, followed by a sinusoidal pattern about 0. This ACF pattern is typical
of stationary time series. The importance of ACF estimates exceeding the
5% significance limits will be discussed in Chapter 5. In contrast, the plot
of sample ACFs for a time series of random values with constant mean
has a much different appearance. The sample ACFs for pharmaceutical
product sales plotted in Figure 2.14 appear randomly positive or negative,
with values near zero.
While the ACF is strictly speaking defined only for a stationary time
series, the sample ACF can be computed for any time series, so a logical
question is: What does the sample ACF of a nonstationary time series look
like? Consider the daily closing price for Whole Foods Market stock in
Figure 1.7. The sample ACF of this time series is shown in Figure 2.15.
Note that this sample ACF plot behaves quite differently than the ACF
plots in Figures 2.12 and 2.14. Instead of cutting off or tailing off near
zero after a few lags, this sample ACF is very persistent; that is, it decays
very slowly and exhibits sample autocorrelations that are still rather large
even at long lags. This behavior is characteristic of a nonstationary time
series. Generally, if the sample ACF does not dampen out within about 15
to 20 lags, the time series is nonstationary.
FIGURE 2.14 Autocorrelation function for pharmaceutical product sales, with 5%
significance limits (lags k = 1, … , 30).
2.3.3 The Variogram
We have discussed two techniques for determining whether a time series is
nonstationary: plotting a reasonably long series of the data to see if it drifts
or wanders away from its mean for long periods of time, and computing
the sample ACF. However, often in practice there is no clear demarcation
FIGURE 2.15 Autocorrelation function for Whole Foods Market stock price, with
5% significance limits (lags k = 1, … , 60).
between a stationary and a nonstationary process for many real-world time
series. An additional diagnostic tool that is very useful is the variogram.
Suppose that the time series observations are represented by yt . The
variogram Gk measures variances of the differences between observations
that are k lags apart, relative to the variance of the differences that are one
time unit apart (or at lag 1). The variogram is defined mathematically as

    Gk = Var(yt+k − yt) / Var(yt+1 − yt),    k = 1, 2, …    (2.16)

and the values of Gk are plotted as a function of the lag k. If the time series
is stationary, it turns out that

    Gk = (1 − ρk)/(1 − ρ1),

but for a stationary time series ρk → 0 as k increases, so when the variogram
is plotted against lag k, Gk will reach an asymptote 1/(1 − ρ1). However,
if the time series is nonstationary, Gk will increase monotonically.
Estimating the variogram is accomplished by simply applying the usual
sample variance to the differences, taking care to account for the changing
sample sizes when the differences are taken (see Haslett (1997)). Let
    dtk = yt+k − yt,    d̄k = (1/(T − k)) Σ_{t=1}^{T−k} dtk.

Then an estimate of Var(yt+k − yt) is

    sk² = Σ_{t=1}^{T−k} (dtk − d̄k)² / (T − k − 1).

Therefore the sample variogram is given by

    Ĝk = sk²/s1²,    k = 1, 2, …    (2.17)
To illustrate the use of the variogram, consider the chemical process viscosity data plotted in Figure 2.9. Both the data plot and the sample ACF in
Lag   Variogram
  1    1.0000
  2    1.7238
  3    2.3562
  4    2.9527
  5    3.2230
  6    3.6659
  7    3.8729
  8    3.9634
  9    4.1541
 10    4.3259
 11    4.6161
 12    4.9923
 13    4.8752
 14    4.5393
 15    4.2971
 16    4.3065
 17    4.6282
 18    4.9006
 19    5.2050
 20    5.4711
 21    5.3873
 22    5.3109
 23    5.0395
 24    4.6880
 25    4.3416

FIGURE 2.16 JMP output for the sample variogram of the chemical process
viscosity data from Figure 2.9.
Figures 2.12 and 2.13 suggest that the time series is stationary. Figure 2.16
is the variogram. Many software packages do not offer the variogram as
a standard pull-down menu selection, but the JMP package does. Without
software, it is still fairly easy to compute.
Start by computing the successive differences of the time series for a
number of lags and then find their sample variances. The ratios of these
sample variances to the sample variance of the first differences will produce
the sample variogram. The JMP calculations of the sample variogram are
shown in Figure 2.16 and a plot is given in Figure 2.17. Notice that the
sample variogram generally converges to a stable level and then fluctuates
around it. This is consistent with a stationary time series, and it provides
additional evidence that the chemical process viscosity data are stationary.
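The hand calculation just described takes only a few lines of code. The sketch below is ours, not JMP's implementation:

```python
def sample_variogram(y, K):
    """Sample variogram G_k = s_k^2 / s_1^2, Eq. (2.17), for lags 1..K."""
    def s2(k):
        # sample variance of the lag-k differences, divisor T - k - 1
        d = [y[t + k] - y[t] for t in range(len(y) - k)]
        dbar = sum(d) / len(d)
        return sum((x - dbar) ** 2 for x in d) / (len(d) - 1)
    s21 = s2(1)
    return [s2(k) / s21 for k in range(1, K + 1)]

# A short drifting series, just to exercise the function; for a nonstationary
# series the variogram should tend to grow with the lag.
y = [0, 1, 3, 2, 5, 7, 6, 9, 11, 10, 13, 15]
print(sample_variogram(y, 3))  # G_1 is 1 by construction
```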
Now let us see what the sample variogram looks like for a nonstationary
time series. The Whole Foods Market stock price data from Appendix
Table B.7, originally shown in Figure 1.7, are apparently nonstationary: the
series wanders about with no obvious fixed level. The sample ACF in Figure 2.15
decays very slowly and as noted previously, gives the impression that the
time series is nonstationary. The calculations for the variogram from JMP
are shown in Figure 2.18 and the variogram is plotted in Figure 2.19.
FIGURE 2.17 JMP sample variogram of the chemical process viscosity data from
Figure 2.9 (variogram versus lag, k = 1, … , 25).
Lag   Variogram
  1     1.0000
  2     2.0994
  3     3.2106
  4     4.3960
  5     5.4982
  6     6.5810
  7     7.5690
  8     8.5332
  9     9.4704
 10    10.4419
 11    11.4154
 12    12.3452
 13    13.3759
 14    14.4411
 15    15.6184
 16    16.9601
 17    18.2442
 18    19.3782
 19    20.3934
 20    21.3618
 21    22.4010
 22    23.4788
 23    24.5450
 24    25.5906
 25    26.6620

FIGURE 2.18 JMP output for the sample variogram of the Whole Foods Market stock
price data from Figure 1.7 and Appendix Table B.7.
FIGURE 2.19 Sample variogram of the Whole Foods Market stock price data from
Figure 1.7 and Appendix Table B.7 (variogram versus lag, k = 1, … , 25).
Notice that the sample variogram in Figure 2.19 increases monotonically for all 25 lags. This is a strong indication that the time series is
nonstationary.
2.4 USE OF DATA TRANSFORMATIONS AND ADJUSTMENTS
2.4.1 Transformations
Data transformations are useful in many aspects of statistical work, often
for stabilizing the variance of the data. Nonconstant variance is quite common in time series data. For example, the International Sunspot Numbers
plotted in Figure 2.20a show cyclic patterns of varying magnitudes. The
variability from about 1800 to 1830 is smaller than that from about 1830
to 1880; other small periods of constant, but different, variances can also
be identified.
A very popular type of data transformation to deal with nonconstant
variance is the power family of transformations, given by

    y(λ) = (yλ − 1)/(λẏλ−1),   λ ≠ 0
    y(λ) = ẏ ln y,             λ = 0    (2.18)
FIGURE 2.20 Yearly International Sunspot Number, 1700–2006: (a) untransformed
and (b) natural logarithm transformation. Source: SIDC.
where ẏ = exp[(1/T) Σ_{t=1}^{T} ln yt] is the geometric mean of the observations.
If λ = 1, there is no transformation. Typical values of λ used with time
series data are λ = 0.5 (a square root transformation), λ = 0 (the log transformation), λ = −0.5 (reciprocal square root transformation), and λ = −1
(inverse transformation). The divisor ẏλ−1 is simply a scale factor that
ensures that when different models are fit to investigate the utility of different transformations (values of λ), the residual sums of squares for these
models can be meaningfully compared. The reason that λ = 0 implies a log
transformation is that (yλ − 1)/λ approaches the log of y as λ approaches
zero. Often an appropriate value of λ is chosen empirically by fitting a
model to y(λ) for various values of λ and then selecting the transformation
that produces the minimum residual sum of squares.
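Equation (2.18) can be coded directly. The following sketch is ours (the function name is hypothetical):

```python
import math

def power_transform(y, lam):
    """Power family of transformations, Eq. (2.18).
    ydot is the geometric mean of the series; lam = 0 gives the log case."""
    ydot = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [ydot * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * ydot ** (lam - 1)) for v in y]

y = [1.0, 2.0, 4.0, 8.0]
print(power_transform(y, 0.5))  # square root transformation
print(power_transform(y, 0))    # log transformation
```

Note that with λ = 1 the result is just y − 1, a shift with no change of shape, consistent with "no transformation."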
The log transformation is used frequently in situations where the variability in the original time series increases with the average level of the
series. When the standard deviation of the original series increases linearly with the mean, the log transformation is in fact an optimal variance-stabilizing transformation. The log transformation also has a very nice
physical interpretation as percentage change. To illustrate this, let the time
series be y1 , y2 , … , yT and suppose that we are interested in the percentage
change in yt , say,
    xt = 100(yt − yt−1)/yt−1.
The approximate percentage change in yt can be calculated from the
differences of the log-transformed time series xt ≅ 100[ln(yt ) − ln(yt−1 )]
because
    100[ln(yt) − ln(yt−1)] = 100 ln(yt/yt−1)
                           = 100 ln([yt−1 + (yt − yt−1)]/yt−1)
                           = 100 ln(1 + xt/100)
                           ≅ xt

since ln(1 + z) ≅ z when z is small.
The application of a natural logarithm transformation to the International
Sunspot Number, as shown in Figure 2.20b, tends to stabilize the variance
and leaves just a few unusual values.
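The percentage-change approximation is easy to check numerically; a small sketch of ours with made-up prices:

```python
import math

# Compare the exact percentage change with the log-difference approximation.
y = [100.0, 102.0, 99.5, 101.0]   # hypothetical daily closing prices
for t in range(1, len(y)):
    exact = 100 * (y[t] - y[t - 1]) / y[t - 1]
    approx = 100 * (math.log(y[t]) - math.log(y[t - 1]))
    print(f"exact {exact:+.3f}%  log-diff {approx:+.3f}%")
```

For small day-to-day changes the two agree to within a few hundredths of a percentage point, which is why log differences are routinely used as returns.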
2.4.2 Trend and Seasonal Adjustments

In addition to transformations, there are also several types of adjustments
that are useful in time series modeling and forecasting. Two of the most
widely used of these procedures are trend and seasonal decomposition.
A time series that exhibits a trend is a nonstationary time series. Modeling and forecasting of such a time series is greatly simplified if we
can eliminate the trend. One way to do this is to fit a regression model
describing the trend component to the data and then subtracting it out of
the original observations, leaving a set of residuals that are free of trend.
The trend models that ar…