Machine Learning for Time Series Forecasting with Python - Francesca Lazzeri

Flavors of Machine Learning for Time Series Forecasting


In this first section of Chapter 1, we will discover together why time series forecasting is a fundamental cross-industry research area. Moreover, you will learn a few important concepts to deal with time series data, perform time series analysis, and build your time series forecasting solutions.

One example of a time series forecasting solution is the simple extrapolation of a past trend to predict next week's hourly temperatures. Another is the development of a complex linear stochastic model for predicting the movement of short-term interest rates. Time series models have also been used to forecast the demand for airline capacity, seasonal energy demand, and future online sales.

In time series forecasting, data scientists typically assume that no explicit causal model is needed for the variable they are trying to forecast. Instead, they analyze the historical values of a time series in order to understand and predict its future values. The method used to produce a time series forecasting model may involve a simple deterministic model, such as a linear extrapolation, or more complex deep learning approaches.

Due to their applicability to many real-life problems, such as fraud detection, spam email filtering, finance, and medical diagnosis, and their ability to produce actionable results, machine learning and deep learning algorithms have gained a lot of attention in recent years. Generally, deep learning methods have been developed and applied to univariate time series forecasting scenarios, where the time series consists of single observations recorded sequentially over equal time increments (Lazzeri 2019a).

For this reason, they have often performed worse than naïve and classical forecasting methods, such as exponential smoothing and autoregressive integrated moving average (ARIMA). This has led to a general misconception that deep learning models are inefficient in time series forecasting scenarios, and many data scientists wonder whether it's really necessary to add another class of methods, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), to their time series toolkit (we will discuss this in more detail in Chapter 5, “Introduction to Neural Networks for Time Series Forecasting”) (Lazzeri 2019a).

In time series, the chronological arrangement of data is captured in a specific column that is often denoted as time stamp, date, or simply time. As illustrated in Figure 1.2, a machine learning data set is usually a list of data points that are treated equally from a time perspective and are used as input to generate an output, which represents our predictions. By contrast, a time series data set adds a time structure, and every data point takes a specific value that is tied to its position along that temporal dimension.


Figure 1.2: Machine learning data set versus time series data set

Now that you have a better understanding of time series data, it is also important to understand the difference between time series analysis and time series forecasting. These two domains are tightly related, but they serve different purposes: time series analysis is about identifying the intrinsic structure and extrapolating the hidden traits of your time series data in order to get helpful information from it (like trend or seasonal variation—these are all concepts that we will discuss later on in the chapter).

Data scientists usually leverage time series analysis for the following reasons:

 Acquire clear insights into the underlying structures of historical time series data.

 Increase the quality of the interpretation of time series features to better inform the problem domain.

 Preprocess and perform high-quality feature engineering to get a richer and deeper historical data set.

Time series analysis is used for many applications such as process and quality control, utility studies, and census analysis. It is usually considered the first step to analyze and prepare your time series data for the modeling step, which is properly called time series forecasting.

Time series forecasting involves taking machine learning models, training them on historical time series data, and using them to predict future values. As illustrated in Figure 1.3, in time series forecasting the future output is unknown, and the prediction depends on how the machine learning model was trained on the historical input data.


Figure 1.3: Difference between time series analysis historical input data and time series forecasting output data

Different historical and current phenomena may affect the values of your data in a time series, and these influences are identified as components of a time series. It is very important to recognize these different components and decompose the series in order to separate them from the level of the data.

As illustrated in Figure 1.4, there are four main categories of components in time series analysis: long-term movement or trend, seasonal short-term movements, cyclic short-term movements, and random or irregular fluctuations.

Figure 1.4: Components of time series

Let's have a closer look at these four components:

 Long-term movement or trend refers to the overall movement of time series values to increase or decrease during a prolonged time interval. It is common to observe trends changing direction throughout the course of your time series data set: they may increase, decrease, or remain stable at different moments. However, overall you will see one primary trend. Population counts, agricultural production, and items manufactured are just some examples of when trends may come into play.

 There are two different types of short-term movements:

 Seasonal variations are periodic temporal fluctuations that show the same variation and usually recur over a period of less than a year. Seasonality is always of a fixed and known period. Most of the time, this variation will be present in a time series if the data is recorded hourly, daily, weekly, quarterly, or monthly. Different social conventions (such as holidays and festivities), weather seasons, and climatic conditions play an important role in seasonal variations: for example, the sale of umbrellas and raincoats in the rainy season and the sale of air conditioners in summer.

 Cyclic variations, on the other hand, are recurrent patterns that exist when data exhibits rises and falls that are not of a fixed period. One complete period is a cycle, but a cycle does not have a specific predetermined length of time, even if the duration of these temporal fluctuations is usually longer than a year. A classic example of cyclic variation is a business cycle, which is the downward and upward movement of gross domestic product around its long-term growth trend: it can usually last several years, but the duration of the current business cycle is unknown in advance.

As illustrated in Figure 1.5, cyclic variations and seasonal variations are part of the same short-term movements in time series forecasting, but they present differences that data scientists need to identify and leverage in order to build accurate forecasting models:

Figure 1.5: Differences between cyclic variations versus seasonal variations

 Random or irregular fluctuations are the last element to cause variations in our time series data. These fluctuations are uncontrollable, unpredictable, and erratic, and are caused by events such as earthquakes, floods, other natural disasters, and wars.

Data scientists often refer to the first three components (long-term movements, seasonal short-term movements, and cyclic short-term movements) as signals in time series data because they are deterministic indicators that can be derived from the data itself. On the other hand, the last component (random or irregular fluctuations) is an arbitrary variation of the values in your data that you cannot really predict, because each data point of these random fluctuations is independent of the signals above, such as long-term and short-term movements. For this reason, data scientists often refer to it as noise: it is triggered by latent variables that are difficult to observe, as illustrated in Figure 1.6.


Figure 1.6: Actual representation of time series components

Data scientists need to carefully identify to what extent each component is present in the time series data to be able to build an accurate machine learning forecasting solution. In order to recognize and measure these four components, it is recommended to first perform a decomposition process to remove the component effects from the data. After these components are identified and measured, and possibly used to build additional features that improve the forecast accuracy, data scientists can leverage different methods to recompose the series and add the components back to the forecasted results.
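To make the decomposition step concrete, here is a minimal sketch of one common approach, a classical additive decomposition based on a centered moving average. It is an illustration under stated assumptions, not necessarily the procedure used later in the book: the function names `centered_moving_average` and `additive_decompose`, the additive model (value = trend + seasonal + residual), and the synthetic series are all hypothetical choices made for this example.

```python
def centered_moving_average(values, period):
    """Estimate the trend with a centered moving average.

    Positions where the window does not fit are left as None.
    """
    half = period // 2
    trend = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: give half weight to the two end points so the
            # window stays centered on t and still spans one full period.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def additive_decompose(values, period):
    """Split a series into trend, seasonal, and residual parts (additive model).

    Assumes the series is long enough that every seasonal position has at
    least one trend estimate.
    """
    trend = centered_moving_average(values, period)

    # Average the detrended values separately for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    seasonal_means = [sum(b) / len(b) for b in buckets]

    # Center the seasonal effects so they sum to zero over one period.
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]

    seasonal = [seasonal_means[t % period] for t in range(len(values))]
    residual = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic example: linear trend plus a repeating pattern of period 4, no noise.
pattern = [3.0, -1.0, -3.0, 1.0]
series = [10.0 + 0.5 * t + pattern[t % 4] for t in range(24)]

trend, seasonal, residual = additive_decompose(series, period=4)
```

Because the synthetic series contains no noise, the residual should come out close to zero wherever the trend is defined. On real data the residual carries the irregular component, and a cyclic component that is not modeled separately ends up partly in the trend and partly in the residual.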

Understanding these four time series components, and how to identify and remove them, is a strategic first step for building any time series forecasting solution. It also helps with another important concept that may increase the predictive power of your machine learning algorithms: stationarity. Stationarity means that statistical parameters of a time series do not change over time. In other words, basic properties of the time series data distribution, like the mean and variance, remain constant over time. Stationary time series processes are therefore easier to analyze and model, because the basic assumption is that their properties do not depend on time and will be the same in the future as they have been in the previous historical period. For this reason, classical practice is to make your time series stationary before modeling it.

There are two important forms of stationarity: strong stationarity and weak stationarity. A time series is defined as having strong (or strict) stationarity when none of its statistical properties change over time, that is, when its joint distribution is unaffected by a shift in time. A time series is defined as having weak stationarity when its mean and auto-covariance functions do not change over time.
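For readers who prefer notation, the two definitions can be written as follows. This is the standard textbook formulation; the symbols $X_t$ for the series, $\mu$ for its mean, and $\gamma(h)$ for the auto-covariance at lag $h$ are notation introduced here and are not used elsewhere in this section.

```latex
% Strong (strict) stationarity: shifting time by h leaves every
% finite-dimensional joint distribution unchanged.
\[
(X_{t_1}, \dots, X_{t_k}) \overset{d}{=} (X_{t_1+h}, \dots, X_{t_k+h})
\quad \text{for all } k \ge 1,\ t_1, \dots, t_k, \text{ and } h .
\]

% Weak stationarity: constant mean, and an auto-covariance that
% depends only on the lag h, not on the time t.
\[
\mathbb{E}[X_t] = \mu ,
\qquad
\operatorname{Cov}(X_t, X_{t+h}) = \gamma(h)
\quad \text{for all } t \text{ and } h .
\]
```

Setting $h = 0$ in the second condition gives $\operatorname{Var}(X_t) = \gamma(0)$, so a weakly stationary series also has constant variance.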

By contrast, time series that exhibit systematic changes in the values of their data, such as a trend or seasonality, are clearly not stationary, and as a consequence they are more difficult to predict and model. To obtain accurate and consistent forecasts, nonstationary data needs to be transformed into stationary data. Another important reason for trying to render a time series stationary is to obtain meaningful sample statistics, such as means, variances, and correlations with other variables, which can be used to better understand your data and can be included as additional features in your time series data set.
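One widely used way to transform nonstationary data is differencing, which is also the "integrated" part of ARIMA. The sketch below is illustrative only: the helper names `difference` and `half_means` and the toy series are hypothetical, and comparing the means of the two halves of a series is a rough visual check, not a formal stationarity test.

```python
from statistics import mean


def difference(values, lag=1):
    """Return the lag-differenced series: values[t] - values[t - lag]."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def half_means(values):
    """Mean of the first and second half of a series (a rough stability check)."""
    mid = len(values) // 2
    return mean(values[:mid]), mean(values[mid:])


# A series with a linear trend: its mean drifts upward over time.
trended = [5 + 2 * t for t in range(10)]
detrended = difference(trended, lag=1)

# A series with a repeating pattern of period 4 and no trend.
pattern = [3, -1, -3, 1]
seasonal_series = [100 + pattern[t % 4] for t in range(12)]
deseasonalized = difference(seasonal_series, lag=4)

print(half_means(trended))    # the two halves have different means
print(half_means(detrended))  # the two halves have the same mean
print(deseasonalized)         # the repeating pattern cancels out
```

First differencing (lag 1) removes a linear trend, while differencing at the seasonal lag removes a pattern that repeats with that period. The price is that the differenced series is shorter by `lag` observations, and forecasts made on it have to be summed back to return to the original scale.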

However, there are cases where unknown nonlinear relationships cannot be determined by classical methods, such as autoregression, moving average, and autoregressive integrated moving average methods. This information can be very helpful when building machine learning models, and it can be used in feature engineering and feature selection processes. In reality, many economic time series are far from stationary when visualized in their original units of measurement, and even after seasonal adjustment they will typically still exhibit trends, cycles, and other nonstationary characteristics.

Time series forecasting involves developing and using a predictive model on data where there is an ordered relationship between observations. Before data scientists get started with building their forecasting solution, it is highly recommended to define the following forecasting aspects:

 The inputs and outputs of your forecasting model – For data scientists who are about to build a forecasting solution, it is critical to think about the data they have available to make the forecast and what they want to forecast about the future. Inputs are historical time series data provided to feed the model in order to make a forecast about future values. Outputs are the prediction results for a future time step. For example, the last seven days of energy consumption data collected by sensors in an electrical grid is considered input data, while the predicted values of energy consumption to forecast for the next day are defined as output data.

 Granularity level of your forecasting model – Granularity in time series forecasting represents the lowest detailed level of values captured for each time stamp. Granularity is related to the frequency at which time series values are collected: usually, in Internet of Things (IoT) scenarios, data scientists need to handle time series data that has been collected by sensors every few seconds. IoT is typically defined as a group of devices that are connected to the Internet, all collecting, sharing, and storing data. Examples of IoT devices are temperature sensors in an air-conditioning unit and pressure sensors installed on a remote oil pump. Sometimes aggregating your time series data can represent an important step in building and optimizing your time series model: time aggregation is the combination of all data points for a single resource over a specified period (for example, daily, weekly, or monthly). With aggregation, the data points collected during each granularity period are aggregated into a single statistical value, such as the average or the sum of all the collected data points.

 Horizon of your forecasting model – The horizon of your forecasting model is the length of time into the future for which forecasts are to be prepared. These generally vary from short-term forecasting horizons (less than three months) to long-term horizons (more than two years). Short-term forecasting is usually used in short-term objectives such as material requirement planning, scheduling, and budgeting; on the other hand, long-term forecasting is usually used to predict the long-term objectives covering more than five years, such as product diversification, sales, and advertising.

 The endogenous and exogenous features of your forecasting model – Endogenous and exogenous are economic terms to describe internal and external factors, respectively, affecting business production, efficiency, growth, and profitability. Endogenous features are input variables that have values that are determined by other variables in the system, and the output variable depends on them. For example, if data scientists need to build a forecasting model to predict weekly gas prices, they can consider including major travel holidays as endogenous variables, as prices may go up because the cyclical demand is up.

On the other hand, exogenous features are input variables that are not influenced by other variables in the system and on which the output variable depends. Exogenous variables present some common characteristics (Glen 2014), such as these:

 They are fixed when they enter the model.

 They are taken as a given in the model.

 They influence endogenous variables in the model.

 They are not determined by the model.

 They are not explained by the model.

In the example above of predicting weekly gas prices, while the holiday travel schedule increases demand based on cyclical trends, the overall cost of gasoline could be affected by oil reserve prices, sociopolitical conflicts, or disasters such as oil tanker accidents.

 The structured or unstructured features of your forecasting model – Structured data comprises clearly defined data types whose pattern makes them easily searchable, while unstructured data comprises data that is usually not as easily searchable, including formats like audio, video, and social media postings. Structured data usually resides in relational databases, whose fields store length-delineated data such as phone numbers, Social Security numbers, or ZIP codes. Even text strings of variable length like names are contained in records, making it a simple matter to search (Taylor 2018).

Unstructured data has internal structure but is not structured via predefined data models or schema. It may be textual or non-textual, and human or machine generated. Typical human-generated unstructured data includes spreadsheets, presentations, email, and logs. Typical machine-generated unstructured data includes satellite imagery, weather data, landforms, and military movements.

In a time series context, unstructured data doesn't present systematic time-dependent patterns, while structured data shows systematic time-dependent patterns, such as trend and seasonality.

 The univariate or multivariate nature of your forecasting model – Univariate data is characterized by a single variable. It does not deal with causes or relationships. Its descriptive properties can be identified in estimates such as central tendency (mean, mode, median), dispersion (range, variance, maximum, minimum, quartile, and standard deviation), and frequency distributions. Univariate data analysis is limited in that it cannot determine relationships between two or more variables, such as correlations, comparisons, causes, explanations, and contingency between variables. Generally, it does not supply further information on the dependent and independent variables and, as such, is insufficient in any analysis involving more than one variable.

To obtain results from such multiple indicator problems, data scientists usually use multivariate data analysis. This multivariate approach will not only help consider several characteristics in a model but will also bring to light the effect of the external variables.

Time series forecasting can be either univariate or multivariate. The term univariate time series refers to one that consists of single observations recorded sequentially over equal time increments. Unlike other areas of statistics, the univariate time series model contains lag values of itself as independent variables (itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm). These lag variables can play the role of independent variables as in multiple regression. The multivariate time series model is an extension of the univariate case and involves two or more input variables. It does not limit itself to its past information but also incorporates the past of other variables. Multivariate processes arise when several related time series are observed simultaneously over time instead of a single series being observed as in the univariate case.

Examples of univariate time series models are the ARIMA models that we will discuss in Chapter 4, “Introduction to Some Classical Methods for Time Series Forecasting.” Considering this question with regard to inputs and outputs may add a further distinction. The number of variables may differ between the inputs and outputs; that is, the data may not be symmetrical. You may have multiple variables as input to the model and only be interested in predicting one of the variables as output. In this case, there is an assumption in the model that the multiple input variables aid and are required in predicting the single output variable.

 Single-step or multi-step structure of your forecasting model – Time series forecasting typically describes predicting the observation at the next time step. This is called a one-step forecast, as only one time step is to be predicted. In contrast to the one-step forecast are the multiple-step or multi-step time series forecasting problems, where the goal is to predict a sequence of values in a time series. Many time series problems involve the task of predicting a sequence of values using only the values observed in the past (Cheng et al. 2006). Examples of this task include predicting the time series for crop yield, stock prices, traffic volume, and electrical power consumption. There are at least four commonly used strategies for making multi-step forecasts (Brownlee 2017):

 Direct multi-step forecast: The direct method requires creating a separate model for each forecast time stamp. For example, in the case of predicting energy consumption for the next two hours, we would need to develop a model for forecasting energy consumption in the first hour and a separate model for forecasting energy consumption in the second hour.

 Recursive multi-step forecast: Multi-step-ahead forecasting can be handled recursively, where a single time series model is created to forecast the next time stamp, and the following forecasts are then computed using previous forecasts. For example, in the case of forecasting energy consumption for the next two hours, we would need to develop a one-step forecasting model. This model would then be used to predict the next hour's energy consumption, and this prediction would then be used as input to predict the energy consumption in the second hour.

 Direct-recursive hybrid multi-step forecast: The direct and recursive strategies can be combined to offer the benefits of both methods (Brownlee 2017). For example, a distinct model can be built for each future time stamp; however, each model may leverage the forecasts made by models at prior time steps as input values. In the case of predicting energy consumption for the next two hours, two models can be built, and the output from the first model is used as an input for the second model.

 Multiple output forecast: The multiple output strategy requires developing one model that is capable of predicting the entire forecast sequence. For example, in the case of predicting energy consumption for the next two hours, we would develop one model and apply it to predict the next two hours in one single computation (Brownlee 2017).

 Contiguous or noncontiguous time series values of your forecasting model – Time series whose values are separated by a consistent temporal interval (for example, every five minutes, every two hours, or every quarter) are defined as contiguous (Zuo et al. 2019). On the other hand, time series that are not uniform over time may be defined as noncontiguous: very often the reason behind noncontiguous time series is missing or corrupt values. Before jumping to the methods of data imputation, it is important to understand the reason data goes missing. The three most common reasons are these:

 Missing at random: Missing at random means that the propensity for a data point to be missing is not related to the missing data but is related to some of the observed data.

 Missing completely at random: The fact that a certain value is missing has nothing to do with its hypothetical value or with the values of other variables.

 Missing not at random: Two possible reasons are that the missing value depends on the hypothetical value or that the missing value is dependent on some other variable's value.

In the first two cases, it is safe to remove the data with missing values depending upon their occurrences, while in the third case removing observations with missing values can produce a bias in the model. There are different solutions for data imputation depending on the kind of problem you are trying to solve, and it is difficult to provide a general solution. Moreover, because time series data has a temporal property, only some of the statistical methodologies are appropriate for it.

I have identified some of the most commonly used methods and listed them as a structural guide in Figure 1.7.

Figure 1.7: Handling missing data

As you can observe from the graph in Figure 1.7, listwise deletion removes all data for an observation that has one or more missing values. Particularly if the missing data is limited to a small number of observations, you may just opt to eliminate those cases from the analysis. However, in most cases it is disadvantageous to use listwise deletion, because the missing completely at random assumption it relies on rarely holds in practice. As a result, listwise deletion methods produce biased parameters and estimates.

Pairwise deletion analyzes all cases in which the variables of interest are present and thus maximizes the data available on an analysis-by-analysis basis. A strength of this technique is that it increases power in your analysis, but it has many disadvantages. It assumes that the missing data is missing completely at random. If you delete pairwise, you'll end up with different numbers of observations contributing to different parts of your model, which can make interpretation difficult.

Deleting columns is another option, but it is always better to keep data than to discard it. Sometimes you can drop variables if the data is missing for more than 60 percent of the observations, but only if that variable is insignificant. Having said that, imputation is always a preferred choice over dropping variables.

 Regarding time series specific methods, there are a few options:

 Linear interpolation: This method works well for a time series with some trend but is not suitable for seasonal data.

 Seasonal adjustment and linear interpolation: This method works well for data with both trend and seasonality.

 Mean, median, and mode: Computing the overall mean, median, or mode is a very basic imputation method; it is the only one of these methods that takes no advantage of the time series characteristics or the relationship between the variables. It is very fast but has clear disadvantages. One disadvantage is that mean imputation reduces variance in the data set.
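The difference between linear interpolation and mean imputation can be sketched in a few lines of plain Python. This is a minimal illustration rather than a recommended implementation: the function names `interpolate_linear` and `impute_mean`, the use of `None` to mark missing values, and the sample readings are all assumptions made for the example.

```python
from statistics import pvariance


def interpolate_linear(values):
    """Fill interior gaps (None) on a straight line between the nearest
    observed neighbours. Leading and trailing gaps are left as None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def impute_mean(values):
    """Replace every missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    overall_mean = sum(observed) / len(observed)
    return [overall_mean if v is None else v for v in values]


# Equally spaced readings with three missing values (hypothetical data).
readings = [10.0, None, None, 16.0, 18.0, None, 22.0]

interpolated = interpolate_linear(readings)
mean_filled = impute_mean(readings)

observed_only = [v for v in readings if v is not None]
print(interpolated)
print(mean_filled)
print(pvariance(observed_only), pvariance(mean_filled))
```

In this example the interpolated values follow the upward trend of the readings, while mean imputation inserts the same value everywhere and yields a lower variance than the observed data alone, which is the disadvantage noted above.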

In the next section of this chapter, we will discuss how to shape time series as a supervised learning problem and, as a consequence, get access to a large portfolio of linear and nonlinear machine learning algorithms.
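As a preview of that idea, one common way to frame a series as a supervised learning problem is a sliding window: the previous values become the input features and the following values become the target. The sketch below is a hedged illustration, not the book's own code; the function name `make_supervised`, its parameters `n_lags` and `horizon`, and the sample energy values are hypothetical.

```python
def make_supervised(values, n_lags, horizon=1):
    """Turn a series into (inputs, outputs) pairs with a sliding window.

    Each input row holds `n_lags` consecutive past values; each output row
    holds the `horizon` values that immediately follow them.
    """
    inputs, outputs = [], []
    for start in range(len(values) - n_lags - horizon + 1):
        inputs.append(values[start:start + n_lags])
        outputs.append(values[start + n_lags:start + n_lags + horizon])
    return inputs, outputs


# Ten days of daily energy consumption (hypothetical values).
daily_energy = [31, 29, 30, 35, 40, 38, 33, 32, 30, 31]

# One-step framing: the last seven days as input, the next day as output.
X, y = make_supervised(daily_energy, n_lags=7, horizon=1)

# Multiple output framing: the same inputs, but the next two days as output.
X2, y2 = make_supervised(daily_energy, n_lags=7, horizon=2)

for features, target in zip(X, y):
    print(features, "->", target)
```

With `horizon=1` this matches the one-step setting described earlier, where seven days of consumption are the input and the next day is the output. A larger `horizon` produces the targets needed for the multiple output strategy. Rows are built in chronological order, so any split into training and test sets should also respect time order.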
