Machine Learning for Time Series Forecasting with Python - Francesca Lazzeri

Python for Time Series Forecasting

In this section, we will look at different Python libraries for time series data and at how pandas, statsmodels, and scikit-learn can help you with data handling, time series modeling, and machine learning, respectively. The Python ecosystem is one of the dominant platforms for applied machine learning.

The primary rationale for adopting Python for time series forecasting is that it is a general-purpose programming language that you can use both for experimentation and in production. It is easy to learn and use, primarily because the language focuses on readability. Python is a dynamic language that is well suited to interactive development and quick prototyping, yet powerful enough to support the development of large applications.

Python is also widely used for machine learning and data science because of its excellent library support. Libraries that are useful for time series work include NumPy, pandas, SciPy, scikit-learn, statsmodels, Matplotlib, datetime, Keras, and many more. Below we take a closer look at the time series libraries in Python that we will use in this book:

 SciPy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Some of the core packages are NumPy (a base n-dimensional array package), SciPy library (a fundamental library for scientific computing), Matplotlib (a comprehensive library for 2-D plotting), IPython (an enhanced interactive console), SymPy (a library for symbolic mathematics), and pandas (a library for data structures and analysis). Two SciPy libraries that provide a foundation for most others are NumPy and Matplotlib:

  NumPy: NumPy is the fundamental package for scientific computing with Python. It contains, among other things, the following:
   A powerful n-dimensional array object
   Sophisticated (broadcasting) functions
   Tools for integrating C/C++ and Fortran code
   Useful linear algebra, Fourier transform, and random number capabilities
  The most up-to-date NumPy documentation can be found at numpy.org/devdocs/. It includes a user guide, full reference documentation, a developer guide, meta information, and "NumPy Enhancement Proposals" (which include the NumPy Roadmap and detailed plans for major new features).

  Matplotlib: Matplotlib is a Python plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits. Matplotlib is useful for generating plots, histograms, power spectra, bar charts, error charts, scatterplots, and so on with just a few lines of code.
  The most up-to-date Matplotlib documentation can be found in the Matplotlib user's guide (matplotlib.org/3.1.1/users/index.html).

 Moreover, three higher-level SciPy libraries provide the key features for time series forecasting in Python: pandas, statsmodels, and scikit-learn, for data handling, time series modeling, and machine learning, respectively:

  Pandas: Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Python has long been great for data munging and preparation, but less so for data analysis and modeling. Pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain-specific language like R.
  The most up-to-date pandas documentation can be found in the pandas user's guide (pandas.pydata.org/pandas-docs/stable/).
  Pandas is a NumFOCUS-sponsored project, which helps ensure the success of its development as a world-class open-source project.
  Pandas does not implement significant modeling functionality; for this, look to statsmodels and scikit-learn below.

  Statsmodels: Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open-source Modified BSD (3-clause) license.
  The most up-to-date statsmodels documentation can be found in the statsmodels user's guide (statsmodels.org/stable/index.html).

  Scikit-learn: Scikit-learn is a simple and efficient tool for data mining and data analysis. In particular, this library implements a range of machine learning, pre-processing, cross-validation, and visualization algorithms using a unified interface. It is built on NumPy, SciPy, and Matplotlib and is released under the open-source Modified BSD (3-clause) license.
  Scikit-learn is focused on machine learning data modeling. It is not concerned with the loading, handling, manipulating, and visualizing of data. For this reason, data scientists usually combine scikit-learn with other libraries, such as NumPy, pandas, and Matplotlib, for data handling, pre-processing, and visualization.
  The most up-to-date scikit-learn documentation can be found in the scikit-learn user's guide (scikit-learn.org/stable/index.html).
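
To make the division of labor between these libraries concrete, here is a minimal sketch of how NumPy, pandas, and scikit-learn might be combined on a toy forecasting task. Everything specific in it is an assumption made for illustration and not taken from the book: the synthetic daily series, the choice of lags (1, 2, and 7 days), the 14-day holdout, and the use of LinearRegression as the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: generate a synthetic daily series (illustrative assumption):
# a linear trend plus a weekly cycle plus Gaussian noise.
rng = np.random.default_rng(0)
n_days = 120
t = np.arange(n_days)
values = 0.1 * t + 2.0 * np.sin(2 * np.pi * t / 7) + rng.normal(0.0, 0.3, n_days)

# pandas: attach a date index and build lag features by shifting the series.
index = pd.date_range("2020-01-01", periods=n_days, freq="D")
frame = pd.DataFrame({"y": values}, index=index)
for lag in (1, 2, 7):
    frame[f"lag_{lag}"] = frame["y"].shift(lag)
frame = frame.dropna()  # the first 7 rows have no complete lag history

# Chronological split: hold out the last 14 days, with no shuffling.
train, test = frame.iloc[:-14], frame.iloc[-14:]
feature_cols = [col for col in frame.columns if col.startswith("lag_")]

# scikit-learn: fit and predict through the unified estimator interface.
model = LinearRegression()
model.fit(train[feature_cols], train["y"])
predictions = pd.Series(
    model.predict(test[feature_cols]), index=test.index, name="prediction"
)

# One-step-ahead error: each test row uses the actually observed lags.
mae = float(np.mean(np.abs(predictions - test["y"])))
print(f"One-step-ahead MAE on the last 14 days: {mae:.3f}")
```

The split is done by position and not at random because the rows are ordered in time: the model should only be evaluated on observations that come after the ones it was trained on.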

In this book, we will also use Keras for time series forecasting:

 Keras: Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Data scientists usually use Keras if they need a deep learning library that does the following:
  Allows for easy and fast prototyping (through user friendliness, modularity, and extensibility)
  Supports both convolutional networks and recurrent networks, as well as combinations of the two
  Runs seamlessly on central processing unit (CPU) and graphics processing unit (GPU)
 The most up-to-date Keras documentation can be found in the Keras user's guide (keras.io).
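
As a rough illustration of the prototyping style described above, the following sketch stacks a convolutional layer and a recurrent layer in a single Keras model and fits it on sliding windows of a toy series. It assumes the Keras version bundled with TensorFlow (imported as tensorflow.keras). The sine-wave data, the window length of 10, the layer sizes, and the helper make_windows are all hypothetical choices for this example, not settings from the book.

```python
import numpy as np
from tensorflow import keras


def make_windows(series, window):
    """Hypothetical helper: slice a 1-D series into (input window, next value) pairs."""
    inputs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    targets = series[window:]
    # Keras sequence layers expect inputs shaped (samples, timesteps, features).
    return inputs[..., np.newaxis], targets


# Toy data (illustrative assumption): a sine wave sampled at 200 points.
series = np.sin(np.linspace(0.0, 20.0, 200)).astype("float32")
window = 10
X, y = make_windows(series, window)

# A convolutional layer followed by a recurrent layer, combined in one model.
model = keras.Sequential([
    keras.Input(shape=(window, 1)),
    keras.layers.Conv1D(8, kernel_size=3, activation="relu"),
    keras.layers.GRU(16),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Two short epochs only show the workflow; they are not a tuned training run.
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
pred = model.predict(X[:5], verbose=0)
print(pred.shape)
```

The same script runs on a CPU or a GPU without code changes, since device placement is handled by the backend.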

Now that you have a better understanding of the different Python packages that we will use in this book to build our end-to-end forecasting solution, we can move to the next and last section of this chapter, which will provide you with general advice for setting up your Python environment for time series forecasting.
