Large-Dimensional Panel Data Econometrics - Chihwa Kao
Chapter 1 Introduction
This book is motivated by recent developments in high-dimensional panel data models with a large number of individuals/countries (n) and observations over time (T). Specifically, the following four chapters introduce four important research topics in large panels: testing for cross-sectional dependence, estimation of factor-augmented panel data models, structural changes, and group patterns in panels. To address these issues, we examine the properties of traditional tests and estimators in a large-dimensional setup. We also take advantage of some techniques from Random Matrix Theory and Machine Learning.
Chapter 2 covers testing for cross-sectional dependence in panel data regression models with large n and large T. Cross-sectional dependence, described as the interaction between cross-sectional units (e.g., households, firms and states), has been well discussed in the spatial econometrics literature. Intuitively, dependence across “space” can be regarded as the counterpart of serial correlation in time series. It could arise from the behavioral interaction between individuals, e.g., imitation and learning among consumers in a community, or firms in the same industry. This has been widely studied in game theory and industrial organization. It could also be due to unobservable common factors or common shocks, which are popular in macroeconomics.
In recent literature, cross-sectional dependence among individuals is a concern when n is large. As with serial correlation in time-series analysis, cross-sectional dependence/correlation leads to efficiency loss for least squares and invalidates conventional t-tests and F-tests which use standard variance–covariance estimators. In some cases, it could potentially result in inconsistent estimators (Lee, 2002; Andrews, 2005). Several estimators have been proposed to deal with cross-sectional dependence, including the popular spatial methods (Anselin, 1988; Anselin and Bera, 1998; Kelejian and Prucha, 1999; Kapoor, Kelejian and Prucha, 2007; Lee, 2007; Lee and Yu, 2010), and factor models in panel data (Pesaran, 2006; Kapetanios, Pesaran and Yamagata, 2011; Bai, 2009). However, before imposing any structure on the disturbances of our model, it may be wise to test for the existence of cross-sectional dependence.
There has been a lot of work on testing for cross-sectional dependence in the spatial econometrics literature; see Anselin and Bera (1998) for cross-sectional data and Baltagi, Song and Koh (2003) for panel data, to mention a few. The latter derives a joint Lagrange multiplier (LM) test for the existence of spatial error correlation as well as random region effects in a panel data regression model. Panel data provide richer information on the covariance matrix of the errors than cross-sectional data. This is especially relevant for the off-diagonal elements, which are of particular importance in determining cross-sectional dependence. With panel data one can test for cross-sectional dependence without imposing ad hoc specifications on the error structure generating the covariance matrix, e.g., the spatial autoregressive model in the spatial literature, or the single or multiple factor structures imposed on the errors in the macro literature. Ng (2006) and Pesaran (2004) propose two test procedures based on the sample covariance matrix in panel data. Ng (2006) develops a test tool using a spacing method in a panel model. Pesaran (2004) proposes a cross-sectional dependence (CD) test using the pairwise average of the off-diagonal sample correlation coefficients in a seemingly unrelated regressions model. The CD test is closely related to the RAVE test statistic advanced by Frees (1995). Unlike the traditional Breusch and Pagan (1980) LM test, the CD test is applicable for a large number of cross-sectional units (n) observed over T time periods. In Pesaran (2015), the CD test is interpreted as a test for weak cross-sectional dependence. Sarafidis, Yamagata and Robertson (2009) develop a test for cross-sectional dependence based on Sargan’s difference test in a linear dynamic panel data model, in which the error cross-sectional dependence is modeled by a multifactor structure. Hsiao, Pesaran and Pick (2012) propose an LM-type test for nonlinear panel data models.
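To make the two correlation-based statistics concrete, the following is a minimal sketch of the Breusch and Pagan (1980) LM statistic and the Pesaran (2004) CD statistic computed from a T x n matrix of regression residuals. The function names and the residual layout (one column per cross-sectional unit) are our own illustrative assumptions, not notation from the book. The pairwise correlations are formed without re-centering, on the assumption that the residuals come from regressions that include an intercept.

```python
import numpy as np


def pairwise_correlations(resid):
    """Off-diagonal sample correlations of the residuals.

    resid: (T, n) array, one column per cross-sectional unit (assumed layout).
    Residuals are not re-centered: rho_ij = sum_t e_it e_jt / (|e_i| |e_j|).
    """
    cross = resid.T @ resid                    # n x n cross-product matrix
    norms = np.sqrt(np.diag(cross))
    corr = cross / np.outer(norms, norms)
    upper = np.triu_indices(corr.shape[0], k=1)  # pairs with i < j
    return corr[upper]


def lm_statistic(resid):
    """Breusch-Pagan LM statistic: T times the sum of squared correlations."""
    T = resid.shape[0]
    rho = pairwise_correlations(resid)
    return T * np.sum(rho ** 2)


def cd_statistic(resid):
    """Pesaran CD statistic: scaled sum of the pairwise correlations."""
    T, n = resid.shape
    rho = pairwise_correlations(resid)
    return np.sqrt(2.0 * T / (n * (n - 1))) * np.sum(rho)
```

Under the null of no cross-sectional dependence, the CD statistic is compared with standard normal critical values, while the LM statistic is compared with a chi-squared distribution with n(n - 1)/2 degrees of freedom when n is fixed and T is large.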
For a recent survey of some cross-sectional dependence tests in panels, see Moscone and Tosetti (2009). Baltagi, Feng and Kao (2011) propose a test for sphericity following John (1972) and Ledoit and Wolf (2002) in the statistics literature. Sphericity means that the variance–covariance matrix is proportional to the identity matrix. The rejection of the null could be due to cross-sectional dependence or heteroskedasticity or both.
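As an illustration of what a sphericity statistic measures, here is a minimal sketch of the statistic usually attributed to John (1972), in the form U = (1/n) tr[(S / ((1/n) tr S) - I)^2], where S is the n x n sample covariance matrix. This form and the function name `john_u` are stated here as assumptions for illustration; the test actually studied in the cited papers involves further scaling and centering that are not reproduced here.

```python
import numpy as np


def john_u(S):
    """Distance of a covariance matrix from a multiple of the identity.

    S: (n, n) symmetric sample covariance matrix.
    Returns 0 exactly when S is proportional to the identity matrix.
    """
    n = S.shape[0]
    scaled = S / (np.trace(S) / n)   # normalize so the average variance is 1
    dev = scaled - np.eye(n)
    return np.trace(dev @ dev) / n
```

The statistic grows both when the diagonal elements differ (heteroskedasticity) and when the off-diagonal elements are nonzero (cross-sectional dependence), which is why a rejection cannot by itself distinguish between the two.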
Based on Baltagi, Feng and Kao (2012), Chapter 2 discusses testing procedures in fixed effects panel data models, including static and dynamic cases. It is well known that the standard Breusch and Pagan (1980) LM test for cross-equation correlation in a SUR model is not appropriate for testing cross-sectional dependence in panel data models when n is large and T is small. We derive the asymptotic bias of a scaled version of the LM test in the context of a fixed effects panel data model. This asymptotic bias is found to be a constant related to n and T, which suggests a simple bias-corrected LM test for the null hypothesis.
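The bias correction can be sketched as follows. The scaled LM statistic is taken to be sqrt(1 / (n(n - 1))) times the sum over pairs i < j of (T rho_ij^2 - 1), and the correction term n / (2(T - 1)) is the form we understand to be reported in Baltagi, Feng and Kao (2012). Both are stated here as assumptions and should be checked against Chapter 2. The function names and the (T, n) residual layout are hypothetical.

```python
import numpy as np


def scaled_lm(resid):
    """Scaled LM statistic from a (T, n) matrix of within (fixed effects) residuals."""
    T, n = resid.shape
    cross = resid.T @ resid
    norms = np.sqrt(np.diag(cross))
    corr = cross / np.outer(norms, norms)
    rho = corr[np.triu_indices(n, k=1)]          # pairwise correlations, i < j
    return np.sqrt(1.0 / (n * (n - 1))) * np.sum(T * rho ** 2 - 1.0)


def bias_corrected_lm(resid):
    """Scaled LM minus the assumed asymptotic bias n / (2 (T - 1))."""
    T, n = resid.shape
    return scaled_lm(resid) - n / (2.0 * (T - 1))
```

The corrected statistic would then be compared with standard normal critical values; the correction matters most when n is large relative to T.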
There are two ways of modeling cross-sectional dependence: spatial models and factor models. In Chapter 3, we introduce three leading approaches to estimating large panel data regression models with an error factor structure: the common correlated effects (CCE) approach proposed by Pesaran (2006), Bai’s (2009) iterated principal components (IPC) approach and the maximum likelihood estimation (MLE) method proposed by Bai and Li (2014). The use of these approaches is illustrated by an empirical example in the context of the productivity of infrastructure investment in China.
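The idea behind the CCE approach can be illustrated with a minimal sketch of a mean group version for a single regressor: each unit's regression is augmented with the cross-sectional averages of the dependent variable and the regressor, which act as proxies for the unobserved factors, and the unit-specific slopes are then averaged. The data-generating process, the function names and all parameter values below are illustrative assumptions, not the book's empirical setup.

```python
import numpy as np


def cce_mean_group(y, x):
    """CCE mean group slope for a single-regressor panel.

    y, x: (T, n) arrays (assumed layout, one column per unit).
    """
    T, n = y.shape
    ybar = y.mean(axis=1)   # cross-sectional average of y at each t
    xbar = x.mean(axis=1)   # cross-sectional average of x at each t
    slopes = np.empty(n)
    for i in range(n):
        # Augment unit i's regression with an intercept and the averages.
        Z = np.column_stack([x[:, i], np.ones(T), ybar, xbar])
        coef, *_ = np.linalg.lstsq(Z, y[:, i], rcond=None)
        slopes[i] = coef[0]
    return slopes.mean()


def simulate_factor_panel(n=50, T=100, beta=1.0, seed=0):
    """Hypothetical one-factor panel in which x is correlated with the factor."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal((T, 1))                 # common factor
    lam = 1.0 + 0.5 * rng.standard_normal((1, n))   # loadings in the error
    gam = 1.0 + 0.5 * rng.standard_normal((1, n))   # loadings in the regressor
    x = f * gam + rng.standard_normal((T, n))
    y = beta * x + f * lam + rng.standard_normal((T, n))
    return y, x


y, x = simulate_factor_panel()
beta_cce = cce_mean_group(y, x)
```

In this simulated design, a regression that ignores the factor would be biased because the regressor and the error share the same factor, while the augmented regressions should recover a slope close to the value of `beta` used in the simulation.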
Chapter 4 examines the issue of structural changes in large panel data regression models. In the literature on panel data models with a large time dimension, e.g., Kao (1999), Phillips and Moon (1999), Hahn and Kuersteiner (2002), Alvarez and Arellano (2003), Phillips and Sul (2007), Pesaran and Yamagata (2008), Hayakawa (2009), to name a few, the implicit assumption is that the slope coefficients are constant over time. However, due to policy implementation or technological shocks, structural breaks are possible, especially for panels with a long time span. Ignoring such breaks may lead to inconsistent estimation and invalid inference.
Based on Baltagi, Feng and Kao (2016, 2019), Chapter 4 extends Pesaran’s (2006) work on CCE estimators for large heterogeneous panels with a general multifactor error structure by allowing for unknown common structural breaks in slopes and unobserved factor structure. We propose a general framework that includes heterogeneous panel data models and structural break models as special cases. The least squares method proposed by Bai (1997a, 2010) is applied to estimate the common change points, and the consistency of the estimated change points is established. We find that the CCE estimators have the same asymptotic distribution as if the true change points were known. Additionally, Monte Carlo simulations are used to verify the main findings.
By considering both cross-sectional dependence and structural breaks in a general panel data model, this chapter also contributes to the change point literature in several ways. First, it extends Bai’s (1997a) time-series regression model to heterogeneous panels, showing that the consistency of estimated change points can be achieved with the information along the cross-sectional dimension. This result confirms the findings of Bai (2010) and Kim (2011). Second, it also enriches the analysis of common breaks of Bai (2010) and Kim (2011) in a panel mean-shift model and a panel deterministic time trend model by extending them to a regression model using panel data. This makes it possible to allow for structural breaks and cross-sectional dependence in empirical work using panel regressions. In particular, our methods can be applied to regression models using large stationary panel data, such as country-level panels and state/provincial-level panels.
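The least squares approach to estimating a common change point can be sketched in the simplest case, a panel mean-shift model in which every unit's mean changes at the same unknown date. For each candidate date, the sum of squared residuals is computed after fitting separate pre-break and post-break means for each unit, and the estimated change point is the minimizer. This is a minimal sketch under that simplified model, not the CCE-based procedure of Chapter 4; the function name and data layout are hypothetical.

```python
import numpy as np


def estimate_common_break(y):
    """Least squares estimate of a common break date in a panel mean-shift model.

    y: (T, n) array (assumed layout). Returns k in 1..T-1, the number of
    pre-break periods that minimizes the pooled sum of squared residuals.
    """
    T, _ = y.shape
    best_k, best_ssr = None, np.inf
    for k in range(1, T):
        pre, post = y[:k], y[k:]
        # Unit-specific means on each side of the candidate break.
        ssr = np.sum((pre - pre.mean(axis=0)) ** 2)
        ssr += np.sum((post - post.mean(axis=0)) ** 2)
        if ssr < best_ssr:
            best_k, best_ssr = k, ssr
    return best_k
```

Because the objective pools squared residuals over all units, each additional unit that shares the break adds information about its location, which is the intuition for why the cross-sectional dimension helps pin down the change point.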
Regarding the estimation of common breaks in panels, Feng, Kao and Lazarova (2009) and Baltagi, Kao and Liu (2012) also show the consistency of the estimated change point in a simple panel regression model. Hsu and Lin (2012) examine the consistency properties of the change point estimators for non-stationary panels. More recently, Qian and Su (2016) and Li, Qian and Su (2016) study the estimation and inference of common breaks in panel data models with and without interactive fixed effects using Lasso-type methods. Westerlund (2019) establishes the consistency of the least squares estimator of the break point in a mean-shift model with fixed T, using the CCE approach to deal with unobserved error factors. In terms of detecting structural breaks in panels, some recent literature includes Horváth and Hušková (2012) in a panel mean-shift model with and without cross-sectional dependence, De Wachter and Tzavalis (2012) in dynamic panels, Pauwels, Chan and Mancini-Griffoli (2012) in heterogeneous panels, and Oka and Perron (2018) in multiple equation systems, to name a few.
Chapter 5 studies heterogeneity and grouping issues in large-dimensional panel data models. When a large number of individuals/countries are involved in the regression, it is costly to allow for individual unobserved heterogeneity, for example, fixed effects, which may lead to the incidental parameter problem in the regression. One way to balance modeling heterogeneity against incidental parameters is grouping. With within-group homogeneity and cross-group differences, we can still allow for a certain degree of heterogeneity and avoid the incidental parameter problem.