WP/08/75
Testing for Structural Breaks in Small Samples
Sergei Antoshin, Andrew Berg, and Marcos Souto
© 2008 International Monetary Fund
IMF Working Paper
African Department
Prepared by Sergei Antoshin, Andrew Berg, and Marcos Souto1
March 2008
Abstract
This Working Paper should not be reported as representing the views of the IMF.
The views expressed in this Working Paper are those of the author(s) and do not necessarily represent
those of the IMF or IMF policy. Working Papers describe research in progress by the author(s) and are
published to elicit comments and to further debate.
In a recent paper, Bai and Perron (2006) demonstrate that their approach for testing for multiple structural breaks in time series works well in large samples, but they find substantial deviations in both the size and power of their tests in smaller samples. We propose modifying their methodology to deal with small samples by using Monte Carlo simulations to determine sample-specific critical values under the null each time the test is run. We draw on the results of our simulations to offer practical suggestions on handling serial correlation, model misspecification, and the use of alternative test statistics for sequential testing. We show that, for most types of data generating processes in samples with as few as 50 observations, our proposed modifications perform substantially better.
JEL Classification Numbers: C29, C39, C59
Keywords: Structural breaks, small samples, Monte Carlo simulation.
Author’s E-Mail Address: santoshin@imf.org, aberg@imf.org, msouto@imf.org.
1 We thank, without implication, Jonathan Ostry and Jeromin Zettelmeyer, who suggested the sequential UDmax procedure presented here, for numerous useful conversations, and Jushan Bai, Sam Ouliaris, Zhongjun Qu, and Pierre Perron for useful comments. Zhongjun Qu also guided us through Perron’s code on structural breaks, which we have modified and used extensively in this study.
Contents
I. Introduction
II. The BP Methodology
III. A Modified BP Methodology for Small Samples
IV. Results
V. Summary and Conclusion
Tables
1. Size Tests: Wiener DGP with No Breaks
2. Size Tests with Autocorrelated DGP with No Breaks and Parametric Estimation
3. Size Tests with Autocorrelated DGP with No Breaks and Standard Errors Robust to Serial Correlation
4. Size Tests with DGP with No Breaks and Over- and Under-Specification of Degree of Autocorrelation
5. Power Tests
Appendix
References
I. INTRODUCTION
In a series of influential papers, Bai and Perron (1998, 2003a and 2003b, henceforth
BP) developed a methodology for finding multiple structural breaks in time series and testing
for their statistical significance. The simulation analysis conducted in BP (2006)
demonstrates that the size and power of their tests can be significantly distorted by several
factors, such as 1) a small sample size, 2) a small break size, 3) a small segment size and clustering of breaks, and 4) the use of heteroskedasticity and autocorrelation corrections.
In this paper, we extend the BP methodology in several directions, all aimed at improving performance in small samples (time series with as few as 50 observations). First, in tests for the significance of structural breaks, we propose to use critical values that are specific to the time series in question and obtained by bootstrap, instead of relying on the asymptotic critical values. The asymptotic critical values in BP are generated for Wiener (white Gaussian noise) processes with a large number of observations, and can cause considerable distortions in the test size and power for small samples with a non-Wiener data generating process.
We instead estimate a “mimicking process” from the data under the null and bootstrap
critical values at each step of the sequential procedure, under the corresponding null
hypothesis. The use of bootstrapped segment-specific residuals allows us to calculate sample-size-specific critical values, relax the assumption that the residuals are normally distributed, and account for segmental heteroskedasticity.
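A minimal Python sketch of the bootstrap idea follows. It assumes a pure mean-shift model (a constant as the only regressor), a null of no breaks, iid resampling of the residuals from the no-break fit, and the sup F(0,1) statistic with a minimum segment length `h`. The function names are hypothetical, and the procedure described in this paper (segment-specific residuals, serial correlation, every step of the sequential test) is richer than this sketch.

```python
import random


def seg_ssr(y):
    """Sum of squared deviations of y from its own mean (0.0 if empty)."""
    if not y:
        return 0.0
    mu = sum(y) / len(y)
    return sum((v - mu) ** 2 for v in y)


def sup_f_one_break(y, h):
    """sup F(0,1) for a pure mean-shift model with minimum segment length h."""
    T = len(y)
    ssr0 = seg_ssr(y)
    best = 0.0
    for k in range(h, T - h + 1):  # k observations in the first regime
        ssr1 = seg_ssr(y[:k]) + seg_ssr(y[k:])
        if ssr1 > 0:
            best = max(best, (ssr0 - ssr1) / (ssr1 / (T - 2)))
    return best


def bootstrap_critical_value(y, h, alpha=0.05, reps=999, seed=0):
    """Sample-specific critical value for sup F(0,1) (hypothetical helper).

    Assumption: the 'mimicking process' under the null is a constant mean,
    and its residuals are resampled with replacement (iid bootstrap)."""
    rng = random.Random(seed)
    mu = sum(y) / len(y)
    resid = [v - mu for v in y]
    stats = sorted(
        sup_f_one_break([mu + rng.choice(resid) for _ in y], h)
        for _ in range(reps)
    )
    return stats[min(reps - 1, int((1 - alpha) * reps))]
```

The resulting value plays the role of the tabulated asymptotic critical value: `sup_f_one_break(y, h)` is compared with `bootstrap_critical_value(y, h)`, so the threshold reflects the actual sample length and residual distribution of `y`.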
Second, we address the issue of misspecification of the data generating process. In the
presence of serial correlation, BP consider two alternative approaches to modeling the
underlying data generation process. The first approach is to model the process explicitly
(e.g., as an AR(1)), so that the error terms are independently identically distributed (iid). The
second approach is to model the process in a simple way (e.g., as a Wiener process), and to
use a heteroskedasticity-autocorrelation-consistent (HAC) correction. In the general case,
when the nature of the process is unknown, the first approach may yield over-specified tests,
while the second may yield underspecified statistics in small samples.
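The first approach can be illustrated with a minimal sketch that assumes an AR(1) with intercept and no breaks, estimating $y_t = c + \rho y_{t-1} + e_t$ by ordinary least squares. If the AR(1) is the correct specification, the returned residuals are approximately iid. The function name `fit_ar1` is hypothetical.

```python
def fit_ar1(y):
    """OLS fit of y_t = c + rho * y_{t-1} + e_t (assumed AR(1), no breaks).

    Returns (c, rho, residuals); the residuals have length len(y) - 1."""
    x = y[:-1]  # lagged values y_{t-1}
    z = y[1:]   # current values y_t
    n = len(z)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    rho = sxz / sxx
    c = mz - rho * mx
    resid = [b - c - rho * a for a, b in zip(x, z)]
    return c, rho, resid
```

Fitting a richer lag structure than the data require is one way the explicit approach can end up over-specified; the HAC route avoids choosing a lag order for the model but relies on a variance correction instead.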
Third, we examine the small-sample performance of the two statistics put forward by
BP for testing for an unknown number of multiple breaks. After finding a first break, BP
suggest testing sequentially for two breaks versus a null of one break by testing for the
existence of one break in each of the two segments formed by the initial break (the sequential
supF test), and so on, until the null hypothesis is not rejected. We compare this approach to a
variant that uses another BP statistic to test for any number of breaks in each segment (the
sequential Dmax test). We show that the performance of the sequential supF test can be poor when the segment size becomes small.
We focus on a sample size of 50 observations, where the true number of breaks is as
high as two. This case is partly inspired by a companion paper (Berg, Ostry and Zettelmeyer, 2006), which uses the techniques presented here to characterize and analyze breaks in annual
per capita GDP growth for a broad sample of countries.
The rest of the paper is organized as follows. In Section II, we briefly review the BP
methodology, focusing on the empirical procedure and simulation analysis. In Section III, we
outline our strategy on how to modify and apply the BP methodology for small samples. The
results from our Monte Carlo simulations are presented in Section IV. Section V includes
discussion and concluding remarks.
II. THE BP METHODOLOGY
Drawing heavily on Bai and Perron (1998, 2001), we summarize the main
elements of their methodology for estimating and testing linear models for multiple structural
changes, focusing on the ones that are most relevant to our analysis in Section III.
The BP methodology can be separated into two independent parts. First, one can identify any number of breaks in a time series, regardless of statistical significance. Second, once the breaks have been identified, BP propose a series of statistics to test for the statistical significance of these breaks, using asymptotic critical values. As we shall see in more detail below, these tests can show significant distortions in both size and power, especially in short time series (with as few as 50 observations).
It is worth stressing one finite-sample complication involved in testing for the
statistical significance of a set of breaks, which forms the second part of the BP
methodology. The usual method is to use the F ratio that compares the SSR for the restricted versus the unrestricted model. For example, in testing for the presence of one break, the F ratio is based on the ratio of the SSR for 0 breaks to the SSR for one break. Because the breaks are found through a global minimization procedure, there are instances when the set of t breaks is not a subset of the set of t+1 breaks. In this case, the hypothesis of t+1 breaks does not nest the hypothesis of t breaks, and the ratio $SSR_t / SSR_{t+1}$ does not have the property of asymptotic convergence to the F-distribution. In particular, its asymptotic distribution depends on sample-specific parameters, such as the size of the break.
BP propose to overcome this problem by always testing for the presence of one break versus 0 breaks in the segments between breaks, thus avoiding the issue of non-nested hypotheses. But this solution comes at a price, particularly when dealing with an already short time series: the segments will be even shorter, and the statistics will need to be computed with just a few observations.
One important advantage of the BP framework is its capability of allowing for
autocorrelation and heteroskedasticity in the time series, as compared to other breaks
selection procedures that cannot accommodate these features (e.g., the Bayesian Information Criterion proposed by Yao (1988) and the modified Schwarz criterion proposed by Liu et al. (1997)). This feature is of particular importance in the BP methodology, as their statistics use asymptotic critical values that are generated for a Wiener process. To deal with autocorrelation, BP propose either to correct the residuals non-parametrically, through a Newey-West procedure, or to include the lag of the time series as one of the regressors in the projection model.
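The non-parametric correction can be illustrated with the Bartlett-kernel (Newey-West) estimator of the long-run variance of a residual series, $\hat\Omega = \hat\gamma_0 + 2\sum_{j=1}^{L} (1 - j/(L+1))\hat\gamma_j$. The sketch below assumes mean-zero residuals and a user-chosen lag truncation `lags`; the function name is hypothetical and this is not BP's code. In a sample of 50 observations the choice of `lags` can matter noticeably.

```python
def newey_west_lrv(u, lags):
    """Bartlett-kernel (Newey-West) long-run variance of a mean-zero series u.

    Autocovariances are scaled by 1/T; weights decline linearly to zero."""
    T = len(u)
    gamma0 = sum(v * v for v in u) / T
    lrv = gamma0
    for j in range(1, lags + 1):
        gamma_j = sum(u[t] * u[t - j] for t in range(j, T)) / T
        weight = 1.0 - j / (lags + 1.0)  # Bartlett weight
        lrv += 2.0 * weight * gamma_j
    return lrv
```

With `lags=0` the estimator collapses to the ordinary (iid) variance estimate; for a series alternating between 1 and -1, strong negative autocorrelation pulls the long-run variance well below the raw variance.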
BP provide tables with asymptotic critical values for all statistics (at the conventional significance levels) for a Wiener process. When dealing with shorter time series, BP recommend using a larger minimum segment size relative to the sample size. BP also suggest using the autocorrelation and heteroskedasticity corrections only when there is a strong prior that they are necessary.
A. The model
BP adopt the following model:
$y_t = x_t'\beta + z_t'\delta_j + u_t, \qquad (1)$
for $j = 1, \ldots, m+1$, where $m$ is the number of breaks, $y_t$ is the dependent variable, $x_t$ and $z_t$ are vectors of covariates, $\beta$ and $\delta_j$ are the corresponding vectors of coefficients, and $u_t$ is the disturbance term.
This model has some interesting features. First, it allows for the joint estimation of the regression coefficients, through the term $x_t'\beta$, along with the identification of structural changes, captured through the term $z_t'\delta_j$, which may be useful for several applications. Second, equation (1) represents a partial structural change model, since the parameter vector $\beta$ is not subject to shifts and is estimated using the entire sample. Dropping the term $x_t'\beta$ from equation (1) results in a pure structural change model, in which all coefficients are subject to change; this is the model used for the analysis in this paper. Finally, $u_t$ can be non-iid under the null.
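For simulation work, the pure structural change version of (1) with a constant as the only regressor ($z_t = 1$, so that $\delta_j$ is the mean of regime $j$) can be drawn as below. This is a sketch assuming iid Gaussian errors, i.e., the white-noise case the paper refers to as a Wiener process; the name `simulate_mean_shift` is hypothetical.

```python
import random


def simulate_mean_shift(T, breaks, means, sigma=1.0, seed=None):
    """Draw y_t = delta_j + u_t with u_t ~ iid N(0, sigma^2).

    `breaks` holds T_1 < ... < T_m, the last date (1-based) of each regime;
    `means` holds delta_1, ..., delta_{m+1}. Assumes z_t = 1 and no x_t."""
    if len(means) != len(breaks) + 1:
        raise ValueError("need one mean per regime")
    rng = random.Random(seed)
    bounds = list(breaks) + [T]  # regime j ends at date bounds[j]
    y, j = [], 0
    for t in range(1, T + 1):
        while t > bounds[j]:
            j += 1  # move to the next regime
        y.append(means[j] + rng.gauss(0.0, sigma))
    return y
```

Setting `sigma=0.0` returns the noiseless step function, which is a convenient check that the break dates are placed as intended.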
For locating the breaks, BP propose two approaches based on (1). In the first, global, approach, the partition for each number of breaks $m$ is obtained as the one that minimizes the sum of squared residuals (SSR). In other words, the break locations $T_i$, $i = 1, \ldots, m$, are determined so as to minimize
$\sum_{i=1}^{m+1} \sum_{t=T_{i-1}+1}^{T_i} [y_t - x_t'\beta - z_t'\delta_i]^2$,
with the convention $T_0 = 0$ and $T_{m+1} = T$. BP use a dynamic programming algorithm to reduce the computational cost of finding the global SSR-minimizing breaks.
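The global minimization can be sketched with a simple dynamic program for the pure mean-shift case ($z_t = 1$, no $x_t$). It assumes a minimum segment length `h` and uses cumulative sums so that the SSR of any segment costs O(1). The function `global_breaks` is hypothetical and only in the spirit of BP's algorithm, which handles general regressors.

```python
def global_breaks(y, m, h):
    """SSR-minimizing m break dates for a pure mean-shift model (sketch).

    Returns (ssr, [T_1, ..., T_m]), T_i being the last date (1-based) of
    regime i; every regime must contain at least h observations."""
    T = len(y)
    s1, s2 = [0.0], [0.0]  # cumulative sums of y and y^2
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssr(i, j):  # SSR of y[i:j] around its own mean
        tot = s1[j] - s1[i]
        return (s2[j] - s2[i]) - tot * tot / (j - i)

    INF = float("inf")
    # cost[k][j]: min SSR of y[:j] with k breaks; arg[k][j]: position of last break
    cost = [[INF] * (T + 1) for _ in range(m + 1)]
    arg = [[0] * (T + 1) for _ in range(m + 1)]
    for j in range(h, T + 1):
        cost[0][j] = ssr(0, j)
    for k in range(1, m + 1):
        for j in range((k + 1) * h, T + 1):
            for i in range(k * h, j - h + 1):
                c = cost[k - 1][i] + ssr(i, j)
                if c < cost[k][j]:
                    cost[k][j], arg[k][j] = c, i
    if cost[m][T] == INF:
        raise ValueError("sample too short for m breaks with segment length h")
    breaks, j = [], T
    for k in range(m, 0, -1):  # backtrack from the last break
        j = arg[k][j]
        breaks.append(j)
    return cost[m][T], sorted(breaks)
```

The cost is O(mT²), which is negligible for T = 50. Because each number of breaks is optimized separately, nothing forces the m-break solution to contain the (m-1)-break solution.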
In the second approach, breaks are determined sequentially, starting with the single break that minimizes the SSR. Then, for each of the two resulting segments, the single break that minimizes the SSR within that segment is determined. The second break is the one, of these two candidates, that yields the lower overall SSR. This process is repeated sequentially to find further breaks. The search for the breaks that minimize the SSR is carried out regardless of whether these breaks are statistically significant or not. The test for the existence of breaks can be done separately, as discussed below.
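A matching sketch of the sequential search, again for the pure mean-shift case with minimum segment length `h` (the name `sequential_breaks` is hypothetical). At each step it computes, for every current segment, the single split with the largest SSR reduction and keeps the best of these, which is equivalent to choosing the candidate with the lowest overall SSR.

```python
def sequential_breaks(y, m, h):
    """Up to m break dates found one at a time (pure mean-shift sketch)."""

    def ssr(seg):
        mu = sum(seg) / len(seg)
        return sum((v - mu) ** 2 for v in seg)

    def best_split(i, j):
        """Best single break in y[i:j] as (ssr_reduction, date), or None."""
        base = ssr(y[i:j])
        best = None
        for k in range(i + h, j - h + 1):
            gain = base - ssr(y[i:k]) - ssr(y[k:j])
            if best is None or gain > best[0]:
                best = (gain, k)
        return best

    segments = [(0, len(y))]
    breaks = []
    for _ in range(m):
        candidates = []
        for (i, j) in segments:
            cand = best_split(i, j)
            if cand is not None:
                candidates.append((cand[0], cand[1], (i, j)))
        if not candidates:
            break  # no segment is long enough to split again
        _, k, (i, j) = max(candidates)  # largest SSR reduction wins
        breaks.append(k)
        segments.remove((i, j))
        segments += [(i, k), (k, j)]
    return sorted(breaks)
```

Unlike the global search, this procedure nests the n-break set within the (n+1)-break set by construction.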
The global minimization procedure has the advantage of ensuring, at least asymptotically, that only the biggest breaks (i.e., those that cause the biggest reduction in the SSR) will be selected, as opposed to sequential break selection. This distinguishes the approach from others that proceed sequentially (e.g., Altissimo and Corradi (2003)).2 The main disadvantage, as we shall see, is related to the fact that, for a particular time series, the biggest n breaks may not all be included among the biggest n+1 breaks. This issue poses significant challenges for sequentially testing for the significance of the breaks, as the tested hypotheses will in general be non-nested.
B. Testing for the existence of breaks
The statistics proposed by BP for multiple breaks are generalizations of the Andrews (1993) test for the single structural change case, and are shown to be robust to serial correlation and heterogeneity of the residuals under the null.
B.1. Zero versus a fixed number of breaks
In this case, one wants to test the null hypothesis of no breaks against the alternative of a known number of breaks $k$. The test is calculated as the usual F-ratio comparing the SSE under the null (the ‘restricted’ SSE) with the SSE under the alternative hypothesis (the ‘unrestricted’ SSE). In other words, it is simply the conventional test of the null $\delta_1 = \cdots = \delta_{k+1}$ against the alternative $\delta_i \neq \delta_{i+1}$ for some $i$, where $\delta_i$ is the vector of coefficients attached to the covariate $z_t$ in the pure structural change model. For the globally minimized breaks, this test is referred to as the sup F(0, m).
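For a pure mean-shift model with iid, homoskedastic errors, the sup F(0, k) statistic reduces to the familiar SSR-based F ratio evaluated at the SSR-minimizing partition, with $q = 1$ coefficient per regime: $F = [(SSR_0 - SSR_k)/(kq)] / [SSR_k/(T - (k+1)q)]$. The sketch below finds that partition by brute force, which is only feasible for short series and small $k$. The name `sup_f_zero_vs_k` is hypothetical, and the sketch omits BP's corrections for serial correlation and heteroskedasticity.

```python
from itertools import combinations


def _ssr(seg):
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg)


def sup_f_zero_vs_k(y, k, h):
    """Return (sup F(0, k), break dates) for a pure mean-shift model.

    A break date is the number of observations before the new regime;
    every regime must contain at least h observations (assumed iid errors)."""
    T, q = len(y), 1
    ssr0 = _ssr(y)  # restricted SSR: one mean, no breaks
    best_ssr, best_dates = float("inf"), None
    for dates in combinations(range(h, T - h + 1), k):
        cuts = (0,) + dates + (T,)
        if any(b - a < h for a, b in zip(cuts, cuts[1:])):
            continue  # violates the minimum segment length
        s = sum(_ssr(y[a:b]) for a, b in zip(cuts, cuts[1:]))
        if s < best_ssr:
            best_ssr, best_dates = s, list(dates)
    if best_dates is None:
        raise ValueError("series too short for k breaks with segment length h")
    if best_ssr <= 0:
        return float("inf"), best_dates  # perfect fit under the alternative
    f = ((ssr0 - best_ssr) / (k * q)) / (best_ssr / (T - (k + 1) * q))
    return f, best_dates
```

Because the partition with the smallest unrestricted SSR also gives the largest F ratio, taking the supremum over partitions and using the global minimizer coincide.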
One problem with this formulation relates to the estimation of the variance-covariance matrix for δ, which enters the formula for the F-statistic and which may become quite cumbersome to compute in the presence of autocorrelation and heteroskedasticity in the error term. To overcome this problem, BP propose to estimate a much simpler variance-covariance matrix for δ that is asymptotically equivalent. However, this simplification has been shown to be a potential source of size and power distortions, particularly when the test is applied to short time series.
2 Sequential methodologies first find the single break that minimizes the SSR. If this break is found to be statistically significant, they then search for the second break that minimizes the SSR, given the existence and location of the first break, and so forth.
B.2. Zero versus an unknown number of breaks
The number of breaks is often not known, and the standard F-statistic is then insufficient for testing for the existence of breaks. In this case, BP propose variations of the sup F(0, m) test, which are called double maximum tests and are defined as:
$D_{max} = \max_{n = 1, \ldots, m} a_n \sup F(0, n), \qquad (2)$
where the weights $a_n$ can be equal to 1 for all $n = 1, \ldots, m$. In this case, the $D_{max}$ statistic is called the $UD_{max}$ test. More generally, $a_n$ can be a function of the asymptotic critical values for the sup F(0, n), chosen so as to make the marginal p-values equal across the values of $n$, in which case the $D_{max}$ statistic is called the $WD_{max}$ test. It is important to note that, since the $D_{max}$ statistics are based on the sup F(0, m), finite-sample distortions in the estimation of the variance-covariance matrix for δ will also affect the size and power of the $UD_{max}$ and $WD_{max}$ tests.
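The two double maximum statistics in (2) differ only in their weights. The sketch below assumes the sup F(0, n) values for n = 1, ..., m have already been computed, and assumes the $WD_{max}$ weights take the form $a_1 = 1$, $a_n = c_1/c_n$, where $c_n$ is the critical value of sup F(0, n) at the chosen significance level (our reading of how BP equalize the marginal p-values). The critical values must be supplied by the user; the function names are hypothetical.

```python
def ud_max(sup_f):
    """UDmax: max over n of sup F(0, n) with equal weights a_n = 1."""
    return max(sup_f)


def wd_max(sup_f, crit):
    """WDmax: max over n of a_n * sup F(0, n), with a_n = crit[0] / crit[n-1].

    sup_f[n-1] holds sup F(0, n); crit[n-1] holds the critical value of
    sup F(0, n) at the chosen level (assumed weighting scheme)."""
    if len(sup_f) != len(crit):
        raise ValueError("need one critical value per sup F statistic")
    return max(crit[0] / c * f for f, c in zip(sup_f, crit))
```

For example, with sup F values [6.0, 7.0, 6.5] and made-up critical values [9.0, 7.5, 6.0], UDmax is 7.0 while WDmax is 9.75, because the weighting boosts statistics whose own critical values are small.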
B.3. l versus l +1 breaks
Similarly to the F(0, m) ratio, the F(l+1|l) ratio relates the ‘restricted’ SSE (for l breaks) to the ‘unrestricted’ SSE (for l+1 breaks). Calculating the F(l+1|l) ratio is equivalent to estimating l+1 tests of the null of zero breaks against the alternative of a single break, one in each of the l+1 segments. More specifically, the test rejects the null in favor of l+1 breaks whenever the minimal sum of SSE for l+2 segments (or l+1 breaks) is sufficiently smaller than that for l+1 segments (or l breaks). A complicating factor is that the critical values of the statistic under the null of l breaks depend on sample-specific factors, such as the break size and the properties of the residuals. BP propose an alternative approach that uses the sup F(0,1) (testing for the presence of one significant break) in each of the segments. If the null of 0 breaks can be rejected against the alternative of one break in at least one of the l+1 segments, the BP approach concludes that l+1 breaks are statistically significant.
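The segment-by-segment test can be sketched as follows for the pure mean-shift case with iid errors: compute sup F(0,1) within each of the l+1 segments defined by the estimated breaks, and reject the null of l breaks if the largest of these exceeds a critical value. The critical value `crit` is a user-supplied input (BP tabulate values that depend on l; a bootstrap could also supply it), and the function names are hypothetical.

```python
def _ssr(seg):
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg)


def sup_f_one(seg, h):
    """sup F(0,1) inside one segment; 0.0 if it cannot host a break."""
    T = len(seg)
    if T < 2 * h:
        return 0.0  # too short for two sub-segments of length h
    ssr0 = _ssr(seg)
    best = 0.0
    for k in range(h, T - h + 1):
        ssr1 = _ssr(seg[:k]) + _ssr(seg[k:])
        if ssr1 > 0:
            best = max(best, (ssr0 - ssr1) / (ssr1 / (T - 2)))
    return best


def sequential_test(y, breaks, h, crit):
    """Test l+1 against l breaks; returns (reject, per-segment statistics)."""
    cuts = [0] + sorted(breaks) + [len(y)]
    stats = [sup_f_one(y[a:b], h) for a, b in zip(cuts, cuts[1:])]
    return max(stats) > crit, stats
```

In use, the test is applied with l = 1, 2, ... breaks taken from the SSR-minimizing partitions, stopping at the first l for which `reject` is `False`.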
B.4. Criteria for finding the number of breaks
The number of significant breaks can be found via information criteria, such as the Bayesian Information Criterion (BIC) proposed by Yao (1988) and the modified Schwarz criterion (LWZ) proposed by Liu et al. (1997). It is also possible to determine the number of breaks by estimating a sequence of sup F statistics, as suggested by BP. The basic steps include testing for the presence of one break via the sup F(0,1) and then testing for the presence of l+1 breaks via the F(l+1|l) ratio, stopping when the null is not rejected. The variance-covariance matrix of δ embedded in these tests is robust to heteroskedasticity and autocorrelation. Thus, the BP approach accounts for these features, unlike the approaches based on information criteria.
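For comparison, a BIC-type selection of the number of breaks can be sketched as below. It assumes the minimized SSR for each candidate number of breaks has already been computed, the form $BIC(m) = \ln(SSR_m/T) + p^* \ln(T)/T$, and a parameter count $p^* = (m+1)q + m$ that treats each break date as a parameter. This is an illustrative reading rather than the exact penalty used by Yao (1988) or BP, and the function name is hypothetical. Note that nothing in it accounts for serial correlation, which is the contrast drawn above.

```python
import math


def bic_number_of_breaks(ssr_by_m, T, q=1):
    """Pick the number of breaks minimizing an assumed BIC form.

    ssr_by_m[m] is the minimized SSR with m breaks (m = 0, 1, ...)."""
    scores = []
    for m, ssr in enumerate(ssr_by_m):
        p_star = (m + 1) * q + m  # regime coefficients plus break dates
        scores.append(math.log(ssr / T) + p_star * math.log(T) / T)
    best_m = min(range(len(scores)), key=scores.__getitem__)
    return best_m, scores
```

With T = 50 and minimized SSRs of 100, 40, and 38 for zero, one, and two breaks, the penalty outweighs the small gain from the second break and one break is selected.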
The BP approach may, however, incorrectly estimate the number of significant breaks
in some situations, particularly when time series have more than one break and the regimes
switch up and down. To illustrate this point, consider Figures 1a and 1b.
Figure 1: The problem of sequential testing for determining the number of breaks. Panels: (a) one break; (b) two breaks.
In the situation depicted in Figure 1, a sequence of sup F(l+1|l) tests may fail to detect the correct number of breaks. The test of one break against the null of zero breaks may lack power, because the alternative of one break is badly misspecified. If this first test does not reject, the BP sequential algorithm stops, so the sup F(2|1) test of the null of one break against the alternative of two breaks is never performed. This problem can be reduced by using the Dmax statistics in the first step (as proposed by BP (2001)), but the identification problem will persist when there may be a greater number of breaks.