MODELLING THE HIDDEN ECONOMY AND THE TAX-GAP
IN NEW ZEALAND
David E. A. Giles*
Revised, February 1999
Department of Economics, University of Victoria
P.O. Box 1700, STN CSC, Victoria, B.C.
Canada, V8W 3P5
(FAX: (250) 721-6214; Voice: (250) 721-8540; e-mail: dgiles@uvic.ca )
Proposed Running Head:
New Zealand Hidden Economy
Keywords:
Underground Economy; Latent Variables; Tax Avoidance; Tax
Evasion; Tax-Gap
JEL Classification(s):
C32; C51; E32; E41
*
I am grateful to Patrick Caragata, for initiating and supporting this research, and for his many
contributions which have greatly improved this paper. Earlier versions were discussed at Workshops
on the Health of the New Zealand Tax System, Wellington, 1995. I would like to thank Daniel
Aldersley, Lief Bluck, Johannah Branson, Phil Briggs, Linda DeBenedictis, Erwin Diewert,
Johannah Dods, Paul Dunmore, Michael Dunn, Ed Feige, Judith Giles, Chris Gillion, Anna Heiller,
Knox Lovell, Ewen McCann, Michael O'Connor, Gerald Scully, John Small, Adolf Stroombergen,
and Ken White for their many comments, suggestions, and assistance with data. The insightful
comments of two referees led to a significant improvement of this paper, including the addition of
Appendix II. The content of this paper is the responsibility of the author, and should not be attributed
to Inland Revenue New Zealand, which financed this study. The author's related papers on the hidden
economy and tax evasion are available in Adobe pdf format on the internet at
http://web.uvic.ca/econ/economet_he.html.
Abstract
This paper develops and estimates a structural, latent variable, model for the hidden economy in New
Zealand, and a separate currency-demand model. The estimated latent variable model is used to
generate an historical time-series index of hidden economic activity, which is calibrated via the
information from the currency-demand model. Special attention is paid to data non-stationarity, and
to diagnostic testing. Over the period 1968 to 1994, the size of the hidden economy is found to vary
between 6.8% and 11.3% of measured GDP. This, in turn, implies that the total tax-gap is of the
order of 6.4% to 10.2% of total tax liability in that country. Of course, not all of this foregone
revenue would be recoverable, as not all of the activity in the underground economy is responsive
to changes in taxation or other policies.
I. INTRODUCTION
Foregone tax revenue resulting from the underground economy is a major, and apparently growing,
problem. We describe a modelling methodology which yields a time-series of the underground
economy for New Zealand, from which a series of the "tax-gap" can be obtained. There have been
no previous attempts to obtain such measures for New Zealand, but this is a topical issue
in view of the current political debate on taxation policy and taxation compliance in that country.
The hidden economy and tax-gap have sizeable budgetary implications, and implications for taxation
incidence and income distribution. For instance, if a principal cause of growth in the hidden
economy is an actual, or perceived, increase over time in the tax burden, then an increase in (average
or marginal) tax rates may reduce revenue and worsen the budget deficit. Similarly, if there is a
significant hidden component to economic activity, then many economic indicators will be measured
with error. Finally, there are political and social implications - a flourishing informal sector may
reflect dissatisfaction, on the part of the electorate, with the degree of regulation of their activities.
There is an extensive literature on the measurement of the hidden economy, and section II discusses
the major methods that have been used to address this issue. Our own econometric methodology is
described in section III; data issues are discussed in section IV; and sections V and VI discuss the
formulation and estimation of our models. Section VII provides estimated time-paths for the hidden
economy and the tax-gap in New Zealand, and our conclusions are summarized in section VIII.
II. MEASURING THE HIDDEN ECONOMY
The evidence on the actual size of the hidden economy is very mixed. Frey and Weck-Hannemann
(1984) report that for seventeen OECD countries in 1978, the size of the underground economy
(relative to GNP) varied from 4.1% for Japan, through 8.0% for the UK and 8.3% for the USA, to
13.2% in the case of Sweden, and with Canada at the sample mean of 8.8%. In more recent work,
Schneider (1997) found that the average OECD figure had risen to about 15% of GDP by 1994, with
Canada still close to this international average. The latter figure can be compared with the 5% to 7%
of GDP that Mirus and Smith (1994) estimate for Canada in 1976, rising to almost 15% in 1990.
Spiro (1994) estimates the Canadian underground economy at between 8% and 11% of GDP in 1993.
Other studies summarised by Aigner et al. (1988) report figures for the USA in 1978 which range
from 4% (Park (1979)) to 33% (Feige (1982)) of GNP. On the other hand, evidence for the USA in
1970 yields a range, for this ratio, from 2.6% (Tanzi (1983)) to 11% (Schneider and Pommerehne
(1985)). Bhattacharyya (1990) estimates the hidden economy for the UK to be 3.8% of GNP in 1960,
with a peak of 11.1% in early 1976, and averaging around 8% during 1984; while a British Inland
Revenue analysis reported by Chote (1995) suggests that the hidden economy comprises 6% to 8%
of GDP. The available evidence is varied and imprecise, but the results of our study are consistent
with the more robust of the above numbers. There are several surveys of the literature on measuring
the hidden economy, including those of Blades (1982), Boeschoten and Fase (1984), Carter (1984),
Frey and Pommerehne (1982, 1984), Gaertner and Wenig (1985), Kirchgaessner (1984), Weck
(1983), and Tedds (1998).
As well as providing information about the range of the international estimates, these surveys
discuss the different techniques (and their strengths and weaknesses) that have been used by various
authors. One criticism of most of these approaches is that they focus on one cause of underground
economic activity, and one indicator. In contrast, Frey and Weck-Hannemann (1984), Aigner et al.
(1988), and Tedds (1998) use "latent variable" structural modelling to measure the size of the hidden
economy. The (unobservable) latent variable here is the extent of underground activity, perhaps
expressed as a percentage of measured real GDP. The MIMIC ("Multiple Indicators, Multiple
Causes") model of Zellner (1970), Goldberger (1972), Jöreskog and Goldberger (1975), and others
allows for several "indicator" variables and several "causal" variables in forming structural
relationships to "explain" the latent variable. This latent variable/MIMIC model approach forms the
basis for our own analysis here.
III. A MODELLING METHODOLOGY
The MIMIC model is a variant of the LISREL ("Linear Interdependent Structural Relationships")
models, of Jöreskog and Sörbom (1993a,b) and others. A MIMIC model uses observable data on a
range of "causal" variables, and a range of data on observable "indicator" variables, to "predict" the
values for one or more unobservable ("latent") variables. This type of model yields only a time-series
index for the latent variables - in our case there is just one such variable, namely the size of the
underground economy relative to the size of measured GDP. Accordingly, some sort of extraneous
information is needed to calibrate the index so that we can then construct a cardinal time-path of the
underground economy. Once the underground activity is measured, the effective tax rate (i.e., the
ratio of tax revenue to GDP) can be used to obtain an estimate of the size of the "tax-gap", and to
address other policy issues.
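The arithmetic linking the hidden-economy ratio to the tax-gap can be sketched as follows. This is a minimal illustration, not the paper's own calculation: it assumes that the effective tax rate (tax revenue divided by measured GDP) would apply equally to hidden output, and the function and argument names are hypothetical.

```python
def tax_gap_share(hidden_ratio: float) -> float:
    """Tax-gap as a share of total tax liability.

    hidden_ratio is hidden output divided by measured GDP (Y_H / Y_R).
    Assumption: the effective tax rate (tax revenue / measured GDP)
    would apply equally to hidden output, so the rate itself cancels:
        gap / (revenue + gap) = h / (1 + h).
    """
    if hidden_ratio < 0:
        raise ValueError("hidden_ratio must be non-negative")
    return hidden_ratio / (1.0 + hidden_ratio)


def tax_gap_level(hidden_ratio: float, tax_revenue: float) -> float:
    """Foregone revenue, in the same units as tax_revenue."""
    # effective rate * hidden output = (T / Y_R) * (h * Y_R) = h * T
    return hidden_ratio * tax_revenue


if __name__ == "__main__":
    # The two ratios below are the bounds quoted in the abstract.
    for h in (0.068, 0.113):
        print(f"h = {h:.3f} -> tax-gap share = {tax_gap_share(h):.3f}")
```

Under this assumption, hidden-economy ratios of 6.8% and 11.3% of measured GDP map to tax-gap shares of roughly 6.4% and 10.2% of total liability, which lines up with the range quoted in the abstract, although the paper's own calculation may differ in detail.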
We calibrate our hidden economy index via the estimation of a particular currency demand equation.
Our currency-demand equation differs from the interesting model proposed by Bhattacharyya (1990),
also in the context of underground activity. We allow for different velocities of circulation in the
"hidden" and "recorded" sectors; explicitly "explain" hidden activity; and avoid a functional
approximation in his approach. We allow for the non-stationarity of our time-series data, which he,
and others, do not. Interestingly, our results imply a long-run average value for the "size" of the
hidden economy that is almost identical to that obtained by using Bhattacharyya's approach in an
earlier version of our work (Giles (1995, 1997a)), as is discussed briefly in Appendix II.
In our model, measured (nominal) currency demand is:
$$M_t = \beta_0' \, Y_{Rt}^{\beta_1} \, Y_{Ht}^{\beta_2} \, R_t^{\beta_3} \, P_t^{\beta_4}, \qquad (1)$$
where $Y_{Rt}$ and $Y_{Ht}$ are "recorded" and "hidden" real output or income, $R_t$ is a short-term interest rate
variable, and $P_t$ is the price level. The unobservable ratio of "hidden" to "recorded" activity is taken
to be a function of variables such as the rate of growth in measured output; the inflation rate and the
change in the latter; variables measuring the extent of the tax burden; and one to allow for the
introduction of the Goods and Services Tax (GST) in October 1986. The latter is included because
Inland Revenue Department (IRD) records suggest that the introduction of this tax in 1986 (together
with the simultaneous abolition of sales taxes and dramatic changes to the personal and sales tax
scales) had a negative impact on unrecorded activity, especially among the self-employed. The
inflation rate is included to allow for the upward "creep" of taxpayers through the tax brackets that
it causes, and the associated incentive for tax-payers to engage in unreported activities. A more
pervasive effect of inflation is that, as it tends to be uneven across sectors, it alters income
distribution, and this may induce disrespect for tax law. The change in the rate of inflation is
included in equation (2) below because such variability adds to uncertainty, and strengthens the
incentive to enter the hidden economy as a means of risk or cost reduction. So, we have:
$$(Y_{Ht} / Y_{Rt}) = \alpha_1 + \alpha_2 GST_t + \alpha_3 \Delta\log Y_{Rt} + \alpha_4 \Delta\log P_t + \alpha_5 \Delta(\Delta\log P_t). \qquad (2)$$
Solving (2) for $Y_{Ht}$, substituting in (1), taking (natural) logarithms, and adding an error term and
dummy variables to allow for deterministic seasonality and for the introduction of "EFTPOS" ("Electronic
Fund Transfer at Point of Sale") bank debit card electronic retail transactions in lieu of cash in
1987.2, we obtain:
$$
\begin{aligned}
m_t ={}& \beta_0 + (\beta_1 + \beta_2) y_{Rt} + \beta_2 \log[\alpha_1 + \alpha_2 GST_t + \alpha_3 \Delta gdp_t + \alpha_4 \Delta p_t + \alpha_5 \Delta(\Delta p_t)] \\
& + \beta_3 r_t + \beta_4 p_t + \delta_1 S_{1t} + \delta_2 S_{2t} + \delta_3 S_{3t} + \delta_4 DEFT_t + \varepsilon_t,
\end{aligned} \qquad (3)
$$
where $\beta_0 = \log(\beta_0')$, lower case symbols denote natural logarithms of the variables, $S_i$ is the i'th
seasonal dummy, and DEFT is the EFTPOS dummy. (We also considered adding a variable for the
value of EFTPOS transactions as a regressor, without success. A "dynamic" version of the model,
incorporating a lagged value of the dependent variable as an additional regressor, was also
considered, as was the inclusion of various "tax burden" variables in equation (2). None of these
refinements produced satisfactory results.)
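The step from (1) and (2) to the deterministic part of (3) can be written out explicitly. In the sketch below the coefficients of (1) are labelled $\beta_i$ and those of (2) $\alpha_i$, and $h_t$ is a shorthand introduced here for the right-hand side of (2); these labels are assumptions made for illustration. The error term and the seasonal and EFTPOS dummies are then added to the last line.

```latex
\begin{align*}
h_t &\equiv \alpha_1 + \alpha_2 GST_t + \alpha_3 \Delta\log Y_{Rt}
        + \alpha_4 \Delta\log P_t + \alpha_5 \Delta(\Delta\log P_t),
     \qquad Y_{Ht} = Y_{Rt}\, h_t \\
M_t &= \beta_0'\, Y_{Rt}^{\beta_1} (Y_{Rt}\, h_t)^{\beta_2} R_t^{\beta_3} P_t^{\beta_4}
     = \beta_0'\, Y_{Rt}^{\beta_1 + \beta_2}\, h_t^{\beta_2}\, R_t^{\beta_3} P_t^{\beta_4} \\
\log M_t &= \log \beta_0' + (\beta_1 + \beta_2) \log Y_{Rt} + \beta_2 \log h_t
          + \beta_3 \log R_t + \beta_4 \log P_t
\end{align*}
```

Because $h_t$ enters only through $\beta_2 \log h_t$, the model is non-linear in the $\alpha_i$, which is why it cannot be estimated by ordinary linear regression.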
In Table 1 below we also report on specifications of (1) which include $(P_t / P_{t-1})$, $(P_t / P_{t-4})$, or their
lagged values, as regressors with a coefficient denoted $\beta_5$. Estimates of the $\alpha_i$'s can be used
with (2) to measure $(Y_{Ht} / Y_{Rt})$ at each point in the sample. These values are of less interest than those
obtained from the MIMIC model, as they are based rather narrowly on a single-equation model, but
they provide a useful cross-check on orders of magnitude. The estimate of $\alpha_1$ in Table 1 is also
especially important in its own right as it measures the "long-run average" value for this ratio, and
is used for the calibration of the MIMIC model.
Our MIMIC model of the hidden economy is formulated mathematically as follows: $\eta$ is the scalar
(unobservable) "latent" variable (the size of the hidden economy); $y' = (y_1, y_2, \ldots, y_p)$ is a vector of
"indicators" for $\eta$; $x' = (x_1, x_2, \ldots, x_q)$ is a vector of "causes" of $\eta$; $\lambda$ and $\gamma$ are (p×1) and (q×1)
vectors of parameters; and $\varepsilon$ and $\zeta$ are (p×1) and scalar random errors. It is assumed that $\zeta$ and all
of the elements of $\varepsilon$ are Normal and mutually uncorrelated, with Var.($\zeta$) = $\psi$, and Cov.($\varepsilon$) = $\Theta$. The
MIMIC model is:
$$y = \lambda\eta + \varepsilon \qquad (4)$$
$$\eta = \gamma' x + \zeta. \qquad (5)$$
Substituting (5) into (4), the MIMIC model can also be viewed as a multivariate regression model,
$$y = \Pi x + z, \qquad (6)$$
where $\Pi = \lambda\gamma'$, $z = \lambda\zeta + \varepsilon$, and Cov.($z$) = $\lambda\lambda'\psi + \Theta$.
The p-equation model in (6) has a regressor matrix of rank one, and the error covariance matrix is
also constrained. Accordingly, we cannot obtain cardinal estimates of all of the parameters. Only
certain "estimable functions" of the parameters can be identified, so we can estimate the relative
magnitudes of the parameters, but not their levels. Thus, the estimation of (4) and (5) requires a
normalization for (4), which is generally achieved by constraining one element of $\lambda$ to a pre-assigned
value. As both y and x are observable data vectors, the multi-equation model in (6) can then be
estimated by conventional (restricted) Maximum Likelihood Estimation - in our case we have used
the LISREL package (Jöreskog and Sörbom (1993a,b)) to obtain consistent and asymptotically
efficient estimates of the elements of $\Pi$, and hence of $\lambda$ and $\gamma$.
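The identification point can be illustrated numerically. The sketch below is not the LISREL estimation itself; it only shows, with hypothetical numbers, that the reduced-form matrix (written here as lam times gam-transpose, an assumed labelling of the two parameter vectors) has rank one, and that rescaling one vector by c and the other by 1/c leaves it unchanged, so that fixing one loading is what pins down the scale.

```python
import math


def reduced_form(lam, gam):
    """Return the p x q matrix lam * gam' as a list of rows."""
    return [[l * g for g in gam] for l in lam]


def normalise(lam, gam, k=0, value=1.0):
    """Rescale so that lam[k] == value, leaving lam * gam' unchanged."""
    c = value / lam[k]  # requires lam[k] != 0
    return [c * l for l in lam], [g / c for g in gam]


def is_rank_one(pi, tol=1e-12):
    """True if pi is non-zero and every 2x2 minor is (numerically) zero."""
    rows, cols = len(pi), len(pi[0])
    minors_zero = all(
        abs(pi[i][j] * pi[k][l] - pi[i][l] * pi[k][j]) <= tol
        for i in range(rows) for k in range(i + 1, rows)
        for j in range(cols) for l in range(j + 1, cols)
    )
    nonzero = any(abs(v) > tol for row in pi for v in row)
    return minors_zero and nonzero


# Hypothetical values: p = 3 indicator loadings and q = 2 causal coefficients.
lam = [0.5, -1.5, 2.0]
gam = [0.2, 0.4]

pi_a = reduced_form(lam, gam)
lam_n, gam_n = normalise(lam, gam, k=0, value=1.0)
pi_b = reduced_form(lam_n, gam_n)

# The two parameterisations are observationally equivalent.
same = all(
    math.isclose(a, b, abs_tol=1e-12)
    for row_a, row_b in zip(pi_a, pi_b)
    for a, b in zip(row_a, row_b)
)
print(is_rank_one(pi_a), same, lam_n)
```

Only ratios such as `gam[0] / gam[1]` survive the rescaling, which is the sense in which relative magnitudes, but not levels, are estimable.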
Given an estimate of the $\gamma$ vector, and setting the error term to its mean value of zero, equation (5)
enables us to "predict" ordinal values for $\eta$ (which in our case is the hidden economy) at each sample
point. Then, if we have a specific value for $\eta$ at some sample point, obtained from some other
source, we can convert the within-sample predictions for $\eta$ into a cardinal series. We use the
"average" value for $(Y_{Ht} / Y_{Rt})$ from our estimated currency demand equation (i.e., our estimate of
$\alpha_1$) to calibrate our time-series for the hidden economy by setting the latter to this value in 1981.
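The calibration step can be sketched in a few lines. The text does not spell out the arithmetic, so the version below assumes a simple proportional rescaling that forces the index to equal the benchmark value in the base year; an additive shift would be another possibility. The index values and the function name are hypothetical.

```python
def calibrate_index(index, base_period, base_value):
    """Convert an ordinal index into a cardinal series.

    index       : dict mapping period -> ordinal (uncalibrated) index value
    base_period : period at which the benchmark value is imposed
    base_value  : benchmark for that period (e.g. hidden/measured output ratio)

    Assumption: proportional rescaling, so relative movements in the
    index are preserved exactly.
    """
    anchor = index[base_period]
    if anchor == 0:
        raise ValueError("index is zero in the base period; cannot rescale")
    scale = base_value / anchor
    return {period: value * scale for period, value in index.items()}


# Hypothetical ordinal index; only the 1981 benchmark of 8.8% comes from the text.
ordinal = {1980: 0.95, 1981: 1.00, 1982: 1.10}
cardinal = calibrate_index(ordinal, base_period=1981, base_value=0.088)

for year in sorted(cardinal):
    print(year, f"{cardinal[year]:.4f}")
```

Whatever rescaling rule is used, the calibrated series inherits its shape from the MIMIC predictions and its level from the currency-demand benchmark.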
IV. DATA ISSUES
The variables are defined in Appendix I. Given the limitations of quarterly New Zealand time-series
data, our MIMIC models have been estimated with annual data, for 1968 to 1994, but some
experimentation with simple quarterly MIMIC models yielded strikingly similar results. Our
currency demand model has been estimated with quarterly data for 1975.1 to 1994.4. Considerable
attention has been paid to testing for stationarity and cointegration, and this appears to be the first
application of a MIMIC model which addresses these issues.
The logarithms or levels of the series, as appropriate, have been tested for unit roots at the
appropriate frequencies. Complete details of these unit root test results are given by Giles (1995,
1997a). Following Dickey and Pantula (1987), we test I(3) against I(2). If we reject I(3) we then test
I(2) against I(1). Then we test I(1) against I(0), as appropriate. We have used the "augmented"
Dickey-Fuller (ADF) test (e.g., Said and Dickey (1984)) to test for unit roots at the zero frequency.
The quarterly data are not seasonally adjusted, and in this case we include a drift and seasonal
dummy variables in the ADF regression and choose an augmentation level of at least three. This is
based on the evidence provided by Ghysels et al. (1994). The lower limit of p=3 was never binding,
as can be seen from Table 1. The dummy variables ($S_{1t}$, $S_{2t}$, and $S_{3t}$) allow for deterministic
seasonality in the data, and in this case the ADF regression is always fitted with a "drift" term. Dods
and Giles (1995) show that for samples of our size a preferred method involves choosing the augmentation level
so that the autocorrelation and partial autocorrelation functions for the residuals of the ADF
regression are "clean", and this is the procedure followed here. To determine if a time-trend should
also be included in the ADF regression, we follow the Dolado et al. (1990) sequential testing
strategy. The series PUBEMP exhibits a major structural break in its trend from 1988, and as this
will distort the ADF "t-tests" in favour of not rejecting a unit root, Perron's (1989) modified test has
been used in this case.
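The downward testing sequence described above, from I(3) against I(2) through to I(1) against I(0), can be sketched as a small driver routine. This is a simplification: each step applies a user-supplied unit root test to the appropriately differenced series, and it does not reproduce the exact Dickey-Pantula regressions or the drift, trend, seasonal-dummy and augmentation choices discussed in the text. The stand-in test used in the demonstration is a toy, not a statistical test; in practice an ADF routine would be plugged in.

```python
from typing import Callable, Sequence


def difference(series: Sequence[float], d: int) -> list:
    """Return the d-th difference of the series (d = 0 returns a copy)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def order_of_integration(
    series: Sequence[float],
    rejects_unit_root: Callable[[Sequence[float]], bool],
    max_order: int = 3,
) -> int:
    """Test downward from I(max_order); stop at the first non-rejection."""
    for d in range(max_order, 0, -1):
        # H0: series is I(d), i.e. its (d-1)-th difference has a unit root.
        if not rejects_unit_root(difference(series, d - 1)):
            return d
    return 0  # every null was rejected: treat the series as I(0)


def toy_rejects_unit_root(series: Sequence[float]) -> bool:
    """Hypothetical stand-in: 'rejects' only when the series is constant."""
    return max(series) - min(series) < 1e-9


quadratic = [float(t * t) for t in range(12)]  # second difference is constant
linear = [float(t) for t in range(12)]         # first difference is constant
print(order_of_integration(quadratic, toy_rejects_unit_root))
print(order_of_integration(linear, toy_rejects_unit_root))
```

Starting from the highest order and working down avoids applying a test for a single unit root to a series that may contain more than one.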
With the quarterly data we also allow for stochastic seasonality, and test for unit roots at the zero,
$\pi$, and $(\pi/2)$ frequencies, following Hylleberg et al. (1990) (or HEGY hereafter) and Ghysels et al.
(1994). We have determined the augmentation levels in the HEGY regressions in the same way as
for the ADF tests. Following the recommendations of Ghysels et al. (1994), we include a trend, drift,
and seasonal dummy variables in the HEGY regressions.
V. ESTIMATING THE CURRENCY DEMAND MODEL
Our currency demand model is given in equation (3), and it contains several non-stationary variables.
The stationarity of the regressor in (3) whose coefficient is $\beta_2$ is unclear - this term is both non-linear
and unobservable; but $p_t$ and $m_t$ are both I(2), and $y_t$ and $r_t$ are I(1), so we have an "unbalanced
regression". We cannot simply "filter" the series according to their orders of integration, as this
generates many negative observations, making the estimation of the model impossible. Estimating
the model without filtering the data would result in a "spurious regression" (Granger and Newbold
(1974)). One possibility is to exploit any cointegration among the variables, and estimate (3) directly
as a long-run cointegrating relationship, resulting in valid asymptotic inferences. Testing for
cointegration is complicated, here, given the mixture of I(1) and I(2) variables, the non-linear model,
and the possibility of seasonal cointegration. Given these problems, a simple but somewhat indirect
cointegration testing strategy has been followed. The HEGY tests indicate that the only potential for
cointegration is at the zero frequency, but this is rejected when standard Engle-Granger tests are
applied. We then apply Haldrup's (1994) tests for cointegration involving I(2) data, using within-sample
predictions for the series, $\log[\alpha_1 + \alpha_2 GST_t + \alpha_3 \Delta gdp_t + \alpha_4 \Delta p_t + \alpha_5 \Delta(\Delta p_t)]$. The null of no
cointegration is again easily rejected, providing reasonable justification for treating our estimated
"unbalanced" regressions as long-run equilibrium relationships. More complete details of this aspect
of the modelling work are given by Giles (1997a).
The results of estimating the currency demand model, by Maximum Likelihood, using the SHAZAM
(1993) package, over the period 1975.1 to 1994.4, appear in Table 3 for our preferred specification,
together with some alternative specifications, including several in which the inflation rate enters the
basic equation (1), with a coefficient denoted $\beta_5$. (In Models 3 and 5 the regressor associated with
$\beta_5$ is the current quarterly rate of inflation; in Model 2 it is this rate lagged one period; and in Model
4 it is the lagged annual inflation rate.) These results illustrate the robustness of our estimate of the
long-run average ratio of "hidden" to "measured" output, $\alpha_1$. A range of conventional diagnostic tests
for the preferred specification appears in Table 4.
The within-sample averages of the "predicted" $(Y_{Ht} / Y_{Rt})$ ratio, from equation (2), range from 8.4%
to 8.7% across the models. In "Model 1" this estimated ratio varies from 5.5% to 10.8% over the
sample, and these values may be compared with the data in Figure 1 below. The estimates of $\alpha_1$,
which represent long-run average values for $(Y_{Ht} / Y_{Rt})$, generally are very "sharp", and are consistent
with the value arrived at from a different currency-demand model - a modification of that of
Bhattacharyya (1990) - in Appendix II. The estimate of $\alpha_1$ is 8.9% for Model 1, and the
corresponding sample average of $(Y_{Ht} / Y_{Rt})$ is 8.7%, so we use 8.8% in 1981.4, which is where the
sample mean of the GDP series occurs, to calibrate our MIMIC models below by setting the
"predicted" hidden economy series to this value in 1981.
The sample correlation between actual and "fitted" $m_t$ is 0.99 for all of the models in Table 3, and
the estimated coefficients have the expected signs. As $\beta_0 = \log(\beta_0')$, its sign is ambiguous, even
though we expect $\beta_0' > 0$. The anticipated sign of $\beta_5$ is also ambiguous: we might expect high
inflation to lead to a reduction in the holding of nominal balances, including currency; or, as the
estimated Models 2 to 4 are non-homogeneous in prices, a positive estimate of $\beta_5$ may reflect the
effect of inflationary expectations. This is not an issue in the preferred Model 1. Although the
significance of the individual parameter estimates is "mixed", many of the key parameters are
precisely estimated. Testing the appropriate non-linear restrictions on the parameters with Wald tests,