Robust Regression Analysis:
Some Popular Statistical Package Options
Robert A. Yaffee
Statistics, Social Science, and Mapping Group
Academic Computing Services
Information Technology Services
Acknowledgements: I want to express my gratitude to Annette Sanders, Colin Chen, and
Kevin Meyer of SAS Institute for their gracious assistance.
Robust regression analysis provides an alternative to a least squares regression
model when fundamental assumptions are unfulfilled by the nature of the data. When the
analyst estimates his statistical regression models and tests his assumptions, he frequently
finds that the assumptions are substantially violated. Sometimes the analyst can transform
his variables to conform to those assumptions. Often, however, a transformation will not
eliminate or attenuate the leverage of influential outliers that bias the prediction and
distort the significance of parameter estimates. Under these circumstances, robust
regression that is resistant to the influence of outliers may be the only reasonable
recourse. In this article, some robust regression options within popular statistical
packages, including SAS 9.0, STATA 7, S-PLUS 6.1, E-Views, and LIMDEP 8, are reviewed.
Ordinary Least Squares Model Assumptions
There are several assumptions that have to be fulfilled for the ordinary least
squares regression model to be valid. When the regression model does not meet the
fundamental assumptions, the prediction and estimation of the model may become
biased. Residuals, differences between the values predicted by the model and the real
data, that are very large can seriously distort the prediction. When these residuals are
extremely large, they are called outliers. Outliers inflate the error variance and the
standard errors, and they stretch the confidence intervals, so that the estimation cannot
become asymptotically consistent. Outliers that bias the parameter estimates are those
with leverage; these are called bad leverage points, whereas outliers that lie along the
predicted line are called good leverage points. When outliers inflate the error variance,
they sap the model of the power to detect them.
One of the basic assumptions of regression analysis is equality of the error
variance along the predicted line, a condition called homoskedasticity. Homoskedasticity
provides a modicum of uniformity to the confidence intervals. If the residual distribution
is normally distributed, the analyst can determine where the level of significance or
rejection regions begin. Even if the sample size is large, the influence of the outlier can
increase the local and possibly even the global error variance. This inflation of error
variance decreases the efficiency of estimation.
Another form of violation resides in the lack of independence of observations.
This lack of independence of the errors can manifest itself in terms of residual
autocorrelation, which can further bias the estimation of significance tests as the error
variance becomes artificially compressed by residual correlation. When this happens, the
R², F, and t values become inflated. Failures of these assumptions can predispose output
toward false statistical significance. However, failure of basic classical regression model
assumptions can be detected with the proper tests.
Another assumption is normality of the residuals. When the assumption of
normality of the residuals is violated in ordinary least squares regression analysis,
the estimation of significance becomes impaired.
Model Assumption Tests
The researcher should know not only what assumptions have to be fulfilled for the
model to be valid, but also how to test them for fulfillment and what
statistical packages contain which tests. Frequently, fundamental assumptions--such as,
independence of observations, linear functional form, no influential outliers, sufficiently
large sample size, normality and homoskedasticity of the residual distribution--are not
adequately fulfilled. To test for independence of observations, the analyst may apply a
runs test, a Durbin-Watson d or h test, or an autocorrelation function. To test for linear
functional form, the statistician may graph the dependent with each of the independent
variables or he may run curve estimation regressions. For the detection and assessment
of influential outliers, he may extract the standardized residuals and run a frequencies
analysis upon them or obtain the leverage and Cook’s d statistics for additional tests
(Stata, S-Plus, SAS). Another way to assess the influence of the outliers is to use the
dffits or dfbeta options where these are available: dffits shows how much an outlier
influences the model fit, while dfbeta reveals how much it influences the parameter
estimates (PROC REG in SAS; also in STATA and S-Plus). To test for nonnormality of
the residuals, he may examine a quantile-quantile plot or a Cramer-Von Mises,
Anderson-Darling, Kolmogorov-Smirnov or a Shapiro-Wilk test (available in SAS PROC
UNIVARIATE or STATA sktest). He may graph the residuals with a normal quantile-
quantile plot (SAS, STATA, LIMDEP, or S-PLUS). Should he wish to test the residuals
for heteroskedasticity, he can run a Cook-Weisberg test (see hettest in STATA) or a
White’s general test (in SAS PROC REG or a different version in PROC MODEL) on the
residuals. When tests for these assumptions do not pass muster or outliers plague the
model, robust regression or nonparametric regression analysis may provide a reliable
alternative. Before attempting a regression analysis, the researcher should run a sample
size test to be sure that he will have enough statistical power to test his hypotheses. His
sample size will have to be sufficiently larger than the number of independent variables
in his model. Indeed, it should be large enough so that his standard errors will be small
enough to test his hypotheses. If one, some, or all of the assumptions are not adequately
fulfilled, he should know what fixes are available to deal with residual problems. In this
article, we explain the applications of different types of robust regression analysis to
situations when the tests show that the assumptions are not adequately met.
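The diagnostics named above (standardized residuals, leverage, and Cook's d) are easy to compute directly. The following is a minimal numpy sketch, not the implementation in any of the packages mentioned; the function name and the simulated data with one planted bad leverage point are illustrative assumptions:

```python
import numpy as np

def ols_diagnostics(X, y):
    """OLS fit plus the outlier diagnostics discussed above: internally
    studentized residuals, leverage (hat values), and Cook's d."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # add intercept
    k = Xd.shape[1]
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T     # hat matrix
    h = np.diag(H)                               # leverage of each case
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ b
    s2 = resid @ resid / (n - k)                 # error variance estimate
    sr = resid / np.sqrt(s2 * (1.0 - h))         # studentized residuals
    d = sr ** 2 * h / (k * (1.0 - h))            # Cook's distance
    return b, sr, h, d

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=50)
X[0] = [8.0, 8.0]                                # plant a bad leverage point
y[0] = 40.0                                      # far off the true plane
b, sr, h, d = ols_diagnostics(X, y)
```

Cases whose Cook's d stands far above the rest, as the planted point's does here, are the candidates for bad leverage points.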
Types of Robust Regression
Several popular statistical packages have procedures for robust regression
analysis. Among them are SAS, STATA, S-PLUS, LIMDEP, and E-Views. The
researcher will need to know in which statistical package the type of robust regression
appropriate for a particular application can be found. There is a family of robust regression analyses
that replaces the sum of squared errors as the criterion to be minimized with one less
influenced by outliers. Least absolute deviations, sometimes called L1 regression, is
one method that seeks to minimize the influence of outliers. Least median of
squares (Rousseeuw, 1984) is one member of this family. Least trimmed squares
(Rousseeuw and Leroy, 1987) is another approach. Weighted least squares can be
employed to serve this purpose. There is a method called iteratively reweighted least
squares using robust corrections for influence of the outliers (Mächler and Chatterjee,
1995). Another resistant regression method uses White variance estimators that are
consistent and efficient amidst heteroskedastic residuals of unknown form. There are
some nonparametric regression techniques that handle nonnormal distribution of the
errors, some of which are addressed later. We consider how robust regression analysis
attempts to handle these violations as well as some of the statistical package procedures
that can be applied to correct these problems.
Least Median of Squares
One of the earliest types of robust regression is least median of squares regression,
which has the advantage of diminishing the influence of outlying residuals. According to
Venables and Ripley (1999), this algorithm minimizes the median of the ordered squares of the
residuals to obtain the regression coefficient, b:

Least median of squares (LMS): b = argmin_b median_i ( y_i - x_i b )²

SAS 8.2 users can call least median of squares with the LMS call in PROC IML.
S-Plus users can execute this algorithm with lmsreg. According to Martin (2002),
the least median of squares criterion lacks a smooth objective function and takes a
long time to converge.
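As a rough illustration of the LMS criterion, the sketch below fits exact lines through random elemental subsets and keeps the candidate with the smallest median squared residual, a common resampling heuristic for this estimator. It is not the PROC IML or lmsreg implementation; the function name and simulated data are assumptions:

```python
import numpy as np

def lms_fit(X, y, n_trials=2000, seed=0):
    """Least median of squares by a resampling heuristic: fit exact
    solutions through random elemental subsets and keep the candidate
    with the smallest median squared residual."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]
    best_b, best_crit = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p, replace=False)
        try:
            b = np.linalg.solve(Xd[idx], y[idx])   # exact fit to p points
        except np.linalg.LinAlgError:
            continue                               # skip singular subsets
        crit = np.median((y - Xd @ b) ** 2)        # median squared residual
        if crit < best_crit:
            best_b, best_crit = b, crit
    return best_b

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 60)
y = 3.0 + 2.0 * x + rng.normal(scale=0.3, size=60)
y[:15] += 30.0                                     # 25% gross outliers
b = lms_fit(x, y)
```

Because the criterion is a median, the 25% of grossly shifted cases leave the fitted line essentially on the clean bulk of the data.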
Least absolute deviation, sometimes called L1 or least absolute value (LAV)
regression, is also known as median regression. SAS users call this procedure with the
LAV command within the IML library. In STATA, median regression is performed with
the quantile regression (qreg) procedure. The procedure forms initial estimates from a
weighted least squares of absolute residuals. Then qreg estimates the quantile of the
dependent variable, or by default, the median, by taking the raw sum of absolute
deviations around the unconditional median. It finds the regression coefficient, b, that
minimizes this function. The procedure estimates a constant and parameter estimates that
predict the median. For this reason, qreg is sometimes called median regression.
Median regression estimates b by minimizing

Σ_i | y_i - x_i b |
The computation of the standard errors comes from the estimation of the weighted
variance-covariance matrix for the regression coefficients. The weights are a function of
the quantile being estimated divided by the true density of the residual. Median
regression with bootstrapped standard errors will be in LIMDEP 8.
If the data are skewed and nonnormal, but influential outliers are not a serious
problem, then qreg may be the answer, for influential outliers can distort its results.
If the errors are homoskedastic, qreg may be applied with the caveat (STATA 7
Reference Q-ST) that it is reported to underestimate the standard errors for
heteroskedastic errors. If outliers do plague the data set, the STATA qreg procedure will
be sensitive to their distortion, whereas the STATA rreg procedure may be preferred
because it downweights their influence.
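The L1 criterion above can be approximated by iteratively reweighted least squares, since weighting each case by 1/|r_i| turns the weighted L2 criterion into the sum of absolute deviations. This is a hedged numpy sketch of the idea, not the qreg algorithm itself (qreg works differently, as described above):

```python
import numpy as np

def lad_fit(X, y, n_iter=50, eps=1e-6):
    """Least absolute deviations (median regression) by iteratively
    reweighted least squares: weight 1/|r_i| reproduces the L1 fit."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]        # OLS start
    for _ in range(n_iter):
        r = y - Xd @ b
        w = 1.0 / np.maximum(np.abs(r), eps)         # guard near-zero residuals
        WX = Xd * w[:, None]
        b = np.linalg.solve(Xd.T @ WX, Xd.T @ (w * y))
    return b

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.laplace(scale=0.5, size=200)  # heavy-tailed errors
y[:10] += 20.0                                        # a few large outliers
b = lad_fit(x, y)
```

With 5% of the cases shifted far upward, the L1 fit stays near the conditional median, whereas an OLS fit of the same data would be pulled toward the outliers.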
Least Trimmed Squares
One way to eliminate possible outliers is to run the analysis on trimmed or
winsorized distributions. Analyses of distributions whose outliers are trimmed prior to the
analysis are sometimes called trimmed means procedures. While trimming a distribution
means truncating it at the γ and 1 - γ quantiles (Wilcox, 1997), winsorizing a distribution
means setting the values at or more extreme than the γ quantile to the value of the γ quantile on
one tail, and setting the values at or more extreme than the 1 - γ quantile to the value of that
quantile on the other. The trimmed or winsorized distribution is then estimated by minimizing
the sum of squared residuals. By trimming the alpha rejection region, the distorting
effects of influential outliers could be pruned from the variables prior to processing. SAS
has provision in its INSIGHT or ANALYST module for trimming or winsorizing the
distribution to eliminate the outliers before estimating the means. Winsorizing outliers
limits the values of outliers beyond a particular sigma limit away from a measure of
central tendency to those found at that particular sigma limit. Rousseeuw advocated
trimming by minimizing the sum of the smallest squared residuals to obtain
the regression coefficient. He called this procedure least trimmed squares, as cited in
Venables and Ripley (1999).
Least trimmed squares (LTS): b = argmin_b Σ_{i=1}^{q} r²_(i)

where r²_(1) ≤ ... ≤ r²_(n) are the ordered squared residuals, r_i = y_i - x_i b.
According to Rousseeuw, the LTS procedure is more efficient than the LMS or M
procedure. It has a smoother objective function that is “less sensitive to local effects” and
it has “the highest possible breakdown value” (Rousseeuw and Van Driessen, 1998).
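Trimming and winsorizing as described above are easy to sketch. The helper below is an illustration of the winsorizing rule, not the SAS INSIGHT or ANALYST routine; the function name and the gamma default are assumptions:

```python
import numpy as np

def winsorize(x, gamma=0.05):
    """Winsorize: pull values beyond the gamma and 1-gamma quantiles
    back to those quantiles. Trimming, by contrast, drops them."""
    lo, hi = np.quantile(x, [gamma, 1.0 - gamma])
    return np.clip(x, lo, hi)                        # tails set to the quantiles

scores = np.arange(100.0)
wz = winsorize(scores, gamma=0.05)                   # extremes pulled inward
```

Nothing is discarded: the winsorized sample keeps its size, which is why winsorized estimates remain comparable across cases with different numbers of extreme values.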
The LTS procedure propounded by Rousseeuw and Leroy (1987) can be called
with the LTS function in SAS PROC IML. Although the original algorithm is time-
consuming, a newer fast-LTS, which takes h random subsets of large data sets (Rousseeuw
and Van Driessen, 1998), has now become the default algorithm and is to become part of
PROC ROBUSTREG in version 9 of SAS. This algorithm “minimizes the sum of the h
smallest squared residuals” (SAS OnLineDoc):
Least Trimmed Squares (SAS LTS) minimizes

(1/h) Σ_{i=1}^{h} r²_{i:N}

where h is defined to be within the range

N/2 + 1 ≤ h ≤ (3N + p + 1)/4

N = sample size
p = number of parameters
S-PLUS, however, contains this routine in its menu system, so that the
minimization of the sum of squared residuals is run over a subset of the
residuals. In this case q = [(n+p+1)/2], where p = number of parameters in the model and
n = sample size. S-Plus contains this algorithm in its ltsreg procedure. Although this
converges more smoothly and faster than the LMS algorithm, Martin (2002) maintains
that it has low Gaussian efficiency, which seems to worsen with the trimming fraction. Be
that as it may, the slow convergence of the older procedure inclines developers toward
use of the fast-LTS algorithm.
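The concentration-step idea behind fast-LTS can be sketched in a few lines of numpy: from each random start, repeatedly refit OLS on the h cases with the smallest squared residuals, and keep the best of many starts. This is a simplified illustration of Rousseeuw and Van Driessen's approach, not the SAS or S-PLUS code; names and defaults are assumptions:

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=50, n_csteps=20, seed=0):
    """Least trimmed squares via concentration steps (the fast-LTS idea):
    from a random elemental start, repeatedly refit OLS on the h cases
    with the smallest squared residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    p = Xd.shape[1]
    if h is None:
        h = (n + p + 1) // 2                         # a common subset-size default
    best_b, best_crit = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p + 1, replace=False)
        b = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):                    # concentration steps
            r2 = (y - Xd @ b) ** 2
            keep = np.argsort(r2)[:h]                # h smallest squared residuals
            b = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)[0]
        crit = np.sort((y - Xd @ b) ** 2)[:h].sum()  # trimmed sum of squares
        if crit < best_crit:
            best_b, best_crit = b, crit
    return best_b

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=100)
y[:25] += 25.0                                       # 25% gross outliers
b = lts_fit(x, y)
```

Each concentration step can only lower the trimmed criterion, which is why a handful of steps per start suffices.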
Heteroskedasticity and Weighted Least Squares
Weighted least squares (WLS) estimation is available in all of these packages. To
compensate for heteroskedasticity, weights computed as the inverse of the variances of
the dependent variable can be applied. The weights have to be computed as a separate
variable. When they are applied to the estimation, they compensate for the distorting
effect of the heteroskedasticity. At the same time, they may be an inverse function of the
leverage of the outlier to downweight the influence of outliers. Such fixed weights may
be applied as an option in the SAS PROC REG, LIMDEP Regress; wts=, as an option in
the lm function in S-PLUS, as well as in EVIEWS least squares and two-stage least
squares procedures. In STATA, WLS is used to find the initial estimates of qreg. WLS
can also be applied in the variance-weighted least squares procedure, called vwls. Once
the weights are included, WLS estimation takes place.
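A minimal numpy sketch of WLS with fixed inverse-variance weights follows; the function name and the simulated heteroskedastic data are illustrative assumptions, not any package's implementation:

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy with fixed case
    weights, here the inverse variances of the responses."""
    Xd = np.column_stack([np.ones(len(y)), X])
    WX = Xd * w[:, None]
    return np.linalg.solve(Xd.T @ WX, Xd.T @ (w * y))

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 300)
sigma = 0.2 * x                                      # error sd grows with x
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)
b = wls_fit(x, y, w=1.0 / sigma ** 2)                # inverse-variance weights
```

Downweighting the noisy high-x cases restores efficient estimation; the same mechanism can downweight high-leverage outliers if the weights are built from leverage instead of variance.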
Heteroskedasticity and Autocorrelation Consistent Variance Estimation
To perform regression analysis resistant to heteroskedasticity, there are other
procedures. One of the ways to relax the requirement of homoskedasticity is to iteratively
apply a robust filter to the heteroskedasticity. Harold White in 1980 showed that for
asymptotic (large sample) estimation, the sample sum of squared error corrections
approximated those of their population parameters under conditions of heteroskedasticity
and yielded a heteroskedastically consistent sample variance estimate of the standard
errors (STATA 7 Reference Manual, Q-ST ).
Robust variance = V ( Σ_{k=1}^{c} u_k' u_k ) V

where
c = number of clusters
u_k = row vector of scores for cluster k, k = 1, ..., c
V = variance matrix
Thus, the robust White variance estimator rendered regression resistant to the
heteroskedasticity problem. LIMDEP can generate the White estimators with the het
option for the regress procedure, as can EVIEWS with its LS and TSLS options. Stata
generates them with the regress y x1 x2, …, robust command.
In STATA, there are additional corrections. Regress with the robust option uses a
degrees-of-freedom correction of n/(n-k) times the error variance to improve the small-
sample estimates. The hc2 option provides a leverage (h) correction for homoskedastic
models by dividing the error variance by (1 - h), while the hc3 option invokes a correction
for heteroskedastic models that divides the error variance by (1 - h)².
White asymptotic covariance estimation can be performed with the ACOV option
in SAS PROC REG. When tests are performed using this option, the heteroskedastic
consistent standard errors are used (Meyer, 2002).
Newey and West (1987) extended this heteroskedastically consistent variance
estimator to handle residual autocorrelation as well. The White and Newey-West
estimators as well as their robust Generalized Methods of Moments analogs are available
in LIMDEP nonlinear least squares (NLSQ with the GMM option), STATA (regress
procedure with the robust option), LS, TSLS, ARCH, and GMM models in EVIEWS
with the HAC option, and SAS (using PROC MODEL with the GMM option).
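The White sandwich estimator itself is short to state in code. The sketch below computes the HC0 sandwich and the n/(n-k) scaled HC1 standard errors for a simple OLS fit; it is an illustration under stated assumptions, not the SAS, STATA, LIMDEP, or EVIEWS implementation:

```python
import numpy as np

def white_se(X, y, correction="hc1"):
    """OLS coefficients with White heteroskedasticity-consistent
    (sandwich) standard errors; "hc1" applies the n/(n-k) scaling."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    k = Xd.shape[1]
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    b = XtX_inv @ (Xd.T @ y)
    u = y - Xd @ b                                   # residuals
    meat = Xd.T @ (Xd * (u ** 2)[:, None])           # sum of u_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv                     # the sandwich
    if correction == "hc1":
        V = V * n / (n - k)                          # small-sample scaling
    return b, np.sqrt(np.diag(V))

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 400)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + x)        # variance grows with x
b, se = white_se(x, y)
```

The coefficients are the ordinary OLS estimates; only the standard errors change, which is why the correction leaves point predictions untouched.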
Iteratively Reweighted Least Squares
Another robust regression analysis is performed with M (maximum-likelihood-
like) estimators. STATA begins this regression analysis with computation of case weights
from scaled residuals. The scale is the median absolute deviation about the median
(MADAM) of the residuals divided by a constant (Huber, 1981). If the scaled residual is
small, the case weight is equal to one; if it is larger, the case weight is equal to the tuning
constant divided by the absolute value of the scaled residual. Other weights, called
biweights, are applied later. With biweights, all cases with nonzero residuals are
downweighted and cases with large residuals are assigned zero weights, thereby
eliminating their influence altogether.
Because Huber weights sometimes have problems with extreme outliers and because bi-
weights sometimes have trouble converging on one solution, Huber weights are first used
in the early iterations and then biweights are used during later iterations of the regression
until final convergence is reached. STATA applies this estimation with the rreg
command (STATA 7 Reference, Q-ST). SAS version 9 by default applies M estimation
to minimize a robust function of the residuals in its experimental ROBUSTREG
procedure. Although the Tukey weight is the default, one can employ other weights as an
alternative. The SAS ROBUSTREG procedure has a number of appealing features
including a nice menu of outlier diagnostics, which identify the outliers, Mahalanobis
distances, robust MCD distances, and their leverages. These diagnostics make it easy to
distinguish the good from the bad leverage points.
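The Huber-then-biweight scheme described above can be sketched as follows. The tuning constants 1.345 and 4.685 are the conventional choices, and the halfway switch between weight families is a simplification of what rreg actually does; the function name and data are illustrative:

```python
import numpy as np

def m_fit(X, y, n_iter=50):
    """M-estimation by iteratively reweighted least squares: MAD-based
    scale, Huber weights in early iterations, Tukey biweights later."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]              # OLS start
    for it in range(n_iter):
        r = y - Xd @ b
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # MADAM scale
        u = r / max(s, 1e-12)
        au = np.maximum(np.abs(u), 1e-12)
        if it < n_iter // 2:                               # Huber, c = 1.345
            c = 1.345
            w = np.where(au <= c, 1.0, c / au)
        else:                                              # biweight, c = 4.685
            c = 4.685
            w = np.where(au <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)
        WX = Xd * w[:, None]
        b = np.linalg.solve(Xd.T @ WX, Xd.T @ (w * y))
    return b

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 150)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=150)
y[:15] += 15.0                                             # 10% gross outliers
b = m_fit(x, y)
```

The Huber phase pulls the fit close enough that the biweight phase can safely assign the gross outliers zero weight without losing good cases.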
In Robust MM Regression, robust initial regression coefficients are used as
starting values. The robust regression coefficients are found by minimizing a scale
parameter, s, while ρ may be one of several bounded loss functions that serve the
purpose of minimizing the empirical influence of troublesome residuals. ρ is an integral
of ψ(u) in the formula

(1/(n - p)) Σ_i ρ( (y_i - x_i b) / (c_0 s) ) = β

where

ρ(u) = u⁶ - 3u⁴ + 3u², |u| ≤ 1 (and ρ(u) = 1 otherwise)
tuning constant c_0 = 1.548
β = 0.5
The M-estimate is then derived according to the loss function, from the S estimate
and the fixed scale estimate produced. The final M-estimate is computed and thus the
MM-estimator proposed by Yohai, Stahel, and Zamar in 1991 is generated. A test of
bias that compares the M estimate to the least squares estimate is also produced. S-PLUS
contains the lmRobMM function, which implements the above procedure, cited in Venables
and Ripley (1999) and the S-PLUS 2000 Guide to Statistics, Vol. 1 (2000). The SAS
version 9 PROC ROBUSTREG also includes both M and robust MM regression.
Another method, an iteratively reweighted least squares approach with a modified M-
estimator weighting to deal with the outlier and heteroskedasticity problems, was
developed by Prof. Samprit Chatterjee and Martin Mächler, respectively of NYU's Stern
School and the Swiss Federal Institute of Technology. Their approach downweights
influential residuals to achieve a fit. The case weights are updated at each iteration of the
estimation process in accordance with the following formula:
w_i = (1 - h_i) / max( |r_i|, med |r| ),  i = 1, ..., n

where
med(x) = median( x_1, x_2, ..., x_n )
r_i = residual of case i
h_i = leverage of case i
This process estimates parameters by iterating the weighted least squares
regression until convergence. Chatterjee and Mächler (1995) have tested their method on
a variety of difficult data sets and found that it performed well on all of them. They have
written SAS, S-Plus, and Minitab Macro code for their procedure (Chatterjee, 2002).
Bootstrapping

When distributional normality and homoskedasticity assumptions are violated,
many researchers resort to nonparametric bootstrapping methods. Where the errors are
independently distributed, bootstrapping can provide an empirical supplement to analytic
means of assessing parameters, errors, standard errors, levels of significance and
confidence intervals. Bootstrapping entails random resampling (according to Mooney
and Duval, 1993) to obtain the desired empirical distribution. Based on asymptotic
theory, using one of several standard bootstrap estimation methods, one can
determine with great accuracy the empirical standard errors for such samples.
The forthcoming LIMDEP 8 (with linear regression via LAV with bootstrapped standard
errors), as well as the current versions of STATA 7 (bs, bstrap, or bsqreg), S-PLUS 6.0
(with its bootstrap suite of functions), and EVIEWS 4.0 (with its bootstrapping function),
all have procedures for resampling.
One advantage of bootstrapping is that some of the methods provide more
accurate confidence intervals than the conventional asymptotic distribution approaches do.
As Mooney and Duval (1993) have it, the percentile bootstrap method empirically proves
to be the most accurate. Bsqreg in STATA and a forthcoming bootstrap procedure in the
new LIMDEP 8 will perform quantile regression with bootstrapped standard errors.
Those who would resort to the bootstrap must be aware that the bootstrapped
sampling distribution will reflect bias already in the sample, unless bias correction is
performed. Mooney and Duval (1993) warned that not much work had been done in that
area as of 1993.
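A case-resampling bootstrap of a regression slope, with the percentile confidence interval mentioned above, can be sketched as follows; this is a plain illustration (no bias correction), and the function name and data are assumptions:

```python
import numpy as np

def bootstrap_slope(x, y, n_boot=2000, seed=0):
    """Nonparametric (case-resampling) bootstrap of the OLS slope for a
    single predictor: empirical standard error and percentile CI."""
    rng = np.random.default_rng(seed)
    n = len(y)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample cases
        cov = np.cov(x[idx], y[idx])
        slopes[i] = cov[0, 1] / cov[0, 0]            # OLS slope of the resample
    se = slopes.std(ddof=1)
    ci = np.percentile(slopes, [2.5, 97.5])          # percentile method
    return se, ci

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)
se, ci = bootstrap_slope(x, y)
```

Resampling whole cases (x, y pairs) rather than residuals keeps the bootstrap valid under heteroskedasticity, which is the usual reason for preferring it here.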
William Cleveland proposed a robust scatterplot smoothing with a weighted least
squares algorithm that locally weights the data. Once the bandwidth is set, a regression is
performed that gives the nearest points the most weight. This locally weighted scatterplot
smoothing (LOESS) is available in many statistical packages, including PROC LOESS in
SAS and a “loess” procedure in S-Plus. These approaches lend themselves to nonnormal
data. STATA also performs LOWESS with “roblowes.” One needs only to search for
the keyword, lowess, and then download and install the available files. Even SPSS can
perform a LOWESS plot with a fit option to a scatterplot.
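The local weighting Cleveland proposed can be illustrated at a single evaluation point: tricube weights over the span of the nearest points, then a weighted linear fit. Real LOESS adds robustness iterations that downweight large residuals; this sketch omits them, and its names and defaults are assumptions:

```python
import numpy as np

def lowess_point(x, y, x0, frac=0.3):
    """One step of locally weighted regression: tricube weights over the
    span of the nearest frac*n points, then a weighted linear fit at x0."""
    n = len(x)
    k = max(2, int(frac * n))
    d = np.abs(x - x0)
    bandwidth = np.sort(d)[k - 1]                    # distance to k-th neighbor
    u = np.clip(d / max(bandwidth, 1e-12), 0.0, 1.0)
    w = (1.0 - u ** 3) ** 3                          # tricube kernel
    Xd = np.column_stack([np.ones(n), x - x0])       # locally centered design
    WX = Xd * w[:, None]
    b = np.linalg.solve(Xd.T @ WX, Xd.T @ (w * y))
    return b[0]                                      # fitted value at x0

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.2, size=200)
yhat = lowess_point(x, y, np.pi / 2)                 # smooth near the peak
```

Sweeping x0 across a grid of points and connecting the fitted values produces the familiar LOWESS curve.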
Robust spline smoothing can also be performed with S-PLUS, SAS, and STATA.
With the nonparametric spline smoothing, a function is formed that sums the squares of
the residuals plus some penalty function. The penalty function is a product of a
smoothing parameter times a measure of roughness, such as the integral of the square of
the second derivative of the function. The minimization of this function yields the cubic
spline. With SAS, the spline smoothing can be performed with PROC LOESS or PROC
TPSPLINE for thin plate splines. SAS also has transformation regression in PROC
TRANSREG with a spline option for transforming variables. With the “spline” command,
or the “ksm” command coupled with the lowess option, STATA 7 SE can also perform
these procedures. In S-Plus, one can use the spline command to interpolate the data points
and when one selects the 2-D line and scatterplots, one can opt for spline smoothing.
The generalized additive model was proposed by Hastie and Tibshirani more than
a decade ago. In these models, the mean of the dependent variable is connected to the
predictors via a link function. The distribution of the response need not be normal; it can
be any member of the exponential family of distributions. These can include binomial
logistic functions. They can include poisson, inverse Gaussian, or quasi link functions.
Other terms may be added as well. The model is semi-parametric. The parametric part
iteratively smoothes the partial residuals as part of a process called backfitting. The
nonparametric part is smoothed with loess or splines. The gam procedures in S-Plus and
STATA, and PROC GAM in SAS, are all generalized additive model procedures.
Kernel regression is a form of nonparametric regression. This procedure is
planned for release in LIMDEP 8. The user can choose from 8 types of kernels to apply
to his model. SAS has a procedure, PROC KDE, to do univariate or bivariate kernel
density estimation with a normal kernel. There is a movement toward providing options
to render ordinary statistical procedures more resistant against the effects of
heteroskedasticity and outliers. As these options become more available, the teaching of
the robust procedures as backups will naturally follow the teaching of the classical
procedures in the near future. As S-Plus provides in its robust library, the analysis can be
run with both classical and robust procedures; when they agree, the classical results will
suffice. When the results disagree substantially, the robust procedures will be preferred.
To recapitulate, when fundamental regression analysis assumptions are violated,
the researcher may wish to consider what alternatives are available to him. If the data
contain influential outliers, then he may wish to employ some form of robust regression
that downweights the influence of the troublesome outliers. He could use least trimmed
squares, weighted least squares, iteratively reweighted least squares, a form of M or MM-
estimators or least median squares regression. If the data are heteroskedastic, the analyst
may wish to employ one of the weighted least squares, iteratively reweighted least
squares, MM-estimators, median regression or least median squares estimation options. If
the data are nonnormal, he may wish to use least absolute deviation, iteratively
reweighted least squares, median regression, or least median squares estimation. He may
compare his robust results to his classical results. He may also wish to estimate his bias
from his parametric estimate with bootstrapping. When there are substantial differences,
Martin and others maintain that the robust statistics yield better estimation and prediction.
These are some of the ways by which the analyst can fortify his model against unruly data.

References
Chatterjee, S. (2002) Personal Communication, March 26, 2002.
Chatterjee, S. and Mächler, M. (1995). Robust Regression: A Weighted Least Squares
Approach, Communications in Statistics, Theory and Methods, 26, 1381-1394.
Chen, Colin (2002). Robust Regression and Outlier Detection with the ROBUSTREG
procedure. SUGI Paper 265-27. SAS Institute: Cary, NC.
Draper, N. R. and Smith, H. (1966). Applied Regression Analysis, John Wiley & Sons,
New York, 342-344.
Eviews 4 User’s Guide (2000). Quantitative Micro Software, Irvine, Ca., 271-293, 381-
Greene, W.H. (2000). Econometric Analysis, 4th ed., Prentice Hall, Inc.: Saddle River,
NJ, Chapter 11.
Greene, W. H. (1997). LIMDEP7 User’s Manual. Econometric Software, Inc.