The Inaugural Coase Lecture
An Introduction to Regression Analysis
Alan O. Sykes*
Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator seeks to ascertain the causal effect of one variable upon another: the effect of a price increase upon demand, for example, or the effect of changes in the money supply upon the inflation rate. To explore such issues, the investigator assembles data on the underlying variables of interest and employs regression to estimate the quantitative effect of the causal variables upon the variable that they influence. The investigator also typically assesses the “statistical significance” of the estimated relationships, that is, the degree of confidence that the true relationship is close to the estimated relationship.
Regression techniques have long been central to the field of economic statistics (“econometrics”). Increasingly, they have become important to lawyers and legal policy makers as well. Regression has been offered as evidence of liability under Title VII of the Civil Rights Act of 1964,¹ as evidence of racial bias in death penalty litigation,² as evidence of damages in contract actions,³ as evidence of violations under the Voting Rights Act,⁴ and as evidence of damages in antitrust litigation,⁵ among other things.
*Professor of Law, University of Chicago, The Law School. I thank Donna Cote for helpful research assistance.
1 See, e.g., Bazemore v. Friday, ??? U.S. ???, ??? (????).
2 See, e.g., McCleskey v. Kemp, ??? U.S. ??? (????).
3 See, e.g., Cotton Brothers Baking Co. v. Industrial Risk Insurers, ??? F.?d ??? (?th Cir. ????).
4 See, e.g., Thornburg v. Gingles, ??? U.S. ?? (????).
5 See, e.g., Sprayrite Service Corp. v. Monsanto Co., ??? F.?d ???? (?th Cir. ????).
Chicago Working Paper in Law & Economics
In this lecture, I will provide an overview of the most basic techniques of regression analysis: how they work, what they assume, and how they may go awry when key assumptions do not hold. To make the discussion concrete, I will employ a series of illustrations involving a hypothetical analysis of the factors that determine individual earnings in the labor market. The illustrations will have a legal flavor in the latter part of the lecture, where they will incorporate the possibility that earnings are impermissibly influenced by gender in violation of the federal civil rights laws.⁶ I wish to emphasize that this lecture is not a comprehensive treatment of the statistical issues that arise in Title VII litigation, and that the discussion of gender discrimination is simply a vehicle for expositing certain aspects of regression technique.⁷ Also, of necessity, there are many important topics that I omit, including simultaneous equation models and generalized least squares. The lecture is limited to the assumptions, mechanics, and common difficulties of single-equation, ordinary least squares regression.
1. What is Regression?
For purposes of illustration, suppose that we wish to identify and quantify the factors that determine earnings in the labor market. A moment’s reflection suggests a myriad of factors that are associated with variations in earnings across individuals: occupation, age, experience, educational attainment, motivation, and innate ability come to mind, perhaps along with factors such as race and gender that can be of particular concern to lawyers. For the time being, let us restrict attention to a single factor; call it education. Regression analysis with a single explanatory variable is termed “simple regression.”
6 See 42 U.S.C. §2000e-2 (????), as amended.
7 Readers with a particular interest in the use of regression analysis under Title VII may wish to consult the following references: Campbell, “Regression Analysis in Title VII Cases: Minimum Standards, Comparable Worth, and Other Issues Where Law and Statistics Meet,” ?? Stan. L. Rev. ???? (????); Connolly, “The Use of Multiple Regression Analysis in Employment Discrimination Cases,” ?? Population Res. and Pol. Rev. ??? (????); Finkelstein, “The Judicial Reception of Multiple Regression Studies in Race and Sex Discrimination Cases,” ?? Colum. L. Rev. ??? (????); and Fisher, “Multiple Regression in Legal Proceedings,” ?? Colum. L. Rev. ??? (????), at ???–??.
2. Simple Regression
In reality, any effort to quantify the effects of education upon earnings without careful attention to the other factors that affect earnings could create serious statistical difficulties (termed “omitted variables bias”), which I will discuss later. But for now let us assume away this problem. We also assume, again quite unrealistically, that “education” can be measured by a single attribute: years of schooling. We thus suppress the fact that a given number of years in school may represent widely varying academic programs.
At the outset of any regression study, one formulates some hypothesis about the relationship between the variables of interest, here, education and earnings. Common experience suggests that better educated people tend to make more money. It further suggests that the causal relation likely runs from education to earnings rather than the other way around. Thus, the tentative hypothesis is that higher levels of education cause higher levels of earnings, other things being equal.
To investigate this hypothesis, imagine that we gather data on
education and earnings for various individuals. Let E denote educa-
tion in years of schooling for each individual, and let I denote that
individual’s earnings in dollars per year. We can plot this informa-
tion for all of the individuals in the sample using a two-dimensional
diagram, conventionally termed a “scatter” diagram. Each point in
the diagram represents an individual in the sample.
The diagram indeed suggests that higher values of E tend to yield higher values of I, but the relationship is not perfect: it seems that knowledge of E does not suffice for an entirely accurate prediction about I.⁸ We can then infer either that the effect of education upon earnings differs across individuals, or that factors other than education influence earnings. Regression analysis ordinarily embraces the latter explanation.⁹ Thus, pending discussion below of omitted variables bias, we now hypothesize that earnings for each individual are determined by education and by an aggregation of omitted factors that we term “noise.”
8 More accurately, what one can infer from the diagram is that if knowledge of E suffices to predict I perfectly, then the relationship between them is a complex, nonlinear one. Because we have no reason to suspect that the true relationship between education and earnings is of that form, we are more likely to conclude that knowledge of E is not sufficient to predict I perfectly.
9 The alternative possibility, that the relationship between two variables is unstable, is termed the problem of “random” or “time varying” coefficients and raises somewhat different statistical problems. See, e.g., H. Theil, Principles of Econometrics ???–?? (????); G. Chow, Econometrics ???–?? (????).
To refine the hypothesis further, it is natural to suppose that people in the labor force with no education nevertheless make some positive amount of money, and that education increases earnings above this baseline. We might also suppose that education affects income in a “linear” fashion, that is, each additional year of schooling adds the same amount to income. This linearity assumption is common in regression studies but is by no means essential to the application of the technique, and can be relaxed where the investigator has reason to suppose a priori that the relationship in question is nonlinear.¹⁰
Then, the hypothesized relationship between education and earnings may be written
$$I = \alpha + \beta E + \varepsilon$$
where
α = a constant amount (what one earns with zero education);
β = the effect in dollars of an additional year of schooling on income, hypothesized to be positive; and
ε = the “noise” term reflecting other factors that influence earnings.
The variable I is termed the “dependent” or “endogenous” variable; E is termed the “independent,” “explanatory,” or “exogenous” variable; α is the “constant term” and β the “coefficient” of the variable E.
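To make the split between the observable part α + βE and the unobserved noise ε concrete, here is a minimal Python sketch that simulates a hypothetical sample from the model. The parameter values (ALPHA, BETA, NOISE_SD), the 8 to 20 year range of schooling, and the normal distribution for ε are illustrative assumptions, not figures from the lecture; an investigator would see only the (E, I) pairs, never the draws of ε.

```python
import random

# Hypothetical "true" parameters: illustrative assumptions, not estimates from any data.
ALPHA = 15_000.0    # earnings with zero schooling, dollars per year
BETA = 2_000.0      # dollars added by each additional year of schooling
NOISE_SD = 5_000.0  # spread of the unobserved noise term (assumed normal, mean zero)

def simulate_sample(n, seed=0):
    """Draw n hypothetical (E, I) pairs from I = ALPHA + BETA*E + eps."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        e = rng.randint(8, 20)          # years of schooling (assumed range)
        eps = rng.gauss(0.0, NOISE_SD)  # nature's draw; never seen by the analyst
        sample.append((e, ALPHA + BETA * e + eps))
    return sample

data = simulate_sample(5)
for e, i in data:
    print(f"E = {e:2d} years, I = ${i:,.0f}")
```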
Remember what is observable and what is not. The data set contains observations for I and E. The noise component ε consists of factors that are unobservable, or at least unobserved. The parameters α and β are also unobservable. The task of regression analysis is to produce an estimate of these two parameters, based upon the information contained in the data set and, as shall be seen, upon some assumptions about the characteristics of ε.
10 When nonlinear relationships are thought to be present, investigators typically seek to model them in a manner that permits them to be transformed into linear relationships. For example, the relationship $y = cx^{\beta}$ can be transformed into the linear relationship $\log y = \log c + \beta \log x$. The reason for modeling nonlinear relationships in this fashion is that the estimation of linear regressions is much simpler and their statistical properties are better known. Where this approach is infeasible, however, techniques for the estimation of nonlinear regressions have been developed. See, e.g., G. Chow, supra note 9, at ???–??.
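The log transformation of a power relationship y = cx^β can be sketched in a few lines: take logarithms of both variables and fit an ordinary straight line, so that the intercept estimates log c and the slope estimates β. The data below are invented and noise-free, generated from y = 3x^0.5, so the fit should recover c = 3 and β = 0.5; `fit_line` is a hypothetical helper written for this illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = a + b*xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data generated exactly from y = 3 * x**0.5 (no noise).
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 0.5 for x in xs]

# Regress log y on log x: the intercept estimates log c, the slope estimates beta.
log_c, beta = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
c = math.exp(log_c)
print(round(c, 6), round(beta, 6))  # 3.0 0.5
```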
To understand how the parameter estimates are generated, note that if we ignore the noise term ε, the equation above for the relationship between I and E is the equation for a line, a line with an “intercept” of α on the vertical axis and a “slope” of β. Returning to the scatter diagram, the hypothesized relationship thus implies that somewhere on the diagram may be found a line with the equation I = α + βE. The task of estimating α and β is equivalent to the task of estimating where this line is located.
What is the best estimate regarding the location of this line? The answer depends in part upon what we think about the nature of the noise term ε. If we believed that ε was usually a large negative number, for example, we would want to pick a line lying above most or all of our data points. The logic is that if ε is negative, the true value of I (which we observe), given by I = α + βE + ε, will be less than the value of I on the line I = α + βE. Likewise, if we believed that ε was systematically positive, a line lying below the majority of data points would be appropriate. Regression analysis assumes, however, that the noise term has no such systematic property, but is on average equal to zero; I will make the assumptions about the noise term more precise in a moment. The assumption that the noise term is zero on average suggests an estimate of the line that lies roughly in the midst of the data, with some observations below and some observations above.
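The role of the noise term's average can be checked by simulation. The sketch below, which again assumes hypothetical parameter values and normally distributed noise, counts how many observations fall below the true line I = α + βE: roughly half when ε averages zero, and nearly all when ε is systematically negative. That is why a line through the midst of the data only makes sense under the zero-mean assumption.

```python
import random

# Hypothetical true parameters (illustrative assumptions, not estimates).
ALPHA, BETA, NOISE_SD = 15_000.0, 2_000.0, 5_000.0

def share_below_true_line(noise_mean, n=1_000, seed=1):
    """Fraction of simulated observations whose observed I lies below ALPHA + BETA*E."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n):
        e = rng.randint(8, 20)                                # years of schooling
        on_line = ALPHA + BETA * e                            # value of I on the true line
        observed = on_line + rng.gauss(noise_mean, NOISE_SD)  # what the analyst sees
        if observed < on_line:
            below += 1
    return below / n

print(share_below_true_line(0.0))        # close to 0.5
print(share_below_true_line(-10_000.0))  # close to 1.0
```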
But there are many such lines, and it remains to pick one line in particular. Regression analysis does so by embracing a criterion that relates to the estimated noise term or “error” for each observation. To be precise, define the “estimated error” for each observation as the vertical distance between the value of I along the estimated line I = α + βE (generated by plugging the actual value of E into this equation) and the true value of I for the same observation. If a candidate line is superimposed on the scatter diagram, the estimated error for each observation is the vertical gap between that observation’s data point and the line.
With each possible line that might be superimposed upon the data, a different set of estimated errors will result. Regression analysis then chooses among all possible lines by selecting the one for which the sum of the squares of the estimated errors is at a minimum. This is termed the minimum sum of squared errors (minimum SSE) criterion. The intercept of the line chosen by this criterion provides the estimate of α, and its slope provides the estimate of β.
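The criterion can be made concrete with a small hypothetical sample. The sketch below computes the sum of squared estimated errors for a handful of candidate lines and keeps the one with the smallest total. Regression searches over all possible lines rather than a short list, so this only illustrates what is being minimized; the data and the candidate values of α and β are invented.

```python
def sse(alpha, beta, data):
    """Sum of squared vertical distances between observed I and the line alpha + beta*E."""
    return sum((i - (alpha + beta * e)) ** 2 for e, i in data)

# Small hypothetical sample of (years of schooling, annual earnings in dollars).
data = [(10, 33_000), (12, 40_000), (12, 37_000), (14, 45_000), (16, 50_000)]

candidates = [(15_000, 2_000), (10_000, 2_500), (20_000, 1_500), (0, 3_000)]
for a, b in candidates:
    print(f"alpha={a:>6}, beta={b:>5}: SSE = {sse(a, b, data):,.0f}")

best = min(candidates, key=lambda ab: sse(ab[0], ab[1], data))
print("lowest SSE among these candidates:", best)
```

Among these four candidates the line with α = 10,000 and β = 2,500 has the smallest SSE (13,000,000), but a line outside the list can do better still; the calculus approach described in the text finds the best line directly.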
It is hardly obvious why we should choose our line using the minimum SSE criterion. We can readily imagine other criteria that might be utilized (minimizing the sum of errors in absolute value,¹¹ for example). One virtue of the SSE criterion is that it is very easy to employ computationally. When one expresses the sum of squared errors mathematically and employs calculus techniques to ascertain the values of α and β that minimize it, one obtains expressions for α and β that are easy to evaluate with a computer using only the observed values of E and I in the data sample.¹² But computational convenience is not the only virtue of the minimum SSE criterion; it also has some attractive statistical properties under plausible assumptions about the noise term. These properties will be discussed in a moment, after we introduce the concept of multiple regression.
11 It should be obvious why simply minimizing the sum of errors is not an attractive criterion: large negative errors and large positive errors would cancel out, so that this sum could be at a minimum even though the line selected fitted the data very poorly.
3. Multiple Regression
Plainly, earnings are affected by a variety of factors in addition to years of schooling, factors that were aggregated into the noise term in the simple regression model above. “Multiple regression” is a technique that allows additional factors to enter the analysis separately so that the effect of each can be estimated. It is valuable for quantifying the impact of various simultaneous influences upon a single dependent variable. Further, because of omitted variables bias with simple regression, multiple regression is often essential even when the investigator is only interested in the effects of one of the independent variables.
For purposes of illustration, consider the introduction into the earnings analysis of a second independent variable called “experience.” Holding constant the level of education, we would expect someone who has been working for a longer time to earn more. Let X denote years of experience in the labor force and, as in the case of education, we will assume that it has a linear effect upon earnings that is stable across individuals. The modified model may be written:
$$I = \alpha + \beta E + \gamma X + \varepsilon$$
where γ is expected to be positive.
12 The derivation is so simple in the case of one explanatory variable that it is worth including here. Continuing with the example in the text, we imagine that we have data on education and earnings for a number of individuals, indexed by j. The actual value of earnings for the jth individual is $I_j$, and its estimated value for any line with intercept α and slope β will be $\alpha + \beta E_j$. The estimated error is thus $I_j - \alpha - \beta E_j$. The sum of squared errors is then $\sum_j (I_j - \alpha - \beta E_j)^2$. Minimizing this sum with respect to α requires that its derivative with respect to α be set to zero, or $-2\sum_j (I_j - \alpha - \beta E_j) = 0$. Minimizing with respect to β likewise requires $-2\sum_j E_j (I_j - \alpha - \beta E_j) = 0$. We now have two equations in two unknowns that can be solved for α and β.
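Solving the two first-order conditions of that derivation gives the familiar closed form. The first condition implies α = Ī − βĒ (bars denoting sample means); substituting this into the second gives β as the sum of cross-products of deviations of E and I from their means, divided by the sum of squared deviations of E. A minimal Python sketch on an invented sample of five individuals, checking afterwards that both conditions hold at the solution (`ols_simple` is a hypothetical name):

```python
def ols_simple(data):
    """Solve the two first-order conditions for the minimum-SSE line.

    Condition 1: sum(I - a - b*E) = 0      ->  a = mean(I) - b*mean(E)
    Condition 2: sum(E*(I - a - b*E)) = 0  ->  b = S_EI / S_EE (deviations from means)
    """
    n = len(data)
    e_bar = sum(e for e, _ in data) / n
    i_bar = sum(i for _, i in data) / n
    s_ei = sum((e - e_bar) * (i - i_bar) for e, i in data)
    s_ee = sum((e - e_bar) ** 2 for e, _ in data)
    beta_hat = s_ei / s_ee
    alpha_hat = i_bar - beta_hat * e_bar
    return alpha_hat, beta_hat

data = [(10, 33_000), (12, 40_000), (12, 37_000), (14, 45_000), (16, 50_000)]
a_hat, b_hat = ols_simple(data)

# Both first-order conditions hold (up to floating-point rounding).
residuals = [i - a_hat - b_hat * e for e, i in data]
print(round(a_hat, 2), round(b_hat, 2))
print(abs(sum(residuals)) < 1e-6,
      abs(sum(e * r for (e, _), r in zip(data, residuals))) < 1e-6)
```

For this sample the estimates come out at about α ≈ 4,076.92 and β ≈ 2,884.62.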
The task of estimating the parameters α, β, and γ is conceptually identical to the earlier task of estimating only α and β. The difference is that we can no longer think of regression as choosing a line in a two-dimensional diagram: with two explanatory variables we need three dimensions, and instead of estimating a line we are estimating a plane. Multiple regression analysis will select a plane so that the sum of squared errors (the error here being the vertical distance between the actual value of I and the estimated plane) is at a minimum. The intercept of that plane with the I-axis (where E and X are zero) implies the constant term α, its slope in the education dimension implies the coefficient β, and its slope in the experience dimension implies the coefficient γ.
Multiple regression analysis is in fact capable of dealing with an arbitrarily large number of explanatory variables. Though people lack the capacity to visualize in more than three dimensions, mathematics does not. With n explanatory variables, multiple regression analysis will estimate the equation of a “hyperplane” in (n + 1)-dimensional space (one dimension for each explanatory variable and one for the dependent variable) such that the sum of squared errors has been minimized. Its intercept implies the constant term, and its slope in each dimension implies one of the regression coefficients. As in the case of simple regression, the SSE criterion is quite convenient computationally. Formulae for the parameters α, β, γ, . . . can be derived readily and evaluated easily on a computer, again using only the observed values of the dependent and independent variables.¹³
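With any number of explanatory variables, setting the derivative of the sum of squared errors with respect to each coefficient to zero yields one linear equation per coefficient, often called the normal equations, (X′X)b = X′y, where X holds a column of ones for the constant term plus one column per explanatory variable. The sketch below solves them with plain Gaussian elimination, which is adequate for illustration; statistical software typically uses numerically sturdier factorizations such as QR. The helper names are hypothetical, and the data are invented and built without noise, so the estimates should reproduce the coefficients used to build them.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """Minimum-SSE coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    rows: one tuple (x1, ..., xk) of explanatory variables per observation.
    """
    X = [[1.0, *r] for r in rows]  # leading 1.0 carries the constant term
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)

# Hypothetical (E, X) data; I built exactly from I = 10,000 + 2,000*E + 500*X.
EX = [(12, 5), (16, 2), (12, 10), (18, 8), (14, 20), (10, 3)]
I = [10_000 + 2_000 * e + 500 * x for e, x in EX]
alpha_hat, beta_hat, gamma_hat = ols(EX, I)
print(round(alpha_hat, 6), round(beta_hat, 6), round(gamma_hat, 6))
```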
13 The derivation may be found in any standard econometrics text. See, e.g., E. Hanushek and J. Jackson, Statistical Methods for Social Scientists ???–?? (????); J. Johnston, Econometric Methods ???–?? (?d ed. ????).
The interpretation of the coefficient estimates in a multiple regression warrants brief comment. In the model I = α + βE + γX + ε, α captures what an individual earns with no education or experience, β captures the effect on income of a year of education, and γ captures the effect on income of a year of experience. To put it slightly differently, β is an estimate of the effect of a year of education on income, holding experience constant. Likewise, γ is the estimated effect of a year of experience on income, holding education constant.
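The phrase “holding experience constant” has practical bite when education and experience move together in the data. In the hypothetical sample below (invented, and built exactly from I = 10,000 + 2,000E + 500X with no noise), people with more schooling tend to have less experience. A simple regression of I on E alone then mixes the effect of schooling with the effect of the accompanying shortfall in experience, while the multiple regression recovers the 2,000 per year of schooling used to build the data. The two-regressor formulas are the standard closed-form solution of the normal equations written in deviations from means; `dev_products` is a hypothetical helper.

```python
def dev_products(u, v):
    """Sum of products of deviations of u and v from their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Hypothetical data: more schooling tends to go with less experience.
E = [10, 12, 14, 16, 18]
X = [21, 17, 16, 15, 11]
I = [10_000 + 2_000 * e + 500 * x for e, x in zip(E, X)]  # exact, no noise

# Simple regression of I on E alone (experience left in the "noise").
simple_slope = dev_products(E, I) / dev_products(E, E)

# Multiple regression of I on E and X: closed form for two regressors.
s_ee, s_xx, s_ex = dev_products(E, E), dev_products(X, X), dev_products(E, X)
s_ie, s_ix = dev_products(I, E), dev_products(I, X)
det = s_ee * s_xx - s_ex ** 2
beta_hat = (s_ie * s_xx - s_ix * s_ex) / det
gamma_hat = (s_ix * s_ee - s_ie * s_ex) / det

print(f"simple slope on E:      {simple_slope:.1f}")
print(f"beta (X held constant): {beta_hat:.1f}")
print(f"gamma:                  {gamma_hat:.1f}")
```

Here the simple-regression slope is 1,450 rather than 2,000, a small preview of the omitted variables bias that the lecture takes up later.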
4. Essential Assumptions and Statistical Properties of Regression
As noted, the use of the minimum SSE criterion may be defended on two grounds: its computational convenience, and its desirable statistical properties. We now consider these properties and the assumptions that are necessary to ensure them.¹⁴
Continuing with our illustration, the hypothesis is that earnings in the “real world” are determined in accordance with the equation I = α + βE + γX + ε: true values of α, β, and γ exist, and we desire to ascertain what they are. Because of the noise term ε, however, we can only estimate these parameters.
We can think of the noise term ε as a random variable, drawn by nature from some probability distribution: people obtain an education and accumulate work experience, then nature generates a random number for each individual, called ε, which increases or decreases income accordingly. Once we think of the noise term as a random variable, it becomes clear that the estimates of α, β, and γ (as distinguished from their true values) will also be random variables, because the estimates generated by the SSE criterion will depend upon the particular value of ε drawn by nature for each individual in the data set. Likewise, because there exists a probability distribution from which each ε is drawn, there must also exist a probability distribution from which each parameter estimate is drawn, the latter distribution a function of the former distributions. The attractive statistical properties of regression all concern the relationship between the probability distribution of the parameter estimates and the true values of those parameters.
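The idea that an estimate is itself a random variable can be seen by letting “nature” redraw every noise term many times while the schooling data stay fixed, and re-estimating β each time. The sketch below assumes hypothetical true values and normally distributed noise; the resulting collection of estimates approximates the probability distribution from which a single study’s estimate is drawn.

```python
import random
import statistics

ALPHA, BETA, NOISE_SD = 15_000.0, 2_000.0, 5_000.0   # hypothetical true values
SCHOOLING = [8, 10, 12, 12, 14, 16, 16, 18, 20, 12]  # held fixed across samples

def slope_estimate(rng):
    """One draw of the minimum-SSE slope: nature redraws every noise term."""
    incomes = [ALPHA + BETA * e + rng.gauss(0.0, NOISE_SD) for e in SCHOOLING]
    e_bar = statistics.fmean(SCHOOLING)
    i_bar = statistics.fmean(incomes)
    s_ei = sum((e - e_bar) * (i - i_bar) for e, i in zip(SCHOOLING, incomes))
    s_ee = sum((e - e_bar) ** 2 for e in SCHOOLING)
    return s_ei / s_ee

rng = random.Random(42)
estimates = [slope_estimate(rng) for _ in range(5_000)]
print(f"mean of estimates: {statistics.fmean(estimates):,.1f}")
print(f"std. deviation:    {statistics.stdev(estimates):,.1f}")
```

With these settings the estimates cluster around the true value of 2,000 with a spread of roughly 450; how such a distribution relates to the true parameter is exactly what the statistical properties of regression describe.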
We begin with some definitions. The minimum SSE criterion is termed an estimator. Alternative criteria for generating parameter estimates (such as minimizing the sum of errors in absolute value) are also estimators.
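Viewed this way, an estimator is simply a rule that maps a data set to parameter estimates. The sketch below writes two such rules for a line: the minimum SSE estimator and the minimum absolute error estimator mentioned in the text. The latter is computed by brute force, relying on the fact that some minimizing line passes through at least two of the data points; this works for a handful of observations but is not how statistical software solves the problem. The function names and the data (including one unusually high earner) are hypothetical.

```python
from itertools import combinations

def ols_line(data):
    """Minimum-SSE estimator of (alpha, beta)."""
    n = len(data)
    e_bar = sum(e for e, _ in data) / n
    i_bar = sum(i for _, i in data) / n
    b = (sum((e - e_bar) * (i - i_bar) for e, i in data)
         / sum((e - e_bar) ** 2 for e, _ in data))
    return i_bar - b * e_bar, b

def lad_line(data):
    """Minimum sum-of-absolute-errors estimator of (alpha, beta).

    Brute force: some optimal line passes through at least two data points,
    so checking every line through a pair of points with distinct E finds one.
    """
    best, best_sae = None, float("inf")
    for (e1, i1), (e2, i2) in combinations(data, 2):
        if e1 == e2:
            continue
        b = (i2 - i1) / (e2 - e1)
        a = i1 - b * e1
        sae = sum(abs(i - a - b * e) for e, i in data)
        if sae < best_sae:
            best, best_sae = (a, b), sae
    return best

# Hypothetical sample with one unusually high earner.
data = [(10, 33_000), (12, 37_000), (14, 41_000), (16, 45_000), (18, 120_000)]
print("minimum SSE:", ols_line(data))
print("minimum absolute error:", lad_line(data))
```

On this sample the two estimators disagree sharply: the minimum absolute error line runs through the four ordinary observations (α = 13,000, β = 2,000), while the minimum SSE line is pulled toward the high earner (α = −72,200, β = 9,100).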
14 An accessible and more extensive discussion of the key assumptions of regression may be found in Fisher, supra note 7.