Regression And Causation: A Critical Examination Of Six Econometrics Textbooks

Bryant Chen

University of California, Los Angeles (UCLA)

Judea Pearl

UCLA, Computer Science Department

September 10, 2013

Real-World Economics Review, Issue No. 65, 2-20, 2013


This report surveys six influential econometric textbooks in terms of their mathematical treatment of causal concepts. It highlights conceptual and notational differences among the authors and points to areas where they deviate significantly from modern standards of causal analysis. We find that econometric textbooks vary from complete denial to partial acceptance of the causal content of econometric equations and, uniformly, fail to provide coherent mathematical notation that distinguishes causal from statistical concepts. This survey also provides a panoramic view of the state of causal thinking in econometric education which, to the best of our knowledge, has not been surveyed before.


We surveyed the following textbooks:

Greene, W. Econometric Analysis. Pearson Education, New Jersey. 7th edition, 2012.

Hill, R., Griffiths, W., and Lim, G. Principles of Econometrics. John Wiley & Sons Inc. New York. 4th edition, 2011.

Kennedy, P. A Guide to Econometrics. Blackwell Publishers, Oxford. 6th edition, 2008.

Ruud, P. An Introduction to Classical Econometric Theory. Oxford University Press, Oxford. 1st edition, 2000.

Stock, J. and Watson, M. Introduction to Econometrics. Pearson Education, Massachusetts. 3rd edition, 2011.

Wooldridge, J. Introductory Econometrics: A Modern Approach. South-Western College Pub. 4th edition, 2009.




Introduction

The traditional and most popular formal language used in econometrics is the structural equation model (SEM). While SEMs are not the only type of econometric model, they are the primary subject of each introductory econometrics textbook that we have encountered. An example of an SEM, taken from (Stock and Watson, 2011, p. 3), models the effect of cigarette taxes on smoking. In this case, smoking, Y, is the dependent variable, and cigarette taxes, X, is the independent variable. Assuming that the relationship between the variables is linear, the structural equation is written $Y = \beta X + \varepsilon$. Additionally, if X is statistically independent of $\varepsilon$, a condition often called exogeneity, linear regression can be used to estimate the value of $\beta$, the “effect coefficient”.

More formally, an SEM consists of one or more structural equations, generally written as $Y = \beta X + \varepsilon$ in the linear case, in which Y is considered to be the dependent or effect variable, $X$ a vector of independent variables that cause Y, and $\beta$ a vector of slope parameters such that $\beta x$ is the expected value of Y given that we intervene and set the value of X to x. Lastly, $\varepsilon$ is an error term that represents all other direct causes of Y, accounting for the difference between $\beta X$ and the actual values of Y. If the assumptions underlying the model are correct, the model is capable of answering all causally related queries, including questions of prospective and introspective counterfactuals. For purposes of discussion, we will use the simplest case in which there is only one structural equation and one independent variable and refer to the structural equation as $Y = \beta X + \varepsilon$.
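The role of exogeneity can be illustrated with a small simulation. The following is a minimal sketch, not taken from any of the surveyed textbooks: it assumes a single linear structural equation with Gaussian X and a Gaussian error term drawn independently of X, and the names `TRUE_BETA`, `simulate_exogenous`, and `ols_slope` are hypothetical, introduced only for this illustration. Under these assumptions, the least-squares slope should land close to the structural coefficient.

```python
import numpy as np

TRUE_BETA = 2.0  # hypothetical effect coefficient, chosen only for illustration


def simulate_exogenous(n, rng):
    """Draw (X, Y) from Y = beta * X + eps with X independent of eps."""
    x = rng.normal(size=n)
    eps = rng.normal(size=n)  # drawn separately from x, so exogeneity holds
    y = TRUE_BETA * x + eps
    return x, y


def ols_slope(x, y):
    """Least-squares slope for a regression through the origin."""
    return float(np.dot(x, y) / np.dot(x, x))


rng = np.random.default_rng(0)
x, y = simulate_exogenous(100_000, rng)
beta_hat = ols_slope(x, y)
print(f"true beta = {TRUE_BETA}, OLS estimate = {beta_hat:.3f}")
```

The regression is fit without an intercept only because the simulated equation has none; with real data one would normally include a constant term.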

The foundations for structural equation modeling in economics were laid by Haavelmo in his paper, “The statistical implications of a system of simultaneous equations” (Haavelmo, 1943). To Haavelmo, the econometric model represented a series of hypothetical experiments. In his 1944 paper, “The Probabilistic Approach in Econometrics”, he writes:

“What makes a piece of mathematical economics not only mathematics but also economics is, I believe, this: When we have set up a system of theoretical relationships and use economic names for the otherwise purely theoretical variables involved, we have in mind some actual experiment, or some design of an experiment, which we could at least imagine arranging, in order to measure those quantities in real economic life that we think might obey the laws imposed on their theoretical namesakes” (Haavelmo, 1944, p. 5).

Using a pair of non-recursive equations with randomized $\varepsilon$, Haavelmo shows that $\beta x$ in the equation $Y = \beta X + \varepsilon$ is not equal to the conditional expectation, $E[Y \mid x]$, but rather to the expected value of Y given that we intervene and set the value of X to x. This “intervention-based expectation” was later given the notation $E[Y \mid do(x)]$ in (Pearl, 1995).
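The gap between the conditional and the intervention-based expectation can be checked numerically. The sketch below is in the spirit of Haavelmo's argument but is not his original example: it assumes a hypothetical non-recursive pair, Y = BETA·X + e1 and X = ALPHA·Y + e2, with independent standard normal errors, and the values of `BETA` and `ALPHA` are arbitrary. Observational data come from solving the two equations jointly; the intervention replaces the second equation with X = x0.

```python
import numpy as np

BETA, ALPHA = 0.5, 0.4  # hypothetical structural coefficients (assumed values)


def observe(n, rng):
    """Sample (X, Y) from the simultaneous system via its reduced form."""
    e1 = rng.normal(size=n)
    e2 = rng.normal(size=n)
    d = 1.0 - ALPHA * BETA
    x = (ALPHA * e1 + e2) / d  # X depends on e1, so X is not exogenous
    y = (e1 + BETA * e2) / d
    return x, y


def intervene(x0, n, rng):
    """Sample Y after setting X to x0, which deletes the equation for X."""
    e1 = rng.normal(size=n)
    return BETA * x0 + e1


rng = np.random.default_rng(0)
x, y = observe(200_000, rng)

# Slope of the conditional expectation E[Y | x] in the observational regime.
slope_obs = float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

# Mean of Y under the intervention do(X = 1), which estimates BETA * 1.
mean_do = float(intervene(1.0, 200_000, rng).mean())

print(f"regression slope = {slope_obs:.3f}, E[Y | do(X=1)] = {mean_do:.3f}")
```

With unit error variances the population regression slope is (ALPHA + BETA) / (ALPHA² + 1), which differs from BETA whenever ALPHA is nonzero and ALPHA·BETA ≠ 1, while the interventional mean tracks BETA·x0.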

In the years following Haavelmo’s 1944 paper, this interpretation has been questioned and misunderstood by many statisticians. When Arthur Goldberger explained that $\beta x$ may be interpreted as the expected value of Y “if x were fixed,” Nanny Wermuth replied that since $\beta x \neq E[Y \mid x]$, “the parameters… cannot have the meaning Arthur Goldberger claims” (Goldberger, 1992; Wermuth, 1992).
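The two readings of the coefficient can be placed side by side in a short derivation. This is a sketch under stated assumptions rather than a reproduction of either author's argument: it takes the single linear equation with structural coefficient $\beta$ and error term $\varepsilon$, assumes $E[\varepsilon] = 0$, and assumes that intervening on X leaves $\varepsilon$ undisturbed.

```latex
\begin{align*}
E[Y \mid x]     &= E[\beta X + \varepsilon \mid X = x] = \beta x + E[\varepsilon \mid x], \\
E[Y \mid do(x)] &= \beta x + E[\varepsilon] = \beta x .
\end{align*}
```

The two quantities coincide only when $E[\varepsilon \mid x] = 0$ for every x, which exogeneity guarantees. Without it, both statements in the exchange can hold at once: $\beta x$ differs from the conditional expectation, yet it still gives the expected value of Y when x is fixed by intervention.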

(Pearl, 2012b) summarizes the debate in the following way: For statisticians like Wermuth, structural coefficients have dubious meaning because they cannot be expressed in the language of statistics, while for economists like Goldberger, statistics has dubious substance if it excludes from its province all aspects of the data generating mechanism that do not show up in the joint probability distribution.

Econometric textbooks fall on all sides of this debate. Some explicitly ascribe causal meaning to the structural equation while others insist that it is nothing more than a compact representation of the joint probability distribution. Many fall somewhere in the middle, attempting to provide the econometric model with sufficient power to answer economic problems but hesitant to anger traditional statisticians with claims of causal meaning. The end result for many textbooks is that the meaning of the econometric model and its parameters is vague and at times contradictory.


