Econ 304: Econometrics

Remember, the immediate focus of our classes right now is on learning to test hypotheses about the parameters of the underlying model from OLS regression estimates.

Today, you will

The expected value operator offers a way of identifying the mean and variance of a random variable. In most cases, these -- along with the general functional form of the probability density function -- tell us all we need to know to describe the probability distribution of a random variable.

Usually, however, we are in the position of trying to infer or estimate the mean and variance of the underlying stochastic process from our data. The general class of tools we use are called estimators. Our next step is to review the properties we'd like a good estimator to have.

With this background, we are ready to derive the probability distribution of the OLS regression coefficients -- a process that requires us to make some assumptions about the process that produced our data. This, in turn, will require you to remember what you've learned about the standard normal probability distribution.

As we go through all of this detail, remember the goal. We need all this in order to use our data to test hypotheses about the processes that produced that data, i.e., to test the validity of the models produced by economic theory.


The ordinary least squares regression technique yields estimates of the true coefficient values $\alpha$ and $\beta$ in the model

$Y_i = \alpha + \beta X_i + \varepsilon_i$

Any function or rule that estimates the value of a coefficient or parameter is an estimator.

There are several desirable properties we would like an estimator to have. The mean of a good estimator should equal the parameter we are trying to estimate. An estimator with this property is unbiased. Most estimates generated by the estimator should be near the parameter value.

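Consider a general parameter $\beta$ and its estimator $\hat\beta$.

$\mathrm{Bias}(\hat\beta) = E[\hat\beta] - \beta$

If $\hat\beta$ is an unbiased estimator then

$\mathrm{Bias}(\hat\beta) = 0$; i.e., $E[\hat\beta] = \beta$

Now, let's try to formalize nearness.

$\hat\beta$ is an efficient estimator of $\beta$ if

$\mathrm{var}[\hat\beta] \le \mathrm{var}[\beta^*]$, where

$\beta^*$ is any other estimator of $\beta$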

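Problem: Consider the rather boring estimator that always reports 3, whatever the data say. Its variance is zero, so no estimator can beat it on variance alone, yet it is biased unless the parameter happens to equal 3.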

One solution is to limit our attention to efficient unbiased estimators.

But, proving that an estimator has minimum variance among all unbiased estimators can be quite difficult. Often the best we can do is find an estimator that has minimum variance among all linear unbiased estimators. An estimator that has this property is the best linear unbiased estimator -- BLUE.

I can conceive of cases in which a biased but low variance estimator might be preferable to an unbiased inefficient estimator.

To capture this trade-off, consider the mean square error of $\hat\beta$

$\mathrm{MSE}(\hat\beta) = E[(\hat\beta - \beta)^2]$

Show that

$\mathrm{MSE}(\hat\beta) = \mathrm{Bias}(\hat\beta)^2 + \mathrm{var}[\hat\beta]$
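The decomposition can be checked numerically before you prove it. The sketch below is only an illustration under assumptions of my own: the "shrunken mean" estimator, the true mean of 5, and the other constants are hypothetical and do not come from these notes. It simulates many estimates and compares the simulated MSE with squared bias plus variance.

```python
import random
import statistics

def mse_decomposition(estimates, true_value):
    """Return (mse, bias, variance) computed from a list of simulated estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    # Divide by the number of draws (population variance) so the identity is exact.
    variance = statistics.pvariance(estimates, mu=mean_est)
    mse = statistics.fmean((b - true_value) ** 2 for b in estimates)
    return mse, bias, variance

def shrunken_mean(sample, shrink=0.8):
    """Hypothetical biased estimator: the sample mean pulled toward zero."""
    return shrink * statistics.fmean(sample)

random.seed(304)
TRUE_MEAN, SIGMA, N, REPS = 5.0, 2.0, 10, 5000  # assumed values for illustration
estimates = [
    shrunken_mean([random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]
mse, bias, variance = mse_decomposition(estimates, TRUE_MEAN)
print(f"MSE = {mse:.4f}   Bias^2 + Var = {bias ** 2 + variance:.4f}")
```

The two printed numbers agree up to floating-point rounding because the identity holds for sample moments as well as for expectations.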

In those situations when we cannot find an unbiased estimator, we often can find a consistent estimator. $\hat\beta$ is a consistent estimator of $\beta$ if the probability that $\hat\beta$ is extremely close to $\beta$ rises toward 1 as the sample size N becomes extremely large.
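To see what consistency looks like in practice, here is a small simulation sketch. It uses the sample mean as the estimator and treats "extremely close" as "within 0.25 of the true mean"; the estimator, the tolerance, and the parameter values are all assumptions chosen for illustration.

```python
import random
import statistics

def share_close(n, true_mean=5.0, sigma=2.0, tol=0.25, reps=1000, seed=304):
    """Fraction of simulated sample means that land within `tol` of the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample_mean = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if abs(sample_mean - true_mean) < tol:
            hits += 1
    return hits / reps

# The share of "close" estimates should rise toward 1 as the sample size grows.
for n in (10, 100, 1000):
    print(n, share_close(n))
```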

The OLS estimates $\hat\alpha$ and $\hat\beta$ are estimators -- functions that yield estimates of $\alpha$ and $\beta$ -- and random variables -- since they are functions of the error terms $\varepsilon_i$, i = 1, 2, ... N, each of which is a random variable. To assess the properties of the estimators $\hat\alpha$ and $\hat\beta$ and their probability distributions we need to make assumptions about the process that generates the data with which we are working. We start with a very restrictive model: the classical regression model. Much of the second half of this course will be spent considering what happens when we relax these assumptions.


Classical regression model assumptions

1) The model is correctly specified (after appropriate transformations)

$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$

with $\alpha$ and the $\beta$'s unknown coefficients

2) $E[\varepsilon_i] = 0$, for all i -- otherwise

3) The X's are nonstochastic variables with fixed values

4) $\mathrm{var}[\varepsilon_i] = E[\varepsilon_i^2] = \sigma^2$, i.e., homoskedastic errors

5) The $\varepsilon_i$'s are statistically independent, so that $E[\varepsilon_i \varepsilon_j] = 0$ for $i \ne j$


Using the classical regression model assumptions, we can show that the OLS estimators have a number of desirable properties.

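The OLS estimators are unbiased. I'll show this for $\hat\beta$ in the two variable case, i.e.,

$E[\hat\beta] = \beta$

You showed that

$\hat\beta = \sum x_i Y_i / \sum x_i^2$, where $x_i = X_i - \bar{X}$ indicates the deviation from the mean.

Let $c_i = x_i / \sum x_i^2$

Then, $\hat\beta = \sum c_i Y_i$

substitute for Y: $\hat\beta = \sum c_i (\alpha + \beta X_i + \varepsilon_i)$

multiply through the parenthesis: $\hat\beta = \alpha \sum c_i + \beta \sum c_i X_i + \sum c_i \varepsilon_i$

I claim the first term on RHS is zero, since deviations from the mean sum to zero and so $\sum c_i = 0$.

I claim that $\sum c_i X_i = 1$, since $\sum x_i X_i = \sum x_i^2$.

Therefore, $\hat\beta = \beta + \sum c_i \varepsilon_i$

$E[\hat\beta] = E[\beta + \sum c_i \varepsilon_i] = \beta + \sum c_i E[\varepsilon_i] = \beta$, by assumptions 2 and 3 of the classical regression model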
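If you would like to convince yourself numerically, the sketch below checks the two claims about the weights and then averages the slope estimate over many simulated samples. The X values, the "true" coefficients, and the normal error draws are hypothetical choices of mine, not data from the course.

```python
import random
import statistics

X = [1.0, 2.0, 4.0, 7.0, 11.0]        # fixed (nonstochastic) regressor values, assumed
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0     # hypothetical true parameters

x_bar = statistics.fmean(X)
x_dev = [xi - x_bar for xi in X]       # deviations from the mean
sxx = sum(d * d for d in x_dev)
c = [d / sxx for d in x_dev]           # weights c_i = x_i / sum(x_i^2)

sum_c = sum(c)                                   # first claim: this is 0
sum_cX = sum(ci * xi for ci, xi in zip(c, X))    # second claim: this is 1

def ols_slope(rng):
    """Draw one sample of Y under the classical model and return sum(c_i * Y_i)."""
    Y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
    return sum(ci * yi for ci, yi in zip(c, Y))

rng = random.Random(304)
slopes = [ols_slope(rng) for _ in range(20000)]
mean_slope = statistics.fmean(slopes)
print(sum_c, sum_cX, mean_slope)       # mean_slope should be close to BETA
```

Any single estimate can miss the true slope by a fair amount; unbiasedness is a statement about the average over repeated samples.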


The classical regression model assumptions solve the otherwise knotty problem of determining the variance of the OLS coefficients. For the two variable case:

$\mathrm{var}[\hat\beta] = E[(\hat\beta - E[\hat\beta])^2]$

using the previous result: $\mathrm{var}[\hat\beta] = E[(\hat\beta - \beta)^2]$

from the previous derivation: $\hat\beta = \beta + \sum c_i \varepsilon_i$

Thus, $\mathrm{var}[\hat\beta] = E[(\beta + \sum c_i \varepsilon_i - \beta)^2] = E[(\sum c_i \varepsilon_i)^2]$

Can you appreciate just how messy this term would be in general? Not only would we have all the terms with $\varepsilon_i^2$ in them, but we would also have to worry about all the cross product terms -- the $\varepsilon_i \varepsilon_j$ terms. But, under the classical regression model, statistical independence of the error terms ensures that $E[\varepsilon_i \varepsilon_j] = 0$ for $i \ne j$, so

$E[(\sum c_i \varepsilon_i)^2] = \sum E[(c_i \varepsilon_i)^2] = \sum c_i^2 E[\varepsilon_i^2]$

By the homoskedasticity assumption: $E[\varepsilon_i^2] = \sigma^2$, so that

$\mathrm{var}[\hat\beta] = \sigma^2 \sum c_i^2$

But $\sum c_i^2 = \sum \{x_i / \sum x_i^2\}^2 = \sum x_i^2 / (\sum x_i^2)^2 = 1 / \sum x_i^2$

So, $\sigma_{\hat\beta}^2 = \mathrm{var}[\hat\beta] = \sigma^2 / \sum x_i^2$

$\mathrm{var}[\hat\alpha]$ is a bit uglier looking but the basic idea holds. In fact, for the general multiple regression case, using linear algebra, it can be shown that the variance-covariance matrix for the regression coefficients is $\sigma^2 (X'X)^{-1}$

a $k \times k$ matrix in which the diagonal elements are the variance of each coefficient and the off diagonal elements are the covariances among the coefficients.
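The variance formula can be checked two ways for the two variable case: against the slope entry of the matrix expression, worked out by hand for a 2 x 2 matrix, and against a simulation. This is a rough sketch; the X values, the true coefficients, and the error standard deviation are assumptions made up for the illustration.

```python
import random
import statistics

X = [1.0, 2.0, 4.0, 7.0, 11.0]         # fixed regressor values, assumed
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.5      # hypothetical true parameters

n = len(X)
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
var_slope_formula = SIGMA ** 2 / sxx    # sigma^2 / sum(x_i^2)

# sigma^2 (X'X)^{-1} with X'X = [[n, sum X], [sum X, sum X^2]], inverted by hand.
sum_x = sum(X)
sum_x2 = sum(xi * xi for xi in X)
det = n * sum_x2 - sum_x ** 2
vcov = [
    [SIGMA ** 2 * sum_x2 / det, -SIGMA ** 2 * sum_x / det],
    [-SIGMA ** 2 * sum_x / det, SIGMA ** 2 * n / det],
]

def ols_slope(rng):
    """Simulate one sample under the classical model and return the OLS slope."""
    Y = [ALPHA + BETA * xi + rng.gauss(0.0, SIGMA) for xi in X]
    y_bar = sum(Y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx

rng = random.Random(304)
slopes = [ols_slope(rng) for _ in range(20000)]
var_slope_simulated = statistics.variance(slopes)
print(var_slope_formula, vcov[1][1], var_slope_simulated)
```

The first two numbers agree exactly (up to rounding); the simulated variance should be close to them but will not match digit for digit.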


You may still not be very impressed with this variance, but I can say one more thing about it. Under the classical regression model, the OLS coefficient estimators are BLUE, i.e.,

$\mathrm{var}[\hat\beta] \le \mathrm{var}[\beta^*]$

where $\beta^*$ is any other linear unbiased estimator. The basic method for solving the problem is to define $\beta^*$ to be a linear function of the observations

$\beta^* = \sum d_i Y_i$

derive its variance and choose $d_i$ to minimize that variance subject to the constraint that $\beta^*$ be unbiased, which turns out to require $\sum d_i x_i = 1$.

Minimization subject to a constraint is a Lagrange multiplier problem. If you want to take a crack at it in your "free time" I'd be happy to help you with it -- but it's not a problem I expect you to have mastered in this course.

The net result is that the optimal $d_i$ that solves the minimization problem turns out to be the $c_i$ we used above in deriving the variance of $\hat\beta$. So, $\beta^*$ is $\hat\beta$.
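One way to get a feel for the BLUE result without doing the Lagrangian is to compare OLS with a single rival. The sketch below uses a hypothetical "endpoint" estimator, the slope of the line through the first and last observations, which is linear and unbiased but uses fewer data points. The rival, the X values, and the error variance are my own illustrative assumptions. Under assumptions 4 and 5 the variance of any linear estimator with weights $d_i$ is $\sigma^2 \sum d_i^2$, so no simulation is needed.

```python
X = [1.0, 2.0, 4.0, 7.0, 11.0]    # fixed regressor values, assumed
SIGMA2 = 1.0                       # hypothetical error variance

n = len(X)
x_bar = sum(X) / n
x_dev = [xi - x_bar for xi in X]
sxx = sum(d * d for d in x_dev)

# OLS weights c_i = x_i / sum(x_i^2).
c = [d / sxx for d in x_dev]

# Rival weights: (Y_N - Y_1) / (X_N - X_1), zero weight on everything in between.
span = X[-1] - X[0]
d_endpoint = [0.0] * n
d_endpoint[0] = -1.0 / span
d_endpoint[-1] = 1.0 / span

def is_linear_unbiased(weights, tol=1e-12):
    """Unbiasedness needs sum(w_i) = 0 and sum(w_i * X_i) = 1."""
    return (abs(sum(weights)) < tol
            and abs(sum(w * xi for w, xi in zip(weights, X)) - 1.0) < tol)

def variance(weights):
    """Variance of sum(w_i * Y_i) with independent, homoskedastic errors."""
    return SIGMA2 * sum(w * w for w in weights)

print(is_linear_unbiased(c), variance(c))
print(is_linear_unbiased(d_endpoint), variance(d_endpoint))
```

A single comparison is not a proof, of course; it only shows one linear unbiased rival losing to OLS on variance.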


Many discussions of the classical regression model just assume that ei has a normal probability distribution. But, it's not hard to persuade yourself that this is true by relying on

Central Limit Theorem: Suppose you have m random variables generated independently by an unknown probability distribution. As m becomes infinitely large, the probability distribution of the simple average of those random variables will become indistinguishable from the normal probability distribution.

Theorem: Any linear function of normally distributed random variables is a random variable with a normal distribution.
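The Central Limit Theorem is easy to watch at work. The following sketch averages m draws from an exponential distribution, which is strongly skewed, and reports how often the average falls below its mean. For a symmetric distribution such as the normal that share is one half. The exponential distribution and the sample sizes are assumptions picked for the illustration.

```python
import random
import statistics
from statistics import NormalDist

def share_below_mean(m, reps=5000, seed=304):
    """Share of simulated averages of m exponential(1) draws that fall below the mean, 1."""
    rng = random.Random(seed)
    below = 0
    for _ in range(reps):
        avg = statistics.fmean(rng.expovariate(1.0) for _ in range(m))
        if avg < 1.0:
            below += 1
    return below / reps

normal_share = NormalDist().cdf(0.0)    # one half for the normal distribution
for m in (1, 10, 200):
    print(m, share_below_mean(m), normal_share)
```

With m = 1 the share sits well above one half because of the skew; as m grows it drifts toward the normal value.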

We say $\hat\beta \sim N(\beta, \sigma_{\hat\beta}^2)$

$f(\hat\beta) = \left(\frac{1}{2\pi\sigma_{\hat\beta}^2}\right)^{1/2} \exp\left\{-\frac{(\hat\beta - \beta)^2}{2\sigma_{\hat\beta}^2}\right\}$

The distribution is symmetric and, although possible values of $\hat\beta$ run from negative to positive infinity, most of the probability of observing $\hat\beta$ (about 95 percent) is concentrated within 2 standard deviations above and below the mean, $\beta$.

The integral of this pdf has no closed-form expression -- so probabilities for the normal distribution must be calculated through approximation. That's why we follow the tradition of relying on computer-generated tables for the standard normal:


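$Z \sim N(0,1)$

Claim: $Z = (\hat\beta - \beta)/\sigma_{\hat\beta}$

Can you show that $E[(\hat\beta - \beta)/\sigma_{\hat\beta}] = 0$?

Can you show that $\mathrm{var}[(\hat\beta - \beta)/\sigma_{\hat\beta}] = 1$?

$\Pr(\hat\beta < \beta_0) = \Pr(Z < (\beta_0 - \beta)/\sigma_{\hat\beta})$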

Remember how to work with the standard normal statistical table, such as that on p. 603 of Pindyck and Rubinfeld?

The body of the table gives Pr(Z>Z*) for Z*>0.

Since the distribution is symmetric and centered at zero,

Pr(Z>0) = .5

Pr(Z>.75) = .227

Pr(Z>1.29) = ?

By symmetry

Pr(Z< -Z*) = Pr(Z>Z*)

Pr(Z< -1.1) = ?

Pr(|Z| > .7) = Pr(Z>.7) + Pr(Z<-.7) = 2*Pr(Z>.7) = 2*.242 = .484
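If you don't have the table handy, the same upper-tail probabilities can be computed. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper names `upper_tail` and `two_tail` are my own. It reproduces the values worked out above and leaves the question-mark exercises to you.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def upper_tail(z_star):
    """Pr(Z > z_star), the quantity in the body of the table."""
    return 1.0 - Z.cdf(z_star)

def two_tail(z_star):
    """Pr(|Z| > z_star), using symmetry around zero."""
    return 2.0 * upper_tail(abs(z_star))

print(round(upper_tail(0.0), 3))    # 0.5
print(round(upper_tail(0.75), 3))   # 0.227
print(round(two_tail(0.7), 3))      # 0.484
```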

Ok, now connect this with the regression equation.

Suppose $\hat\beta \sim N(3, 4)$,

find $\Pr(\hat\beta > 4)$

$\Pr(\hat\beta > 4) = \Pr(Z > (4-3)/2) = \Pr(Z > .5) = .309$
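The standardization step can be verified by computing the probability two ways: once through Z, as in the calculation above, and once directly from a normal distribution with mean 3 and variance 4. This is a sketch using the standard library's `NormalDist`; the function names are hypothetical. Note that the second argument of N( , ) is a variance, so the standard deviation is its square root.

```python
from statistics import NormalDist

def upper_tail_via_z(b0, mean, variance):
    """Pr(estimator > b0), computed by converting to a standard normal Z."""
    z = (b0 - mean) / variance ** 0.5
    return 1.0 - NormalDist().cdf(z)

def upper_tail_direct(b0, mean, variance):
    """Same probability from the N(mean, variance) distribution itself."""
    return 1.0 - NormalDist(mu=mean, sigma=variance ** 0.5).cdf(b0)

print(round(upper_tail_via_z(4, 3, 4), 3))     # 0.309
print(round(upper_tail_direct(4, 3, 4), 3))    # 0.309
```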

Suppose $\hat\beta \sim N(-2, 9)$,

$\Pr(\hat\beta > 1) = ?$

 

Recall that it was important to be able to find the Z* that solves

 

Pr(Z>Z*) = some upper tail probability

 

Pr(Z>Z*) = .25 --> .67<Z*<.68

Z* = .67 to two decimal places

 

Suppose $\hat\beta \sim N(3, 4)$.

Can you find $\beta_0$ such that $\Pr(\hat\beta > \beta_0) = .05$?

$Z^* = 1.645$

$(\beta_0 - 3)/2 = 1.645$

$\beta_0 = 3.29 + 3 = 6.29$
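Reading the table "backwards" corresponds to the inverse of the cumulative distribution function. The sketch below uses `NormalDist.inv_cdf` from the standard library to recover the critical values used above; `critical_z` and `cutoff` are names I made up for the illustration.

```python
from statistics import NormalDist

def critical_z(upper_tail_prob):
    """Z* such that Pr(Z > Z*) equals the given upper-tail probability."""
    return NormalDist().inv_cdf(1.0 - upper_tail_prob)

def cutoff(mean, variance, upper_tail_prob):
    """Value b0 such that Pr(estimator > b0) = upper_tail_prob under N(mean, variance)."""
    return mean + variance ** 0.5 * critical_z(upper_tail_prob)

print(round(critical_z(0.25), 2))      # 0.67
print(round(critical_z(0.05), 3))      # 1.645
print(round(cutoff(3, 4, 0.05), 2))    # 6.29
```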


This is lots of fun, but it assumes we know $\beta$ and $\sigma_{\hat\beta}^2$.

If we knew those values we wouldn't be estimating the regression.

Well, we can make and test hypotheses about $\beta$, but in order to do hypothesis testing using the normal distribution we still need to know $\sigma_{\hat\beta}^2$.

The solution, you'll recall, involves the t-distribution. We'll introduce that distribution and talk about formal hypothesis testing next class.