Billy Goats Gruff

Tuesday, March 02, 2010

A brief comparison of OLS and ML estimation procedures and oh my fucking god I suck at math.

Mmm...rye toast and coffee. Good breakfast.

Alrighty, time to post profound shit from my brain. Ok. Here we go.

Ok...let's see...um....yeah...uh..

See, the problem with using Ordinary Least Squares regression with a dichotomous dependent variable (this is known as the Linear Probability Model) is that it yields impossible predictions. We know that the predicted probability that Y=1 must be bounded by zero and one, yet the OLS beta estimates can yield predicted values that fall outside this known boundary.
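
Here's a quick Python sketch of what I mean, with made-up data (using numpy and statsmodels; none of these numbers are real): fit OLS to a 0/1 outcome and count how many fitted values escape the unit interval.

```python
# Hypothetical illustration: the Linear Probability Model (OLS on a
# binary Y) can produce fitted "probabilities" below 0 or above 1.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true P(Y=1 | x)
y = rng.binomial(1, p_true)                  # dichotomous Y

lpm = sm.OLS(y, sm.add_constant(x)).fit()
fitted = lpm.fittedvalues
print("fitted values below 0:", int((fitted < 0).sum()))
print("fitted values above 1:", int((fitted > 1).sum()))
```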

That is why Maximum Likelihood Estimation is preferable to OLS when the dependent variable is dichotomous. MLE has a much easier time estimating a non-linear model, and in order to properly bound the dependent variable between zero and one, we need a non-linear model. So, MLE and OLS are quite different. OLS regression uses a least squares loss function; it seeks to minimize the sum of squared errors (i.e., it seeks the hyperplane that minimizes the squared distances between the observed values and the hyperplane; for a two-variable model, y = a + Bx + e, it seeks the "best fit" line through the data). This is an analytic process: by solving a system of simultaneous equations (one for each coefficient estimate), one can derive a unique, closed-form solution for the coefficients in the model. (In the bivariate model above, those are a and B.)
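
To make the "analytic" part concrete, here's a sketch with simulated data (again, the numbers are invented for illustration): the OLS coefficients come from solving the normal equations once, with no iteration needed.

```python
# Sketch: OLS has a closed-form solution. Solve the normal equations
# (X'X) b = X'y once and you get the unique coefficient estimates.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)   # true a = 1, B = 0.5

X = np.column_stack([np.ones_like(x), x])  # intercept column plus x
b = np.linalg.solve(X.T @ X, X.T @ y)      # unique analytic solution
print("a-hat, B-hat:", b)
```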

The logic of Maximum Likelihood Estimation is different. MLE starts with the observed values of the data. It then asks: given these data, what parameter values for the population from which they were sampled would make these observed values most likely? This is not an analytic process. Instead, MLE uses numerical optimization to reach a solution. This is an iterative (i.e., guided trial-and-error) process that uses an algorithm to produce the beta coefficient values (the B from above) that maximize the likelihood function.
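
Here's a sketch of that iterative search on simulated data, using scipy's general-purpose optimizer (real stats packages use their own algorithms under the hood; this is just to show the idea): write down the log-likelihood for a logit model and let the algorithm climb it.

```python
# Sketch: MLE by numerical optimization. The optimizer iteratively
# searches for the betas that make the observed 0/1 data most likely.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 2.0 * x))))
X = np.column_stack([np.ones_like(x), x])

def neg_log_likelihood(beta):
    xb = X @ beta
    # logit log-likelihood: sum of y*xb - log(1 + exp(xb))
    return -np.sum(y * xb - np.log1p(np.exp(xb)))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print("beta-hats:", result.x)  # should land near (0.5, 2.0)
```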

OLS belongs to a general family of estimation procedures known as Generalized Method of Moments (GMM). MLE is another family, of which probit and logit regression are the most famous applications. The other large family is Bayesian estimation, which is a little more rigorous, but I don't know much about it. And I believe there's also something called non-parametric estimation, but I don't know anything about it.

The big drawback to MLE is that its properties are all asymptotic. One needs quite large sample sizes for those asymptotic properties to really kick in (i.e., for one's estimates and hypothesis tests to be reliable). It also makes the coefficients much harder to interpret. In logit, for example, the quantity being modeled is not Y itself; it's the natural log of the odds that Y=1. So, a beta coefficient must be interpreted as the change in the log of the odds that Y=1 when X increases by one unit, holding the other variables constant. To calculate a real marginal effect (what happens to the probability that Y=1 when X increases by one unit), one has to "back out" of the link function by converting log-odds into probabilities, which can be kind of tedious. Compare that to the ease of interpretation with OLS. If B=.5, then when x increases by one unit, y increases by .5. Easy, peezy, japaneezy.
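
Here's the "backing out" step as a sketch, with hypothetical logit estimates plugged in (every number below is made up): the marginal effect of x on the probability works out to B*p*(1-p), conventionally evaluated at the sample means.

```python
# Sketch: backing a marginal effect out of logit results.
# If log-odds(Y=1) = a + B*x, then dP/dx = B * p * (1 - p),
# conventionally evaluated at the sample mean of x.
import numpy as np

a_hat, b_hat = 0.5, 2.0   # hypothetical logit estimates
x_mean = 0.1              # hypothetical sample mean of x

p = 1 / (1 + np.exp(-(a_hat + b_hat * x_mean)))
print("marginal effect at the mean:", b_hat * p * (1 - p))
```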

Sigh....

I fuckin suck at statistics. It's goddamn hard and I never understand anything! Why can't I just be a math genius and get it?! The more I learn, the less I understand (Henley, Don, 1989). Dear my statistics professor: I DON'T FUCKING GET IT!!!!!!
