01-24-2008, 02:33 PM | #21 |
Senior Member
Join Date: Nov 2006
Posts: 1,431
What Mike is doing is going to be memory-intensive. For both GLM and the logistic regressions, all the covariates will need to be read into memory (the entire matrix of 35 million observations) and then the system solved (well, actually something like a Cholesky decomposition, with the lower-triangular factor used to solve, rather than an explicit inverse). The logistic model will actually use a non-linear minimization routine to solve, but the principle is the same.
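A minimal sketch of the step described above, on a tiny made-up dataset (the sizes and weights here are illustrative stand-ins, not the actual regressions discussed): the weighted least-squares solve at the core of GLM/IRLS fitting, done via a Cholesky factorization of X'WX and triangular solves instead of an explicit inverse.

```python
import numpy as np

# Tiny stand-in for the 35-million-row problem: n and k are arbitrary.
rng = np.random.default_rng(0)
n, k = 1000, 5
X = rng.normal(size=(n, k))
y = X @ np.arange(1, k + 1) + rng.normal(size=n)
w = np.ones(n)                  # IRLS would update these weights each pass

XtWX = X.T @ (w[:, None] * X)   # k x k cross-product matrix
XtWy = X.T @ (w * y)
L = np.linalg.cholesky(XtWX)    # lower-triangular factor
# forward- then back-substitution, rather than inverting XtWX
beta = np.linalg.solve(L.T, np.linalg.solve(L, XtWy))
```

Note that the k-by-k cross-product matrix is small; the memory cost is holding the n-by-k design matrix X itself, which is exactly why dataset size matters here.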
MW, in my profession people tend to use SAS for datasets this large (when we use transaction-level stock data, the datasets get this big). But that is because Stata sucks if there is not enough physical memory, and SAS handles that situation better. If you are using Stata, you had better have enough memory. Stata is not open source. Actually, MW, I just reread your description. How many observations, on average, will you have in your regressions? Last edited by pelagius; 01-24-2008 at 04:00 PM.
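A back-of-envelope check of why 35 million observations strain an in-memory package like Stata: holding the full design matrix in RAM in double precision. The covariate count below is a guess for illustration; the actual regressions may be wider or narrower.

```python
# Rough memory estimate for an in-memory design matrix.
# n_covariates = 20 is hypothetical, not from the thread.
n_obs = 35_000_000
n_covariates = 20
bytes_per_double = 8

gigabytes = n_obs * n_covariates * bytes_per_double / 1024**3
print(f"{gigabytes:.1f} GB")   # roughly 5.2 GB for the matrix alone
```

And that is before any working copies, weights, or temporaries the estimation routine allocates on top of the raw data.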
02-05-2008, 12:57 PM | #22 |
Demiurge
Join Date: Aug 2005
Posts: 36,365
Well, I think my current computer pooped out when I told it to go through 25 passes per observation in an array statement, with 112 million observations.
<sigh>
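The same arithmetic for the job above: if all 25 array elements had to be held for every one of the 112 million observations at once (as an in-memory package would require; a SAS DATA step streaming row by row would not), the footprint alone is enough to choke most 2008-era machines. This is an illustration, not a diagnosis of what actually failed.

```python
# Hypothetical worst-case footprint: 25 doubles retained per observation.
n_obs = 112_000_000
array_elements = 25
bytes_per_double = 8

gigabytes = n_obs * array_elements * bytes_per_double / 1024**3
print(f"{gigabytes:.1f} GB")   # roughly 20.9 GB
```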