Allow LMM to init GLMM #588
Conversation
Codecov Report

| Coverage Diff | main | #588 | +/- |
|---|---|---|---|
| Coverage | 96.22% | 96.23% | +0.01% |
| Files | 28 | 28 | |
| Lines | 2516 | 2523 | +7 |
| Hits | 2421 | 2428 | +7 |
| Misses | 95 | 95 | |

Continue to review the full report at Codecov.
Thanks for doing this and for cleaning up some of our earlier code. This definitely looks worth pursuing. I'm not sure I understand the plots, particularly the objective axis. What is the scaling between the first and second plots? Where are the values for lmm-β on the second plot? The oscillating behavior for lmm-βθ is alarming, as is the difference in the eventual objective values for the different starting-value methods. Lots of good stuff here to contemplate.
P.S. Let me know when you want a review of this.
@dmbates I messed up the first plot -- I was trying to see if I could get it all on one scale, so I log-scaled the objective on that one but not on the second one. For the second plot, I think lmm-β is overplotted by the glm-init.

I'm a bit concerned by the difference in the final objectives as well -- it's 75 points of log-likelihood! I think part of it is that we're looking at a very large dataset, so tiny perturbations in the parameters can lead to a big change in the log-likelihood: even if the change in LL per observation is small, we're still summing over a huge number of observations (for example, a shift of just 0.0001 per observation adds up to 75 across 750,000 observations). Taken all together, it seems that the random effects dominate optimization in larger models.

Do you have a better name for the kwarg? I'm tending towards adding a note to the docstring that this feature and kwarg are experimental and may disappear/change without being considered a breaking change. Then we can merge and use this code for further experimentation elsewhere.
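To make the discussion concrete, here is a minimal sketch of how the experimental keyword might look from the user's side. The name `init_from_lmm`, the accepted symbols, and the simulated data are all placeholders for illustration, not the settled API:

```julia
using DataFrames, Distributions, MixedModels, Random

# simulate a small binary-response dataset: 30 groups of 20 observations each
rng = MersenneTwister(42)
df = DataFrame(g = repeat(string.(1:30); inner = 20), x = randn(rng, 600))
b = randn(rng, 30)                                   # group-level intercept offsets
η = 0.5 .+ 0.8 .* df.x .+ b[repeat(1:30; inner = 20)]
df.y = Int.(rand(rng, 600) .< 1 ./ (1 .+ exp.(-η)))

form = @formula(y ~ 1 + x + (1 | g))

# current behavior: GLM-based starting values
m_default = fit(MixedModel, form, df, Bernoulli())

# experimental keyword (name and accepted symbols are placeholders, not final):
# seed β and θ from a preliminary LMM fit before optimizing the GLMM objective
m_lmminit = fit(MixedModel, form, df, Bernoulli(); init_from_lmm = [:β, :θ])
```

The idea is simply that estimates from a preliminary LMM fit replace the usual GLM-based starting values before the GLMM optimization begins.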
I should have thought of a logarithm scale. Are you referring to the

By the way, we currently have the initial value repeated because we start the structure with the initial value and then also record the first iteration. If we want to allow ourselves breaking changes later, we should fix that and maybe eliminate a few of the fields of the
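To make that duplication concrete, a toy illustration (the tuple layout here is an assumption for the sketch, not the actual structure under discussion):

```julia
# toy stand-in for the per-iteration log: it is seeded with the initial value,
# and the first objective evaluation then records the same point again
fitlog = [([1.0, 0.5], 2741.3),    # initial value, recorded when the log is set up
          ([1.0, 0.5], 2741.3),    # first evaluation, at the same point
          ([0.9, 0.6], 2736.8)]

# dropping the duplicated leading record
fitlog = fitlog[1] == fitlog[2] ? fitlog[2:end] : fitlog
```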
For this PR, I meant the For
@dmbates I think this is good to go.
Looks good. Thanks for doing this.
Benchmarking suggests that the GLM inits we're currently using are the best for small models (although with only a few hundred milliseconds of total difference in the fit, it's realistically a wash). But things get more interesting if we look at big models.
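For anyone who wants to reproduce the small-model comparison, something like this is all it takes -- a sketch only, reusing the placeholder keyword and the toy `form`/`df` from the sketch above:

```julia
using BenchmarkTools, MixedModels

# `form` and `df` are the toy model/data from the earlier sketch;
# `init_from_lmm` is still just a placeholder name, and `progress = false`
# only silences the progress bar during timing
@btime fit(MixedModel, $form, $df, Bernoulli(); progress = false)
@btime fit(MixedModel, $form, $df, Bernoulli(); progress = false,
           init_from_lmm = [:β, :θ])
```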
Here's an example from the English Lexicon Project data -- I've split it into two plots because the scale changes pretty dramatically.
First 50 iterations (after LMM fitting when the LMM is used)

All successive iterations

Here's the dataframe showing the progress (wrapped in a zip file to make GitHub happy):
glmm_fitlog_by_init.arrow.zip
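Once unzipped, it reads back in directly with Arrow.jl -- a minimal sketch, assuming the file inside the zip keeps the attachment's name and making no assumptions about its columns:

```julia
using Arrow, DataFrames

# read the per-iteration log back in after unzipping the attachment
fitlog = DataFrame(Arrow.Table("glmm_fitlog_by_init.arrow"))
describe(fitlog)    # quick look at what was recorded per iteration
```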
The relevant timing info:
It could be interesting to plot the norm of successive differences between the parameter vectors so that we can see how much movement we're getting -- and maybe how much of that is change in β and how much is change in θ.
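Something along these lines would do it -- a sketch only, where `params` and the split into a β block and a θ block stand in for however the fitlog actually stores the per-iteration values:

```julia
using LinearAlgebra

# per-iteration step sizes ‖xᵢ − xᵢ₋₁‖ for a vector of parameter vectors
step_norms(xs) = [norm(xs[i] .- xs[i-1]) for i in 2:length(xs)]

# toy stand-in for a fitlog: 20 iterations of a 5-element parameter vector [β; θ]
params = [randn(5) for _ in 1:20]
k = 2                                               # pretend the first k entries are β

dβ   = step_norms([p[1:k] for p in params])         # movement in the fixed effects
dθ   = step_norms([p[(k + 1):end] for p in params]) # movement in the covariance parameters
dall = step_norms(params)                           # total movement per iteration
```

Plotting dβ and dθ against iteration number should show which block is driving the late oscillation.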
@dmbates