
numDeriv::jacobian is slow when applied to every observation #36

Closed
vincentarelbundock opened this issue Aug 16, 2016 · 14 comments

vincentarelbundock commented Aug 16, 2016

Thanks for your work on this!

I was playing around with this cool package and wondering why margins was so slow when applied to a small logistic glm with 100 observations. Basic profiling suggests that the function spends about 80% of its time taking the Jacobian in the delta_once function.

I can't quite find my way around the code yet, but it looks like it's calculating the variance for every observation, which may not be necessary in every use case.

Any thoughts on how to speed things up? If you point me in the right direction, I may be able to PR something.
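
For reference, a minimal sketch of the kind of profiling that surfaces this, using base R's Rprof() on a made-up logistic glm (the data and model here are illustrative, not the original example):

# Toy setup: small logistic glm, roughly the size described above
library(margins)

set.seed(42)
n <- 100
d <- data.frame(x = rnorm(n), y = rbinom(n, 1, 0.5))
mod <- glm(y ~ x, data = d, family = binomial())

# Profile a single margins() call and inspect where the time goes
prof_file <- tempfile()
Rprof(prof_file)
m <- margins(mod)
Rprof(NULL)
head(summaryRprof(prof_file)$by.total)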

leeper (Owner) commented Aug 16, 2016

Definitely. If you're only interested in the effects (without their variances), you can use the marginal_effects() function, which is quite fast on its own.

I've been meaning to add a toggle to turn off the calculation of unit-specific variances (and, in fact, I think it should be off by default, since it's probably not needed in most cases). I'll make that change shortly.
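
A quick sketch of that fast path, using mtcars as a stand-in model; marginal_effects() returns the unit-level effects without any variance computation:

# Effects only, no variances: marginal_effects() skips the delta-method step
library(margins)

mod <- glm(am ~ hp + wt, data = mtcars, family = binomial())

mfx <- marginal_effects(mod)  # one dy/dx column per term, one row per observation
head(mfx)
colMeans(mfx)                 # average marginal effects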

leeper added this to the CRAN Release milestone Aug 16, 2016
leeper self-assigned this Aug 16, 2016
vincentarelbundock (Author) commented:

Sounds great. Thanks for the marginal_effects pointer. Will give it a spin.

leeper closed this as completed in 427f0b2 Aug 18, 2016
leeper (Owner) commented Aug 18, 2016

Just pushed an update where this is set to FALSE by default - let me know what kind of performance improvements you see with it.
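
A sketch of how the toggle might be used; the unit_ses argument name below is an assumption rather than something confirmed in this thread, so check ?margins for the actual interface:

# NOTE: the argument name `unit_ses` is assumed here, not confirmed by the thread;
# see ?margins for the real toggle.
library(margins)

mod <- glm(am ~ hp + wt, data = mtcars, family = binomial())

m_fast <- margins(mod)                   # unit-level variances off (new default)
m_full <- margins(mod, unit_ses = TRUE)  # opt back in to per-observation variances
summary(m_fast)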

vincentarelbundock (Author) commented:

Thanks!

I'm not quite sure what the expectation should be here, as I've never used Stata's margins command, but this still seems unreasonably long to me (at least for my kind of practical use, where I often go back and forth between data munging and estimating).

What's your expectation for reasonable time?

> library(tictoc)
> library(margins)
> N = 1000
> y = sample(c(0, 1), N, replace=TRUE)
> x = rnorm(N)
> mod = glm(y ~ x, family=binomial())
> tic()
> margins(mod)

       x 
0.005662 

> toc()
63.547 sec elapsed
> 

leeper (Owner) commented Aug 22, 2016

What are the specs on your machine? I have not seen times like that.

vincentarelbundock (Author) commented:

That was on a 27" iMac bought in 2014. This morning, on my Linux machine (AMD FX(tm)-8350 eight-core processor + 32 GB RAM), microbenchmark gives me:

> microbenchmark(margins(mod), times=3)
Unit: seconds
         expr      min       lq     mean   median       uq      max neval
 margins(mod) 29.87201 29.99878 30.40625 30.12555 30.67337 31.22119     3

leeper (Owner) commented Aug 22, 2016

Thanks. I think #37 will produce some (hopefully substantial) performance enhancements, so please hold out for that.

vincentarelbundock (Author) commented:

Sounds good. I guess I'm just not sure why you're using numerical approximations at all when the derivative is well known (at least in the logit case).

leeper (Owner) commented Aug 22, 2016

Generality. If you scroll through the git history here, you'll see an approach using symbolic derivatives. It works in simple cases but not in many others, especially when any of the following occurs:

  • factor variables
  • I() expressions or similar transformations (relevel(), center(), etc.) in model formulae
  • some other edge cases

So, the choice is between an approach (symbolic derivatives) that only works for a limited set of common models and an approach (numerical derivatives) that works for any formula and any model type. I think the latter is preferable because it can be optimized gradually once the code works as intended, whereas the former simply won't work at all in lots of common cases.
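
To make the trade-off concrete, here is a sketch of the symbolic route for the simplest logit case; the mtcars model and the dydx_hp column name (margins' usual naming convention) are illustrative assumptions, and this closed form is exactly what breaks down once factors or I() terms enter the formula:

# Plain logit with one continuous regressor: dp/dx = beta_x * p * (1 - p),
# so the marginal effect can be written down symbolically.
library(margins)

mod <- glm(am ~ hp, data = mtcars, family = binomial())
p   <- predict(mod, type = "response")

analytic  <- unname(coef(mod)["hp"] * p * (1 - p))  # closed-form unit-level dy/dx
numerical <- marginal_effects(mod)$dydx_hp          # margins' numerical derivative

all.equal(analytic, numerical, tolerance = 1e-6)    # should agree up to numerical error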

vincentarelbundock (Author) commented:

[insert thumbs up emoticon]

leeper (Owner) commented Aug 22, 2016

Indeed, this looks pretty good: 43fe90e

vincentarelbundock (Author) commented:

Yeah, that's a whole other ballgame:

> library(microbenchmark)
> library(margins)
> N = 1000
> y = sample(c(0, 1), N, replace=TRUE)
> x = rnorm(N)
> mod = glm(y ~ x, family=binomial())
> microbenchmark(margins(mod), times=3)
Unit: milliseconds
         expr      min       lq    mean   median       uq      max neval
 margins(mod) 37.10477 37.64591 44.0599 38.18706 47.53747 56.88788     3

vincentarelbundock (Author) commented:

nearly 800x faster

leeper (Owner) commented Aug 22, 2016

Nice.
