Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for svyglm #47

Closed
carlganz opened this issue Nov 3, 2016 · 13 comments
Closed

Support for svyglm #47

carlganz opened this issue Nov 3, 2016 · 13 comments

Comments

@carlganz
Copy link

carlganz commented Nov 3, 2016

Hello,

This is a wonderful package. In Stata, you can run a regression with survey data, and then run margins, but with this package it appears that you cannot currently run margins for survey data.

So in Stata you can do something like this:

* california health interview survey 2014 teen dataset
* http://healthpolicy.ucla.edu/chis/data/public-use-data-file/Pages/2013-2014.aspx
use teen

svyset [pw=rakedw0], jkrw(rakedw1-rakedw80, multiplier(1)) vce(jack) mse

svy: regress bmi_p i.srsex##i.raceh2t_p1 srage_p

margins i.srsex i.raceh2t_p1

Perhaps I am doing something wrong, but this won't work in R:

library(margins)
library(haven)
library(survey)
# as.data.frame because tibbles break survey package
chis14 <- as.data.frame(read_dta("teen.dta")) 

chis14.svy <- svrepdesign(data=chis14,
                          weights=~rakedw0,
                          repweights="rakedw[1-9]",
                          scale=1,rscales=1,
                          type='other',combined.weights=TRUE,
                          mse=TRUE)

mylm <- svyglm(bmi_p~factor(srsex)*factor(raceh2t_p1)+srage_p,design=chis14.svy)

summary(mylm)

margins(mylm)
# Error in eval(expr, envir, enclos) : object 'srsex' not found

Issue is that prediction::find_data looks for data in the wrong place for svyglm objects.

I believe it shouldn't be too difficult to get margins working with the survey package. Would you consider a pull request?

Thanks

@carlganz
Copy link
Author

carlganz commented Nov 3, 2016

I just noticed you actually added some support for svyglm today in the prediction package. That is a super odd coincidence.

@leeper
Copy link
Owner

leeper commented Nov 4, 2016

Great minds think alike! Can you let me know if this is now working with the dev version of prediction? It probably won't (I fear), so I'll try to work on it soon. This is definitely a top priority for me.

@carlganz
Copy link
Author

carlganz commented Nov 4, 2016

Unfortunately it did not work with the dev version of prediction. The issue is with this code in prediction::find_data:

data <- eval(model[["call"]][["data"]], env) 

With svyglm there is no data parameter, instead there is a design parameter. However, you don't actually need to go into call because the svyglm object holds the data used to generate it. For example:

# from examples
library(survey)
data(scd)
repweights<-2*cbind(c(1,0,1,0,1,0), c(1,0,0,1,0,1), c(0,1,1,0,0,1),
                    c(0,1,0,1,1,0))
scdrep<-svrepdesign(data=scd, type="BRR", repweights=repweights, combined.weights=FALSE)
x <- svyglm(arrests~alive,scdrep)

x$data # gives full scd data.frame
# also works for database backed survey objects
library(RSQLite)
dbclus1<-svydesign(id=~dnum, weights=~pw, fpc=~fpc,
                   data="apiclus1",dbtype="SQLite", 
                   dbname=system.file("api.db",package="survey"))

y <-svyglm(meals~api00,dbclus1)

y$data # gives data.frame with columns used including weights

I was able to get things to work by making prediction::find_data generic, and creating a method for svyglm, but you may be opposed to making prediction::find_data generic. I can make a PR if you'd like.

Regards

@leeper
Copy link
Owner

leeper commented Nov 4, 2016

Ooh, I like the idea of the making it generic, actually. That would likely simplify it moving forward as new model classes are supported. Send a PR?

@leeper
Copy link
Owner

leeper commented Nov 5, 2016

This appears to be working with latest dev versions of prediction and margins.

@leeper leeper added this to the CRAN Release milestone Nov 5, 2016
@carlganz
Copy link
Author

carlganz commented Nov 7, 2016

I find that factor/character variables cause errors. MRE:

library(margins)
library(survey)

data(api)

dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# works without svy
margins(glm(api00~api99*sch.wide,data=apistrat))

# doesn't work with svy
margins(svyglm(api00~api99*sch.wide,design=dstrat))

# works with svy
margins(svyglm(api00~api99,design=dstrat))

# doesn't work with svy
margins(svyglm(api00~sch.wide,design=dstrat))

@carlganz
Copy link
Author

carlganz commented Nov 7, 2016

This appears to be a survey package issue:

library(survey)

data(api)

dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

svymodel <- svyglm(api00~sch.wide,design=dstrat)
# errors
predict(svymodel, data.frame(sch.wide=rep("No",10)))

regmodel <- glm(api00~sch.wide,data=apistrat)
# works
predict(regmodel,data.frame(sch.wide=rep("No",10)))

Issue is that the factor in the newdata needs to have more than one level to work (the levels don't even have to correspond to the levels used in the original data, it just needs more than one level), so this works:

library(survey)

data(api)

dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

svymodel <- svyglm(api00~sch.wide,design=dstrat)

predict(svymodel, data.frame(sch.wide=factor(rep("No",10),
                                             levels = c("No","random phrase"))))

We could add a line here: https://github.com/leeper/prediction/blob/master/R/prediction_methods.R#L67 to check that all factors have levels with length at least 1, and if they don't just add a level, but that feels like a hack. I am going to think about this some more.

@tslumley
Copy link

tslumley commented Nov 7, 2016

Version 3.31-5 of survey, now uploaded to r-forge, uses the $xlevels and $contrasts attributes of the objects so this should now all work.

@leeper
Copy link
Owner

leeper commented Nov 8, 2016

Thank you, @tslumley! Really appreciate it.

@carlganz Can you retest if you get a chance? I'm not able to look at this at the moment.

@carlganz
Copy link
Author

carlganz commented Nov 8, 2016

I can confirm everything works with the dev versions of margins and prediction, and version 3.31-5 of survey.

@leeper leeper closed this as completed in f0a99f7 Nov 9, 2016
@leeper
Copy link
Owner

leeper commented Nov 9, 2016

Excellent! Just checked it, as well, and it seems to be working. I've added versioned dependencies accordingly. Thank you for this!

@tzoltak
Copy link

tzoltak commented Jun 17, 2019

I wonder, weather design argument in svyglm method is needed. You can extract survey design from svyglm object itself (it's in "survey.design" element of a list that is svyglm object). Moreover survey design included in svyglm object is already restricted only to observations that were used in model estimation - this is important fact if there are missings in variables included in a model (and current implementation fails in such a situation).

@leeper
Copy link
Owner

leeper commented Jun 17, 2019

@tzoltak Can you open a new issue with an reproducible example (or just some more detail)? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants