
Binary outcome, continuous treatment #377

Closed
lucasqcdh opened this issue Feb 8, 2022 · 3 comments


@lucasqcdh

Hi there,

Thanks so much for this great package! I'm looking into a problem in which my treatment is continuous and my outcome is binary, which I think is the exact opposite of most of the use cases I find in the literature, examples and documentation.

In fact, linear_dataset does indeed have exactly the opposite defaults:

```python
dowhy.datasets.linear_dataset(beta, ..., treatment_is_binary=True, outcome_is_binary=False, ...)
```

But this is just an aside; creating a dataset is not the issue here :)
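For anyone wanting to reproduce this setting: the two flags shown above suggest that passing treatment_is_binary=False and outcome_is_binary=True to linear_dataset should flip the defaults. The self-contained numpy sketch below instead simulates such data directly. The data-generating process (one confounder W, a linear treatment model, a logistic outcome model), the helper name simulate and all coefficients are hypothetical choices for illustration only.

```python
import numpy as np

def simulate(n=5000, beta_t=0.8, beta_w=1.0, seed=0):
    """Hypothetical DGP: confounder W -> continuous treatment T; (T, W) -> binary Y."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                     # a single observed confounder
    t = 0.5 * w + rng.normal(size=n)           # continuous treatment that depends on W
    log_odds = -0.5 + beta_t * t + beta_w * w  # outcome model on the log-odds scale
    p = 1.0 / (1.0 + np.exp(-log_odds))        # P(Y = 1 | T, W)
    y = rng.binomial(1, p)                     # binary 0/1 outcome
    return w, t, y

w, t, y = simulate()
print("treatment range:", t.min().round(2), "to", t.max().round(2))
print("share of Y = 1:", y.mean().round(3))
```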

So I was hoping to use Double Machine Learning (DML) for this, and came across this GitHub issue on EconML: py-why/EconML#204

Although I can't follow all the details of the replies, it seems that using DML for this use case is not straightforward.

I also came across this in the documentation, but it seems to apply only to binary treatments, since model_propensity must have a predict_proba method (see here).

In any case, what kind of model would be best? Perhaps the Generalized Linear Model (see here)?

I found these two GitHub issues related to using logistic regression:

- #163, with a basic implementation
- #296, which raises doubts about whether things work correctly under the hood.

Any help or guidance is greatly appreciated 🙏

@amit-sharma
Member

amit-sharma commented Feb 14, 2022

@lucasqcdh, binary outcomes are actually one of the most common cases in epidemiology and health! But I agree that our docs are thin on this use case.

If you want to use DML, the suggestion in the EconML issue is to apply it to the classification scores, which lie in [0, 1], rather than to the binary outcome. This is a common technique used by, e.g., model explanation methods like SHAP. The question then becomes: how much does the treatment change the classification score (the probability that the class is 1), rather than how much does it change the actual binary outcome?
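The score-based suggestion can be sketched as a partially linear DML estimator whose outcome is the classifier score rather than the 0/1 label. This is a minimal numpy sketch under stated assumptions, not EconML's or DoWhy's implementation: it assumes a single confounder, uses a hand-written logistic regression as the classifier and plain linear regressions as the nuisance models (real DML would normally plug in flexible ML learners), and the helpers fit_logistic, fit_predict_logistic, fit_predict_ols and dml_on_scores, like the simulated data, are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X1.T @ (y - p))
    return beta

def fit_predict_logistic(X_tr, y_tr, X_te):
    beta = fit_logistic(X_tr, y_tr)
    return 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(X_te)), X_te]) @ beta))

def fit_predict_ols(X_tr, y_tr, X_te):
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X_tr)), X_tr]), y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

def dml_on_scores(t, W, y, n_folds=2, seed=0):
    """Partially linear DML with cross-fitting, using P(Y=1 | T, W) as the outcome."""
    n = len(t)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    X = np.column_stack([t, W])
    score, res_s, res_t = np.empty(n), np.empty(n), np.empty(n)
    # Stage 1: out-of-fold classification scores in [0, 1]
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        score[te] = fit_predict_logistic(X[tr], y[tr], X[te])
    # Stage 2: out-of-fold residuals of the score and the treatment given the confounders
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        res_s[te] = score[te] - fit_predict_ols(W[tr], score[tr], W[te])
        res_t[te] = t[te] - fit_predict_ols(W[tr], t[tr], W[te])
    # Stage 3: residual-on-residual slope = effect of T on the score
    return float(res_t @ res_s / (res_t @ res_t))

# Hypothetical simulated data: confounder W, continuous T, binary Y
rng = np.random.default_rng(1)
n = 4000
W = rng.normal(size=(n, 1))
t = 0.5 * W[:, 0] + rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * t + W[:, 0]))))
theta = dml_on_scores(t, W, y)
print(f"estimated change in P(Y=1) per unit of T: {theta:.3f}")
```

The estimate is read on the probability scale ("how much does the score move per unit of treatment"), which is exactly the reframing described above.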

If you'd like to stick to the binary outcome, here are your options:

  1. Use any of the propensity score-based methods. The binary outcome $Y$ will be interpreted as a numeric 1 or 0, and you will get reasonable results. However, this won't work for a continuous treatment.
  2. If the treatment is continuous, your best bet is a GLM with a logistic transformation.
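Option 2 can be sketched by hand: fit a logistic-link GLM of Y on the treatment and the confounders, then average the model's predictions under a shifted treatment (standardization, also called g-computation) to get an effect on the probability scale. This is a minimal numpy sketch assuming a single confounder W that satisfies the backdoor criterion and a correctly specified logistic model; it is not DoWhy's GLM estimator, the helpers fit_logistic and average_risk are hypothetical, and the unit shift of 1.0 and the simulated data are arbitrary choices.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic-link GLM (logistic regression) fitted by Newton-Raphson."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X1.T @ (y - p))
    return beta

def average_risk(beta, t, w, shift=0.0):
    """Mean predicted P(Y=1) if every unit's treatment were moved by `shift`."""
    log_odds = beta[0] + beta[1] * (t + shift) + beta[2] * w
    return float(np.mean(1.0 / (1.0 + np.exp(-log_odds))))

# Hypothetical simulated data with one confounder w
rng = np.random.default_rng(2)
n = 5000
w = rng.normal(size=n)
t = 0.5 * w + rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * t + w))))

beta = fit_logistic(np.column_stack([t, w]), y)
risk_diff = average_risk(beta, t, w, shift=1.0) - average_risk(beta, t, w)
print(f"log-odds coefficient on T: {beta[1]:.3f}")
print(f"average change in P(Y=1) if T rises by 1: {risk_diff:.3f}")
```

The coefficient beta[1] is on the log-odds scale; the averaged difference is one way to translate it back into a change in probability.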

Thanks for pointing out the open issue #296 on logistic regression. I will take a look at it this week and address it.

My personal opinion: I'd suggest using the classification score probability as your outcome and then applying a method like DML. There are two complications with using a GLM or another pure-prediction approach: 1) it may be biased, because it puts all the confounders and the treatment of interest together in one model; 2) the estimate is quite unintuitive. If a logistic or some other transformation is needed to model the outcome, you can no longer talk about an additive effect. For example, with a logistic model you cannot talk about the increase in the outcome due to changing the treatment; instead you need to talk about log-odds or something similar.
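The point about additivity can be checked with a few lines of arithmetic: a fixed shift on the log-odds scale corresponds to a constant odds ratio, but to a different change in probability depending on the baseline. The coefficient 0.5 and the baseline probabilities below are arbitrary, illustrative values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

log_odds_effect = 0.5  # hypothetical logistic coefficient for a one-unit treatment change
changes = {}
for p0 in (0.05, 0.2, 0.5, 0.9):
    p1 = sigmoid(logit(p0) + log_odds_effect)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    changes[p0] = p1 - p0
    print(f"baseline {p0:.2f} -> {p1:.3f}: change {p1 - p0:+.3f}, odds ratio {odds_ratio:.3f}")
```

The odds ratio is the same on every row (exp(0.5)), while the change in probability is several times larger at a baseline of 0.5 than at 0.05 or 0.9.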

@lucasqcdh
Author

Hi @amit-sharma, thank you very much for your time and extensive reply!

> binary outcome is one of the most popular cases in epidemiology and health!

Ah, good to know, thanks! (And excuse my ignorance on the topic.)

> My personal opinion: I'd suggest you to use the classification score probability as your outcome and then use a method like DML

sounds good, will try!

@tmorzade

tmorzade commented Feb 2, 2025

Hello,

I have just discovered DoWhy, and it looks very interesting.

I am working on a problem that consists of estimating the causal effect of a continuous treatment on a binary outcome.

I fear that backdoor.linear_regression simply converts the boolean into 0 and 1 and treats it as a continuous quantity.

I am commenting on this closed issue because the posts it refers to, on DoWhy and EconML, are at least two years old.

I would like to know what the current best practice is.

ChatGPT suggested that I first apply a logistic regression, with the treatment and confounders as features, to get scores, and then apply backdoor.linear_regression with the scores as the outcome.

Is this the right way to solve the problem?

Thank you in advance for your help,
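The two-step procedure described in this comment can be sketched in numpy to see what it actually computes. Assuming the first-stage logistic regression is unpenalized and uses an intercept, the treatment and the confounders, and that the second stage is an ordinary least-squares regression on those same columns, the logistic score equations force the residuals y - p to be orthogonal to those columns. The second-stage coefficients then coincide with an OLS fit (a linear probability model) on the raw 0/1 outcome. The helpers fit_logistic and ols and the simulated data are hypothetical; this only illustrates the mechanics and does not settle which approach is best practice.

```python
import numpy as np

def fit_logistic(X1, y, n_iter=25):
    """Unpenalized logistic regression by Newton-Raphson; X1 already has an intercept column."""
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hess = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X1.T @ (y - p))
    return beta

def ols(X1, target):
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef

# Hypothetical simulated data: confounder w, continuous treatment t, binary outcome y
rng = np.random.default_rng(3)
n = 5000
w = rng.normal(size=n)
t = 0.5 * w + rng.normal(size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * t + w))))
X1 = np.column_stack([np.ones(n), t, w])  # intercept, treatment, confounder

scores = 1.0 / (1.0 + np.exp(-X1 @ fit_logistic(X1, y)))  # step 1: P(Y=1 | t, w)
two_step = ols(X1, scores)                                  # step 2: OLS on the scores
lpm = ols(X1, y.astype(float))                              # OLS directly on the 0/1 outcome
print("two-step coefficient on t:", round(two_step[1], 4))
print("direct OLS coefficient on t:", round(lpm[1], 4))
```

The equality relies on the score equations of the unpenalized logistic fit; with a penalized or non-GLM classifier, or with out-of-fold scores as in cross-fitted DML, the two estimates can differ.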
