Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sqlalchemy-redshift dependency from Amazon provider #42963

Closed
wants to merge 1 commit into from

Conversation

ashb
Copy link
Member

@ashb ashb commented Oct 12, 2024

sqlalchemy-redshift is unused. It is also not compatible with sqlalchemy>2, so good riddance!

Attempt no. 2 of #42830 after it got reverted in #42864

@gopidesupavan
Copy link
Member

@ashb is it possible to trigger full tests on this? The previous pr went through without any issues but later got issues when tests ran in ci. Thoughts?

@ashb
Copy link
Member Author

ashb commented Oct 12, 2024

Yeah, I'm hoping by editing more files it'll trigger full tests. If that didn't work I was going to investigate ways of triggering full tests.

@potiuk potiuk added the full tests needed We need to run full set of tests for this PR to merge label Oct 12, 2024
@potiuk potiuk closed this Oct 12, 2024
@potiuk potiuk reopened this Oct 12, 2024
@potiuk
Copy link
Member

potiuk commented Oct 12, 2024

Yeah, I'm hoping by editing more files it'll trigger full tests. If that didn't work I was going to investigate ways of triggering full tests.

It's a matter of setting "full tests needed" label and closing/reopen the PR. I just did it.

BTW. All the ways maintainers can interact with PRs via labels are described in https://github.com/apache/airflow/blob/main/dev/breeze/doc/ci/07_debugging.md - it explains all the labels you can use and what effect they have.

@potiuk potiuk added the disable image cache Disables cache when buidling CI images label Oct 12, 2024
@potiuk potiuk closed this Oct 12, 2024
@potiuk potiuk reopened this Oct 12, 2024
@potiuk
Copy link
Member

potiuk commented Oct 13, 2024

Can't rebase it myself - but you should rebase now this PR after I merged the fix for "package removal" in #42967 and this time I guess it should properly not have the sqlalchemy-redshift installed in CI image .

`sqlalchemy-redshift` is unused. It is also not compatible with sqlalchemy>2, so good riddance!
@potiuk
Copy link
Member

potiuk commented Oct 13, 2024

Seems that this time it failed on apt install of mysql client ... Happens from time to time when cache is disabled - their repo is rather unstable. I re-run it.

@potiuk
Copy link
Member

potiuk commented Oct 13, 2024

OK. Now we got what we wanted. The PR with caching disabled (i.e. removing the dependency) shows that it is actually used and the tests using it failed - showing that the dependency is actually used in openlineage so actually we need to talk to @mobuchowski and @kacpermuda and it's likely something related to #41632

@potiuk
Copy link
Member

potiuk commented Oct 13, 2024

The root cause seem to be that openlineage uses "redshift_connector" as sqlalchemy dialect internally - and while direct import is not present anywhere, it expects the library to be present as it contributes the "redshift_connector" dialect.

This is why we think the dependency is not used, but in fact it is.

@potiuk
Copy link
Member

potiuk commented Oct 13, 2024

And if you look at the summary of "Tests" job, in the CI output, you will see that the redshift connector has been removed:

https://github.com/apache/airflow/actions/runs/11314973292

For the future @ashb and others - just click on "Tests" in the list of checks and you will see the summary - which contains the diff in dependencies vs. main constraints - this is the diff vs. the last succesful main green build that pushed the constraints.

Screenshot 2024-10-13 at 23 25 08

@mobuchowski
Copy link
Contributor

mobuchowski commented Oct 14, 2024

The root cause seem to be that openlineage uses "redshift_connector" as sqlalchemy dialect internally - and while direct import is not present anywhere, it expects the library to be present as it contributes the "redshift_connector" dialect.

This is why we think the dependency is not used, but in fact it is.

Exactly, we use it to generate queries fitting the right db. We've migrated to this approach from earlier approach when we manually slapped SQL together via string concatenation.

IMO what could work is moving Redshift hook to use Postgres dialect - unless there is something breaking using Postgres dialect from working with Redshift now; it was used for this purpose before actual Redshift dialect.

BTW, IMO the fact that there are no tests other than OL breaking from this means that the get_sqlalchemy_engine for Redshift isn't tested, not that only OL uses Redshift dialect:

    def get_uri(self) -> str:
        """Overridden to use the Redshift dialect as driver name."""
        conn_params = self._get_conn_params()

        if "user" in conn_params:
            conn_params["username"] = conn_params.pop("user")

        # Compatibility: The 'create' factory method was added in SQLAlchemy 1.4
        # to replace calling the default URL constructor directly.
        create_url = getattr(URL, "create", URL)
        return str(create_url(drivername="redshift+redshift_connector", **conn_params))

@ashb
Copy link
Member Author

ashb commented Oct 16, 2024

@mobuchowski The hook itself doesn't use the SQLa get engine:

def get_conn(self) -> RedshiftConnection:
"""Get a ``redshift_connector.Connection`` object."""
conn_params = self._get_conn_params()
conn_kwargs_dejson = self.conn.extra_dejson
conn_kwargs: dict = {**conn_params, **conn_kwargs_dejson}
return redshift_connector.connect(**conn_kwargs)
(at least if I'm following the code right. Very possibly I'm not)

So this is a problem only for the OL integration I think.

We need to move away fro this sqla redshift integration as it doesn't support 2.0. Can you please take this on with some urgency please?

@mobuchowski
Copy link
Contributor

mobuchowski commented Oct 16, 2024

@ashb sure, I'll own that topic.

@mobuchowski
Copy link
Contributor

Merged in #43271

@ashb ashb deleted the remove-sqla-redshift-aws-provider branch October 25, 2024 17:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:providers disable image cache Disables cache when buidling CI images full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants