[airbyte-cdk] Fix tab delimiter configuration in CSV file type #35901

Merged
1 commit merged into airbytehq:master on Mar 13, 2024

Conversation

blarghmatey
Contributor

What

Describe what the change is solving
This fixes the use of the tab (\t) delimiter for CSV-based file sources.

How

The delimiter is only allowed to be a single character long. In the web UI, entering a tab character as \t yields an escaped tab character (\\t) in the stored configuration, which is two characters long. This change ensures that the escaped tab character is converted to the proper literal tab \t before it is passed to csv.register_dialect.
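As a rough illustration of the behavior described above, the sketch below normalizes an escaped tab before registering a CSV dialect. It is not the actual code in csv_format.py: the helper names `normalize_delimiter` and `read_rows` and the dialect name `example_dialect` are hypothetical, and only `csv.register_dialect`, `csv.unregister_dialect`, and `csv.reader` come from the Python standard library.

```python
import csv
import io
from typing import List


def normalize_delimiter(raw: str) -> str:
    """Hypothetical helper: map the escaped two-character string to a real tab."""
    if raw == r"\t":  # backslash followed by "t", as stored by the UI
        return "\t"   # a single literal tab character
    if len(raw) != 1:
        raise ValueError("delimiter must be exactly one character")
    return raw


def read_rows(text: str, raw_delimiter: str) -> List[List[str]]:
    """Hypothetical helper: parse text with a dialect built from the stored value."""
    delimiter = normalize_delimiter(raw_delimiter)
    # csv.register_dialect rejects delimiters longer than one character,
    # which is why the normalization has to happen first.
    csv.register_dialect("example_dialect", delimiter=delimiter)
    try:
        return list(csv.reader(io.StringIO(text), dialect="example_dialect"))
    finally:
        csv.unregister_dialect("example_dialect")


rows = read_rows("a\tb\n1\t2\n", r"\t")
print(rows)
```

Passing the stored value `r"\t"` straight to `csv.register_dialect` would fail because it is two characters long. Converting it first lets the tab-separated input parse into separate columns.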

Recommended reading order

  1. csv_format.py

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user?
Once the CDK change is published and affected sources (e.g. source-s3) are updated, the existing bug is resolved and tab-separated files will work again.

For connector PRs, use this section to explain which type of semantic versioning bump occurs as a result of the changes. Refer to our Semantic Versioning for Connectors guidelines for more information. Breaking changes to connectors must be documented by an Airbyte engineer (PR author, or reviewer for community PRs) by using the Breaking Change Release Playbook.

If there are breaking changes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Actions

Expand the relevant checklist and delete the others.

New Connector

Community member or Airbyter

  • Community member? Grant edit access to maintainers (instructions)
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Connector version is set to 0.0.1
    • Dockerfile has version 0.0.1
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • docs/integrations/<source or destination>/<name>.md including changelog with an entry for the initial version. See changelog example
    • docs/integrations/README.md

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Unit & integration tests added

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.

Connector Generator
  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:generateScaffolds then checking in your changes
  • Documentation which references the generator is updated as needed

Updating the Python CDK

Airbyter

Before merging:

  • Pull Request description explains what problem it is solving
  • Code change is unit tested
  • Build and mypy checks pass
  • Smoke test the change on at least one affected connector
    • On GitHub: Run this workflow, passing --use-local-cdk --name=source-<connector> as options
    • Locally: airbyte-ci connectors --use-local-cdk --name=source-<connector> test
  • PR is reviewed and approved

After merging:

  • Publish the CDK
    • The CDK does not follow proper semantic versioning. Choose minor if the change has significant user impact or is a breaking change. Choose patch otherwise.
    • Write a thoughtful changelog message so we know what was updated.
  • Merge the platform PR that was auto-created for updating the Connector Builder's CDK version
    • This step is optional if the change does not affect the connector builder or declarative connectors.

@blarghmatey blarghmatey requested a review from a team March 7, 2024 21:34
@octavia-squidington-iii octavia-squidington-iii added CDK Connector Development Kit community labels Mar 7, 2024
@blarghmatey
Contributor Author

@marcosmarxm this is a follow-on to the changes from #35246

@blarghmatey
Contributor Author

cc @dtiesling

@marcosmarxm
Member

Thanks @blarghmatey I'm going to take a look today!

Member

@marcosmarxm marcosmarxm left a comment

Can you bump the metadata version? This is a CDK change

@marcosmarxm marcosmarxm changed the title from "Fix tab delimiter configuration in CSV file type" to "[airbyte-cdk] Fix tab delimiter configuration in CSV file type" Mar 8, 2024
@blarghmatey blarghmatey requested a review from marcosmarxm March 8, 2024 18:28
@blarghmatey blarghmatey force-pushed the master branch 3 times, most recently from 5aabdc5 to 4bd1142 Compare March 11, 2024 13:36
@blarghmatey
Contributor Author

@marcosmarxm do you think it will be possible to get this merged and included in a patch release of source-s3 today?

@marcosmarxm marcosmarxm requested a review from maxi297 March 11, 2024 13:38
@marcosmarxm
Member

@maxi297 I need your help here. @blarghmatey I'm going to try to get this done asap.

@blarghmatey
Contributor Author

Awesome, thanks! Let me know if there's anything I can help with.

@blarghmatey blarghmatey force-pushed the master branch 3 times, most recently from 0085576 to 32e92d1 Compare March 11, 2024 18:45
@marcosmarxm
Member

@brianjlai can you take a look?

@blarghmatey blarghmatey force-pushed the master branch 2 times, most recently from ffa4d5f to a605491 Compare March 12, 2024 01:09
@brianjlai brianjlai self-requested a review March 12, 2024 03:16
Contributor

@brianjlai brianjlai left a comment

Change makes sense and lines up with how we corrected "\\t"/r"\t" into \t in an older PR for legacy S3.

Thanks for the contribution @blarghmatey !

@blarghmatey
Contributor Author

@marcosmarxm it looks like your requested change might need to be updated for this to merge? Once this merges, what needs to be done to get the change published in source-s3?

@marcosmarxm
Member

Running tests to get this merged. @brianjlai will handle the publish action to get this published.

@marcosmarxm marcosmarxm merged commit f679389 into airbytehq:master Mar 13, 2024
31 checks passed
Labels
CDK Connector Development Kit community
4 participants