Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: (CDK) (AsyncRetriever) - Improve UX on variable naming and interpolation #368

Conversation

bazarnov
Copy link
Contributor

@bazarnov bazarnov commented Feb 25, 2025

What

Resolving:

How

  • Renaming urls_extractor to download_target_extractor.
  • Renaming url_requester to download_target_requester.
  • Changing the interpolation context for polling_requester.path and download_requester.path to use creation_response and download_target respectively, instead of stream_slice.

User Impact

This update is considered a Breaking Change for the Customers who implemented the AsyncRetriever currently; and might require customers to adjust their code as per the new naming conventions and interpolation contexts.

Customers, who use the AsyncRetriever should change the interpolation they use as follows:

  • for polling_requester.path:
    • we switched to use the creation_response within the interpolation context, instead of relying on the stream_slice context.

      Before:

       `{{ stream_slice['create_job_response'].json()['id'] }}`
      

      After:

      `{{ creation_response['id'] }}`
      
  • for download_requester.path:
    • we switched to use the download_target within the interpolation context, instead of relying on the stream_slice context.

      Before:

       `{{stream_slice.extra_fields['url']}}` 
      

      After:

      `{{download_target}}`
      
  • for urls_extractor:
    • the name of the component has changed

      Before:

      `urls_extractor`
      

      After:

       `download_target_extractor`
      
  • for url_requester:
    • the name of the component has changed

      Before:

      `url_requester`
      

      After:

       `download_target_requester`
      

Summary by CodeRabbit

  • New Features

    • Enhanced asynchronous job processing by introducing new response fields that capture job creation, polling results, and final download targets.
  • Refactor

    • Updated naming conventions to clearly differentiate download targets from generic URLs, streamlining the asynchronous data retrieval flow.
  • Documentation

    • Revised diagrams and descriptions to reflect the updated asynchronous workflow and improved terminology for clearer user insights.

@bazarnov bazarnov self-assigned this Feb 25, 2025
@github-actions github-actions bot added bug Something isn't working security labels Feb 25, 2025
Copy link
Contributor

coderabbitai bot commented Feb 25, 2025

📝 Walkthrough

Walkthrough

This pull request updates the asynchronous job handling by enhancing the declarative schema and renaming key extractor references. Three new fields—creation_response, polling_response, and download_target—are added to the schema. Across the codebase, the field and variable urls_extractor is renamed to download_target_extractor in the schema, models, factory, requesters, and tests. Minor updates in HTTP request handling and documentation ensure that the extraction of the final download target from asynchronous responses is clearly defined.

Changes

File(s) Change Summary
airbyte_cdk/.../declarative_component_schema.yaml Added new fields: creation_response, polling_response, download_target; renamed urls_extractor to download_target_extractor in the schema definitions.
airbyte_cdk/.../models/declarative_component_schema.py
airbyte_cdk/.../parsers/model_to_component_factory.py
Renamed the urls_extractor attribute to download_target_extractor in the AsyncRetriever and updated component creation logic accordingly.
airbyte_cdk/.../requesters/http_job_repository.py
airbyte_cdk/.../requesters/http_requester.py
airbyte_cdk/.../requesters/README.md
Updated the AsyncHttpJobRepository to use download_target_extractor, simplified path extraction (switching to direct references such as creation_response['id'] and download_target), and adjusted the README sequence diagram terminology; added a type ignore for backoff strategies.
unit_tests/.../test_model_to_component_factory.py
unit_tests/.../test_http_job_repository.py
unit_tests/.../test_concurrent_declarative_source.py
Modified test configurations by renaming config keys to download_target_extractor and updating path references in polling and download requesters.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant JobRepository
    participant JobServer
    participant Extractor

    Client->>JobRepository: Initiate async job request
    JobRepository->>JobServer: Send creation request
    JobServer-->>JobRepository: Return creation_response
    JobRepository->>JobServer: Poll using creation_response
    JobServer-->>JobRepository: Return polling_response
    JobRepository->>Extractor: Extract download target from polling_response
    Extractor-->>JobRepository: Return download_target
Loading

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • maxi297 (wdyt?)
  • aaronsteers (wdyt?)

Tip

CodeRabbit's docstrings feature is now available as part of our Pro Plan! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

128-134: Enhanced interpolation context with extra fields from stream_slice

This change allows the path generator to utilize additional contextual information from the stream_slice's extra_fields, which enables more flexible path generation. This supports the improved variable naming and interpolation UX goals of the PR.

Would adding a debug log message when extra fields are included in the interpolation context be helpful for troubleshooting? Something like LOGGER.debug(f"Including extra fields in interpolation context: {stream_slice.extra_fields}") wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)

3673-3687: Documentation added for new interpolation variables

The documentation for the new interpolation variables is thorough and includes helpful examples. This will make it easier for users to understand how to use these variables in their configurations. One small thing - would the download_target examples be better without the URL prefix in the JSON examples? Something like "https://example.com/download/file.csv" instead of url: "https...", wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dbb6fe and f5b256e.

📒 Files selected for processing (9)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (4 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2 hunks)
  • airbyte_cdk/sources/declarative/requesters/README.md (2 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_job_repository.py (4 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_requester.py (2 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3 hunks)
  • unit_tests/sources/declarative/requesters/test_http_job_repository.py (3 hunks)
  • unit_tests/sources/declarative/test_concurrent_declarative_source.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (24)
airbyte_cdk/sources/declarative/requesters/README.md (3)

4-5: Updated naming convention for better semantic clarity

The change from urls_extractor to download_target_extractor improves the clarity of the component's purpose, making it more intuitive for users to understand that this extractor is specifically responsible for retrieving download targets from the polling response.


28-28: Simplified interpolation context for polling requester

The use of creation_response in the interpolation context makes the path more intuitive and aligns with the breaking changes described in the PR objectives.


35-35: Improved interpolation context for URL requester

Now using polling_response in the interpolation context, which is more clear and better reflects the data source.

airbyte_cdk/sources/declarative/requesters/http_requester.py (1)

88-88: Added type ignore comment for backoff_strategies

The type ignore comment addresses type checking issues with the backoff_strategies attribute. This is a safe addition that prevents type checking errors without affecting runtime behavior.

unit_tests/sources/declarative/test_concurrent_declarative_source.py (2)

302-302: Updated component name to download_target_extractor

This change aligns with the renaming in other parts of the codebase, ensuring consistency with the new naming convention.


310-310: Simplified path reference using creation_response

The path for polling_requester has been updated to use creation_response['id'] directly, which is more straightforward than the previous complex reference to stream_slice. This improves readability and matches the PR objectives.

airbyte_cdk/sources/declarative/requesters/http_job_repository.py (5)

46-46: Renamed attribute to download_target_extractor

This change is consistent with the naming updates in other files and better reflects the component's purpose.


219-223: Updated extra_fields to use download_target

The field name has been changed from 'url' to 'download_target', which aligns with the new naming convention and improves consistency throughout the codebase.


275-280: Enhanced stream slice with creation_response

Now extracting and including the creation_response in the extra_fields of the StreamSlice, which allows for more direct referencing in path interpolation.


287-292: Updated stream slice with polling_response

Similar to the creation_response change, this modification extracts the polling_response and includes it in the extra_fields, improving the UX for variable naming and interpolation.


300-300: Updated method call to use download_target_extractor

This change completes the renaming of the urls_extractor to download_target_extractor, ensuring consistency across the codebase.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

2760-2762: Rename from 'urls_extractor' to 'download_target_extractor' looks good!

The variable has been consistently renamed to match the PR objectives, which improves naming clarity by better reflecting what this component actually extracts (a download target rather than just URLs).


2772-2773: Consistent parameter rename in job_repository creation

The parameter name has been properly updated in the AsyncHttpJobRepository instantiation to use the renamed component, completing the renaming from urls_extractor to download_target_extractor.

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1782-1784: New interpolation variables added for AsyncRetriever UX improvement

These new variables (creation_response, polling_response, and download_target) provide more explicit and direct access to the data needed for path templating. This simplifies variable naming and improves clarity compared to the previous nested approach using stream_slice. Nice improvement to the developer experience!


3229-3229: Component renamed from urls_extractor to download_target_extractor

This rename makes the component's purpose more clear - it's extracting the download target rather than generic URLs. The name better aligns with its actual functionality, improving readability and maintainability. Good change!

unit_tests/sources/declarative/requesters/test_http_job_repository.py (3)

72-72: Path updated to use new creation_response variable

The path has been updated to use the new creation_response['id'] variable instead of the more verbose stream_slice['create_job_response'].json()['id']. This aligns with the PR objectives and makes the code more readable.


87-87: Path updated to use new download_target variable

The path has been updated to use the new download_target variable directly rather than accessing it through stream_slice.extra_fields['url']. This simplifies the template and improves clarity.


146-146: Parameter renamed from urls_extractor to download_target_extractor

The parameter has been renamed to match the component name changes in the schema file. This ensures consistency across the codebase and better reflects the component's purpose.

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (6)

3610-3610: The rename from urls_extractor to download_target_extractor improves clarity.

This change makes the property name more descriptive and clearly conveys its purpose - extracting the download target rather than generic "urls". This aligns with the PR objectives to improve variable naming.


3618-3618: Path interpolation simplified with creation_response.

The path interpolation now uses {{creation_response['id'] }} instead of what was likely a more complex nested structure like {{stream_slice['create_job_response'].json()['id'] }}. This is more readable and aligns with the PR objectives to improve UX on variable interpolation.


3638-3638: Direct download_target reference improves readability.

Using {{download_target}} instead of a more complex structure like {{stream_slice.extra_fields['url']}} makes the template more intuitive and straightforward, wdyt?


3644-3644: Consistent use of download_target in abort_requester path.

The abort_requester path now uses the same simplified download_target variable pattern as the download_requester, creating consistency across the code.


3650-3650: Simplified variable usage in delete_requester path.

The delete_requester path follows the same pattern improvement as the download and abort requesters, using the clearer download_target variable.


3684-3684: Updated assertion to match renamed property.

The assertion property name has been updated to match the renamed property from urls_extractor to download_target_extractor, ensuring tests remain consistent with the implementation.

@bazarnov bazarnov requested review from maxi297 and lmossman February 25, 2025 20:57
Copy link
Contributor

@lmossman lmossman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had one comment, otherwise the declarative_component_schema changes look good to me! I'll leave the python changes to @maxi297 to review

Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking because it seems functionally good, mostly renaming.

Should we rename url_requester to download_target_requester? Also, was there a plan to update source-amazon-ads, source-amazon-seller-partner, source-salesforce and source-sendgrid?

@bazarnov
Copy link
Contributor Author

Also, was there a plan to update source-amazon-ads, source-amazon-seller-partner, source-salesforce and source-sendgrid?

I wasn't planning to do it myself, but this is definitely need to be done.

@bazarnov
Copy link
Contributor Author

Should we rename url_requester to download_target_requester?

I think we can do this within this PR as well, on it now.

@bazarnov bazarnov requested review from maxi297 and lmossman February 26, 2025 12:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (1)

283-294: Enhanced _get_download_targets method with more explicit response handling.

The refactored method now clearly captures and passes the polling response in the extra_fields, making it available for interpolation. One minor suggestion: should we add a type annotation for the return type (e.g., -> Iterable[str]) to further enhance code clarity? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8071e89 and 0582aed.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (5 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
  • airbyte_cdk/sources/declarative/requesters/README.md (3 hunks)
  • airbyte_cdk/sources/declarative/requesters/http_job_repository.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte_cdk/sources/declarative/requesters/README.md
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (11)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

2747-2756: Rename from url_requester to download_target_requester looks good!

This change aligns well with the PR objective to improve variable naming. The new name more clearly communicates the purpose of this component for handling download targets rather than just URLs, which should make the code more intuitive to work with.


2760-2762: Rename from urls_extractor to download_target_extractor is consistent.

This change maintains naming consistency with the other renamed variables and clearly communicates that this component extracts download targets, not just URLs. Good improvement!


2767-2773: Constructor parameter names updated consistently.

You've correctly updated all references in the job_repository initialization to match the new variable names. This maintains consistency throughout the code.


2747-2773: Reminder: This is a breaking change per PR description.

As noted in the PR objectives, this is a breaking change for customers using AsyncRetriever. Users will need to update their interpolation method to utilize creation_response instead of stream_slice for polling_requester.path and download_target instead of stream_slice for download_requester.path.

Does this change need any additional documentation updates beyond what's in the PR description to assist users with migration? If needed, we could provide some configuration examples showing before/after for common use cases, wdyt?

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)

1782-1784: Nice addition of interpolation contexts for the AsyncRetriever component.

These new interpolation contexts (creation_response, polling_response, and download_target) directly align with the breaking changes mentioned in the PR objectives, making the path variable configuration more intuitive. This enhances clarity by providing direct access to specific responses rather than digging through nested stream_slice structures.


3246-3247: Renaming from urls_extractor to download_target_extractor improves clarity.

This renaming better describes the component's actual purpose - extracting the final download target from the completed job response. The more precise terminology will make it easier for users to understand the component's function.


3673-3687: Well-documented new interpolation contexts with helpful examples.

The documentation clearly explains each new interpolation context with appropriate examples. The example for download_target is particularly good as it shows a realistic URL structure that users might encounter.

airbyte_cdk/sources/declarative/requesters/http_job_repository.py (4)

46-47: Renamed attributes align with schema changes for better consistency.

The renaming from urls_extractor to download_target_extractor and url_requester to download_target_requester maintains consistency with the changes made in the schema definition. This approach makes the codebase more maintainable by using consistent naming across all files.

Also applies to: 52-54


214-223: Updated method call and variable name in fetch_records for consistency.

The changes from _get_download_urls to _get_download_targets and updating the extra_fields to use "download_target" instead of "url" reflect the renaming pattern throughout the codebase. This provides a better UX by using more descriptive variable names.


275-280: Improved data structure in _get_create_job_stream_slice with explicit response capture.

Extracting and including the creation response in the extra_fields makes it directly accessible for interpolation in downstream components. This is a cleaner approach than relying on the nested structure within stream_slice.


296-300: Updated error message and extraction logic to match new naming.

The error message now correctly references download_target_requester instead of url_requester, and the extraction uses download_target_extractor for consistency. This code is now well-aligned with the renaming pattern throughout the codebase.

@bazarnov bazarnov merged commit a402288 into main Feb 26, 2025
26 checks passed
@bazarnov bazarnov deleted the baz/cdk/async-retriever-add-more-interpolation-change-url-extractor-to-download-extractor branch February 26, 2025 13:06
rpopov added a commit to rpopov/airbyte-python-cdk that referenced this pull request Mar 5, 2025
* main:
  fix: update cryptography package to latest version to address CVE (airbytehq#377)
  fix: (CDK) (HttpRequester) - Make the `HttpRequester.path` optional (airbytehq#370)
  feat: improved custom components handling (airbytehq#350)
  feat: add microseconds timestamp format (airbytehq#373)
  fix: Replace Unidecode with anyascii for permissive license (airbytehq#367)
  feat: add IncrementingCountCursor (airbytehq#346)
  feat: (low-code cdk)  datetime format with milliseconds (airbytehq#369)
  fix: (CDK) (AsyncRetriever) - Improve UX on variable naming and interpolation (airbytehq#368)
  fix: (CDK) (AsyncRetriever) - Add the `request` and `response` to each `async` operations (airbytehq#356)
  fix: (CDK) (ConnectorBuilder) - Add `auxiliary requests` to slice; support `TestRead` for AsyncRetriever (part 1/2) (airbytehq#355)
  feat(concurrent perpartition cursor): Add parent state updates (airbytehq#343)
  fix: update csv parser for builder compatibility (airbytehq#364)
  feat(low-code cdk): add interpolation for limit field in Rate (airbytehq#353)
  feat(low-code cdk): add AbstractStreamFacade processing as concurrent streams in declarative source (airbytehq#347)
  fix: (CDK) (CsvParser) - Fix the `\\` escaping when passing the `delimiter` from Builder's UI (airbytehq#358)
  feat: expose `str_to_datetime` jinja macro (airbytehq#351)
  fix: update CDK migration for 6.34.0 (airbytehq#348)
  feat: Removes `stream_state` interpolation from CDK (airbytehq#320)
  fix(declarative): Pass `extra_fields` in `global_substream_cursor` (airbytehq#195)
  feat(concurrent perpartition cursor): Refactor ConcurrentPerPartitionCursor (airbytehq#331)
  feat(HttpMocker): adding support for PUT requests and bytes responses (airbytehq#342)
  chore: use certified source for manifest-only test (airbytehq#338)
  feat: check for request_option mapping conflicts in individual components (airbytehq#328)
  feat(file-based): sync file acl permissions and identities (airbytehq#260)
  fix: (CDK) (Connector Builder) - refactor the `MessageGrouper` > `TestRead` (airbytehq#332)
  fix(low code): Fix missing cursor for ClientSideIncrementalRecordFilterDecorator (airbytehq#334)
  feat(low-code): Add API Budget (airbytehq#314)
  chore(decoder): clean decoders and make csvdecoder available (airbytehq#326)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants