
fix: prevent raw decoder from closing before sync completion #366

Merged
merged 12 commits into from
Mar 6, 2025

Conversation

ChristoGrab
Collaborator

@ChristoGrab ChristoGrab commented Feb 25, 2025

What

We've run into an issue with the CsvDecoder closing files prematurely during syncs, leading to crashes when syncing and/or setting up sources:

ValueError: I/O operation on closed file

The problem seems to be that the connection is automatically closed as the last chunk is processed, preventing the sync from completing gracefully. By explicitly closing the connection AFTER we have yielded the data, we ensure the connection remains open until all data has been parsed and read.

Note that the failure is not specific to the csv parser: pandas fails with the same error, which suggests the issue lies in how urllib3 handles closing the underlying IO rather than in Python's csv standard library.

Note

From the urllib3 docs:
Sometimes you want to use io.TextIOWrapper or similar objects like a CSV reader directly with HTTPResponse data. Making these two interfaces play nice together requires using the auto_close attribute by setting it to False. By default HTTP responses are closed after reading all bytes, this disables that behavior:
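The pattern described above can be sketched with a stand-in stream (FakeRawStream below is a hypothetical mock, not urllib3's class; real HTTPResponse objects default to auto_close=True). The key is the ordering the fix introduces: disable auto-closing before parsing, yield all rows, then close explicitly, with a try/finally so cleanup also happens on error:

```python
import csv
import io

class FakeRawStream(io.BytesIO):
    """Illustrative stand-in for urllib3's HTTPResponse; real responses
    default to auto_close=True and close once all bytes are read."""
    auto_close = True

def decode(raw):
    # The fix: keep the stream open while rows are still being yielded,
    # then close it explicitly once parsing is done.
    raw.auto_close = False
    try:
        yield from csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8"))
    finally:
        raw.close()

raw = FakeRawStream(b"col1,col2\nval1,val2")
rows = list(decode(raw))
print(rows)        # [{'col1': 'val1', 'col2': 'val2'}]
print(raw.closed)  # True
```

This is only a sketch of the ordering; the CDK's actual decode method lives in CompositeRawDecoder and operates on response.raw from requests.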

Summary by CodeRabbit

  • Bug Fixes

    • Improved streamed response handling to ensure that the data stream remains available during processing and is properly closed afterward, enhancing overall stability and resource management.
  • New Features

    • Introduced a new test server class for testing CSV parsing capabilities of the CompositeRawDecoder.
    • Added a new test function to validate the CSV parsing functionality in a controlled environment.
    • Added a utility function to find available ports for testing purposes.

@ChristoGrab ChristoGrab marked this pull request as ready for review February 25, 2025 19:09
@ChristoGrab ChristoGrab changed the title refactor: prevent raw decoder from closing before sync completion fix: prevent raw decoder from closing before sync completion Feb 25, 2025
Contributor

coderabbitai bot commented Feb 25, 2025

📝 Walkthrough

This pull request updates the decode method in the CompositeRawDecoder class to enhance resource management for streaming responses. It prevents the automatic closing of the raw response stream before parsing and ensures the stream is explicitly closed after parsing is complete. Additionally, it introduces new testing functionality to validate CSV parsing using a controlled server environment.

Changes

File Change Summary
airbyte_cdk/.../composite_raw_decoder.py Updated the decode method to set response.raw.auto_close to False before parsing and explicitly call response.raw.close() after parsing.
unit_tests/.../test_composite_decoder.py Added find_available_port function, TestServer class, and test_composite_raw_decoder_csv_parser_without_mocked_response function for CSV parsing validation.

Possibly related PRs

  • feat: add Composite Raw Decoder #179: The changes in the main PR, which modify the decode method of the CompositeRawDecoder class, are directly related to the functionality introduced in the retrieved PR that adds the CompositeRawDecoder and its associated parsers.
  • fix: update csv parser for builder compatibility #364: The enhancements to the decode method are related to modifications in the parse method of the CsvParser class, as both involve handling the response stream and parsing logic within the same decoder framework.

Suggested reviewers

  • bazarnov
  • artem1205

Wdyt about adding any additional details or clarifications for the reviewers?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1)

154-156: Good improvement to resource management!

This change properly addresses the premature closing issue by disabling auto_close and explicitly closing the response.raw stream after all parsing is complete. This should prevent the ValueError about I/O operations on closed files during sync operations.

Have you considered adding a try-finally block to ensure the raw stream always gets closed even if parsing raises an exception? Something like:

if self.is_stream_response():
    response.raw.auto_close = False
-    yield from self.parser.parse(data=response.raw)  # type: ignore[arg-type]
-    response.raw.close()
+    try:
+        yield from self.parser.parse(data=response.raw)  # type: ignore[arg-type]
+    finally:
+        response.raw.close()

This would ensure proper cleanup even in error scenarios. Wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4dbb6fe and 92c1914.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Check: 'source-shopify' (skip=false)

@maxi297 maxi297 requested a review from artem1205 March 4, 2025 20:32
@maxi297
Contributor

maxi297 commented Mar 4, 2025

@maxi297 to add a test on that

@github-actions github-actions bot added bug Something isn't working security labels Mar 4, 2025
@maxi297
Contributor

maxi297 commented Mar 4, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

1-258: ⚠️ Potential issue

Fix Ruff formatting issues in this file.

The pipeline is reporting Ruff formatting issues that need to be addressed:

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

Run ruff format on this file to fix the formatting issues and ensure the CI pipeline passes.

🧰 Tools
🪛 GitHub Actions: Linters

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

215-226: Consider making the test more robust against port conflicts, wdyt?

The test correctly sets up a real HTTP server to verify the decoder doesn't prematurely close the response stream. However, using a hardcoded port (8080) might cause test failures if that port is already in use on the CI system or developer machine.

Could you consider using port 0 to let the OS assign an available port, then retrieve the assigned port?

-    httpd = HTTPServer(('localhost', 8080), TestServer)
+    httpd = HTTPServer(('localhost', 0), TestServer)
+    port = httpd.server_port
     thread = Thread(target=httpd.serve_forever, args = ())
     thread.start()
 
-    response = requests.get("http://localhost:8080", stream=True)
+    response = requests.get(f"http://localhost:{port}", stream=True)
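The port-0 idea suggested above can be sketched end to end with only the standard library (the PR's actual test uses requests and a TestServer handler class; CsvHandler here is a hypothetical equivalent serving the same single-row CSV):

```python
import csv
import io
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CsvHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the PR's TestServer: serves one CSV row."""

    def do_GET(self):
        body = b"col1,col2\nval1,val2"
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

# Port 0 lets the OS pick a free port; server_port reports the choice,
# so no hardcoded port can collide with another process.
httpd = HTTPServer(("localhost", 0), CsvHandler)
port = httpd.server_port
thread = threading.Thread(target=httpd.serve_forever)
thread.start()
try:
    with urllib.request.urlopen(f"http://localhost:{port}") as resp:
        rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))
finally:
    httpd.shutdown()
    httpd.server_close()
    thread.join(timeout=5)

print(rows)  # [{'col1': 'val1', 'col2': 'val2'}]
```

Because the server itself holds the port for the lifetime of the test, this approach avoids the race condition that a separate find-a-free-port helper can introduce.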
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92c1914 and 06ff322.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/decoders/test_composite_decoder.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (3)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (3)

7-9: New imports support the HTTP server test implementation, looks good.

The addition of BaseHTTPRequestHandler, HTTPServer, and Thread imports supports the new test case that verifies the fix for premature raw decoder closing.


207-213: Simple test server implementation that serves CSV data, looks good!

This TestServer class provides a minimal HTTP server that returns a simple CSV response. It's a good approach for testing the decoder with a real HTTP response stream rather than just using mocks.


215-226: This test directly addresses the PR objective - good work!

This new test case directly verifies the fix for the issue described in the PR: "prevent raw decoder from closing before sync completion". By setting up a real HTTP server and response stream, it tests the actual behavior of the CompositeRawDecoder with CSV data in a way that would reveal if the file is closed prematurely.

The test is simple but focused on the specific issue, which is exactly what was needed based on the reviewer's comment requesting a test.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

214-228: Good test for reproducing the closed file issue

This test does a great job at simulating the real-world scenario where ValueError: I/O operation on closed file was occurring. Using a real HTTP server instead of mocks is an excellent approach to catch the actual behavior.

A couple of suggestions to make this even more robust:

  • Would it be helpful to use a random port instead of hardcoded 8080 to avoid potential conflicts during testing? httpd = HTTPServer(('localhost', 0), TestServer) and then get the port with port = httpd.server_port
  • Maybe add an assertion on the actual content of the parsed result to ensure data integrity? Something like: assert result[0] == {'col1': 'val1', 'col2': 'val2'}

What do you think? Particularly for CI environments where port conflicts might arise.


227-228: Clean server shutdown implementation

Good job on the proper cleanup with both httpd.shutdown() and thread.join(). This ensures the test doesn't leave hanging threads or ports in use.

You might consider adding a timeout to the thread.join() to prevent tests from hanging if the server doesn't shut down properly, wdyt?

- thread.join()
+ thread.join(timeout=5)  # Add timeout to prevent hanging tests
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 06ff322 and f276a6e.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (4)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (4)

207-212: Well-structured test server implementation

Nice implementation of the TestServer class. It's simple and focused on just what's needed to test the CSV parsing functionality.


215-217: Clear docstring explaining the test's purpose

Great job clearly documenting the issue this test is addressing. The docstring effectively explains why this test is needed beyond the existing mocked tests.


219-221: Thread management looks good

The server setup and thread management approach is solid. You're properly creating the server and starting it in a background thread.


223-226: Core test functionality looks good

This is testing the key functionality - ensuring that the CSV parser can properly process a streamed response without encountering closed file errors.

@artem1205
Contributor

We've run into an issue with the CsvDecoder closing files prematurely during syncs

during sync or during the test_read?

I can imagine that when we use test_read and iterate by records (using next() we get the first N records and then wait), our connection is still alive and may be closed by the server.

@ChristoGrab, could you share a link to this error occurring during a sync read?

@artem1205
Contributor

UPD:

Adding \n to Maxime's test allows the parser to work even without closing the response:

  • csv.DictReader relies on newline characters (\n) to delimit rows.
  • If the last row in the CSV data doesn't end with a newline, csv.DictReader might attempt to read past the end of the data, leading to errors.
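As a quick sanity check of the newline point: for plain in-memory buffers, csv.DictReader tolerates a missing trailing newline, so this minimal check parses both variants identically (it does not reproduce the streaming failure itself, which per this comment only surfaces when reading from a live response stream):

```python
import csv
import io

# Same CSV content, with and without a trailing newline.
with_nl = list(csv.DictReader(io.StringIO("col1,col2\nval1,val2\n")))
without_nl = list(csv.DictReader(io.StringIO("col1,col2\nval1,val2")))

print(with_nl == without_nl)  # True
print(without_nl)             # [{'col1': 'val1', 'col2': 'val2'}]
```

This supports the observation that the row-delimiting behavior interacts with how the underlying stream signals end-of-data, rather than being a parsing bug in csv itself.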

Contributor

@artem1205 artem1205 left a comment


LGTM!
Approving to unblock; we might need to introduce another buffer in the CSV parser to handle and check for \n at the end of the line.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

1-267: 💡 Verification agent

🧩 Analysis chain

Fix formatting issues flagged by the pipeline.

The pipeline reports that Ruff formatting check failed. Please run 'ruff format' to fix the code style issues in this file.


🏁 Script executed:

#!/bin/bash
# Run formatter on the file to see what would change
ruff format --diff unit_tests/sources/declarative/decoders/test_composite_decoder.py

Length of output: 837


Action: Please run ruff format to adjust the file's formatting.

The diff from the formatter indicates that one file (specifically, around line 222 in unit_tests/sources/declarative/decoders/test_composite_decoder.py) would be modified by removing an extra newline. Could you run ruff format on this file to fix the style issues as flagged by the pipeline? wdyt?

🧰 Tools
🪛 GitHub Actions: Linters

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.

🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

207-212: Simple server implementation looks good, but could potentially be enhanced.

The TestServer class implements a minimal HTTP server that returns a CSV string without a trailing newline character, which appears intentional based on the test's docstring. This is a good approach for reproducing the specific issue.

Would you consider adding some basic error handling or logging to make debugging easier if the test fails? Just a thought - this is fine as is for the specific use case.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f276a6e and 5364059.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py (1 hunks)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/decoders/composite_raw_decoder.py
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/decoders/test_composite_decoder.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Please run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Analyze (python)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: preview_docs
🔇 Additional comments (3)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (3)

7-9: Appropriate imports for the new test functionality.

The added imports for HTTP server functionality and threading are well-aligned with the new test requirements. These are necessary to create a real HTTP server for testing the CSV parser without mocked responses.


214-224: Well-documented test case explaining the issue being fixed.

Excellent documentation explaining the rationale for this test and the specific conditions needed to reproduce the issue. This context is very valuable for future maintainers.

The commented-out CSV content is useful for reference, though I wonder if we might want to try reproducing with that exact content in a future test?


227-229: Consider using a dynamic port instead of hardcoded 8080.

Using port 8080 hardcoded could cause test failures if that port is already in use by another process. This could lead to intermittent CI failures that are hard to debug.

Would you consider using a dynamic port allocation approach instead? Maybe something like:

with socket.socket() as s:
    s.bind(('localhost', 0))
    port = s.getsockname()[1]
httpd = HTTPServer(("localhost", port), TestServer)

This would help prevent port conflicts, wdyt?

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@maxi297
Contributor

maxi297 commented Mar 5, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

231-234: Consider adding more assertions to validate CSV parsing.

The test currently only verifies that the result has a length of 1, but it doesn't check the actual content of the parsed CSV.

Would it be helpful to add more specific assertions to verify the parsed data structure? Perhaps:

  result = list(CompositeRawDecoder(parser=CsvParser()).decode(response))

  assert len(result) == 1
+ assert result[0] == {'col1': 'val1', 'col2': 'val2'}

This would ensure not just that some data was parsed, but that it was parsed correctly. wdyt?


225-237:

❓ Verification inconclusive

Consider adding a port finder to avoid conflicts.

Using a hardcoded port (8080) might lead to test failures if that port is already in use on the test system.

Would you consider implementing a dynamic port allocation? Something like:

- httpd = HTTPServer(("localhost", 8080), TestServer)
+ def find_free_port():
+     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+         s.bind(('', 0))
+         return s.getsockname()[1]
+ 
+ port = find_free_port()
+ httpd = HTTPServer(("localhost", port), TestServer)

This would make the test more reliable across different environments. What do you think?


🌐 Web query:

How to find a free port in Python for HTTP server testing?

💡 Result:

To find a free port in Python for HTTP server testing, there are several reliable methods depending on your use case:

1. Using sockets with OS-assigned ports

The most common method involves creating a socket and letting the OS assign a free port automatically via bind(('', 0)):

import socket

def get_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))
        return s.getsockname()[1]
  • Advantages: Simple, returns an available port immediately[1][4]
  • Limitations: Port may be reclaimed by the OS after socket closure, creating a race condition[1][4]
  • Solution: Use the socket before closing it to reserve the port[1][7]

2. Atomic port allocation for servers

For HTTP servers, create and bind the socket first, then pass it to the server:

from http.server import HTTPServer

# Create and bind socket first
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(('', 0))
port = sock.getsockname()[1]

# Create server using existing socket
server = HTTPServer(('', port), YourHandlerClass, bind_and_activate=False)
server.socket = sock
server.server_bind = lambda: None  # Disable default binding
server.server_activate()

This prevents race conditions by keeping the port reserved[1][7].


3. Using testing frameworks

For pytest integration, pytest-httpserver automatically handles port allocation:

def test_server(httpserver):
    httpserver.expect_request("/test").respond_with_data("OK")
    # Access via httpserver.port
    assert requests.get(httpserver.url_for("/test")).status_code == 200
  • Automatically selects and reserves a port[5][10]
  • No manual port management required[10]

4. Built-in Python HTTP server

When using python -m http.server, specify 0 to get an OS-assigned port:

python -m http.server 0
Serving HTTP on 0.0.0.0 port 12345  # Random port shown here
  • Port is printed after server starts[8][9]
  • No race condition since server keeps port occupied[9]

Method               | Best Use Case            | Race Condition Risk
Socket binding       | Quick port checks        | High
Atomic server setup  | Production-grade servers | None
pytest-httpserver    | Test environments        | None
Built-in HTTP server | Manual testing           | None

For testing scenarios, Method 3 (pytest-httpserver) is recommended as it handles port management automatically. For manual server creation, Method 2 provides the safest approach by reserving the port before server initialization[1][5][9].


Dynamic Port Allocation Suggestion

Currently, the test uses a hardcoded port (8080) which could lead to conflicts if that port is already in use on the test system. Would you consider implementing a dynamic port finder to improve reliability? For example, you might replace the fixed port with something like:

-    httpd = HTTPServer(("localhost", 8080), TestServer)
+    def find_free_port():
+        import socket
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            s.bind(('', 0))
+            return s.getsockname()[1]
+    
+    port = find_free_port()
+    httpd = HTTPServer(("localhost", port), TestServer)

This approach, as verified by current best practices, should help avoid potential test failures. Alternatively, if you’re concerned about race conditions (i.e., the port might be reclaimed before the server starts), you might consider binding the socket first and passing it to the server for an atomic allocation. What do you think?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5364059 and e4311af.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Analyze (python)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

207-212: Clean implementation of the test server.

The TestServer class is well-structured and focused on its single responsibility - providing a simple CSV response. Nice job!


214-224: Great documentation on the test case background and motivation.

The comprehensive docstring really helps explain why this test is necessary and the history of the issue. This context is invaluable for future maintainers.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

226-226: Consider implementing dynamic port allocation.

The hardcoded port (8080) might cause test failures if that port is already in use by another process. Would you consider implementing a utility to find an available port dynamically? This would make the test more robust in CI environments and when running locally.

- httpd = HTTPServer(("localhost", 8080), TestServer)
+ import socket
+ 
+ def find_available_port():
+     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+         s.bind(('localhost', 0))
+         return s.getsockname()[1]
+ 
+ port = find_available_port()
+ httpd = HTTPServer(("localhost", port), TestServer)

wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4311af and a79256f.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Analyze (python)
🔇 Additional comments (5)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (5)

7-9: Well-chosen imports for the new test.

The addition of BaseHTTPRequestHandler, HTTPServer, and Thread imports are appropriate for implementing the local HTTP server test. This approach allows for testing with actual HTTP responses rather than mocks.


207-212: Nice implementation of the TestServer class.

The TestServer implementation is focused and concise. I notice you're intentionally providing a CSV without a trailing newline character to reproduce the specific issue mentioned in the test docstring.


214-224: Excellent detailed docstring explaining the test purpose and issue context.

The docstring thoroughly explains the context, the problem being solved, and the reasoning behind the specific test implementation. This kind of documentation is invaluable for future maintainers who might need to understand why this test exists.


228-236: Great error handling and resource cleanup.

I appreciate the proper thread management with try-finally and thread.join(). This ensures resources are properly cleaned up even if an exception occurs during the test.


230-231: Ensure consistent port reference.

If you implement dynamic port allocation as suggested earlier, don't forget to update the port used in the request URL as well.

- response = requests.get(f"http://localhost:8080", stream=True)
+ response = requests.get(f"http://localhost:{port}", stream=True)

wdyt?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

1-1: 💡 Verification agent

🧩 Analysis chain

Fix formatting issue flagged by pipeline.

The CI pipeline indicates there's a formatting issue:

Ruff formatting check failed. 1 file would be reformatted.

Would you consider running ruff format on this file to fix the style issues?


🏁 Script executed:

#!/bin/bash
# Check what formatting changes ruff would make
ruff format --diff unit_tests/sources/declarative/decoders/test_composite_decoder.py

Length of output: 581


Please run ruff format on this file to update the quote style – wdyt?

I verified the formatting diff and it shows that the only change required is to switch the quotes in the s.bind call from single quotes to double quotes:

  • Before:
    s.bind(('localhost', 0))
  • After:
    s.bind(("localhost", 0))

Could you run ruff format unit_tests/sources/declarative/decoders/test_composite_decoder.py so that the file meets the CI formatting standards?

🧰 Tools
🪛 GitHub Actions: Linters

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

🧹 Nitpick comments (1)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (1)

232-244: Well-structured test with proper resource cleanup.

The test correctly:

  1. Finds a dynamic port
  2. Starts the server in a thread
  3. Uses try-finally to ensure cleanup
  4. Joins the thread with a timeout

This addresses previous review feedback about thread cleanup and error handling. Good job!

One small suggestion though - would you consider adding a small sleep after starting the thread to ensure the server is fully up before making the request? This might prevent occasional flaky test failures, wdyt?

 thread.start()
+import time
+time.sleep(0.1)  # Short delay to ensure server is ready
 try:
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a79256f and 2af9ed2.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (3 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
unit_tests/sources/declarative/decoders/test_composite_decoder.py

[error] 1-1: Ruff formatting check failed. 1 file would be reformatted. Run 'ruff format' to fix code style issues in this file.

⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (4)

7-10: Good addition of necessary imports for the server-based test.

These imports support the new HTTP server implementation and dynamic port finding - both needed for testing with a real server rather than mocked responses.


26-29: Nice implementation of dynamic port finding!

This is a clean solution for finding an available port dynamically, which prevents potential port conflicts during test runs. The type ignore comment is also properly used.


214-219: Simple yet effective test server implementation.

The test server correctly serves a minimal CSV payload. I notice the CSV intentionally doesn't end with a newline character - which helps reproduce the specific edge case you're testing for.


221-231: Thorough docstring explaining the test purpose and reproduction steps.

The docstring clearly explains why this test is needed and the specific issue it's designed to reproduce. The commented-out data sample is helpful for future reference.

@maxi297
Copy link
Contributor

maxi297 commented Mar 6, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (2)

228-228: Minor wording issue in docstring.

Looks like there's a small typo in the docstring - "this is the only we were able to reproduce locally" seems to be missing a word. Did you mean "the only way" perhaps? wdyt?


230-230: Consider externalizing the commented test data.

This commented line contains a large CSV string that might make the test harder to read. Would you consider moving it to a constant at the top of the file or to a separate fixture file to improve readability? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2af9ed2 and a28c3b3.

📒 Files selected for processing (1)
  • unit_tests/sources/declarative/decoders/test_composite_decoder.py (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
  • GitHub Check: Check: 'source-pokeapi' (skip=false)
  • GitHub Check: Check: 'source-amplitude' (skip=false)
  • GitHub Check: Check: 'source-shopify' (skip=false)
  • GitHub Check: Check: 'source-hardcoded-records' (skip=false)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Analyze (python)
🔇 Additional comments (4)
unit_tests/sources/declarative/decoders/test_composite_decoder.py (4)

26-29: Nice implementation of port finder!

This is a clean and effective way to dynamically find an available port for testing. Using a context manager ensures proper socket cleanup.


214-219: Simple and effective test server implementation!

The test server implementation is concise and serves its purpose well. It correctly responds with a minimal CSV payload that helps reproduce the issue.


237-241: Excellent use of try/finally for cleanup!

Great job implementing the try/finally pattern to ensure proper resource cleanup. This addresses the previous review comment about thread cleanup and error handling.


232-244: Consider adding server readiness check.

The test starts a server in a separate thread but immediately makes a request without ensuring the server is ready. In rare cases, this might lead to intermittent test failures.

Would you consider adding a simple readiness check before making the request? Something like:

thread.start()
+# Ensure server is ready before making request
+def is_server_ready():
+    try:
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
+            return s.connect_ex(('localhost', port)) == 0
+    except OSError:
+        return False
+
+# Wait for server to be ready with timeout
+import time
+start_time = time.time()
+while not is_server_ready() and time.time() - start_time < 3:
+    time.sleep(0.1)
+
try:
    response = requests.get(f"http://localhost:{port}", stream=True)

This would make the test more robust against timing issues, wdyt?

@maxi297 maxi297 merged commit fe2f9a5 into main Mar 6, 2025
24 checks passed
@maxi297 maxi297 deleted the christo/csv-sync-fix branch March 6, 2025 19:24
Labels: bug (Something isn't working), security
Projects: None yet
3 participants